Rethink message list data structure #27

kcr · 2014-05-19T04:38:47Z

O(1) (or O(small)) insert at ends
O(1) iteration, startable from key-specified point;
finding the point should be at worst O(Log₂ n)
keep two lists? a tree? trees aren't O(1) insert...
and we know that keys are inserted in order

kcr · 2014-05-19T15:44:21Z

If anyone who knows more about data structures than I do would like to make suggestions, I'm all ears.

nelhage · 2014-05-19T15:55:18Z

Talk through the constraints some more, here? How much message list do we maintain locally vs. swap in and out from the server? When we say "find the point" what do we mean? Do messages have (sequential? ordered?) IDs and we have the ID of the point and need to find the actual message object? Can we just hold on to a reference to the point and update that as the point moves around?

Do we need to be able to delete messages?

I perhaps unsurprisingly have some ideas, but they depend somewhat on the design constraints.

kcr · 2014-05-19T17:29:24Z

Messages have ordered IDs, but you can "always" find a coordinate between them (probably float seconds since epoch?), the "point" meaning an arbitrary location. The way things are structured you can hold onto a pointer, but you shouldn't necessarily know that the pointer is entirely valid. (You should at least able to find where a message would be.

At the moment, I am assuming that you might end up with all your messages locally. (relaxing that constraint might make it okay to run with deques?)

nelhage · 2014-05-19T18:14:57Z

Hm. I suspect the right answer here is a search tree with really fat nodes (like a B-Tree or similar, but maybe with as many as 10s or 100s of kilobytes of data stored in each node). That'll get you log(n) position-finding, and log(n) inserts at either end in general, but most of them will be O(1) because you're just filling up a leaf node (you'd want to cache a pointer to each of the extreme leaves). You may be able to use the fact that you only insert in order to do even better than a lg(n) rebalance once a node fills up, but I'd have to think harder.

kcr · 2014-05-19T18:39:06Z

The other trick, now that I think about it, is that you always have a dividing line between "messages after this points are appended" and "messages before this point are prepended"

asedeno · 2021-05-13T14:35:52Z

I may poke at this later; leaving this here as a reference for myself: http://www.grantjenks.com/docs/sortedcontainers/

kcr added enhancement labels May 19, 2014

kcr added the help wanted label May 19, 2014

kcr closed this as completed May 19, 2014

kcr reopened this May 19, 2014

kcr added this to the meditation milestone May 24, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rethink message list data structure #27

Rethink message list data structure #27

kcr commented May 19, 2014

kcr commented May 19, 2014

nelhage commented May 19, 2014

kcr commented May 19, 2014

nelhage commented May 19, 2014

kcr commented May 19, 2014

asedeno commented May 13, 2021

Rethink message list data structure #27

Rethink message list data structure #27

Comments

kcr commented May 19, 2014

kcr commented May 19, 2014

nelhage commented May 19, 2014

kcr commented May 19, 2014

nelhage commented May 19, 2014

kcr commented May 19, 2014

asedeno commented May 13, 2021