Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethink message list data structure #27

Open
kcr opened this issue May 19, 2014 · 6 comments
Open

Rethink message list data structure #27

kcr opened this issue May 19, 2014 · 6 comments

Comments

@kcr
Copy link
Collaborator

kcr commented May 19, 2014

O(1) (or O(small)) insert at ends
O(1) iteration, startable from key-specified point;
finding the point should be at worst O(Log₂ n)
keep two lists? a tree? trees aren't O(1) insert...
and we know that keys are inserted in order

@kcr
Copy link
Collaborator Author

kcr commented May 19, 2014

If anyone who knows more about data structures than I do would like to make suggestions, I'm all ears.

@nelhage
Copy link
Contributor

nelhage commented May 19, 2014

Talk through the constraints some more, here? How much message list do we maintain locally vs. swap in and out from the server? When we say "find the point" what do we mean? Do messages have (sequential? ordered?) IDs and we have the ID of the point and need to find the actual message object? Can we just hold on to a reference to the point and update that as the point moves around?

Do we need to be able to delete messages?

I perhaps unsurprisingly have some ideas, but they depend somewhat on the design constraints.

@kcr
Copy link
Collaborator Author

kcr commented May 19, 2014

Messages have ordered IDs, but you can "always" find a coordinate between them (probably float seconds since epoch?), the "point" meaning an arbitrary location. The way things are structured you can hold onto a pointer, but you shouldn't necessarily know that the pointer is entirely valid. (You should at least able to find where a message would be.

At the moment, I am assuming that you might end up with all your messages locally. (relaxing that constraint might make it okay to run with deques?)

@nelhage
Copy link
Contributor

nelhage commented May 19, 2014

Hm. I suspect the right answer here is a search tree with really fat nodes (like a B-Tree or similar, but maybe with as many as 10s or 100s of kilobytes of data stored in each node). That'll get you log(n) position-finding, and log(n) inserts at either end in general, but most of them will be O(1) because you're just filling up a leaf node (you'd want to cache a pointer to each of the extreme leaves). You may be able to use the fact that you only insert in order to do even better than a lg(n) rebalance once a node fills up, but I'd have to think harder.

@kcr
Copy link
Collaborator Author

kcr commented May 19, 2014

The other trick, now that I think about it, is that you always have a dividing line between "messages after this points are appended" and "messages before this point are prepended"

@kcr kcr closed this as completed May 19, 2014
@kcr kcr reopened this May 19, 2014
@kcr kcr added this to the meditation milestone May 24, 2014
@asedeno
Copy link

asedeno commented May 13, 2021

I may poke at this later; leaving this here as a reference for myself: http://www.grantjenks.com/docs/sortedcontainers/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants