-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ntuple] Improve RPagePool #16859
base: master
Are you sure you want to change the base?
[ntuple] Improve RPagePool #16859
Conversation
0350870
to
ce1da4f
Compare
Test Results 18 files 18 suites 4d 4h 10m 19s ⏱️ Results for commit 15b33c0. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Some stylistic comments inline.
It would be interesting to run the limits test, especially Limits_ManyFields
, Limits_ManyPages
, and Limits_ManyPagesOneEntry
. It's possible that this PR addresses the quadratic complexity seen in there.
Instead of mapping all synthezised zero pages to the same memory buffer, use real allocated and zeroed out pages. That makes sure no special logic is required when adding and removing pages to and from the page pool.
Allows for O(1) page lookup when a page is returned to the page pool, instead of the O(n) linear search.
Use a hash map to filter the pages in the page pool by column ID and on-disk type on access.
Co-authored-by: Jonas Hahnfeld <[email protected]>
91e1921
to
15b33c0
Compare
No changes to the "many pages" test. The "many fields" unit test got significantly faster. The overall complexity is still super-linear but much more benign. |
Ah right, now I remember that I had already profiled this before: The "many pages" tests are actually bound by Edit: Hm, the "many pages" tests will also hit the linear loop over the page set in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
FileRaii fileGuard("test_ntuple_limits_manyFields.root"); | ||
|
||
static constexpr int NumFields = 40'000; | ||
static constexpr int NumFields = 100'000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fantastic that we can now process models with 100k fields in reasonable time!
Improve the lookup complexity for pages in the page pool from linear to constant in well-behaved cases, i.e. if there is a small number of pages per column and cluster. Some smaller cleanups around the RPage/RPagePool logic.