[ntuple] Fix race in cluster pool #16931

Open: wants to merge 3 commits into master

jblomer (Contributor) commented Nov 13, 2024

Fixes #16936

Also extends the RandomAccess unit test to increase the chance of hitting race conditions (10x more random requests).

github-actions bot commented Nov 13, 2024

Test Results

    19 files      19 suites   4d 12h 35m 15s ⏱️
 2 677 tests  2 676 ✅ 0 💤 1 ❌
48 946 runs  48 945 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit c118bff.

♻️ This comment has been updated with latest results.

jblomer (Contributor, Author) commented Nov 14, 2024

Still something strange on mac14. Could be a memory leak. Investigating.

jblomer (Contributor, Author) commented Nov 14, 2024

It doesn't seem to be a leak, just excessive memory consumption with this access pattern. That's a separate issue to address. Most likely the page pool stores the same uncompressed pages multiple times.

hahnjo (Member) left a comment

Can you maybe add some explanation to the last commit's message why fIsExpired is not needed (anymore)? Is it because with the removal of the background unzipping earlier this year, the loaded clusters are directly given back to the main thread?

jblomer (Contributor, Author) commented Nov 14, 2024

> Can you maybe add some explanation to the last commit's message why fIsExpired is not needed (anymore)? Is it because with the removal of the background unzipping earlier this year, the loaded clusters are directly given back to the main thread?

Sure, and yes. Updated the commit message to

The fIsExpired flag was used to indicate to the I/O thread that a loaded
cluster is not needed anymore. This was useful at the time when the I/O
thread also took care of the parallel decompression. In this way, we could
avoid useless decompression of clusters.

However, now the parallel decompression is anyway triggered by the main
thread. The loop over fInFlightCluster in RClusterPool::GetCluster()
already avoids decompression if a cluster was expired.

Hence, the out-of-band signal from the main thread to the I/O thread is
not needed anymore.
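
For illustration, here is a minimal sketch of the idea described in the commit message. It is not the actual ROOT code. The types `Cluster` and `InFlightCluster` and the function `CollectLoadedClusters` are hypothetical names. The asynchronous handover from the I/O thread is left out, so each in-flight entry is assumed to be already loaded. The point is only that the main thread can drop expired clusters itself, before decompression, so no separate flag has to be sent to the I/O thread.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cluster that was read from storage.
struct Cluster {
   std::uint64_t fId = 0;
   bool fIsUnzipped = false;
};

// Hypothetical bookkeeping entry for a cluster requested from the I/O thread.
// Assumption: the load has already completed, so fCluster is non-null.
struct InFlightCluster {
   std::uint64_t fId = 0;
   std::unique_ptr<Cluster> fCluster;
};

// Runs on the main thread: walk over the in-flight list, discard clusters
// that are no longer in the wanted window, and decompress only the rest.
std::vector<std::unique_ptr<Cluster>>
CollectLoadedClusters(std::vector<InFlightCluster> &inFlight,
                      const std::set<std::uint64_t> &wanted)
{
   std::vector<std::unique_ptr<Cluster>> pool;
   for (auto &item : inFlight) {
      assert(item.fCluster && "sketch assumes the load already finished");
      if (wanted.count(item.fId) == 0)
         continue; // expired: dropped without spending time on decompression
      item.fCluster->fIsUnzipped = true; // stand-in for the real decompression
      pool.push_back(std::move(item.fCluster));
   }
   inFlight.clear(); // expired clusters are released here
   return pool;
}
```

In this sketch, calling `CollectLoadedClusters` with in-flight clusters 1, 2, 3 and a wanted set of {1, 3} returns only clusters 1 and 3, both marked as unzipped. Cluster 2 is released untouched.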

hahnjo (Member) left a comment

LGTM

Successfully merging this pull request may close these issues.

[ntuple] RClusterPool can crash on non-existing cluster