Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataloader: fix for large queries and run_isolated #4748

Merged
merged 5 commits into from
Dec 27, 2023

Conversation

rmosolgo
Copy link
Owner

@rmosolgo rmosolgo commented Dec 23, 2023

Addressing the issue from #4745 (comment) and #4747

TODO:

  • Backport to 2.1.10

@rmosolgo rmosolgo added this to the 2.2.2 milestone Dec 23, 2023
@rmosolgo
Copy link
Owner Author

rmosolgo commented Dec 23, 2023

My mistake was removing the checks for finished = ... in #4743 .

I needed to remove those because previously, suspended fibers were being left behind by Dataloader. (They would return true, be identified as finished, then ignored by Dataloader... but since they were still .alive?, Rails would continue to count them as living owners of ActiveRecord connections. But since they were suspended, they would never terminate, and then they'd hog those database connections.)

So I removed the check for finished = ... and continued running the fibers until they terminated. This worked for small queries, when the total number of created fibers is small, but it turns out to create mayhem when there is a large number of fibers.

The problem is that, when a .transfered fiber returns, control is not given to the fiber which .transfered it -- instead, control is given to the top-level fiber. This means that, in nested Dataloader situations (as when argument values are resolved during a query), control may pass willy-nilly from Fiber to Fiber, resulting in mayhem.

I think this is a bug in Ruby -- wouldn't you expect a fiber's return to transfer back to the Fiber that started it? But I may be wrong. So I opened an issue on the Ruby bug tracker: https://bugs.ruby-lang.org/issues/20081

The problem is, if that bug isn't fixed, then I don't know what to do with .transfer-based Fibers. Maybe I can just give up altogether, and say there are two options:

I see that Ruby 3.3 will have Fiber#kill which would work here (ruby/ruby#7823). I could use that method to manually kill Fibers when I'm done with them -- that way they wouldn't dangle and occupy ActiveRecord connections and they wouldn't return control to an unwanted Fiber.

@rmosolgo
Copy link
Owner Author

I tried Fiber#kill locally and found that it did not solve this problem because it demonstrated the same control flow issue, where .kill would transfer control to the top-level fiber, not the fiber that called it. I opened another bug with Ruby here: https://bugs.ruby-lang.org/issues/20089

@rmosolgo rmosolgo merged commit fe0ea8e into master Dec 27, 2023
3 of 12 checks passed
@rmosolgo rmosolgo deleted the dataloader-large-query-fix branch January 1, 2024 21:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant