core: transition event loop to max live iters soft threshold #7701
Signed-off-by: Matthew Fala <[email protected]>
Network Event Drop

Summary

Today, met with Leonardo to discuss his proposed solution to the network related issues that this PR would resolve.

Symptoms of Network Event Drop

In testing, Leonardo sent a large amount of data to Fluent Bit and noticed that some coroutines fail to be woken up after the timeout is hit. He noticed that increasing FLB_ENGINE_LOOP_MAX_ITER resolves the problem:
Hypothesis of Network Coro Drop Problem Cause -- From Leonardo, addressed indirectly by this PR

Every 1.5 seconds a network timeout detection function is called, possibly triggering Line 893 in 10274c9
followed by: Line 895 in 10274c9
In usual circumstances this causes the following to happen:
In the problematic case the following will occur (note: only steps 4 and 5 reverse); a sketch of the reversed ordering follows.
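To make the hypothesized reversal concrete, here is a minimal, self-contained sketch. The types and helpers (conn, event, inject_event) are hypothetical stand-ins for illustration, not the engine's actual code; only the line references come from the discussion above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the engine's connection and event types. */
struct conn  { char name[16]; };
struct event { struct conn *data; };

/* A tiny injected-event queue, drained by the next loop cycle. */
static struct event *queue[8];
static int queued = 0;

static void inject_event(struct event *ev) { queue[queued++] = ev; }

int main(void)
{
    struct conn *c = malloc(sizeof(*c));
    struct event ev = { .data = c };
    snprintf(c->name, sizeof(c->name), "tcp:4433");

    /* Hypothesized step 4: inject the wake-up event ... */
    inject_event(&ev);                 /* ~ Line 893 in 10274c9 */

    /* Hypothesized step 5: ... then free the connection. If the loop
     * hit its max-iter cap, the event is still queued at this point. */
    free(c);                           /* ~ Line 895 in 10274c9 */

    /* The next cycle drains the queue and dereferences freed memory:
     * a use-after-free, i.e. the "corrupted event" described below. */
    printf("resuming coroutine for %s\n", queue[0]->data->name); /* UB! */
    return 0;
}
```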
Analysis of the Cause
If the order switches and we free the event memory before processing it, then we may process a corrupted event.

Solution Recommendation - By Leonardo

Old responsibilities:
Solution: migrate the connection event deletion responsibility from the Engine to the coroutine (after it handles timeouts).

New responsibilities:
Feasibility Assessed

Out of these, read/write is the hardest to handle in terms of network errors:
Notes

This solution may not fully resolve the event drop issue. This PR can serve as an alternate solution.

PR Modification notes:

It would be best to keep the priority event loop performance gains by ensuring that new tasks and flush events are not triggered prematurely, before all events are processed. This can be done by guaranteeing that all events are processed before the end of an event loop cycle, except very low priority events. Leonardo notes that this convolutes the already complicated event loop, and that network solution alternatives should be fully explored before resorting to event loop changes.

Actions

Wait
As per our previous conversation, I have opened PR #7728, which addresses the root cause of the issue; I'd appreciate it if you took a look at it.
An alternate solution, which changes the network code to support the existing system, has been merged: this PR is no longer needed.
Issue flagged by Leonardo and previously spotted by the team.
Problem
Because of the max iteration cap, the event loop may not process all injected events before the end of an event loop cycle, causing cleanup to occur prematurely.
Old No-Priority System
Before priority events were added, one mk_event_wait call was made per cycle and all events were processed. Cleanup started only after all events were processed (sketched below).
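A condensed sketch of that older loop shape (not the literal engine code): mk_event_wait and mk_event_foreach are the mk_core primitives referenced above, while handle() and run_cleanup() are stand-ins.

```c
#include <monkey/mk_core.h>

extern void handle(struct mk_event *event);
extern void run_cleanup(void);

void old_engine_loop(struct mk_event_loop *evl)
{
    struct mk_event *event;

    while (1) {
        mk_event_wait(evl);             /* one blocking wait per cycle   */
        mk_event_foreach(event, evl) {  /* process everything fetched    */
            handle(event);
        }
        run_cleanup();                  /* cleanup only after all events */
    }
}
```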
Current Priority System
Now mk_event_wait_2 (non-blocking) is called many times to pick up new events of higher priority. In one round of processing, if more than max-iter events are processed, the loop breaks and the rest of the events are processed in the next cycle. This is a problem: all picked-up events should be fully processed within one cycle of the event loop so that the remaining work is finished.
Problems with Current System
This is mainly a problem for injected events related to networking. Injected events are used to squeeze some work into the scheduler before any of the cleanup functions run (network related cleanup functions, timeout related iirc). Capping the number of iterations can corrupt this model: the injected work ends up running after the cleanup functions, which is too late. A sketch of how the hard cap breaks this contract follows.
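Roughly, the current shape with the hard cap looks like this. This is a sketch under stated assumptions: only mk_event_wait_2 comes from the code base, FLB_ENGINE_LOOP_MAX_ITER is redefined here with an illustrative value, and the remaining helpers are stand-ins.

```c
#include <monkey/mk_core.h>

#define FLB_ENGINE_LOOP_MAX_ITER 10   /* illustrative value only */

extern struct mk_event *next_highest_priority_event(struct mk_event_loop *evl);
extern void handle(struct mk_event *event);
extern void run_cleanup(void);

void current_engine_cycle(struct mk_event_loop *evl)
{
    struct mk_event *event;
    int iters = 0;

    while (1) {
        mk_event_wait_2(evl, 0);  /* non-blocking re-poll so newly
                                   * triggered, higher-priority events
                                   * can jump the queue                 */

        event = next_highest_priority_event(evl);   /* stand-in */
        if (!event) {
            break;                /* nothing left: normal end of cycle  */
        }
        handle(event);

        if (++iters >= FLB_ENGINE_LOOP_MAX_ITER) {
            break;                /* HARD cap: leftover events, incl.
                                   * injected ones, spill to the next
                                   * cycle -- after cleanup runs below  */
        }
    }
    run_cleanup();                /* may now run before injected work   */
}
```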
Solution & Updated System
Sloppy/Soft Prioritization
The solution is a simple one. We switch from a hard max iteration limit to a soft max iteration limit.
After reaching some limit on the event loop, the event loop stops picking up newly triggered events, but continues to process all already triggered events. It continues to accept incoming injected events.
The loop ends when there are no more events to process, meaning there may be a few more cycles to complete the remaining already-accepted work before exiting the event loop and moving on to the cleanup functions. A sketch of this shape follows.
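A sketch of the proposed soft-threshold shape, under the same assumptions as above: only mk_event_wait_2 is a real primitive; the threshold name and the helpers are stand-ins for illustration.

```c
#include <monkey/mk_core.h>

#define MAX_LIVE_ITERS 10   /* stand-in for the soft threshold */

extern struct mk_event *next_highest_priority_event(struct mk_event_loop *evl);
extern void admit_injected_events(struct mk_event_loop *evl);
extern void handle(struct mk_event *event);
extern void run_cleanup(void);

void soft_threshold_cycle(struct mk_event_loop *evl)
{
    struct mk_event *event;
    int live_iters = 0;

    while (1) {
        if (live_iters < MAX_LIVE_ITERS) {
            mk_event_wait_2(evl, 0);  /* still admitting new events     */
        }
        admit_injected_events(evl);   /* injected events are accepted
                                       * even past the threshold        */

        event = next_highest_priority_event(evl);
        if (!event) {
            break;   /* loop ends only once ALL admitted work is done   */
        }
        handle(event);
        live_iters++;
    }
    run_cleanup();   /* guaranteed to run after all admitted and
                      * injected work -- the property this PR is after  */
}
```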
How the Solution Helps
This means that all accepted and injected work will be completed before any of the cleanup functions run, thus resolving the problem.
Side Effects
Sloppy Prioritization
This change slightly breaks the priority event loop abstraction. If high priority events are triggered after the live iteration limit has been reached, the event loop will not admit them for the current round of processing; it will continue to process already-admitted lower priority events first, and only admit the new high priority events in the next round.
The maintainers should carefully consider this.
Guarantees
Non-Guarantees
Other Options
Strict Prioritization via Priority 0 Event Processing
One way to trade off is to make all injected events Priority 0 and to only process Priority 0 events after pausing event intake on the event loop.
How does this help: since only top priority events are processed, we GUARANTEE that events are processed in order of priority. We also guarantee that high priority work admitted to the event loop or injected will be processed.
This may be a good compromise, so long as all injected events, and any work that must be completed within the round, are Priority 0. A sketch follows.
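A minimal sketch of this alternative; every name here except the mk_core types is a hypothetical stand-in (in particular, the intake-pausing and priority-bucket helpers do not exist in the code base).

```c
#include <monkey/mk_core.h>

extern void pause_event_intake(struct mk_event_loop *evl);
extern struct mk_event *next_event_of_priority(struct mk_event_loop *evl,
                                               int priority);
extern void handle(struct mk_event *event);
extern void run_cleanup(void);

void strict_priority_zero_drain(struct mk_event_loop *evl)
{
    struct mk_event *event;

    pause_event_intake(evl);   /* stop admitting new events             */

    /* Drain ONLY Priority 0: injected events and anything else that
     * must complete this round; lower priorities wait for next round. */
    while ((event = next_event_of_priority(evl, 0)) != NULL) {
        handle(event);
    }
    run_cleanup();
}
```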
Guarantees
Non-Guarantees
Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:
If this is a change to packaging of containers or native binaries, then please confirm it works for all targets. Set the ok-package-test label to test for all targets (requires a maintainer to do).

Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.