integrate zmq #403
Seems like a great addition, @aniketmaurya 🎉!
Thanks for taking a look, @bhimrazy! Yeah, zmq seems to reduce the inter-process communication time for sending prediction results to the main process. This is going to be very useful, especially when serving streaming tokens from LLMs.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@         Coverage Diff         @@
##          main   #403   +/-   ##
===================================
- Coverage     89%    88%    -1%
===================================
  Files         30     30
  Lines       1893   1976    +83
===================================
+ Hits        1683   1734    +51
- Misses       210    242    +32
The PR looks good in general, but not abstracting away the interprocess communication mechanism makes the code more complex: you now have a socket rather than a queue, which has more opaque semantics, and a lot of details are guarded by if .. else conditions.
I would take this opportunity to create a base class that takes care of the communication, and two concrete classes. You can then instantiate them in LitServer based on the one you want and you don't have to deal with conditionals and keep the code easily consumable.
Clear semantics is particularly important for people implementing loops, we need to keep it simple.
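To make the suggestion above concrete, here is a minimal sketch of one base class with two concrete transports. All names (`ResponseTransport`, `QueueTransport`, `ZmqTransport`, `make_transport`) are hypothetical and are not taken from the PR or from LitServe; the zmq variant assumes `pyzmq` is installed and uses a PUSH/PULL socket pair, which may differ from what the PR actually does.

```python
import queue
from abc import ABC, abstractmethod
from typing import Any, Optional, Tuple


class ResponseTransport(ABC):
    """Hypothetical base class: hides how responses reach the main process."""

    @abstractmethod
    def put(self, uid: str, payload: Any) -> None:
        """Send one (uid, payload) response."""

    @abstractmethod
    def get(self, timeout: float = 1.0) -> Tuple[str, Any]:
        """Receive one response, raising queue.Empty on timeout."""


class QueueTransport(ResponseTransport):
    """Queue-backed transport; accepts any object with a Queue-like put/get."""

    def __init__(self, q: Optional[Any] = None) -> None:
        # Defaults to an in-process queue; a multiprocessing.Queue can be passed in.
        self._q = q if q is not None else queue.Queue()

    def put(self, uid: str, payload: Any) -> None:
        self._q.put((uid, payload))

    def get(self, timeout: float = 1.0) -> Tuple[str, Any]:
        return self._q.get(timeout=timeout)


class ZmqTransport(ResponseTransport):
    """zmq-backed transport (assumption: PUSH on workers, PULL on the receiver)."""

    def __init__(self, address: str, receiver: bool) -> None:
        import zmq  # imported lazily so the queue path works without pyzmq

        ctx = zmq.Context.instance()
        self._sock = ctx.socket(zmq.PULL if receiver else zmq.PUSH)
        if receiver:
            self._sock.bind(address)      # the main process binds and receives
        else:
            self._sock.connect(address)   # worker processes connect and send

    def put(self, uid: str, payload: Any) -> None:
        self._sock.send_pyobj((uid, payload))

    def get(self, timeout: float = 1.0) -> Tuple[str, Any]:
        # poll() takes milliseconds and returns 0 when nothing is ready.
        if not self._sock.poll(int(timeout * 1000)):
            raise queue.Empty
        return self._sock.recv_pyobj()


def make_transport(kind: str, **kwargs: Any) -> ResponseTransport:
    """Pick the transport once, so callers never branch on the mechanism."""
    if kind == "queue":
        return QueueTransport(**kwargs)
    if kind == "zmq":
        return ZmqTransport(**kwargs)
    raise ValueError(f"unknown transport kind: {kind!r}")
```

With something like this, the server would choose the transport once at startup (for example `make_transport("queue")`), and loop implementations would only call `put` and `get` without any if .. else on the underlying mechanism.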
Agree with your points, @lantiga. I would put these inside the put_response method, which can then be used everywhere else. I will also create an abstraction to hide the socket details in the next PR, which will enable zmq for multiple workers.
Creating a follow-up PR as per the suggestion above.
What does this PR do?
Before submitting
Faster process communication with zmq.
TODO: A follow-up PR to add a proxy and support multiple inference worker processes and multiple uvicorn processes.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃