Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrate zmq #403

Merged
merged 17 commits into from
Jan 10, 2025
Merged

integrate zmq #403

merged 17 commits into from
Jan 10, 2025

Conversation

aniketmaurya
Copy link
Collaborator

@aniketmaurya aniketmaurya commented Jan 8, 2025

What does this PR do?

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

Faster process communication with zmq.

TODO: A follow up PR to add a proxy and support multiple inference worker processes and multiple uvicorn processes.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@bhimrazy
Copy link
Contributor

bhimrazy commented Jan 8, 2025

Seems like a great addition, @aniketmaurya 🎉!
I remember llamastack also has zmq under the hood.
I'm looking forward to learning more about it and its uses over multiprocessing queues.
😊

@aniketmaurya
Copy link
Collaborator Author

thanks for taking a look @bhimrazy! yeah, zmq seem to improve the process communication time for sending the prediction results to the main process. This is gonna be very useful especially while serving streaming tokens from LLMs.

Copy link

codecov bot commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 61.06195% with 44 lines in your changes missing coverage. Please review.

Project coverage is 88%. Comparing base (747308a) to head (bf6969c).
Report is 2 commits behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #403   +/-   ##
===================================
- Coverage    89%    88%   -1%     
===================================
  Files        30     30           
  Lines      1893   1976   +83     
===================================
+ Hits       1683   1734   +51     
- Misses      210    242   +32     

Copy link
Collaborator

@lantiga lantiga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good in general, but not abstracting away the interprocess communication mechanism makes the code more complex (you have a socket rather than a queue so that has more opaque semantics), also a lot of details are guarded by if .. else conditions

I would take this opportunity to create a base class that takes care of the communication, and two concrete classes. You can then instantiate them in LitServer based on the one you want and you don't have to deal with conditionals and keep the code easily consumable.

Clear semantics is particularly important for people implementing loops, we need to keep it simple.

@aniketmaurya
Copy link
Collaborator Author

aniketmaurya commented Jan 10, 2025

agree with your points @lantiga, I would put these inside the put_response method which can be used everywhere else. I will also create an encapsulation to hide these socket details for the next PR which would enable zmq for multiple workers.

@aniketmaurya aniketmaurya merged commit 43692d4 into main Jan 10, 2025
20 of 21 checks passed
@aniketmaurya aniketmaurya deleted the integrate-zmq branch January 10, 2025 14:00
@aniketmaurya
Copy link
Collaborator Author

creating followup as per above suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants