Support to identify accepted tokens when speculative decoding is enabled #172

Open
MerrillLi opened this issue Feb 20, 2025 · 0 comments

Problem: When speculative decoding is enabled, the API response gives no way to tell which output tokens were proposed by the draft model and accepted, and which were generated by the target model directly. Being able to track or flag these tokens would give users more visibility and control over the model's output, especially when debugging or validating results.

Request: When speculative decoding is enabled, please add a flag or metadata field to the API response that indicates, for each token, whether it was generated by the draft model.

Possible implementation:
Add a new per-token field to the response, such as is_speculative_decoding or generated_by_draft_model, which returns true or false.
Provide clear documentation on how to interpret this field and when speculative decoding occurs.

This would be a valuable addition to the model's API for users who want to better monitor and understand the underlying processes.
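To make the proposal concrete, here is a rough client-side sketch of how such a field might be consumed. Everything in it is an assumption for illustration only: the payload layout (a "tokens" list with "token" and "logprob" keys), the helper names, and the choice of generated_by_draft_model as the field name. None of it reflects the project's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenInfo:
    """One generated token plus hypothetical provenance metadata."""
    token: str
    logprob: float
    # Hypothetical field from this proposal: True when the token was proposed
    # by the draft model and accepted during speculative decoding.
    generated_by_draft_model: bool = False


def parse_tokens(payload: dict) -> List[TokenInfo]:
    """Parse an assumed response payload; a missing flag defaults to False."""
    return [
        TokenInfo(
            token=item["token"],
            logprob=item["logprob"],
            generated_by_draft_model=item.get("generated_by_draft_model", False),
        )
        for item in payload.get("tokens", [])
    ]


def draft_token_share(tokens: List[TokenInfo]) -> float:
    """Fraction of output tokens flagged as coming from the draft model."""
    if not tokens:
        return 0.0
    return sum(t.generated_by_draft_model for t in tokens) / len(tokens)
```

With a field like this, a caller could log the share of draft-originated tokens per request, or filter flagged tokens when validating outputs. Defaulting the flag to false when it is absent would keep responses readable by clients that predate the field.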

Thank you for considering this request.
