Support to identify accepted tokens when speculative decoding is enabled #172

MerrillLi · 2025-02-20T09:59:51Z

Problem: Currently, when using the API, there is no easy way to tell which tokens in a response were proposed by the draft model and accepted during speculative decoding. Being able to track or flag these tokens would give users more visibility into and control over the model's output, especially when debugging or validating outputs.

Request: When speculative decoding is enabled, I would like the response to identify which tokens were generated by the draft model. Specifically, an additional flag or metadata field in the API response could indicate, for each token, whether it came from the draft model during speculative decoding.

Possible implementation:
Add a new field to each token in the response, such as is_speculative_decoding or generated_by_draft_model, which would be true or false.
Provide clear documentation on how to interpret this field and when speculative decoding occurs. This would be a valuable addition to the model's API for users who want to better monitor and understand the underlying processes.
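
To make the idea concrete, here is a minimal sketch of how a client might consume such a field. It is not an existing API: the TokenInfo type, the generated_by_draft_model field, and the draft_acceptance_summary helper are all hypothetical names used only to illustrate the request.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TokenInfo:
    """One generated token plus the proposed (hypothetical) provenance flag."""
    text: str
    # Hypothetical field from this request: True if the token was proposed
    # by the draft model and accepted, False if the main model produced it.
    generated_by_draft_model: bool


def draft_acceptance_summary(tokens: Iterable[TokenInfo]) -> dict:
    """Count how many tokens in a response came from the draft model."""
    tokens = list(tokens)
    total = len(tokens)
    draft = sum(1 for t in tokens if t.generated_by_draft_model)
    return {
        "total_tokens": total,
        "draft_tokens": draft,
        # Guard against an empty response to avoid dividing by zero.
        "draft_fraction": draft / total if total else 0.0,
    }


# Made-up tokens standing in for a parsed API response.
example = [
    TokenInfo("The", False),
    TokenInfo(" quick", True),
    TokenInfo(" brown", True),
    TokenInfo(" fox", False),
]
summary = draft_acceptance_summary(example)
print(summary)
```

With a per-token flag like this, a summary such as the fraction of draft-model tokens could be computed on the client side without any extra server support.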

Thank you for considering this request.
