Problem: Currently, when using the API, it is difficult to tell whether an accepted token was generated through speculative decoding with a draft model. A way to track or flag tokens that come from speculative decoding would give users more visibility into and control over the model's output, especially when debugging or validating outputs.
Request: When speculative decoding is enabled, I would like the response to identify which tokens it produced. Specifically, an additional flag or metadata field in the API response could indicate that a token was generated by the draft model during speculative decoding.
Possible implementation:
Add a new field in the token response, such as is_speculative_decoding or generated_by_draft_model, which would return true or false.
Provide clear documentation on interpreting this field and when speculative decoding occurs.
This would be a valuable addition to the model's API for users looking to better monitor and understand the underlying processes.
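To make the request concrete, here is a minimal sketch of how a client might consume such a flag. Everything in it is an assumption for illustration: the `TokenInfo` shape and the `draft_acceptance_summary` helper are hypothetical and do not describe any existing response schema. Only the field name `generated_by_draft_model` comes from the proposal above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TokenInfo:
    """Hypothetical per-token entry in an API response (not an existing schema)."""
    text: str
    # Proposed flag: True if the token was proposed by the draft model and accepted.
    generated_by_draft_model: bool


def draft_acceptance_summary(tokens: Iterable[TokenInfo]) -> dict:
    """Count how many returned tokens originated from the draft model."""
    tokens = list(tokens)
    total = len(tokens)
    drafted = sum(1 for t in tokens if t.generated_by_draft_model)
    return {
        "total_tokens": total,
        "draft_tokens": drafted,
        # Guard against an empty response to avoid division by zero.
        "draft_fraction": drafted / total if total else 0.0,
    }


if __name__ == "__main__":
    # Made-up tokens, only to show the intended usage.
    response_tokens = [
        TokenInfo("Hello", False),
        TokenInfo(",", True),
        TokenInfo(" world", True),
        TokenInfo("!", False),
    ]
    print(draft_acceptance_summary(response_tokens))
```

With a per-token flag like this, a user debugging output could see which tokens were drafted and how often drafts were accepted, without inspecting server logs.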
Thank you for considering this request.