Inconsistent 'query_pre_attn_scalar' Setting Between 9B and 27B Models #71

Open
kiddj opened this issue Jul 17, 2024 · 2 comments
Labels
bug Something isn't working stat:awaiting response Status - Awaiting response from author

Comments

@kiddj

kiddj commented Jul 17, 2024

In a recent commit, I noticed an inconsistency in how the query_pre_attn_scalar parameter is configured for the 9B and 27B models in this repository.

Specifically:

- In the 9B model, query_pre_attn_scalar is not set explicitly and appears to fall back to a default derived from head_dim (256), rather than 224, the value given by hidden_size / # attention_heads.
- In the 27B model, query_pre_attn_scalar is set explicitly to 144 (hidden_size / # attention_heads).

Could you please provide some insight into the reasoning behind this difference? Is there a specific rationale for not setting query_pre_attn_scalar in the 9B model while explicitly setting it in the 27B model?
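The two candidate values can be reproduced from the model dimensions. The sketch below is illustrative only: it assumes the published Gemma 2 config values (hidden_size 3584, 16 heads, head_dim 256 for 9B; hidden_size 4608, 32 heads, head_dim 128 for 27B) and assumes that an unset query_pre_attn_scalar falls back to head_dim. ModelDims and effective_scalar are hypothetical names, not this repository's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelDims:
    # Hypothetical container; values below are assumed from the published Gemma 2 configs.
    hidden_size: int
    num_attention_heads: int
    head_dim: int
    query_pre_attn_scalar: Optional[int] = None


def effective_scalar(dims: ModelDims) -> int:
    # Assumed fallback: use head_dim when query_pre_attn_scalar is not set.
    if dims.query_pre_attn_scalar is not None:
        return dims.query_pre_attn_scalar
    return dims.head_dim


gemma2_9b = ModelDims(hidden_size=3584, num_attention_heads=16, head_dim=256)
gemma2_27b = ModelDims(hidden_size=4608, num_attention_heads=32, head_dim=128,
                       query_pre_attn_scalar=144)

for name, dims in [("9B", gemma2_9b), ("27B", gemma2_27b)]:
    per_head = dims.hidden_size // dims.num_attention_heads
    print(f"{name}: hidden/heads={per_head}, head_dim={dims.head_dim}, "
          f"effective scalar={effective_scalar(dims)}")
# prints:
# 9B: hidden/heads=224, head_dim=256, effective scalar=256
# 27B: hidden/heads=144, head_dim=128, effective scalar=144
```

Under these assumed values, num_attention_heads * head_dim is 4096 for both models, which differs from hidden_size in each case; that is why "hidden_size / # heads" and "head_dim" give two different candidates for the scalar.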

@gustheman

This change better aligns the model with the official internal implementation, and the new values should be the correct ones according to the technical report (link to the technical report).

@kiddj

kiddj commented Jul 17, 2024

I'm looking for clarification on why, in the latest commit, query_pre_attn_scalar for the 9B model was changed from 224 (d_model / # heads) to head_dim (256), while the 27B model was left unchanged.

(The 27B model uses d_model / # heads = 144 for query_pre_attn_scalar rather than head_dim = 128.)

Could you please direct me to the section of the technical report or documentation where the rationale behind this decision is discussed?
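To make concrete why the choice matters, here is a minimal sketch, assuming the scalar enters attention by multiplying each query by query_pre_attn_scalar ** -0.5 before the dot product with the keys (masking and logit softcapping are omitted). The function name and the toy 4-dimensional vectors are hypothetical and purely illustrative.

```python
import math
from typing import List


def attention_logits(q: List[float], keys: List[List[float]],
                     query_pre_attn_scalar: float) -> List[float]:
    # Assumed convention: scale the query by query_pre_attn_scalar ** -0.5,
    # then take its dot product with each key (pre-softmax logits).
    scale = query_pre_attn_scalar ** -0.5
    scaled_q = [x * scale for x in q]
    return [sum(a * b for a, b in zip(scaled_q, k)) for k in keys]


# Toy vectors, not real model activations.
q = [1.0, 2.0, 3.0, 4.0]
keys = [[1.0, 0.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]

logits_224 = attention_logits(q, keys, 224)
logits_256 = attention_logits(q, keys, 256)

# Switching the scalar from 224 to 256 multiplies every logit by sqrt(224 / 256).
ratio = logits_256[0] / logits_224[0]
print(f"ratio = {ratio:.4f}, sqrt(224/256) = {math.sqrt(224 / 256):.4f}")
```

Under this assumption, using 256 instead of 224 makes every pre-softmax logit roughly 6.5% smaller, giving a slightly flatter attention distribution; for the 27B model the analogous choice is between 144 and 128.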

@tilakrayal tilakrayal added stat:awaiting response Status - Awaiting response from author bug Something isn't working labels Jul 19, 2024
@Gopi-Uppari Gopi-Uppari self-assigned this Sep 20, 2024
4 participants