In a recent commit, I noticed an inconsistency in how the `query_pre_attn_scalar` parameter is configured for the 9B and 27B models in this repository. Specifically:

- In the 9B model, `query_pre_attn_scalar` is not explicitly set and appears to fall back to the default derived from `head_dim` (256), rather than 224, which is what `hidden_size / num_attention_heads` would give.
- In the 27B model, `query_pre_attn_scalar` is explicitly set to 144 (`hidden_size / num_attention_heads`).

Could you provide some insight into the reasoning behind this difference? Is there a specific rationale for leaving `query_pre_attn_scalar` unset in the 9B model while explicitly setting it in the 27B model?
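For context, here is a minimal sketch of how `query_pre_attn_scalar` typically enters the attention computation; the function name and shapes are illustrative, not this repository's actual code. The queries are scaled by `query_pre_attn_scalar**-0.5` before the dot product, in place of the conventional `head_dim**-0.5`:

```python
import torch

def attn_logits(q, k, query_pre_attn_scalar):
    # q, k: [batch, num_heads, seq_len, head_dim]
    # Scale queries by query_pre_attn_scalar**-0.5 instead of the
    # usual head_dim**-0.5 before taking dot products with the keys.
    q = q * query_pre_attn_scalar ** -0.5
    return torch.matmul(q, k.transpose(-2, -1))
```

With `query_pre_attn_scalar = head_dim`, this reduces to standard scaled dot-product attention, so choosing 256 vs. 224 (or 144 vs. 128) changes the softmax temperature of every attention layer.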
This change aligns the model better with the official internal implementation, and the new values should be the correct ones, following the technical report: link to the technical report
I'm looking for clarification on why `query_pre_attn_scalar` was changed from 224 (`d_model / num_heads`) to `head_dim` = 256 specifically for the 9B model in the latest commit, while no corresponding change was applied to the 27B model, which still uses `d_model / num_heads` = 144 rather than `head_dim` = 128.
Could you please direct me to the section of the technical report or documentation where the rationale behind this decision is discussed?
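To make the discrepancy concrete, here is the arithmetic. The hidden sizes and head counts below are my reading of the published Gemma 2 configs, consistent with the 224 and 144 quoted above, but they are assumptions rather than quotes from this repo:

```python
# Assumed config values:
#   9B:  hidden_size = 3584, num_attention_heads = 16, head_dim = 256
#   27B: hidden_size = 4608, num_attention_heads = 32, head_dim = 128
for name, hidden_size, num_heads, head_dim in [
    ("9B", 3584, 16, 256),
    ("27B", 4608, 32, 128),
]:
    print(f"{name}: hidden_size / num_heads = {hidden_size // num_heads}, "
          f"head_dim = {head_dim}")
# 9B:  hidden_size / num_heads = 224, head_dim = 256  -> commit now uses 256
# 27B: hidden_size / num_heads = 144, head_dim = 128  -> config keeps 144
```

So the two models now sit on opposite sides of the same choice, which is what prompted the question.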