Rewrite rules implementation for LLaMA-2/LLaMA-3 #1811
base: main
Conversation
Codecov Report
Attention: Patch coverage is …

@@            Coverage Diff             @@
##             main    #1811      +/-   ##
==========================================
- Coverage   75.95%   73.50%    -2.45%
==========================================
  Files         228      248       +20
  Lines       24246    26893     +2647
  Branches     4201     4915      +714
==========================================
+ Hits        18416    19768     +1352
- Misses       5035     6161     +1126
- Partials      795      964      +169

☔ View full report in Codecov by Sentry.
initializer.py (Outdated)
@@ -0,0 +1,231 @@
I suggest excluding this file for now. We can focus on the rewriter rules for this PR.
Got it
lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Force-pushed from 09bb592 to 2771a40
Force-pushed from 7894534 to 03ba9e9
Congrats on your first PR! 🎉 For autofix-able lint errors, you can follow https://github.com/microsoft/onnxscript#coding-style to run the autofix.
Summary
This PR implements LLaMA-3 and LLaMA-2 rewrite rules for the MLP and attention layers (LlamaMLP and LlamaAttention) in transformers. The rules target the graphs exported from transformers versions 4.39 through 4.42, matching those subgraphs and fusing them into optimized operators. A sketch of how such rules are applied appears below.
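For context, here is a minimal sketch of how pattern-based rewrite rules are applied with the onnxscript rewriter. The model paths and the empty placeholder rule list are illustrative assumptions, not part of this PR, and the `pattern_rewrite_rules` keyword reflects the rewriter API at the time of writing.

```python
import onnx
from onnxscript import rewriter

# Placeholder: in practice this list would hold the LLaMA MLP/GQA rules
# introduced by this PR (sketched further below).
llama_rules = []

model = onnx.load("llama.onnx")  # assumed path, for illustration only
rewritten = rewriter.rewrite(model, pattern_rewrite_rules=llama_rules)
onnx.save(rewritten, "llama_rewritten.onnx")
```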
Key Changes
MLP RewriteRule:
A new rewrite rule that optimizes the LLaMA MLP layer (LlamaMLP) as exported from transformers versions 4.39 to 4.42.
The rule handles pattern variants with different input counts (5 or 6) and fuses the matrix multiplication and activation operations into a single optimized output (a sketch follows this item).
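As a rough illustration of what such a rule can look like with the onnxscript rewriter pattern API, the sketch below matches the SiLU-gated subgraph that LlamaMLP (`down_proj(SiLU(gate_proj(x)) * up_proj(x))`) is commonly exported to, with SiLU decomposed into Sigmoid and Mul, and replaces it with a single fused op. The `GatedMLP` op and its `com.example` domain are hypothetical stand-ins; the actual rule in this PR may match a different decomposition and target a different fused kernel.

```python
from onnxscript.rewriter import pattern

def mlp_pattern(op, x, gate_w, up_w, down_w):
    # LlamaMLP as commonly exported: down_proj(SiLU(gate_proj(x)) * up_proj(x)),
    # with SiLU decomposed into Mul(g, Sigmoid(g)) by the exporter.
    gate = op.MatMul(x, gate_w)
    act = op.Mul(gate, op.Sigmoid(gate))
    up = op.MatMul(x, up_w)
    return op.MatMul(op.Mul(act, up), down_w)

def mlp_replacement(op, x, gate_w, up_w, down_w):
    # Hypothetical fused op in a custom domain, standing in for whatever
    # fused kernel the real rule targets.
    return op.GatedMLP(x, gate_w, up_w, down_w, _domain="com.example")

mlp_rule = pattern.RewriteRule(mlp_pattern, mlp_replacement)
```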
GQA Llama RewriteRule:
Introduces a rewrite rule for the LlamaAttention layer, as well as one for the first attention layer, each supporting a specified number of inputs.
Two methods handle the 2D and 4D key/value-cache configurations during Group Query Attention (GQA), enabling fused matrix multiplication and attention operations (see the sketch after this list).
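To make the 2D-vs-4D cache distinction concrete, here is a drastically simplified, hypothetical sketch: two pattern variants that differ only in whether the past key/value cache is reshaped from 2D before use, registered together in one rule set. The `FusedGQA` op is a stand-in (a real rule would more likely target a contrib op such as ONNX Runtime's com.microsoft GroupQueryAttention), and the attention body omits masking, scaling, and rotary embeddings.

```python
from onnxscript.rewriter import pattern

def _gqa_core(op, q, k_all, v_all):
    # Minimal stand-in for the matched attention body: dot-product
    # attention with masking and scaling details omitted.
    scores = op.MatMul(q, op.Transpose(k_all, perm=[0, 1, 3, 2]))
    return op.MatMul(op.Softmax(scores, axis=-1), v_all)

def gqa_pattern_4d(op, q, k, v, past_k, past_v):
    # 4D cache: past K/V already in (batch, kv_heads, seq, head_dim) layout.
    k_all = op.Concat(past_k, k, axis=2)
    v_all = op.Concat(past_v, v, axis=2)
    return _gqa_core(op, q, k_all, v_all), k_all, v_all

def gqa_pattern_2d(op, q, k, v, past_k, past_v, kv_shape):
    # 2D cache: past K/V are stored flat and reshaped to 4D before use.
    k_all = op.Concat(op.Reshape(past_k, kv_shape), k, axis=2)
    v_all = op.Concat(op.Reshape(past_v, kv_shape), v, axis=2)
    return _gqa_core(op, q, k_all, v_all), k_all, v_all

def gqa_replacement_4d(op, q, k, v, past_k, past_v):
    # Hypothetical fused op emitting the attention output plus the
    # present key/value cache (hence three outputs).
    return op.FusedGQA(q, k, v, past_k, past_v,
                       _domain="com.example", _outputs=3)

def gqa_replacement_2d(op, q, k, v, past_k, past_v, kv_shape):
    return op.FusedGQA(q, k, v, past_k, past_v,
                       _domain="com.example", _outputs=3)

gqa_rules = pattern.RewriteRuleSet([
    pattern.RewriteRule(gqa_pattern_4d, gqa_replacement_4d),
    pattern.RewriteRule(gqa_pattern_2d, gqa_replacement_2d),
])
```

Keeping the two cache layouts as separate rules in one rule set mirrors the two-method structure described above: each variant matches exactly one exported graph shape, which keeps the individual patterns simple and the match conditions unambiguous.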