
[Profiling][Model][Doc] Support Llama3-8B and 70B on A100s #22

Merged
6 commits merged on Jul 24, 2024

Commits on Jul 24, 2024

  1. Merged PR 1873: Support Llama3 8B and 70B for 32k context length on a100_pairwise_nvlink

     # Changelog

     * Support Llama3 8B and 70B https://llama.meta.com/llama3/
     * Max supported context length is 32k, only on 4xA100.
     * Pipeline parallelism is not yet profiled beyond 4k context length.
     * Attention profiling enhancements:
       * Reduce the number of input combinations by removing batches that require more KV cache blocks than fit in available GPU memory.
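The pruning step in the last bullet can be sketched as follows. This is a minimal illustration, not code from the PR: the helper names, the block size of 16 tokens, and the block budget are assumptions; the idea is simply to drop any (batch size, context length) combination whose KV cache demand exceeds the blocks available on the GPU.

```python
import math

def kv_blocks_needed(batch_size, context_len, block_size=16):
    # Each sequence needs ceil(context_len / block_size) KV cache blocks;
    # a batch needs that many blocks per sequence.
    return batch_size * math.ceil(context_len / block_size)

def prune_batches(combos, available_blocks, block_size=16):
    # Keep only (batch_size, context_len) combinations that fit in the
    # GPU's KV cache block budget.
    return [(b, c) for b, c in combos
            if kv_blocks_needed(b, c, block_size) <= available_blocks]

combos = [(1, 4096), (8, 32768), (64, 32768)]
print(prune_batches(combos, available_blocks=20000))
# → [(1, 4096), (8, 32768)]  (the 64 x 32k batch needs 131072 blocks and is dropped)
```

Filtering out infeasible combinations up front keeps the profiler from launching runs that would only fail with an out-of-memory error.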
     nitinkedia7 committed Jul 24, 2024 · 1478250
  2. 41ab8ab
  3. 2f92e46
  4. 8350006
  5. format

     nitinkedia7 committed Jul 24, 2024 · 4867101
  6. minor

     nitinkedia7 committed Jul 24, 2024 · 2649f64