[Bug] Deployment of Llama3.1-70b getting stuck #2724
Comments
Use the latest version.
@zhyncs Currently using LMDEPLOY_VERSION=0.6.2
We tried the latest version. With 1 TP, we get a CUDA out-of-memory error. We observed that with 2 TP, memory on the second GPU was not being used. Please suggest.
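For context, a back-of-the-envelope check: the 70B model's fp16 weights are roughly 70e9 parameters × 2 bytes ≈ 140 GB, so they cannot fit on a single 80 GB A100 and `--tp 1` is expected to OOM; `--tp 2` shards the weights across both cards. A minimal sketch, using lmdeploy's `--cache-max-entry-count` flag (the fraction of free GPU memory given to the KV cache), which can be lowered if memory is still tight; 0.5 here is an illustrative value, not a recommendation:

```bash
# --tp 2 shards the ~140 GB of fp16 weights across both A100s;
# --cache-max-entry-count caps the KV-cache share of free GPU memory
# (default 0.8); lower it if you still hit out-of-memory errors.
lmdeploy serve api_server meta-llama/Llama-3.1-70B-Instruct \
    --tp 2 \
    --cache-max-entry-count 0.5
```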
You may upgrade to v0.6.2.post1.
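A minimal upgrade sketch, assuming lmdeploy is installed from PyPI and that the patched release is published under this version string:

```bash
# Pin to the post-release suggested in the thread.
pip install -U lmdeploy==0.6.2.post1
```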
```
Fetching 42 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 42/42 [00:00<00:00, 4813.00it/s]
[TM][INFO] TM_FUSE_SILU_ACT=1
[TM][INFO] [LlamaWeight::prepare] workspace size: 469762048
[WARNING] gemm_config.in is not found; using default GEMM algo
```
Any luck, anyone?
So you get this log just by starting the server, without sending any requests? This is more likely caused by the bug in v0.6.2 (as opposed to v0.6.2.post1).
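For anyone hitting the same hang, a diagnostic sketch: py-spy is a third-party profiler installed separately (`pip install py-spy`), `NCCL_DEBUG` is a standard NCCL environment variable, and `<PID>` is a placeholder for the api_server process id.

```bash
# Dump the Python stack of the hung server process to see where it is blocked.
py-spy dump --pid <PID>

# For --tp 2 hangs, NCCL logging often shows a stalled collective between GPUs.
NCCL_DEBUG=INFO lmdeploy serve api_server meta-llama/Llama-3.1-70B-Instruct --tp 2
```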
Checklist
Describe the bug
We are trying to deploy Llama-3.1-70B on GCP with the specs below.
GPU - 2 x NVIDIA A100 80GB
Machine Type - a2-ultragpu-2g (350GB Ram)
SSD - 2TB
Command we tried for deployment: `lmdeploy serve api_server meta-llama/Llama-3.1-70B-Instruct --tp 2`
During deployment, we get stuck at:
```
Fetching 42 files: 100%|████████████████████████████████████████████████████████████████████| 42/42 [00:00<00:00, 12190.21it/s]
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
```
It gets stuck here for hours without any other error. We checked GPU and CPU usage as well. Please suggest.
```
$ free -g
               total        used        free      shared  buff/cache   available
Mem:             334           0         200           0         133         330
Swap:              0           0           0
```

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  |   00000000:00:05.0 Off |                    0 |
| N/A   35C    P0             94W /  400W |   76649MiB /  81920MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  |   00000000:00:06.0 Off |                    0 |
| N/A   34C    P0             69W /  400W |   80733MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
```
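A side note on the usage check above: GPU 0 sits at 100% utilization, so one way to tell a live (if slow) load apart from a deadlock is to watch whether the memory and utilization figures keep changing over time. A minimal sketch using standard tools; `pgrep -f api_server` assumes the server process name contains that string:

```bash
# Refresh GPU memory/utilization once per second during model loading.
watch -n 1 nvidia-smi

# Per-process CPU usage of the server, to see whether loading is progressing.
top -p $(pgrep -f api_server)
```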
Reproduction
lmdeploy serve api_server meta-llama/Llama-3.1-70B-Instruct --tp 2
Environment
Error traceback