
Let lighteval support sglang #552

Merged: 12 commits merged into huggingface:main on Feb 18, 2025

Conversation

@Jayon02 (Contributor) commented Feb 12, 2025

You can now use SGLang as a backend for lighteval tasks:

lighteval sglang \
  "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
  "helm|bigbench:bbq_lite_json:age_disambig|0|0"

Review comments were left on the following files (all resolved):

src/lighteval/pipeline.py
src/lighteval/main_sglang.py
src/lighteval/models/model_input.py
src/lighteval/models/model_loader.py
src/lighteval/models/sglang/sglang_model.py

commit 132290b
Author: Jayon02 <[email protected]>
Date:   Sat Feb 15 11:08:24 2025 +0000

    modify document

commit 601a755
Author: Jayon02 <[email protected]>
Date:   Sat Feb 15 10:22:43 2025 +0000

    pass pre commit check and modify document

commit 3e1fb88
Author: qiujiang chen <[email protected]>
Date:   Sat Feb 15 06:59:12 2025 +0000

    optimize input, adjust precision

commit 1a59076
Author: qiujiang chen <[email protected]>
Date:   Thu Feb 13 19:51:22 2025 +0000

    text files

commit 9dc62b7
Author: qiujiang chen <[email protected]>
Date:   Wed Feb 12 14:08:21 2025 +0000

    modify format

@Jayon02 (Contributor, Author) commented Feb 15, 2025

1. I have addressed all the issues mentioned above. I apologize for the problems that occurred during my development process.
2. I added SGLang as an additional backend for lighteval, offering high accuracy and efficiency.
3. I added an initial document describing how to use SGLang in lighteval after installing SGLang on your machine.
4. Here is an example of how to run lighteval with SGLang, including model arguments and sampling arguments:

lighteval sglang \
  "pretrained=Qwen/Qwen2-7B,dtype=float16,tp_size=2,generation_parameters={temperature: 1.0}" \
  "leaderboard|truthfulqa:mc|0|0"

Please let me know if there is anything that could be improved. Thanks to all SGLang team members for their help.

@Jayon02 (Contributor, Author) commented Feb 15, 2025

I conducted some experiments to compare lighteval metrics when using SGLang versus vLLM.

The experimental setup is as follows:

NVIDIA A100
Linux 5.15.0-126-generic
lighteval                         0.6.0.dev0
python                            3.13.0
torch                             2.5.1
vllm                              0.7.2
sglang                            0.4.2.post4
sgl-kernel                        0.0.3.post3
flashinfer-python                 0.2.0.post2+cu124torch2.5

However, I found that temperature is set to 1.0 in some vLLM tests, which increases the randomness of the model's generated results. Here are some results (t denotes temperature):

| task | metric | sglang (t = 1.0) | vllm (t = 1.0) | sglang (t = 0.0) | vllm (t = 0.0) |
|---|---|---|---|---|---|
| helm:truthfulqa | em | 0.5306 | 0.5046 | 0.6346 | 0.6346 |
| | qem | 0.5367 | 0.5122 | 0.6346 | 0.6346 |
| | pem | 0.5321 | 0.5061 | 0.6346 | 0.6346 |
| | pqem | 0.6223 | 0.6070 | 0.7080 | 0.7080 |
| | acc | 0.2584 | 0.2584 | 0.2584 | 0.2584 |
| helm:siqa | em | 0.1479 | 0.1428 | 0.1018 | 0.1008 |
| | qem | 0.1535 | 0.1515 | 0.1018 | 0.1008 |
| | pem | 0.1515 | 0.1469 | 0.1018 | 0.1008 |
| | pqem | 0.4135 | 0.4186 | 0.3649 | 0.3644 |
| harness:bbh:hyperbaton | em | 0.092 | 0.080 | 0 | 0 |
| | qem | 0.108 | 0.108 | 0 | 0 |
| | pem | 0.504 | 0.420 | 0.824 | 0.824 |
| | pqem | 0.780 | 0.740 | 0.968 | 0.968 |
| | perfect_em | 0 | 0 | 0 | 0 |
| helm:mmlu | em | 0.6373 | 0.6475 | 0.7110 | 0.7111 |
| | qem | 0.6375 | 0.6475 | 0.7110 | 0.7111 |
| | pem | 0.6373 | 0.6475 | 0.7110 | 0.7111 |
| | pqem | 0.7245 | 0.7339 | 0.7832 | 0.7832 |

I found that when temperature = 0.0, the difference between SGLang and vLLM is very small. I also launched the vLLM and SGLang backends independently of lighteval: when temperature = 1.0, the results generated for the same input differ from run to run. Each backend selects one output from multiple possible options, and both backends share the same set of possible outputs. When evaluating, I think we should consider all of those options, not just one, so the results in the table for temperature = 1.0 are not reliable.

In other experiments that only use loglikelihood to compute metrics, such as lighteval:logiqa and leaderboard:truthfulqa:mc, temperature is set to 0.0 and the results are nearly identical for SGLang and vLLM.

I believe that, for model evaluation, a more deterministic parameter setting should be chosen to better assess the model, so I think temperature = 0.0 is the more reasonable setting.
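
For example, such a deterministic run can be launched like this (an illustrative invocation that simply reuses the model arguments from the example above with temperature pinned to 0.0; adjust the model and task to your own setup):

lighteval sglang \
  "pretrained=Qwen/Qwen2-7B,dtype=float16,tp_size=2,generation_parameters={temperature: 0.0}" \
  "leaderboard|truthfulqa:mc|0|0"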

I am very eager to hear your thoughts on this parameter setting.

@zhaochenyang20 commented
@Jayon02 great work!

@@ -31,6 +33,7 @@ appropriate extras group.
| adapters | To evaluate adapters models (delta and peft) |
| tensorboardX | To upload your results to tensorboard |
| vllm | To use vllm as backend for inference |
| sglang | To use sglang as backend for inference |

Member:

You did not modify the pyproject file to reflect this extra.

Contributor Author:

I linked the SGLang documentation so that users can install it themselves. SGLang has some constraints that cannot be expressed in pyproject: if we add sglang = [sglang>=0.4.2.post], pip install lighteval[sglang] can't install the dependencies correctly. So I would prefer that users install SGLang by following the SGLang documentation.
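
As a rough sketch of what that installation looks like in practice (illustrative only; the exact command and version constraints depend on your CUDA and torch setup, so follow the official SGLang installation guide rather than a lighteval extra):

pip install --upgrade pip
pip install "sglang[all]"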

Member:

Great! Well, in that case it's not really an extra and I would not add it to the table.

# Would we rather truncate the prompt to allow generation to go to max_new_tokens, at the risk
# of losing some meaning, or have some generations that are exceedingly short?
# The choice we go for here is to avoid truncating the prompt if we can, since it
# should have been managed by the prompt creator/few shot manager if requested by the user.
Member:

Sometimes, even with 0 shots, the prompt is too long; in that case I think we should truncate the prompt and allow generation, since we also evaluate models on context length.

@NathanHB (Member) left a comment:

@Jayon02 Great work on this!! Thanks a lot for the contribution. I think the PR is good to go; you only need to address a few nits I mentioned above :)

@HuggingFaceDocBuilderDev (Collaborator) commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zhaochenyang20 commented

@Jayon02 This PR is great. Could you resolve / close the conversation if you solved it?

commit 1035aa5
Author: Jayon02 <[email protected]>
Date:   Tue Feb 18 01:31:21 2025 +0000

    modify document and fix bug

commit be58c7c
Author: Jayon02 <[email protected]>
Date:   Mon Feb 17 14:35:03 2025 +0000

    modify toml

commit 86e41c9
Merge: 132290b 50f3695
Author: Jayon02 <[email protected]>
Date:   Sun Feb 16 01:30:17 2025 +0000

    Merge branch 'main' into sglang

@zhaochenyang20 commented

@Jayon02 @NathanHB This PR is approved. When can we merge it?

@NathanHB (Member) commented

Waiting for tests, and then it should be ready!

@NathanHB merged commit 086cf90 into huggingface:main on Feb 18, 2025
3 checks passed
@zhaochenyang20 commented

@Jayon02 Great work!
