Let lighteval support sglang #552

Merged · 12 commits · Feb 18, 2025
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -19,6 +19,8 @@
title: Add a custom metric
- local: use-vllm-as-backend
title: Use VLLM as backend
- local: use-sglang-as-backend
title: Use SGLang as backend
- local: evaluate-the-model-on-a-server-or-container
title: Evaluate on Server
- local: contributing-to-multilingual-evaluations
3 changes: 3 additions & 0 deletions docs/source/installation.mdx
@@ -23,6 +23,8 @@ Lighteval has optional dependencies that you can install by specifying the
appropriate extras group.
`pip install lighteval[<group>]` or `pip install -e .[<group>]`.

If you want to use lighteval with `sglang`, follow the [SGLang installation documentation](https://docs.sglang.ai/start/install.html).

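A rough sketch of what that typically looks like (the exact command and supported versions may change, so defer to the linked SGLang documentation):

```bash
# Install lighteval, then install sglang separately following its own docs;
# the sglang extra is intentionally not declared in lighteval's pyproject.
pip install lighteval
pip install "sglang[all]"
```
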
| extra name | description |
|--------------|---------------------------------------------------------------------------|
| tgi | To use Text Generation Inference API to evaluate your model |
@@ -31,6 +33,7 @@ appropriate extras group.
| adapters | To evaluate adapters models (delta and peft) |
| tensorboardX | To upload your results to tensorboard |
| vllm | To use vllm as backend for inference |
| sglang | To use sglang as backend for inference |
Member: You did not modify the pyproject file to reflect this extra.

Contributor Author: I link to the sglang documentation here so that users install it themselves. SGLang has installation constraints that cannot be expressed in pyproject: if we add `sglang = [sglang>=0.4.2.post]`, `pip install lighteval[sglang]` cannot install the dependencies correctly. So I would rather have users install sglang by following the sglang documentation.

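For reference, the extra under discussion would look roughly like this in `pyproject.toml` (hypothetical; the PR deliberately does not add it):

```toml
[project.optional-dependencies]
# Hypothetical entry -- per the discussion above, pip cannot resolve
# sglang's dependencies correctly from this spec alone.
sglang = ["sglang>=0.4.2.post"]
```
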
Member: Great! Well, in that case it's not really an extra, and I would not add it to the table.

| s3 | To upload results to s3 |


61 changes: 61 additions & 0 deletions docs/source/use-sglang-as-backend.mdx
@@ -0,0 +1,61 @@
# Use SGLang as backend

Lighteval allows you to use `sglang` as a backend, which can provide significant speedups.
To use it, simply set `model_args` to the arguments you want to pass to sglang.

```bash
lighteval sglang \
"pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
"leaderboard|truthfulqa:mc|0|0"
```

`sglang` can distribute the model across multiple GPUs using data
parallelism and tensor parallelism.
You can choose the parallelism method via the `model_args`.

For example, if you have 4 GPUs, you can split the model across them using `tp_size`:

```bash
lighteval sglang \
"pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tp_size=4" \
"leaderboard|truthfulqa:mc|0|0"
```

Or, if your model fits on a single GPU, you can use `dp_size` to speed up the evaluation:

```bash
lighteval sglang \
"pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,dp_size=4" \
"leaderboard|truthfulqa:mc|0|0"
```

## Use a config file

For more advanced configurations, you can use a config file for the model.
An example of a config file is shown below and can be found at `examples/model_configs/sglang_model_config.yaml`.

```bash
lighteval sglang \
"examples/model_configs/sglang_model_config.yaml" \
"leaderboard|truthfulqa:mc|0|0"
```

```yaml
model: # Model specific parameters
base_params:
model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,dtype=float16,chunked_prefill_size=4096,mem_fraction_static=0.9" # Model args that you would pass in the command line
generation: # Generation specific parameters
temperature: 0.3
repetition_penalty: 1.0
frequency_penalty: 0.0
presence_penalty: 0.0
top_k: -1
min_p: 0.0
top_p: 0.9
max_new_tokens: 256
stop_tokens: ["<EOS>", "<PAD>"]
```

> [!WARNING]
> In case of OOM issues, you might need to reduce the context size of the
> model, as well as the `mem_fraction_static` and `chunked_prefill_size` parameters.
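
For example, a run with a smaller prefill chunk and static memory fraction might look like this (the values are illustrative, not tuned):

```bash
lighteval sglang \
    "pretrained=HuggingFaceTB/SmolLM-1.7B,dtype=float16,chunked_prefill_size=2048,mem_fraction_static=0.7" \
    "leaderboard|truthfulqa:mc|0|0"
```
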
13 changes: 13 additions & 0 deletions examples/model_configs/sglang_model_config.yaml
@@ -0,0 +1,13 @@
model:
base_params:
model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,dtype=float16,chunked_prefill_size=4096,mem_fraction_static=0.9"
generation:
temperature: 0.3
repetition_penalty: 1.0
frequency_penalty: 0.0
presence_penalty: 0.0
top_k: -1
min_p: 0.0
top_p: 0.9
max_new_tokens: 256
stop_tokens: ["<EOS>", "<PAD>"]
2 changes: 2 additions & 0 deletions src/lighteval/__main__.py
@@ -29,6 +29,7 @@
import lighteval.main_baseline
import lighteval.main_endpoint
import lighteval.main_nanotron
import lighteval.main_sglang
import lighteval.main_tasks
import lighteval.main_vllm

@@ -64,6 +65,7 @@
app.command(rich_help_panel="Evaluation Utils")(lighteval.main_baseline.baseline)
app.command(rich_help_panel="Evaluation Backends")(lighteval.main_nanotron.nanotron)
app.command(rich_help_panel="Evaluation Backends")(lighteval.main_vllm.vllm)
app.command(rich_help_panel="Evaluation Backends")(lighteval.main_sglang.sglang)
app.add_typer(
lighteval.main_endpoint.app,
name="endpoint",
159 changes: 159 additions & 0 deletions src/lighteval/main_sglang.py
@@ -0,0 +1,159 @@
# MIT License

# Copyright (c) 2024 The SGLang Team

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import os
from typing import Optional

from typer import Argument, Option
from typing_extensions import Annotated


TOKEN = os.getenv("HF_TOKEN")
CACHE_DIR: str = os.getenv("HF_HOME", "/scratch")

HELP_PANEL_NAME_1 = "Common Parameters"
HELP_PANEL_NAME_2 = "Logging Parameters"
HELP_PANEL_NAME_3 = "Debug Parameters"
HELP_PANEL_NAME_4 = "Modeling Parameters"


def sglang(
# === general ===
model_args: Annotated[
str,
Argument(
help="Model arguments in the form key1=value1,key2=value2,... or path to yaml config file (see examples/model_configs/transformers_model.yaml)"
),
],
tasks: Annotated[str, Argument(help="Comma-separated list of tasks to evaluate on.")],
# === Common parameters ===
use_chat_template: Annotated[
bool, Option(help="Use chat template for evaluation.", rich_help_panel=HELP_PANEL_NAME_4)
] = False,
system_prompt: Annotated[
Optional[str], Option(help="Use system prompt for evaluation.", rich_help_panel=HELP_PANEL_NAME_4)
] = None,
dataset_loading_processes: Annotated[
int, Option(help="Number of processes to use for dataset loading.", rich_help_panel=HELP_PANEL_NAME_1)
] = 1,
custom_tasks: Annotated[
Optional[str], Option(help="Path to custom tasks directory.", rich_help_panel=HELP_PANEL_NAME_1)
] = None,
cache_dir: Annotated[
str, Option(help="Cache directory for datasets and models.", rich_help_panel=HELP_PANEL_NAME_1)
] = CACHE_DIR,
num_fewshot_seeds: Annotated[
int, Option(help="Number of seeds to use for few-shot evaluation.", rich_help_panel=HELP_PANEL_NAME_1)
] = 1,
load_responses_from_details_date_id: Annotated[
Optional[str], Option(help="Load responses from details directory.", rich_help_panel=HELP_PANEL_NAME_1)
] = None,
# === saving ===
output_dir: Annotated[
str, Option(help="Output directory for evaluation results.", rich_help_panel=HELP_PANEL_NAME_2)
] = "results",
push_to_hub: Annotated[
bool, Option(help="Push results to the huggingface hub.", rich_help_panel=HELP_PANEL_NAME_2)
] = False,
push_to_tensorboard: Annotated[
bool, Option(help="Push results to tensorboard.", rich_help_panel=HELP_PANEL_NAME_2)
] = False,
public_run: Annotated[
bool, Option(help="Push results and details to a public repo.", rich_help_panel=HELP_PANEL_NAME_2)
] = False,
results_org: Annotated[
Optional[str], Option(help="Organization to push results to.", rich_help_panel=HELP_PANEL_NAME_2)
] = None,
save_details: Annotated[
bool, Option(help="Save detailed, sample per sample, results.", rich_help_panel=HELP_PANEL_NAME_2)
] = False,
# === debug ===
max_samples: Annotated[
Optional[int], Option(help="Maximum number of samples to evaluate on.", rich_help_panel=HELP_PANEL_NAME_3)
] = None,
job_id: Annotated[
int, Option(help="Optional job id for future reference.", rich_help_panel=HELP_PANEL_NAME_3)
] = 0,
):
"""
Evaluate models using sglang as backend.
"""
import yaml

from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.model_input import GenerationParameters
from lighteval.models.sglang.sglang_model import SGLangModelConfig
from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters

TOKEN = os.getenv("HF_TOKEN")

env_config = EnvConfig(token=TOKEN, cache_dir=cache_dir)

evaluation_tracker = EvaluationTracker(
output_dir=output_dir,
save_details=save_details,
push_to_hub=push_to_hub,
push_to_tensorboard=push_to_tensorboard,
public=public_run,
hub_results_org=results_org,
)

pipeline_params = PipelineParameters(
launcher_type=ParallelismManager.SGLANG,
env_config=env_config,
job_id=job_id,
dataset_loading_processes=dataset_loading_processes,
custom_tasks_directory=custom_tasks,
override_batch_size=-1,
num_fewshot_seeds=num_fewshot_seeds,
max_samples=max_samples,
use_chat_template=use_chat_template,
system_prompt=system_prompt,
load_responses_from_details_date_id=load_responses_from_details_date_id,
)

if model_args.endswith(".yaml"):
with open(model_args, "r") as f:
config = yaml.safe_load(f)["model"]
model_args = config["base_params"]["model_args"]
generation_parameters = GenerationParameters.from_dict(config)
else:
generation_parameters = GenerationParameters()

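# Parse "key1=value1,key2=value2" into a dict; a bare key with no "=" is treated as the boolean flag True.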
model_args_dict: dict = {k.split("=")[0]: k.split("=")[1] if "=" in k else True for k in model_args.split(",")}
model_config = SGLangModelConfig(**model_args_dict, generation_parameters=generation_parameters)

pipeline = Pipeline(
tasks=tasks,
pipeline_parameters=pipeline_params,
evaluation_tracker=evaluation_tracker,
model_config=model_config,
)

pipeline.evaluate()

pipeline.show_results()

results = pipeline.get_results()

pipeline.save_and_push_results()

return results
37 changes: 26 additions & 11 deletions src/lighteval/models/model_input.py
@@ -27,20 +27,20 @@
@dataclass
class GenerationParameters:
early_stopping: Optional[bool] = None # vllm, transformers
repetition_penalty: Optional[float] = None # vllm, transformers, tgi
frequency_penalty: Optional[float] = None # vllm, tgi
repetition_penalty: Optional[float] = None # vllm, transformers, tgi, sglang
frequency_penalty: Optional[float] = None # vllm, tgi, sglang
length_penalty: Optional[float] = None # vllm, transformers
presence_penalty: Optional[float] = None # vllm
presence_penalty: Optional[float] = None # vllm, sglang

max_new_tokens: Optional[int] = None # vllm, transformers, tgi, litellm
min_new_tokens: Optional[int] = None # vllm, transformers
max_new_tokens: Optional[int] = None # vllm, transformers, tgi, litellm, sglang
min_new_tokens: Optional[int] = None # vllm, transformers, sglang

seed: Optional[int] = None # vllm, tgi litellm
stop_tokens: Optional[list[str]] = None # vllm, transformers, tgi, litellm
temperature: Optional[float] = None # vllm, transformers, tgi, litellm
top_k: Optional[int] = None # vllm, transformers, tgi
min_p: Optional[float] = None # vllm, transformers
top_p: Optional[int] = None # vllm, transformers, tgi, litellm
seed: Optional[int] = None # vllm, tgi, litellm
stop_tokens: Optional[list[str]] = None # vllm, transformers, tgi, litellm, sglang
temperature: Optional[float] = None # vllm, transformers, tgi, litellm, sglang
top_k: Optional[int] = None # vllm, transformers, tgi, sglang
min_p: Optional[float] = None # vllm, transformers, sglang
top_p: Optional[int] = None # vllm, transformers, tgi, litellm, sglang
truncate_prompt: Optional[bool] = None # vllm, tgi

@classmethod
@@ -154,3 +154,18 @@ def to_tgi_ie_dict(self) -> dict:
"truncate": self.truncate_prompt,
}
return {k: v for k, v in args.items() if v is not None}

def to_sglang_dict(self) -> dict:
args = {
"max_new_tokens": self.max_new_tokens,
"temperature": self.temperature,
"stop": self.stop_tokens,
"top_p": self.top_p,
"top_k": self.top_k,
"min_p": self.min_p,
"frequency_penalty": self.frequency_penalty,
"presence_penalty": self.presence_penalty,
"repetition_penalty": self.repetition_penalty,
"min_new_tokens": self.min_new_tokens,
}
return {k: v for k, v in args.items() if v is not None}
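
As a quick illustration of how a YAML `generation` block flows into SGLang-style sampling arguments (a sketch assuming the `from_dict` and `to_sglang_dict` behavior shown in this diff):

```python
from lighteval.models.model_input import GenerationParameters

# Mirrors the "generation" section of examples/model_configs/sglang_model_config.yaml
params = GenerationParameters.from_dict(
    {"generation": {"temperature": 0.3, "top_p": 0.9, "max_new_tokens": 256, "stop_tokens": ["<EOS>", "<PAD>"]}}
)

# Unset fields are dropped and stop_tokens is exposed as "stop".
print(params.to_sglang_dict())
# {'max_new_tokens': 256, 'temperature': 0.3, 'stop': ['<EOS>', '<PAD>'], 'top_p': 0.9}
```
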
14 changes: 14 additions & 0 deletions src/lighteval/models/model_loader.py
@@ -32,16 +32,19 @@
from lighteval.models.endpoints.openai_model import OpenAIClient, OpenAIModelConfig
from lighteval.models.endpoints.tgi_model import ModelClient, TGIModelConfig
from lighteval.models.litellm_model import LiteLLMClient, LiteLLMModelConfig
from lighteval.models.sglang.sglang_model import SGLangModel, SGLangModelConfig
from lighteval.models.transformers.adapter_model import AdapterModel, AdapterModelConfig
from lighteval.models.transformers.delta_model import DeltaModel, DeltaModelConfig
from lighteval.models.transformers.transformers_model import TransformersModel, TransformersModelConfig
from lighteval.models.vllm.vllm_model import VLLMModel, VLLMModelConfig
from lighteval.utils.imports import (
NO_LITELLM_ERROR_MSG,
NO_SGLANG_ERROR_MSG,
NO_TGI_ERROR_MSG,
NO_VLLM_ERROR_MSG,
is_litellm_available,
is_openai_available,
is_sglang_available,
is_tgi_available,
is_vllm_available,
)
@@ -62,6 +65,7 @@ def load_model( # noqa: C901
VLLMModelConfig,
OpenAIModelConfig,
LiteLLMModelConfig,
SGLangModelConfig,
],
env_config: EnvConfig,
) -> Union[TransformersModel, AdapterModel, DeltaModel, ModelClient, DummyModel]:
@@ -96,6 +100,9 @@ def load_model( # noqa: C901
if isinstance(config, VLLMModelConfig):
return load_model_with_accelerate_or_default(config=config, env_config=env_config)

if isinstance(config, SGLangModelConfig):
return load_sglang_model(config=config, env_config=env_config)

if isinstance(config, OpenAIModelConfig):
return load_openai_model(config=config, env_config=env_config)

@@ -159,3 +166,10 @@ def load_model_with_accelerate_or_default(

def load_dummy_model(config: DummyModelConfig, env_config: EnvConfig):
return DummyModel(config=config, env_config=env_config)


def load_sglang_model(config: SGLangModelConfig, env_config: EnvConfig):
if not is_sglang_available():
raise ImportError(NO_SGLANG_ERROR_MSG)

return SGLangModel(config=config, env_config=env_config)