Add extended task for LiveCodeBench codegeneration #548
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi! Thanks for the PR.
Hi @plaguss @NathanHB, will it be possible to run this eval without needing a YAML file? The reason I ask is that all of our codebases assume one can run evals directly from the CLI. Also, perhaps we can speed this up dramatically by using […]
Hi Lewis, I couldn't find a way of passing the generation parameters in the CLI, which seem relevant for this model. I can update the code to pass them through the model args (it should be here, unless there's already a better way, @NathanHB?).

New:

    lighteval vllm \
        "pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B,dtype=float16,data_parallel_size=4,max_model_length=32768,gpu_memory_utilisation=0.8,generation_parameters={temperature:0.7,top_p:0.95}" \
        "extended|lcb:codegeneration|0|0" \
        --custom-tasks src/lighteval/tasks/extended/lcb/main.py \
        --output-dir $OUTPUT_DIR \
        --save-details

Now we could read the generation parameters from the model args following this pattern, let me know what you both think.
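For illustration, a rough sketch of what parsing the generation parameters out of such a model-args string could look like (the function name and parsing rules below are hypothetical, not lighteval's actual code):

```python
import re


def parse_model_args(model_args: str) -> dict:
    """Hypothetical sketch: split a "key=value,key=value" model-args string,
    treating generation_parameters={...} as a nested dict."""
    args = {}
    # Extract the braced generation_parameters block first, since it contains
    # commas of its own and would break a naive split on ",".
    if match := re.search(r"generation_parameters=\{(.*?)\}", model_args):
        args["generation_parameters"] = {
            key.strip(): float(value)
            for key, value in (pair.split(":") for pair in match.group(1).split(","))
        }
        model_args = model_args.replace(match.group(0), "")
    # Remaining pairs are plain key=value entries.
    for pair in filter(None, model_args.split(",")):
        key, value = pair.split("=", maxsplit=1)
        args[key.strip()] = value.strip()
    return args


print(parse_model_args(
    "pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B,dtype=float16,"
    "generation_parameters={temperature:0.7,top_p:0.95}"
))
```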
Sure, I run it with […]
The 32B is still running after hitting an error, but the other values can be found here:
Great! Thanks for adding a way to pass generation params as args.
Great work on this! The results look great! I was only wondering about dynamically changing the metric config at runtime, and whether you could add some docs.
Otherwise ready to merge :)
    @@ -134,9 +134,11 @@ def vllm(
            with open(model_args, "r") as f:
                config = yaml.safe_load(f)["model"]
            model_args = config["base_params"]["model_args"]
            metric_options = config.get("metric_options", {})
Can you add some docs for this?
src/lighteval/pipeline.py (outdated)
    if metric_data := self._metric_options.get(metric.metric_name, None):
        num_samples = metric_data.get("num_samples", None)
        if num_samples:
            task.num_samples.append(num_samples)
Has this been tested?
Done, it actually had two bugs, thanks! Now it works as expected:

    for metric in task.metrics:
        if metric_data := self._metric_options.get(metric.metric_name, None):
            num_samples = metric_data.get("num_samples", None)
            if num_samples:
                task.num_samples = [num_samples]
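A self-contained toy version of that override, for anyone skimming the thread (the classes and metric name are invented stand-ins; only the loop mirrors the pipeline code):

```python
from dataclasses import dataclass, field


# Toy stand-ins for the real lighteval objects, just to show the behaviour.
@dataclass
class Metric:
    metric_name: str


@dataclass
class Task:
    metrics: list
    num_samples: list = field(default_factory=lambda: [1])


metric_options = {"codegen_pass@k": {"num_samples": 16}}  # hypothetical metric name
task = Task(metrics=[Metric("codegen_pass@k")])

for metric in task.metrics:
    if metric_data := metric_options.get(metric.metric_name, None):
        num_samples = metric_data.get("num_samples", None)
        if num_samples:
            task.num_samples = [num_samples]

print(task.num_samples)  # [16]
```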
Adds a new extended task to run LiveCodeBench's codegeneration subset.
The results for deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, obtained by running it with the yaml file like so: […]
Note: This is just an idea, not sure it's the best approach.
Additionally, it adds a way of updating the number of samples required to run a metric via the yaml file: under `metric_options`, an entry can be added with the `metric_name` to be updated. For now it only works with `num_samples`, but defined like this it shouldn't need further changes to support other options. In other words, the `num_samples` can be set under the corresponding `metric_name` (see the sketch below).
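A minimal sketch of how such a config could be laid out and read, matching the `config.get("metric_options", {})` lookup from the diff above; the metric name and sample count are placeholders, not the values used in this PR:

```python
import yaml

# Hypothetical model config: metric_options sits under the "model" key,
# next to base_params, and maps a metric_name to the options to override.
CONFIG = """
model:
  base_params:
    model_args: "pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B,dtype=float16"
  metric_options:
    codegen_pass@k:        # placeholder metric_name
      num_samples: 16      # number of generations used by that metric
"""

config = yaml.safe_load(CONFIG)["model"]
model_args = config["base_params"]["model_args"]
metric_options = config.get("metric_options", {})
print(metric_options)  # {'codegen_pass@k': {'num_samples': 16}}
```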