
Request for LLM evaluation codes #18

Open
2proveit opened this issue Jan 14, 2025 · 1 comment

Comments

@2proveit

Thanks for your great work on enhancing LLMs' reasoning ability.
I tried to reproduce the results reported in the technical report, but I observed a gap between my results and those in the paper. Could you please release your evaluation code and the settings of your inference prompts?

@EliverQ
Contributor

EliverQ commented Jan 26, 2025

Hello! We appreciate your interest.
You can find our inference prompts at this link, and our evaluation code is available here. Note that for STILL-3-1.5B-Preview, we used sampling-based decoding with a temperature of 0.6 and a top-p of 0.95. Each question was sampled 64 times, and the average score was computed. To reproduce our STILL-2 results, however, we recommend greedy search.
