
Request for LLM evaluation codes #18

Open
2proveit opened this issue Jan 14, 2025 · 1 comment

Comments

@2proveit

Thanks for your great work on enhancing LLMs' reasoning ability.
I tried to reproduce the results reported in the technical report, but I observed a gap between my results and those in the paper. Could you please release your evaluation code and the settings of your inference prompts?

@EliverQ
Contributor

EliverQ commented Jan 26, 2025

Hello! We appreciate your interest.
You can find our inference prompts at this link, and our evaluation code is available here. Note that for STILL-3-1.5B-Preview, we used sampling-based decoding with a temperature of 0.6 and a top-p of 0.95. Each question was sampled 64 times, and the average score was computed. To reproduce our STILL-2 results, however, we recommend greedy search.
