
saucam/model_evals


Model Eval results

openllm benchmark results

| Model | Details | Average | arc | gsm8k | hellaswag | mmlu | truthfulqa | winogrande |
|---|---|---|---|---|---|---|---|---|
| Arithmo-Wizard-2-7B | complete result | 65.22 | 62.88 | 61.26 | 83.15 | 60.61 | 45.9 | 77.51 |
| Meta-Llama-3-8B-Instruct | complete result | 68.27 | 62.2 | 75.59 | 78.84 | 65.82 | 51.71 | 75.45 |
| Meta-Llama-3-8B | complete result | 62.84 | 57.51 | 50.87 | 82.09 | 65.04 | 43.93 | 77.58 |
| Nereus-7B | complete result | 63.81 | 62.54 | 46.25 | 83.23 | 59.6 | 54.32 | 76.95 |
| Orpomis-Prime-7B-dare | complete result | 67.37 | 64.68 | 59.74 | 85.12 | 62.21 | 53.72 | 78.77 |
| Orpomis-Prime-7B-it | complete result | 55.67 | 61.26 | 24.34 | 79.61 | 51.55 | 43.68 | 73.56 |
| Orpomis-Prime-7B | complete result | 55.27 | 60.67 | 24.41 | 79.12 | 52.43 | 41.02 | 73.95 |
| PowerBot-8B | complete result | 67.62 | 60.92 | 69.07 | 83.24 | 66.62 | 45.11 | 80.74 |
| Proteus-8B | complete result | 70.67 | 63.48 | 78.77 | 82.94 | 64.71 | 56.71 | 77.43 |
| Saga-8B | complete result | 52.29 | 51.11 | 24.18 | 74.1 | 50.65 | 41.19 | 72.53 |
| aqua-smaug-0.2-8B | complete result | 65.52 | 60.15 | 59.44 | 82.65 | 65.51 | 47.28 | 78.06 |
| aqua-smaug-0.3-8B | complete result | 69.73 | 62.37 | 76.19 | 83.02 | 66.0 | 53.7 | 77.11 |
| aqua-smaug-hermes-8B | complete result | 69.96 | 61.77 | 79.61 | 82.25 | 64.96 | 55.11 | 76.09 |
| llama-airo-3 | complete result | 66.52 | 61.01 | 56.33 | 82.42 | 64.79 | 56.35 | 78.22 |
| mera-mix-4x7B | complete result | 76.59 | 71.76 | 72.93 | 88.92 | 63.8 | 77.6 | 84.53 |

mistral-orpo-beta-NeuralBeagle14-7B-dare-ties complete result 70.06 67.32 61.79 85.89 54.17 81.14
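The Average column appears to be the plain arithmetic mean of the six benchmark scores. A minimal Python sketch, assuming that definition, recomputes it for a few rows copied from the table. The helper names `mean` and `average_matches` are illustrative and not part of this repository. The tolerance of 0.01 allows for the fact that both the per-benchmark scores and the average are rounded to two decimals.

```python
# Assumption: "Average" = unweighted mean of arc, gsm8k, hellaswag, mmlu,
# truthfulqa and winogrande. Values are copied from the openllm table.
OPENLLM_ROWS = {
    # model: (reported average, [arc, gsm8k, hellaswag, mmlu, truthfulqa, winogrande])
    "Arithmo-Wizard-2-7B": (65.22, [62.88, 61.26, 83.15, 60.61, 45.9, 77.51]),
    "Meta-Llama-3-8B-Instruct": (68.27, [62.2, 75.59, 78.84, 65.82, 51.71, 75.45]),
    "Nereus-7B": (63.81, [62.54, 46.25, 83.23, 59.6, 54.32, 76.95]),
    "Proteus-8B": (70.67, [63.48, 78.77, 82.94, 64.71, 56.71, 77.43]),
    "mera-mix-4x7B": (76.59, [71.76, 72.93, 88.92, 63.8, 77.6, 84.53]),
}


def mean(values: list[float]) -> float:
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


def average_matches(reported: float, scores: list[float], tol: float = 0.01) -> bool:
    """True if the mean of the already-rounded scores is within tol of the reported average."""
    return abs(mean(scores) - reported) <= tol


if __name__ == "__main__":
    for model, (reported, scores) in OPENLLM_ROWS.items():
        print(f"{model}: reported {reported:.2f}, recomputed {mean(scores):.3f}, "
              f"match={average_matches(reported, scores)}")
```

Applying the same mean to the five scores that follow 70.06 on the mistral-orpo-beta-NeuralBeagle14-7B-dare-ties row gives 350.31 / 5 = 70.06, which suggests that model's average was taken over five benchmarks rather than six.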

nous benchmark results

| Model | Details | Average | agieval | bigbench | gpt4all | truthfulqa |
|---|---|---|---|---|---|---|
| Arithmo-Wizard-2-7B | complete result | 46.28 | 31.58 | 37.44 | 70.2 | 45.91 |
| Nereus-7B | complete result | 52.12 | 42.8 | 39.17 | 72.21 | 54.32 |
| Orpomis-Prime-7B-dare | complete result | 52.42 | 42.71 | 39.82 | 73.42 | 53.72 |
| Orpomis-Prime-7B-it | complete result | 47.98 | 37.23 | 38.72 | 72.28 | 43.68 |
| Orpomis-Prime-7B | complete result | 46.78 | 36.4 | 37.5 | 72.18 | 41.02 |
| bert-tiny-finetuned-sms-spam-detection | complete result | 33.97 | 22.95 | 28.75 | 36.07 | 48.09 |
| llama-airo-3 | complete result | 51.1 | 36.59 | 39.26 | 72.24 | 56.3 |
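Both suites report a truthfulqa score. The short sketch below joins the two tables on model name and computes the per-model difference; for the six models listed in both tables the two values agree to within 0.05. The dictionaries and the helper `truthfulqa_gaps` are illustrative, with values copied by hand from the tables, and the README does not state why the two suites' numbers differ slightly for some models.

```python
# truthfulqa scores copied from the openllm and nous tables in this README.
OPENLLM_TRUTHFULQA = {
    "Arithmo-Wizard-2-7B": 45.9,
    "Nereus-7B": 54.32,
    "Orpomis-Prime-7B-dare": 53.72,
    "Orpomis-Prime-7B-it": 43.68,
    "Orpomis-Prime-7B": 41.02,
    "llama-airo-3": 56.35,
    "Proteus-8B": 56.71,  # openllm table only; dropped by the join
}
NOUS_TRUTHFULQA = {
    "Arithmo-Wizard-2-7B": 45.91,
    "Nereus-7B": 54.32,
    "Orpomis-Prime-7B-dare": 53.72,
    "Orpomis-Prime-7B-it": 43.68,
    "Orpomis-Prime-7B": 41.02,
    "llama-airo-3": 56.3,
    "bert-tiny-finetuned-sms-spam-detection": 48.09,  # nous table only
}


def truthfulqa_gaps(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Absolute score difference for every model present in both dicts (inner join)."""
    return {m: round(abs(a[m] - b[m]), 4) for m in sorted(a.keys() & b.keys())}


if __name__ == "__main__":
    for model, gap in truthfulqa_gaps(OPENLLM_TRUTHFULQA, NOUS_TRUTHFULQA).items():
        print(f"{model}: |openllm - nous| = {gap}")
```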
