Results on the Open LLM Leaderboard benchmark set. The Average column is the unweighted mean of the six per-benchmark scores (higher is better).

Model | Details | Average | ARC | GSM8K | HellaSwag | MMLU | TruthfulQA | Winogrande |
---|---|---|---|---|---|---|---|---|
Arithmo-Wizard-2-7B | complete result | 65.22 | 62.88 | 61.26 | 83.15 | 60.61 | 45.9 | 77.51 |
Meta-Llama-3-8B-Instruct | complete result | 68.27 | 62.2 | 75.59 | 78.84 | 65.82 | 51.71 | 75.45 |
Meta-Llama-3-8B | complete result | 62.84 | 57.51 | 50.87 | 82.09 | 65.04 | 43.93 | 77.58 |
Nereus-7B | complete result | 63.81 | 62.54 | 46.25 | 83.23 | 59.6 | 54.32 | 76.95 |
Orpomis-Prime-7B-dare | complete result | 67.37 | 64.68 | 59.74 | 85.12 | 62.21 | 53.72 | 78.77 |
Orpomis-Prime-7B-it | complete result | 55.67 | 61.26 | 24.34 | 79.61 | 51.55 | 43.68 | 73.56 |
Orpomis-Prime-7B | complete result | 55.27 | 60.67 | 24.41 | 79.12 | 52.43 | 41.02 | 73.95 |
PowerBot-8B | complete result | 67.62 | 60.92 | 69.07 | 83.24 | 66.62 | 45.11 | 80.74 |
Proteus-8B | complete result | 70.67 | 63.48 | 78.77 | 82.94 | 64.71 | 56.71 | 77.43 |
Saga-8B | complete result | 52.29 | 51.11 | 24.18 | 74.1 | 50.65 | 41.19 | 72.53 |
aqua-smaug-0.2-8B | complete result | 65.52 | 60.15 | 59.44 | 82.65 | 65.51 | 47.28 | 78.06 |
aqua-smaug-0.3-8B | complete result | 69.73 | 62.37 | 76.19 | 83.02 | 66.0 | 53.7 | 77.11 |
aqua-smaug-hermes-8B | complete result | 69.96 | 61.77 | 79.61 | 82.25 | 64.96 | 55.11 | 76.09 |
llama-airo-3 | complete result | 66.52 | 61.01 | 56.33 | 82.42 | 64.79 | 56.35 | 78.22 |
mera-mix-4x7B | complete result | 76.59 | 71.76 | 72.93 | 88.92 | 63.8 | 77.6 | 84.53 |
mistral-orpo-beta-NeuralBeagle14-7B-dare-ties | complete result | 70.06 | 67.32 | 61.79 | 85.89 | 54.17 | - | 81.14 |
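As a sanity check on the table above, the Average column can be reproduced as the plain arithmetic mean of the six benchmark scores. A minimal Python sketch, with the scores transcribed from the Proteus-8B row (the variable name is illustrative only):

```python
# Scores transcribed from the Proteus-8B row of the table above.
proteus_8b = {
    "ARC": 63.48,
    "GSM8K": 78.77,
    "HellaSwag": 82.94,
    "MMLU": 64.71,
    "TruthfulQA": 56.71,
    "Winogrande": 77.43,
}

# The Average column is the unweighted mean, rounded to two decimals.
average = sum(proteus_8b.values()) / len(proteus_8b)
print(round(average, 2))  # 70.67, matching the Average column
```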
Results on a second benchmark suite (AGIEval, BigBench, GPT4All, TruthfulQA), often referred to as the Nous suite. Here the Average column is the unweighted mean of the four suite scores.

Model | Details | Average | AGIEval | BigBench | GPT4All | TruthfulQA |
---|---|---|---|---|---|---|
Arithmo-Wizard-2-7B | complete result | 46.28 | 31.58 | 37.44 | 70.2 | 45.91 |
Nereus-7B | complete result | 52.12 | 42.8 | 39.17 | 72.21 | 54.32 |
Orpomis-Prime-7B-dare | complete result | 52.42 | 42.71 | 39.82 | 73.42 | 53.72 |
Orpomis-Prime-7B-it | complete result | 47.98 | 37.23 | 38.72 | 72.28 | 43.68 |
Orpomis-Prime-7B | complete result | 46.78 | 36.4 | 37.5 | 72.18 | 41.02 |
bert-tiny-finetuned-sms-spam-detection | complete result | 33.97 | 22.95 | 28.75 | 36.07 | 48.09 |
llama-airo-3 | complete result | 51.1 | 36.59 | 39.26 | 72.24 | 56.3 |
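For quick comparison, the rows of either table can be ranked by their Average. A minimal Python sketch using the second table's values (the `nous_averages` dictionary is illustrative, transcribed from the rows above):

```python
# Suite averages transcribed from the table above.
nous_averages = {
    "Arithmo-Wizard-2-7B": 46.28,
    "Nereus-7B": 52.12,
    "Orpomis-Prime-7B-dare": 52.42,
    "Orpomis-Prime-7B-it": 47.98,
    "Orpomis-Prime-7B": 46.78,
    "bert-tiny-finetuned-sms-spam-detection": 33.97,
    "llama-airo-3": 51.10,
}

# Sort descending by average and print a simple ranking.
for rank, (model, avg) in enumerate(
    sorted(nous_averages.items(), key=lambda kv: kv[1], reverse=True), start=1
):
    print(f"{rank}. {model}: {avg:.2f}")
```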