Matches sentence count

Test: Write N of X

Avg. Score: 84.2%
Scenarios: 5

Overall Performance

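| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
|---|---|---|---|---|---|
| 1 | Gemini 3 Flash (Preview) | 100.0% | $0.0019 | 3.2s | 100% |
| 2 | Mistral Large 3 | 100.0% | $0.0013 | 6.2s | 100% |
| 3 | Mistral Small 3.2 24B | 100.0% | $0.0003 | 8.7s | 100% |
| 4 | Llama 3.1 Nemotron 70B | 100.0% | $0.0007 | 12.6s | 100% |
| 5 | GPT-5 Mini | 100.0% | $0.0022 | 10.8s | 100% |
| 6 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0014 | 14.5s | 100% |
| 7 | GPT-5.2 | 99.8% | $0.0081 | 8.8s | 99% |
| 8 | GPT-5 | 100.0% | $0.0095 | 16.2s | 100% |
| 9 | o4 Mini High | 99.9% | $0.0081 | 18.2s | 99% |
| 10 | Z.AI GLM 4.7 Flash | 100.0% | $0.0009 | 35.2s | 100% |
| 11 | GPT-5.1 | 99.8% | $0.0095 | 14.1s | 98% |
| 12 | Stealth: Aurora Alpha | 97.4% | | 5.1s | 84% |
| 13 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0045 | 31.5s | 100% |
| 14 | Mistral Small Creative | 96.8% | $0.0003 | 2.6s | 72% |
| 15 | ByteDance Seed 1.6 | 100.0% | $0.0036 | 36.9s | 100% |
| 16 | Mistral Medium 3.1 | 97.2% | $0.0014 | 8.1s | 71% |
| 17 | DeepSeek V3 (2025-03-24) | 98.0% | $0.0006 | 15.7s | 72% |
| 18 | o4 Mini | 98.4% | $0.0063 | 14.9s | 80% |
| 19 | Gemini 2.5 Pro | 99.4% | $0.016 | 13.2s | 94% |
| 20 | Grok 4.1 Fast | 95.6% | $0.0005 | 5.6s | 64% |
| 21 | DeepSeek-V2 Chat | 97.9% | $0.0003 | 20.5s | 72% |
| 22 | Claude Opus 4.6 | 99.9% | $0.021 | 12.9s | 99% |
| 23 | Gemma 3 27B | 95.4% | $0.0003 | 12.6s | 69% |
| 24 | Gemma 3 4B | 95.0% | $0.0001 | 5.0s | 61% |
| 25 | Z.AI GLM 5 | 100.0% | $0.0074 | 48.6s | 100% |
| 26 | GPT-5 Nano | 98.0% | $0.0010 | 26.0s | 72% |
| 27 | Ministral 3 14B | 92.1% | $0.0004 | 3.7s | 57% |
| 28 | Gemini 3 Pro (Preview) | 100.0% | $0.025 | 16.6s | 100% |
| 29 | Claude Opus 4.5 | 95.9% | $0.017 | 8.5s | 67% |
| 30 | Z.AI GLM 4.6 | 97.7% | $0.0030 | 47.4s | 72% |
| 31 | Claude Sonnet 4.6 | 94.7% | $0.011 | 8.7s | 59% |
| 32 | Grok 4 Fast | 89.9% | $0.0005 | 4.2s | 40% |
| 33 | Claude Sonnet 4.5 | 92.0% | $0.010 | 7.5s | 56% |
| 34 | DeepSeek V3.2 | 92.1% | $0.0005 | 21.2s | 49% |
| 35 | Z.AI GLM 4.7 | 100.0% | $0.0040 | 1.5m | 100% |
| 36 | Gemini 3.1 Pro (Preview) | 100.0% | $0.034 | 29.2s | 100% |
| 37 | DeepSeek V3 (2024-12-26) | 88.6% | $0.0009 | 12.2s | 41% |
| 38 | Llama 3.1 8B | 85.6% | $0.0004 | 1.7s | 33% |
| 39 | GPT-4.1 | 86.0% | $0.0041 | 5.8s | 32% |
| 40 | Claude 3.5 Sonnet | 93.2% | $0.010 | 38.4s | 54% |
| 41 | DeepSeek V3.1 | 83.3% | $0.0009 | 6.9s | 27% |
| 42 | ByteDance Seed 1.6 Flash | 82.3% | $0.0005 | 8.3s | 26% |
| 43 | GPT-4.1 Mini | 79.4% | $0.0008 | 5.1s | 28% |
| 44 | Llama 3.1 70B | 80.5% | $0.0017 | 3.1s | 26% |
| 45 | Grok 4 | 88.0% | $0.011 | 15.0s | 35% |
| 46 | Gemma 3 12B | 80.1% | $0.0002 | 9.4s | 23% |
| 47 | Claude Sonnet 4 | 84.2% | $0.012 | 10.0s | 31% |
| 48 | Hermes 3 405B | 80.0% | $0.0000 | 18.7s | 24% |
| 49 | Writer: Palmyra X5 | 79.9% | $0.0032 | 8.9s | 20% |
| 50 | Ministral 3 8B | 74.1% | $0.0003 | 3.0s | 19% |
| 51 | Ministral 3 3B | 74.8% | $0.0002 | 1.6s | 15% |
| 52 | Gemini 2.5 Flash Lite | 73.3% | $0.0003 | 1.7s | 18% |
| 53 | GPT-4.1 Nano | 75.2% | $0.0003 | 6.8s | 19% |
| 54 | Claude 3.5 Haiku | 76.8% | $0.0025 | 6.2s | 20% |
| 55 | Qwen 3.5 397B A17B | 100.0% | $0.017 | 2.0m | 100% |
| 56 | Z.AI GLM 4.5 | 74.8% | $0.0012 | 8.5s | 16% |
| 57 | GPT-4o, Aug. 6th (temp=1) | 74.2% | $0.0060 | 3.2s | 18% |
| 58 | GPT-4o, Aug. 6th (temp=0) | 74.0% | $0.0058 | 3.1s | 18% |
| 59 | Minimax M2.5 | 75.6% | $0.0014 | 19.1s | 15% |
| 60 | Claude Haiku 4.5 | 70.3% | $0.0031 | 4.3s | 14% |
| 61 | Gemini 2.5 Flash | 64.8% | $0.0011 | 2.2s | 16% |
| 62 | Qwen 2.5 72B | 66.7% | $0.0007 | 6.3s | 11% |
| 63 | Ministral 3B | 62.2% | $0.0001 | 2.2s | 9% |
| 64 | GPT-4o Mini (temp=0) | 79.5% | $0.0004 | 53.1s | 20% |
| 65 | GPT-4o Mini (temp=1) | 79.6% | $0.0004 | 54.0s | 20% |
| 66 | Ministral 8B | 60.4% | $0.0002 | 2.8s | 9% |
| 67 | WizardLM 2 8x22b | 71.5% | $0.0017 | 38.3s | 14% |
| 68 | Claude 3.7 Sonnet | 66.3% | $0.010 | 8.4s | 12% |
| 69 | GPT-4o, May 13th (temp=0) | 71.8% | $0.011 | 26.0s | 17% |
| 70 | Arcee AI: Trinity Large (Preview) | 57.2% | $0.0000 | 5.1s | 5% |
| 71 | Hermes 3 70B | 58.1% | $0.0007 | 6.7s | 6% |
| 72 | GPT-4o, May 13th (temp=1) | 74.0% | $0.011 | 33.0s | 17% |
| 73 | Cohere Command R+ (Aug. 2024) | 60.4% | $0.0059 | 4.3s | 6% |
| 74 | Mistral Large 2 | 58.8% | $0.0045 | 4.2s | 4% |
| 75 | Claude 3 Haiku | 61.9% | $0.0008 | 36.4s | 9% |
| 76 | Mistral NeMO | 47.6% | $0.0003 | 3.6s | 3% |
| 77 | Rocinante 12B | 53.1% | $0.0006 | 24.0s | 5% |
| 78 | Arcee AI: Trinity Mini | 44.0% | $0.0002 | 4.6s | 0% |
| 79 | Claude Opus 4 | 80.2% | $0.054 | 18.2s | 24% |
| 80 | Mistral Large | 63.8% | $0.020 | 52.3s | 12% |

Average score: 84.24%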

Individual Scenarios

sentences

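| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 0 | 0 | 78.9% |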
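
| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7% |
| Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 2 | 87.6% |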
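
| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.4% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 98.7% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4% |
| Qwen 2.5 72B | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 96.7% |
| Cohere Command R+ (Aug. 2024) | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 77 | 93.1% |
| Mistral Large 2 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 77 | 92.7% |
| Mistral Large | 100 | 98 | 98 | 92 | 92 | 92 | 92 | 77 | 77 | 77 | 89.8% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 2 | 89.4% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 77 | 9 | 87.8% |
| Ministral 3 8B | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 54 | 27 | 86.3% |
| Arcee AI: Trinity Large (Preview) | 100 | 98 | 92 | 92 | 77 | 77 | 77 | 77 | 77 | 77 | 84.7% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 54 | 0 | 83.6% |
| Hermes 3 405B | 98 | 98 | 92 | 92 | 92 | 92 | 77 | 77 | 77 | 2 | 80.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 92 | 92 | 77 | 54 | 54 | 27 | 79.6% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 77 | 77 | 77 | 54 | 54 | 27 | 76.7% |
| Claude 3.7 Sonnet | 98 | 92 | 92 | 92 | 77 | 77 | 54 | 54 | 54 | 27 | 71.8% |
| Ministral 3B | 100 | 100 | 98 | 98 | 92 | 77 | 77 | 27 | 0 | 0 | 67.1% |
| Rocinante 12B | 100 | 100 | 92 | 92 | 77 | 27 | 2 | 0 | 0 | 0 | 49.1% |
| Ministral 8B | 92 | 77 | 77 | 77 | 54 | 27 | 27 | 9 | 0 | 0 | 44.2% |
| Gemini 2.5 Flash | 98 | 92 | 77 | 54 | 27 | 27 | 9 | 2 | 2 | 0 | 38.9% |
| Mistral NeMO | 77 | 77 | 54 | 54 | 54 | 27 | 9 | 9 | 9 | 9 | 37.9% |
| Arcee AI: Trinity Mini | 100 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.8% |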
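
| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.4% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 98.7% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 98.1% |
| GPT-4o Mini (temp=0) | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 97.5% |
| Claude Sonnet 4.5 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 96.9% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 77 | 94.5% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 54 | 93.8% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 9 | 90.4% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 9 | 90.3% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 27 | 90.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 0 | 89.7% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 0 | 88.9% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 54 | 27 | 88.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 77 | 2 | 87.7% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 0 | 86.6% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 54 | 9 | 85.0% |
| Mistral Small Creative | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 54 | 9 | 84.2% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 77 | 54 | 27 | 84.1% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 27 | 27 | 83.6% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 27 | 27 | 9 | 76.3% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 54 | 2 | 0 | 75.3% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 92 | 77 | 77 | 77 | 9 | 9 | 74.2% |
| Llama 3.1 70B | 100 | 100 | 98 | 92 | 92 | 92 | 77 | 54 | 27 | 2 | 73.5% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 98 | 92 | 77 | 77 | 77 | 54 | 27 | 9 | 71.3% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 98 | 92 | 77 | 54 | 54 | 27 | 2 | 70.4% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 98 | 98 | 98 | 92 | 77 | 54 | 27 | 27 | 27 | 70.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| WizardLM 2 8x22b | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 27 | 9 | 2 | 69.7% |
| GPT-4.1 Mini | 100 | 100 | 92 | 92 | 92 | 77 | 54 | 27 | 27 | 0 | 66.2% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 98 | 77 | 77 | 9 | 0 | 0 | 66.2% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 92 | 54 | 2 | 2 | 0 | 64.9% |
| Ministral 3 14B | 100 | 92 | 92 | 77 | 77 | 54 | 54 | 54 | 9 | 0 | 60.9% |
| GPT-4o, May 13th (temp=0) | 98 | 92 | 92 | 92 | 54 | 54 | 27 | 27 | 27 | 27 | 59.2% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 98 | 92 | 0 | 0 | 0 | 0 | 59.1% |
| Claude Haiku 4.5 | 98 | 92 | 77 | 77 | 77 | 27 | 27 | 27 | 9 | 0 | 51.4% |
| Gemini 2.5 Flash | 92 | 92 | 54 | 54 | 54 | 27 | 27 | 9 | 9 | 0 | 41.8% |
| Rocinante 12B | 100 | 100 | 92 | 54 | 54 | 0 | 0 | 0 | 0 | 0 | 39.9% |
| Qwen 2.5 72B | 98 | 98 | 54 | 54 | 27 | 27 | 9 | 0 | 0 | 0 | 36.8% |
| Ministral 3B | 100 | 98 | 77 | 27 | 27 | 9 | 0 | 0 | 0 | 0 | 34.0% |
| GPT-4.1 Nano | 100 | 77 | 54 | 54 | 9 | 9 | 2 | 0 | 0 | 0 | 30.4% |
| Claude 3 Haiku | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Mistral Large | 92 | 54 | 54 | 27 | 27 | 27 | 9 | 2 | 0 | 0 | 29.2% |
| Ministral 8B | 100 | 98 | 54 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 27.9% |
| Gemini 2.5 Flash Lite | 100 | 100 | 54 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 25.5% |
| Hermes 3 70B | 100 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11.1% |
| Cohere Command R+ (Aug. 2024) | 77 | 9 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 9.0% |
| Mistral Large 2 | 9 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.2% |
| Arcee AI: Trinity Large (Preview) | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.1% |
| Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |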
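
| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 97.2% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 77 | 94.4% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 77 | 54 | 92.3% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 27 | 91.9% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 89.9% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 0 | 89.8% |
| Claude Opus 4.5 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 54 | 54 | 0 | 79.4% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 54 | 9 | 0 | 75.5% |
| Claude Sonnet 4.6 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 54 | 0 | 0 | 73.3% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 27 | 0 | 0 | 72.7% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 0 | 0 | 0 | 69.8% |
| Claude Sonnet 4.5 | 98 | 98 | 98 | 98 | 77 | 77 | 54 | 27 | 9 | 0 | 63.8% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 59.9% |
| Gemini 2.5 Flash Lite | 100 | 98 | 98 | 92 | 77 | 54 | 54 | 2 | 0 | 0 | 57.5% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 77 | 77 | 0 | 0 | 0 | 0 | 0 | 45.5% |
| Gemini 2.5 Flash | 100 | 100 | 98 | 98 | 27 | 9 | 0 | 0 | 0 | 0 | 43.3% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| GPT-4.1 | 100 | 100 | 98 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 39.7% |
| GPT-4.1 Mini | 100 | 100 | 54 | 27 | 27 | 0 | 0 | 0 | 0 | 0 | 30.8% |
| Claude Sonnet 4 | 100 | 92 | 77 | 27 | 9 | 0 | 0 | 0 | 0 | 0 | 30.6% |
| Llama 3.1 70B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| DeepSeek V3.1 | 98 | 92 | 77 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 27.0% |
| Gemma 3 12B | 98 | 98 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25.0% |
| ByteDance Seed 1.6 Flash | 100 | 98 | 9 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 22.6% |
| Minimax M2.5 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Writer: Palmyra X5 | 100 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.8% |
| Rocinante 12B | 92 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.1% |
| Ministral 3 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Ministral 3B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Ministral 3 3B | 92 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.6% |
| Z.AI GLM 4.5 | 77 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.9% |
| Claude Opus 4 | 54 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6.4% |
| Claude 3 Haiku | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.7% |
| Claude 3.5 Haiku | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Haiku 4.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Cohere Command R+ (Aug. 2024) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, May 13th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Arcee AI: Trinity Large (Preview) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Hermes 3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |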