Matches paragraph count

Test: Write N of X

Avg. Score: 96.2%
Scenarios: 3

Overall Performance

Rank ▲ | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Ministral 3 3B | 100.0% | $0.0002 | 1.4s | 100%
2 | Gemini 2.5 Flash Lite | 100.0% | $0.0003 | 1.3s | 100%
3 | Inception Mercury 2 | 100.0% | $0.0006 | 1.0s | 100%
4 | Inception Mercury | 100.0% | $0.0004 | 1.5s | 100%
5 | Ministral 3 8B | 100.0% | $0.0003 | 2.3s | 100%
6 | Mistral Small Creative | 100.0% | $0.0003 | 2.4s | 100%
7 | GPT-5.4 Nano | 100.0% | $0.0006 | 1.9s | 100%
8 | GPT-5.4 Nano (Reasoning) | 100.0% | $0.0006 | 2.0s | 100%
9 | GPT-5.4 Nano (Reasoning, Low) | 100.0% | $0.0006 | 2.0s | 100%
10 | Gemini 3.1 Flash Lite (Preview) | 100.0% | $0.0008 | 1.8s | 100%
11 | Gemini 3.1 Flash Lite | 100.0% | $0.0008 | 1.9s | 100%
12 | Gemini 2.5 Flash | 100.0% | $0.0009 | 2.1s | 100%
13 | Gemini 3.1 Flash Lite (Reasoning) | 100.0% | $0.0008 | 2.4s | 100%
14 | GPT-5.4 Mini | 100.0% | $0.0015 | 1.7s | 100%
15 | Ministral 3 14B | 100.0% | $0.0004 | 3.6s | 100%
16 | Mistral Small 4 | 100.0% | $0.0004 | 3.6s | 100%
17 | Stealth: Aurora Alpha | 100.0% |  | 3.6s | 100%
18 | GPT-5.4 Mini (Reasoning, Low) | 100.0% | $0.0017 | 1.6s | 100%
19 | GPT-4.1 Nano | 100.0% | $0.0002 | 4.2s | 100%
20 | Mistral Small 3.2 24B | 100.0% | $0.0002 | 4.2s | 100%
21 | Grok 4 Fast | 100.0% | $0.0005 | 4.0s | 100%
22 | GPT-5.4 Mini (Reasoning) | 100.0% | $0.0017 | 2.2s | 100%
23 | GPT-4.1 Mini | 100.0% | $0.0007 | 4.1s | 100%
24 | DeepSeek V4 Flash (Reasoning) | 100.0% | $0.0002 | 5.2s | 100%
25 | Grok 4.1 Fast | 100.0% | $0.0005 | 4.8s | 100%
26 | ByteDance Seed 1.6 Flash | 100.0% | $0.0003 | 5.2s | 100%
27 | Llama 3.1 70B | 100.0% | $0.0017 | 3.0s | 100%
28 | Gemini 3 Flash (Preview) | 100.0% | $0.0017 | 3.5s | 100%
29 | Gemini 2.5 Flash Lite (Reasoning) | 100.0% | $0.0006 | 5.4s | 100%
30 | Nemotron 3 Super | 100.0% | $0.0000 | 6.5s | 100%
31 | LFM2 24B | 100.0% | $0.0001 | 6.6s | 100%
32 | Mistral Small 4 (Reasoning) | 100.0% | $0.0007 | 6.3s | 100%
33 | Nemotron 3 Nano | 100.0% | $0.0002 | 7.3s | 100%
34 | Grok 4.20 | 100.0% | $0.0024 | 3.7s | 100%
35 | Gemma 3 12B | 100.0% | $0.0001 | 7.7s | 100%
36 | Mistral Medium 3.1 | 100.0% | $0.0013 | 6.6s | 100%
37 | Mistral Large 3 | 100.0% | $0.0013 | 7.1s | 100%
38 | Gemma 4 26B | 100.0% | $0.0002 | 9.1s | 100%
39 | Llama 3.1 Nemotron 70B | 100.0% | $0.0007 | 8.7s | 100%
40 | Grok 4.20 (Beta, Reasoning) | 100.0% | $0.0041 | 2.9s | 100%
41 | Qwen3 235B A22B Instruct 2507 | 100.0% | $0.0003 | 9.7s | 100%
42 | GPT-5 Mini | 100.0% | $0.0016 | 7.6s | 100%
43 | Qwen 2.5 72B | 100.0% | $0.0008 | 9.3s | 100%
44 | Gemma 3 27B | 100.0% | $0.0002 | 10.5s | 100%
45 | Z.AI GLM 4.5 | 100.0% | $0.0011 | 9.1s | 100%
46 | Gemini 3.5 Flash (Reasoning, Minimal) | 100.0% | $0.0050 | 2.5s | 100%
47 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0008 | 9.7s | 100%
48 | GPT-4.1 | 100.0% | $0.0038 | 4.8s | 100%
49 | Xiaomi MIMO v2.5 Pro | 100.0% | $0.0016 | 9.1s | 100%
50 | Gemini 2.5 Flash (Reasoning) | 100.0% | $0.0022 | 8.2s | 100%
51 | Qwen 3 32B | 100.0% | $0.0004 | 11.6s | 100%
52 | GPT-OSS 120B | 100.0% | $0.0003 | 12.3s | 100%
53 | Grok 4.20 (Reasoning) | 100.0% | $0.0029 | 8.0s | 100%
54 | o4 Mini | 100.0% | $0.0031 | 7.7s | 100%
55 | GPT-5.4 | 100.0% | $0.0045 | 5.8s | 100%
56 | Qwen 3.6 Flash | 100.0% | $0.0028 | 9.4s | 100%
57 | GPT-5.2 | 100.0% | $0.0047 | 6.2s | 100%
58 | MiniMax M2.5 | 100.0% | $0.0009 | 13.2s | 100%
59 | Writer: Palmyra X5 | 100.0% | $0.0030 | 9.9s | 100%
60 | DeepSeek-V2 Chat | 100.0% | $0.0003 | 14.7s | 100%
61 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.0043 | 8.1s | 100%
62 | Z.AI GLM 4.5 Air | 100.0% | $0.0007 | 14.3s | 100%
63 | GPT-5.1 | 100.0% | $0.0045 | 7.8s | 100%
64 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.0059 | 5.5s | 100%
65 | Gemma 4 31B | 100.0% | $0.0003 | 15.8s | 100%
66 | Mistral Large 2 | 100.0% | $0.0052 | 7.4s | 100%
67 | Qwen 3.6 35B | 100.0% | $0.0024 | 12.2s | 100%
68 | DeepSeek V4 Flash | 100.0% | $0.0002 | 16.2s | 100%
69 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.0071 | 4.7s | 100%
70 | GPT-4o, Aug. 6th (temp=1) | 100.0% | $0.0074 | 4.7s | 100%
71 | DeepSeek V3.2 | 100.0% | $0.0005 | 17.1s | 100%
72 | GPT-5.4 (Reasoning) | 100.0% | $0.0066 | 6.8s | 100%
73 | Aion 2.0 | 100.0% | $0.0022 | 14.6s | 100%
74 | Qwen 3.5 Flash | 100.0% | $0.0011 | 17.3s | 100%
75 | DeepSeek V4 Pro | 100.0% | $0.0016 | 16.6s | 100%
76 | Z.AI GLM 5 Turbo | 100.0% | $0.0043 | 12.0s | 100%
77 | Grok 4.3 (Reasoning) | 100.0% | $0.0032 | 14.2s | 100%
78 | o4 Mini High | 100.0% | $0.0050 | 11.8s | 100%
79 | GPT-5 Nano | 100.0% | $0.0007 | 19.6s | 100%
80 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0014 | 18.4s | 100%
81 | MiniMax M2.7 | 100.0% | $0.0016 | 18.2s | 100%
82 | Qwen 3.5 35B | 100.0% | $0.0049 | 15.7s | 100%
83 | Claude Sonnet 4.6 | 100.0% | $0.0097 | 7.3s | 100%
84 | Claude Sonnet 4.5 | 100.0% | $0.0100 | 7.7s | 100%
85 | Z.AI GLM 4.7 Flash | 100.0% | $0.0006 | 25.1s | 100%
86 | Claude Sonnet 4 | 100.0% | $0.010 | 8.3s | 100%
87 | Qwen 3.5 122B | 100.0% | $0.0067 | 15.7s | 100%
88 | GPT-5 | 100.0% | $0.0076 | 14.4s | 100%
89 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.011 | 8.6s | 100%
90 | Gemini 3.5 Flash (Reasoning) | 100.0% | $0.013 | 6.5s | 100%
91 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0034 | 24.5s | 100%
92 | GPT-5.5 (Reasoning) | 100.0% | $0.014 | 6.3s | 100%
93 | GPT-5.5 | 100.0% | $0.014 | 5.9s | 100%
94 | Qwen 3.5 27B | 100.0% | $0.0050 | 22.5s | 100%
95 | GPT-5.5 (Reasoning, Low) | 100.0% | $0.014 | 6.2s | 100%
96 | Gemini 2.5 Pro | 100.0% | $0.012 | 10.4s | 100%
97 | Grok 4 | 100.0% | $0.011 | 16.4s | 100%
98 | Claude Opus 4.5 | 100.0% | $0.016 | 8.1s | 100%
99 | Claude 3.5 Sonnet | 100.0% | $0.0089 | 21.6s | 100%
100 | ByteDance Seed 1.6 | 100.0% | $0.0030 | 32.3s | 100%
101 | Qwen 3.5 Plus (2026-04-20) | 100.0% | $0.0048 | 29.2s | 100%
102 | Qwen 3.6 27B | 100.0% | $0.0064 | 28.1s | 100%
103 | Claude Opus 4.6 | 100.0% | $0.017 | 9.8s | 100%
104 | Gemma 4 31B (Reasoning) | 100.0% | $0.0005 | 38.9s | 100%
105 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.018 | 10.2s | 100%
106 | GPT-4o Mini (temp=0) | 100.0% | $0.0004 | 40.5s | 100%
107 | Grok 4.20 (Beta) | 96.7% | $0.0020 | 1.7s | 64%
108 | Gemma 4 26B (Reasoning) | 100.0% | $0.0005 | 42.2s | 100%
109 | DeepSeek V4 Pro (Reasoning) | 100.0% | $0.0033 | 40.8s | 100%
110 | Gemini 3.1 Pro (Preview) | 100.0% | $0.018 | 15.8s | 100%
111 | Gemini 3 Pro (Preview) | 100.0% | $0.019 | 13.8s | 100%
112 | Qwen 3.5 9B | 100.0% | $0.0006 | 47.8s | 100%
113 | Stealth: Hunter Alpha | 96.7% | $0.0000 | 15.4s | 64%
114 | Claude Opus 4.7 (Reasoning) | 100.0% | $0.026 | 7.4s | 100%
115 | GPT-4o, May 13th (temp=1) | 100.0% | $0.012 | 31.1s | 100%
116 | Z.AI GLM 4.6 | 100.0% | $0.0033 | 46.7s | 100%
117 | Claude Opus 4.7 | 100.0% | $0.026 | 7.6s | 100%
118 | Z.AI GLM 5.1 | 100.0% | $0.0075 | 40.6s | 100%
119 | ByteDance Seed 2.0 Lite | 100.0% | $0.0044 | 47.1s | 100%
120 | GPT-4o Mini (temp=1) | 100.0% | $0.0004 | 57.0s | 100%
121 | Claude 3 Haiku | 96.7% | $0.0008 | 21.9s | 64%
122 | Qwen3.7 Max | 100.0% | $0.018 | 29.4s | 100%
123 | Xiaomi MIMO v2.5 | 93.3% | $0.0010 | 5.9s | 50%
124 | Stealth: Healer Alpha | 93.3% | $0.0000 | 7.9s | 50%
125 | GPT-4o, May 13th (temp=0) | 100.0% | $0.012 | 40.5s | 100%
126 | Z.AI GLM 5 | 100.0% | $0.0058 | 51.6s | 100%
127 | DeepSeek V3 (2025-03-24) | 93.3% | $0.0006 | 10.5s | 50%
128 | Qwen 3.5 397B A17B | 100.0% | $0.0076 | 54.5s | 100%
129 | Llama 3.1 8B | 90.0% | $0.0004 | 1.2s | 40%
130 | DeepSeek V3.1 | 93.3% | $0.0006 | 15.3s | 50%
131 | ByteDance Seed 2.0 Mini | 100.0% | $0.0012 | 1.1m | 100%
132 | Mistral Large | 100.0% | $0.022 | 35.2s | 100%
133 | Qwen3.6 Max Preview | 100.0% | $0.014 | 50.6s | 100%
134 | MoonshotAI: Kimi K2.6 | 96.7% | $0.0057 | 32.7s | 64%
135 | Cohere Command R+ (Aug. 2024) | 90.0% | $0.0065 | 5.6s | 40%
136 | Grok 4.3 | 86.7% | $0.0022 | 3.8s | 32%
137 | Hermes 3 405B | 86.7% | $0.0000 | 15.6s | 32%
138 | Z.AI GLM 4.7 | 100.0% | $0.0032 | 1.5m | 100%
139 | Arcee AI: Trinity Mini | 80.0% | $0.0001 | 2.4s | 20%
140 | Claude Opus 4 | 100.0% | $0.051 | 16.8s | 100%
141 | WizardLM 2 8x22b | 90.0% | $0.0017 | 45.4s | 40%
142 | Arcee AI: Trinity Large (Preview) | 73.3% | $0.0000 | 6.9s | 12%
143 | Claude Haiku 4.5 | 70.0% | $0.0032 | 5.0s | 8%
144 | Hermes 3 70B | 66.7% | $0.0007 | 9.2s | 6%
145 | Gemma 3 4B | 53.3% | $0.0001 | 3.8s | 0%
146 | Mistral NeMO | 40.0% | $0.0003 | 2.7s | 0%
147 | Ministral 3B | 33.3% | $0.0001 | 1.6s | 0%
148 | Ministral 8B | 33.3% | $0.0002 | 2.1s | 0%
149 | Rocinante 12B | 36.7% | $0.0006 | 21.3s | 0%
150 | Claude 3.7 Sonnet | 33.3% | $0.010 | 8.2s | 0%
Average score: 96.16%

Individual Scenarios

paragraphs

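Model | # 1 | # 2 | # 3 | # 4 | # 5 | # 6 | # 7 | # 8 | # 9 | # 10 | Avg ▼
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%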
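
Model | # 1 | # 2 | # 3 | # 4 | # 5 | # 6 | # 7 | # 8 | # 9 | # 10 | Avg ▼
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Rocinante 12B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%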
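
Model | # 1 | # 2 | # 3 | # 4 | # 5 | # 6 | # 7 | # 8 | # 9 | # 10 | Avg ▼
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Hermes 3 70B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Claude Haiku 4.5 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Rocinante 12B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%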
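
The page does not state how the overall Score is derived. The figures above are, however, consistent with an unweighted mean of each model's three per-scenario averages (only the first scenario table is labelled, "paragraphs"). The sketch below checks that reading for three models. The helper name `overall_score` and the equal weighting are assumptions for illustration, not a documented formula; the input numbers are copied from the scenario tables.

```python
from statistics import mean

# Per-scenario averages (percent), copied from the three scenario tables above,
# in the order the tables appear on the page.
scenario_averages = {
    "Claude Haiku 4.5": [90.0, 100.0, 20.0],
    "Rocinante 12B": [60.0, 30.0, 20.0],
    "Hermes 3 70B": [100.0, 70.0, 30.0],
}


def overall_score(averages):
    """Assumed formula: unweighted mean of scenario averages, to one decimal."""
    return round(mean(averages), 1)


for model, averages in scenario_averages.items():
    print(f"{model}: {overall_score(averages)}%")
```

These three results match the Score column of the overall table (70.0%, 36.7%, and 66.7%). The Stability column is not explained on the page and is not modelled here.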