Matches Regex

Test: Data extraction

Avg. Score
90.2%
Scenarios
10

Overall Performance

Rank ▲ Model Score Avg. Cost Avg. Time Stability
1Claude Sonnet 4100.0%$0.00031.5s100%
2GPT-4o Mini (temp=0)100.0%$0.00007.2s100%
3Mistral Small Creative97.0%$0.0000349ms66%
4Gemini 3 Flash (Preview, Reasoning)99.0%$0.00297.9s80%
5DeepSeek V3 (2025-03-24)97.0%$0.00002.3s66%
6GPT-4o Mini (temp=1)98.0%$0.000011.5s72%
7Gemma 4 31B96.0%$0.00005.1s61%
8GPT-5.4 Nano95.0%$0.0000720ms56%
9Claude Opus 4.795.0%$0.00071.0s56%
10Gemma 4 26B94.0%$0.00001.9s53%
11Claude Opus 495.0%$0.00176.1s56%
12Gemini 3.5 Flash (Reasoning)99.0%$0.0115.9s80%
13DeepSeek V3 (2024-12-26)93.0%$0.00001.3s49%
14Gemini 2.5 Flash Lite (Reasoning)93.0%$0.00044.2s49%
15GPT-5.4 Mini92.0%$0.0001650ms46%
16Gemma 4 26B (Reasoning)98.0%$0.000437.1s72%
17Rocinante 12B92.0%$0.00002.9s46%
18Mistral Small 4 (Reasoning)92.0%$0.00044.4s46%
19Hermes 3 70B91.0%$0.0000760ms43%
20GPT-4.191.0%$0.00021.2s43%
21Gemini 2.5 Flash (Reasoning)94.0%$0.00357.0s53%
22DeepSeek V3.191.0%$0.00013.0s43%
23ByteDance Seed 2.0 Lite93.0%$0.000911.1s49%
24Gemma 3 4B90.0%$0.0000256ms40%
25Gemini 2.5 Flash Lite90.0%$0.0000352ms40%
26Gemma 3 12B90.0%$0.0000443ms40%
27Ministral 3 14B90.0%$0.0000416ms40%
28Gemini 2.5 Flash90.0%$0.0000465ms40%
29Inception Mercury90.0%$0.0000529ms40%
30Mistral Small 3.2 24B90.0%$0.0000566ms40%
31Gemma 3 27B90.0%$0.0000678ms40%
32Llama 3.1 70B90.0%$0.0001522ms40%
33Mistral Medium 3.190.0%$0.0000634ms40%
34Gemini 3.1 Flash Lite (Preview)90.0%$0.0000703ms40%
35Qwen 2.5 72B90.0%$0.0000706ms40%
36Gemini 3.1 Flash Lite90.0%$0.0000767ms40%
37Llama 3.1 Nemotron 70B90.0%$0.0000814ms40%
38Gemini 3 Flash (Preview)90.0%$0.0000826ms40%
39Mistral Large 290.0%$0.0002437ms40%
40Inception Mercury 290.0%$0.0002481ms40%
41GPT-4.1 Nano90.0%$0.00001.0s40%
42Arcee AI: Trinity Large (Preview)90.0%$0.00001.1s40%
43Mistral Large 390.0%$0.0000989ms40%
44Gemini 3.1 Flash Lite (Reasoning)90.0%$0.00001.2s40%
45GPT-5.490.0%$0.0003660ms40%
46Claude Haiku 4.590.0%$0.00011.1s40%
47GPT-4.1 Mini90.0%$0.00001.4s40%
48DeepSeek-V2 Chat90.0%$0.00001.5s40%
49GPT-4o, Aug. 6th (temp=1)90.0%$0.0002955ms40%
50GPT-4o, Aug. 6th (temp=0)90.0%$0.0002970ms40%
51Hermes 3 405B91.0%$0.00006.4s43%
52WizardLM 2 8x22b91.0%$0.00016.2s43%
53GPT-5.4 Nano (Reasoning, Low)90.0%$0.00011.9s40%
54Claude Sonnet 4.690.0%$0.00041.1s40%
55ByteDance Seed 1.6 Flash90.0%$0.00012.2s40%
56GPT-5.590.0%$0.00051.2s40%
57Z.AI GLM 4.590.0%$0.00012.6s40%
58GPT-5.4 Mini (Reasoning, Low)90.0%$0.00022.1s40%
59Claude 3.7 Sonnet90.0%$0.00041.7s40%
60Claude Sonnet 4.590.0%$0.00031.9s40%
61Claude Opus 4.7 (Reasoning)90.0%$0.0007979ms40%
62GPT-5.4 Nano (Reasoning)90.0%$0.00012.8s40%
63GPT-5 Mini91.0%$0.00085.7s43%
64GPT-5.291.0%$0.00163.7s43%
65Claude Opus 4.690.0%$0.00062.4s40%
66GPT-5.4 Mini (Reasoning)90.0%$0.00052.9s40%
67Mistral Large91.0%$0.00116.0s43%
68Claude Opus 4.590.0%$0.00082.2s40%
69Grok 4 Fast90.0%$0.00033.9s40%
70GPT-5.4 (Reasoning, Low)90.0%$0.00092.3s40%
71Mistral Small 489.0%$0.0000527ms37%
72Claude 3 Haiku90.0%$0.00005.4s40%
73Gemini 3.5 Flash (Reasoning, Minimal)89.0%$0.0001854ms37%
74Qwen3 235B A22B Instruct 250789.0%$0.00001.4s37%
75Writer: Palmyra X590.0%$0.00025.3s40%
76Claude 3.5 Sonnet90.0%$0.00035.4s40%
77Grok 4.20 (Beta)89.0%$0.00021.6s37%
78GPT-4o, May 13th (temp=1)90.0%$0.00046.0s40%
79Qwen 3.6 Flash90.0%$0.00114.1s40%
80GPT-4o, May 13th (temp=0)90.0%$0.00046.2s40%
81Qwen 3.5 Plus (2026-02-15)89.0%$0.00002.9s37%
82Nemotron 3 Super90.0%$0.00007.9s40%
83Stealth: Healer Alpha90.0%$0.00007.9s40%
84GPT-5.190.0%$0.00134.3s40%
85GPT-OSS 120B90.0%$0.00018.0s40%
86Grok 4.1 Fast90.0%$0.00047.0s40%
87GPT-5 Nano90.0%$0.00027.7s40%
88GPT-5.4 (Reasoning)90.0%$0.00173.8s40%
89Llama 3.1 8B88.0%$0.0000427ms35%
90Qwen 3.6 35B90.0%$0.00106.4s40%
91Arcee AI: Trinity Mini89.0%$0.00014.6s37%
92Nemotron 3 Nano90.0%$0.00019.5s40%
93GPT-5.5 (Reasoning, Low)90.0%$0.00223.2s40%
94Qwen 3.5 Flash90.0%$0.00058.9s40%
95ByteDance Seed 1.690.0%$0.00078.5s40%
96Qwen 3 32B90.0%$0.00029.9s40%
97Claude Sonnet 4.6 (Reasoning)90.0%$0.00253.4s40%
98Z.AI GLM 5 Turbo90.0%$0.00176.1s40%
99Xiaomi MIMO v2.590.0%$0.00147.1s40%
100Z.AI GLM 4.695.0%$0.001831.4s56%
101Mistral NeMO88.0%$0.00003.3s35%
102GPT-5.5 (Reasoning)90.0%$0.00283.5s40%
103Z.AI GLM 4.7 Flash90.0%$0.000312.5s40%
104Stealth: Hunter Alpha91.0%$0.000019.9s43%
105Z.AI GLM 4.5 Air91.0%$0.000817.6s43%
106MiniMax M2.591.0%$0.000917.2s43%
107DeepSeek V3.289.0%$0.000210.5s37%
108Qwen 3.5 35B90.0%$0.00258.4s40%
109MiniMax M2.791.0%$0.001416.8s43%
110Grok 4.20 (Reasoning)90.0%$0.002210.3s40%
111Claude Opus 4.6 (Reasoning)90.0%$0.00414.7s40%
112Gemini 2.5 Pro94.0%$0.00939.0s53%
113Xiaomi MIMO v2.5 Pro90.0%$0.002312.5s40%
114Grok 4.20 (Beta, Reasoning)90.0%$0.00505.0s40%
115o4 Mini91.0%$0.002019.5s43%
116GPT-590.0%$0.00418.5s40%
117Qwen 3.6 27B89.0%$0.002110.7s37%
118Qwen 3.5 27B90.0%$0.003013.8s40%
119Qwen 3.5 9B90.0%$0.000224.5s40%
120Aion 2.090.0%$0.001421.0s40%
121Gemma 4 31B (Reasoning)97.0%$0.00041.0m66%
122Stealth: Aurora Alpha90.0%1.5s40%
123MoonshotAI: Kimi K2.592.0%$0.003226.0s46%
124Grok 4.384.0%$0.0002745ms27%
125o4 Mini High90.0%$0.004712.5s40%
126DeepSeek V4 Pro85.0%$0.00015.7s29%
127Grok 4.3 (Reasoning)90.0%$0.002519.6s40%
128Qwen3.6 Max Preview92.0%$0.005621.8s46%
129Grok 4.2083.0%$0.00021.1s25%
130Qwen 3.5 Plus (2026-04-20)90.0%$0.003322.7s40%
131ByteDance Seed 2.0 Mini90.0%$0.000531.5s40%
132Ministral 8B82.0%$0.0000305ms23%
133DeepSeek V4 Flash (Reasoning)90.0%$0.000332.2s40%
134Qwen3.7 Max90.0%$0.006514.6s40%
135Cohere Command R+ (Aug. 2024)81.0%$0.0002447ms22%
136Ministral 3 3B80.0%$0.0000360ms20%
137LFM2 24B80.0%$0.00001.3s20%
138Ministral 3B79.0%$0.0000297ms19%
139Z.AI GLM 5.193.0%$0.006441.2s49%
140Qwen 3.5 122B90.0%$0.008521.6s40%
141Qwen 3.5 397B A17B90.0%$0.004834.5s40%
142Z.AI GLM 4.790.0%$0.002142.9s40%
143Z.AI GLM 592.0%$0.004646.7s46%
144Gemini 3 Pro (Preview)90.0%$0.01512.9s40%
145MoonshotAI: Kimi K2.690.0%$0.005243.0s40%
146Ministral 3 8B70.0%$0.0000367ms8%
147DeepSeek V4 Flash69.0%$0.00002.8s8%
148DeepSeek V4 Pro (Reasoning)90.0%$0.00241.2m40%
149Gemini 3.1 Pro (Preview)92.0%$0.02422.8s46%
150Grok 490.0%$0.01930.3s40%
90.22%

Individual Scenarios

Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Grok 4.20 (Reasoning)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Mistral Large100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Grok 4.3100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Rocinante 12B100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100090.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100090.0%
DeepSeek V4 Flash100100100100100100100100100090.0%
Ministral 8B100100100100100100100100100090.0%
Ministral 3B1001001001001001001001000080.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Grok 4.20 (Reasoning)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Mistral Large100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Grok 4.3100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Ministral 8B100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Rocinante 12B100100100100100100100100100100100.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Grok 4.20 (Reasoning)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Mistral Large100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Grok 4.3100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Ministral 8B100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Rocinante 12B100100100100100100100100100100100.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Grok 4.20 (Reasoning)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Mistral Large100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Grok 4.3100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Ministral 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100090.0%
Llama 3.1 8B100100100100100100100100100090.0%
Mistral Small 4 (Reasoning)1001001001001001001001000080.0%
Rocinante 12B1001001001001001001001000080.0%
Arcee AI: Trinity Mini10010010010010010010000070.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Grok 4.20 (Reasoning)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Mistral Large100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100090.0%
Rocinante 12B100100100100100100100100100090.0%
Hermes 3 405B1001001001001001001001000080.0%
DeepSeek V3.21001001001001001001001000080.0%
Grok 4.31001001001001001001001000080.0%
Ministral 8B1001001001001001001001000080.0%
Ministral 3B10010010010010010010000070.0%
DeepSeek V4 Pro1001001001001000000050.0%
DeepSeek V4 Flash00000000000.0%
Ministral 3 8B00000000000.0%
Cohere Command R+ (Aug. 2024)00000000000.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Grok 4.20 (Reasoning)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Mistral Large100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Rocinante 12B100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100090.0%
DeepSeek V3.1100100100100100100100100100090.0%
GPT-5.4 Nano100100100100100100100100100090.0%
Llama 3.1 8B100100100100100100100100100090.0%
Grok 4.310010010010010010010000070.0%
Ministral 8B1001001001001000000050.0%
Grok 4.20100100100000000030.0%
Ministral 3B1001000000000020.0%
DeepSeek V4 Flash00000000000.0%
Ministral 3 8B00000000000.0%
Ministral 3 3B00000000000.0%
LFM2 24B00000000000.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Grok 4.20 (Reasoning)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Mistral Large100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Grok 4.3100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Ministral 8B100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Rocinante 12B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)1001001001001001001001000080.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Claude Sonnet 4100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100090.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100090.0%
Gemma 4 26B (Reasoning)1001001001001001001001000080.0%
GPT-4o Mini (temp=1)1001001001001001001001000080.0%
Gemma 4 31B (Reasoning)10010010010010010010000070.0%
DeepSeek V3 (2025-03-24)10010010010010010010000070.0%
Mistral Small Creative10010010010010010010000070.0%
Gemma 4 31B100100100100100100000060.0%
Gemini 2.5 Flash (Reasoning)100100100100100100000060.0%
GPT-5.4 Nano100100100100100100000060.0%
Claude Opus 4.71001001001001000000050.0%
Z.AI GLM 4.61001001001001000000050.0%
Claude Opus 41001001001001000000050.0%
Rocinante 12B1001001001001000000050.0%
Gemini 2.5 Pro10010010010000000040.0%
Gemma 4 26B10010010010000000040.0%
Mistral Small 4 (Reasoning)10010010010000000040.0%
Z.AI GLM 5.1100100100000000030.0%
Gemini 2.5 Flash Lite (Reasoning)100100100000000030.0%
ByteDance Seed 2.0 Lite100100100000000030.0%
DeepSeek V3 (2024-12-26)100100100000000030.0%
Hermes 3 405B100100100000000030.0%
Qwen3.6 Max Preview1001000000000020.0%
Gemini 3.1 Pro (Preview)1001000000000020.0%
Z.AI GLM 51001000000000020.0%
MoonshotAI: Kimi K2.51001000000000020.0%
Z.AI GLM 4.5 Air1001000000000020.0%
GPT-5.4 Mini1001000000000020.0%
DeepSeek V3.11001000000000020.0%
Arcee AI: Trinity Mini1001000000000020.0%
Ministral 3B1001000000000020.0%
GPT-5 Mini10000000000010.0%
GPT-5.210000000000010.0%
MiniMax M2.710000000000010.0%
MiniMax M2.510000000000010.0%
GPT-4.110000000000010.0%
o4 Mini10000000000010.0%
Stealth: Hunter Alpha10000000000010.0%
DeepSeek V3.210000000000010.0%
Mistral Large10000000000010.0%
Hermes 3 70B10000000000010.0%
WizardLM 2 8x22b10000000000010.0%
Cohere Command R+ (Aug. 2024)10000000000010.0%
Ministral 8B10000000000010.0%
Qwen3.7 Max00000000000.0%
Claude Opus 4.6 (Reasoning)00000000000.0%
Z.AI GLM 5 Turbo00000000000.0%
Claude Sonnet 4.6 (Reasoning)00000000000.0%
Grok 4.3 (Reasoning)00000000000.0%
GPT-5.4 (Reasoning)00000000000.0%
Claude Opus 4.7 (Reasoning)00000000000.0%
GPT-5.5 (Reasoning)00000000000.0%
GPT-5.5 (Reasoning, Low)00000000000.0%
GPT-5.100000000000.0%
Claude Opus 4.600000000000.0%
MoonshotAI: Kimi K2.600000000000.0%
GPT-500000000000.0%
Qwen 3.5 397B A17B00000000000.0%
Qwen 3.5 122B00000000000.0%
Qwen 3.5 Plus (2026-04-20)00000000000.0%
Grok 4.20 (Beta, Reasoning)00000000000.0%
GPT-5.4 (Reasoning, Low)00000000000.0%
Grok 4.20 (Reasoning)00000000000.0%
Claude Sonnet 4.600000000000.0%
Qwen 3.5 27B00000000000.0%
ByteDance Seed 1.600000000000.0%
Qwen 3.6 Flash00000000000.0%
GPT-5.4 Mini (Reasoning)00000000000.0%
o4 Mini High00000000000.0%
DeepSeek V4 Pro (Reasoning)00000000000.0%
Qwen 3.6 27B00000000000.0%
Claude Opus 4.500000000000.0%
Grok 4.1 Fast00000000000.0%
Aion 2.000000000000.0%
GPT-5.500000000000.0%
Qwen 3.6 35B00000000000.0%
DeepSeek V4 Flash (Reasoning)00000000000.0%
Gemini 3 Pro (Preview)00000000000.0%
Z.AI GLM 4.700000000000.0%
Grok 400000000000.0%
Claude Sonnet 4.500000000000.0%
Qwen 3.5 35B00000000000.0%
Xiaomi MIMO v2.5 Pro00000000000.0%
ByteDance Seed 2.0 Mini00000000000.0%
Gemini 3.5 Flash (Reasoning, Minimal)00000000000.0%
GPT-OSS 120B00000000000.0%
Gemini 3.1 Flash Lite (Reasoning)00000000000.0%
Qwen 3.5 Flash00000000000.0%
Z.AI GLM 4.500000000000.0%
Grok 4 Fast00000000000.0%
Qwen 3.5 9B00000000000.0%
Qwen 3.5 Plus (2026-02-15)00000000000.0%
Stealth: Healer Alpha00000000000.0%
Gemini 3.1 Flash Lite (Preview)00000000000.0%
Gemini 3.1 Flash Lite00000000000.0%
GPT-5.4 Mini (Reasoning, Low)00000000000.0%
Mistral Large 300000000000.0%
GPT-4o, May 13th (temp=0)00000000000.0%
Gemini 3 Flash (Preview)00000000000.0%
Claude Haiku 4.500000000000.0%
Xiaomi MIMO v2.500000000000.0%
DeepSeek-V2 Chat00000000000.0%
Z.AI GLM 4.7 Flash00000000000.0%
Nemotron 3 Super00000000000.0%
GPT-5.400000000000.0%
Claude 3.5 Sonnet00000000000.0%
Grok 4.20 (Beta)00000000000.0%
Inception Mercury 200000000000.0%
GPT-4o, May 13th (temp=1)00000000000.0%
Stealth: Aurora Alpha00000000000.0%
Claude 3.7 Sonnet00000000000.0%
GPT-4.1 Mini00000000000.0%
DeepSeek V4 Pro00000000000.0%
GPT-4o, Aug. 6th (temp=1)00000000000.0%
GPT-5 Nano00000000000.0%
GPT-4o, Aug. 6th (temp=0)00000000000.0%
Mistral Large 200000000000.0%
Qwen 3 32B00000000000.0%
DeepSeek V4 Flash00000000000.0%
Grok 4.2000000000000.0%
GPT-5.4 Nano (Reasoning)00000000000.0%
Gemini 2.5 Flash Lite00000000000.0%
Gemini 2.5 Flash00000000000.0%
Qwen3 235B A22B Instruct 250700000000000.0%
Writer: Palmyra X500000000000.0%
Inception Mercury00000000000.0%
GPT-5.4 Nano (Reasoning, Low)00000000000.0%
Grok 4.300000000000.0%
Mistral Small 3.2 24B00000000000.0%
Gemma 3 12B00000000000.0%
Llama 3.1 70B00000000000.0%
Gemma 3 27B00000000000.0%
Mistral Medium 3.100000000000.0%
Nemotron 3 Nano00000000000.0%
Mistral Small 400000000000.0%
Qwen 2.5 72B00000000000.0%
Llama 3.1 Nemotron 70B00000000000.0%
Arcee AI: Trinity Large (Preview)00000000000.0%
ByteDance Seed 1.6 Flash00000000000.0%
Ministral 3 14B00000000000.0%
GPT-4.1 Nano00000000000.0%
Ministral 3 8B00000000000.0%
Claude 3 Haiku00000000000.0%
Gemma 3 4B00000000000.0%
Ministral 3 3B00000000000.0%
Mistral NeMO00000000000.0%
Llama 3.1 8B00000000000.0%
LFM2 24B00000000000.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Grok 4.20 (Reasoning)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Mistral Large100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Grok 4.3100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Ministral 8B100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Rocinante 12B100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100090.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Grok 4.20 (Reasoning)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Mistral Large100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Rocinante 12B100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100090.0%
Grok 4.3100100100100100100100100100090.0%
Ministral 8B100100100100100100100100100090.0%
Mistral NeMO1001001001001001001001000080.0%