Matches text

Test: Data extraction

Avg. Score
91.5%
Scenarios
9

Overall Performance

Rank ▲ Model Score Avg. Cost Avg. Time Stability
1Gemma 3 4B100.0%$0.0000249ms100%
2Gemini 2.5 Flash Lite100.0%$0.0000355ms100%
3Inception Mercury100.0%$0.0000393ms100%
4Gemma 3 12B100.0%$0.0000414ms100%
5Gemini 2.5 Flash100.0%$0.0000455ms100%
6Inception Mercury 2100.0%$0.0001361ms100%
7Gemma 3 27B100.0%$0.0000691ms100%
8Gemini 3.1 Flash Lite100.0%$0.0000729ms100%
9GPT-5.4 Mini100.0%$0.0001662ms100%
10Gemini 3 Flash (Preview)100.0%$0.0000831ms100%
11GPT-5.4100.0%$0.0003652ms100%
12GPT-5.4 Nano (Reasoning, Low)100.0%$0.00001.5s100%
13Grok 4 Fast100.0%$0.00011.5s100%
14Gemma 4 26B100.0%$0.00001.9s100%
15Stealth: Aurora Alpha100.0%1.5s100%
16Gemini 2.5 Flash Lite (Reasoning)100.0%$0.00011.9s100%
17GPT-5.4 Mini (Reasoning, Low)100.0%$0.00021.9s100%
18GPT-5.5100.0%$0.00051.2s100%
19GPT-5.4 Mini (Reasoning)100.0%$0.00032.1s100%
20Claude Opus 4.7100.0%$0.0007978ms100%
21Claude Opus 4.7 (Reasoning)100.0%$0.0007988ms100%
22Nemotron 3 Super100.0%$0.00003.2s100%
23GPT-5.4 (Reasoning, Low)100.0%$0.00061.8s100%
24GPT-5.2100.0%$0.00062.1s100%
25GPT-5 Mini100.0%$0.00033.1s100%
26GPT-5.1100.0%$0.00052.7s100%
27ByteDance Seed 2.0 Lite100.0%$0.00034.0s100%
28Gemini 3.1 Flash Lite (Preview)98.9%$0.0000697ms79%
29o4 Mini High100.0%$0.00073.0s100%
30Qwen 3.6 Flash100.0%$0.00073.0s100%
31GPT-4.1 Nano98.9%$0.0000994ms79%
32GPT-5.4 (Reasoning)100.0%$0.00092.6s100%
33Gemini 3.1 Flash Lite (Reasoning)98.9%$0.00001.2s79%
34GPT-OSS 120B100.0%$0.00005.3s100%
35Gemini 3 Flash (Preview, Reasoning)100.0%$0.00063.8s100%
36Gemma 4 31B100.0%$0.00005.5s100%
37Z.AI GLM 5 Turbo100.0%$0.00083.4s100%
38GPT-5 Nano100.0%$0.00015.3s100%
39Qwen 3.6 35B100.0%$0.00064.1s100%
40Qwen 3.5 Flash100.0%$0.00025.2s100%
41GPT-5.5 (Reasoning, Low)100.0%$0.00132.7s100%
42Qwen 3.5 Plus (2026-02-15)98.9%$0.00002.4s79%
43Nemotron 3 Nano100.0%$0.00006.7s100%
44GPT-5.5 (Reasoning)100.0%$0.00152.7s100%
45MiniMax M2.5100.0%$0.00046.5s100%
46DeepSeek V4 Flash (Reasoning)98.9%$0.00003.4s79%
47Qwen 3.5 35B100.0%$0.00134.3s100%
48GPT-5.4 Nano (Reasoning)97.8%$0.00011.9s71%
49Gemma 4 26B (Reasoning)100.0%$0.00018.4s100%
50Gemini 3.5 Flash (Reasoning, Minimal)96.7%$0.0001851ms64%
51Gemini 3.5 Flash (Reasoning)100.0%$0.00241.8s100%
52Z.AI GLM 4.7 Flash100.0%$0.00028.7s100%
53MoonshotAI: Kimi K2.5100.0%$0.00087.1s100%
54GPT-5100.0%$0.00184.3s100%
55Qwen 3.5 27B100.0%$0.00146.8s100%
56Gemini 2.5 Flash (Reasoning)97.8%$0.00082.5s71%
57MiniMax M2.798.9%$0.00035.9s79%
58MoonshotAI: Kimi K2.6100.0%$0.00098.7s100%
59Z.AI GLM 4.594.4%$0.00012.7s54%
60ByteDance Seed 1.6 Flash93.3%$0.00011.8s50%
61DeepSeek V4 Pro (Reasoning)98.9%$0.00038.0s79%
62Z.AI GLM 5.1100.0%$0.001210.0s100%
63GPT-5.4 Nano91.1%$0.0000711ms43%
64Mistral Small 490.0%$0.0000509ms40%
65Qwen 3.5 9B100.0%$0.000114.4s100%
66Grok 4.1 Fast92.2%$0.00012.6s46%
67Mistral Small Creative88.9%$0.0000354ms37%
68Ministral 3 14B88.9%$0.0000424ms37%
69Gemini 3.1 Pro (Preview)100.0%$0.00325.9s100%
70Mistral Medium 3.188.9%$0.0000650ms37%
71Qwen 2.5 72B88.9%$0.0000715ms37%
72Grok 4.3 (Reasoning)100.0%$0.001212.0s100%
73Mistral Large 388.9%$0.00001.0s37%
74Claude Haiku 4.588.9%$0.00011.1s37%
75GPT-4.1 Mini88.9%$0.00001.4s37%
76DeepSeek-V2 Chat88.9%$0.00001.5s37%
77GPT-4o, Aug. 6th (temp=1)88.9%$0.0002962ms37%
78GPT-4o, Aug. 6th (temp=0)88.9%$0.0002966ms37%
79Hermes 3 70B87.8%$0.0000742ms34%
80GPT-4.188.9%$0.00021.2s37%
81DeepSeek V3 (2024-12-26)87.8%$0.0000897ms34%
82Gemini 2.5 Pro100.0%$0.00404.9s100%
83Grok 4.387.8%$0.0002727ms34%
84DeepSeek V3.291.1%$0.00003.7s43%
85Qwen3 235B A22B Instruct 250787.8%$0.00001.4s34%
86Qwen 3.6 27B98.9%$0.00158.3s79%
87Mistral Small 4 (Reasoning)88.9%$0.00012.1s37%
88Claude Sonnet 488.9%$0.00041.5s37%
89Gemini 3 Pro (Preview)100.0%$0.00415.3s100%
90Qwen 3.5 Plus (2026-04-20)98.9%$0.00129.7s79%
91Llama 3.1 8B85.6%$0.0000404ms30%
92Claude 3.7 Sonnet88.9%$0.00041.7s37%
93Claude Sonnet 4.588.9%$0.00031.9s37%
94Mistral Large 285.6%$0.0002436ms30%
95Qwen3.7 Max100.0%$0.00348.1s100%
96Aion 2.095.6%$0.00058.0s59%
97Llama 3.1 Nemotron 70B84.4%$0.0000821ms28%
98Grok 4100.0%$0.00426.6s100%
99Gemma 4 31B (Reasoning)100.0%$0.000118.6s100%
100Xiaomi MIMO v2.591.1%$0.00074.0s43%
101Mistral NeMO86.7%$0.00002.9s32%
102Stealth: Healer Alpha90.0%$0.00005.4s40%
103Claude Sonnet 4.6 (Reasoning)88.9%$0.00101.6s37%
104Claude Opus 4.588.9%$0.00092.2s37%
105Claude Opus 4.687.8%$0.00062.4s34%
106o4 Mini100.0%$0.000617.7s100%
107Grok 4.20 (Beta)84.4%$0.00021.8s28%
108Z.AI GLM 4.7100.0%$0.000817.4s100%
109ByteDance Seed 1.690.0%$0.00045.3s40%
110Z.AI GLM 5100.0%$0.001515.9s100%
111Writer: Palmyra X588.9%$0.00025.2s37%
112DeepSeek V3 (2025-03-24)82.2%$0.00001.7s24%
113Claude 3 Haiku87.8%$0.00005.4s34%
114Llama 3.1 70B80.0%$0.0001533ms20%
115Claude 3.5 Sonnet88.9%$0.00045.4s37%
116ByteDance Seed 2.0 Mini92.2%$0.00018.8s46%
117Z.AI GLM 4.6100.0%$0.001118.3s100%
118GPT-4o, May 13th (temp=1)88.9%$0.00045.9s37%
119Grok 4.2080.0%$0.00021.1s20%
120Mistral Small 3.2 24B77.8%$0.0000505ms17%
121GPT-4o, May 13th (temp=0)88.9%$0.00046.2s37%
122Qwen 3.5 397B A17B100.0%$0.002115.7s100%
123Claude Opus 4.6 (Reasoning)88.9%$0.00172.5s37%
124Qwen 3 32B87.8%$0.00016.5s34%
125GPT-4o Mini (temp=0)88.9%$0.00007.6s37%
126Arcee AI: Trinity Large (Preview)77.8%$0.00001.2s17%
127DeepSeek V4 Pro83.3%$0.00014.2s25%
128LFM2 24B77.8%$0.00001.3s17%
129Ministral 3 3B75.6%$0.0000365ms14%
130DeepSeek V3.178.9%$0.00002.3s18%
131Hermes 3 405B85.6%$0.00006.3s30%
132Xiaomi MIMO v2.5 Pro88.9%$0.00106.3s37%
133Claude Sonnet 4.677.8%$0.00041.1s17%
134Qwen3.6 Max Preview100.0%$0.003514.0s100%
135Arcee AI: Trinity Mini75.6%$0.00012.1s14%
136Ministral 8B70.0%$0.0000307ms8%
137Claude Opus 488.9%$0.00176.2s37%
138GPT-4o Mini (temp=1)88.9%$0.000011.3s37%
139Ministral 3B66.7%$0.0000297ms6%
140Ministral 3 8B66.7%$0.0000372ms6%
141DeepSeek V4 Flash68.9%$0.00002.9s7%
142Z.AI GLM 4.5 Air86.7%$0.000512.1s32%
143Rocinante 12B57.8%$0.00002.8s1%
144Stealth: Hunter Alpha88.9%$0.000018.5s37%
145Cohere Command R+ (Aug. 2024)46.7%$0.0002446ms0%
146Qwen 3.5 122B100.0%$0.006315.7s100%
147WizardLM 2 8x22b48.9%$0.00016.1s0%
148Mistral Large41.1%$0.00105.9s0%
149Grok 4.20 (Beta, Reasoning)46.7%$0.00293.5s0%
150Grok 4.20 (Reasoning)32.2%$0.00146.1s0%
91.46%

Individual Scenarios

Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Grok 4.3100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100090.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100090.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100090.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100090.0%
Qwen 3 32B100100100100100100100100100090.0%
DeepSeek V4 Flash100100100100100100100100100090.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100090.0%
ByteDance Seed 1.6 Flash100100100100100100100100100090.0%
Ministral 8B100100100100100100100100100090.0%
Llama 3.1 8B100100100100100100100100100090.0%
Ministral 3B1001001001001001001001000080.0%
Rocinante 12B10010010010010010010000070.0%
Mistral Large1001001001001000000050.0%
Arcee AI: Trinity Mini1001000000000020.0%
Cohere Command R+ (Aug. 2024)1001000000000020.0%
Grok 4.20 (Beta, Reasoning)00000000000.0%
Grok 4.20 (Reasoning)00000000000.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100090.0%
MiniMax M2.7100100100100100100100100100090.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100090.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100090.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100090.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100090.0%
Grok 4.20100100100100100100100100100090.0%
GPT-4.1 Nano100100100100100100100100100090.0%
Grok 4.20 (Beta)1001001001001001001001000080.0%
GPT-5.4 Nano (Reasoning)1001001001001001001001000080.0%
ByteDance Seed 1.6 Flash1001001001001001001001000080.0%
Grok 4.1 Fast10010010010010010010000070.0%
Grok 4.310010010010010010010000070.0%
Aion 2.0100100100100100100000060.0%
Z.AI GLM 4.51001001001001000000050.0%
DeepSeek V4 Pro1001001001001000000050.0%
Cohere Command R+ (Aug. 2024)1001001001001000000050.0%
DeepSeek V3.210010010010000000040.0%
DeepSeek V4 Flash100100100000000030.0%
GPT-5.4 Nano100100100000000030.0%
Xiaomi MIMO v2.51001000000000020.0%
Z.AI GLM 4.5 Air1001000000000020.0%
Mistral Small 4 (Reasoning)1001000000000020.0%
Llama 3.1 70B1001000000000020.0%
Mistral Small 41001000000000020.0%
Stealth: Hunter Alpha10000000000010.0%
Stealth: Healer Alpha10000000000010.0%
Mistral NeMO10000000000010.0%
Rocinante 12B10000000000010.0%
Claude Opus 4.6 (Reasoning)00000000000.0%
Claude Sonnet 4.6 (Reasoning)00000000000.0%
Claude Opus 4.600000000000.0%
Grok 4.20 (Beta, Reasoning)00000000000.0%
Grok 4.20 (Reasoning)00000000000.0%
Claude Sonnet 4.600000000000.0%
Claude Opus 4.500000000000.0%
Claude Sonnet 400000000000.0%
GPT-4.100000000000.0%
Claude Sonnet 4.500000000000.0%
Claude Opus 400000000000.0%
Xiaomi MIMO v2.5 Pro00000000000.0%
Mistral Large 300000000000.0%
GPT-4o, May 13th (temp=0)00000000000.0%
Claude Haiku 4.500000000000.0%
DeepSeek-V2 Chat00000000000.0%
Claude 3.5 Sonnet00000000000.0%
GPT-4o, May 13th (temp=1)00000000000.0%
DeepSeek V3 (2024-12-26)00000000000.0%
Claude 3.7 Sonnet00000000000.0%
GPT-4.1 Mini00000000000.0%
Hermes 3 405B00000000000.0%
GPT-4o, Aug. 6th (temp=1)00000000000.0%
GPT-4o, Aug. 6th (temp=0)00000000000.0%
Mistral Large 200000000000.0%
DeepSeek V3.100000000000.0%
Qwen 3 32B00000000000.0%
DeepSeek V3 (2025-03-24)00000000000.0%
Mistral Large00000000000.0%
Qwen3 235B A22B Instruct 250700000000000.0%
Writer: Palmyra X500000000000.0%
GPT-4o Mini (temp=1)00000000000.0%
Mistral Small 3.2 24B00000000000.0%
GPT-4o Mini (temp=0)00000000000.0%
Mistral Medium 3.100000000000.0%
Qwen 2.5 72B00000000000.0%
Llama 3.1 Nemotron 70B00000000000.0%
Arcee AI: Trinity Large (Preview)00000000000.0%
Mistral Small Creative00000000000.0%
Hermes 3 70B00000000000.0%
Ministral 3 14B00000000000.0%
Ministral 3 8B00000000000.0%
Claude 3 Haiku00000000000.0%
WizardLM 2 8x22b00000000000.0%
Arcee AI: Trinity Mini00000000000.0%
Ministral 3 3B00000000000.0%
Ministral 8B00000000000.0%
Llama 3.1 8B00000000000.0%
Ministral 3B00000000000.0%
LFM2 24B00000000000.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Grok 4.3100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Ministral 8B100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100090.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100090.0%
Rocinante 12B100100100100100100000060.0%
Grok 4.20 (Beta, Reasoning)1001001001001000000050.0%
WizardLM 2 8x22b100100100000000030.0%
Cohere Command R+ (Aug. 2024)100100100000000030.0%
Grok 4.20 (Reasoning)1001000000000020.0%
Llama 3.1 70B10000000000010.0%
Mistral Large00000000000.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Ministral 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100090.0%
Qwen 3.6 27B100100100100100100100100100090.0%
Hermes 3 405B100100100100100100100100100090.0%
ByteDance Seed 1.6 Flash100100100100100100100100100090.0%
Llama 3.1 8B100100100100100100100100100090.0%
Z.AI GLM 4.5 Air1001001001001001001001000080.0%
Mistral Small 4 (Reasoning)1001001001001001001001000080.0%
Grok 4.31001001001001001001001000080.0%
Cohere Command R+ (Aug. 2024)1001001001001001001001000080.0%
Mistral Large10010010010010010010000070.0%
Arcee AI: Trinity Mini10010010010010010010000070.0%
Grok 4.1 Fast100100100100100100000060.0%
Rocinante 12B100100100100100100000060.0%
Grok 4.20 (Reasoning)1001001001001000000050.0%
WizardLM 2 8x22b1001001001001000000050.0%
ByteDance Seed 2.0 Mini100100100000000030.0%
ByteDance Seed 1.610000000000010.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Grok 4.20 (Beta, Reasoning)100100100100100100100100100090.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100090.0%
Z.AI GLM 4.5 Air100100100100100100100100100090.0%
Mistral Small 4100100100100100100100100100090.0%
Claude 3 Haiku100100100100100100100100100090.0%
Arcee AI: Trinity Mini100100100100100100100100100090.0%
Mistral NeMO100100100100100100100100100090.0%
Grok 4.20 (Reasoning)1001001001001001001001000080.0%
Hermes 3 405B1001001001001001001001000080.0%
DeepSeek V3.21001001001001001001001000080.0%
Grok 4.31001001001001001001001000080.0%
Ministral 3 3B1001001001001001001001000080.0%
Mistral Large 210010010010010010010000070.0%
DeepSeek V3 (2025-03-24)100100100100100100000060.0%
Rocinante 12B1001001001001000000050.0%
DeepSeek V3.1100100100000000030.0%
Ministral 3B10000000000010.0%
Claude Sonnet 4.600000000000.0%
Grok 4.20 (Beta)00000000000.0%
DeepSeek V4 Pro00000000000.0%
DeepSeek V4 Flash00000000000.0%
Grok 4.2000000000000.0%
Mistral Large00000000000.0%
Mistral Small 3.2 24B00000000000.0%
Arcee AI: Trinity Large (Preview)00000000000.0%
Ministral 3 8B00000000000.0%
WizardLM 2 8x22b00000000000.0%
Cohere Command R+ (Aug. 2024)00000000000.0%
Ministral 8B00000000000.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100090.0%
Llama 3.1 70B100100100100100100100100100090.0%
Llama 3.1 Nemotron 70B100100100100100100100100100090.0%
GPT-5.4 Nano100100100100100100100100100090.0%
Llama 3.1 8B100100100100100100100100100090.0%
Grok 4.20 (Beta)1001001001001001001001000080.0%
DeepSeek V3.11001001001001001001001000080.0%
ByteDance Seed 1.6 Flash1001001001001001001001000080.0%
Grok 4.310010010010010010010000070.0%
Cohere Command R+ (Aug. 2024)1001001001001000000050.0%
Ministral 8B1001001001001000000050.0%
Grok 4.20100100100000000030.0%
Mistral Large100100100000000030.0%
WizardLM 2 8x22b100100100000000030.0%
Rocinante 12B100100100000000030.0%
Grok 4.20 (Beta, Reasoning)1001000000000020.0%
Grok 4.20 (Reasoning)1001000000000020.0%
Ministral 3B10000000000010.0%
DeepSeek V4 Flash00000000000.0%
Ministral 3 8B00000000000.0%
Ministral 3 3B00000000000.0%
LFM2 24B00000000000.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Grok 4.3100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Ministral 8B100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)1001001001001001001001000080.0%
Grok 4.20 (Beta, Reasoning)10010010010010010010000070.0%
Llama 3.1 Nemotron 70B10010010010010010010000070.0%
Cohere Command R+ (Aug. 2024)10010010010010010010000070.0%
Rocinante 12B10010010010010010010000070.0%
Grok 4.20 (Reasoning)10010010010000000040.0%
WizardLM 2 8x22b10010010010000000040.0%
Mistral Large100100100000000030.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Mistral Large100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Grok 4.3100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Hermes 3 70B100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Mistral NeMO100100100100100100100100100100100.0%
Ministral 8B100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100090.0%
WizardLM 2 8x22b100100100100100100100100100090.0%
Rocinante 12B1001001001001001001001000080.0%
Grok 4.20 (Beta, Reasoning)10010010010010010010000070.0%
Grok 4.20 (Reasoning)100100100100100100000060.0%
Cohere Command R+ (Aug. 2024)1001001001001000000050.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max100100100100100100100100100100100.0%
Claude Opus 4.6 (Reasoning)100100100100100100100100100100100.0%
Qwen3.6 Max Preview100100100100100100100100100100100.0%
Gemini 3.1 Pro (Preview)100100100100100100100100100100100.0%
Z.AI GLM 5.1100100100100100100100100100100100.0%
Z.AI GLM 5 Turbo100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning)100100100100100100100100100100100.0%
Claude Sonnet 4.6 (Reasoning)100100100100100100100100100100100.0%
Grok 4.3 (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7 (Reasoning)100100100100100100100100100100100.0%
GPT-5.5 (Reasoning)100100100100100100100100100100100.0%
GPT-5 Mini100100100100100100100100100100100.0%
GPT-5.5 (Reasoning, Low)100100100100100100100100100100100.0%
GPT-5.1100100100100100100100100100100100.0%
Claude Opus 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.6100100100100100100100100100100100.0%
GPT-5100100100100100100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100100100100100100.0%
Gemma 4 31B (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 122B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-04-20)100100100100100100100100100100100.0%
Gemma 4 26B (Reasoning)100100100100100100100100100100100.0%
GPT-5.4 (Reasoning, Low)100100100100100100100100100100100.0%
Z.AI GLM 5100100100100100100100100100100100.0%
Claude Sonnet 4.6100100100100100100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100100100100100100.0%
Qwen 3.5 27B100100100100100100100100100100100.0%
ByteDance Seed 1.6100100100100100100100100100100100.0%
Qwen 3.6 Flash100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview, Reasoning)100100100100100100100100100100100.0%
o4 Mini High100100100100100100100100100100100.0%
GPT-5.2100100100100100100100100100100100.0%
DeepSeek V4 Pro (Reasoning)100100100100100100100100100100100.0%
Claude Opus 4.7100100100100100100100100100100100.0%
Qwen 3.6 27B100100100100100100100100100100100.0%
Claude Opus 4.5100100100100100100100100100100100.0%
Grok 4.1 Fast100100100100100100100100100100100.0%
Aion 2.0100100100100100100100100100100100.0%
Z.AI GLM 4.6100100100100100100100100100100100.0%
MiniMax M2.7100100100100100100100100100100100.0%
GPT-5.5100100100100100100100100100100100.0%
Qwen 3.6 35B100100100100100100100100100100100.0%
DeepSeek V4 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100100100100100100.0%
Claude Sonnet 4100100100100100100100100100100100.0%
MiniMax M2.5100100100100100100100100100100100.0%
Z.AI GLM 4.7100100100100100100100100100100100.0%
GPT-4.1100100100100100100100100100100100.0%
Gemini 2.5 Pro100100100100100100100100100100100.0%
o4 Mini100100100100100100100100100100100.0%
Grok 4100100100100100100100100100100100.0%
Claude Sonnet 4.5100100100100100100100100100100100.0%
Qwen 3.5 35B100100100100100100100100100100100.0%
Claude Opus 4100100100100100100100100100100100.0%
Xiaomi MIMO v2.5 Pro100100100100100100100100100100100.0%
Stealth: Hunter Alpha100100100100100100100100100100100.0%
ByteDance Seed 2.0 Mini100100100100100100100100100100100.0%
Gemma 4 31B100100100100100100100100100100100.0%
Gemini 2.5 Flash (Reasoning)100100100100100100100100100100100.0%
Gemini 3.5 Flash (Reasoning, Minimal)100100100100100100100100100100100.0%
GPT-OSS 120B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Qwen 3.5 Flash100100100100100100100100100100100.0%
Z.AI GLM 4.5100100100100100100100100100100100.0%
Grok 4 Fast100100100100100100100100100100100.0%
Qwen 3.5 9B100100100100100100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100100100100100100.0%
Stealth: Healer Alpha100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite (Preview)100100100100100100100100100100100.0%
Gemma 4 26B100100100100100100100100100100100.0%
Gemini 3.1 Flash Lite100100100100100100100100100100100.0%
GPT-5.4 Mini (Reasoning, Low)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite (Reasoning)100100100100100100100100100100100.0%
Mistral Large 3100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100100100100100100.0%
Claude Haiku 4.5100100100100100100100100100100100.0%
Xiaomi MIMO v2.5100100100100100100100100100100100.0%
DeepSeek-V2 Chat100100100100100100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100100100100100100.0%
ByteDance Seed 2.0 Lite100100100100100100100100100100100.0%
Nemotron 3 Super100100100100100100100100100100100.0%
GPT-5.4100100100100100100100100100100100.0%
Claude 3.5 Sonnet100100100100100100100100100100100.0%
Grok 4.20 (Beta)100100100100100100100100100100100.0%
Inception Mercury 2100100100100100100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100100100100100100.0%
Stealth: Aurora Alpha100100100100100100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100100100100100100.0%
Claude 3.7 Sonnet100100100100100100100100100100100.0%
GPT-4.1 Mini100100100100100100100100100100100.0%
Hermes 3 405B100100100100100100100100100100100.0%
DeepSeek V4 Pro100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100100100100100100.0%
GPT-5 Nano100100100100100100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100100100100100100.0%
GPT-5.4 Mini100100100100100100100100100100100.0%
Mistral Large 2100100100100100100100100100100100.0%
Mistral Small 4 (Reasoning)100100100100100100100100100100100.0%
DeepSeek V3.1100100100100100100100100100100100.0%
DeepSeek V3.2100100100100100100100100100100100.0%
Qwen 3 32B100100100100100100100100100100100.0%
DeepSeek V4 Flash100100100100100100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100100100100100100.0%
Grok 4.20100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning)100100100100100100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100100100100100100.0%
Gemini 2.5 Flash100100100100100100100100100100100.0%
Qwen3 235B A22B Instruct 2507100100100100100100100100100100100.0%
Writer: Palmyra X5100100100100100100100100100100100.0%
Inception Mercury100100100100100100100100100100100.0%
GPT-5.4 Nano (Reasoning, Low)100100100100100100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100100100100100100.0%
Mistral Small 3.2 24B100100100100100100100100100100100.0%
Gemma 3 12B100100100100100100100100100100100.0%
Llama 3.1 70B100100100100100100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100100100100100100.0%
Gemma 3 27B100100100100100100100100100100100.0%
Mistral Medium 3.1100100100100100100100100100100100.0%
Nemotron 3 Nano100100100100100100100100100100100.0%
Mistral Small 4100100100100100100100100100100100.0%
Qwen 2.5 72B100100100100100100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100100100100100100.0%
GPT-5.4 Nano100100100100100100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100100100100100100.0%
Mistral Small Creative100100100100100100100100100100100.0%
Ministral 3 14B100100100100100100100100100100100.0%
GPT-4.1 Nano100100100100100100100100100100100.0%
Ministral 3 8B100100100100100100100100100100100.0%
Claude 3 Haiku100100100100100100100100100100100.0%
WizardLM 2 8x22b100100100100100100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100100100100100100.0%
Gemma 3 4B100100100100100100100100100100100.0%
Ministral 3 3B100100100100100100100100100100100.0%
Llama 3.1 8B100100100100100100100100100100100.0%
Ministral 3B100100100100100100100100100100100.0%
LFM2 24B100100100100100100100100100100100.0%
Z.AI GLM 4.5 Air100100100100100100100100100090.0%
Mistral Large100100100100100100100100100090.0%
Grok 4.3100100100100100100100100100090.0%
Hermes 3 70B100100100100100100100100100090.0%
Ministral 8B100100100100100100100100100090.0%
Rocinante 12B100100100100100100100100100090.0%
Mistral NeMO1001001001001001001001000080.0%
Cohere Command R+ (Aug. 2024)10010010010010010010000070.0%
Grok 4.20 (Beta, Reasoning)100100100000000030.0%
Grok 4.20 (Reasoning)1001000000000020.0%