Matches Regex

Test: Voice/dialogue sheets

Avg. Score: 63.5%
Scenarios: 5

Overall Performance

Rank ▲ Model Score Avg. Cost Avg. Time Stability
1 Gemma 4 31B 100.0% $0.0001 8.3s 100%
2 Claude Sonnet 4.6 100.0% $0.0033 1.9s 100%
3 Claude Sonnet 4 100.0% $0.0033 2.8s 100%
4 Claude 3.7 Sonnet 100.0% $0.0034 3.5s 100%
5 Claude Opus 4.6 100.0% $0.0055 3.9s 100%
6 Claude Opus 4.6 (Reasoning) 100.0% $0.0060 3.1s 100%
7 GPT-4.1 96.0% $0.0017 2.6s 61%
8 GPT-4o, May 13th (temp=0) 96.0% $0.0037 5.2s 61%
9 GPT-4o, Aug. 6th (temp=0) 92.0% $0.0022 2.4s 46%
10 Gemini 2.5 Flash 88.0% $0.0005 933ms 35%
11 ByteDance Seed 1.6 94.0% $0.0013 14.2s 53%
12 Gemma 4 26B 88.0% $0.0001 4.5s 35%
13 Qwen 3.5 Plus (2026-02-15) 86.0% $0.0005 6.6s 31%
14 Grok 4 Fast 84.0% $0.0003 3.6s 27%
15 Grok 4.1 Fast 84.0% $0.0004 4.4s 27%
16 GPT-4o, May 13th (temp=1) 90.0% $0.0037 5.2s 40%
17 DeepSeek V3 (2025-03-24) 84.0% $0.0003 6.1s 27%
18 Grok 4.20 (Reasoning) 92.0% $0.0035 12.9s 46%
19 Claude Haiku 4.5 82.0% $0.0011 1.6s 23%
20 Gemini 3 Flash (Preview) 80.0% $0.0006 1.8s 20%
21 Claude Sonnet 4.6 (Reasoning) 90.0% $0.0053 3.3s 40%
22 Mistral Large 3 80.0% $0.0004 4.1s 20%
23 Gemini 3.1 Flash Lite (Reasoning) 78.0% $0.0003 1.4s 17%
24 GPT-4o, Aug. 6th (temp=1) 82.0% $0.0022 2.3s 23%
25 DeepSeek-V2 Chat 80.0% $0.0001 8.2s 20%
26 Gemini 3.1 Flash Lite 76.0% $0.0003 1.1s 15%
27 Claude Sonnet 4.5 84.0% $0.0033 2.8s 27%
28 DeepSeek V4 Pro 82.0% $0.0004 12.1s 23%
29 GPT-4.1 Mini 76.0% $0.0003 2.7s 15%
30 Stealth: Healer Alpha 76.0% $0.0000 5.0s 15%
31 Z.AI GLM 4.5 76.0% $0.0004 5.3s 15%
32 Mistral Small 3.2 24B 72.0% $0.0001 2.1s 10%
33 DeepSeek V3 (2024-12-26) 74.0% $0.0003 4.6s 12%
34 Llama 3.1 70B 72.0% $0.0005 1.5s 10%
35 Writer: Palmyra X5 78.0% $0.0010 8.1s 17%
36 DeepSeek V3.1 76.0% $0.0002 8.4s 15%
37 Grok 4.20 72.0% $0.0007 1.5s 10%
38 Gemini 3.1 Flash Lite (Preview) 70.0% $0.0003 979ms 8%
39 Qwen3 235B A22B Instruct 2507 72.0% $0.0001 4.9s 10%
40 Gemini 3.5 Flash (Reasoning, Minimal) 74.0% $0.0018 1.2s 12%
41 Claude 3.5 Sonnet 80.0% $0.0034 4.5s 20%
42 Gemini 2.5 Flash Lite (Reasoning) 70.0% $0.0003 2.7s 8%
43 Gemma 4 31B (Reasoning) 92.0% $0.0003 43.9s 46%
44 Gemini 2.5 Flash Lite 66.0% $0.0001 610ms 5%
45 Z.AI GLM 5 Turbo 78.0% $0.0030 7.7s 17%
46 Hermes 3 70B 68.0% $0.0002 6.0s 7%
47 DeepSeek V4 Flash (Reasoning) 78.0% $0.0002 20.6s 17%
48 ByteDance Seed 1.6 Flash 66.0% $0.0002 4.5s 5%
49 GPT-5.4 (Reasoning) 80.0% $0.0049 5.4s 20%
50 Gemini 3 Flash (Preview, Reasoning) 72.0% $0.0024 4.3s 10%
51 Xiaomi MIMO v2.5 70.0% $0.0013 6.2s 8%
52 Xiaomi MIMO v2.5 Pro 72.0% $0.0015 7.9s 10%
53 Mistral Large 2 70.0% $0.0018 4.7s 8%
54 Cohere Command R+ (Aug. 2024) 70.0% $0.0023 3.2s 8%
55 Mistral Small 4 60.0% $0.0001 1.3s 2%
56 DeepSeek V4 Flash 62.0% $0.0001 3.7s 3%
57 Gemini 3.5 Flash (Reasoning) 86.0% $0.0087 4.2s 31%
58 Qwen3.7 Max 98.0% $0.013 23.9s 72%
59 GPT-4o Mini (temp=0) 60.0% $0.0001 3.7s 2%
60 Grok 4.20 (Beta) 60.0% $0.0009 735ms 2%
61 Grok 4.3 60.0% $0.0008 1.7s 2%
62 Claude Opus 4.7 (Reasoning) 82.0% $0.0080 1.8s 23%
63 Qwen 3.5 122B 88.0% $0.0073 17.5s 35%
64 Llama 3.1 8B 54.0% $0.0001 919ms 0%
65 GPT-5 Mini 72.0% $0.0018 13.0s 10%
66 GPT-5.5 (Reasoning, Low) 78.0% $0.0065 3.3s 17%
67 Qwen 3.6 35B 72.0% $0.0022 12.2s 10%
68 GPT-5.4 Mini (Reasoning) 64.0% $0.0018 4.1s 4%
69 Stealth: Hunter Alpha 70.0% $0.0000 18.7s 8%
70 ByteDance Seed 2.0 Lite 76.0% $0.0018 20.6s 15%
71 Claude Opus 4.7 80.0% $0.0080 2.2s 20%
72 Qwen 3.5 35B 82.0% $0.0051 17.7s 23%
73 Grok 4.3 (Reasoning) 80.0% $0.0038 20.2s 20%
74 Gemma 3 12B 52.0% $0.0000 4.1s 0%
75 Z.AI GLM 4.7 Flash 70.0% $0.0005 21.5s 8%
76 MiniMax M2.7 56.0% $0.0006 6.8s 1%
77 Qwen 3.6 Flash 68.0% $0.0031 10.1s 7%
78 Arcee AI: Trinity Large (Preview) 48.0% $0.0000 4.8s 0%
79 Claude Opus 4.5 70.0% $0.0055 2.9s 8%
80 Hermes 3 405B 58.0% $0.0000 13.3s 1%
81 Mistral Small 4 (Reasoning) 48.0% $0.0004 4.2s 0%
82 DeepSeek V3.2 50.0% $0.0002 6.7s 0%
83 GPT-4o Mini (temp=1) 60.0% $0.0001 15.0s 2%
84 GPT-5.4 Mini (Reasoning, Low) 48.0% $0.0010 2.2s 0%
85 Qwen 3.5 Flash 70.0% $0.0012 22.0s 8%
86 Ministral 3 8B 40.0% $0.0001 1.2s 0%
87 Rocinante 12B 52.0% $0.0002 9.1s 0%
88 GPT-4.1 Nano 40.0% $0.0001 2.0s 0%
89 GPT-5.4 Nano (Reasoning, Low) 40.0% $0.0002 1.4s 0%
90 GPT-5.2 54.0% $0.0025 2.1s 0%
91 Mistral Small Creative 38.0% $0.0001 1.2s 0%
92 Grok 4.20 (Beta, Reasoning) 78.0% $0.0075 9.9s 17%
93 GPT-5.4 (Reasoning, Low) 58.0% $0.0034 3.2s 1%
94 GPT-5.5 (Reasoning) 76.0% $0.0083 4.6s 15%
95 GPT-5.1 64.0% $0.0041 6.9s 4%
96 MiniMax M2.5 48.0% $0.0006 7.7s 0%
97 GPT-5.4 Mini 40.0% $0.0009 1.1s 0%
98 Claude 3 Haiku 40.0% $0.0003 3.7s 0%
99 Inception Mercury 34.0% $0.0001 961ms 0%
100 Aion 2.0 66.0% $0.0015 20.9s 5%
101 Z.AI GLM 5.1 82.0% $0.0045 33.2s 23%
102 WizardLM 2 8x22b 42.0% $0.0004 7.6s 0%
103 o4 Mini 60.0% $0.0036 8.6s 2%
104 Nemotron 3 Super 42.0% $0.0000 9.8s 0%
105 Gemma 3 4B 30.0% $0.0000 1.9s 0%
106 GPT-5.4 Nano 30.0% $0.0002 1.4s 0%
107 Gemini 2.5 Flash (Reasoning) 38.0% $0.0014 2.7s 0%
108 GPT-5.5 60.0% $0.0058 2.3s 2%
109 Gemma 3 27B 30.0% $0.0001 5.0s 0%
110 Inception Mercury 2 26.0% $0.0004 823ms 0%
111 GPT-5.4 Nano (Reasoning) 24.0% $0.0003 1.5s 0%
112 Qwen 3.6 27B 76.0% $0.0056 25.4s 15%
113 Gemma 4 26B (Reasoning) 60.0% $0.0003 27.8s 2%
114 Llama 3.1 Nemotron 70B 30.0% $0.0002 6.9s 0%
115 Ministral 3 14B 20.0% $0.0001 1.5s 0%
116 Mistral NeMO 22.0% $0.0001 3.2s 0%
117 Z.AI GLM 4.5 Air 36.0% $0.0005 11.1s 0%
118 Mistral Medium 3.1 22.0% $0.0004 3.2s 0%
119 Qwen 3.5 27B 66.0% $0.0044 20.4s 5%
120 GPT-5.4 34.0% $0.0030 2.0s 0%
121 Qwen 2.5 72B 20.0% $0.0002 4.9s 0%
122 Qwen 3 32B 50.0% $0.0006 24.6s 0%
123 GPT-OSS 120B 38.0% $0.0003 18.0s 0%
124 GPT-5 Nano 44.0% $0.0007 21.1s 0%
125 Ministral 3B 10.0% $0.0000 1.1s 0%
126 ByteDance Seed 2.0 Mini 64.0% $0.0007 37.7s 4%
127 Claude Opus 4 86.0% $0.017 7.0s 31%
128 Ministral 8B 8.0% $0.0001 1.5s 0%
129 Qwen3.6 Max Preview 96.0% $0.014 46.9s 61%
130 Nemotron 3 Nano 22.0% $0.0002 13.0s 0%
131 Qwen 3.5 Plus (2026-04-20) 64.0% $0.0042 29.1s 4%
132 Ministral 3 3B 0.0% $0.0001 835ms 0%
133 Z.AI GLM 5 56.0% $0.0035 26.7s 1%
134 LFM2 24B 0.0% $0.0000 2.9s 0%
135 Arcee AI: Trinity Mini 8.0% $0.0001 8.1s 0%
136 Qwen 3.5 9B 76.0% $0.0007 1.1m 15%
137 Gemini 3.1 Pro (Preview) 74.0% $0.013 12.9s 12%
138 MoonshotAI: Kimi K2.5 50.0% $0.0040 30.8s 0%
139 Z.AI GLM 4.6 54.0% $0.0025 39.8s 0%
140 Grok 4 68.0% $0.012 17.7s 7%
141 Stealth: Aurora Alpha 34.0% 1.7s 0%
142 Mistral Large 26.0% $0.0070 6.2s 0%
143 GPT-5 58.0% $0.011 19.2s 1%
144 Gemini 3 Pro (Preview) 68.0% $0.016 10.7s 7%
145 Gemini 2.5 Pro 40.0% $0.011 8.4s 0%
146 DeepSeek V4 Pro (Reasoning) 70.0% $0.0039 1.1m 8%
147 Qwen 3.5 397B A17B 82.0% $0.0086 1.1m 23%
148 Z.AI GLM 4.7 60.0% $0.0027 1.1m 2%
149 o4 Mini High 58.0% $0.0052 58.3s 1%
150 MoonshotAI: Kimi K2.6 50.0% $0.0066 1.1m 0%
Average score: 63.48%

Individual Scenarios

Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.6 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 3.5 Flash (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4.6 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.7 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.6 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4.6 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.7 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 3 Pro (Preview) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4 100 100 100 100 100 100 100 100 100 100 100.0%
Gemma 4 31B 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen 3.5 Plus (2026-02-15) 100 100 100 100 100 100 100 100 100 100 100.0%
Gemma 4 26B 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 3 Flash (Preview) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Haiku 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
ByteDance Seed 2.0 Lite 100 100 100 100 100 100 100 100 100 100 100.0%
Claude 3.7 Sonnet 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Large 2 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 2.5 Flash 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen3.6 Max Preview 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen 3.5 122B 100 100 100 100 100 100 100 100 100 0 90.0%
Gemma 4 26B (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-4.1 100 100 100 100 100 100 100 100 100 0 90.0%
Xiaomi MIMO v2.5 Pro 100 100 100 100 100 100 100 100 100 0 90.0%
Gemini 2.5 Flash Lite (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-4o, May 13th (temp=1) 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-4.1 Mini 100 100 100 100 100 100 100 100 100 0 90.0%
Gemma 4 31B (Reasoning) 100 100 100 100 100 100 100 100 0 0 80.0%
ByteDance Seed 1.6 100 100 100 100 100 100 100 100 0 0 80.0%
Gemini 3 Flash (Preview, Reasoning) 100 100 100 100 100 100 100 100 0 0 80.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 100 100 100 100 0 0 80.0%
DeepSeek V3.2 100 100 100 100 100 100 100 100 0 0 80.0%
Z.AI GLM 5.1 100 100 100 100 100 100 100 0 0 0 70.0%
Aion 2.0 100 100 100 100 100 100 100 0 0 0 70.0%
DeepSeek V3.1 100 100 100 100 100 100 100 0 0 0 70.0%
Z.AI GLM 5 Turbo 100 100 100 100 100 100 0 0 0 0 60.0%
Qwen 3.5 397B A17B 100 100 100 100 100 100 0 0 0 0 60.0%
Grok 4.20 (Reasoning) 100 100 100 100 100 100 0 0 0 0 60.0%
Z.AI GLM 5 100 100 100 100 100 100 0 0 0 0 60.0%
Grok 4.1 Fast 100 100 100 100 100 100 0 0 0 0 60.0%
Grok 4 100 100 100 100 100 100 0 0 0 0 60.0%
Stealth: Hunter Alpha 100 100 100 100 100 100 0 0 0 0 60.0%
Gemini 2.5 Flash (Reasoning) 100 100 100 100 100 100 0 0 0 0 60.0%
Grok 4 Fast 100 100 100 100 100 100 0 0 0 0 60.0%
Stealth: Healer Alpha 100 100 100 100 100 100 0 0 0 0 60.0%
GPT-4o, Aug. 6th (temp=0) 100 100 100 100 100 100 0 0 0 0 60.0%
Grok 4.3 (Reasoning) 100 100 100 100 100 0 0 0 0 0 50.0%
GPT-5.4 (Reasoning) 100 100 100 100 100 0 0 0 0 0 50.0%
Qwen 3.5 27B 100 100 100 100 100 0 0 0 0 0 50.0%
Claude Opus 4.5 100 100 100 100 100 0 0 0 0 0 50.0%
Z.AI GLM 4.6 100 100 100 100 100 0 0 0 0 0 50.0%
Xiaomi MIMO v2.5 100 100 100 100 100 0 0 0 0 0 50.0%
DeepSeek V3 (2024-12-26) 100 100 100 100 100 0 0 0 0 0 50.0%
DeepSeek V4 Pro 100 100 100 100 100 0 0 0 0 0 50.0%
DeepSeek V3 (2025-03-24) 100 100 100 100 100 0 0 0 0 0 50.0%
Gemma 3 12B 100 100 100 100 100 0 0 0 0 0 50.0%
DeepSeek V4 Pro (Reasoning) 100 100 100 100 0 0 0 0 0 0 40.0%
Qwen 3.6 35B 100 100 100 100 0 0 0 0 0 0 40.0%
Z.AI GLM 4.7 100 100 100 100 0 0 0 0 0 0 40.0%
Gemini 2.5 Pro 100 100 100 100 0 0 0 0 0 0 40.0%
Qwen 3.5 9B 100 100 100 100 0 0 0 0 0 0 40.0%
Qwen 3.6 27B 100 100 100 0 0 0 0 0 0 0 30.0%
DeepSeek V4 Flash (Reasoning) 100 100 100 0 0 0 0 0 0 0 30.0%
MiniMax M2.5 100 100 100 0 0 0 0 0 0 0 30.0%
Qwen 3.5 35B 100 100 100 0 0 0 0 0 0 0 30.0%
Gemini 3.1 Flash Lite (Preview) 100 100 100 0 0 0 0 0 0 0 30.0%
Gemini 3.1 Flash Lite 100 100 100 0 0 0 0 0 0 0 30.0%
Z.AI GLM 4.7 Flash 100 100 100 0 0 0 0 0 0 0 30.0%
GPT-4o, Aug. 6th (temp=1) 100 100 100 0 0 0 0 0 0 0 30.0%
Grok 4.3 100 100 100 0 0 0 0 0 0 0 30.0%
GPT-5.1 100 100 0 0 0 0 0 0 0 0 20.0%
MoonshotAI: Kimi K2.6 100 100 0 0 0 0 0 0 0 0 20.0%
Qwen 3.5 Plus (2026-04-20) 100 100 0 0 0 0 0 0 0 0 20.0%
Grok 4.20 (Beta, Reasoning) 100 100 0 0 0 0 0 0 0 0 20.0%
GPT-5.4 (Reasoning, Low) 100 100 0 0 0 0 0 0 0 0 20.0%
ByteDance Seed 2.0 Mini 100 100 0 0 0 0 0 0 0 0 20.0%
Qwen 3.5 Flash 100 100 0 0 0 0 0 0 0 0 20.0%
Qwen3 235B A22B Instruct 2507 100 100 0 0 0 0 0 0 0 0 20.0%
Arcee AI: Trinity Large (Preview) 100 100 0 0 0 0 0 0 0 0 20.0%
ByteDance Seed 1.6 Flash 100 100 0 0 0 0 0 0 0 0 20.0%
Cohere Command R+ (Aug. 2024) 100 100 0 0 0 0 0 0 0 0 20.0%
GPT-5.5 (Reasoning) 100 0 0 0 0 0 0 0 0 0 10.0%
MoonshotAI: Kimi K2.5 100 0 0 0 0 0 0 0 0 0 10.0%
Qwen 3.6 Flash 100 0 0 0 0 0 0 0 0 0 10.0%
GPT-5.4 Mini (Reasoning) 100 0 0 0 0 0 0 0 0 0 10.0%
GPT-5.2 100 0 0 0 0 0 0 0 0 0 10.0%
MiniMax M2.7 100 0 0 0 0 0 0 0 0 0 10.0%
Gemini 3.1 Flash Lite (Reasoning) 100 0 0 0 0 0 0 0 0 0 10.0%
Z.AI GLM 4.5 100 0 0 0 0 0 0 0 0 0 10.0%
Stealth: Aurora Alpha 100 0 0 0 0 0 0 0 0 0 10.0%
Gemini 2.5 Flash Lite 100 0 0 0 0 0 0 0 0 0 10.0%
Inception Mercury 100 0 0 0 0 0 0 0 0 0 10.0%
GPT-5 Mini 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.5 (Reasoning, Low) 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5 0 0 0 0 0 0 0 0 0 0 0.0%
o4 Mini High 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.5 0 0 0 0 0 0 0 0 0 0 0.0%
o4 Mini 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-OSS 120B 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 Mini (Reasoning, Low) 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral Large 3 0 0 0 0 0 0 0 0 0 0 0.0%
DeepSeek-V2 Chat 0 0 0 0 0 0 0 0 0 0 0.0%
Nemotron 3 Super 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 0 0 0 0 0 0 0 0 0 0 0.0%
Claude 3.5 Sonnet 0 0 0 0 0 0 0 0 0 0 0.0%
Grok 4.20 (Beta) 0 0 0 0 0 0 0 0 0 0 0.0%
Inception Mercury 2 0 0 0 0 0 0 0 0 0 0 0.0%
Z.AI GLM 4.5 Air 0 0 0 0 0 0 0 0 0 0 0.0%
Hermes 3 405B 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 Mini 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral Small 4 (Reasoning) 0 0 0 0 0 0 0 0 0 0 0.0%
Qwen 3 32B 0 0 0 0 0 0 0 0 0 0 0.0%
DeepSeek V4 Flash 0 0 0 0 0 0 0 0 0 0 0.0%
Grok 4.20 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning) 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral Large 0 0 0 0 0 0 0 0 0 0 0.0%
Writer: Palmyra X5 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning, Low) 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-4o Mini (temp=1) 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral Small 3.2 24B 0 0 0 0 0 0 0 0 0 0 0.0%
Llama 3.1 70B 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-4o Mini (temp=0) 0 0 0 0 0 0 0 0 0 0 0.0%
Gemma 3 27B 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral Medium 3.1 0 0 0 0 0 0 0 0 0 0 0.0%
Nemotron 3 Nano 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral Small 4 0 0 0 0 0 0 0 0 0 0 0.0%
Qwen 2.5 72B 0 0 0 0 0 0 0 0 0 0 0.0%
Llama 3.1 Nemotron 70B 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 Nano 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral Small Creative 0 0 0 0 0 0 0 0 0 0 0.0%
Hermes 3 70B 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 3 14B 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-4.1 Nano 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 3 8B 0 0 0 0 0 0 0 0 0 0 0.0%
Claude 3 Haiku 0 0 0 0 0 0 0 0 0 0 0.0%
WizardLM 2 8x22b 0 0 0 0 0 0 0 0 0 0 0.0%
Arcee AI: Trinity Mini 0 0 0 0 0 0 0 0 0 0 0.0%
Gemma 3 4B 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 3 3B 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral NeMO 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 8B 0 0 0 0 0 0 0 0 0 0 0.0%
Llama 3.1 8B 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 3B 0 0 0 0 0 0 0 0 0 0 0.0%
LFM2 24B 0 0 0 0 0 0 0 0 0 0 0.0%
Rocinante 12B 0 0 0 0 0 0 0 0 0 0 0.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Claude Opus 4.6 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4.6 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.5 (Reasoning, Low) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.6 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4.6 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4 100 100 100 100 100 100 100 100 100 100 100.0%
Gemma 4 31B 100 100 100 100 100 100 100 100 100 100 100.0%
Z.AI GLM 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen 3.5 Plus (2026-02-15) 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 3.1 Flash Lite (Preview) 100 100 100 100 100 100 100 100 100 100 100.0%
Gemma 4 26B 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Large 3 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 100 100 100 100 100 100 100.0%
DeepSeek-V2 Chat 100 100 100 100 100 100 100 100 100 100 100.0%
Claude 3.5 Sonnet 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 (Beta) 100 100 100 100 100 100 100 100 100 100 100.0%
DeepSeek V3 (2024-12-26) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude 3.7 Sonnet 100 100 100 100 100 100 100 100 100 100 100.0%
Hermes 3 405B 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o, Aug. 6th (temp=1) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o, Aug. 6th (temp=0) 100 100 100 100 100 100 100 100 100 100 100.0%
DeepSeek V4 Flash 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 100 100 100 100 100 100 100 100 100 100 100.0%
Writer: Palmyra X5 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o Mini (temp=1) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o Mini (temp=0) 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen3.7 Max 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen3.6 Max Preview 100 100 100 100 100 100 100 100 100 0 90.0%
Z.AI GLM 5.1 100 100 100 100 100 100 100 100 100 0 90.0%
Gemma 4 31B (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
ByteDance Seed 1.6 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-4.1 100 100 100 100 100 100 100 100 100 0 90.0%
Stealth: Hunter Alpha 100 100 100 100 100 100 100 100 100 0 90.0%
Gemini 3.1 Flash Lite (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
Gemini 3.1 Flash Lite 100 100 100 100 100 100 100 100 100 0 90.0%
ByteDance Seed 2.0 Lite 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-4o, May 13th (temp=1) 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-4.1 Mini 100 100 100 100 100 100 100 100 100 0 90.0%
DeepSeek V3.1 100 100 100 100 100 100 100 100 100 0 90.0%
DeepSeek V3 (2025-03-24) 100 100 100 100 100 100 100 100 100 0 90.0%
Gemini 2.5 Flash Lite 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen3 235B A22B Instruct 2507 100 100 100 100 100 100 100 100 100 0 90.0%
Mistral Small 4 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen 3.6 Flash 100 100 100 100 100 100 100 100 0 0 80.0%
Qwen 3.6 27B 100 100 100 100 100 100 100 100 0 0 80.0%
Grok 4.1 Fast 100 100 100 100 100 100 100 100 0 0 80.0%
Claude Sonnet 4.5 100 100 100 100 100 100 100 100 0 0 80.0%
Qwen 3.5 35B 100 100 100 100 100 100 100 100 0 0 80.0%
Grok 4 Fast 100 100 100 100 100 100 100 100 0 0 80.0%
Qwen 3.5 9B 100 100 100 100 100 100 100 100 0 0 80.0%
DeepSeek V4 Pro 100 100 100 100 100 100 100 100 0 0 80.0%
Cohere Command R+ (Aug. 2024) 100 100 100 100 100 100 100 100 0 0 80.0%
GPT-5.5 (Reasoning) 100 100 100 100 100 100 100 0 0 0 70.0%
GPT-5 Mini 100 100 100 100 100 100 100 0 0 0 70.0%
Grok 4.20 (Beta, Reasoning) 100 100 100 100 100 100 100 0 0 0 70.0%
DeepSeek V4 Flash (Reasoning) 100 100 100 100 100 100 100 0 0 0 70.0%
Qwen 3.5 Flash 100 100 100 100 100 100 100 0 0 0 70.0%
Z.AI GLM 4.7 Flash 100 100 100 100 100 100 100 0 0 0 70.0%
Mistral Large 2 100 100 100 100 100 100 100 0 0 0 70.0%
Grok 4.3 100 100 100 100 100 100 100 0 0 0 70.0%
Llama 3.1 70B 100 100 100 100 100 100 100 0 0 0 70.0%
ByteDance Seed 1.6 Flash 100 100 100 100 100 100 100 0 0 0 70.0%
Qwen 3.5 122B 100 100 100 100 100 100 0 0 0 0 60.0%
Qwen 3.5 Plus (2026-04-20) 100 100 100 100 100 100 0 0 0 0 60.0%
Aion 2.0 100 100 100 100 100 100 0 0 0 0 60.0%
MiniMax M2.7 100 100 100 100 100 100 0 0 0 0 60.0%
Qwen 3.6 35B 100 100 100 100 100 100 0 0 0 0 60.0%
Xiaomi MIMO v2.5 Pro 100 100 100 100 100 100 0 0 0 0 60.0%
ByteDance Seed 2.0 Mini 100 100 100 100 100 100 0 0 0 0 60.0%
Xiaomi MIMO v2.5 100 100 100 100 100 100 0 0 0 0 60.0%
Gemini 2.5 Flash 100 100 100 100 100 100 0 0 0 0 60.0%
Mistral Small 3.2 24B 100 100 100 100 100 100 0 0 0 0 60.0%
Hermes 3 70B 100 100 100 100 100 100 0 0 0 0 60.0%
Rocinante 12B 100 100 100 100 100 100 0 0 0 0 60.0%
Gemini 3.1 Pro (Preview) 100 100 100 100 100 0 0 0 0 0 50.0%
Gemini 3.5 Flash (Reasoning) 100 100 100 100 100 0 0 0 0 0 50.0%
Grok 4.3 (Reasoning) 100 100 100 100 100 0 0 0 0 0 50.0%
GPT-5.4 (Reasoning) 100 100 100 100 100 0 0 0 0 0 50.0%
Qwen 3.5 397B A17B 100 100 100 100 100 0 0 0 0 0 50.0%
Qwen 3.5 27B 100 100 100 100 100 0 0 0 0 0 50.0%
MiniMax M2.5 100 100 100 100 100 0 0 0 0 0 50.0%
Stealth: Healer Alpha 100 100 100 100 100 0 0 0 0 0 50.0%
DeepSeek V3.2 100 100 100 100 100 0 0 0 0 0 50.0%
Qwen 3 32B 100 100 100 100 100 0 0 0 0 0 50.0%
GPT-5.2 100 100 100 100 0 0 0 0 0 0 40.0%
GPT-5.5 100 100 100 100 0 0 0 0 0 0 40.0%
Gemini 2.5 Flash Lite (Reasoning) 100 100 100 100 0 0 0 0 0 0 40.0%
Gemma 3 4B 100 100 100 100 0 0 0 0 0 0 40.0%
Llama 3.1 8B 100 100 100 100 0 0 0 0 0 0 40.0%
Z.AI GLM 5 Turbo 100 100 100 0 0 0 0 0 0 0 30.0%
Gemini 3 Flash (Preview, Reasoning) 100 100 100 0 0 0 0 0 0 0 30.0%
Z.AI GLM 4.6 100 100 100 0 0 0 0 0 0 0 30.0%
Gemini 3 Pro (Preview) 100 100 100 0 0 0 0 0 0 0 30.0%
Gemini 3 Flash (Preview) 100 100 100 0 0 0 0 0 0 0 30.0%
Mistral Small 4 (Reasoning) 100 100 100 0 0 0 0 0 0 0 30.0%
Gemma 4 26B (Reasoning) 100 100 0 0 0 0 0 0 0 0 20.0%
Z.AI GLM 5 100 100 0 0 0 0 0 0 0 0 20.0%
GPT-5.4 Mini (Reasoning) 100 100 0 0 0 0 0 0 0 0 20.0%
o4 Mini High 100 100 0 0 0 0 0 0 0 0 20.0%
DeepSeek V4 Pro (Reasoning) 100 100 0 0 0 0 0 0 0 0 20.0%
Z.AI GLM 4.7 100 100 0 0 0 0 0 0 0 0 20.0%
Gemini 2.5 Pro 100 100 0 0 0 0 0 0 0 0 20.0%
o4 Mini 100 100 0 0 0 0 0 0 0 0 20.0%
Gemini 3.5 Flash (Reasoning, Minimal) 100 100 0 0 0 0 0 0 0 0 20.0%
GPT-5.4 Mini (Reasoning, Low) 100 100 0 0 0 0 0 0 0 0 20.0%
Gemma 3 27B 100 100 0 0 0 0 0 0 0 0 20.0%
Arcee AI: Trinity Large (Preview) 100 100 0 0 0 0 0 0 0 0 20.0%
Ministral 3B 100 100 0 0 0 0 0 0 0 0 20.0%
Claude Opus 4.7 (Reasoning) 100 0 0 0 0 0 0 0 0 0 10.0%
Grok 4 100 0 0 0 0 0 0 0 0 0 10.0%
Gemini 2.5 Flash (Reasoning) 100 0 0 0 0 0 0 0 0 0 10.0%
Claude Haiku 4.5 100 0 0 0 0 0 0 0 0 0 10.0%
Nemotron 3 Super 100 0 0 0 0 0 0 0 0 0 10.0%
Inception Mercury 100 0 0 0 0 0 0 0 0 0 10.0%
GPT-5.4 Nano 100 0 0 0 0 0 0 0 0 0 10.0%
WizardLM 2 8x22b 100 0 0 0 0 0 0 0 0 0 10.0%
Arcee AI: Trinity Mini 100 0 0 0 0 0 0 0 0 0 10.0%
Ministral 8B 100 0 0 0 0 0 0 0 0 0 10.0%
GPT-5.1 0 0 0 0 0 0 0 0 0 0 0.0%
MoonshotAI: Kimi K2.6 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 (Reasoning, Low) 0 0 0 0 0 0 0 0 0 0 0.0%
MoonshotAI: Kimi K2.5 0 0 0 0 0 0 0 0 0 0 0.0%
Claude Opus 4.7 0 0 0 0 0 0 0 0 0 0 0.0%
Claude Opus 4.5 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-OSS 120B 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 0 0 0 0 0 0 0 0 0 0 0.0%
Inception Mercury 2 0 0 0 0 0 0 0 0 0 0 0.0%
Stealth: Aurora Alpha 0 0 0 0 0 0 0 0 0 0 0.0%
Z.AI GLM 4.5 Air 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 Mini 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning) 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral Large 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning, Low) 0 0 0 0 0 0 0 0 0 0 0.0%
Gemma 3 12B 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral Medium 3.1 0 0 0 0 0 0 0 0 0 0 0.0%
Nemotron 3 Nano 0 0 0 0 0 0 0 0 0 0 0.0%
Qwen 2.5 72B 0 0 0 0 0 0 0 0 0 0 0.0%
Llama 3.1 Nemotron 70B 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral Small Creative 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 3 14B 0 0 0 0 0 0 0 0 0 0 0.0%
GPT-4.1 Nano 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 3 8B 0 0 0 0 0 0 0 0 0 0 0.0%
Claude 3 Haiku 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 3 3B 0 0 0 0 0 0 0 0 0 0 0.0%
Mistral NeMO 0 0 0 0 0 0 0 0 0 0 0.0%
LFM2 24B 0 0 0 0 0 0 0 0 0 0 0.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.6 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen3.6 Max Preview 100 100 100 100 100 100 100 100 100 100 100.0%
Z.AI GLM 5 Turbo 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.3 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.4 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.7 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.5 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5 Mini 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.5 (Reasoning, Low) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.1 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.6 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 (Beta, Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4.6 100 100 100 100 100 100 100 100 100 100 100.0%
ByteDance Seed 1.6 100 100 100 100 100 100 100 100 100 100 100.0%
o4 Mini High 100 100 100 100 100 100 100 100 100 100 100.0%
DeepSeek V4 Pro (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.7 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen 3.6 27B 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.5 100 100 100 100 100 100 100 100 100 100 100.0%
DeepSeek V4 Flash (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4.1 100 100 100 100 100 100 100 100 100 100 100.0%
o4 Mini 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen 3.5 35B 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4 100 100 100 100 100 100 100 100 100 100 100.0%
ByteDance Seed 2.0 Mini 100 100 100 100 100 100 100 100 100 100 100.0%
Gemma 4 31B 100 100 100 100 100 100 100 100 100 100 100.0%
Z.AI GLM 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
Stealth: Healer Alpha 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Large 3 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Haiku 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
DeepSeek-V2 Chat 100 100 100 100 100 100 100 100 100 100 100.0%
Claude 3.5 Sonnet 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 (Beta) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=1) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude 3.7 Sonnet 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4.1 Mini 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o, Aug. 6th (temp=1) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o, Aug. 6th (temp=0) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.4 Mini 100 100 100 100 100 100 100 100 100 100 100.0%
DeepSeek V4 Flash 100 100 100 100 100 100 100 100 100 100 100.0%
DeepSeek V3 (2025-03-24) 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 2.5 Flash 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen3 235B A22B Instruct 2507 100 100 100 100 100 100 100 100 100 100 100.0%
Writer: Palmyra X5 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o Mini (temp=1) 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Small 3.2 24B 100 100 100 100 100 100 100 100 100 100 100.0%
Gemma 3 12B 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o Mini (temp=0) 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Small 4 100 100 100 100 100 100 100 100 100 100 100.0%
Hermes 3 70B 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4.1 Nano 100 100 100 100 100 100 100 100 100 100 100.0%
Ministral 3 8B 100 100 100 100 100 100 100 100 100 100 100.0%
Claude 3 Haiku 100 100 100 100 100 100 100 100 100 100 100.0%
WizardLM 2 8x22b 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4.6 (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
Gemma 4 31B (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen 3.5 122B 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-5.4 Mini (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
Gemini 3 Flash (Preview, Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen 3.6 35B 100 100 100 100 100 100 100 100 100 0 90.0%
Gemini 3.1 Flash Lite (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
Grok 4 Fast 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen 3.5 9B 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen 3.5 Plus (2026-02-15) 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-5.4 Mini (Reasoning, Low) 100 100 100 100 100 100 100 100 100 0 90.0%
Xiaomi MIMO v2.5 100 100 100 100 100 100 100 100 100 0 90.0%
Hermes 3 405B 100 100 100 100 100 100 100 100 100 0 90.0%
DeepSeek V4 Pro 100 100 100 100 100 100 100 100 100 0 90.0%
DeepSeek V3.1 100 100 100 100 100 100 100 100 100 0 90.0%
Gemini 2.5 Flash Lite 100 100 100 100 100 100 100 100 100 0 90.0%
Grok 4.3 100 100 100 100 100 100 100 100 100 0 90.0%
Llama 3.1 70B 100 100 100 100 100 100 100 100 100 0 90.0%
ByteDance Seed 1.6 Flash 100 100 100 100 100 100 100 100 100 0 90.0%
Mistral Small Creative 100 100 100 100 100 100 100 100 100 0 90.0%
Cohere Command R+ (Aug. 2024) 100 100 100 100 100 100 100 100 100 0 90.0%
Z.AI GLM 5.1 100 100 100 100 100 100 100 100 0 0 80.0%
Gemini 3.5 Flash (Reasoning) 100 100 100 100 100 100 100 100 0 0 80.0%
GPT-5.2 100 100 100 100 100 100 100 100 0 0 80.0%
Grok 4.1 Fast 100 100 100 100 100 100 100 100 0 0 80.0%
Grok 4 100 100 100 100 100 100 100 100 0 0 80.0%
Gemini 3.1 Flash Lite 100 100 100 100 100 100 100 100 0 0 80.0%
GPT-5.4 100 100 100 100 100 100 100 100 0 0 80.0%
Mistral Large 2 100 100 100 100 100 100 100 100 0 0 80.0%
Rocinante 12B 100 100 100 100 100 100 100 100 0 0 80.0%
Qwen 3.5 Plus (2026-04-20) 100 100 100 100 100 100 100 0 0 0 70.0%
GPT-5.4 (Reasoning, Low) 100 100 100 100 100 100 100 0 0 0 70.0%
Qwen 3.5 Flash 100 100 100 100 100 100 100 0 0 0 70.0%
Gemini 3.1 Flash Lite (Preview) 100 100 100 100 100 100 100 0 0 0 70.0%
Gemini 3 Flash (Preview) 100 100 100 100 100 100 100 0 0 0 70.0%
DeepSeek V3 (2024-12-26) 100 100 100 100 100 100 100 0 0 0 70.0%
Z.AI GLM 4.5 Air 100 100 100 100 100 100 100 0 0 0 70.0%
Qwen 3 32B 100 100 100 100 100 100 100 0 0 0 70.0%
GPT-5.4 Nano (Reasoning, Low) 100 100 100 100 100 100 100 0 0 0 70.0%
MoonshotAI: Kimi K2.5 100 100 100 100 100 100 0 0 0 0 60.0%
Qwen 3.6 Flash 100 100 100 100 100 100 0 0 0 0 60.0%
Z.AI GLM 4.7 Flash 100 100 100 100 100 100 0 0 0 0 60.0%
ByteDance Seed 2.0 Lite 100 100 100 100 100 100 0 0 0 0 60.0%
Mistral Small 4 (Reasoning) 100 100 100 100 100 100 0 0 0 0 60.0%
Llama 3.1 Nemotron 70B 100 100 100 100 100 100 0 0 0 0 60.0%
Llama 3.1 8B 100 100 100 100 100 100 0 0 0 0 60.0%
Gemini 3.1 Pro (Preview) 100 100 100 100 100 0 0 0 0 0 50.0%
Aion 2.0 100 100 100 100 100 0 0 0 0 0 50.0%
Gemini 3 Pro (Preview) 100 100 100 100 100 0 0 0 0 0 50.0%
MiniMax M2.5 100 100 100 100 100 0 0 0 0 0 50.0%
Z.AI GLM 4.7 100 100 100 100 100 0 0 0 0 0 50.0%
Xiaomi MIMO v2.5 Pro 100 100 100 100 100 0 0 0 0 0 50.0%
Gemini 3.5 Flash (Reasoning, Minimal) 100 100 100 100 100 0 0 0 0 0 50.0%
GPT-OSS 120B 100 100 100 100 100 0 0 0 0 0 50.0%
Gemma 4 26B 100 100 100 100 100 0 0 0 0 0 50.0%
Nemotron 3 Super 100 100 100 100 100 0 0 0 0 0 50.0%
DeepSeek V3.2 100 100 100 100 100 0 0 0 0 0 50.0%
Arcee AI: Trinity Large (Preview) 100 100 100 100 100 0 0 0 0 0 50.0%
MoonshotAI: Kimi K2.6 100 100 100 100 0 0 0 0 0 0 40.0%
Qwen 3.5 27B 100 100 100 100 0 0 0 0 0 0 40.0%
Stealth: Hunter Alpha 100 100 100 100 0 0 0 0 0 0 40.0%
GPT-5 Nano 100 100 100 100 0 0 0 0 0 0 40.0%
GPT-5.4 Nano (Reasoning) 100 100 100 100 0 0 0 0 0 0 40.0%
Mistral NeMO 100 100 100 100 0 0 0 0 0 0 40.0%
Gemma 4 26B (Reasoning) 100 100 100 0 0 0 0 0 0 0 30.0%
Z.AI GLM 5 100 100 100 0 0 0 0 0 0 0 30.0%
Z.AI GLM 4.6 100 100 100 0 0 0 0 0 0 0 30.0%
Gemini 2.5 Flash Lite (Reasoning) 100 100 100 0 0 0 0 0 0 0 30.0%
Stealth: Aurora Alpha 100 100 100 0 0 0 0 0 0 0 30.0%
Mistral Large 100 100 100 0 0 0 0 0 0 0 30.0%
Inception Mercury 100 100 100 0 0 0 0 0 0 0 30.0%
MiniMax M2.7 100 100 0 0 0 0 0 0 0 0 20.0%
Mistral Medium 3.1 100 100 0 0 0 0 0 0 0 0 20.0%
Ministral 3B 100 100 0 0 0 0 0 0 0 0 20.0%
Gemini 2.5 Pro 100 0 0 0 0 0 0 0 0 0 10.0%
Inception Mercury 2 100 0 0 0 0 0 0 0 0 0 10.0%
Nemotron 3 Nano 100 0 0 0 0 0 0 0 0 0 10.0%
GPT-5.4 Nano 100 0 0 0 0 0 0 0 0 0 10.0%
Gemma 3 4B 100 0 0 0 0 0 0 0 0 0 10.0%
Gemini 2.5 Flash (Reasoning) 0 0 0 0 0 0 0 0 0 0 0.0%
Gemma 3 27B 0 0 0 0 0 0 0 0 0 0 0.0%
Qwen 2.5 72B 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 3 14B 0 0 0 0 0 0 0 0 0 0 0.0%
Arcee AI: Trinity Mini 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 3 3B 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 8B 0 0 0 0 0 0 0 0 0 0 0.0%
LFM2 24B 0 0 0 0 0 0 0 0 0 0 0.0%
Model # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 Avg ▼
Qwen3.7 Max 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.6 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen3.6 Max Preview 100 100 100 100 100 100 100 100 100 100 100.0%
Z.AI GLM 5 Turbo 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 3.5 Flash (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.3 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.4 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.7 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.5 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.1 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.6 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 100 100 100 100 100 100 100.0%
Gemma 4 31B (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen 3.5 122B 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 (Beta, Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.4 (Reasoning, Low) 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4.6 100 100 100 100 100 100 100 100 100 100 100.0%
ByteDance Seed 1.6 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.4 Mini (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.7 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.1 Fast 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.5 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4.1 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Sonnet 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen 3.5 35B 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Opus 4 100 100 100 100 100 100 100 100 100 100 100.0%
ByteDance Seed 2.0 Mini 100 100 100 100 100 100 100 100 100 100 100.0%
Gemma 4 31B 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 3.1 Flash Lite (Reasoning) 100 100 100 100 100 100 100 100 100 100 100.0%
Z.AI GLM 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.4 Mini (Reasoning, Low) 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Large 3 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 100 100 100 100 100 100 100.0%
Gemini 3 Flash (Preview) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude Haiku 4.5 100 100 100 100 100 100 100 100 100 100 100.0%
DeepSeek-V2 Chat 100 100 100 100 100 100 100 100 100 100 100.0%
Z.AI GLM 4.7 Flash 100 100 100 100 100 100 100 100 100 100 100.0%
Claude 3.5 Sonnet 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 (Beta) 100 100 100 100 100 100 100 100 100 100 100.0%
Claude 3.7 Sonnet 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4.1 Mini 100 100 100 100 100 100 100 100 100 100 100.0%
Hermes 3 405B 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o, Aug. 6th (temp=1) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o, Aug. 6th (temp=0) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.4 Mini 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Large 2 100 100 100 100 100 100 100 100 100 100 100.0%
DeepSeek V3 (2025-03-24) 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.20 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Large 100 100 100 100 100 100 100 100 100 100 100.0%
Qwen3 235B A22B Instruct 2507 100 100 100 100 100 100 100 100 100 100 100.0%
Writer: Palmyra X5 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.4 Nano (Reasoning, Low) 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o Mini (temp=1) 100 100 100 100 100 100 100 100 100 100 100.0%
Grok 4.3 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Small 3.2 24B 100 100 100 100 100 100 100 100 100 100 100.0%
Gemma 3 12B 100 100 100 100 100 100 100 100 100 100 100.0%
Llama 3.1 70B 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4o Mini (temp=0) 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Small 4 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5.4 Nano 100 100 100 100 100 100 100 100 100 100 100.0%
Arcee AI: Trinity Large (Preview) 100 100 100 100 100 100 100 100 100 100 100.0%
Mistral Small Creative 100 100 100 100 100 100 100 100 100 100 100.0%
Hermes 3 70B 100 100 100 100 100 100 100 100 100 100 100.0%
Ministral 3 14B 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-4.1 Nano 100 100 100 100 100 100 100 100 100 100 100.0%
Ministral 3 8B 100 100 100 100 100 100 100 100 100 100 100.0%
Claude 3 Haiku 100 100 100 100 100 100 100 100 100 100 100.0%
WizardLM 2 8x22b 100 100 100 100 100 100 100 100 100 100 100.0%
Cohere Command R+ (Aug. 2024) 100 100 100 100 100 100 100 100 100 100 100.0%
Gemma 3 4B 100 100 100 100 100 100 100 100 100 100 100.0%
GPT-5 Mini 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-5.5 (Reasoning, Low) 100 100 100 100 100 100 100 100 100 0 90.0%
MoonshotAI: Kimi K2.6 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-5 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen 3.5 27B 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen 3.6 Flash 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-5.2 100 100 100 100 100 100 100 100 100 0 90.0%
DeepSeek V4 Pro (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
MiniMax M2.7 100 100 100 100 100 100 100 100 100 0 90.0%
DeepSeek V4 Flash (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
Z.AI GLM 4.7 100 100 100 100 100 100 100 100 100 0 90.0%
Grok 4 100 100 100 100 100 100 100 100 100 0 90.0%
Qwen 3.5 Flash 100 100 100 100 100 100 100 100 100 0 90.0%
Grok 4 Fast 100 100 100 100 100 100 100 100 100 0 90.0%
Gemini 3.1 Flash Lite (Preview) 100 100 100 100 100 100 100 100 100 0 90.0%
Gemma 4 26B 100 100 100 100 100 100 100 100 100 0 90.0%
Gemini 2.5 Flash Lite (Reasoning) 100 100 100 100 100 100 100 100 100 0 90.0%
GPT-5.4 100 100 100 100 100 100 100 100 100 0 90.0%
DeepSeek V3 (2024-12-26) 100 100 100 100 100 100 100 100 100 0 90.0%
DeepSeek V4 Pro 100 100 100 100 100 100 100 100 100 0 90.0%
Gemini 2.5 Flash 100 100 100 100 100 100 100 100 100 0 90.0%
Mistral Medium 3.1 100 100 100 100 100 100 100 100 100 0 90.0%
Llama 3.1 Nemotron 70B 100 100 100 100 100 100 100 100 100 0 90.0%
Z.AI GLM 5.1 100 100 100 100 100 100 100 100 0 0 80.0%
MoonshotAI: Kimi K2.5 100 100 100 100 100 100 100 100 0 0 80.0%
o4 Mini 100 100 100 100 100 100 100 100 0 0 80.0%
Qwen 3.5 Plus (2026-02-15) 100 100 100 100 100 100 100 100 0 0 80.0%
Stealth: Healer Alpha 100 100 100 100 100 100 100 100 0 0 80.0%
Gemini 3.1 Flash Lite 100 100 100 100 100 100 100 100 0 0 80.0%
Xiaomi MIMO v2.5 100 100 100 100 100 100 100 100 0 0 80.0%
GPT-5 Nano 100 100 100 100 100 100 100 100 0 0 80.0%
DeepSeek V3.1 100 100 100 100 100 100 100 100 0 0 80.0%
Qwen 3 32B 100 100 100 100 100 100 100 100 0 0 80.0%
GPT-5.4 Nano (Reasoning) 100 100 100 100 100 100 100 100 0 0 80.0%
ByteDance Seed 1.6 Flash 100 100 100 100 100 100 100 100 0 0 80.0%
Llama 3.1 8B 100 100 100 100 100 100 100 100 0 0 80.0%
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100 100 0 0 0 70.0%
Qwen 3.5 Plus (2026-04-20) 100 100 100 100 100 100 100 0 0 0 70.0%
Z.AI GLM 5 100 100 100 100 100 100 100 0 0 0 70.0%
Gemini 3 Flash (Preview, Reasoning) 100 100 100 100 100 100 100 0 0 0 70.0%
o4 Mini High 100 100 100 100 100 100 100 0 0 0 70.0%
Qwen 3.6 27B 100 100 100 100 100 100 100 0 0 0 70.0%
Qwen 3.6 35B 100 100 100 100 100 100 100 0 0 0 70.0%
Xiaomi MIMO v2.5 Pro 100 100 100 100 100 100 100 0 0 0 70.0%
Stealth: Hunter Alpha 100 100 100 100 100 100 100 0 0 0 70.0%
Qwen 3.5 9B 100 100 100 100 100 100 100 0 0 0 70.0%
GPT-4o, May 13th (temp=1) 100 100 100 100 100 100 100 0 0 0 70.0%
Z.AI GLM 4.5 Air 100 100 100 100 100 100 100 0 0 0 70.0%
Mistral Small 4 (Reasoning) 100 100 100 100 100 100 100 0 0 0 70.0%
DeepSeek V3.2 100 100 100 100 100 100 100 0 0 0 70.0%
DeepSeek V4 Flash 100 100 100 100 100 100 100 0 0 0 70.0%
Gemini 2.5 Flash Lite 100 100 100 100 100 100 100 0 0 0 70.0%
Rocinante 12B 100 100 100 100 100 100 100 0 0 0 70.0%
Claude Sonnet 4.6 (Reasoning) 100 100 100 100 100 100 0 0 0 0 60.0%
Gemma 4 26B (Reasoning) 100 100 100 100 100 100 0 0 0 0 60.0%
Aion 2.0 100 100 100 100 100 100 0 0 0 0 60.0%
Z.AI GLM 4.6 100 100 100 100 100 100 0 0 0 0 60.0%
Gemini 3 Pro (Preview) 100 100 100 100 100 100 0 0 0 0 60.0%
Mistral NeMO 100 100 100 100 100 100 0 0 0 0 60.0%
Nemotron 3 Super 100 100 100 100 100 0 0 0 0 0 50.0%
GPT-OSS 120B 100 100 100 100 0 0 0 0 0 0 40.0%
Gemini 2.5 Pro 100 100 100 0 0 0 0 0 0 0 30.0%
ByteDance Seed 2.0 Lite 100 100 100 0 0 0 0 0 0 0 30.0%
Stealth: Aurora Alpha 100 100 100 0 0 0 0 0 0 0 30.0%
Gemma 3 27B 100 100 100 0 0 0 0 0 0 0 30.0%
Ministral 8B 100 100 100 0 0 0 0 0 0 0 30.0%
Gemini 2.5 Flash (Reasoning) 100 100 0 0 0 0 0 0 0 0 20.0%
Inception Mercury 2 100 100 0 0 0 0 0 0 0 0 20.0%
Inception Mercury 100 100 0 0 0 0 0 0 0 0 20.0%
MiniMax M2.5 100 0 0 0 0 0 0 0 0 0 10.0%
Nemotron 3 Nano 100 0 0 0 0 0 0 0 0 0 10.0%
Ministral 3B 100 0 0 0 0 0 0 0 0 0 10.0%
Qwen 2.5 72B 0 0 0 0 0 0 0 0 0 0 0.0%
Arcee AI: Trinity Mini 0 0 0 0 0 0 0 0 0 0 0.0%
Ministral 3 3B 0 0 0 0 0 0 0 0 0 0 0.0%
LFM2 24B 0 0 0 0 0 0 0 0 0 0 0.0%
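Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg ▼
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Claude Opus 4 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-5.4 Nano | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Grok 4.3 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemma 3 12B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Small 4 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-5.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.20 (Beta) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.4 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.4 Nano (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%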