Matches word count

Test: Write N of X

Avg. Score: 75.2%
Scenarios: 5

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Inception Mercury | 99.8% | $0.0002 | 2.1s | 98%
2 | Inception Mercury 2 | 99.8% | $0.0009 | 1.2s | 98%
3 | GPT-5.4 Nano (Reasoning) | 99.9% | $0.0010 | 4.4s | 99%
4 | GPT-5.4 Mini (Reasoning) | 99.9% | $0.0031 | 3.8s | 99%
5 | Nemotron 3 Super | 100.0% | $0.0000 | 15.3s | 100%
6 | GPT-5 Nano | 100.0% | $0.0008 | 19.2s | 100%
7 | Stealth: Aurora Alpha | 99.5% |  | 2.4s | 94%
8 | GPT-5.1 | 99.8% | $0.0073 | 9.2s | 99%
9 | GPT-5 Mini | 99.2% | $0.0024 | 11.6s | 93%
10 | GPT-OSS 120B | 99.8% | $0.0004 | 33.4s | 98%
11 | GPT-5.2 | 99.7% | $0.0087 | 9.6s | 98%
12 | o4 Mini | 99.5% | $0.0069 | 14.8s | 96%
13 | GPT-5.4 (Reasoning) | 99.7% | $0.0098 | 11.2s | 98%
14 | Qwen 3.6 35B | 100.0% | $0.0068 | 37.0s | 100%
15 | Z.AI GLM 5 Turbo | 100.0% | $0.011 | 25.2s | 100%
16 | Qwen 3.5 Flash | 100.0% | $0.0036 | 54.0s | 100%
17 | Grok 4.3 (Reasoning) | 100.0% | $0.0092 | 39.1s | 100%
18 | GPT-5.5 (Reasoning) | 99.9% | $0.021 | 7.7s | 99%
19 | GPT-5.5 (Reasoning, Low) | 99.5% | $0.019 | 6.8s | 94%
20 | GPT-5 | 98.4% | $0.013 | 18.1s | 91%
21 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.018 | 28.6s | 100%
22 | o4 Mini High | 99.4% | $0.010 | 36.6s | 94%
23 | GPT-4.1 | 94.6% | $0.0020 | 2.5s | 67%
24 | ByteDance Seed 1.6 | 99.1% | $0.0061 | 1.0m | 93%
25 | Claude Opus 4.7 | 96.6% | $0.017 | 3.3s | 83%
26 | Qwen 3.5 35B | 100.0% | $0.017 | 49.8s | 100%
27 | Claude Opus 4.7 (Reasoning) | 96.7% | $0.017 | 3.8s | 83%
28 | Gemini 3.1 Flash Lite | 92.7% | $0.0005 | 1.1s | 62%
29 | Gemini 3 Flash (Preview) | 93.0% | $0.0011 | 1.5s | 58%
30 | Qwen 3.6 Flash | 98.0% | $0.0083 | 24.3s | 72%
31 | GPT-4o, Aug. 6th (temp=0) | 94.8% | $0.0048 | 2.2s | 61%
32 | Gemini 3.1 Flash Lite (Reasoning) | 92.1% | $0.0005 | 1.2s | 57%
33 | GPT-4o, Aug. 6th (temp=1) | 93.3% | $0.0048 | 2.0s | 60%
34 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.035 | 19.6s | 100%
35 | Gemini 3.5 Flash (Reasoning) | 100.0% | $0.039 | 16.0s | 100%
36 | Qwen 3.5 27B | 100.0% | $0.019 | 1.3m | 100%
37 | GPT-5.4 Mini (Reasoning, Low) | 92.1% | $0.0018 | 2.4s | 51%
38 | GPT-5.5 | 93.6% | $0.0086 | 3.6s | 59%
39 | Z.AI GLM 5 | 100.0% | $0.016 | 1.6m | 100%
40 | Gemini 3.1 Flash Lite (Preview) | 89.7% | $0.0005 | 1.1s | 48%
41 | MoonshotAI: Kimi K2.5 | 99.9% | $0.016 | 2.0m | 99%
42 | Gemini 3 Pro (Preview) | 100.0% | $0.046 | 28.1s | 100%
43 | MoonshotAI: Kimi K2.6 | 99.9% | $0.015 | 2.0m | 99%
44 | GPT-5.4 (Reasoning, Low) | 87.4% | $0.0043 | 3.6s | 46%
45 | Qwen3.7 Max | 100.0% | $0.037 | 59.2s | 100%
46 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.050 | 19.6s | 100%
47 | ByteDance Seed 2.0 Lite | 95.3% | $0.0055 | 1.0m | 61%
48 | GPT-4o Mini (temp=1) | 90.9% | $0.0003 | 42.7s | 50%
49 | Nemotron 3 Nano | 90.0% | $0.0004 | 22.7s | 40%
50 | GPT-5.4 Nano (Reasoning, Low) | 86.2% | $0.0007 | 2.9s | 35%
51 | Z.AI GLM 4.7 Flash | 96.8% | $0.0026 | 1.8m | 69%
52 | Qwen 3.5 Plus (2026-04-20) | 98.0% | $0.014 | 1.4m | 72%
53 | Z.AI GLM 5.1 | 100.0% | $0.022 | 2.1m | 100%
54 | MiniMax M2.7 | 88.3% | $0.0022 | 27.8s | 38%
55 | Qwen 3.5 9B | 98.0% | $0.0016 | 2.2m | 72%
56 | GPT-5.4 | 81.9% | $0.0028 | 2.9s | 33%
57 | Z.AI GLM 4.7 | 100.0% | $0.0098 | 3.0m | 100%
58 | GPT-4o Mini (temp=0) | 84.1% | $0.0003 | 10.7s | 30%
59 | GPT-4.1 Nano | 78.7% | $0.0001 | 2.2s | 30%
60 | Qwen3.6 Max Preview | 100.0% | $0.034 | 1.9m | 100%
61 | Qwen 3.5 122B | 97.5% | $0.027 | 1.1m | 71%
62 | GPT-4.1 Mini | 79.9% | $0.0004 | 2.3s | 27%
63 | Gemini 2.5 Pro | 86.0% | $0.013 | 10.9s | 40%
64 | GPT-5.4 Mini | 79.5% | $0.0009 | 1.1s | 25%
65 | Gemma 4 31B (Reasoning) | 100.0% | $0.0019 | 3.7m | 100%
66 | Gemini 3.5 Flash (Reasoning, Minimal) | 81.3% | $0.0033 | 1.4s | 26%
67 | Grok 4 Fast | 77.8% | $0.0004 | 3.4s | 27%
68 | Grok 4 | 83.2% | $0.010 | 13.3s | 36%
69 | Gemma 4 31B | 80.2% | $0.0003 | 13.2s | 24%
70 | Gemma 4 26B | 79.1% | $0.0002 | 6.5s | 22%
71 | Gemma 4 26B (Reasoning) | 100.0% | $0.0027 | 3.9m | 100%
72 | Qwen 3.6 27B | 95.9% | $0.020 | 1.4m | 61%
73 | GPT-4o, May 13th (temp=1) | 82.3% | $0.0091 | 17.6s | 32%
74 | Stealth: Healer Alpha | 78.4% | $0.0000 | 15.6s | 21%
75 | Claude Opus 4.5 | 81.5% | $0.012 | 4.4s | 28%
76 | Claude Opus 4.6 | 80.4% | $0.012 | 6.0s | 30%
77 | Grok 4.20 (Beta, Reasoning) | 86.4% | $0.019 | 11.3s | 35%
78 | DeepSeek V3 (2024-12-26) | 72.2% | $0.0006 | 3.8s | 21%
79 | GPT-5.4 Nano | 71.1% | $0.0004 | 1.4s | 20%
80 | Xiaomi MIMO v2.5 | 72.9% | $0.0016 | 7.3s | 21%
81 | Claude Sonnet 4.5 | 74.1% | $0.0070 | 4.1s | 25%
82 | Grok 4.20 | 73.2% | $0.0018 | 1.8s | 18%
83 | Gemini 2.5 Flash Lite (Reasoning) | 70.3% | $0.0005 | 4.5s | 19%
84 | DeepSeek V3 (2025-03-24) | 71.5% | $0.0005 | 6.5s | 17%
85 | Mistral Large 3 | 68.3% | $0.0010 | 2.7s | 18%
86 | Claude Sonnet 4.6 | 74.8% | $0.0071 | 3.5s | 20%
87 | Mistral Medium 3.1 | 68.1% | $0.0009 | 2.9s | 16%
88 | Stealth: Hunter Alpha | 69.8% | $0.0000 | 15.6s | 17%
89 | Gemini 2.5 Flash Lite | 66.9% | $0.0002 | 711ms | 14%
90 | GPT-4o, May 13th (temp=0) | 78.7% | $0.0091 | 18.7s | 21%
91 | DeepSeek V4 Flash | 66.8% | $0.0001 | 4.7s | 14%
92 | DeepSeek V4 Pro (Reasoning) | 91.1% | $0.0075 | 2.1m | 48%
93 | Gemini 3.1 Pro (Preview) | 100.0% | $0.075 | 1.0m | 100%
94 | Claude 3.7 Sonnet | 70.3% | $0.0069 | 4.8s | 16%
95 | ByteDance Seed 2.0 Mini | 96.5% | $0.0031 | 3.3m | 66%
96 | Grok 4.20 (Reasoning) | 78.7% | $0.0091 | 28.1s | 20%
97 | Grok 4.1 Fast | 66.5% | $0.0007 | 6.3s | 11%
98 | Grok 4.20 (Beta) | 61.2% | $0.0015 | 863ms | 15%
99 | MiniMax M2.5 | 70.4% | $0.0012 | 17.6s | 12%
100 | Ministral 3 14B | 63.0% | $0.0003 | 1.3s | 12%
101 | Aion 2.0 | 74.2% | $0.0035 | 34.4s | 16%
102 | Claude Sonnet 4 | 67.6% | $0.0070 | 3.8s | 14%
103 | Z.AI GLM 4.5 | 61.3% | $0.0006 | 3.7s | 12%
104 | Qwen 3.5 397B A17B | 100.0% | $0.030 | 3.5m | 100%
105 | Ministral 3 8B | 59.0% | $0.0003 | 1.0s | 11%
106 | Gemma 3 27B | 58.9% | $0.0002 | 3.8s | 11%
107 | Z.AI GLM 4.6 | 72.5% | $0.0036 | 48.5s | 19%
108 | Hermes 3 405B | 60.3% | $0.0000 | 11.0s | 8%
109 | Claude 3.5 Sonnet | 62.7% | $0.0067 | 10.3s | 13%
110 | Gemma 3 12B | 56.2% | $0.0001 | 3.1s | 6%
111 | DeepSeek V4 Pro | 57.9% | $0.0012 | 8.9s | 8%
112 | DeepSeek V4 Flash (Reasoning) | 56.0% | $0.0001 | 4.9s | 7%
113 | DeepSeek-V2 Chat | 55.9% | $0.0003 | 8.1s | 8%
114 | Mistral Small 4 (Reasoning) | 58.9% | $0.0014 | 14.4s | 8%
115 | Qwen 3.5 Plus (2026-02-15) | 57.6% | $0.0009 | 6.6s | 5%
116 | DeepSeek V3.1 | 54.5% | $0.0004 | 8.5s | 8%
117 | Xiaomi MIMO v2.5 Pro | 56.1% | $0.0014 | 7.5s | 7%
118 | Llama 3.1 8B | 53.0% | $0.0003 | 694ms | 6%
119 | LFM2 24B | 52.1% | $0.0001 | 3.6s | 7%
120 | Gemini 2.5 Flash (Reasoning) | 61.7% | $0.0066 | 12.0s | 8%
121 | Ministral 3 3B | 52.1% | $0.0002 | 716ms | 4%
122 | Mistral Small 3.2 24B | 51.3% | $0.0002 | 2.3s | 5%
123 | Llama 3.1 Nemotron 70B | 50.2% | $0.0006 | 3.3s | 7%
124 | Arcee AI: Trinity Large (Preview) | 50.4% | $0.0000 | 3.1s | 4%
125 | Qwen 2.5 72B | 49.7% | $0.0007 | 3.1s | 5%
126 | Z.AI GLM 4.5 Air | 50.8% | $0.0006 | 9.4s | 6%
127 | Mistral Small 4 | 48.9% | $0.0003 | 1.5s | 3%
128 | Llama 3.1 70B | 45.7% | $0.0015 | 1.3s | 6%
129 | Claude Haiku 4.5 | 48.5% | $0.0023 | 2.3s | 3%
130 | Qwen 3 32B | 46.7% | $0.0004 | 10.7s | 4%
131 | Claude Opus 4 | 76.9% | $0.035 | 21.6s | 22%
132 | Gemini 2.5 Flash | 41.3% | $0.0005 | 1.1s | 1%
133 | DeepSeek V3.2 | 42.3% | $0.0005 | 6.1s | 1%
134 | Cohere Command R+ (Aug. 2024) | 44.9% | $0.0051 | 2.7s | 3%
135 | Mistral Small Creative | 39.2% | $0.0002 | 927ms | 0%
136 | ByteDance Seed 1.6 Flash | 46.6% | $0.0010 | 17.0s | 0%
137 | Qwen3 235B A22B Instruct 2507 | 39.6% | $0.0002 | 5.2s | 0%
138 | Grok 4.3 | 40.4% | $0.0020 | 1.7s | 0%
139 | Arcee AI: Trinity Mini | 31.3% | $0.0001 | 1.7s | 0%
140 | Claude 3 Haiku | 33.5% | $0.0005 | 17.9s | 0%
141 | Writer: Palmyra X5 | 30.3% | $0.0018 | 8.3s | 0%
142 | Mistral NeMO | 22.0% | $0.0003 | 2.4s | 0%
143 | Ministral 8B | 18.8% | $0.0002 | 1.4s | 0%
144 | Mistral Large 2 | 23.5% | $0.0042 | 3.0s | 0%
145 | Hermes 3 70B | 17.3% | $0.0007 | 4.8s | 0%
146 | Ministral 3B | 14.1% | $0.0001 | 710ms | 0%
147 | WizardLM 2 8x22b | 18.5% | $0.0015 | 19.8s | 0%
148 | Gemma 3 4B | 3.8% | $0.0001 | 1.9s | 0%
149 | Rocinante 12B | 5.4% | $0.0006 | 17.7s | 0%
150 | Mistral Large | 21.9% | $0.017 | 29.8s | 0%

Individual Scenarios

words

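Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 99.2%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 99.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0%
LFM2 24B | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98.6%
Ministral 3 8B | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 98.1%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 98.1%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 97.8%
Ministral 3 3B | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 77 | 96.9%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 96.6%
Claude 3 Haiku | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 95.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 92 | 77 | 95.2%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 54 | 95.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 54 | 94.5%
Z.AI GLM 4.5 Air | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 54 | 92.1%
Arcee AI: Trinity Mini | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 77 | 90.7%
Qwen 2.5 72B | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 54 | 90.7%
Qwen 3 32B | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 27 | 90.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Llama 3.1 Nemotron 70B | 98 | 98 | 98 | 92 | 92 | 92 | 92 | 77 | 77 | 54 | 87.2%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 54 | 0 | 85.2%
Llama 3.1 70B | 98 | 92 | 92 | 92 | 92 | 92 | 77 | 77 | 77 | 54 | 84.5%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 98 | 98 | 77 | 54 | 2 | 0 | 0 | 62.9%
Mistral NeMO | 100 | 98 | 98 | 77 | 77 | 77 | 27 | 0 | 0 | 0 | 55.6%
Hermes 3 70B | 100 | 98 | 98 | 92 | 92 | 0 | 0 | 0 | 0 | 0 | 48.1%
Mistral Large | 100 | 100 | 100 | 77 | 77 | 9 | 2 | 0 | 0 | 0 | 46.5%
Ministral 8B | 98 | 98 | 92 | 54 | 9 | 0 | 0 | 0 | 0 | 0 | 35.2%
Ministral 3B | 98 | 98 | 92 | 27 | 27 | 2 | 0 | 0 | 0 | 0 | 34.5%
Mistral Large 2 | 98 | 98 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 20.1%
Rocinante 12B | 98 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17.6%
Gemma 3 4B | 54 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.3%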
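Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 99.2%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.8%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 98.7%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 98.6%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 98.5%
GPT-4o, Aug. 6th (temp=0) | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98.4%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4%
Z.AI GLM 4.6 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 98.2%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9%
Claude Sonnet 4 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 97.6%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 97.2%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 77 | 97.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 96.4%
Ministral 3 8B | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 94.8%
Claude Opus 4.6 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 94.6%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 54 | 94.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 54 | 93.9%
Mistral Small 3.2 24B | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 77 | 77 | 77 | 91.6%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 2 | 90.2%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Cohere Command R+ (Aug. 2024) | 100 | 98 | 98 | 98 | 98 | 92 | 77 | 77 | 77 | 77 | 89.5%
DeepSeek V4 Pro | 100 | 98 | 92 | 92 | 92 | 92 | 92 | 77 | 77 | 77 | 89.2%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 0 | 88.3%
Mistral Medium 3.1 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 77 | 54 | 88.2%
Mistral Small 4 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 77 | 77 | 54 | 87.6%
Claude Sonnet 4.5 | 100 | 98 | 98 | 92 | 92 | 92 | 92 | 77 | 77 | 54 | 87.4%
DeepSeek V3.1 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 54 | 27 | 83.9%
Ministral 3 14B | 100 | 98 | 92 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 83.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 27 | 27 | 82.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 27 | 0 | 80.9%
Grok 4.3 | 100 | 100 | 98 | 92 | 92 | 92 | 92 | 77 | 27 | 27 | 79.9%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 0 | 0 | 79.8%
Gemma 3 27B | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 77 | 27 | 27 | 78.3%
Mistral Small Creative | 100 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 54 | 77.3%
Qwen 3 32B | 100 | 100 | 100 | 100 | 92 | 92 | 92 | 77 | 9 | 9 | 77.2%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 98 | 92 | 92 | 77 | 77 | 54 | 54 | 2 | 74.6%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 98 | 77 | 54 | 54 | 27 | 9 | 71.9%
Grok 4.20 (Beta) | 100 | 100 | 100 | 98 | 92 | 77 | 54 | 54 | 9 | 9 | 69.3%
Claude Haiku 4.5 | 100 | 100 | 100 | 98 | 77 | 54 | 54 | 27 | 27 | 9 | 64.6%
Gemini 2.5 Flash | 98 | 92 | 92 | 77 | 77 | 77 | 54 | 27 | 27 | 9 | 63.2%
Z.AI GLM 4.5 Air | 100 | 100 | 98 | 77 | 77 | 54 | 27 | 27 | 9 | 9 | 58.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 98 | 98 | 92 | 27 | 9 | 9 | 2 | 0 | 53.6%
DeepSeek V3.2 | 100 | 98 | 92 | 77 | 54 | 2 | 2 | 2 | 0 | 0 | 42.7%
Writer: Palmyra X5 | 92 | 92 | 77 | 54 | 27 | 9 | 0 | 0 | 0 | 0 | 35.2%
WizardLM 2 8x22b | 100 | 98 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 29.7%
Ministral 3B | 92 | 92 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23.8%
Hermes 3 70B | 77 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.1%
Arcee AI: Trinity Mini | 77 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.1%
Ministral 8B | 100 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.2%
Mistral NeMO | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2%
Rocinante 12B | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2%
Mistral Large | 54 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.2%
LFM2 24B | 54 | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 0 | 6.4%
Llama 3.1 70B | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.8%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%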
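Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 98.1%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 98.1%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 77 | 97.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 92 | 97.5%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 97.2%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 77 | 97.1%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 96.6%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 77 | 96.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 77 | 95.7%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 95.1%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 77 | 77 | 94.7%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 27 | 92.7%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 27 | 92.7%
GPT-4.1 Nano | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 54 | 92.5%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 77 | 54 | 92.4%
Grok 4 Fast | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 77 | 54 | 91.7%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 77 | 77 | 77 | 91.6%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 77 | 77 | 54 | 90.7%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 2 | 89.1%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 77 | 0 | 87.7%
GPT-5.4 Nano | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 77 | 77 | 54 | 87.5%
Gemma 3 27B | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 77 | 9 | 85.1%
Gemini 2.5 Flash Lite | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 2 | 85.1%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 77 | 77 | 9 | 84.7%
Claude Sonnet 4.6 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 77 | 9 | 84.5%
MiniMax M2.5 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 92 | 77 | 0 | 84.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 77 | 2 | 83.7%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 54 | 9 | 83.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 77 | 54 | 2 | 80.5%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 98 | 77 | 77 | 27 | 0 | 78.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 2 | 2 | 77.1%
Grok 4.20 (Beta) | 100 | 100 | 98 | 98 | 92 | 77 | 77 | 54 | 27 | 27 | 75.2%
DeepSeek V4 Pro | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 54 | 9 | 2 | 73.1%
Llama 3.1 70B | 100 | 100 | 98 | 98 | 92 | 77 | 54 | 27 | 27 | 27 | 70.2%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Claude 3.7 Sonnet | 98 | 98 | 98 | 92 | 92 | 92 | 54 | 27 | 27 | 9 | 68.9%
Z.AI GLM 4.5 | 100 | 98 | 98 | 98 | 92 | 54 | 54 | 54 | 27 | 9 | 68.4%
DeepSeek V3 (2025-03-24) | 100 | 98 | 98 | 92 | 92 | 77 | 54 | 54 | 2 | 0 | 66.7%
Ministral 3 14B | 100 | 100 | 100 | 92 | 92 | 77 | 77 | 2 | 2 | 0 | 64.2%
Claude 3.5 Sonnet | 100 | 100 | 98 | 92 | 77 | 54 | 54 | 27 | 27 | 9 | 63.9%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 98 | 92 | 77 | 54 | 54 | 27 | 27 | 0 | 63.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 54 | 27 | 27 | 2 | 0 | 61.0%
Hermes 3 405B | 100 | 98 | 98 | 98 | 92 | 77 | 27 | 9 | 2 | 0 | 60.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 98 | 92 | 92 | 54 | 27 | 27 | 9 | 0 | 60.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 98 | 77 | 9 | 9 | 2 | 0 | 59.6%
Claude Haiku 4.5 | 100 | 100 | 98 | 98 | 77 | 77 | 27 | 9 | 2 | 0 | 59.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 98 | 98 | 27 | 27 | 27 | 9 | 0 | 58.8%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 92 | 92 | 54 | 27 | 2 | 2 | 0 | 56.9%
Claude Sonnet 4 | 98 | 92 | 92 | 77 | 54 | 54 | 27 | 27 | 2 | 0 | 52.4%
DeepSeek V3.1 | 100 | 77 | 54 | 54 | 54 | 54 | 54 | 27 | 27 | 9 | 50.9%
DeepSeek V4 Flash (Reasoning) | 98 | 92 | 92 | 77 | 54 | 27 | 27 | 27 | 9 | 2 | 50.7%
Qwen 3 32B | 100 | 100 | 77 | 77 | 54 | 54 | 27 | 2 | 0 | 0 | 49.1%
LFM2 24B | 100 | 98 | 92 | 77 | 77 | 9 | 9 | 9 | 2 | 2 | 47.6%
Mistral NeMO | 100 | 100 | 92 | 77 | 54 | 27 | 2 | 0 | 0 | 0 | 45.2%
Llama 3.1 8B | 100 | 100 | 77 | 54 | 54 | 27 | 27 | 0 | 0 | 0 | 43.9%
Mistral Large 3 | 100 | 100 | 100 | 27 | 27 | 27 | 27 | 27 | 0 | 0 | 43.7%
Mistral Large 2 | 98 | 92 | 77 | 77 | 54 | 27 | 0 | 0 | 0 | 0 | 42.6%
DeepSeek-V2 Chat | 92 | 92 | 54 | 54 | 54 | 54 | 27 | 0 | 0 | 0 | 42.6%
Ministral 8B | 100 | 98 | 92 | 77 | 27 | 0 | 0 | 0 | 0 | 0 | 39.6%
DeepSeek V3.2 | 100 | 98 | 77 | 27 | 9 | 9 | 9 | 9 | 9 | 0 | 34.9%
Ministral 3 8B | 100 | 98 | 54 | 54 | 9 | 2 | 0 | 0 | 0 | 0 | 31.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Arcee AI: Trinity Mini | 98 | 77 | 77 | 27 | 9 | 0 | 0 | 0 | 0 | 0 | 29.0%
Arcee AI: Trinity Large (Preview) | 100 | 92 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28.4%
Mistral Small 4 | 100 | 98 | 54 | 9 | 9 | 2 | 0 | 0 | 0 | 0 | 27.2%
Qwen 2.5 72B | 92 | 77 | 54 | 27 | 9 | 2 | 2 | 0 | 0 | 0 | 26.3%
Grok 4.3 | 100 | 98 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 22.7%
Ministral 3 3B | 98 | 54 | 54 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 20.9%
Qwen3 235B A22B Instruct 2507 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.4%
Gemma 3 12B | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.8%
Mistral Large | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.9%
Ministral 3B | 9 | 9 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2.1%
Gemma 3 4B | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Cohere Command R+ (Aug. 2024) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%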
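Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.4%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.1%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 98.7%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 98.2%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 77 | 97.7%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 97.6%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 92 | 97.5%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 97.4%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 97.4%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 97.3%
Gemma 4 31B | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 77 | 96.1%
Gemma 4 26B | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 92 | 92 | 95.5%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 54 | 95.2%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 77 | 95.1%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 94.8%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 77 | 94.2%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 77 | 94.2%
Claude Sonnet 4.6 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 92 | 54 | 91.7%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 9 | 90.9%
GPT-4.1 Mini | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 77 | 77 | 90.6%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 0 | 89.7%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 0 | 89.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 77 | 54 | 54 | 85.8%
GPT-5.4 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 77 | 27 | 85.4%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 54 | 9 | 83.0%
GPT-5.4 Mini | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 54 | 0 | 81.8%
Gemini 2.5 Pro | 100 | 98 | 92 | 92 | 92 | 77 | 77 | 77 | 54 | 54 | 81.4%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 98 | 92 | 92 | 92 | 92 | 27 | 0 | 79.4%
Claude Sonnet 4 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 2 | 0 | 78.7%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 98 | 92 | 92 | 77 | 54 | 54 | 0 | 76.7%
Grok 4 Fast | 100 | 100 | 98 | 92 | 92 | 92 | 77 | 77 | 27 | 9 | 76.6%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 54 | 0 | 0 | 73.8%
Z.AI GLM 4.6 | 100 | 100 | 98 | 98 | 92 | 77 | 54 | 54 | 54 | 2 | 72.9%
Llama 3.1 70B | 100 | 100 | 98 | 98 | 92 | 77 | 54 | 54 | 27 | 9 | 71.0%
Claude Opus 4 | 100 | 100 | 100 | 98 | 77 | 77 | 77 | 27 | 27 | 0 | 68.5%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 98 | 77 | 0 | 0 | 0 | 67.6%
GPT-4.1 Nano | 100 | 100 | 98 | 92 | 77 | 77 | 54 | 54 | 2 | 0 | 65.4%
Grok 4.20 | 100 | 100 | 100 | 98 | 92 | 77 | 54 | 27 | 2 | 0 | 65.0%
LFM2 24B | 100 | 100 | 92 | 92 | 77 | 54 | 54 | 27 | 0 | 0 | 59.6%
GPT-5.4 Nano | 100 | 100 | 92 | 77 | 77 | 54 | 27 | 27 | 27 | 9 | 59.2%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 77 | 2 | 0 | 0 | 0 | 57.9%
Hermes 3 405B | 100 | 100 | 98 | 98 | 77 | 77 | 27 | 0 | 0 | 0 | 57.9%
Stealth: Hunter Alpha | 100 | 100 | 77 | 77 | 54 | 54 | 54 | 54 | 0 | 0 | 56.9%
Xiaomi MIMO v2.5 | 100 | 92 | 92 | 77 | 77 | 54 | 27 | 27 | 9 | 0 | 55.7%
Mistral Large 2 | 100 | 100 | 100 | 92 | 77 | 77 | 0 | 0 | 0 | 0 | 54.7%
Grok 4.20 (Beta) | 100 | 98 | 77 | 77 | 54 | 54 | 54 | 27 | 2 | 2 | 54.5%
Mistral Large | 100 | 100 | 92 | 92 | 92 | 54 | 0 | 0 | 0 | 0 | 53.0%
Grok 4 | 100 | 92 | 92 | 92 | 77 | 54 | 9 | 9 | 2 | 2 | 52.9%
Gemma 3 12B | 100 | 92 | 92 | 92 | 92 | 54 | 2 | 2 | 0 | 0 | 52.6%
Gemini 2.5 Flash Lite | 98 | 98 | 77 | 77 | 54 | 54 | 27 | 27 | 0 | 0 | 51.4%
Claude 3.5 Sonnet | 100 | 98 | 92 | 77 | 54 | 54 | 27 | 9 | 0 | 0 | 51.1%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 92 | 9 | 0 | 0 | 0 | 0 | 50.1%
Claude Sonnet 4.5 | 100 | 100 | 98 | 77 | 77 | 27 | 9 | 9 | 0 | 0 | 49.9%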
Mistral Medium 3.1100929292779000046.3%
Z.AI GLM 4.5 Air100989892542000044.5%
Gemini 2.5 Flash Lite (Reasoning)989892772727920043.2%
Ministral 3 8B777777777727900042.4%
Llama 3.1 8B1001001009892000040.9%
Gemini 2.5 Flash1001001009290000040.1%
Gemini 2.5 Flash (Reasoning)1001001007799220039.9%
Z.AI GLM 4.510010092272727992039.4%
DeepSeek-V2 Chat1007777542727000036.3%
Mistral Small 4 (Reasoning)100927727279920034.4%
Mistral Large 398777754279000034.3%
Qwen 2.5 72B100100775420000033.3%
Ministral 3 3B10092775490000033.2%
Arcee AI: Trinity Large (Preview)10092922799200033.2%
Grok 4.1 Fast92775454540000033.1%
Cohere Command R+ (Aug. 2024)92775454279900032.2%
Ministral 3 14B100100100990000031.8%
DeepSeek V4 Flash100100772700000030.5%
Llama 3.1 Nemotron 70B100545427272000026.3%
Gemma 3 27B9892272790000025.5%
Hermes 3 70B10010054000000025.4%
DeepSeek V3.11005454000000020.7%
Xiaomi MIMO v2.5 Pro985454200000020.7%
DeepSeek V3.21007727200000020.6%
DeepSeek V4 Flash (Reasoning)1001002000000020.2%
Mistral Small 498929000000020.0%
Arcee AI: Trinity Mini545427200000013.6%
Qwen 3 32B10099200000012.0%
DeepSeek V4 Pro77270000000010.5%
Qwen3 235B A22B Instruct 250777270000000010.5%
Claude Haiku 4.5980000000009.9%
Ministral 3B980000000009.8%
Mistral Small Creative920000000009.2%
Ministral 8B920000000009.2%
Qwen 3.5 Plus (2026-02-15)772200000008.1%
Writer: Palmyra X52727920000006.5%
Mistral Small 3.2 24B549200000006.4%
ByteDance Seed 1.6 Flash270000000002.7%
Grok 4.320000000000.2%
Mistral NeMO00000000000.0%
Rocinante 12B00000000000.0%
Gemma 3 4B00000000000.0%
Claude 3 Haiku00000000000.0%
WizardLM 2 8x22b00000000000.0%
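Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg ▼
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.4%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.4%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.1%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 77 | 97.7%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 97.4%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 97.4%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 97.3%
GPT-5 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 96.9%
GPT-5 Mini | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 77 | 96.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 95.8%
Claude Opus 4.7 | 100 | 100 | 100 | 98 | 92 | 92 | 92 | 92 | 77 | 77 | 92.2%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 0 | 89.8%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 0 | 89.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 98 | 98 | 98 | 98 | 77 | 77 | 77 | 54 | 87.9%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 54 | 0 | 85.3%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 27 | 0 | 82.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 0 | 0 | 78.3%
GPT-4.1 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 77 | 27 | 9 | 77.9%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 54 | 9 | 0 | 76.3%
GPT-4o, Aug. 6th (temp=1) | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 54 | 9 | 0 | 71.8%
GPT-5.5 | 100 | 100 | 100 | 98 | 92 | 77 | 77 | 27 | 9 | 9 | 69.1%
Gemini 3.1 Flash Lite | 98 | 98 | 92 | 77 | 77 | 77 | 54 | 54 | 27 | 2 | 65.7%
Gemini 3 Flash (Preview) | 100 | 92 | 92 | 92 | 92 | 77 | 54 | 54 | 2 | 2 | 65.7%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 98 | 98 | 77 | 77 | 2 | 0 | 0 | 65.3%
Grok 4 | 100 | 100 | 98 | 92 | 92 | 77 | 54 | 27 | 9 | 0 | 65.0%
Mistral Large 3 | 100 | 100 | 100 | 98 | 92 | 92 | 54 | 9 | 0 | 0 | 64.5%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 92 | 92 | 92 | 92 | 54 | 54 | 27 | 9 | 9 | 62.1%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 2 | 0 | 0 | 0 | 60.2%
GPT-4o Mini (temp=1) | 100 | 100 | 98 | 92 | 92 | 54 | 27 | 9 | 2 | 2 | 57.6%
Claude Sonnet 4.5 | 100 | 100 | 100 | 98 | 92 | 54 | 9 | 2 | 0 | 0 | 55.5%
MiniMax M2.7 | 100 | 100 | 100 | 98 | 98 | 27 | 0 | 0 | 0 | 0 | 52.4%
Gemini 2.5 Pro | 100 | 100 | 98 | 92 | 92 | 9 | 0 | 0 | 0 | 0 | 49.2%
Gemini 3.1 Flash Lite (Preview) | 98 | 92 | 77 | 77 | 54 | 54 | 27 | 9 | 2 | 0 | 49.1%
LFM2 24B | 98 | 92 | 92 | 92 | 54 | 27 | 27 | 0 | 0 | 0 | 48.3%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 77 | 54 | 54 | 27 | 27 | 9 | 0 | 0 | 44.8%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 77 | 54 | 0 | 0 | 0 | 0 | 0 | 43.1%
GPT-5.4 Nano (Reasoning, Low) | 100 | 92 | 92 | 77 | 54 | 2 | 0 | 0 | 0 | 0 | 41.7%
Stealth: Healer Alpha | 100 | 100 | 100 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 37.7%
GPT-4.1 Nano | 100 | 98 | 77 | 27 | 27 | 27 | 0 | 0 | 0 | 0 | 35.8%
Ministral 3 14B | 100 | 98 | 77 | 54 | 27 | 0 | 0 | 0 | 0 | 0 | 35.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 77 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 28.8%
Gemma 3 12B | 100 | 98 | 77 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 28.5%
Ministral 3 8B | 100 | 100 | 54 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 28.1%
DeepSeek V3 (2024-12-26) | 100 | 100 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27.7%
DeepSeek V4 Flash | 98 | 92 | 54 | 9 | 9 | 2 | 0 | 0 | 0 | 0 | 26.4%
GPT-5.4 | 77 | 77 | 54 | 54 | 2 | 0 | 0 | 0 | 0 | 0 | 26.4%
Grok 4.20 | 98 | 98 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25.0%
Arcee AI: Trinity Large (Preview) | 98 | 92 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24.4%
Gemini 2.5 Flash Lite (Reasoning) | 77 | 54 | 54 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 23.8%
Grok 4 Fast | 98 | 77 | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 21.4%
GPT-5.4 Mini | 100 | 100 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 21.1%
GPT-4o Mini (temp=0) | 77 | 77 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.9%
Aion 2.0 | 100 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.2%
Xiaomi MIMO v2.5 | 100 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.2%
MiniMax M2.5 | 98 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.1%
Claude Opus 4 | 98 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.1%
Claude Opus 4.5 | 100 | 77 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18.7%
DeepSeek V3.1 | 100 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17.7%
DeepSeek V4 Pro | 92 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17.0%
Claude Opus 4.6 | 92 | 27 | 27 | 9 | 2 | 2 | 0 | 0 | 0 | 0 | 16.0%
DeepSeek V3.2 | 100 | 54 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15.5%
Writer: Palmyra X5 | 77 | 27 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.2%
Qwen3 235B A22B Instruct 2507 | 100 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.9%
DeepSeek V3 (2025-03-24) | 98 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.6%
Gemini 2.5 Flash (Reasoning) | 92 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.0%
GPT-4.1 Mini | 77 | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 11.6%
Mistral Medium 3.1 | 77 | 27 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11.4%
Grok 4.1 Fast | 100 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11.1%
DeepSeek V4 Flash (Reasoning) | 98 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.9%
Arcee AI: Trinity Mini | 98 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Small Creative | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-5.4 Nano | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Small 4 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemma 3 4B | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.8%
Ministral 3 3B | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.8%
Claude Sonnet 4 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2%
Claude Haiku 4.5 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2%
Gemini 3.5 Flash (Reasoning, Minimal) | 54 | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2%
Gemini 2.5 Flash | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.8%
Grok 4.20 (Beta) | 54 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.2%
Gemma 3 27B | 27 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.6%
Qwen 3 32B | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.4%
Gemma 4 31B | 27 | 9 | 9 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 4.9%
Cohere Command R+ (Aug. 2024) | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.8%
DeepSeek-V2 Chat | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.7%
Gemma 4 26B | 9 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.3%
Z.AI GLM 4.6 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9%
ByteDance Seed 1.6 Flash | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9%
Gemini 2.5 Flash Lite | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2%
Claude Sonnet 4.6 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 3.5 Plus (2026-02-15) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.5 Air | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Hunter Alpha | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Xiaomi MIMO v2.5 Pro | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%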