Matches word count

Test: Write N of X

Avg. Score: 71.8%
Scenarios: 5

Overall Performance

| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
|---|---|---|---|---|---|
| 1 | Inception Mercury | 99.8% | $0.0002 | 2.1s | 98% |
| 2 | Inception Mercury 2 | 99.8% | $0.0009 | 1.2s | 98% |
| 3 | GPT-5.4 Nano (Reasoning) | 99.9% | $0.0010 | 4.4s | 99% |
| 4 | GPT-5.4 Mini (Reasoning) | 99.9% | $0.0031 | 3.8s | 99% |
| 5 | Nemotron 3 Super | 100.0% | $0.0000 | 15.3s | 100% |
| 6 | Stealth: Aurora Alpha | 99.5% | | 2.4s | 94% |
| 7 | GPT-5 Nano | 100.0% | $0.0008 | 19.2s | 100% |
| 8 | GPT-5.1 | 99.8% | $0.0073 | 9.2s | 99% |
| 9 | GPT-5 Mini | 99.2% | $0.0024 | 11.6s | 93% |
| 10 | GPT-5.2 | 99.7% | $0.0087 | 9.6s | 98% |
| 11 | o4 Mini | 99.5% | $0.0069 | 14.8s | 96% |
| 12 | GPT-5.4 (Reasoning) | 99.7% | $0.0098 | 11.2s | 98% |
| 13 | Z.AI GLM 5 Turbo | 100.0% | $0.011 | 25.2s | 100% |
| 14 | Qwen 3.5 Flash | 100.0% | $0.0036 | 54.0s | 100% |
| 15 | GPT-5 | 98.4% | $0.013 | 18.1s | 91% |
| 16 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.018 | 28.6s | 100% |
| 17 | o4 Mini High | 99.4% | $0.010 | 36.6s | 94% |
| 18 | GPT-4.1 | 94.6% | $0.0020 | 2.5s | 67% |
| 19 | ByteDance Seed 1.6 | 99.1% | $0.0061 | 1.0m | 93% |
| 20 | Qwen 3.5 35B | 100.0% | $0.017 | 49.8s | 100% |
| 21 | Gemini 3 Flash (Preview) | 93.0% | $0.0011 | 1.5s | 58% |
| 22 | GPT-4o, Aug. 6th (temp=0) | 94.8% | $0.0048 | 2.2s | 61% |
| 23 | GPT-4o, Aug. 6th (temp=1) | 93.3% | $0.0048 | 2.0s | 60% |
| 24 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.035 | 19.6s | 100% |
| 25 | GPT-5.4 Mini (Reasoning, Low) | 92.1% | $0.0018 | 2.4s | 51% |
| 26 | Qwen 3.5 27B | 100.0% | $0.019 | 1.3m | 100% |
| 27 | Gemini 3.1 Flash Lite (Preview) | 89.7% | $0.0005 | 1.1s | 48% |
| 28 | Z.AI GLM 5 | 100.0% | $0.016 | 1.6m | 100% |
| 29 | GPT-5.4 (Reasoning, Low) | 87.4% | $0.0043 | 3.6s | 46% |
| 30 | Gemini 3 Pro (Preview) | 100.0% | $0.046 | 28.1s | 100% |
| 31 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.050 | 19.6s | 100% |
| 32 | MoonshotAI: Kimi K2.5 | 99.9% | $0.016 | 2.0m | 99% |
| 33 | ByteDance Seed 2.0 Lite | 95.3% | $0.0055 | 1.0m | 61% |
| 34 | GPT-4o Mini (temp=1) | 90.9% | $0.0003 | 42.7s | 50% |
| 35 | Nemotron 3 Nano | 90.0% | $0.0004 | 22.7s | 40% |
| 36 | GPT-5.4 Nano (Reasoning, Low) | 86.2% | $0.0007 | 2.9s | 35% |
| 37 | Z.AI GLM 4.7 Flash | 96.8% | $0.0026 | 1.8m | 69% |
| 38 | MiniMax M2.7 | 88.3% | $0.0022 | 27.8s | 38% |
| 39 | GPT-5.4 | 81.9% | $0.0028 | 2.9s | 33% |
| 40 | GPT-4o Mini (temp=0) | 84.1% | $0.0003 | 10.7s | 30% |
| 41 | GPT-4.1 Nano | 78.7% | $0.0001 | 2.2s | 30% |
| 42 | GPT-4.1 Mini | 79.9% | $0.0004 | 2.3s | 27% |
| 43 | Qwen 3.5 9B | 98.0% | $0.0016 | 2.2m | 72% |
| 44 | Gemini 2.5 Pro | 86.0% | $0.013 | 10.9s | 40% |
| 45 | Qwen 3.5 122B | 97.5% | $0.027 | 1.1m | 71% |
| 46 | GPT-5.4 Mini | 79.5% | $0.0009 | 1.1s | 25% |
| 47 | Z.AI GLM 4.7 | 100.0% | $0.0098 | 3.0m | 100% |
| 48 | Grok 4 Fast | 77.8% | $0.0004 | 3.4s | 27% |
| 49 | Grok 4 | 83.2% | $0.010 | 13.3s | 36% |
| 50 | GPT-4o, May 13th (temp=1) | 82.3% | $0.0091 | 17.6s | 32% |
| 51 | Claude Opus 4.5 | 81.5% | $0.012 | 4.4s | 28% |
| 52 | Stealth: Healer Alpha | 78.4% | $0.0000 | 15.6s | 21% |
| 53 | Claude Opus 4.6 | 80.4% | $0.012 | 6.0s | 30% |
| 54 | Grok 4.20 (Beta, Reasoning) | 86.4% | $0.019 | 11.3s | 35% |
| 55 | DeepSeek V3 (2024-12-26) | 72.2% | $0.0006 | 3.8s | 21% |
| 56 | GPT-5.4 Nano | 71.1% | $0.0004 | 1.4s | 20% |
| 57 | Claude Sonnet 4.5 | 74.1% | $0.0070 | 4.1s | 25% |
| 58 | Gemini 2.5 Flash Lite (Reasoning) | 70.3% | $0.0005 | 4.5s | 19% |
| 59 | DeepSeek V3 (2025-03-24) | 71.5% | $0.0005 | 6.5s | 17% |
| 60 | Mistral Large 3 | 68.3% | $0.0010 | 2.7s | 18% |
| 61 | Claude Sonnet 4.6 | 74.8% | $0.0071 | 3.5s | 20% |
| 62 | Mistral Medium 3.1 | 68.1% | $0.0009 | 2.9s | 16% |
| 63 | Gemini 2.5 Flash Lite | 66.9% | $0.0002 | 711ms | 14% |
| 64 | Stealth: Hunter Alpha | 69.8% | $0.0000 | 15.6s | 17% |
| 65 | GPT-4o, May 13th (temp=0) | 78.7% | $0.0091 | 18.7s | 21% |
| 66 | Claude 3.7 Sonnet | 70.3% | $0.0069 | 4.8s | 16% |
| 67 | Grok 4.20 (Beta) | 61.2% | $0.0015 | 863ms | 15% |
| 68 | Grok 4.1 Fast | 66.5% | $0.0007 | 6.3s | 11% |
| 69 | Ministral 3 14B | 63.0% | $0.0003 | 1.3s | 12% |
| 70 | MiniMax M2.5 | 70.4% | $0.0012 | 17.6s | 12% |
| 71 | Gemini 3.1 Pro (Preview) | 100.0% | $0.075 | 1.0m | 100% |
| 72 | Claude Sonnet 4 | 67.6% | $0.0070 | 3.8s | 14% |
| 73 | Z.AI GLM 4.5 | 61.3% | $0.0006 | 3.7s | 12% |
| 74 | Aion 2.0 | 74.2% | $0.0035 | 34.4s | 16% |
| 75 | Ministral 3 8B | 59.0% | $0.0003 | 1.0s | 11% |
| 76 | Gemma 3 27B | 58.9% | $0.0002 | 3.8s | 11% |
| 77 | ByteDance Seed 2.0 Mini | 96.5% | $0.0031 | 3.3m | 66% |
| 78 | Z.AI GLM 4.6 | 72.5% | $0.0036 | 48.5s | 19% |
| 79 | Hermes 3 405B | 60.3% | $0.0000 | 11.0s | 8% |
| 80 | Claude 3.5 Sonnet | 62.7% | $0.0067 | 10.3s | 13% |
| 81 | Gemma 3 12B | 56.2% | $0.0001 | 3.1s | 6% |
| 82 | Qwen 3.5 397B A17B | 100.0% | $0.030 | 3.5m | 100% |
| 83 | DeepSeek-V2 Chat | 55.9% | $0.0003 | 8.1s | 8% |
| 84 | Mistral Small 4 (Reasoning) | 58.9% | $0.0014 | 14.4s | 8% |
| 85 | Qwen 3.5 Plus (2026-02-15) | 57.6% | $0.0009 | 6.6s | 5% |
| 86 | DeepSeek V3.1 | 54.5% | $0.0004 | 8.5s | 8% |
| 87 | Llama 3.1 8B | 53.0% | $0.0003 | 694ms | 6% |
| 88 | LFM2 24B | 52.1% | $0.0001 | 3.6s | 7% |
| 89 | Gemini 2.5 Flash (Reasoning) | 61.7% | $0.0066 | 12.0s | 8% |
| 90 | Ministral 3 3B | 52.1% | $0.0002 | 716ms | 4% |
| 91 | Mistral Small 3.2 24B | 51.3% | $0.0002 | 2.3s | 5% |
| 92 | Llama 3.1 Nemotron 70B | 50.2% | $0.0006 | 3.3s | 7% |
| 93 | Arcee AI: Trinity Large (Preview) | 50.4% | $0.0000 | 3.1s | 4% |
| 94 | Qwen 2.5 72B | 49.7% | $0.0007 | 3.1s | 5% |
| 95 | Mistral Small 4 | 48.9% | $0.0003 | 1.5s | 3% |
| 96 | Llama 3.1 70B | 45.7% | $0.0015 | 1.3s | 6% |
| 97 | Claude Haiku 4.5 | 48.5% | $0.0023 | 2.3s | 3% |
| 98 | Qwen 3 32B | 46.7% | $0.0004 | 10.7s | 4% |
| 99 | Claude Opus 4 | 76.9% | $0.035 | 21.6s | 22% |
| 100 | Gemini 2.5 Flash | 41.3% | $0.0005 | 1.1s | 1% |
| 101 | DeepSeek V3.2 | 42.3% | $0.0005 | 6.1s | 1% |
| 102 | Cohere Command R+ (Aug. 2024) | 44.9% | $0.0051 | 2.7s | 3% |
| 103 | Mistral Small Creative | 39.2% | $0.0002 | 927ms | 0% |
| 104 | ByteDance Seed 1.6 Flash | 46.6% | $0.0010 | 17.0s | 0% |
| 105 | Qwen3 235B A22B Instruct 2507 | 39.6% | $0.0002 | 5.2s | 0% |
| 106 | Claude 3.5 Haiku | 40.6% | $0.0018 | 3.3s | 0% |
| 107 | Arcee AI: Trinity Mini | 31.3% | $0.0001 | 1.7s | 0% |
| 108 | Claude 3 Haiku | 33.5% | $0.0005 | 17.9s | 0% |
| 109 | Writer: Palmyra X5 | 30.3% | $0.0018 | 8.3s | 0% |
| 110 | Mistral NeMO | 22.0% | $0.0003 | 2.4s | 0% |
| 111 | Ministral 8B | 18.8% | $0.0002 | 1.4s | 0% |
| 112 | Mistral Large 2 | 23.5% | $0.0042 | 3.0s | 0% |
| 113 | Hermes 3 70B | 17.3% | $0.0007 | 4.8s | 0% |
| 114 | Ministral 3B | 14.1% | $0.0001 | 710ms | 0% |
| 115 | WizardLM 2 8x22b | 18.5% | $0.0015 | 19.8s | 0% |
| 116 | Gemma 3 4B | 3.8% | $0.0001 | 1.9s | 0% |
| 117 | Rocinante 12B | 5.4% | $0.0006 | 17.7s | 0% |
| 118 | Mistral Large | 21.9% | $0.017 | 29.8s | 0% |
| | Average | 71.84% | | | |

Individual Scenarios

words

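| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6% |
| Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 99.2% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 99.2% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0% |
| LFM2 24B | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98.6% |
| Ministral 3 8B | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 98.1% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 98.1% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 97.8% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 77 | 96.9% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 96.6% |
| Claude 3 Haiku | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 95.5% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 92 | 77 | 95.2% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 54 | 95.0% |
| Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 54 | 94.5% |
| Arcee AI: Trinity Mini | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 77 | 90.7% |
| Qwen 2.5 72B | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 54 | 90.7% |
| Qwen 3 32B | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 27 | 90.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Llama 3.1 Nemotron 70B | 98 | 98 | 98 | 92 | 92 | 92 | 92 | 77 | 77 | 54 | 87.2% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 54 | 0 | 85.2% |
| Llama 3.1 70B | 98 | 92 | 92 | 92 | 92 | 92 | 77 | 77 | 77 | 54 | 84.5% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 98 | 98 | 77 | 54 | 2 | 0 | 0 | 62.9% |
| Mistral NeMO | 100 | 98 | 98 | 77 | 77 | 77 | 27 | 0 | 0 | 0 | 55.6% |
| Hermes 3 70B | 100 | 98 | 98 | 92 | 92 | 0 | 0 | 0 | 0 | 0 | 48.1% |
| Mistral Large | 100 | 100 | 100 | 77 | 77 | 9 | 2 | 0 | 0 | 0 | 46.5% |
| Ministral 8B | 98 | 98 | 92 | 54 | 9 | 0 | 0 | 0 | 0 | 0 | 35.2% |
| Ministral 3B | 98 | 98 | 92 | 27 | 27 | 2 | 0 | 0 | 0 | 0 | 34.5% |
| Mistral Large 2 | 98 | 98 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 20.1% |
| Rocinante 12B | 98 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17.6% |
| Gemma 3 4B | 54 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.3% |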
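
| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7% |
| GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 99.2% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0% |
| GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0% |
| MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9% |
| GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 98.7% |
| GPT-4o, Aug. 6th (temp=0) | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98.4% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 98.2% |
| Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9% |
| Claude Sonnet 4 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 97.6% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 77 | 97.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 96.4% |
| Ministral 3 8B | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 94.8% |
| Claude Opus 4.6 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 94.6% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 54 | 93.9% |
| Mistral Small 3.2 24B | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 77 | 77 | 77 | 91.6% |
| Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 2 | 90.2% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 98 | 98 | 98 | 98 | 92 | 77 | 77 | 77 | 77 | 89.5% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 0 | 88.3% |
| Mistral Medium 3.1 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 77 | 54 | 88.2% |
| Mistral Small 4 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 77 | 77 | 54 | 87.6% |
| Claude Sonnet 4.5 | 100 | 98 | 98 | 92 | 92 | 92 | 92 | 77 | 77 | 54 | 87.4% |
| DeepSeek V3.1 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 54 | 27 | 83.9% |
| Ministral 3 14B | 100 | 98 | 92 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 83.3% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 27 | 27 | 82.3% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 27 | 0 | 80.9% |
| Gemma 3 27B | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 77 | 27 | 27 | 78.3% |
| Mistral Small Creative | 100 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 77 | 54 | 77.3% |
| Qwen 3 32B | 100 | 100 | 100 | 100 | 92 | 92 | 92 | 77 | 9 | 9 | 77.2% |
| Qwen3 235B A22B Instruct 2507 | 100 | 100 | 98 | 92 | 92 | 77 | 77 | 54 | 54 | 2 | 74.6% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 98 | 77 | 54 | 54 | 27 | 9 | 71.9% |
| Grok 4.20 (Beta) | 100 | 100 | 100 | 98 | 92 | 77 | 54 | 54 | 9 | 9 | 69.3% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 98 | 77 | 54 | 54 | 27 | 27 | 9 | 64.6% |
| Gemini 2.5 Flash | 98 | 92 | 92 | 77 | 77 | 77 | 54 | 27 | 27 | 9 | 63.2% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 98 | 98 | 92 | 27 | 9 | 9 | 2 | 0 | 53.6% |
| DeepSeek V3.2 | 100 | 98 | 92 | 77 | 54 | 2 | 2 | 2 | 0 | 0 | 42.7% |
| Writer: Palmyra X5 | 92 | 92 | 77 | 54 | 27 | 9 | 0 | 0 | 0 | 0 | 35.2% |
| WizardLM 2 8x22b | 100 | 98 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 29.7% |
| Ministral 3B | 92 | 92 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23.8% |
| Hermes 3 70B | 77 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.1% |
| Arcee AI: Trinity Mini | 77 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.1% |
| Ministral 8B | 100 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.2% |
| Mistral NeMO | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2% |
| Rocinante 12B | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2% |
| Mistral Large | 54 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.2% |
| LFM2 24B | 54 | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 0 | 6.4% |
| Llama 3.1 70B | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.8% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |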
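
| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 98.1% |
| GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 98.1% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 77 | 97.6% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 92 | 97.5% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 97.2% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 77 | 97.1% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 96.6% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 77 | 96.0% |
| GPT-5.4 Mini | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 77 | 95.7% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 95.1% |
| Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 77 | 77 | 94.7% |
| GPT-4.1 Nano | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 54 | 92.5% |
| Grok 4 Fast | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 77 | 54 | 91.7% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 77 | 77 | 54 | 90.7% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 2 | 89.1% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 77 | 0 | 87.7% |
| GPT-5.4 Nano | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 77 | 77 | 54 | 87.5% |
| Gemma 3 27B | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 77 | 9 | 85.1% |
| Gemini 2.5 Flash Lite | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 2 | 85.1% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 77 | 77 | 9 | 84.7% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 77 | 9 | 84.5% |
| MiniMax M2.5 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 92 | 77 | 0 | 84.3% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 77 | 2 | 83.7% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 77 | 54 | 2 | 80.5% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 98 | 77 | 77 | 27 | 0 | 78.0% |
| Grok 4.20 (Beta) | 100 | 100 | 98 | 98 | 92 | 77 | 77 | 54 | 27 | 27 | 75.2% |
| Llama 3.1 70B | 100 | 100 | 98 | 98 | 92 | 77 | 54 | 27 | 27 | 27 | 70.2% |
| Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Claude 3.7 Sonnet | 98 | 98 | 98 | 92 | 92 | 92 | 54 | 27 | 27 | 9 | 68.9% |
| Z.AI GLM 4.5 | 100 | 98 | 98 | 98 | 92 | 54 | 54 | 54 | 27 | 9 | 68.4% |
| DeepSeek V3 (2025-03-24) | 100 | 98 | 98 | 92 | 92 | 77 | 54 | 54 | 2 | 0 | 66.7% |
| Ministral 3 14B | 100 | 100 | 100 | 92 | 92 | 77 | 77 | 2 | 2 | 0 | 64.2% |
| Claude 3.5 Sonnet | 100 | 100 | 98 | 92 | 77 | 54 | 54 | 27 | 27 | 9 | 63.9% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 54 | 27 | 27 | 2 | 0 | 61.0% |
| Hermes 3 405B | 100 | 98 | 98 | 98 | 92 | 77 | 27 | 9 | 2 | 0 | 60.3% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 98 | 92 | 92 | 54 | 27 | 27 | 9 | 0 | 60.0% |
| Claude Haiku 4.5 | 100 | 100 | 98 | 98 | 77 | 77 | 27 | 9 | 2 | 0 | 59.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 98 | 98 | 27 | 27 | 27 | 9 | 0 | 58.8% |
| Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 92 | 92 | 54 | 27 | 2 | 2 | 0 | 56.9% |
| Claude Sonnet 4 | 98 | 92 | 92 | 77 | 54 | 54 | 27 | 27 | 2 | 0 | 52.4% |
| DeepSeek V3.1 | 100 | 77 | 54 | 54 | 54 | 54 | 54 | 27 | 27 | 9 | 50.9% |
| Qwen 3 32B | 100 | 100 | 77 | 77 | 54 | 54 | 27 | 2 | 0 | 0 | 49.1% |
| LFM2 24B | 100 | 98 | 92 | 77 | 77 | 9 | 9 | 9 | 2 | 2 | 47.6% |
| Mistral NeMO | 100 | 100 | 92 | 77 | 54 | 27 | 2 | 0 | 0 | 0 | 45.2% |
| Llama 3.1 8B | 100 | 100 | 77 | 54 | 54 | 27 | 27 | 0 | 0 | 0 | 43.9% |
| Mistral Large 3 | 100 | 100 | 100 | 27 | 27 | 27 | 27 | 27 | 0 | 0 | 43.7% |
| Mistral Large 2 | 98 | 92 | 77 | 77 | 54 | 27 | 0 | 0 | 0 | 0 | 42.6% |
| DeepSeek-V2 Chat | 92 | 92 | 54 | 54 | 54 | 54 | 27 | 0 | 0 | 0 | 42.6% |
| Ministral 8B | 100 | 98 | 92 | 77 | 27 | 0 | 0 | 0 | 0 | 0 | 39.6% |
| DeepSeek V3.2 | 100 | 98 | 77 | 27 | 9 | 9 | 9 | 9 | 9 | 0 | 34.9% |
| Ministral 3 8B | 100 | 98 | 54 | 54 | 9 | 2 | 0 | 0 | 0 | 0 | 31.6% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Arcee AI: Trinity Mini | 98 | 77 | 77 | 27 | 9 | 0 | 0 | 0 | 0 | 0 | 29.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 92 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28.4% |
| Mistral Small 4 | 100 | 98 | 54 | 9 | 9 | 2 | 0 | 0 | 0 | 0 | 27.2% |
| Qwen 2.5 72B | 92 | 77 | 54 | 27 | 9 | 2 | 2 | 0 | 0 | 0 | 26.3% |
| Ministral 3 3B | 98 | 54 | 54 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 20.9% |
| Qwen3 235B A22B Instruct 2507 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.4% |
| Gemma 3 12B | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.8% |
| Mistral Large | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.9% |
| Claude 3.5 Haiku | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.7% |
| Ministral 3B | 9 | 9 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2.1% |
| Gemma 3 4B | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9% |
| Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Cohere Command R+ (Aug. 2024) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Hermes 3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |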
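
| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.4% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3% |
| Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2% |
| Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.1% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 98.4% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 97.4% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 97.3% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 54 | 95.2% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 77 | 95.1% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 94.8% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 77 | 94.2% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 77 | 94.2% |
| Claude Sonnet 4.6 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 92 | 54 | 91.7% |
| GPT-4.1 Mini | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 77 | 77 | 90.6% |
| Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 0 | 89.7% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 0 | 89.5% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 77 | 54 | 54 | 85.8% |
| GPT-5.4 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 77 | 77 | 27 | 85.4% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 54 | 9 | 83.0% |
| GPT-5.4 Mini | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 54 | 0 | 81.8% |
| Gemini 2.5 Pro | 100 | 98 | 92 | 92 | 92 | 77 | 77 | 77 | 54 | 54 | 81.4% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 98 | 92 | 92 | 92 | 92 | 27 | 0 | 79.4% |
| Claude Sonnet 4 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 2 | 0 | 78.7% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 98 | 92 | 92 | 77 | 54 | 54 | 0 | 76.7% |
| Grok 4 Fast | 100 | 100 | 98 | 92 | 92 | 92 | 77 | 77 | 27 | 9 | 76.6% |
| Z.AI GLM 4.6 | 100 | 100 | 98 | 98 | 92 | 77 | 54 | 54 | 54 | 2 | 72.9% |
| Llama 3.1 70B | 100 | 100 | 98 | 98 | 92 | 77 | 54 | 54 | 27 | 9 | 71.0% |
| Claude Opus 4 | 100 | 100 | 100 | 98 | 77 | 77 | 77 | 27 | 27 | 0 | 68.5% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 98 | 77 | 0 | 0 | 0 | 67.6% |
| GPT-4.1 Nano | 100 | 100 | 98 | 92 | 77 | 77 | 54 | 54 | 2 | 0 | 65.4% |
| LFM2 24B | 100 | 100 | 92 | 92 | 77 | 54 | 54 | 27 | 0 | 0 | 59.6% |
| GPT-5.4 Nano | 100 | 100 | 92 | 77 | 77 | 54 | 27 | 27 | 27 | 9 | 59.2% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 77 | 2 | 0 | 0 | 0 | 57.9% |
| Hermes 3 405B | 100 | 100 | 98 | 98 | 77 | 77 | 27 | 0 | 0 | 0 | 57.9% |
| Stealth: Hunter Alpha | 100 | 100 | 77 | 77 | 54 | 54 | 54 | 54 | 0 | 0 | 56.9% |
| Mistral Large 2 | 100 | 100 | 100 | 92 | 77 | 77 | 0 | 0 | 0 | 0 | 54.7% |
| Grok 4.20 (Beta) | 100 | 98 | 77 | 77 | 54 | 54 | 54 | 27 | 2 | 2 | 54.5% |
| Mistral Large | 100 | 100 | 92 | 92 | 92 | 54 | 0 | 0 | 0 | 0 | 53.0% |
| Grok 4 | 100 | 92 | 92 | 92 | 77 | 54 | 9 | 9 | 2 | 2 | 52.9% |
| Gemma 3 12B | 100 | 92 | 92 | 92 | 92 | 54 | 2 | 2 | 0 | 0 | 52.6% |
| Gemini 2.5 Flash Lite | 98 | 98 | 77 | 77 | 54 | 54 | 27 | 27 | 0 | 0 | 51.4% |
| Claude 3.5 Sonnet | 100 | 98 | 92 | 77 | 54 | 54 | 27 | 9 | 0 | 0 | 51.1% |
| MiniMax M2.5 | 100 | 100 | 100 | 100 | 92 | 9 | 0 | 0 | 0 | 0 | 50.1% |
| Claude Sonnet 4.5 | 100 | 100 | 98 | 77 | 77 | 27 | 9 | 9 | 0 | 0 | 49.9% |
| Mistral Medium 3.1 | 100 | 92 | 92 | 92 | 77 | 9 | 0 | 0 | 0 | 0 | 46.3% |
| Gemini 2.5 Flash Lite (Reasoning) | 98 | 98 | 92 | 77 | 27 | 27 | 9 | 2 | 0 | 0 | 43.2% |
| Ministral 3 8B | 77 | 77 | 77 | 77 | 77 | 27 | 9 | 0 | 0 | 0 | 42.4% |
| Llama 3.1 8B | 100 | 100 | 100 | 98 | 9 | 2 | 0 | 0 | 0 | 0 | 40.9% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 92 | 9 | 0 | 0 | 0 | 0 | 0 | 40.1% |
| Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 77 | 9 | 9 | 2 | 2 | 0 | 0 | 39.9% |
| Z.AI GLM 4.5 | 100 | 100 | 92 | 27 | 27 | 27 | 9 | 9 | 2 | 0 | 39.4% |
| DeepSeek-V2 Chat | 100 | 77 | 77 | 54 | 27 | 27 | 0 | 0 | 0 | 0 | 36.3% |
| Mistral Small 4 (Reasoning) | 100 | 92 | 77 | 27 | 27 | 9 | 9 | 2 | 0 | 0 | 34.4% |
| Mistral Large 3 | 98 | 77 | 77 | 54 | 27 | 9 | 0 | 0 | 0 | 0 | 34.3% |
| Qwen 2.5 72B | 100 | 100 | 77 | 54 | 2 | 0 | 0 | 0 | 0 | 0 | 33.3% |
| Ministral 3 3B | 100 | 92 | 77 | 54 | 9 | 0 | 0 | 0 | 0 | 0 | 33.2% |
| Arcee AI: Trinity Large (Preview) | 100 | 92 | 92 | 27 | 9 | 9 | 2 | 0 | 0 | 0 | 33.2% |
| Grok 4.1 Fast | 92 | 77 | 54 | 54 | 54 | 0 | 0 | 0 | 0 | 0 | 33.1% |
| Cohere Command R+ (Aug. 2024) | 92 | 77 | 54 | 54 | 27 | 9 | 9 | 0 | 0 | 0 | 32.2% |
| Ministral 3 14B | 100 | 100 | 100 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 31.8% |
| Llama 3.1 Nemotron 70B | 100 | 54 | 54 | 27 | 27 | 2 | 0 | 0 | 0 | 0 | 26.3% |
| Gemma 3 27B | 98 | 92 | 27 | 27 | 9 | 0 | 0 | 0 | 0 | 0 | 25.5% |
| Hermes 3 70B | 100 | 100 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25.4% |
| DeepSeek V3.1 | 100 | 54 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.7% |
| DeepSeek V3.2 | 100 | 77 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 20.6% |
| Mistral Small 4 | 98 | 92 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Arcee AI: Trinity Mini | 54 | 54 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 13.6% |
| Qwen 3 32B | 100 | 9 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 12.0% |
| Qwen3 235B A22B Instruct 2507 | 77 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.5% |
| Claude Haiku 4.5 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.9% |
| Ministral 3B | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.8% |
| Mistral Small Creative | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2% |
| Ministral 8B | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2% |
| Qwen 3.5 Plus (2026-02-15) | 77 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.1% |
| Writer: Palmyra X5 | 27 | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 6.5% |
| Mistral Small 3.2 24B | 54 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6.4% |
| ByteDance Seed 1.6 Flash | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.7% |
| Claude 3.5 Haiku | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9% |
| Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |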
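
| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.4% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.1% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 77 | 97.7% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 97.4% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 97.4% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 97.3% |
| GPT-5 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 96.9% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 77 | 96.3% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 95.8% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 54 | 0 | 85.3% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 27 | 0 | 82.7% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 0 | 0 | 78.3% |
| GPT-4.1 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 77 | 27 | 9 | 77.9% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 54 | 9 | 0 | 76.3% |
| GPT-4o, Aug. 6th (temp=1) | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 54 | 9 | 0 | 71.8% |
| Gemini 3 Flash (Preview) | 100 | 92 | 92 | 92 | 92 | 77 | 54 | 54 | 2 | 2 | 65.7% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 98 | 98 | 77 | 77 | 2 | 0 | 0 | 65.3% |
| Grok 4 | 100 | 100 | 98 | 92 | 92 | 77 | 54 | 27 | 9 | 0 | 65.0% |
| Mistral Large 3 | 100 | 100 | 100 | 98 | 92 | 92 | 54 | 9 | 0 | 0 | 64.5% |
| GPT-4o Mini (temp=1) | 100 | 100 | 98 | 92 | 92 | 54 | 27 | 9 | 2 | 2 | 57.6% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 98 | 92 | 54 | 9 | 2 | 0 | 0 | 55.5% |
| MiniMax M2.7 | 100 | 100 | 100 | 98 | 98 | 27 | 0 | 0 | 0 | 0 | 52.4% |
| Gemini 2.5 Pro | 100 | 100 | 98 | 92 | 92 | 9 | 0 | 0 | 0 | 0 | 49.2% |
| Gemini 3.1 Flash Lite (Preview) | 98 | 92 | 77 | 77 | 54 | 54 | 27 | 9 | 2 | 0 | 49.1% |
| LFM2 24B | 98 | 92 | 92 | 92 | 54 | 27 | 27 | 0 | 0 | 0 | 48.3% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 77 | 54 | 54 | 27 | 27 | 9 | 0 | 0 | 44.8% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 77 | 54 | 0 | 0 | 0 | 0 | 0 | 43.1% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 92 | 92 | 77 | 54 | 2 | 0 | 0 | 0 | 0 | 41.7% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 37.7% |
| GPT-4.1 Nano | 100 | 98 | 77 | 27 | 27 | 27 | 0 | 0 | 0 | 0 | 35.8% |
| Ministral 3 14B | 100 | 98 | 77 | 54 | 27 | 0 | 0 | 0 | 0 | 0 | 35.7% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 77 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 28.8% |
| Gemma 3 12B | 100 | 98 | 77 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 28.5% |
| Ministral 3 8B | 100 | 100 | 54 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 28.1% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27.7% |
| GPT-5.4 | 77 | 77 | 54 | 54 | 2 | 0 | 0 | 0 | 0 | 0 | 26.4% |
| Arcee AI: Trinity Large (Preview) | 98 | 92 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24.4% |
| Gemini 2.5 Flash Lite (Reasoning) | 77 | 54 | 54 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 23.8% |
| Grok 4 Fast | 98 | 77 | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 21.4% |
| GPT-5.4 Mini | 100 | 100 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 21.1% |
| GPT-4o Mini (temp=0) | 77 | 77 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.9% |
| Aion 2.0 | 100 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.2% |
| MiniMax M2.5 | 98 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.1% |
| Claude Opus 4 | 98 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.1% |
| Claude Opus 4.5 | 100 | 77 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18.7% |
| DeepSeek V3.1 | 100 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17.7% |
| Claude Opus 4.6 | 92 | 27 | 27 | 9 | 2 | 2 | 0 | 0 | 0 | 0 | 16.0% |
| DeepSeek V3.2 | 100 | 54 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15.5% |
| Writer: Palmyra X5 | 77 | 27 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.2% |
| Qwen3 235B A22B Instruct 2507 | 100 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.9% |
| DeepSeek V3 (2025-03-24) | 98 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.6% |
| Gemini 2.5 Flash (Reasoning) | 92 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.0% |
| GPT-4.1 Mini | 77 | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 11.6% |
| Mistral Medium 3.1 | 77 | 27 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11.4% |
| Grok 4.1 Fast | 100 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11.1% |
| Arcee AI: Trinity Mini | 98 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Mistral Small Creative | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| GPT-5.4 Nano | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Mistral Small 4 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Gemma 3 4B | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.8% |
| Ministral 3 3B | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.8% |
| Claude Sonnet 4 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2% |
| Claude Haiku 4.5 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2% |
| Gemini 2.5 Flash | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.8% |
| Grok 4.20 (Beta) | 54 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.2% |
| Gemma 3 27B | 27 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.6% |
| Qwen 3 32B | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.4% |
| Cohere Command R+ (Aug. 2024) | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.8% |
| DeepSeek-V2 Chat | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.7% |
| Z.AI GLM 4.6 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9% |
| ByteDance Seed 1.6 Flash | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9% |
| Gemini 2.5 Flash Lite | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2% |
| Claude Sonnet 4.6 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2% |
| GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 3.5 Plus (2026-02-15) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Stealth: Hunter Alpha | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Hermes 3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |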