Subordinate conjunction sentence starts

Test: Bad Writing Habits

Avg. Score: 32.5%
Scenarios: 18

Overall Performance

| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
|---|---|---|---|---|---|
| 1 | GPT-5.4 Nano | 44.2% | $0.0057 | 26.3s | 15% |
| 2 | ByteDance Seed 1.6 Flash | 45.7% | $0.0013 | 27.3s | 11% |
| 3 | Gemini 2.5 Flash Lite | 43.8% | $0.0009 | 9.5s | 11% |
| 4 | Stealth: Healer Alpha | 37.9% | $0.0000 | 23.7s | 13% |
| 5 | Gemini 2.5 Flash Lite (Reasoning) | 44.3% | $0.0028 | 30.8s | 10% |
| 6 | Grok 4.20 (Beta) | 38.0% | $0.018 | 15.8s | 12% |
| 7 | Qwen 3 32B | 46.1% | $0.0015 | 54.6s | 10% |
| 8 | Gemma 3 4B | 42.9% | $0.0002 | 20.0s | 9% |
| 9 | Stealth: Hunter Alpha | 41.8% | $0.0000 | 55.0s | 11% |
| 10 | Writer: Palmyra X5 | 44.4% | $0.011 | 22.0s | 9% |
| 11 | Gemma 3 12B | 37.9% | $0.0004 | 41.3s | 10% |
| 12 | Gemini 3.1 Flash Lite (Preview) | 47.6% | $0.0030 | 8.4s | 6% |
| 13 | Rocinante 12B | 48.7% | $0.0014 | 38.4s | 6% |
| 14 | GPT-5.4 Nano (Reasoning, Low) | 39.1% | $0.0055 | 20.6s | 9% |
| 15 | Qwen 3.5 Plus (2026-02-15) | 38.5% | $0.0060 | 31.5s | 9% |
| 16 | Z.AI GLM 5 | 41.3% | $0.0084 | 1.2m | 10% |
| 17 | Mistral Small Creative | 36.9% | $0.0007 | 9.1s | 7% |
| 18 | Z.AI GLM 5 Turbo | 36.5% | $0.0081 | 33.2s | 9% |
| 19 | GPT-5.4 Mini | 33.1% | $0.015 | 16.8s | 10% |
| 20 | Gemma 3 27B | 39.2% | $0.0006 | 52.6s | 8% |
| 21 | Qwen 3.5 397B A17B | 54.2% | $0.014 | 3.0m | 10% |
| 22 | Qwen3 235B A22B Instruct 2507 | 39.3% | $0.0011 | 59.2s | 8% |
| 23 | Gemini 2.5 Flash (Reasoning) | 38.0% | $0.011 | 21.5s | 7% |
| 24 | GPT-5.4 Mini (Reasoning, Low) | 32.8% | $0.015 | 16.8s | 9% |
| 25 | o4 Mini | 29.5% | $0.015 | 25.7s | 9% |
| 26 | Claude Sonnet 4 | 45.8% | $0.032 | 43.7s | 7% |
| 27 | GPT-4o, Aug. 6th (temp=1) | 45.8% | $0.018 | 24.4s | 4% |
| 28 | GPT-5.4 (Reasoning, Low) | 44.9% | $0.055 | 1.4m | 10% |
| 29 | GPT-5 Nano | 38.3% | $0.0042 | 1.4m | 8% |
| 30 | GPT-5.4 Mini (Reasoning) | 32.4% | $0.022 | 28.1s | 9% |
| 31 | Z.AI GLM 4.7 | 36.5% | $0.010 | 1.4m | 9% |
| 32 | Claude Opus 4.6 | 34.7% | $0.078 | 1.2m | 14% |
| 33 | GPT-5.4 Nano (Reasoning) | 30.1% | $0.0061 | 24.5s | 8% |
| 34 | GPT-5.4 | 44.6% | $0.049 | 1.4m | 9% |
| 35 | GPT-5.4 (Reasoning) | 47.6% | $0.089 | 2.6m | 14% |
| 36 | Aion 2.0 | 32.1% | $0.0064 | 1.3m | 9% |
| 37 | GPT-5 Mini | 33.0% | $0.0100 | 57.4s | 8% |
| 38 | Gemini 3 Pro (Preview) | 40.3% | $0.055 | 54.4s | 9% |
| 39 | Cohere Command R+ (Aug. 2024) | 47.1% | $0.020 | 52.5s | 3% |
| 40 | Hermes 3 70B | 47.9% | $0.0010 | 1.2m | 3% |
| 41 | GPT-5.1 | 35.4% | $0.054 | 1.8m | 12% |
| 42 | Claude Opus 4.6 (Reasoning) | 38.8% | $0.088 | 1.4m | 11% |
| 43 | Qwen 3.5 35B | 36.3% | $0.018 | 1.0m | 5% |
| 44 | Mistral Small 4 (Reasoning) | 31.0% | $0.0022 | 30.2s | 4% |
| 45 | Mistral NeMO | 38.7% | $0.0005 | 10.1s | 0% |
| 46 | Qwen 3.5 Flash | 35.4% | $0.0025 | 47.5s | 3% |
| 47 | GPT-4o Mini (temp=1) | 40.9% | $0.0012 | 34.8s | 0% |
| 48 | GPT-4.1 Nano | 36.0% | $0.0007 | 13.3s | 0% |
| 49 | Llama 3.1 Nemotron 70B | 37.6% | $0.0038 | 31.7s | 0% |
| 50 | Ministral 3 14B | 33.2% | $0.0007 | 11.7s | 0% |
| 51 | Llama 3.1 8B | 41.4% | $0.0003 | 1.3m | 0% |
| 52 | Arcee AI: Trinity Mini | 30.9% | $0.0003 | 9.2s | 0% |
| 53 | Claude Haiku 4.5 | 35.3% | $0.011 | 21.6s | 0% |
| 54 | Claude 3 Haiku | 31.4% | $0.0025 | 14.9s | 0% |
| 55 | o4 Mini High | 26.5% | $0.025 | 47.2s | 5% |
| 56 | Gemini 2.5 Flash | 31.2% | $0.0052 | 10.6s | 0% |
| 57 | Grok 4 Fast | 31.9% | $0.0017 | 24.1s | 0% |
| 58 | Mistral Large | 35.1% | $0.014 | 30.9s | 0% |
| 59 | Mistral Small 4 | 30.1% | $0.0014 | 18.2s | 0% |
| 60 | GPT-5 | 34.6% | $0.065 | 2.8m | 10% |
| 61 | Gemini 3 Flash (Preview, Reasoning) | 33.5% | $0.012 | 30.1s | 0% |
| 62 | Z.AI GLM 4.7 Flash | 36.3% | $0.0017 | 1.2m | 0% |
| 63 | Ministral 3 8B | 27.6% | $0.0008 | 19.6s | 0% |
| 64 | Z.AI GLM 4.5 | 31.8% | $0.0051 | 42.1s | 0% |
| 65 | Claude Sonnet 4.6 | 37.6% | $0.031 | 39.3s | 0% |
| 66 | GPT-5.2 | 24.8% | $0.056 | 1.5m | 8% |
| 67 | GPT-4o, May 13th (temp=1) | 33.2% | $0.033 | 14.4s | 0% |
| 68 | Z.AI GLM 4.6 | 32.0% | $0.0065 | 51.5s | 0% |
| 69 | Mistral Large 3 | 28.0% | $0.0033 | 30.3s | 0% |
| 70 | Ministral 8B | 23.9% | $0.0004 | 10.4s | 0% |
| 71 | GPT-4.1 Mini | 25.4% | $0.0027 | 19.0s | 0% |
| 72 | Mistral Medium 3.1 | 28.3% | $0.0048 | 36.5s | 0% |
| 73 | GPT-4o Mini (temp=0) | 27.0% | $0.0012 | 34.8s | 0% |
| 74 | Arcee AI: Trinity Large (Preview) | 27.9% | $0.0000 | 43.6s | 0% |
| 75 | Hermes 3 405B | 29.9% | $0.0032 | 53.2s | 0% |
| 76 | Gemini 3 Flash (Preview) | 25.2% | $0.0078 | 19.6s | 0% |
| 77 | Claude 3.7 Sonnet | 36.6% | $0.042 | 46.7s | 0% |
| 78 | Mistral Large 2 | 26.5% | $0.013 | 29.4s | 0% |
| 79 | Llama 3.1 70B | 23.0% | $0.0015 | 29.4s | 0% |
| 80 | DeepSeek-V2 Chat | 26.5% | $0.0021 | 53.3s | 0% |
| 81 | LFM2 24B | 22.0% | $0.0002 | 28.4s | 0% |
| 82 | MiniMax M2.7 | 28.6% | $0.0040 | 1.1m | 0% |
| 83 | DeepSeek V3.2 | 34.6% | $0.0014 | 1.9m | 0% |
| 84 | Ministral 3B | 18.4% | $0.0001 | 8.1s | 0% |
| 85 | MiniMax M2.5 | 29.0% | $0.0034 | 1.3m | 0% |
| 86 | DeepSeek V3 (2024-12-26) | 25.4% | $0.0021 | 54.6s | 0% |
| 87 | Qwen 2.5 72B | 22.0% | $0.0010 | 36.7s | 0% |
| 88 | Gemini 2.5 Pro | 29.5% | $0.036 | 36.2s | 0% |
| 89 | Nemotron 3 Nano | 25.5% | $0.0010 | 1.1m | 0% |
| 90 | GPT-4.1 | 25.9% | $0.018 | 44.7s | 0% |
| 91 | DeepSeek V3.1 | 31.7% | $0.0020 | 1.8m | 0% |
| 92 | GPT-4o, Aug. 6th (temp=0) | 23.4% | $0.023 | 22.7s | 0% |
| 93 | Grok 4.20 (Beta, Reasoning) | 28.6% | $0.039 | 34.0s | 0% |
| 94 | Qwen 3.5 122B | 29.0% | $0.025 | 1.1m | 0% |
| 95 | GPT-4o, May 13th (temp=0) | 23.3% | $0.035 | 14.1s | 0% |
| 96 | Claude Sonnet 4.5 | 26.0% | $0.035 | 38.1s | 0% |
| 97 | Ministral 3 3B | 13.0% | $0.0005 | 11.1s | 0% |
| 98 | DeepSeek V3 (2025-03-24) | 17.4% | $0.0014 | 39.4s | 0% |
| 99 | Stealth: Aurora Alpha | 12.5% | $0.0000 | 9.8s | 0% |
| 100 | WizardLM 2 8x22b | 27.8% | $0.0026 | 1.8m | 0% |
| 101 | ByteDance Seed 1.6 | 36.2% | $0.013 | 2.5m | 0% |
| 102 | Claude Opus 4.5 | 35.1% | $0.070 | 53.4s | 0% |
| 103 | Claude 3.5 Haiku | 12.2% | $0.0035 | 10.8s | 0% |
| 104 | Qwen 3.5 9B | 22.3% | $0.0011 | 1.4m | 0% |
| 105 | Grok 4.1 Fast | 14.6% | $0.0018 | 37.8s | 0% |
| 106 | ByteDance Seed 2.0 Lite | 31.3% | $0.012 | 2.2m | 0% |
| 107 | Nemotron 3 Super | 21.1% | $0.0000 | 1.4m | 0% |
| 108 | Qwen 3.5 27B | 27.1% | $0.020 | 1.6m | 0% |
| 109 | Inception Mercury 2 | 9.2% | $0.0032 | 7.0s | 0% |
| 110 | MoonshotAI: Kimi K2.5 | 37.5% | $0.019 | 3.2m | 0% |
| 111 | Claude Sonnet 4.6 (Reasoning) | 27.8% | $0.060 | 1.2m | 0% |
| 112 | Grok 4 | 28.4% | $0.048 | 1.7m | 0% |
| 113 | Claude Opus 4 | 39.1% | $0.209 | 1.4m | 8% |
| 114 | Claude 3.5 Sonnet | 16.6% | $0.048 | 35.5s | 0% |
| 115 | Inception Mercury | 3.7% | $0.011 | 17.6s | 0% |
| 116 | ByteDance Seed 2.0 Mini | 38.0% | $0.0045 | 4.9m | 0% |
| 117 | Gemini 3.1 Pro (Preview) | 30.7% | $0.107 | 1.8m | 0% |
| 118 | Mistral Small 3.2 24B | 5.9% | $0.0069 | 5.7m | 0% |

Average score: 32.46%

Individual Scenarios

Detailed Writing Rules

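| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemma 3 27B | 100 | 100 | 100 | 100 | 64 | 92.8% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 70 | 64 | 60 | 78.8% |
| Hermes 3 70B | 100 | 100 | 100 | 71 | 0 | 74.3% |
| Gemma 3 4B | 100 | 100 | 100 | 71 | 0 | 74.3% |
| Stealth: Hunter Alpha | 100 | 100 | 98 | 65 | 0 | 72.6% |
| DeepSeek V3.1 | 100 | 100 | 100 | 62 | 0 | 72.3% |
| Gemini 2.5 Pro | 100 | 100 | 88 | 70 | 0 | 71.6% |
| Gemma 3 12B | 100 | 100 | 79 | 71 | 0 | 70.2% |
| GPT-4o Mini (temp=0) | 100 | 100 | 79 | 65 | 0 | 68.9% |
| Mistral Large 2 | 100 | 88 | 72 | 63 | 0 | 64.5% |
| Gemini 3 Pro (Preview) | 100 | 85 | 52 | 51 | 31 | 63.6% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Rocinante 12B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Llama 3.1 8B | 100 | 100 | 96 | 0 | 0 | 59.2% |
| Z.AI GLM 4.7 | 100 | 62 | 53 | 42 | 0 | 51.3% |
| Claude Sonnet 4.6 | 100 | 85 | 71 | 0 | 0 | 51.2% |
| LFM2 24B | 100 | 77 | 76 | 0 | 0 | 50.5% |
| Writer: Palmyra X5 | 100 | 100 | 49 | 0 | 0 | 49.7% |
| GPT-5.4 (Reasoning) | 88 | 59 | 45 | 28 | 23 | 48.7% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 43 | 0 | 0 | 48.5% |
| Ministral 3 8B | 100 | 70 | 67 | 0 | 0 | 47.4% |
| DeepSeek V3.2 | 100 | 83 | 49 | 0 | 0 | 46.4% |
| Z.AI GLM 5 | 100 | 60 | 60 | 0 | 0 | 44.0% |
| Gemini 2.5 Flash Lite (Reasoning) | 86 | 65 | 63 | 0 | 0 | 42.9% |
| Arcee AI: Trinity Large (Preview) | 100 | 56 | 48 | 0 | 0 | 40.9% |
| Qwen3 235B A22B Instruct 2507 | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Ministral 3 14B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Mistral Small 4 (Reasoning) | 83 | 68 | 44 | 0 | 0 | 39.1% |
| GPT-4.1 | 50 | 49 | 49 | 47 | 0 | 38.8% |
| ByteDance Seed 2.0 Mini | 100 | 93 | 0 | 0 | 0 | 38.5% |
| MoonshotAI: Kimi K2.5 | 100 | 88 | 0 | 0 | 0 | 37.5% |
| Grok 4 | 100 | 83 | 0 | 0 | 0 | 36.7% |
| ByteDance Seed 1.6 Flash | 100 | 50 | 28 | 0 | 0 | 35.4% |
| Stealth: Healer Alpha | 56 | 49 | 35 | 34 | 0 | 34.8% |
| GPT-4o, May 13th (temp=0) | 93 | 75 | 0 | 0 | 0 | 33.4% |
| Qwen 3.5 Flash | 100 | 52 | 14 | 0 | 0 | 33.2% |
| Z.AI GLM 4.5 | 100 | 65 | 0 | 0 | 0 | 33.0% |
| MiniMax M2.7 | 100 | 64 | 0 | 0 | 0 | 32.8% |
| Claude Opus 4 | 70 | 64 | 30 | 0 | 0 | 32.8% |
| Mistral Large | 98 | 38 | 27 | 0 | 0 | 32.5% |
| Grok 4 Fast | 100 | 61 | 0 | 0 | 0 | 32.2% |
| Ministral 8B | 67 | 52 | 42 | 0 | 0 | 32.0% |
| Claude 3 Haiku | 89 | 70 | 0 | 0 | 0 | 31.9% |
| GPT-5.4 (Reasoning, Low) | 96 | 34 | 27 | 0 | 0 | 31.4% |
| WizardLM 2 8x22b | 72 | 42 | 40 | 0 | 0 | 30.9% |
| Z.AI GLM 5 Turbo | 100 | 49 | 0 | 0 | 0 | 29.8% |
| Gemini 2.5 Flash | 78 | 68 | 0 | 0 | 0 | 29.2% |
| Claude Sonnet 4.5 | 100 | 42 | 0 | 0 | 0 | 28.5% |
| GPT-4o, May 13th (temp=1) | 71 | 70 | 0 | 0 | 0 | 28.4% |
| Mistral Small Creative | 71 | 67 | 0 | 0 | 0 | 27.6% |
| Aion 2.0 | 100 | 38 | 0 | 0 | 0 | 27.6% |
| Mistral NeMO | 96 | 37 | 0 | 0 | 0 | 26.6% |
| GPT-5.4 Nano (Reasoning, Low) | 58 | 27 | 25 | 22 | 0 | 26.3% |
| Nemotron 3 Nano | 100 | 31 | 0 | 0 | 0 | 26.2% |
| GPT-5 Mini | 100 | 30 | 0 | 0 | 0 | 26.0% |
| Mistral Small 4 | 68 | 57 | 0 | 0 | 0 | 25.1% |
| Claude Sonnet 4.6 (Reasoning) | 57 | 57 | 0 | 0 | 0 | 22.7% |
| Claude Opus 4.6 | 58 | 55 | 0 | 0 | 0 | 22.6% |
| Grok 4.20 (Beta) | 51 | 29 | 27 | 0 | 0 | 21.5% |
| GPT-5 | 63 | 24 | 20 | 0 | 0 | 21.5% |
| Claude Opus 4.6 (Reasoning) | 57 | 49 | 0 | 0 | 0 | 21.2% |
| GPT-5.4 Nano (Reasoning) | 48 | 27 | 27 | 0 | 0 | 20.4% |
| ByteDance Seed 1.6 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Mistral Large 3 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Claude Haiku 4.5 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| DeepSeek-V2 Chat | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Z.AI GLM 4.7 Flash | 100 | 0 | 0 | 0 | 0 | 20.0% |
| ByteDance Seed 2.0 Lite | 100 | 0 | 0 | 0 | 0 | 20.0% |
| DeepSeek V3 (2024-12-26) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| DeepSeek V3 (2025-03-24) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-4.1 Nano | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Qwen 3.5 35B | 54 | 38 | 0 | 0 | 0 | 18.4% |
| GPT-5.4 Nano | 64 | 25 | 0 | 0 | 0 | 17.8% |
| GPT-4o, Aug. 6th (temp=0) | 88 | 0 | 0 | 0 | 0 | 17.5% |
| Claude Sonnet 4 | 83 | 0 | 0 | 0 | 0 | 16.7% |
| Qwen 2.5 72B | 75 | 0 | 0 | 0 | 0 | 14.9% |
| GPT-4.1 Mini | 69 | 0 | 0 | 0 | 0 | 13.9% |
| GPT-5.4 Mini (Reasoning, Low) | 35 | 33 | 0 | 0 | 0 | 13.7% |
| Gemini 2.5 Flash (Reasoning) | 68 | 0 | 0 | 0 | 0 | 13.5% |
| GPT-5.4 Mini | 34 | 33 | 0 | 0 | 0 | 13.4% |
| GPT-5.4 | 65 | 0 | 0 | 0 | 0 | 13.1% |
| Claude Opus 4.5 | 65 | 0 | 0 | 0 | 0 | 13.0% |
| Qwen 3 32B | 60 | 0 | 0 | 0 | 0 | 12.0% |
| Qwen 3.5 Plus (2026-02-15) | 59 | 0 | 0 | 0 | 0 | 11.8% |
| Qwen 3.5 9B | 58 | 0 | 0 | 0 | 0 | 11.6% |
| Gemini 3 Flash (Preview) | 51 | 0 | 0 | 0 | 0 | 10.1% |
| Qwen 3.5 397B A17B | 28 | 22 | 0 | 0 | 0 | 9.9% |
| Grok 4.20 (Beta, Reasoning) | 48 | 0 | 0 | 0 | 0 | 9.6% |
| Mistral Medium 3.1 | 48 | 0 | 0 | 0 | 0 | 9.6% |
| o4 Mini | 47 | 0 | 0 | 0 | 0 | 9.4% |
| Gemini 3.1 Pro (Preview) | 47 | 0 | 0 | 0 | 0 | 9.3% |
| GPT-5.1 | 24 | 22 | 0 | 0 | 0 | 9.1% |
| GPT-5 Nano | 24 | 21 | 0 | 0 | 0 | 9.1% |
| GPT-5.4 Mini (Reasoning) | 41 | 0 | 0 | 0 | 0 | 8.2% |
| Claude 3.7 Sonnet | 40 | 0 | 0 | 0 | 0 | 7.9% |
| o4 Mini High | 31 | 0 | 0 | 0 | 0 | 6.3% |
| GPT-5.2 | 30 | 0 | 0 | 0 | 0 | 6.0% |
| Qwen 3.5 122B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 3.5 27B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| MiniMax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3.1 Flash Lite (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Nemotron 3 Super | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |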
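
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100 | 100 | 95 | 74 | 71 | 88.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 77 | 0 | 75.4% |
| MiniMax M2.7 | 100 | 99 | 78 | 58 | 33 | 73.6% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 61 | 0 | 72.2% |
| Aion 2.0 | 100 | 80 | 69 | 56 | 36 | 68.2% |
| GPT-5 Mini | 100 | 100 | 75 | 36 | 29 | 68.1% |
| Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 40 | 0 | 67.9% |
| ByteDance Seed 1.6 Flash | 100 | 97 | 45 | 42 | 37 | 64.2% |
| Mistral Medium 3.1 | 100 | 68 | 53 | 51 | 47 | 63.8% |
| Z.AI GLM 5 Turbo | 100 | 60 | 53 | 53 | 52 | 63.5% |
| Rocinante 12B | 100 | 100 | 65 | 52 | 0 | 63.3% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 69 | 47 | 0 | 63.3% |
| GPT-5 | 73 | 70 | 69 | 64 | 38 | 62.7% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 89 | 63 | 60 | 0 | 62.4% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Gemma 3 27B | 100 | 100 | 52 | 46 | 0 | 59.5% |
| MiniMax M2.5 | 100 | 94 | 51 | 47 | 0 | 58.4% |
| Mistral Large 3 | 100 | 100 | 49 | 43 | 0 | 58.3% |
| Writer: Palmyra X5 | 100 | 100 | 44 | 41 | 0 | 57.1% |
| Claude Sonnet 4.5 | 100 | 100 | 43 | 41 | 0 | 56.8% |
| GPT-5.4 Nano | 84 | 65 | 64 | 43 | 20 | 55.2% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 68 | 0 | 0 | 53.5% |
| Grok 4.20 (Beta, Reasoning) | 83 | 69 | 42 | 41 | 31 | 53.3% |
| Claude Opus 4.6 (Reasoning) | 100 | 86 | 79 | 0 | 0 | 53.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 65 | 0 | 0 | 53.0% |
| WizardLM 2 8x22b | 100 | 68 | 36 | 35 | 26 | 53.0% |
| Gemma 3 4B | 100 | 61 | 57 | 46 | 0 | 52.9% |
| GPT-5.1 | 100 | 51 | 45 | 41 | 27 | 52.7% |
| ByteDance Seed 2.0 Mini | 93 | 86 | 83 | 0 | 0 | 52.4% |
| Stealth: Hunter Alpha | 100 | 85 | 76 | 0 | 0 | 52.2% |
| GPT-4.1 Nano | 100 | 100 | 57 | 0 | 0 | 51.5% |
| Claude Haiku 4.5 | 93 | 78 | 46 | 38 | 0 | 51.2% |
| Mistral Large | 100 | 83 | 71 | 0 | 0 | 51.0% |
| GPT-5.4 (Reasoning) | 83 | 78 | 34 | 30 | 28 | 50.5% |
| Grok 4 | 100 | 100 | 52 | 0 | 0 | 50.4% |
| GPT-5.4 Nano (Reasoning, Low) | 96 | 90 | 23 | 21 | 21 | 50.3% |
| Gemini 3 Flash (Preview) | 100 | 100 | 45 | 0 | 0 | 49.0% |
| Gemini 3 Pro (Preview) | 100 | 79 | 65 | 0 | 0 | 48.9% |
| Claude Opus 4.6 | 100 | 82 | 34 | 28 | 0 | 48.9% |
| Claude Sonnet 4 | 100 | 100 | 43 | 0 | 0 | 48.7% |
| Claude 3.7 Sonnet | 83 | 72 | 45 | 43 | 0 | 48.5% |
| GPT-5.4 Mini (Reasoning) | 100 | 70 | 37 | 31 | 0 | 47.6% |
| Grok 4.20 (Beta) | 73 | 55 | 48 | 39 | 22 | 47.3% |
| Nemotron 3 Super | 100 | 70 | 63 | 0 | 0 | 46.6% |
| Z.AI GLM 4.5 | 100 | 71 | 61 | 0 | 0 | 46.5% |
| GPT-4.1 Mini | 100 | 70 | 59 | 0 | 0 | 45.8% |
| Ministral 8B | 100 | 64 | 63 | 0 | 0 | 45.5% |
| Z.AI GLM 5 | 100 | 74 | 50 | 0 | 0 | 44.7% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 56 | 32 | 27 | 0 | 43.1% |
| GPT-4o Mini (temp=1) | 100 | 58 | 52 | 0 | 0 | 42.0% |
| Mistral NeMO | 86 | 62 | 61 | 0 | 0 | 41.8% |
| Mistral Large 2 | 74 | 67 | 56 | 0 | 0 | 39.3% |
| Hermes 3 70B | 100 | 94 | 0 | 0 | 0 | 38.9% |
| GPT-4.1 | 90 | 52 | 51 | 0 | 0 | 38.5% |
| GPT-5.2 | 83 | 58 | 50 | 0 | 0 | 38.1% |
| GPT-5.4 Mini (Reasoning, Low) | 67 | 60 | 29 | 26 | 0 | 36.2% |
| Stealth: Healer Alpha | 100 | 41 | 39 | 0 | 0 | 35.9% |
| Gemma 3 12B | 89 | 51 | 38 | 0 | 0 | 35.7% |
| GPT-5.4 | 66 | 57 | 28 | 26 | 0 | 35.5% |
| Mistral Small 4 (Reasoning) | 94 | 45 | 27 | 0 | 0 | 33.3% |
| DeepSeek V3.2 | 100 | 34 | 32 | 0 | 0 | 33.3% |
| MoonshotAI: Kimi K2.5 | 100 | 62 | 0 | 0 | 0 | 32.3% |
| GPT-5 Nano | 52 | 38 | 38 | 34 | 0 | 32.3% |
| Qwen 3 32B | 57 | 53 | 26 | 26 | 0 | 32.2% |
| Nemotron 3 Nano | 100 | 56 | 0 | 0 | 0 | 31.1% |
| Ministral 3 8B | 100 | 55 | 0 | 0 | 0 | 31.0% |
| Gemini 2.5 Pro | 100 | 49 | 0 | 0 | 0 | 29.8% |
| Grok 4.1 Fast | 100 | 43 | 0 | 0 | 0 | 28.7% |
| Mistral Small Creative | 54 | 41 | 36 | 0 | 0 | 26.4% |
| Gemini 2.5 Flash | 83 | 49 | 0 | 0 | 0 | 26.3% |
| DeepSeek-V2 Chat | 88 | 44 | 0 | 0 | 0 | 26.3% |
| Gemini 2.5 Flash (Reasoning) | 71 | 49 | 0 | 0 | 0 | 24.0% |
| Claude Opus 4.5 | 59 | 54 | 0 | 0 | 0 | 22.5% |
| GPT-5.4 (Reasoning, Low) | 29 | 29 | 28 | 25 | 0 | 22.1% |
| Mistral Small 4 | 55 | 53 | 0 | 0 | 0 | 21.6% |
| o4 Mini | 73 | 34 | 0 | 0 | 0 | 21.4% |
| Claude Opus 4 | 44 | 39 | 19 | 0 | 0 | 20.4% |
| ByteDance Seed 1.6 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Z.AI GLM 4.7 Flash | 100 | 0 | 0 | 0 | 0 | 20.0% |
| DeepSeek V3 (2024-12-26) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3 3B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-5.4 Nano (Reasoning) | 68 | 29 | 0 | 0 | 0 | 19.4% |
| Claude 3 Haiku | 89 | 0 | 0 | 0 | 0 | 17.9% |
| Gemini 3 Flash (Preview, Reasoning) | 44 | 43 | 0 | 0 | 0 | 17.4% |
| o4 Mini High | 29 | 29 | 27 | 0 | 0 | 17.0% |
| GPT-5.4 Mini | 52 | 28 | 0 | 0 | 0 | 16.1% |
| Ministral 3 14B | 76 | 0 | 0 | 0 | 0 | 15.2% |
| LFM2 24B | 69 | 0 | 0 | 0 | 0 | 13.9% |
| Claude 3.5 Sonnet | 67 | 0 | 0 | 0 | 0 | 13.3% |
| DeepSeek V3.1 | 66 | 0 | 0 | 0 | 0 | 13.2% |
| GPT-4o Mini (temp=0) | 64 | 0 | 0 | 0 | 0 | 12.8% |
| Inception Mercury 2 | 63 | 0 | 0 | 0 | 0 | 12.7% |
| GPT-4o, May 13th (temp=0) | 60 | 0 | 0 | 0 | 0 | 12.0% |
| Z.AI GLM 4.6 | 57 | 0 | 0 | 0 | 0 | 11.5% |
| Arcee AI: Trinity Large (Preview) | 55 | 0 | 0 | 0 | 0 | 11.0% |
| Hermes 3 405B | 54 | 0 | 0 | 0 | 0 | 10.8% |
| Qwen 3.5 Flash | 35 | 19 | 0 | 0 | 0 | 10.7% |
| Gemini 3.1 Flash Lite (Preview) | 51 | 0 | 0 | 0 | 0 | 10.2% |
| Qwen 2.5 72B | 51 | 0 | 0 | 0 | 0 | 10.2% |
| Llama 3.1 8B | 43 | 0 | 0 | 0 | 0 | 8.7% |
| Qwen 3.5 35B | 24 | 18 | 0 | 0 | 0 | 8.4% |
| Gemini 3.1 Pro (Preview) | 41 | 0 | 0 | 0 | 0 | 8.3% |
| Z.AI GLM 4.7 | 38 | 0 | 0 | 0 | 0 | 7.6% |
| Qwen 3.5 27B | 21 | 0 | 0 | 0 | 0 | 4.3% |
| GPT-4o, Aug. 6th (temp=0) | 7 | 0 | 0 | 0 | 0 | 1.4% |
| Qwen 3.5 397B A17B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 3.5 122B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 3.5 9B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |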
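
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Llama 3.1 8B | 100 | 100 | 82 | 82 | 0 | 72.8% |
| GPT-4.1 Nano | 100 | 100 | 83 | 77 | 0 | 72.1% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 57 | 0 | 71.4% |
| Claude Opus 4 | 100 | 100 | 100 | 56 | 0 | 71.2% |
| Qwen 3.5 397B A17B | 100 | 100 | 86 | 36 | 20 | 68.6% |
| Mistral Medium 3.1 | 100 | 100 | 80 | 43 | 0 | 64.6% |
| ByteDance Seed 1.6 Flash | 97 | 69 | 59 | 44 | 36 | 61.1% |
| Qwen 3 32B | 93 | 85 | 66 | 61 | 0 | 60.8% |
| Mistral Large | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Writer: Palmyra X5 | 100 | 96 | 74 | 0 | 0 | 54.0% |
| MoonshotAI: Kimi K2.5 | 100 | 85 | 82 | 0 | 0 | 53.3% |
| GPT-4o, Aug. 6th (temp=1) | 96 | 91 | 79 | 0 | 0 | 53.3% |
| Rocinante 12B | 100 | 94 | 67 | 0 | 0 | 52.2% |
| Mistral Small 4 (Reasoning) | 95 | 74 | 45 | 42 | 0 | 51.1% |
| Gemma 3 4B | 100 | 100 | 54 | 0 | 0 | 50.9% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 95 | 31 | 27 | 0 | 50.6% |
| o4 Mini High | 100 | 86 | 35 | 31 | 0 | 50.5% |
| Grok 4.20 (Beta) | 100 | 62 | 50 | 37 | 0 | 49.8% |
| GPT-5.4 Nano (Reasoning) | 74 | 59 | 56 | 55 | 0 | 48.8% |
| Gemini 2.5 Flash Lite (Reasoning) | 76 | 61 | 53 | 47 | 0 | 47.2% |
| GPT-5.4 Nano | 100 | 75 | 49 | 0 | 0 | 44.8% |
| Claude Sonnet 4 | 100 | 64 | 58 | 0 | 0 | 44.4% |
| GPT-5.1 | 81 | 53 | 43 | 36 | 0 | 42.7% |
| Qwen 2.5 72B | 74 | 71 | 65 | 0 | 0 | 42.0% |
| Stealth: Hunter Alpha | 100 | 57 | 52 | 0 | 0 | 41.8% |
| Mistral Small 4 | 95 | 70 | 38 | 0 | 0 | 40.5% |
| LFM2 24B | 70 | 67 | 64 | 0 | 0 | 40.2% |
| GPT-5.4 Mini | 88 | 74 | 39 | 0 | 0 | 40.2% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Llama 3.1 70B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| GPT-5.4 | 100 | 35 | 33 | 30 | 0 | 39.7% |
| Nemotron 3 Nano | 81 | 68 | 47 | 0 | 0 | 39.3% |
| GPT-5.4 (Reasoning, Low) | 100 | 36 | 30 | 30 | 0 | 39.2% |
| Qwen 3.5 35B | 80 | 64 | 38 | 11 | 0 | 38.8% |
| Z.AI GLM 4.6 | 82 | 57 | 53 | 0 | 0 | 38.4% |
| DeepSeek V3.2 | 100 | 88 | 0 | 0 | 0 | 37.7% |
| Z.AI GLM 5 | 83 | 55 | 50 | 0 | 0 | 37.6% |
| Claude Opus 4.5 | 100 | 88 | 0 | 0 | 0 | 37.5% |
| Mistral Large 3 | 100 | 46 | 41 | 0 | 0 | 37.4% |
| Gemini 3 Pro (Preview) | 100 | 86 | 0 | 0 | 0 | 37.2% |
| Arcee AI: Trinity Large (Preview) | 100 | 83 | 0 | 0 | 0 | 36.7% |
| Gemini 2.5 Pro | 75 | 67 | 42 | 0 | 0 | 36.7% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 42 | 39 | 0 | 0 | 36.2% |
| GPT-5 Nano | 97 | 43 | 38 | 0 | 0 | 35.7% |
| Grok 4 | 100 | 78 | 0 | 0 | 0 | 35.6% |
| GPT-4o Mini (temp=1) | 100 | 69 | 0 | 0 | 0 | 33.9% |
| Qwen 3.5 27B | 100 | 69 | 0 | 0 | 0 | 33.8% |
| Ministral 3 14B | 100 | 64 | 0 | 0 | 0 | 32.8% |
| Gemini 2.5 Flash (Reasoning) | 100 | 63 | 0 | 0 | 0 | 32.7% |
| Ministral 3B | 88 | 75 | 0 | 0 | 0 | 32.5% |
| Qwen 3.5 Plus (2026-02-15) | 56 | 55 | 50 | 0 | 0 | 32.1% |
| GPT-4o, May 13th (temp=1) | 88 | 71 | 0 | 0 | 0 | 31.8% |
| Qwen 3.5 9B | 100 | 50 | 9 | 0 | 0 | 31.7% |
| DeepSeek V3 (2024-12-26) | 86 | 71 | 0 | 0 | 0 | 31.5% |
| GPT-5 | 58 | 56 | 43 | 0 | 0 | 31.3% |
| Ministral 8B | 100 | 56 | 0 | 0 | 0 | 31.2% |
| Grok 4 Fast | 100 | 55 | 0 | 0 | 0 | 31.0% |
| GPT-5.4 Mini (Reasoning) | 78 | 41 | 35 | 0 | 0 | 31.0% |
| Claude Sonnet 4.6 | 82 | 71 | 0 | 0 | 0 | 30.7% |
| Qwen3 235B A22B Instruct 2507 | 100 | 52 | 0 | 0 | 0 | 30.3% |
| Nemotron 3 Super | 93 | 54 | 0 | 0 | 0 | 29.4% |
| Mistral Small Creative | 100 | 42 | 0 | 0 | 0 | 28.4% |
| Stealth: Healer Alpha | 100 | 41 | 0 | 0 | 0 | 28.2% |
| Claude Opus 4.6 | 63 | 46 | 0 | 0 | 0 | 21.8% |
| Claude 3.7 Sonnet | 52 | 51 | 0 | 0 | 0 | 20.6% |
| Z.AI GLM 5 Turbo | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| ByteDance Seed 1.6 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| MiniMax M2.7 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-4.1 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Mistral Large 2 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Gemma 3 12B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Claude 3 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-5.4 (Reasoning) | 63 | 30 | 0 | 0 | 0 | 18.8% |
| WizardLM 2 8x22b | 48 | 42 | 0 | 0 | 0 | 17.9% |
| GPT-4o, Aug. 6th (temp=0) | 82 | 0 | 0 | 0 | 0 | 16.4% |
| Hermes 3 70B | 79 | 0 | 0 | 0 | 0 | 15.9% |
| o4 Mini | 42 | 34 | 0 | 0 | 0 | 15.3% |
| Gemini 3 Flash (Preview, Reasoning) | 72 | 0 | 0 | 0 | 0 | 14.5% |
| GPT-4o, May 13th (temp=0) | 71 | 0 | 0 | 0 | 0 | 14.3% |
| Arcee AI: Trinity Mini | 66 | 0 | 0 | 0 | 0 | 13.2% |
| GPT-5 Mini | 34 | 31 | 0 | 0 | 0 | 13.0% |
| Grok 4.1 Fast | 64 | 0 | 0 | 0 | 0 | 12.8% |
| Gemini 2.5 Flash Lite | 62 | 0 | 0 | 0 | 0 | 12.3% |
| Qwen 3.5 Flash | 60 | 0 | 0 | 0 | 0 | 11.9% |
| GPT-5.2 | 29 | 29 | 0 | 0 | 0 | 11.5% |
| GPT-4.1 Mini | 54 | 0 | 0 | 0 | 0 | 10.9% |
| Claude Sonnet 4.5 | 51 | 0 | 0 | 0 | 0 | 10.2% |
| Gemini 3.1 Pro (Preview) | 49 | 0 | 0 | 0 | 0 | 9.8% |
| Stealth: Aurora Alpha | 49 | 0 | 0 | 0 | 0 | 9.8% |
| Mistral Small 3.2 24B | 45 | 2 | 0 | 0 | 0 | 9.4% |
| DeepSeek V3.1 | 46 | 0 | 0 | 0 | 0 | 9.2% |
| Gemma 3 27B | 42 | 0 | 0 | 0 | 0 | 8.5% |
| Aion 2.0 | 36 | 0 | 0 | 0 | 0 | 7.2% |
| Z.AI GLM 4.7 | 34 | 0 | 0 | 0 | 0 | 6.8% |
| Ministral 3 8B | 33 | 0 | 0 | 0 | 0 | 6.6% |
| Qwen 3.5 122B | 29 | 0 | 0 | 0 | 0 | 5.7% |
| Claude Opus 4.6 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4.6 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| MiniMax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3.1 Flash Lite (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek-V2 Chat | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 4.7 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 2.0 Lite | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Cohere Command R+ (Aug. 2024) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0% |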
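
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| GPT-4o Mini (temp=1) | 100 | 100 | 96 | 93 | 75 | 92.7% |
| Mistral NeMO | 100 | 100 | 85 | 82 | 47 | 82.7% |
| Claude Haiku 4.5 | 100 | 100 | 68 | 52 | 45 | 73.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 0 | 0 | 60.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Claude Opus 4.6 (Reasoning) | 91 | 89 | 54 | 50 | 0 | 56.8% |
| Writer: Palmyra X5 | 100 | 88 | 84 | 0 | 0 | 54.5% |
| Claude Sonnet 4 | 100 | 100 | 69 | 0 | 0 | 53.9% |
| Stealth: Hunter Alpha | 100 | 57 | 57 | 53 | 0 | 53.5% |
| Gemma 3 27B | 100 | 100 | 65 | 0 | 0 | 53.0% |
| Gemma 3 12B | 75 | 68 | 63 | 56 | 0 | 52.2% |
| Z.AI GLM 4.5 | 100 | 82 | 78 | 0 | 0 | 52.0% |
| Qwen3 235B A22B Instruct 2507 | 77 | 68 | 65 | 48 | 0 | 51.7% |
| GPT-4o Mini (temp=0) | 100 | 82 | 66 | 0 | 0 | 49.6% |
| Mistral Small 4 (Reasoning) | 100 | 81 | 62 | 0 | 0 | 48.5% |
| Gemini 3 Pro (Preview) | 100 | 100 | 41 | 0 | 0 | 48.3% |
| GPT-5 | 100 | 43 | 41 | 28 | 25 | 47.3% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 60 | 60 | 0 | 0 | 44.0% |
| Grok 4 | 100 | 63 | 56 | 0 | 0 | 43.8% |
| Stealth: Healer Alpha | 88 | 45 | 44 | 33 | 0 | 42.1% |
| Claude 3.7 Sonnet | 100 | 56 | 50 | 0 | 0 | 41.1% |
| Claude Opus 4.5 | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Z.AI GLM 4.6 | 100 | 100 | 0 | 0 | 0 | 40.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Hermes 3 70B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Llama 3.1 8B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| GPT-4o, May 13th (temp=0) | 100 | 98 | 0 | 0 | 0 | 39.6% |
| Gemma 3 4B | 71 | 69 | 56 | 0 | 0 | 39.4% |
| o4 Mini High | 85 | 75 | 36 | 0 | 0 | 39.2% |
| Ministral 3 14B | 100 | 94 | 0 | 0 | 0 | 38.9% |
| Mistral Large | 74 | 70 | 50 | 0 | 0 | 38.8% |
| Cohere Command R+ (Aug. 2024) | 100 | 93 | 0 | 0 | 0 | 38.5% |
| GPT-5.4 Nano | 47 | 46 | 40 | 27 | 26 | 37.3% |
| GPT-4.1 Nano | 100 | 81 | 0 | 0 | 0 | 36.1% |
| Gemini 2.5 Flash Lite | 61 | 56 | 56 | 0 | 0 | 34.5% |
| Mistral Large 2 | 74 | 60 | 36 | 0 | 0 | 34.1% |
| DeepSeek V3.1 | 100 | 69 | 0 | 0 | 0 | 33.9% |
| Gemini 2.5 Flash (Reasoning) | 67 | 56 | 47 | 0 | 0 | 33.9% |
| GPT-5.4 (Reasoning) | 100 | 37 | 32 | 0 | 0 | 33.7% |
| Z.AI GLM 5 Turbo | 100 | 67 | 0 | 0 | 0 | 33.3% |
| GPT-4o, May 13th (temp=1) | 93 | 71 | 0 | 0 | 0 | 32.8% |
| Qwen 3 32B | 56 | 54 | 52 | 0 | 0 | 32.4% |
| Qwen 3.5 397B A17B | 97 | 27 | 23 | 15 | 0 | 32.3% |
| Claude Sonnet 4.6 | 82 | 78 | 0 | 0 | 0 | 32.0% |
| GPT-5.4 (Reasoning, Low) | 64 | 60 | 34 | 0 | 0 | 31.4% |
| Rocinante 12B | 98 | 58 | 0 | 0 | 0 | 31.2% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 55 | 0 | 0 | 0 | 31.0% |
| Gemini 3 Flash (Preview) | 100 | 55 | 0 | 0 | 0 | 31.0% |
| MiniMax M2.7 | 91 | 60 | 0 | 0 | 0 | 30.1% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 45 | 0 | 0 | 0 | 28.9% |
| Mistral Medium 3.1 | 59 | 49 | 35 | 0 | 0 | 28.5% |
| o4 Mini | 100 | 34 | 0 | 0 | 0 | 26.8% |
| GPT-5.4 Nano (Reasoning) | 68 | 35 | 29 | 0 | 0 | 26.5% |
| Grok 4.20 (Beta) | 53 | 45 | 33 | 0 | 0 | 26.4% |
| GPT-5.1 | 37 | 32 | 31 | 30 | 0 | 26.0% |
| Grok 4 Fast | 68 | 59 | 0 | 0 | 0 | 25.3% |
| Inception Mercury 2 | 85 | 35 | 0 | 0 | 0 | 24.0% |
| Mistral Small Creative | 72 | 42 | 0 | 0 | 0 | 23.0% |
| Z.AI GLM 4.7 Flash | 60 | 50 | 0 | 0 | 0 | 22.0% |
| DeepSeek-V2 Chat | 61 | 49 | 0 | 0 | 0 | 22.0% |
| GPT-5 Nano | 66 | 22 | 22 | 0 | 0 | 22.0% |
| GPT-5.4 | 74 | 33 | 0 | 0 | 0 | 21.4% |
| Z.AI GLM 4.7 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| ByteDance Seed 2.0 Mini | 100 | 0 | 0 | 0 | 0 | 20.0% |
| ByteDance Seed 2.0 Lite | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Claude 3.5 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| DeepSeek V3 (2025-03-24) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| LFM2 24B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-5.4 Mini | 36 | 33 | 30 | 0 | 0 | 19.9% |
| Grok 4.1 Fast | 59 | 39 | 0 | 0 | 0 | 19.6% |
| Mistral Small 4 | 51 | 46 | 0 | 0 | 0 | 19.5% |
| Mistral Small 3.2 24B | 87 | 6 | 4 | 0 | 0 | 19.4% |
| Claude Sonnet 4.5 | 96 | 0 | 0 | 0 | 0 | 19.2% |
| Gemini 2.5 Flash Lite (Reasoning) | 94 | 0 | 0 | 0 | 0 | 18.9% |
| Arcee AI: Trinity Large (Preview) | 91 | 0 | 0 | 0 | 0 | 18.2% |
| GPT-5.4 Nano (Reasoning, Low) | 40 | 27 | 22 | 0 | 0 | 17.9% |
| Mistral Large 3 | 88 | 0 | 0 | 0 | 0 | 17.7% |
| GPT-5 Mini | 32 | 28 | 27 | 0 | 0 | 17.5% |
| GPT-4.1 Mini | 86 | 0 | 0 | 0 | 0 | 17.2% |
| Claude Opus 4 | 48 | 37 | 0 | 0 | 0 | 16.9% |
| GPT-5.4 Mini (Reasoning) | 43 | 39 | 0 | 0 | 0 | 16.4% |
| DeepSeek V3 (2024-12-26) | 81 | 0 | 0 | 0 | 0 | 16.1% |
| MiniMax M2.5 | 71 | 0 | 0 | 0 | 0 | 14.3% |
| Qwen 3.5 35B | 26 | 26 | 17 | 0 | 0 | 13.8% |
| MoonshotAI: Kimi K2.5 | 68 | 0 | 0 | 0 | 0 | 13.7% |
| GPT-4o, Aug. 6th (temp=0) | 68 | 0 | 0 | 0 | 0 | 13.7% |
| Claude Opus 4.6 | 65 | 0 | 0 | 0 | 0 | 13.0% |
| Gemini 2.5 Pro | 63 | 0 | 0 | 0 | 0 | 12.5% |
| ByteDance Seed 1.6 Flash | 57 | 0 | 0 | 0 | 0 | 11.4% |
| Ministral 3 8B | 55 | 0 | 0 | 0 | 0 | 11.0% |
| GPT-5.2 | 27 | 26 | 0 | 0 | 0 | 10.7% |
| Aion 2.0 | 53 | 0 | 0 | 0 | 0 | 10.6% |
| DeepSeek V3.2 | 50 | 0 | 0 | 0 | 0 | 10.0% |
| GPT-4.1 | 49 | 0 | 0 | 0 | 0 | 9.7% |
| Qwen 3.5 Flash | 31 | 15 | 0 | 0 | 0 | 9.3% |
| Nemotron 3 Super | 44 | 0 | 0 | 0 | 0 | 8.8% |
| Nemotron 3 Nano | 39 | 0 | 0 | 0 | 0 | 7.8% |
| GPT-5.4 Mini (Reasoning, Low) | 38 | 0 | 0 | 0 | 0 | 7.6% |
| Qwen 3.5 27B | 36 | 1 | 0 | 0 | 0 | 7.4% |
| Qwen 3.5 122B | 19 | 0 | 0 | 0 | 0 | 3.9% |
| Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.20 (Beta, Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3 Flash (Preview, Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 3.5 9B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |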
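
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 98 | 85 | 69 | 90.4% |
| DeepSeek V3.2 | 100 | 100 | 97 | 95 | 0 | 78.5% |
| Gemma 3 4B | 100 | 100 | 100 | 85 | 0 | 76.9% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 68 | 0 | 73.5% |
| Qwen 3.5 397B A17B | 100 | 96 | 69 | 63 | 26 | 70.9% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 91 | 66 | 49 | 43 | 69.8% |
| GPT-4o Mini (temp=1) | 100 | 88 | 81 | 81 | 0 | 69.8% |
| Arcee AI: Trinity Mini | 100 | 94 | 77 | 76 | 0 | 69.4% |
| Z.AI GLM 4.6 | 100 | 79 | 75 | 71 | 0 | 65.1% |
| Qwen 3.5 Flash | 100 | 100 | 61 | 48 | 0 | 61.9% |
| Claude Sonnet 4 | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Hermes 3 70B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| ByteDance Seed 1.6 | 100 | 100 | 91 | 0 | 0 | 58.2% |
| Stealth: Healer Alpha | 100 | 81 | 57 | 42 | 0 | 55.8% |
| Gemini 2.5 Flash Lite | 100 | 100 | 68 | 0 | 0 | 53.7% |
| Claude 3 Haiku | 93 | 78 | 71 | 0 | 0 | 48.4% |
| Qwen 3.5 27B | 100 | 100 | 41 | 0 | 0 | 48.2% |
| GPT-4o Mini (temp=0) | 85 | 81 | 70 | 0 | 0 | 47.2% |
| GPT-5.4 | 72 | 70 | 58 | 34 | 0 | 46.9% |
| Grok 4 | 100 | 75 | 52 | 0 | 0 | 45.3% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 63 | 60 | 0 | 0 | 44.4% |
| GPT-4o, May 13th (temp=1) | 88 | 75 | 57 | 0 | 0 | 44.0% |
| Gemini 2.5 Flash | 100 | 70 | 44 | 0 | 0 | 42.9% |
| Z.AI GLM 4.7 Flash | 100 | 59 | 54 | 0 | 0 | 42.6% |
| Gemma 3 12B | 100 | 56 | 55 | 0 | 0 | 42.2% |
| Mistral Large 2 | 69 | 68 | 68 | 0 | 0 | 41.1% |
| Claude Haiku 4.5 | 75 | 66 | 64 | 0 | 0 | 40.9% |
| Z.AI GLM 5 | 100 | 60 | 42 | 0 | 0 | 40.4% |
| Claude Sonnet 4.5 | 100 | 100 | 0 | 0 | 0 | 40.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Ministral 8B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| GPT-5.4 Nano (Reasoning, Low) | 86 | 55 | 27 | 27 | 0 | 39.1% |
| GPT-5.4 (Reasoning, Low) | 100 | 60 | 35 | 0 | 0 | 38.9% |
| Llama 3.1 Nemotron 70B | 100 | 93 | 0 | 0 | 0 | 38.5% |
| Mistral Small 4 (Reasoning) | 100 | 49 | 41 | 0 | 0 | 38.0% |
| Claude Opus 4.6 (Reasoning) | 100 | 51 | 36 | 0 | 0 | 37.3% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 48 | 39 | 0 | 0 | 37.3% |
| GPT-5 Mini | 100 | 43 | 42 | 0 | 0 | 36.9% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 45 | 37 | 0 | 0 | 36.3% |
| Mistral Large | 90 | 89 | 0 | 0 | 0 | 35.9% |
| Claude Opus 4.5 | 67 | 55 | 53 | 0 | 0 | 34.8% |
| GPT-4.1 Mini | 98 | 75 | 0 | 0 | 0 | 34.5% |
| Gemma 3 27B | 100 | 70 | 0 | 0 | 0 | 34.1% |
| LFM2 24B | 100 | 68 | 0 | 0 | 0 | 33.7% |
| Gemini 2.5 Flash (Reasoning) | 100 | 66 | 0 | 0 | 0 | 33.2% |
| Claude Opus 4 | 70 | 56 | 37 | 0 | 0 | 32.8% |
| Z.AI GLM 4.5 | 88 | 74 | 0 | 0 | 0 | 32.2% |
| Mistral Small Creative | 100 | 59 | 0 | 0 | 0 | 31.8% |
| Gemini 3.1 Pro (Preview) | 89 | 64 | 0 | 0 | 0 | 30.7% |
| Claude Sonnet 4.6 | 83 | 68 | 0 | 0 | 0 | 30.4% |
| Claude Opus 4.6 | 54 | 51 | 46 | 0 | 0 | 30.2% |
| ByteDance Seed 1.6 Flash | 57 | 46 | 46 | 0 | 0 | 29.8% |
| DeepSeek V3 (2024-12-26) | 81 | 68 | 0 | 0 | 0 | 29.6% |
| MiniMax M2.5 | 74 | 68 | 0 | 0 | 0 | 28.5% |
| Gemini 3 Pro (Preview) | 100 | 42 | 0 | 0 | 0 | 28.4% |
| GPT-5.4 Nano | 46 | 41 | 26 | 24 | 0 | 27.4% |
| Rocinante 12B | 69 | 64 | 0 | 0 | 0 | 26.7% |
| Qwen3 235B A22B Instruct 2507 | 68 | 57 | 0 | 0 | 0 | 24.9% |
| o4 Mini | 79 | 43 | 0 | 0 | 0 | 24.4% |
| GPT-5.4 Nano (Reasoning) | 52 | 33 | 27 | 0 | 0 | 22.3% |
| Mistral Medium 3.1 | 51 | 50 | 0 | 0 | 0 | 20.0% |
| Qwen 3.5 122B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| MoonshotAI: Kimi K2.5 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Mistral Large 3 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-4o, May 13th (temp=0) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| ByteDance Seed 2.0 Lite | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Claude 3.5 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-5 Nano | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Mistral Small 3.2 24B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Nemotron 3 Nano | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Mistral Small 4 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Qwen 2.5 72B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3 14B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-5.4 (Reasoning) | 63 | 35 | 0 | 0 | 0 | 19.6% |
| MiniMax M2.7 | 50 | 45 | 0 | 0 | 0 | 18.9% |
| Z.AI GLM 4.7 | 91 | 0 | 0 | 0 | 0 | 18.2% |
| Stealth: Aurora Alpha | 77 | 0 | 0 | 0 | 0 | 15.4% |
| GPT-4.1 Nano | 77 | 0 | 0 | 0 | 0 | 15.4% |
| Llama 3.1 8B | 76 | 0 | 0 | 0 | 0 | 15.2% |
| DeepSeek V3.1 | 75 | 0 | 0 | 0 | 0 | 14.9% |
| Qwen 3 32B | 74 | 0 | 0 | 0 | 0 | 14.7% |
| Ministral 3B | 71 | 0 | 0 | 0 | 0 | 14.2% |
| GPT-5.4 Mini | 43 | 25 | 0 | 0 | 0 | 13.5% |
| GPT-5.2 | 36 | 29 | 0 | 0 | 0 | 13.0% |
| DeepSeek-V2 Chat | 63 | 0 | 0 | 0 | 0 | 12.7% |
| GPT-5 | 30 | 29 | 0 | 0 | 0 | 11.8% |
| Grok 4.20 (Beta) | 40 | 18 | 0 | 0 | 0 | 11.6% |
| Gemini 2.5 Pro | 54 | 0 | 0 | 0 | 0 | 10.9% |
| Claude 3.7 Sonnet | 54 | 0 | 0 | 0 | 0 | 10.8% |
| Qwen 3.5 9B | 52 | 0 | 0 | 0 | 0 | 10.4% |
| Grok 4.20 (Beta, Reasoning) | 49 | 0 | 0 | 0 | 0 | 9.7% |
| Gemini 3 Flash (Preview) | 46 | 0 | 0 | 0 | 0 | 9.3% |
| Grok 4 Fast | 45 | 0 | 0 | 0 | 0 | 9.0% |
| Inception Mercury 2 | 42 | 0 | 0 | 0 | 0 | 8.4% |
| Stealth: Hunter Alpha | 41 | 0 | 0 | 0 | 0 | 8.2% |
| Aion 2.0 | 39 | 0 | 0 | 0 | 0 | 7.8% |
| o4 Mini High | 33 | 0 | 0 | 0 | 0 | 6.5% |
| WizardLM 2 8x22b | 27 | 0 | 0 | 0 | 0 | 5.3% |
| Qwen 3.5 35B | 4 | 0 | 0 | 0 | 0 | 0.7% |
| Z.AI GLM 5 Turbo | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4.6 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Mini (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Nemotron 3 Super | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |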
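
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 91 | 0 | 78.2% |
| Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 88 | 0 | 77.5% |
| Rocinante 12B | 100 | 100 | 100 | 79 | 0 | 75.7% |
| Gemma 3 4B | 100 | 100 | 100 | 59 | 0 | 71.8% |
| Qwen3 235B A22B Instruct 2507 | 98 | 94 | 65 | 58 | 38 | 70.7% |
| Claude 3 Haiku | 100 | 98 | 77 | 72 | 0 | 69.5% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 61 | 52 | 0 | 62.6% |
| GPT-4o Mini (temp=0) | 100 | 99 | 57 | 50 | 0 | 61.2% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Mistral NeMO | 100 | 100 | 96 | 0 | 0 | 59.2% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 90 | 0 | 0 | 58.0% |
| Qwen 3.5 397B A17B | 100 | 95 | 55 | 20 | 20 | 57.9% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 86 | 0 | 0 | 57.2% |
| Gemma 3 27B | 99 | 77 | 53 | 43 | 0 | 54.3% |
| GPT-4o, May 13th (temp=0) | 97 | 59 | 58 | 55 | 0 | 53.8% |
| Gemma 3 12B | 100 | 97 | 69 | 0 | 0 | 53.3% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 63 | 48 | 46 | 0 | 51.3% |
| WizardLM 2 8x22b | 85 | 48 | 47 | 33 | 29 | 48.4% |
| Z.AI GLM 4.7 | 74 | 66 | 64 | 38 | 0 | 48.2% |
| Writer: Palmyra X5 | 100 | 90 | 42 | 0 | 0 | 46.4% |
| Arcee AI: Trinity Large (Preview) | 100 | 69 | 60 | 0 | 0 | 45.8% |
| Gemini 2.5 Flash Lite | 100 | 60 | 60 | 0 | 0 | 44.1% |
| Stealth: Healer Alpha | 91 | 76 | 39 | 0 | 0 | 41.1% |
| DeepSeek-V2 Chat | 100 | 100 | 0 | 0 | 0 | 40.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Llama 3.1 8B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Qwen 3 32B | 100 | 98 | 0 | 0 | 0 | 39.6% |
| Mistral Small 4 | 88 | 67 | 43 | 0 | 0 | 39.6% |
| Z.AI GLM 5 Turbo | 58 | 52 | 44 | 41 | 0 | 39.1% |
| GPT-5 | 70 | 61 | 21 | 20 | 18 | 38.3% |
| MiniMax M2.5 | 75 | 60 | 53 | 0 | 0 | 37.5% |
| Grok 4.20 (Beta) | 72 | 65 | 26 | 21 | 0 | 36.9% |
| GPT-5.4 Nano (Reasoning) | 87 | 75 | 21 | 0 | 0 | 36.7% |
| GPT-5.4 Nano | 66 | 47 | 44 | 22 | 0 | 35.8% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 75 | 0 | 0 | 0 | 34.9% |
| Claude 3.5 Sonnet | 100 | 74 | 0 | 0 | 0 | 34.7% |
| Nemotron 3 Nano | 100 | 72 | 0 | 0 | 0 | 34.4% |
| GPT-4.1 Nano | 100 | 67 | 0 | 0 | 0 | 33.3% |
| Claude Sonnet 4.6 | 100 | 66 | 0 | 0 | 0 | 33.2% |
| Claude 3.7 Sonnet | 85 | 40 | 40 | 0 | 0 | 33.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 65 | 0 | 0 | 0 | 33.0% |
| Mistral Large 2 | 100 | 65 | 0 | 0 | 0 | 33.0% |
| DeepSeek V3.1 | 94 | 70 | 0 | 0 | 0 | 32.9% |
| GPT-5.4 Mini (Reasoning) | 83 | 44 | 37 | 0 | 0 | 32.8% |
| Z.AI GLM 4.7 Flash | 85 | 72 | 0 | 0 | 0 | 31.3% |
| Gemini 3.1 Pro (Preview) | 45 | 41 | 38 | 32 | 0 | 31.3% |
| Claude Opus 4.6 | 74 | 48 | 34 | 0 | 0 | 31.2% |
| GPT-5.4 | 44 | 30 | 28 | 27 | 26 | 31.2% |
| Claude Sonnet 4 | 56 | 55 | 40 | 0 | 0 | 30.1% |
| GPT-5 Nano | 61 | 40 | 29 | 20 | 0 | 30.0% |
| o4 Mini High | 85 | 34 | 30 | 0 | 0 | 29.9% |
| Qwen 2.5 72B | 63 | 46 | 41 | 0 | 0 | 29.8% |
| Gemini 2.5 Pro | 64 | 47 | 37 | 0 | 0 | 29.6% |
| GPT-5.4 (Reasoning, Low) | 86 | 30 | 27 | 0 | 0 | 28.6% |
| Mistral Medium 3.1 | 100 | 42 | 0 | 0 | 0 | 28.4% |
| Mistral Small Creative | 100 | 39 | 0 | 0 | 0 | 27.8% |
| Ministral 8B | 71 | 67 | 0 | 0 | 0 | 27.6% |
| Z.AI GLM 4.6 | 87 | 47 | 0 | 0 | 0 | 26.8% |
| GPT-4.1 | 45 | 44 | 42 | 0 | 0 | 26.3% |
| Gemini 2.5 Flash | 79 | 50 | 0 | 0 | 0 | 25.9% |
| GPT-5 Mini | 43 | 29 | 28 | 22 | 0 | 24.4% |
| Stealth: Aurora Alpha | 52 | 40 | 29 | 0 | 0 | 24.2% |
| Claude Opus 4 | 55 | 31 | 31 | 0 | 0 | 23.5% |
| Mistral Small 4 (Reasoning) | 48 | 34 | 33 | 0 | 0 | 22.9% |
| GPT-5.4 Nano (Reasoning, Low) | 47 | 45 | 19 | 0 | 0 | 22.4% |
| DeepSeek V3 (2024-12-26) | 57 | 54 | 0 | 0 | 0 | 22.4% |
| GPT-5.4 (Reasoning) | 55 | 54 | 0 | 0 | 0 | 21.9% |
| GPT-5.1 | 44 | 23 | 21 | 20 | 0 | 21.8% |
| Claude Opus 4.6 (Reasoning) | 40 | 37 | 31 | 0 | 0 | 21.8% |
| Grok 4.20 (Beta, Reasoning) | 38 | 38 | 30 | 0 | 0 | 21.2% |
| MoonshotAI: Kimi K2.5 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| ByteDance Seed 1.6 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| ByteDance Seed 2.0 Mini | 100 | 0 | 0 | 0 | 0 | 20.0% |
| DeepSeek V3.2 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| DeepSeek V3 (2025-03-24) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Hermes 3 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 70B | 98 | 0 | 0 | 0 | 0 | 19.6% |
| Grok 4 | 97 | 0 | 0 | 0 | 0 | 19.4% |
| Ministral 3 8B | 49 | 44 | 0 | 0 | 0 | 18.6% |
| Claude Opus 4.5 | 92 | 0 | 0 | 0 | 0 | 18.3% |
| Stealth: Hunter Alpha | 53 | 36 | 0 | 0 | 0 | 17.9% |
| Gemini 3 Flash (Preview) | 47 | 41 | 0 | 0 | 0 | 17.5% |
| Z.AI GLM 5 | 45 | 43 | 0 | 0 | 0 | 17.5% |
| Claude Sonnet 4.5 | 47 | 37 | 0 | 0 | 0 | 16.9% |
| Hermes 3 405B | 83 | 0 | 0 | 0 | 0 | 16.7% |
| Claude Haiku 4.5 | 40 | 40 | 0 | 0 | 0 | 16.1% |
| Grok 4 Fast | 47 | 32 | 0 | 0 | 0 | 15.9% |
| o4 Mini | 29 | 26 | 21 | 0 | 0 | 15.3% |
| Qwen 3.5 122B | 37 | 36 | 0 | 0 | 0 | 14.8% |
| Gemini 3 Pro (Preview) | 42 | 30 | 0 | 0 | 0 | 14.4% |
| Z.AI GLM 4.5 | 67 | 0 | 0 | 0 | 0 | 13.3% |
| GPT-5.4 Mini | 66 | 0 | 0 | 0 | 0 | 13.2% |
| Qwen 3.5 9B | 65 | 0 | 0 | 0 | 0 | 13.0% |
| Mistral Large | 63 | 0 | 0 | 0 | 0 | 12.7% |
| GPT-4.1 Mini | 63 | 0 | 0 | 0 | 0 | 12.5% |
| Ministral 3 14B | 54 | 0 | 0 | 0 | 0 | 10.9% |
| Mistral Large 3 | 53 | 0 | 0 | 0 | 0 | 10.5% |
| Inception Mercury 2 | 28 | 23 | 0 | 0 | 0 | 10.1% |
| Grok 4.1 Fast | 45 | 0 | 0 | 0 | 0 | 9.1% |
| MiniMax M2.7 | 39 | 0 | 0 | 0 | 0 | 7.8% |
| ByteDance Seed 1.6 Flash | 38 | 0 | 0 | 0 | 0 | 7.6% |
| Qwen 3.5 35B | 31 | 0 | 0 | 0 | 0 | 6.3% |
| GPT-5.4 Mini (Reasoning, Low) | 31 | 0 | 0 | 0 | 0 | 6.3% |
| GPT-5.2 | 23 | 0 | 0 | 0 | 0 | 4.7% |
| Qwen 3.5 Flash | 21 | 0 | 0 | 0 | 0 | 4.2% |
| Inception Mercury | 12 | 0 | 0 | 0 | 0 | 2.4% |
| Qwen 3.5 27B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Aion 2.0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 2.0 Lite | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Nemotron 3 Super | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0.0% |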

genre

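| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 60 | 91.9% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 63 | 52 | 83.1% |
| GPT-4o Mini (temp=0) | 100 | 79 | 77 | 75 | 69 | 80.1% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Stealth: Hunter Alpha | 100 | 100 | 93 | 64 | 39 | 79.3% |
| GPT-5.4 (Reasoning) | 100 | 94 | 91 | 60 | 45 | 78.2% |
| GPT-5.1 | 100 | 90 | 85 | 79 | 32 | 77.3% |
| Claude Opus 4 | 100 | 100 | 66 | 53 | 52 | 74.1% |
| Claude Sonnet 4 | 100 | 100 | 86 | 71 | 0 | 71.5% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 52 | 0 | 70.4% |
| GPT-4o Mini (temp=1) | 100 | 100 | 78 | 66 | 0 | 68.8% |
| Z.AI GLM 4.7 | 100 | 100 | 95 | 46 | 0 | 68.3% |
| GPT-5.4 Nano | 100 | 97 | 74 | 41 | 29 | 68.2% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 38 | 0 | 67.5% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 70 | 62 | 0 | 66.4% |
| GPT-5.2 | 100 | 99 | 77 | 28 | 27 | 65.9% |
| GPT-5.4 (Reasoning, Low) | 100 | 83 | 65 | 50 | 27 | 65.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 63 | 57 | 0 | 64.0% |
| Qwen 3.5 9B | 100 | 100 | 63 | 55 | 0 | 63.6% |
| Z.AI GLM 5 Turbo | 100 | 82 | 72 | 60 | 0 | 62.9% |
| Claude Sonnet 4.5 | 79 | 77 | 77 | 68 | 0 | 60.3% |
| Qwen 3.5 27B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Hermes 3 405B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Rocinante 12B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Z.AI GLM 5 | 100 | 71 | 68 | 61 | 0 | 60.0% |
| Gemini 2.5 Pro | 100 | 86 | 56 | 53 | 0 | 58.9% |
| Claude Opus 4.5 | 100 | 71 | 63 | 59 | 0 | 58.7% |
| Gemini 2.5 Flash | 100 | 50 | 50 | 47 | 45 | 58.2% |
| GPT-5 Mini | 100 | 75 | 44 | 37 | 34 | 57.9% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 96 | 89 | 0 | 0 | 57.1% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 89 | 62 | 29 | 0 | 55.9% |
| Arcee AI: Trinity Mini | 100 | 100 | 75 | 0 | 0 | 54.9% |
| ByteDance Seed 2.0 Mini | 100 | 94 | 74 | 0 | 0 | 53.6% |
| Cohere Command R+ (Aug. 2024) | 100 | 98 | 68 | 0 | 0 | 53.3% |
| Claude Opus 4.6 | 100 | 56 | 51 | 49 | 0 | 51.1% |
| GPT-4.1 Mini | 85 | 85 | 76 | 0 | 0 | 49.0% |
| Mistral Large | 100 | 100 | 43 | 0 | 0 | 48.6% |
| Gemini 2.5 Flash (Reasoning) | 100 | 94 | 45 | 0 | 0 | 47.8% |
| GPT-5.4 Nano (Reasoning) | 66 | 52 | 49 | 43 | 25 | 47.0% |
| MoonshotAI: Kimi K2.5 | 95 | 78 | 60 | 0 | 0 | 46.6% |
| Mistral Small 4 | 100 | 89 | 42 | 0 | 0 | 46.3% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 68 | 58 | 0 | 0 | 45.1% |
| GPT-5.4 Mini (Reasoning) | 99 | 35 | 31 | 30 | 29 | 45.0% |
| GPT-5.4 Nano (Reasoning, Low) | 73 | 47 | 39 | 37 | 26 | 44.4% |
| Gemma 3 12B | 100 | 66 | 54 | 0 | 0 | 44.0% |
| Mistral Large 3 | 100 | 76 | 44 | 0 | 0 | 44.0% |
| Claude Sonnet 4.6 | 100 | 61 | 59 | 0 | 0 | 44.0% |
| GPT-4.1 | 100 | 61 | 55 | 0 | 0 | 43.2% |
| MiniMax M2.5 | 100 | 59 | 57 | 0 | 0 | 43.1% |
| Grok 4.20 (Beta) | 100 | 77 | 36 | 0 | 0 | 42.6% |
| Qwen3 235B A22B Instruct 2507 | 88 | 75 | 49 | 0 | 0 | 42.3% |
| DeepSeek V3 (2024-12-26) | 83 | 64 | 62 | 0 | 0 | 41.8% |
| Claude Opus 4.6 (Reasoning) | 88 | 62 | 53 | 0 | 0 | 40.7% |
| Qwen 3.5 122B | 100 | 63 | 40 | 0 | 0 | 40.6% |
| Claude 3.5 Sonnet | 100 | 100 | 0 | 0 | 0 | 40.0% |
| DeepSeek V3.1 | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Llama 3.1 70B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| GPT-5.4 | 85 | 79 | 32 | 0 | 0 | 39.2% |
| Gemini 3 Pro (Preview) | 100 | 51 | 42 | 0 | 0 | 38.5% |
| Gemini 2.5 Flash Lite (Reasoning) | 53 | 52 | 48 | 33 | 0 | 37.0% |
| Hermes 3 70B | 100 | 85 | 0 | 0 | 0 | 36.9% |
| GPT-5.4 Mini | 100 | 84 | 0 | 0 | 0 | 36.8% |
| GPT-4.1 Nano | 86 | 83 | 0 | 0 | 0 | 33.9% |
| Writer: Palmyra X5 | 58 | 54 | 52 | 0 | 0 | 32.9% |
| Ministral 3 14B | 68 | 59 | 37 | 0 | 0 | 32.9% |
| Gemini 3 Flash (Preview) | 100 | 64 | 0 | 0 | 0 | 32.8% |
| DeepSeek V3 (2025-03-24) | 100 | 62 | 0 | 0 | 0 | 32.3% |
| DeepSeek-V2 Chat | 100 | 57 | 0 | 0 | 0 | 31.4% |
| Claude 3 Haiku | 81 | 76 | 0 | 0 | 0 | 31.3% |
| Qwen 3.5 Flash | 81 | 75 | 0 | 0 | 0 | 31.2% |
| Mistral Large 2 | 100 | 56 | 0 | 0 | 0 | 31.1% |
| GPT-4o, Aug. 6th (temp=0) | 79 | 74 | 0 | 0 | 0 | 30.6% |
| Aion 2.0 | 94 | 51 | 0 | 0 | 0 | 29.0% |
| Qwen 3 32B | 82 | 60 | 0 | 0 | 0 | 28.3% |
| Claude Haiku 4.5 | 70 | 65 | 0 | 0 | 0 | 27.1% |
| o4 Mini | 87 | 48 | 0 | 0 | 0 | 26.9% |
| GPT-5 Nano | 75 | 28 | 24 | 0 | 0 | 25.4% |
| Stealth: Healer Alpha | 79 | 47 | 0 | 0 | 0 | 25.2% |
| Ministral 3 8B | 68 | 42 | 10 | 0 | 0 | 23.9% |
| Mistral NeMO | 63 | 56 | 0 | 0 | 0 | 23.7% |
| Mistral Medium 3.1 | 63 | 44 | 0 | 0 | 0 | 21.5% |
| Grok 4.1 Fast | 53 | 52 | 0 | 0 | 0 | 20.9% |
| DeepSeek V3.2 | 51 | 50 | 0 | 0 | 0 | 20.1% |
| ByteDance Seed 1.6 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| MiniMax M2.7 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| ByteDance Seed 2.0 Lite | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 8B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3 3B | 96 | 0 | 0 | 0 | 0 | 19.2% |
| WizardLM 2 8x22b | 93 | 0 | 0 | 0 | 0 | 18.5% |
| GPT-5 | 49 | 43 | 0 | 0 | 0 | 18.3% |
| Gemma 3 4B | 86 | 0 | 0 | 0 | 0 | 17.2% |
| Mistral Small Creative | 81 | 0 | 0 | 0 | 0 | 16.1% |
| Mistral Small 4 (Reasoning) | 77 | 0 | 0 | 0 | 0 | 15.3% |
| o4 Mini High | 40 | 34 | 0 | 0 | 0 | 14.8% |
| Arcee AI: Trinity Large (Preview) | 74 | 0 | 0 | 0 | 0 | 14.7% |
| Ministral 8B | 71 | 0 | 0 | 0 | 0 | 14.2% |
| Gemma 3 27B | 69 | 0 | 0 | 0 | 0 | 13.9% |
| Grok 4.20 (Beta, Reasoning) | 32 | 31 | 0 | 0 | 0 | 12.6% |
| GPT-4o, May 13th (temp=1) | 60 | 0 | 0 | 0 | 0 | 12.0% |
| Nemotron 3 Super | 58 | 0 | 0 | 0 | 0 | 11.6% |
| Grok 4 Fast | 54 | 0 | 0 | 0 | 0 | 10.9% |
| GPT-4o, May 13th (temp=0) | 54 | 0 | 0 | 0 | 0 | 10.9% |
| LFM2 24B | 51 | 0 | 0 | 0 | 0 | 10.1% |
| Nemotron 3 Nano | 50 | 0 | 0 | 0 | 0 | 9.9% |
| Mistral Small 3.2 24B | 2 | 0 | 0 | 0 | 0 | 0.4% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0% |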
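
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 90 | 98.1% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 72 | 94.4% |
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 90 | 69 | 91.8% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 95 | 58 | 90.5% |
| Claude Opus 4.5 | 100 | 100 | 100 | 79 | 55 | 86.9% |
| GPT-5.4 Nano (Reasoning, Low) | 99 | 99 | 99 | 84 | 52 | 86.4% |
| Ministral 3 14B | 100 | 100 | 94 | 73 | 62 | 85.8% |
| Claude Opus 4.6 | 100 | 100 | 74 | 71 | 70 | 83.1% |
| Qwen 3 32B | 100 | 100 | 100 | 57 | 54 | 82.2% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 87 | 68 | 53 | 81.5% |
| Z.AI GLM 5 | 100 | 100 | 91 | 75 | 34 | 80.1% |
| GPT-5.4 Nano | 100 | 100 | 100 | 53 | 37 | 77.9% |
| Gemma 3 27B | 100 | 100 | 91 | 53 | 45 | 77.9% |
| Gemini 2.5 Flash (Reasoning) | 100 | 82 | 77 | 71 | 58 | 77.8% |
| Qwen 3.5 397B A17B | 100 | 100 | 74 | 66 | 48 | 77.6% |
| GPT-5 Nano | 100 | 100 | 88 | 72 | 22 | 76.4% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 63 | 19 | 76.3% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 41 | 38 | 75.7% |
| GPT-5 | 100 | 98 | 65 | 55 | 54 | 74.4% |
| Gemini 3 Pro (Preview) | 100 | 93 | 83 | 56 | 35 | 73.4% |
| GPT-5.1 | 100 | 94 | 74 | 53 | 42 | 72.6% |
| Gemini 3.1 Pro (Preview) | 100 | 97 | 69 | 64 | 29 | 71.9% |
| Nemotron 3 Nano | 100 | 100 | 100 | 51 | 0 | 70.2% |
| GPT-4.1 | 100 | 100 | 100 | 44 | 0 | 68.8% |
| Llama 3.1 8B | 100 | 100 | 75 | 68 | 0 | 68.4% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 72 | 68 | 0 | 68.2% |
| Mistral Small Creative | 100 | 100 | 98 | 35 | 0 | 66.6% |
| Gemini 2.5 Flash | 100 | 100 | 97 | 33 | 0 | 66.0% |
| Mistral Medium 3.1 | 100 | 100 | 69 | 60 | 0 | 65.9% |
| GPT-5 Mini | 100 | 100 | 100 | 27 | 0 | 65.4% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 63 | 59 | 0 | 64.3% |
| Claude Sonnet 4 | 68 | 68 | 66 | 61 | 56 | 64.0% |
| MiniMax M2.7 | 100 | 100 | 79 | 38 | 0 | 63.4% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 82 | 34 | 0 | 63.2% |
| GPT-5.4 (Reasoning) | 100 | 60 | 60 | 52 | 42 | 62.9% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 59 | 55 | 0 | 62.8% |
| DeepSeek V3 (2025-03-24) | 100 | 79 | 67 | 60 | 0 | 61.1% |
| GPT-5.2 | 100 | 100 | 67 | 39 | 0 | 61.0% |
| Ministral 3 8B | 100 | 64 | 53 | 43 | 41 | 60.3% |
| Gemini 3 Flash (Preview) | 87 | 84 | 43 | 43 | 42 | 59.9% |
| GPT-5.4 Mini | 79 | 73 | 71 | 50 | 26 | 59.7% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 96 | 0 | 0 | 59.2% |
| Qwen 3.5 Flash | 100 | 100 | 65 | 31 | 0 | 59.0% |
| GPT-4o Mini (temp=1) | 100 | 72 | 63 | 60 | 0 | 58.9% |
| Gemma 3 12B | 100 | 96 | 53 | 45 | 0 | 58.8% |
| Stealth: Healer Alpha | 100 | 87 | 75 | 31 | 0 | 58.7% |
| Qwen 3.5 35B | 100 | 81 | 76 | 34 | 0 | 58.2% |
| MoonshotAI: Kimi K2.5 | 100 | 75 | 60 | 51 | 0 | 57.2% |
| GPT-5.4 Mini (Reasoning) | 100 | 84 | 75 | 26 | 0 | 56.8% |
| Grok 4 | 100 | 100 | 43 | 41 | 0 | 56.8% |
| Gemini 3 Flash (Preview, Reasoning) | 84 | 69 | 45 | 45 | 38 | 56.2% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 64 | 57 | 54 | 0 | 55.1% |
| Claude Sonnet 4.6 | 95 | 64 | 63 | 52 | 0 | 54.8% |
| Z.AI GLM 5 Turbo | 97 | 76 | 56 | 40 | 0 | 54.0% |
| Mistral Large 2 | 100 | 63 | 63 | 42 | 0 | 53.5% |
| Qwen 3.5 9B | 100 | 76 | 50 | 31 | 0 | 51.4% |
| GPT-4.1 Nano | 100 | 75 | 72 | 0 | 0 | 49.4% |
| Mistral Large 3 | 100 | 81 | 65 | 0 | 0 | 49.1% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 75 | 70 | 0 | 0 | 49.0% |
| Qwen 3.5 27B | 100 | 100 | 39 | 0 | 0 | 47.8% |
| Mistral Small 4 | 100 | 100 | 35 | 0 | 0 | 47.1% |
| DeepSeek V3.1 | 100 | 50 | 42 | 40 | 0 | 46.3% |
| Stealth: Aurora Alpha | 100 | 100 | 31 | 0 | 0 | 46.2% |
| Stealth: Hunter Alpha | 82 | 59 | 55 | 35 | 0 | 46.2% |
| Gemma 3 4B | 100 | 65 | 63 | 0 | 0 | 45.6% |
| ByteDance Seed 1.6 | 100 | 68 | 57 | 0 | 0 | 45.2% |
| GPT-4o, May 13th (temp=1) | 100 | 67 | 59 | 0 | 0 | 45.1% |
| Mistral Large | 100 | 72 | 50 | 0 | 0 | 44.4% |
| Qwen 3.5 122B | 100 | 44 | 39 | 36 | 0 | 43.9% |
| Arcee AI: Trinity Mini | 79 | 68 | 65 | 0 | 0 | 42.6% |
| ByteDance Seed 1.6 Flash | 100 | 64 | 48 | 0 | 0 | 42.3% |
| o4 Mini | 100 | 39 | 36 | 35 | 0 | 41.9% |
| Gemini 2.5 Pro | 86 | 84 | 36 | 0 | 0 | 41.3% |
| Aion 2.0 | 71 | 66 | 35 | 33 | 0 | 41.0% |
| Qwen3 235B A22B Instruct 2507 | 94 | 74 | 35 | 0 | 0 | 40.7% |
| Claude 3.5 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Rocinante 12B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Cohere Command R+ (Aug. 2024) | 83 | 61 | 56 | 0 | 0 | 40.0% |
| GPT-4o Mini (temp=0) | 72 | 64 | 59 | 0 | 0 | 39.1% |
| Grok 4.20 (Beta) | 77 | 54 | 34 | 27 | 0 | 38.5% |
| Arcee AI: Trinity Large (Preview) | 76 | 63 | 53 | 0 | 0 | 38.3% |
| Gemini 2.5 Flash Lite | 100 | 47 | 43 | 0 | 0 | 38.0% |
| Z.AI GLM 4.7 Flash | 87 | 55 | 44 | 0 | 0 | 37.2% |
| GPT-4.1 Mini | 63 | 55 | 54 | 0 | 0 | 34.5% |
| ByteDance Seed 2.0 Mini | 100 | 68 | 0 | 0 | 0 | 33.7% |
| Z.AI GLM 4.5 | 61 | 54 | 52 | 0 | 0 | 33.4% |
| DeepSeek V3 (2024-12-26) | 100 | 67 | 0 | 0 | 0 | 33.3% |
| LFM2 24B | 100 | 65 | 0 | 0 | 0 | 33.0% |
| o4 Mini High | 60 | 41 | 34 | 30 | 0 | 32.9% |
| Grok 4.20 (Beta, Reasoning) | 100 | 33 | 30 | 0 | 0 | 32.5% |
| Nemotron 3 Super | 100 | 60 | 0 | 0 | 0 | 32.0% |
| Hermes 3 70B | 100 | 58 | 0 | 0 | 0 | 31.6% |
| Ministral 8B | 100 | 58 | 0 | 0 | 0 | 31.6% |
| ByteDance Seed 2.0 Lite | 85 | 70 | 0 | 0 | 0 | 31.0% |
| Mistral NeMO | 82 | 66 | 0 | 0 | 0 | 29.6% |
| Claude Sonnet 4.5 | 100 | 45 | 0 | 0 | 0 | 28.9% |
| Z.AI GLM 4.6 | 58 | 47 | 38 | 0 | 0 | 28.5% |
| Ministral 3 3B | 70 | 63 | 0 | 0 | 0 | 26.7% |
| Grok 4 Fast | 48 | 43 | 39 | 0 | 0 | 26.0% |
| Claude 3.7 Sonnet | 48 | 43 | 36 | 0 | 0 | 25.5% |
| GPT-4o, May 13th (temp=0) | 68 | 55 | 0 | 0 | 0 | 24.5% |
| Inception Mercury | 100 | 6 | 0 | 0 | 0 | 21.2% |
| DeepSeek-V2 Chat | 54 | 52 | 0 | 0 | 0 | 21.2% |
| Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |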
Claude Haiku 4.5544200019.2%
Inception Mercury 295000019.0%
DeepSeek V3.23530280018.7%
MiniMax M2.5494300018.3%
Claude 3.5 Sonnet88000017.5%
Grok 4.1 Fast533400017.2%
WizardLM 2 8x22b57000011.5%
Claude Opus 4000000.0%
Mistral Small 3.2 24B000000.0%
Qwen 2.5 72B000000.0%
Claude 3 Haiku000000.0%
Ministral 3B000000.0%
Model #1 #2 #3 #4 #5 Avg
Mistral Small Creative1009582715480.6%
Ministral 3 8B100100100100080.0%
Mistral Large10010010096079.2%
Llama 3.1 Nemotron 70B10010010091078.2%
ByteDance Seed 1.6 Flash100969690076.5%
Grok 4.20 (Beta, Reasoning)10010081424273.1%
Gemini 3.1 Pro (Preview)1008180544471.8%
Nemotron 3 Super10010010053070.5%
GPT-5.4 Mini (Reasoning)1009999312270.1%
GPT-5.4 Mini (Reasoning, Low)1008482562669.6%
Ministral 8B10010010048069.5%
Grok 4 Fast10010010046069.2%
GPT-5.4 Mini10010062572568.9%
Grok 4.20 (Beta)1001007553065.6%
Qwen 3.5 397B A17B1008353493964.9%
MoonshotAI: Kimi K2.51001006161064.4%
Qwen 3 32B1001001000060.0%
Mistral Medium 3.11001001000060.0%
Nemotron 3 Nano1001001000060.0%
GPT-5.490896843058.4%
Stealth: Hunter Alpha100965441058.2%
GPT-5.4 Nano1007344383557.9%
Gemini 2.5 Flash Lite (Reasoning)92915340055.1%
GPT-5.21006447422054.7%
Claude Opus 4100100660053.2%
Rocinante 12B82786637052.6%
Grok 4100545450051.6%
Claude Sonnet 4100100570051.5%
Mistral Small 4100684537050.1%
Z.AI GLM 4.7100534947049.7%
GPT-5.4 Nano (Reasoning)1001002521049.3%
Qwen 3.5 27B100100460049.2%
GPT-5.4 (Reasoning, Low)98744825049.0%
Ministral 3 14B100100420048.5%
o4 Mini10086430045.9%
Stealth: Aurora Alpha65625050045.2%
Qwen 3.5 9B10061590044.0%
Qwen 3.5 122B8271660043.8%
Gemini 3 Flash (Preview, Reasoning)7668680042.5%
Claude 3.7 Sonnet10057520041.8%
Gemini 3.1 Flash Lite (Preview)10056510041.4%
GPT-5 Mini10064420041.1%
GPT-575683327040.6%
Gemini 2.5 Flash Lite10054480040.5%
ByteDance Seed 1.610010000040.0%
Qwen 3.5 Flash10010000040.0%
Claude 3.5 Haiku10010000040.0%
Arcee AI: Trinity Large (Preview)10010000040.0%
Arcee AI: Trinity Mini10010000040.0%
Ministral 3B10010000040.0%
Mistral Large 210052480039.8%
Grok 4.1 Fast1009900039.8%
Llama 3.1 8B1009800039.6%
GPT-5.4 Nano (Reasoning, Low)100522422039.5%
Hermes 3 70B1009600039.2%
Claude Opus 4.69653450038.8%
Hermes 3 405B1009300038.5%
GPT-4.1 Mini1009100038.2%
Qwen 2.5 72B1008200036.4%
GPT-5.4 (Reasoning)8466270035.6%
Inception Mercury 26558470034.0%
GPT-4o, Aug. 6th (temp=1)868300033.9%
GPT-5.1712523232232.7%
ByteDance Seed 2.0 Mini1006300032.7%
DeepSeek V3 (2025-03-24)937000032.6%
Qwen3 235B A22B Instruct 2507986100031.8%
GPT-4o, May 13th (temp=1)827200030.9%
o4 Mini High5151500030.2%
Claude Sonnet 4.6786800029.3%
Gemini 3 Pro (Preview)1004600029.3%
DeepSeek V3.15049430028.3%
Gemini 2.5 Flash5447390028.1%
GPT-5 Nano6140340026.8%
Mistral Large 3686000025.4%
DeepSeek V3.2893700025.2%
Mistral NeMO824100024.5%
Aion 2.04140380023.8%
Gemma 3 27B535200021.1%
Z.AI GLM 5100000020.0%
Claude Opus 4.5100000020.0%
MiniMax M2.5100000020.0%
Gemini 2.5 Flash (Reasoning)100000020.0%
Inception Mercury100000020.0%
GPT-4o Mini (temp=1)100000020.0%
Llama 3.1 70B100000020.0%
WizardLM 2 8x22b100000020.0%
Gemma 3 4B100000020.0%
ByteDance Seed 2.0 Lite98000019.6%
Z.AI GLM 4.7 Flash494800019.4%
Cohere Command R+ (Aug. 2024)96000019.2%
Stealth: Healer Alpha514200018.5%
Qwen 3.5 35B464200017.7%
Claude Opus 4.6 (Reasoning)434100016.8%
Claude 3 Haiku83000016.7%
GPT-4o, Aug. 6th (temp=0)81000016.1%
Gemini 2.5 Pro80000016.0%
Qwen 3.5 Plus (2026-02-15)75000014.9%
Writer: Palmyra X569000013.9%
Z.AI GLM 5 Turbo68000013.7%
Z.AI GLM 4.568000013.7%
Claude Haiku 4.568000013.7%
Claude Sonnet 4.6 (Reasoning)68000013.5%
GPT-4o, May 13th (temp=0)66000013.2%
Gemini 3 Flash (Preview)66000013.2%
DeepSeek-V2 Chat60000012.0%
Claude Sonnet 4.559000011.8%
Gemma 3 12B58000011.6%
GPT-4.157000011.4%
DeepSeek V3 (2024-12-26)56000011.1%
Mistral Small 3.2 24B56000011.1%
Z.AI GLM 4.652000010.3%
MiniMax M2.74600009.2%
Mistral Small 4 (Reasoning)4200008.3%
Ministral 3 3B3100006.2%
Claude 3.5 Sonnet000000.0%
GPT-4o Mini (temp=0)000000.0%
GPT-4.1 Nano000000.0%
LFM2 24B000000.0%
Model #1 #2 #3 #4 #5 Avg
Gemini 3.1 Flash Lite (Preview)100100100100100100.0%
Qwen 3.5 397B A17B1001001001004889.5%
Claude Sonnet 4.6100100100726988.4%
Qwen 3 32B10010094934386.0%
GPT-5.4 Nano100100100572376.2%
Mistral NeMO10010010081076.1%
GPT-5.4 (Reasoning)10010097542975.9%
GPT-5 Mini1008875644073.5%
GPT-4o Mini (temp=1)100918682071.8%
GPT-5.41001008271070.6%
Gemini 3.1 Pro (Preview)1001009452069.2%
Gemini 2.5 Flash Lite (Reasoning)10010010045069.0%
GPT-5.4 Nano (Reasoning, Low)1008887442268.3%
GPT-5.4 (Reasoning, Low)10010083282567.2%
Grok 4.20 (Beta, Reasoning)10010010036067.2%
Grok 4 Fast1001007263067.0%
GPT-5.4 Mini100817760063.6%
Qwen 3.5 35B1001005348060.3%
Claude 3.7 Sonnet1001001000060.0%
Hermes 3 70B1001001000060.0%
GPT-4o, Aug. 6th (temp=1)100100930058.5%
Claude 3 Haiku100100930058.5%
Mistral Small Creative100686757058.2%
Aion 2.090885555057.5%
Qwen 3.5 Flash100100810056.2%
Gemini 2.5 Flash (Reasoning)9595880055.8%
ByteDance Seed 2.0 Mini10091830054.8%
Rocinante 12B100100710054.3%
ByteDance Seed 1.6100100680053.7%
GPT-4.1 Nano10086760052.4%
GPT-5 Nano100745730052.2%
Claude Opus 4100100600052.0%
Gemini 2.5 Flash100664745051.6%
GPT-5965750292551.5%
Qwen 3.5 122B100100550051.0%
GPT-5.4 Mini (Reasoning)100823932050.8%
Mistral Small 4 (Reasoning)100100540050.8%
Stealth: Healer Alpha100100510050.2%
ByteDance Seed 1.6 Flash686449373249.9%
Qwen 3.5 Plus (2026-02-15)75635951049.4%
DeepSeek V3.2100100440048.8%
GPT-4o, May 13th (temp=0)10079600047.8%
Gemini 3 Flash (Preview)72575650047.2%
Z.AI GLM 4.58685640047.0%
o4 Mini High88525040046.0%
Claude Sonnet 4.510069570045.4%
Gemini 2.5 Flash Lite9279530044.7%
Mistral Large 310067520043.6%
Grok 4.1 Fast78544240042.7%
Writer: Palmyra X557575345042.5%
Claude Opus 4.6 (Reasoning)72514542042.0%
GPT-5.4 Nano (Reasoning)10064430041.4%
Qwen3 235B A22B Instruct 25077866630041.3%
Z.AI GLM 5 Turbo10061440041.0%
MoonshotAI: Kimi K2.510010000040.0%
DeepSeek-V2 Chat10010000040.0%
Z.AI GLM 4.7 Flash10010000040.0%
Claude 3.5 Sonnet10010000040.0%
Llama 3.1 Nemotron 70B10010000040.0%
GPT-5.2100512523039.7%
Z.AI GLM 51009400038.9%
Qwen 3.5 27B7068560038.6%
o4 Mini8954450037.6%
Gemma 3 12B6861580037.3%
DeepSeek V3 (2024-12-26)1008300036.7%
Grok 4.20 (Beta)10048340036.3%
Gemini 2.5 Pro49464540036.0%
Qwen 2.5 72B1007800035.6%
Cohere Command R+ (Aug. 2024)1007700035.4%
Z.AI GLM 4.66160550035.2%
GPT-5.4 Mini (Reasoning, Low)10043300034.5%
Inception Mercury 21007000034.0%
Ministral 3 8B6854460033.7%
Hermes 3 405B1006600033.2%
DeepSeek V3.11006600033.2%
GPT-4o Mini (temp=0)887700032.9%
Arcee AI: Trinity Large (Preview)1006200032.3%
Nemotron 3 Nano1005500031.0%
GPT-4.11005300030.5%
Ministral 3 14B1005300030.5%
GPT-4o, May 13th (temp=1)777500030.3%
Gemini 3 Pro (Preview)5951410030.1%
Grok 41005000030.0%
Gemma 3 27B895400028.7%
GPT-5.163292625028.7%
Z.AI GLM 4.75346420028.2%
Mistral Small 41004100028.1%
Stealth: Hunter Alpha5145430027.9%
Nemotron 3 Super776200027.7%
Claude Opus 4.64946420027.4%
Claude Haiku 4.5785700027.1%
Mistral Small 3.2 24B645680025.5%
MiniMax M2.5725300025.0%
Mistral Large753900022.8%
Qwen 3.5 9B712900020.1%
Gemini 3 Flash (Preview, Reasoning)100000020.0%
GPT-4o, Aug. 6th (temp=0)100000020.0%
Llama 3.1 70B100000020.0%
WizardLM 2 8x22b100000020.0%
Arcee AI: Trinity Mini100000020.0%
Llama 3.1 8B100000020.0%
ByteDance Seed 2.0 Lite98000019.6%
Claude Opus 4.5524400019.2%
Mistral Medium 3.1464400018.0%
LFM2 24B85000016.9%
Ministral 3B83000016.7%
Gemma 3 4B72000014.5%
Mistral Large 271000014.3%
Ministral 8B68000013.5%
Stealth: Aurora Alpha363000013.4%
MiniMax M2.763000012.7%
Ministral 3 3B55000011.0%
Claude Sonnet 4.6 (Reasoning)5000009.9%
Claude Sonnet 4000000.0%
Claude 3.5 Haiku000000.0%
GPT-4.1 Mini000000.0%
DeepSeek V3 (2025-03-24)000000.0%
Inception Mercury000000.0%
Model #1 #2 #3 #4 #5 Avg
Llama 3.1 8B10010010093078.5%
GPT-5.4 Mini (Reasoning, Low)1009765585374.5%
GPT-5.4 (Reasoning, Low)1009466535373.4%
Gemma 3 12B10010010065073.0%
Qwen 3 32B96938883072.1%
MoonshotAI: Kimi K2.510010010054070.9%
ByteDance Seed 1.6 Flash10010010054070.9%
GPT-5.410010093372170.4%
MiniMax M2.71007957575469.4%
GPT-5.4 Mini (Reasoning)10010088302869.2%
Gemini 2.5 Flash (Reasoning)1001007665068.2%
Qwen 3.5 397B A17B10010010032066.3%
Rocinante 12B1001007952066.3%
Z.AI GLM 5 Turbo1001006357064.0%
Aion 2.01009154423063.3%
GPT-5.4 (Reasoning)10010044432863.0%
Qwen 3.5 35B1001007040062.0%
GPT-5.11008452472461.3%
Grok 4 Fast1001005449060.7%
Grok 494916355060.5%
Hermes 3 70B1001001000060.0%
Mistral Small 4838245453958.6%
Mistral Small Creative100776548058.0%
ByteDance Seed 2.0 Mini100100830056.7%
Hermes 3 405B100100770055.4%
DeepSeek-V2 Chat100100690053.9%
Gemma 3 27B100595651053.2%
Gemini 2.5 Pro100874138053.2%
Gemini 3.1 Pro (Preview)100100560051.2%
DeepSeek V3.2100100550051.0%
ByteDance Seed 1.610077760050.5%
Claude Opus 4.563636362050.3%
LFM2 24B10076680048.9%
Ministral 3 8B100962917048.4%
Stealth: Hunter Alpha9695460047.5%
GPT-5 Nano864241333046.4%
Z.AI GLM 4.510068600045.6%
Mistral Large10075510045.1%
GPT-5.4 Nano946823231845.1%
Mistral Small 4 (Reasoning)100593329044.3%
Claude Opus 4.6 (Reasoning)9180490044.0%
Nemotron 3 Nano10060600043.8%
Qwen 3.5 Plus (2026-02-15)7871690043.8%
GPT-5.4 Mini80585427043.7%
Gemini 3.1 Flash Lite (Preview)9965530043.3%
Claude 3.7 Sonnet10056540042.0%
ByteDance Seed 2.0 Lite10010000040.0%
Ministral 3B1009800039.6%
GPT-5.2656228211938.9%
GPT-5.4 Nano (Reasoning)77494623038.7%
o4 Mini61504934038.6%
GPT-4o, Aug. 6th (temp=1)1009300038.5%
Ministral 8B78494321037.9%
Claude 3 Haiku1008900037.9%
GPT-5.4 Nano (Reasoning, Low)1008800037.7%
Qwen 3.5 Flash6755550035.4%
Stealth: Aurora Alpha1007500035.0%
Writer: Palmyra X51006700033.3%
Mistral Large 31006100032.2%
DeepSeek V3 (2024-12-26)1006100032.2%
Qwen 3.5 122B1006000032.0%
Claude Sonnet 41006000031.9%
Ministral 3 3B916800031.9%
GPT-5 Mini7545350030.8%
Qwen 2.5 72B797100030.2%
Z.AI GLM 4.65353410029.2%
Z.AI GLM 4.71004100028.2%
GPT-5825400027.1%
GPT-4.1675800025.0%
Arcee AI: Trinity Large (Preview)685000023.5%
Gemini 3 Flash (Preview, Reasoning)635300023.3%
Gemini 2.5 Flash Lite644800022.4%
Claude Opus 4.6575000021.3%
Gemini 2.5 Flash Lite (Reasoning)555000020.9%
DeepSeek V3.1545100020.9%
Claude Opus 4544800020.4%
Z.AI GLM 5544700020.2%
Claude Sonnet 4.6 (Reasoning)100000020.0%
Qwen 3.5 27B100000020.0%
Claude Sonnet 4.5100000020.0%
GPT-4o, May 13th (temp=1)100000020.0%
Claude 3.5 Haiku100000020.0%
GPT-4.1 Mini100000020.0%
Llama 3.1 70B100000020.0%
Llama 3.1 Nemotron 70B100000020.0%
Ministral 3 14B100000020.0%
WizardLM 2 8x22b100000020.0%
Cohere Command R+ (Aug. 2024)100000020.0%
Gemma 3 4B100000020.0%
Mistral NeMO100000020.0%
o4 Mini High524700019.9%
Arcee AI: Trinity Mini98000019.6%
DeepSeek V3 (2025-03-24)94000018.9%
GPT-4.1 Nano94000018.9%
Claude 3.5 Sonnet89000017.9%
Z.AI GLM 4.7 Flash79000015.9%
Grok 4.20 (Beta)413300014.8%
Gemini 3 Pro (Preview)413300014.8%
GPT-4o Mini (temp=1)70000014.1%
GPT-4o Mini (temp=0)69000013.9%
GPT-4o, May 13th (temp=0)68000013.5%
Claude Haiku 4.567000013.3%
Nemotron 3 Super67000013.3%
Gemini 3 Flash (Preview)63000012.5%
Qwen3 235B A22B Instruct 250753000010.5%
Grok 4.1 Fast4500008.9%
Stealth: Healer Alpha4400008.8%
Mistral Medium 3.14200008.3%
Grok 4.20 (Beta, Reasoning)3400006.8%
Inception Mercury1700003.4%
Claude Sonnet 4.6000000.0%
MiniMax M2.5000000.0%
Qwen 3.5 9B000000.0%
Inception Mercury 2000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
Mistral Large 2000000.0%
Gemini 2.5 Flash000000.0%
Mistral Small 3.2 24B000000.0%
Model #1 #2 #3 #4 #5 Avg
Qwen 3.5 397B A17B100100100100100100.0%
Qwen 3.5 35B1001001001008196.3%
GPT-5.4 (Reasoning)100100100908695.2%
Qwen 3.5 Flash1001001001007194.3%
Qwen 3.5 27B1001001001005991.8%
Gemini 3 Flash (Preview, Reasoning)1001001001005390.5%
GPT-5 Nano10010092766586.6%
GPT-5.4 Nano (Reasoning, Low)100100100664782.6%
o4 Mini High100100100684181.7%
Hermes 3 405B100100100100080.0%
Hermes 3 70B100100100100080.0%
Qwen 3.5 122B10010093574779.4%
Qwen 3.5 Plus (2026-02-15)10010010097079.4%
GPT-5.4 (Reasoning, Low)100100100722579.4%
Gemini 3.1 Pro (Preview)10010010096079.2%
LFM2 24B868578686776.8%
Claude Sonnet 410010010083076.7%
Gemini 3.1 Flash Lite (Preview)10010010077075.4%
Z.AI GLM 4.7 Flash10010010077075.4%
GPT-5.4 Mini (Reasoning, Low)100100100323172.7%
o4 Mini1009983423972.6%
MoonshotAI: Kimi K2.510010010057071.5%
Claude Opus 4.6 (Reasoning)100968968070.4%
GPT-5.4 Mini (Reasoning)10010063543169.6%
Gemini 3 Pro (Preview)1008578444069.5%
Qwen 3.5 9B1001008957069.4%
Gemini 2.5 Flash Lite (Reasoning)10010066403668.4%
Stealth: Hunter Alpha1009072433668.2%
Grok 4.20 (Beta, Reasoning)97927473067.1%
ByteDance Seed 1.693868370066.5%
Claude Opus 4.51001008642065.6%
GPT-5.41008858532965.5%
GPT-5.11009661412664.7%
Gemma 3 4B100886868064.6%
GPT-5.4 Nano1009045444163.9%
Aion 2.0100767567063.5%
Z.AI GLM 4.793767471062.9%
Gemini 2.5 Flash (Reasoning)10010039383662.7%
Qwen3 235B A22B Instruct 25071001006746062.6%
Nemotron 3 Super1001006943062.4%
GPT-4o, Aug. 6th (temp=0)88817262060.5%
Claude Opus 41001001000060.0%
DeepSeek-V2 Chat1001001000060.0%
ByteDance Seed 2.0 Lite1001001000060.0%
Claude 3.5 Haiku1001001000060.0%
Gemini 2.5 Flash1001001000060.0%
Llama 3.1 70B1001001000060.0%
Llama 3.1 Nemotron 70B1001001000060.0%
GPT-4.1 Nano1001001000060.0%
Mistral NeMO100100970059.4%
GPT-4.1100915452059.2%
Z.AI GLM 5 Turbo100965147058.9%
Ministral 3 8B100675958056.7%
Ministral 3B100100830056.6%
GPT-51007851231854.1%
Gemini 2.5 Pro1004442393952.9%
Gemini 3 Flash (Preview)100100610052.2%
Arcee AI: Trinity Mini100100610052.2%
Claude 3 Haiku100100580051.6%
Claude Opus 4.6100694542051.2%
GPT-5.4 Mini100735427050.8%
Z.AI GLM 4.610093560049.6%
Z.AI GLM 4.510085630049.4%
Grok 4.20 (Beta)100635824048.9%
Gemma 3 27B10092460047.6%
Mistral Small 481625736047.4%
Qwen 3 32B10079560047.0%
GPT-5.4 Nano (Reasoning)88724523045.6%
Claude Sonnet 4.610077470044.8%
MiniMax M2.510064580044.4%
GPT-4o, May 13th (temp=0)10068540044.4%
DeepSeek V3 (2024-12-26)10071470043.7%
Qwen 2.5 72B9363620043.5%
GPT-5 Mini9963550043.4%
Claude Haiku 4.510059480041.4%
DeepSeek V3.110057470040.9%
Writer: Palmyra X58961520040.4%
Ministral 8B10052500040.3%
ByteDance Seed 2.0 Mini10010000040.0%
Cohere Command R+ (Aug. 2024)10010000040.0%
ByteDance Seed 1.6 Flash100383226039.2%
Nemotron 3 Nano6865560037.9%
Grok 4 Fast1008900037.9%
GPT-4o, Aug. 6th (temp=1)1008800037.5%
GPT-4.1 Mini1008300036.7%
DeepSeek V3 (2025-03-24)1007600035.2%
Claude 3.5 Sonnet938200034.9%
GPT-5.28067230034.1%
Llama 3.1 8B917900034.1%
Mistral Small Creative8740380033.1%
Claude Sonnet 4.6 (Reasoning)1006300032.7%
Mistral Large6561350032.2%
MiniMax M2.71005600031.2%
GPT-4o Mini (temp=1)836900030.6%
Gemini 2.5 Flash Lite1005100030.1%
Gemma 3 12B1005000030.0%
Z.AI GLM 55746420029.2%
Stealth: Healer Alpha4545400026.1%
GPT-4o Mini (temp=0)635400023.4%
Claude Sonnet 4.5100000020.0%
Claude 3.7 Sonnet100000020.0%
Mistral Large 2100000020.0%
Arcee AI: Trinity Large (Preview)100000020.0%
Ministral 3 14B593900019.6%
DeepSeek V3.2583800019.2%
Mistral Small 4 (Reasoning)3731280019.1%
Grok 4.1 Fast78000015.6%
Rocinante 12B75000014.9%
Stealth: Aurora Alpha462800014.7%
Ministral 3 3B68000013.5%
GPT-4o, May 13th (temp=1)57000011.5%
Inception Mercury 23100006.3%
Mistral Medium 3.13100006.3%
Mistral Small 3.2 24B200000.5%
Grok 4000000.0%
Mistral Large 3000000.0%
Inception Mercury000000.0%
WizardLM 2 8x22b000000.0%

Novelcrafter Default Prompt

Model #1 #2 #3 #4 #5 Avg
Gemma 3 4B797572676371.1%
DeepSeek V3.1100767670064.4%
ByteDance Seed 1.6 Flash1001006846062.8%
Gemini 3.1 Flash Lite (Preview)100706664060.1%
Claude 3.7 Sonnet1001001000060.0%
Cohere Command R+ (Aug. 2024)1001001000060.0%
Claude Opus 4.6 (Reasoning)100855252057.8%
MiniMax M2.510098790055.5%
Gemini 2.5 Flash Lite (Reasoning)100100740054.7%
Hermes 3 70B100100710054.3%
Claude Haiku 4.5100100670053.3%
Gemini 2.5 Flash82655554051.2%
WizardLM 2 8x22b100674541050.6%
GPT-5.410094280044.4%
Z.AI GLM 510064540043.7%
Mistral Small Creative9870410041.8%
Qwen 3.5 35B57574745041.3%
Gemini 2.5 Flash Lite7871560041.1%
Gemini 3 Flash (Preview, Reasoning)8858540040.0%
Claude 3.5 Sonnet10010000040.0%
Hermes 3 405B10010000040.0%
Arcee AI: Trinity Mini10010000040.0%
GPT-4.11009800039.6%
Z.AI GLM 4.51009600039.2%
Arcee AI: Trinity Large (Preview)1009600039.2%
Llama 3.1 8B1009600039.2%
GPT-4o, Aug. 6th (temp=1)1009300038.5%
Ministral 3 14B8560440037.8%
Qwen 3 32B10062260037.6%
Z.AI GLM 4.61008600037.2%
GPT-5.4 Nano68633219036.4%
Claude Opus 46968450036.3%
Gemma 3 12B6160570035.5%
Mistral Small 47655450035.3%
DeepSeek V3 (2024-12-26)1007200034.5%
Stealth: Healer Alpha6360480034.2%
Grok 4.20 (Beta)8947260032.5%
GPT-4o, May 13th (temp=1)1005900031.8%
DeepSeek V3.26052450031.4%
Qwen3 235B A22B Instruct 25076845440031.4%
GPT-5.160472622030.7%
Gemini 2.5 Flash (Reasoning)5554430030.6%
GPT-4o Mini (temp=1)767200029.6%
Writer: Palmyra X55450420029.1%
Claude Sonnet 4.6 (Reasoning)747100029.0%
GPT-5.4 Nano (Reasoning)10021200028.3%
Gemini 2.5 Pro686000025.7%
MiniMax M2.7715600025.5%
Gemini 3 Pro (Preview)4441380024.8%
GPT-5.4 (Reasoning)6432260024.4%
Stealth: Hunter Alpha744700024.1%
MoonshotAI: Kimi K2.5695100024.0%
Z.AI GLM 4.7575600022.7%
Gemini 3 Flash (Preview)634900022.3%
Qwen 3.5 397B A17B6231170022.1%
Qwen 3.5 Flash931350022.1%
GPT-5.25030240020.8%
GPT-5.4 (Reasoning, Low)535100020.8%
GPT-5 Mini633700020.0%
Claude Opus 4.5100000020.0%
Claude Sonnet 4100000020.0%
ByteDance Seed 2.0 Mini100000020.0%
GPT-4.1 Mini100000020.0%
Llama 3.1 70B100000020.0%
Llama 3.1 Nemotron 70B100000020.0%
Mistral NeMO100000020.0%
Nemotron 3 Nano683100019.9%
Aion 2.0504900019.7%
Z.AI GLM 4.7 Flash97000019.4%
DeepSeek-V2 Chat623300018.9%
Claude 3 Haiku94000018.9%
Claude Sonnet 4.689000017.9%
DeepSeek V3 (2025-03-24)89000017.9%
GPT-4.1 Nano89000017.9%
Claude Opus 4.6454400017.8%
Mistral Medium 3.1434300017.2%
GPT-5.4 Nano (Reasoning, Low)23221917016.3%
GPT-5.4 Mini81000016.1%
GPT-4o, Aug. 6th (temp=0)76000015.2%
o4 Mini452800014.5%
LFM2 24B67000013.3%
Ministral 8B60000011.9%
GPT-4o Mini (temp=0)57000011.4%
GPT-5282600010.8%
Mistral Small 4 (Reasoning)51000010.2%
ByteDance Seed 1.64800009.6%
Qwen 3.5 27B4500009.0%
Mistral Large 24400008.8%
Grok 4.1 Fast4000008.1%
Grok 43800007.7%
Rocinante 12B3300006.7%
Grok 4.20 (Beta, Reasoning)3000006.1%
o4 Mini High2900005.7%
GPT-5 Nano2500005.0%
Gemini 3.1 Pro (Preview)000000.0%
Z.AI GLM 5 Turbo000000.0%
Qwen 3.5 122B000000.0%
GPT-5.4 Mini (Reasoning)000000.0%
Claude Sonnet 4.5000000.0%
Grok 4 Fast000000.0%
Qwen 3.5 9B000000.0%
Qwen 3.5 Plus (2026-02-15)000000.0%
GPT-5.4 Mini (Reasoning, Low)000000.0%
Mistral Large 3000000.0%
GPT-4o, May 13th (temp=0)000000.0%
ByteDance Seed 2.0 Lite000000.0%
Nemotron 3 Super000000.0%
Inception Mercury 2000000.0%
Stealth: Aurora Alpha000000.0%
Claude 3.5 Haiku000000.0%
Mistral Large000000.0%
Inception Mercury000000.0%
Mistral Small 3.2 24B000000.0%
Gemma 3 27B000000.0%
Qwen 2.5 72B000000.0%
Ministral 3 8B000000.0%
Ministral 3 3B000000.0%
Ministral 3B000000.0%
Model #1 #2 #3 #4 #5 Avg
GPT-4o, Aug. 6th (temp=1)10010076747083.9%
Mistral NeMO1009482765781.9%
Hermes 3 70B1001009685076.2%
Qwen 3 32B1008875624774.4%
Aion 2.010010086413873.0%
Llama 3.1 8B10010010060071.9%
GPT-5.4 (Reasoning)1009683292967.5%
WizardLM 2 8x22b1008476373566.3%
GPT-4.1 Nano1001006561065.2%
Claude Sonnet 4.6 (Reasoning)1001006657064.7%
Qwen3 235B A22B Instruct 25071001007033060.6%
GPT-5.4 Nano988942403059.8%
Gemini 3 Pro (Preview)100100930058.7%
Claude Haiku 4.5100100860057.2%
DeepSeek V3.210095890056.9%
Gemini 2.5 Flash Lite100676353056.6%
GPT-4o, Aug. 6th (temp=0)82776757056.5%
Rocinante 12B100100810056.1%
Claude Sonnet 4100100750054.9%
Arcee AI: Trinity Mini100100740054.7%
Qwen 3.5 Plus (2026-02-15)1006638363254.3%
Ministral 3 14B100635848053.7%
ByteDance Seed 1.6 Flash100685343052.8%
Grok 4.20 (Beta)1006354232352.5%
Arcee AI: Trinity Large (Preview)100100620052.3%
Z.AI GLM 59785760051.6%
MiniMax M2.5100100550051.0%
Gemma 3 12B100525250050.6%
Claude Opus 410094540049.6%
Z.AI GLM 4.7 Flash100724032048.9%
GPT-5.4 (Reasoning, Low)865352272648.7%
GPT-51006527251947.3%
Cohere Command R+ (Aug. 2024)10069630046.4%
Claude Opus 4.6 (Reasoning)100723128046.2%
Claude Opus 4.610094330045.3%
GPT-4o, May 13th (temp=1)10069540044.8%
Claude Sonnet 4.67472620041.5%
Stealth: Healer Alpha82424040040.6%
Gemma 3 4B10058430040.2%
GPT-4o Mini (temp=1)10010000040.0%
Gemma 3 27B10010000040.0%
Llama 3.1 Nemotron 70B10010000040.0%
Writer: Palmyra X57872460039.1%
Gemini 2.5 Flash1008800037.5%
GPT-4.1 Mini1008600037.2%
DeepSeek V3 (2024-12-26)6961550037.1%
Claude Opus 4.51008300036.5%
GPT-5 Nano977800034.9%
Ministral 8B1007100034.3%
Gemini 3 Flash (Preview)414034342033.7%
Mistral Medium 3.16057500033.3%
Grok 4.20 (Beta, Reasoning)8649310033.3%
Stealth: Hunter Alpha473229282832.8%
ByteDance Seed 2.0 Mini1006300032.5%
Mistral Large 35756490032.3%
GPT-4.11006000031.9%
GPT-5.4 Mini49473229031.1%
o4 Mini High66352725030.6%
GPT-5.47251290030.4%
DeepSeek V3.15350430029.1%
Qwen 3.5 397B A17B10026190029.0%
Gemini 3.1 Pro (Preview)38383730028.7%
Hermes 3 405B796400028.7%
Gemini 2.5 Flash Lite (Reasoning)1004100028.3%
Claude Sonnet 4.54646460027.8%
Z.AI GLM 4.65249350027.1%
Nemotron 3 Nano825300027.0%
Grok 4795400026.7%
Grok 4 Fast884000025.8%
Qwen 2.5 72B646300025.3%
Mistral Small 4 (Reasoning)824300025.1%
MiniMax M2.7804300024.5%
Gemini 2.5 Flash (Reasoning)645700024.3%
GPT-5.1754400023.8%
GPT-5.2575500022.5%
GPT-5.4 Mini (Reasoning, Low)31292624022.0%
Z.AI GLM 4.74037330021.9%
DeepSeek-V2 Chat604100020.1%
Claude 3.5 Haiku100000020.0%
Mistral Large 2100000020.0%
Ministral 3 8B100000020.0%
LFM2 24B100000020.0%
Claude 3.7 Sonnet98000019.6%
GPT-5.4 Mini (Reasoning)3232320019.3%
Claude 3 Haiku96000019.2%
Qwen 3.5 Flash702300018.6%
MoonshotAI: Kimi K2.5484200018.1%
GPT-5.4 Nano (Reasoning, Low)29271914017.8%
GPT-5 Mini88000017.5%
Qwen 3.5 122B661800016.8%
Mistral Large71000014.3%
Qwen 3.5 35B2219108011.8%
Mistral Small 456000011.1%
Gemini 2.5 Pro54000010.9%
Gemini 3.1 Flash Lite (Preview)54000010.9%
Z.AI GLM 4.551000010.1%
Z.AI GLM 5 Turbo4900009.7%
Mistral Small Creative4800009.5%
GPT-4o, May 13th (temp=0)4700009.4%
Gemini 3 Flash (Preview, Reasoning)4600009.2%
GPT-5.4 Nano (Reasoning)22180007.9%
Grok 4.1 Fast3200006.4%
o4 Mini2900005.8%
Qwen 3.5 27B000000.0%
ByteDance Seed 1.6000000.0%
Qwen 3.5 9B000000.0%
ByteDance Seed 2.0 Lite000000.0%
Nemotron 3 Super000000.0%
Claude 3.5 Sonnet000000.0%
Inception Mercury 2000000.0%
Stealth: Aurora Alpha000000.0%
DeepSeek V3 (2025-03-24)000000.0%
Inception Mercury000000.0%
Mistral Small 3.2 24B000000.0%
Llama 3.1 70B000000.0%
GPT-4o Mini (temp=0)000000.0%
Ministral 3 3B000000.0%
Ministral 3B000000.0%
Model #1 #2 #3 #4 #5 Avg
Grok 4 Fast100100100924988.1%
Ministral 3 14B100100100100080.0%
GPT-5 Nano10010010067073.4%
Claude Sonnet 41001007667068.5%
Claude Opus 41001007763068.0%
Gemini 2.5 Flash Lite100888658066.4%
Qwen 3.5 397B A17B1008576462366.0%
Z.AI GLM 4.7 Flash100936258062.5%
ByteDance Seed 2.0 Lite1001001000060.0%
Llama 3.1 Nemotron 70B1001001000060.0%
Llama 3.1 8B1001001000060.0%
Mistral Small Creative100100870057.4%
Hermes 3 405B100100810056.1%
MoonshotAI: Kimi K2.5100100740054.7%
Mistral Large 310083690050.6%
Z.AI GLM 510068630046.2%
ByteDance Seed 1.6 Flash62605650045.4%
Grok 4.20 (Beta)100603131044.5%
Z.AI GLM 4.610067520043.6%
Qwen 3.5 122B10068500043.5%
Claude 3.7 Sonnet10010000040.0%
Ministral 3 8B10010000040.0%
Cohere Command R+ (Aug. 2024)10010000040.0%
Qwen 3.5 35B1009900039.9%
Gemini 2.5 Flash8160570039.7%
Ministral 3B1009300038.5%
GPT-5.4565230282638.3%
Stealth: Hunter Alpha8959430038.2%
Z.AI GLM 5 Turbo1008900037.9%
GPT-4.1 Nano1008800037.5%
GPT-5.4 (Reasoning)58553835037.5%
GPT-5.4 Mini78393533037.0%
GPT-4.1 Mini939100036.7%
Ministral 3 3B60444231035.5%
MiniMax M2.77059450034.9%
Mistral Small 4 (Reasoning)1007400034.7%
GPT-4o Mini (temp=1)868600034.5%
o4 Mini10040330034.5%
Qwen 3.5 27B10038340034.5%
Mistral Large 21007100034.3%
Z.AI GLM 4.71007000034.1%
Grok 4.20 (Beta, Reasoning)10037330034.0%
Grok 4.1 Fast7160360033.5%
Mistral Medium 3.16754450033.1%
GPT-4o, Aug. 6th (temp=0)857800032.6%
WizardLM 2 8x22b926800032.0%
DeepSeek-V2 Chat936700031.9%
Mistral Large1005700031.4%
Qwen3 235B A22B Instruct 25071005700031.4%
Writer: Palmyra X51005600031.1%
Gemini 3.1 Flash Lite (Preview)1005400030.8%
Qwen 3.5 Plus (2026-02-15)777400030.1%
Qwen 3.5 9B5848390029.1%
Qwen 3.5 Flash776300028.0%
o4 Mini High1003700027.5%
Gemini 2.5 Flash Lite (Reasoning)696300026.4%
GPT-4o, May 13th (temp=0)686300026.2%
Gemma 3 27B685600024.7%
GPT-5.4 Nano7026210023.4%
Mistral Small 4684700023.0%
Claude Opus 4.6585000021.6%
GPT-5.13937290021.1%
ByteDance Seed 1.6544900020.7%
GPT-5.4 Nano (Reasoning, Low)4138250020.6%
Claude Opus 4.5100000020.0%
Claude Sonnet 4.5100000020.0%
Claude Haiku 4.5100000020.0%
GPT-4o, May 13th (temp=1)100000020.0%
GPT-4o, Aug. 6th (temp=1)100000020.0%
DeepSeek V3.1100000020.0%
Qwen 3 32B100000020.0%
Inception Mercury100000020.0%
Arcee AI: Trinity Large (Preview)100000020.0%
Arcee AI: Trinity Mini100000020.0%
Mistral NeMO100000020.0%
Claude Opus 4.6 (Reasoning)534400019.4%
Gemini 3 Flash (Preview)544100019.1%
Gemini 3 Flash (Preview, Reasoning)95000019.0%
Claude Sonnet 4.691000018.2%
LFM2 24B89000017.9%
Nemotron 3 Super88000017.5%
Llama 3.1 70B88000017.5%
GPT-5.4 (Reasoning, Low)582900017.3%
DeepSeek V3 (2025-03-24)83000016.7%
Stealth: Healer Alpha82000016.4%
Claude 3 Haiku81000016.1%
Hermes 3 70B76000015.2%
GPT-5 Mini423300015.0%
Rocinante 12B75000014.9%
ByteDance Seed 2.0 Mini72000014.5%
Z.AI GLM 4.570000014.1%
Gemma 3 12B70000014.1%
Gemma 3 4B67000013.3%
GPT-5.4 Mini (Reasoning, Low)323100012.6%
DeepSeek V3.262000012.3%
Stealth: Aurora Alpha59000011.8%
MiniMax M2.558000011.6%
Grok 458000011.6%
Gemini 3 Pro (Preview)51000010.2%
GPT-529150008.7%
GPT-5.4 Nano (Reasoning)2100004.2%
Gemini 3.1 Pro (Preview)000000.0%
Claude Sonnet 4.6 (Reasoning)000000.0%
GPT-5.4 Mini (Reasoning)000000.0%
GPT-5.2000000.0%
Aion 2.0000000.0%
GPT-4.1000000.0%
Gemini 2.5 Pro000000.0%
Gemini 2.5 Flash (Reasoning)000000.0%
Claude 3.5 Sonnet000000.0%
Inception Mercury 2000000.0%
Claude 3.5 Haiku000000.0%
DeepSeek V3 (2024-12-26)000000.0%
Mistral Small 3.2 24B000000.0%
GPT-4o Mini (temp=0)000000.0%
Nemotron 3 Nano000000.0%
Qwen 2.5 72B000000.0%
Ministral 8B000000.0%
Model #1 #2 #3 #4 #5 Avg
ByteDance Seed 2.0 Lite1001001001007094.1%
ByteDance Seed 1.610010010086077.2%
Writer: Palmyra X51001006761065.5%
Mistral NeMO100856648059.6%
ByteDance Seed 2.0 Mini100100960059.2%
Claude 3 Haiku100100910058.2%
Hermes 3 70B100100860057.2%
Gemini 3.1 Flash Lite (Preview)100756546057.1%
Cohere Command R+ (Aug. 2024)100100790055.9%
Rocinante 12B100100770055.4%
Qwen 3 32B100766729054.3%
GPT-4o, Aug. 6th (temp=0)10098720054.1%
Claude Sonnet 4.610093740053.2%
Claude 3.7 Sonnet10076760050.3%
Stealth: Healer Alpha100624442049.5%
Z.AI GLM 586545243046.9%
Claude Sonnet 4.6 (Reasoning)7976690044.9%
Gemma 3 27B8679570044.5%
Gemini 2.5 Flash Lite (Reasoning)8277610044.0%
GPT-5.18064590040.5%
Claude Sonnet 410010000040.0%
Grok 4.20 (Beta)7765570040.0%
GPT-5.410066300039.3%
Gemini 3 Flash (Preview)1009600039.2%
Claude Opus 4.510051450039.0%
o4 Mini7976390038.9%
GPT-4o, May 13th (temp=1)1009400038.9%
Qwen 2.5 72B1009400038.9%
DeepSeek V3 (2025-03-24)1009100038.2%
GPT-5.4 (Reasoning)63613630038.0%
GPT-510051390037.9%
Gemma 3 4B8158500037.7%
GPT-4o, Aug. 6th (temp=1)968900037.1%
Gemini 3 Pro (Preview)10047370036.9%
Qwen 3.5 Flash7973300036.6%
GPT-5.4 Nano (Reasoning, Low)80363433036.5%
Z.AI GLM 5 Turbo1008100036.1%
DeepSeek-V2 Chat837900032.5%
Grok 4 Fast1006100032.2%
GPT-4o, May 13th (temp=0)946600032.0%
GPT-5.4 (Reasoning, Low)887000031.6%
Z.AI GLM 4.61005800031.6%
GPT-5 Nano10036210031.6%
Mistral Large896400030.7%
Claude Opus 4.65351420029.1%
Claude Opus 4856000029.0%
Stealth: Hunter Alpha786700029.0%
ByteDance Seed 1.6 Flash1004300028.7%
Claude Opus 4.6 (Reasoning)884200025.9%
Qwen 3.5 397B A17B1002100024.2%
GPT-5.4 Nano413015141322.7%
Qwen3 235B A22B Instruct 2507615000022.1%
Claude Haiku 4.5575300021.9%
GPT-5.4 Nano (Reasoning)42301818021.6%
DeepSeek V3.2574800021.1%
MiniMax M2.5100000020.0%
Grok 4100000020.0%
Z.AI GLM 4.5100000020.0%
Z.AI GLM 4.7 Flash100000020.0%
Nemotron 3 Super100000020.0%
Claude 3.5 Sonnet100000020.0%
DeepSeek V3.1100000020.0%
Mistral Small 3.2 24B100000020.0%
GPT-4o Mini (temp=0)100000020.0%
Mistral Medium 3.1100000020.0%
Llama 3.1 Nemotron 70B100000020.0%
Arcee AI: Trinity Large (Preview)100000020.0%
WizardLM 2 8x22b100000020.0%
Gemini 2.5 Flash (Reasoning)544500020.0%
Hermes 3 405B91000018.2%
Ministral 3 14B89000017.9%
Ministral 3 3B89000017.9%
Qwen 3.5 122B3925250017.8%
Llama 3.1 8B88000017.5%
Inception Mercury 2454000017.0%
Llama 3.1 70B83000016.7%
Mistral Small 4 (Reasoning)78000015.6%
GPT-5.4 Mini (Reasoning)463000015.2%
GPT-5.22825200014.8%
GPT-4o Mini (temp=1)74000014.7%
GPT-4.1 Nano69000013.9%
MiniMax M2.768000013.7%
Grok 4.20 (Beta, Reasoning)383000013.6%
MoonshotAI: Kimi K2.568000013.5%
Mistral Small Creative68000013.5%
Aion 2.061000012.2%
Gemini 2.5 Flash Lite60000012.0%
Gemma 3 12B60000012.0%
Qwen 3.5 35B59000011.7%
Gemini 2.5 Flash57000011.4%
Mistral Large 254000010.8%
Qwen 3.5 Plus (2026-02-15)53000010.5%
Z.AI GLM 4.752000010.4%
Mistral Large 352000010.4%
Stealth: Aurora Alpha4300008.7%
GPT-5.4 Mini (Reasoning, Low)3500007.0%
GPT-5 Mini2900005.7%
GPT-5.4 Mini2800005.7%
Qwen 3.5 27B1600003.1%
Gemini 3.1 Pro (Preview)000000.0%
Gemini 3 Flash (Preview, Reasoning)000000.0%
o4 Mini High000000.0%
Grok 4.1 Fast000000.0%
GPT-4.1000000.0%
Gemini 2.5 Pro000000.0%
Claude Sonnet 4.5000000.0%
Qwen 3.5 9B000000.0%
Claude 3.5 Haiku000000.0%
DeepSeek V3 (2024-12-26)000000.0%
GPT-4.1 Mini000000.0%
Inception Mercury000000.0%
Nemotron 3 Nano000000.0%
Mistral Small 4000000.0%
Ministral 3 8B000000.0%
Arcee AI: Trinity Mini000000.0%
Ministral 8B000000.0%
Ministral 3B000000.0%
LFM2 24B000000.0%
Model #1 #2 #3 #4 #5 Avg
Qwen 3.5 122B10010010063072.7%
Claude Sonnet 41001008572071.4%
Mistral Small Creative1001006659064.9%
Qwen 3.5 35B1001007448064.4%
Qwen 3.5 397B A17B10010010020063.9%
Llama 3.1 Nemotron 70B1001001000060.0%
Gemini 3.1 Flash Lite (Preview)100100930058.7%
Qwen 3.5 Flash100100930058.6%
ByteDance Seed 1.6 Flash100626240052.7%
Gemini 3 Pro (Preview)100804038051.7%
Rocinante 12B100100570051.4%
ByteDance Seed 2.0 Lite88645745050.8%
Qwen 3 32B10096530049.8%
Stealth: Healer Alpha96574746049.2%
Claude Opus 410061600044.1%
Gemma 3 12B7270670041.9%
GPT-5.4 Mini68623833040.2%
Mistral Large 310010000040.0%
Hermes 3 70B10010000040.0%
Llama 3.1 8B10010000040.0%
Cohere Command R+ (Aug. 2024)1009800039.6%
Qwen 2.5 72B1008800037.5%
Z.AI GLM 5 Turbo7654510036.2%
WizardLM 2 8x22b1007800035.6%
GPT-4o, Aug. 6th (temp=1)967800034.9%
ByteDance Seed 2.0 Mini1007400034.7%
MiniMax M2.51007200034.5%
Claude Sonnet 4.5967600034.4%
o4 Mini8946350034.2%
GPT-5937800034.1%
Claude 3.7 Sonnet1006900033.9%
Z.AI GLM 4.76360460033.7%
GPT-4o Mini (temp=1)868100033.4%
Ministral 3B1006600033.2%
Claude 3 Haiku837900032.5%
Gemma 3 27B837700032.1%
Qwen 3.5 9B867000031.3%
GPT-5.4 (Reasoning, Low)54413228031.1%
Grok 4.20 (Beta, Reasoning)9933230031.0%
ByteDance Seed 1.61005300030.6%
DeepSeek V3.2935800030.3%
GPT-5.4 (Reasoning)6257320030.3%
Mistral Small 41004600029.3%
GPT-5 Nano1004500028.9%
Z.AI GLM 4.7 Flash756100027.1%
GPT-5.47926250026.2%
Z.AI GLM 5636300025.0%
o4 Mini High903200024.4%
Grok 4.20 (Beta)5037290023.2%
Aion 2.0704500023.2%
Mistral Small 4 (Reasoning)4435340022.6%
Gemini 2.5 Pro634900022.2%
MoonshotAI: Kimi K2.5614800021.8%
Gemini 2.5 Flash (Reasoning)634400021.3%
Gemini 2.5 Flash535000020.5%
Claude Opus 4.6554600020.2%
Claude Opus 4.6 (Reasoning)524900020.1%
Qwen 3.5 27B100000020.0%
MiniMax M2.7100000020.0%
Stealth: Hunter Alpha100000020.0%
Gemini 2.5 Flash Lite (Reasoning)100000020.0%
Nemotron 3 Super100000020.0%
GPT-4.1 Mini100000020.0%
Mistral Large100000020.0%
Llama 3.1 70B100000020.0%
Ministral 3 8B100000020.0%
Mistral NeMO100000020.0%
Arcee AI: Trinity Mini96000019.2%
GPT-5.4 Nano4335170019.2%
GPT-4o, May 13th (temp=1)94000018.9%
Grok 4 Fast633100018.7%
DeepSeek V3 (2024-12-26)86000017.2%
GPT-5.4 Mini (Reasoning, Low)393700015.1%
GPT-5.4 Mini (Reasoning)393600015.1%
Claude Haiku 4.575000014.9%
Z.AI GLM 4.572000014.5%
Arcee AI: Trinity Large (Preview)65000013.0%
Qwen 3.5 Plus (2026-02-15)64000012.8%
Writer: Palmyra X564000012.8%
Mistral Large 263000012.5%
GPT-5.4 Nano (Reasoning, Low)313000012.3%
Gemini 3 Flash (Preview, Reasoning)56000011.1%
DeepSeek V3.154000010.9%
GPT-5 Mini3900007.8%
GPT-5.13200006.4%
GPT-5.22100004.3%
Gemini 3.1 Pro (Preview)000000.0%
Claude Sonnet 4.6 (Reasoning)000000.0%
Claude Sonnet 4.6000000.0%
Claude Opus 4.5000000.0%
Grok 4.1 Fast000000.0%
Z.AI GLM 4.6000000.0%
GPT-4.1000000.0%
Grok 4000000.0%
GPT-4o, May 13th (temp=0)000000.0%
Gemini 3 Flash (Preview)000000.0%
DeepSeek-V2 Chat000000.0%
Claude 3.5 Sonnet000000.0%
Inception Mercury 2000000.0%
Stealth: Aurora Alpha000000.0%
Claude 3.5 Haiku000000.0%
Hermes 3 405B000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
DeepSeek V3 (2025-03-24)000000.0%
GPT-5.4 Nano (Reasoning)000000.0%
Gemini 2.5 Flash Lite000000.0%
Qwen3 235B A22B Instruct 2507000000.0%
Inception Mercury000000.0%
Mistral Small 3.2 24B000000.0%
GPT-4o Mini (temp=0)000000.0%
Mistral Medium 3.1000000.0%
Nemotron 3 Nano000000.0%
Ministral 3 14B000000.0%
GPT-4.1 Nano000000.0%
Gemma 3 4B000000.0%
Ministral 3 3B000000.0%
Ministral 8B000000.0%
LFM2 24B000000.0%
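Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Rocinante 12B | 100 | 100 | 100 | 91 | 81 | 94.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 88 | 85 | 66 | 87.7%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 79 | 74 | 0 | 70.6%
GPT-4.1 Nano | 100 | 100 | 77 | 71 | 0 | 69.7%
Stealth: Healer Alpha | 93 | 79 | 71 | 46 | 43 | 66.5%
Gemini 2.5 Flash Lite | 100 | 69 | 69 | 49 | 35 | 64.6%
Hermes 3 70B | 100 | 83 | 81 | 51 | 0 | 62.9%
Gemma 3 4B | 100 | 100 | 54 | 54 | 0 | 61.7%
DeepSeek V3.2 | 100 | 81 | 74 | 51 | 0 | 61.1%
MiniMax M2.5 | 100 | 98 | 57 | 44 | 0 | 60.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 88 | 0 | 0 | 57.5%
Z.AI GLM 5 Turbo | 96 | 75 | 60 | 50 | 0 | 56.1%
GPT-5.4 (Reasoning) | 90 | 80 | 51 | 30 | 24 | 54.9%
Claude Haiku 4.5 | 97 | 63 | 57 | 50 | 0 | 53.4%
GPT-5 Nano | 100 | 71 | 51 | 39 | 0 | 52.1%
Grok 4.20 (Beta) | 100 | 71 | 37 | 27 | 24 | 51.6%
Ministral 3 3B | 100 | 83 | 75 | 0 | 0 | 51.6%
Aion 2.0 | 100 | 57 | 50 | 48 | 0 | 51.0%
Qwen 3.5 Plus (2026-02-15) | 99 | 76 | 45 | 34 | 0 | 50.8%
Claude Opus 4.6 | 100 | 72 | 42 | 38 | 0 | 50.5%
WizardLM 2 8x22b | 100 | 62 | 47 | 40 | 0 | 49.7%
Claude Opus 4.5 | 92 | 73 | 43 | 40 | 0 | 49.6%
Gemini 3 Flash (Preview, Reasoning) | 100 | 74 | 65 | 0 | 0 | 47.8%
Qwen3 235B A22B Instruct 2507 | 100 | 97 | 37 | 0 | 0 | 46.9%
Z.AI GLM 4.7 | 79 | 75 | 44 | 36 | 0 | 46.9%
Gemini 3 Pro (Preview) | 100 | 89 | 44 | 0 | 0 | 46.5%
Qwen 3.5 35B | 100 | 68 | 54 | 0 | 0 | 44.3%
GPT-4o, Aug. 6th (temp=1) | 88 | 76 | 57 | 0 | 0 | 44.1%
GPT-5.4 (Reasoning, Low) | 100 | 59 | 55 | 0 | 0 | 42.8%
Z.AI GLM 4.6 | 100 | 66 | 45 | 0 | 0 | 42.2%
GPT-4o Mini (temp=1) | 74 | 72 | 62 | 0 | 0 | 41.5%
Qwen 3.5 397B A17B | 80 | 50 | 29 | 27 | 18 | 40.9%
ByteDance Seed 1.6 | 100 | 100 | 0 | 0 | 0 | 40.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 0 | 0 | 0 | 40.0%
DeepSeek V3.1 | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4o, Aug. 6th (temp=0) | 68 | 61 | 60 | 0 | 0 | 37.8%
Z.AI GLM 5 | 87 | 61 | 41 | 0 | 0 | 37.8%
Claude 3 Haiku | 98 | 89 | 0 | 0 | 0 | 37.5%
Qwen 3.5 122B | 100 | 83 | 0 | 0 | 0 | 36.6%
GPT-5.4 Mini (Reasoning) | 98 | 50 | 32 | 0 | 0 | 35.9%
Claude Sonnet 4.6 | 100 | 79 | 0 | 0 | 0 | 35.9%
Stealth: Hunter Alpha | 91 | 56 | 30 | 0 | 0 | 35.5%
GPT-5.4 Nano (Reasoning, Low) | 99 | 56 | 19 | 0 | 0 | 34.9%
DeepSeek-V2 Chat | 100 | 74 | 0 | 0 | 0 | 34.7%
Claude Opus 4.6 (Reasoning) | 82 | 47 | 41 | 0 | 0 | 33.9%
GPT-4o, May 13th (temp=1) | 100 | 68 | 0 | 0 | 0 | 33.5%
GPT-5.4 | 76 | 59 | 28 | 0 | 0 | 32.7%
Claude 3.7 Sonnet | 100 | 63 | 0 | 0 | 0 | 32.7%
Mistral Large | 81 | 81 | 0 | 0 | 0 | 32.3%
ByteDance Seed 2.0 Lite | 93 | 68 | 0 | 0 | 0 | 32.2%
Qwen 2.5 72B | 100 | 56 | 0 | 0 | 0 | 31.2%
Mistral NeMO | 100 | 55 | 0 | 0 | 0 | 31.0%
GPT-5 Mini | 100 | 28 | 26 | 0 | 0 | 30.9%
GPT-4.1 Mini | 88 | 65 | 0 | 0 | 0 | 30.5%
DeepSeek V3 (2024-12-26) | 60 | 52 | 38 | 0 | 0 | 30.0%
Arcee AI: Trinity Mini | 81 | 68 | 0 | 0 | 0 | 29.6%
Writer: Palmyra X5 | 89 | 53 | 0 | 0 | 0 | 28.5%
Z.AI GLM 4.5 | 72 | 69 | 0 | 0 | 0 | 28.4%
Claude Sonnet 4.5 | 82 | 59 | 0 | 0 | 0 | 28.2%
o4 Mini | 58 | 52 | 27 | 0 | 0 | 27.5%
GPT-5.4 Nano | 53 | 48 | 18 | 16 | 0 | 26.9%
GPT-5.4 Mini | 100 | 33 | 0 | 0 | 0 | 26.7%
Qwen 3.5 9B | 94 | 33 | 4 | 0 | 0 | 26.3%
Qwen 3 32B | 76 | 56 | 0 | 0 | 0 | 26.3%
GPT-4o Mini (temp=0) | 68 | 63 | 0 | 0 | 0 | 26.2%
GPT-5.4 Mini (Reasoning, Low) | 70 | 29 | 29 | 0 | 0 | 25.8%
Qwen 3.5 Flash | 100 | 29 | 0 | 0 | 0 | 25.7%
GPT-5.1 | 42 | 33 | 30 | 23 | 0 | 25.6%
ByteDance Seed 1.6 Flash | 95 | 28 | 0 | 0 | 0 | 24.7%
GPT-4o, May 13th (temp=0) | 78 | 43 | 0 | 0 | 0 | 24.2%
Grok 4 Fast | 49 | 37 | 34 | 0 | 0 | 24.1%
GPT-4.1 | 64 | 53 | 0 | 0 | 0 | 23.5%
Grok 4.20 (Beta, Reasoning) | 68 | 49 | 0 | 0 | 0 | 23.4%
Gemini 2.5 Pro | 62 | 54 | 0 | 0 | 0 | 23.2%
Gemma 3 27B | 54 | 48 | 0 | 0 | 0 | 20.4%
Claude Sonnet 4.6 (Reasoning) | 100 | 0 | 0 | 0 | 0 | 20.0%
Qwen 3.5 27B | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Sonnet 4 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Opus 4 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 8B | 100 | 0 | 0 | 0 | 0 | 20.0%
Nemotron 3 Super | 96 | 0 | 0 | 0 | 0 | 19.2%
Arcee AI: Trinity Large (Preview) | 85 | 0 | 0 | 0 | 0 | 16.9%
Mistral Small 4 (Reasoning) | 79 | 0 | 0 | 0 | 0 | 15.7%
MoonshotAI: Kimi K2.5 | 76 | 0 | 0 | 0 | 0 | 15.2%
Grok 4 | 75 | 0 | 0 | 0 | 0 | 14.9%
o4 Mini High | 69 | 0 | 0 | 0 | 0 | 13.8%
Ministral 3 14B | 68 | 0 | 0 | 0 | 0 | 13.5%
Mistral Large 3 | 63 | 0 | 0 | 0 | 0 | 12.7%
Gemini 2.5 Flash Lite (Reasoning) | 62 | 0 | 0 | 0 | 0 | 12.3%
Gemini 2.5 Flash | 61 | 0 | 0 | 0 | 0 | 12.2%
Mistral Medium 3.1 | 52 | 0 | 0 | 0 | 0 | 10.4%
Gemini 3.1 Pro (Preview) | 41 | 0 | 0 | 0 | 0 | 8.2%
GPT-5.4 Nano (Reasoning) | 40 | 0 | 0 | 0 | 0 | 8.1%
MiniMax M2.7 | 34 | 0 | 0 | 0 | 0 | 6.8%
GPT-5.2 | 27 | 0 | 0 | 0 | 0 | 5.5%
GPT-5 | 27 | 0 | 0 | 0 | 0 | 5.3%
Gemini 3 Flash (Preview) | 22 | 0 | 0 | 0 | 0 | 4.3%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0.0%
Nemotron 3 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%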