AI-ism word frequency

Test: Bad Writing Habits

Avg. Score: 36.3%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | ByteDance Seed 1.6 Flash | 69.0% | $0.0013 | 27.3s | 51%
2 | GPT-5.4 Mini (Reasoning) | 71.2% | $0.022 | 28.1s | 55%
3 | GPT-5.4 Mini | 68.1% | $0.015 | 16.8s | 53%
4 | GPT-5.4 Mini (Reasoning, Low) | 68.0% | $0.015 | 16.8s | 51%
5 | ByteDance Seed 2.0 Lite | 76.3% | $0.012 | 2.2m | 63%
6 | GPT-5 Mini | 66.0% | $0.0100 | 57.4s | 49%
7 | Claude Sonnet 4.6 | 70.0% | $0.031 | 39.3s | 48%
8 | GPT-5.4 | 73.9% | $0.049 | 1.4m | 57%
9 | GPT-5.4 (Reasoning, Low) | 73.0% | $0.055 | 1.4m | 58%
10 | GPT-5.4 Nano (Reasoning, Low) | 57.9% | $0.0055 | 20.6s | 42%
11 | GPT-5.4 Nano | 55.5% | $0.0057 | 26.3s | 42%
12 | GPT-5.4 Nano (Reasoning) | 56.3% | $0.0061 | 24.5s | 40%
13 | MiniMax M2.5 | 60.5% | $0.0034 | 1.3m | 44%
14 | Z.AI GLM 5 Turbo | 60.3% | $0.0081 | 33.2s | 37%
15 | Claude Haiku 4.5 | 57.4% | $0.011 | 21.6s | 38%
16 | Claude Sonnet 4.6 (Reasoning) | 71.3% | $0.060 | 1.2m | 50%
17 | MiniMax M2.7 | 58.4% | $0.0040 | 1.1m | 40%
18 | Grok 4.20 (Beta) | 51.0% | $0.018 | 15.8s | 38%
19 | GPT-5 | 74.2% | $0.065 | 2.8m | 61%
20 | Z.AI GLM 5 | 54.8% | $0.0084 | 1.2m | 39%
21 | GPT-5.2 | 61.5% | $0.056 | 1.5m | 48%
22 | GPT-5.1 | 63.4% | $0.054 | 1.8m | 48%
23 | Qwen 3.5 Flash | 47.5% | $0.0025 | 47.5s | 34%
24 | GPT-5.4 (Reasoning) | 73.5% | $0.089 | 2.6m | 58%
25 | Writer: Palmyra X5 | 46.6% | $0.011 | 22.0s | 32%
26 | Claude Sonnet 4.5 | 54.6% | $0.035 | 38.1s | 35%
27 | Claude Opus 4.6 | 63.6% | $0.078 | 1.2m | 47%
28 | Grok 4.20 (Beta, Reasoning) | 53.1% | $0.039 | 34.0s | 37%
29 | Qwen 3.5 9B | 55.4% | $0.0011 | 1.4m | 32%
30 | Gemini 3 Flash (Preview) | 41.1% | $0.0078 | 19.6s | 30%
31 | GPT-5 Nano | 46.6% | $0.0042 | 1.4m | 36%
32 | Claude Opus 4.5 | 58.0% | $0.070 | 53.4s | 41%
33 | Z.AI GLM 4.7 Flash | 45.8% | $0.0017 | 1.2m | 33%
34 | Mistral Small 4 (Reasoning) | 41.5% | $0.0022 | 30.2s | 27%
35 | Z.AI GLM 4.5 | 43.0% | $0.0051 | 42.1s | 28%
36 | Claude Opus 4.6 (Reasoning) | 62.7% | $0.088 | 1.4m | 44%
37 | Z.AI GLM 4.7 | 47.8% | $0.010 | 1.4m | 34%
38 | Qwen3 235B A22B Instruct 2507 | 43.8% | $0.0011 | 59.2s | 28%
39 | Claude 3.7 Sonnet | 49.1% | $0.042 | 46.7s | 33%
40 | Mistral Small Creative | 35.9% | $0.0007 | 9.1s | 24%
41 | Mistral Small 4 | 37.0% | $0.0014 | 18.2s | 25%
42 | Mistral Medium 3.1 | 39.0% | $0.0048 | 36.5s | 27%
43 | ByteDance Seed 2.0 Mini | 68.6% | $0.0045 | 4.9m | 51%
44 | Qwen 3.5 Plus (2026-02-15) | 38.7% | $0.0060 | 31.5s | 26%
45 | Qwen 3.5 122B | 46.4% | $0.025 | 1.1m | 31%
46 | Qwen 3.5 35B | 47.0% | $0.018 | 1.0m | 27%
47 | Stealth: Healer Alpha | 37.4% | $0.0000 | 23.7s | 22%
48 | Qwen 3 32B | 40.4% | $0.0015 | 54.6s | 25%
49 | Rocinante 12B | 43.1% | $0.0014 | 38.4s | 20%
50 | ByteDance Seed 1.6 | 54.6% | $0.013 | 2.5m | 34%
51 | Gemini 3 Flash (Preview, Reasoning) | 37.3% | $0.012 | 30.1s | 26%
52 | Stealth: Hunter Alpha | 39.5% | $0.0000 | 55.0s | 24%
53 | Ministral 3 14B | 33.3% | $0.0007 | 11.7s | 20%
54 | Aion 2.0 | 40.3% | $0.0064 | 1.3m | 28%
55 | Llama 3.1 8B | 38.7% | $0.0003 | 1.3m | 26%
56 | GPT-4.1 | 38.0% | $0.018 | 44.7s | 25%
57 | DeepSeek V3 (2025-03-24) | 35.9% | $0.0014 | 39.4s | 20%
58 | Mistral Large 2 | 36.8% | $0.013 | 29.4s | 21%
59 | Mistral Large 3 | 34.9% | $0.0033 | 30.3s | 20%
60 | Gemini 3 Pro (Preview) | 45.0% | $0.055 | 54.4s | 31%
61 | Mistral Large | 36.5% | $0.014 | 30.9s | 21%
62 | Qwen 3.5 27B | 44.5% | $0.020 | 1.6m | 28%
63 | Arcee AI: Trinity Large (Preview) | 35.7% | $0.0000 | 43.6s | 19%
64 | Claude Sonnet 4 | 39.2% | $0.032 | 43.7s | 24%
65 | DeepSeek V3.2 | 39.2% | $0.0014 | 1.9m | 26%
66 | Ministral 8B | 27.1% | $0.0004 | 10.4s | 15%
67 | Ministral 3 8B | 28.3% | $0.0008 | 19.6s | 14%
68 | Z.AI GLM 4.6 | 31.3% | $0.0065 | 51.5s | 18%
69 | Gemini 3.1 Flash Lite (Preview) | 23.8% | $0.0030 | 8.4s | 15%
70 | Qwen 3.5 397B A17B | 46.8% | $0.014 | 3.0m | 31%
71 | Hermes 3 405B | 31.9% | $0.0032 | 53.2s | 14%
72 | Hermes 3 70B | 29.9% | $0.0010 | 1.2m | 18%
73 | MoonshotAI: Kimi K2.5 | 46.4% | $0.019 | 3.2m | 31%
74 | Claude 3.5 Sonnet | 33.9% | $0.048 | 35.5s | 19%
75 | WizardLM 2 8x22b | 33.8% | $0.0026 | 1.8m | 17%
76 | Gemma 3 27B | 21.0% | $0.0006 | 52.6s | 16%
77 | DeepSeek V3.1 | 30.3% | $0.0020 | 1.8m | 19%
78 | DeepSeek V3 (2024-12-26) | 23.8% | $0.0021 | 54.6s | 13%
79 | Gemini 2.5 Pro | 26.5% | $0.036 | 36.2s | 16%
80 | DeepSeek-V2 Chat | 21.9% | $0.0021 | 53.3s | 12%
81 | Ministral 3B | 17.3% | $0.0001 | 8.1s | 7%
82 | Grok 4 Fast | 15.1% | $0.0017 | 24.1s | 11%
83 | o4 Mini | 17.6% | $0.015 | 25.7s | 12%
84 | LFM2 24B | 18.4% | $0.0002 | 28.4s | 7%
85 | Grok 4.1 Fast | 16.9% | $0.0018 | 37.8s | 10%
86 | Ministral 3 3B | 13.5% | $0.0005 | 11.1s | 5%
87 | Arcee AI: Trinity Mini | 12.9% | $0.0003 | 9.2s | 0%
88 | Gemini 2.5 Flash | 12.6% | $0.0052 | 10.6s | 2%
89 | Mistral NeMO | 10.6% | $0.0005 | 10.1s | 1%
90 | Gemma 3 12B | 13.0% | $0.0004 | 41.3s | 5%
91 | Nemotron 3 Super | 16.4% | $0.0000 | 1.4m | 10%
92 | o4 Mini High | 16.1% | $0.025 | 47.2s | 9%
93 | GPT-4.1 Mini | 11.7% | $0.0027 | 19.0s | 0%
94 | Claude 3.5 Haiku | 10.0% | $0.0035 | 10.8s | 0%
95 | Llama 3.1 70B | 13.3% | $0.0015 | 29.4s | 0%
96 | Gemini 2.5 Flash Lite | 7.9% | $0.0009 | 9.5s | 0%
97 | Claude 3 Haiku | 9.6% | $0.0025 | 14.9s | 0%
98 | Cohere Command R+ (Aug. 2024) | 19.9% | $0.020 | 52.5s | 4%
99 | GPT-4.1 Nano | 8.0% | $0.0007 | 13.3s | 0%
100 | Inception Mercury | 12.7% | $0.011 | 17.6s | 0%
101 | Claude Opus 4 | 54.6% | $0.209 | 1.4m | 37%
102 | Gemma 3 4B | 8.2% | $0.0002 | 20.0s | 0%
103 | Gemini 2.5 Flash (Reasoning) | 11.1% | $0.011 | 21.5s | 0%
104 | Inception Mercury 2 | 5.1% | $0.0032 | 7.0s | 0%
105 | Stealth: Aurora Alpha | 4.5% | $0.0000 | 9.8s | 0%
106 | Gemini 2.5 Flash Lite (Reasoning) | 8.9% | $0.0028 | 30.8s | 0%
107 | Qwen 2.5 72B | 7.9% | $0.0010 | 36.7s | 0%
108 | Gemini 3.1 Pro (Preview) | 34.5% | $0.107 | 1.8m | 23%
109 | Llama 3.1 Nemotron 70B | 4.9% | $0.0038 | 31.7s | 0%
110 | Grok 4 | 19.2% | $0.048 | 1.7m | 13%
111 | GPT-4o Mini (temp=1) | 2.3% | $0.0012 | 34.8s | 0%
112 | GPT-4o, May 13th (temp=0) | 8.9% | $0.035 | 14.1s | 0%
113 | GPT-4o Mini (temp=0) | 0.8% | $0.0012 | 34.8s | 0%
114 | GPT-4o, May 13th (temp=1) | 7.4% | $0.033 | 14.4s | 0%
115 | GPT-4o, Aug. 6th (temp=1) | 2.8% | $0.018 | 24.4s | 0%
116 | Nemotron 3 Nano | 6.2% | $0.0010 | 1.1m | 0%
117 | GPT-4o, Aug. 6th (temp=0) | 3.1% | $0.023 | 22.7s | 0%
118 | Mistral Small 3.2 24B | 11.9% | $0.0069 | 5.7m | 0%

Individual Scenarios

Detailed Writing Rules

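Model | #1 | #2 | #3 | #4 | #5 | Avg
ByteDance Seed 2.0 Lite | 89 | 79 | 77 | 75 | 70 | 78.0%
GPT-5.4 | 77 | 72 | 69 | 66 | 61 | 69.0%
ByteDance Seed 1.6 Flash | 82 | 77 | 69 | 55 | 48 | 66.1%
GPT-5.4 Mini | 71 | 66 | 65 | 58 | 53 | 62.7%
GPT-5 | 75 | 63 | 63 | 56 | 56 | 62.4%
GPT-5.4 Mini (Reasoning) | 72 | 65 | 62 | 60 | 51 | 62.0%
GPT-5.4 (Reasoning) | 71 | 64 | 59 | 58 | 49 | 60.3%
Claude Sonnet 4.6 (Reasoning) | 82 | 78 | 62 | 46 | 32 | 60.1%
GPT-5.4 Mini (Reasoning, Low) | 66 | 62 | 61 | 55 | 50 | 58.9%
GPT-5.4 (Reasoning, Low) | 71 | 61 | 58 | 52 | 52 | 58.8%
GPT-5 Mini | 76 | 60 | 59 | 51 | 48 | 58.7%
GPT-5.2 | 67 | 66 | 59 | 51 | 44 | 57.4%
ByteDance Seed 2.0 Mini | 78 | 60 | 59 | 42 | 35 | 54.8%
Claude Sonnet 4.6 | 65 | 65 | 50 | 50 | 28 | 51.7%
GPT-5.1 | 68 | 60 | 43 | 41 | 37 | 49.8%
Claude Opus 4.6 | 64 | 49 | 46 | 45 | 44 | 49.5%
Grok 4.20 (Beta, Reasoning) | 61 | 52 | 49 | 44 | 36 | 48.5%
Qwen 3.5 35B | 73 | 64 | 36 | 32 | 29 | 47.0%
Claude Opus 4.5 | 64 | 54 | 53 | 39 | 15 | 45.1%
Claude Opus 4 | 61 | 51 | 47 | 35 | 32 | 45.0%
Claude Opus 4.6 (Reasoning) | 63 | 57 | 53 | 27 | 21 | 44.4%
Qwen 3.5 9B | 66 | 54 | 50 | 49 | 2 | 44.1%
ByteDance Seed 1.6 | 63 | 50 | 37 | 36 | 30 | 43.0%
MiniMax M2.5 | 67 | 62 | 34 | 27 | 25 | 42.8%
GPT-5.4 Nano (Reasoning) | 54 | 43 | 42 | 38 | 31 | 41.6%
Qwen 3.5 Flash | 57 | 51 | 41 | 39 | 16 | 41.0%
Claude 3.7 Sonnet | 53 | 39 | 39 | 38 | 34 | 40.7%
Z.AI GLM 4.7 | 53 | 46 | 40 | 33 | 30 | 40.2%
GPT-5.4 Nano (Reasoning, Low) | 49 | 44 | 44 | 35 | 29 | 40.2%
Z.AI GLM 4.7 Flash | 59 | 54 | 46 | 41 | 0 | 40.0%
Writer: Palmyra X5 | 63 | 58 | 54 | 21 | 0 | 39.2%
Z.AI GLM 4.5 | 54 | 54 | 48 | 20 | 19 | 39.1%
Qwen 3.5 397B A17B | 56 | 51 | 43 | 25 | 20 | 38.9%
Grok 4.20 (Beta) | 59 | 38 | 37 | 34 | 26 | 38.9%
Qwen3 235B A22B Instruct 2507 | 50 | 43 | 30 | 29 | 28 | 36.1%
Qwen 3.5 Plus (2026-02-15) | 69 | 46 | 32 | 24 | 10 | 36.0%
GPT-5 Nano | 41 | 41 | 36 | 34 | 27 | 35.7%
Z.AI GLM 5 Turbo | 57 | 50 | 31 | 30 | 8 | 35.3%
Claude Haiku 4.5 | 45 | 35 | 32 | 32 | 24 | 33.6%
MiniMax M2.7 | 46 | 39 | 34 | 34 | 15 | 33.5%
DeepSeek V3 (2025-03-24) | 54 | 43 | 37 | 32 | 0 | 33.3%
Rocinante 12B | 58 | 57 | 39 | 11 | 0 | 33.0%
Qwen 3.5 27B | 45 | 44 | 30 | 24 | 22 | 32.7%
GPT-5.4 Nano | 43 | 36 | 32 | 32 | 21 | 32.6%
Qwen 3.5 122B | 48 | 43 | 27 | 20 | 15 | 30.6%
Z.AI GLM 5 | 43 | 35 | 27 | 25 | 18 | 29.4%
Gemini 3 Flash (Preview) | 45 | 39 | 28 | 20 | 15 | 29.2%
MoonshotAI: Kimi K2.5 | 62 | 24 | 23 | 21 | 13 | 28.7%
Hermes 3 70B | 58 | 33 | 28 | 25 | 0 | 28.7%
Claude Sonnet 4.5 | 40 | 35 | 30 | 24 | 13 | 28.1%
Qwen 3 32B | 48 | 46 | 37 | 7 | 0 | 27.7%
Gemini 3 Pro (Preview) | 52 | 31 | 31 | 13 | 11 | 27.4%
Llama 3.1 8B | 54 | 43 | 38 | 0 | 0 | 27.0%
Mistral Small 4 (Reasoning) | 52 | 37 | 27 | 14 | 2 | 26.4%
Aion 2.0 | 44 | 34 | 17 | 16 | 13 | 24.8%
Mistral Medium 3.1 | 35 | 32 | 26 | 18 | 12 | 24.7%
WizardLM 2 8x22b | 53 | 38 | 31 | 0 | 0 | 24.4%
Gemini 3.1 Pro (Preview) | 43 | 40 | 26 | 10 | 0 | 23.8%
Mistral Large 3 | 34 | 33 | 18 | 16 | 13 | 22.9%
Ministral 3 14B | 49 | 43 | 21 | 1 | 0 | 22.7%
DeepSeek V3.2 | 47 | 24 | 21 | 18 | 0 | 22.0%
Mistral Large | 41 | 23 | 21 | 14 | 8 | 21.5%
Mistral Small 4 | 44 | 40 | 20 | 0 | 0 | 20.9%
Hermes 3 405B | 39 | 28 | 24 | 14 | 0 | 20.9%
DeepSeek V3 (2024-12-26) | 41 | 39 | 25 | 0 | 0 | 20.9%
GPT-4.1 | 30 | 20 | 19 | 14 | 13 | 19.5%
Mistral Large 2 | 41 | 22 | 19 | 15 | 0 | 19.5%
Stealth: Healer Alpha | 37 | 29 | 14 | 13 | 2 | 19.0%
Inception Mercury | 94 | 0 | 0 | 0 | 0 | 18.9%
Llama 3.1 70B | 88 | 0 | 0 | 0 | 0 | 17.6%
Gemini 2.5 Pro | 33 | 28 | 14 | 13 | 0 | 17.5%
Ministral 3B | 67 | 20 | 0 | 0 | 0 | 17.4%
Claude 3.5 Sonnet | 27 | 24 | 16 | 16 | 1 | 16.7%
Mistral Small Creative | 24 | 23 | 14 | 14 | 0 | 15.0%
Stealth: Hunter Alpha | 44 | 15 | 13 | 0 | 0 | 14.4%
Arcee AI: Trinity Large (Preview) | 40 | 31 | 2 | 0 | 0 | 14.4%
Ministral 3 8B | 28 | 26 | 15 | 0 | 0 | 13.9%
Z.AI GLM 4.6 | 19 | 14 | 12 | 5 | 0 | 10.1%
Cohere Command R+ (Aug. 2024) | 50 | 0 | 0 | 0 | 0 | 10.1%
Gemini 3.1 Flash Lite (Preview) | 23 | 20 | 8 | 0 | 0 | 10.0%
Claude Sonnet 4 | 37 | 5 | 3 | 0 | 0 | 9.0%
DeepSeek-V2 Chat | 30 | 14 | 0 | 0 | 0 | 8.7%
Grok 4.1 Fast | 17 | 12 | 9 | 0 | 0 | 7.7%
Arcee AI: Trinity Mini | 37 | 0 | 0 | 0 | 0 | 7.3%
DeepSeek V3.1 | 18 | 12 | 7 | 0 | 0 | 7.3%
Gemini 3 Flash (Preview, Reasoning) | 13 | 12 | 9 | 2 | 0 | 7.2%
Ministral 8B | 29 | 5 | 0 | 0 | 0 | 6.8%
Mistral NeMO | 11 | 10 | 0 | 0 | 0 | 4.3%
Gemma 3 27B | 13 | 7 | 0 | 0 | 0 | 4.1%
LFM2 24B | 17 | 0 | 0 | 0 | 0 | 3.4%
Nemotron 3 Super | 15 | 0 | 0 | 0 | 0 | 3.1%
Ministral 3 3B | 12 | 0 | 0 | 0 | 0 | 2.4%
Grok 4 Fast | 10 | 1 | 0 | 0 | 0 | 2.2%
Gemini 2.5 Flash | 5 | 0 | 0 | 0 | 0 | 1.1%
GPT-4o, May 13th (temp=1) | 4 | 0 | 0 | 0 | 0 | 0.8%
Gemma 3 4B | 3 | 0 | 0 | 0 | 0 | 0.7%
Mistral Small 3.2 24B | 3 | 0 | 0 | 0 | 0 | 0.6%
Nemotron 3 Nano | 2 | 0 | 0 | 0 | 0 | 0.4%
o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%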
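Model | #1 | #2 | #3 | #4 | #5 | Avg
ByteDance Seed 2.0 Mini | 87 | 84 | 80 | 71 | 61 | 76.5%
GPT-5 | 74 | 73 | 70 | 69 | 65 | 70.1%
GPT-5.4 | 76 | 76 | 75 | 66 | 58 | 70.0%
GPT-5.4 (Reasoning, Low) | 79 | 76 | 66 | 57 | 43 | 64.3%
ByteDance Seed 2.0 Lite | 79 | 71 | 58 | 57 | 44 | 61.8%
GPT-5.4 (Reasoning) | 71 | 70 | 58 | 55 | 53 | 61.4%
GPT-5.4 Mini | 78 | 66 | 61 | 49 | 47 | 60.2%
ByteDance Seed 1.6 Flash | 75 | 63 | 58 | 54 | 48 | 59.5%
GPT-5.4 Mini (Reasoning, Low) | 68 | 61 | 59 | 53 | 51 | 58.5%
Rocinante 12B | 100 | 75 | 53 | 36 | 14 | 55.4%
Claude Opus 4.6 | 70 | 66 | 49 | 46 | 41 | 54.6%
GPT-5.1 | 73 | 59 | 50 | 49 | 36 | 53.6%
GPT-5.4 Mini (Reasoning) | 77 | 52 | 50 | 49 | 38 | 53.2%
Qwen 3.5 27B | 67 | 54 | 52 | 40 | 40 | 50.8%
Grok 4.20 (Beta) | 62 | 50 | 49 | 48 | 36 | 49.0%
GPT-5.2 | 65 | 51 | 50 | 50 | 19 | 47.0%
GPT-5 Mini | 62 | 60 | 49 | 35 | 25 | 46.4%
MiniMax M2.7 | 68 | 49 | 49 | 38 | 28 | 46.4%
Qwen 3.5 9B | 82 | 49 | 43 | 43 | 14 | 46.3%
MiniMax M2.5 | 66 | 57 | 54 | 27 | 27 | 46.3%
GPT-5.4 Nano (Reasoning) | 57 | 52 | 47 | 37 | 36 | 45.7%
Claude Opus 4 | 64 | 60 | 37 | 36 | 29 | 45.4%
Qwen 3.5 397B A17B | 58 | 53 | 44 | 40 | 31 | 45.3%
Claude Sonnet 4.5 | 74 | 52 | 47 | 34 | 18 | 45.2%
Claude Opus 4.6 (Reasoning) | 61 | 52 | 45 | 40 | 27 | 45.0%
Claude Haiku 4.5 | 69 | 49 | 41 | 35 | 31 | 44.9%
Qwen 3 32B | 62 | 51 | 45 | 35 | 23 | 43.2%
Qwen 3.5 Flash | 62 | 49 | 43 | 37 | 15 | 41.3%
Z.AI GLM 5 Turbo | 73 | 59 | 51 | 12 | 10 | 41.2%
Qwen 3.5 122B | 55 | 52 | 42 | 34 | 23 | 41.2%
ByteDance Seed 1.6 | 46 | 45 | 40 | 35 | 35 | 40.2%
GPT-5.4 Nano (Reasoning, Low) | 66 | 54 | 30 | 27 | 25 | 40.2%
Z.AI GLM 4.7 | 55 | 52 | 44 | 35 | 13 | 40.0%
Arcee AI: Trinity Large (Preview) | 72 | 46 | 32 | 23 | 22 | 39.0%
Claude Opus 4.5 | 48 | 44 | 40 | 31 | 31 | 38.8%
GPT-5 Nano | 49 | 40 | 38 | 35 | 30 | 38.4%
Z.AI GLM 4.7 Flash | 51 | 51 | 40 | 32 | 18 | 38.3%
Claude Sonnet 4.6 | 60 | 37 | 36 | 32 | 26 | 38.1%
Claude Sonnet 4.6 (Reasoning) | 55 | 47 | 34 | 33 | 20 | 37.9%
Z.AI GLM 5 | 63 | 40 | 38 | 25 | 21 | 37.3%
MoonshotAI: Kimi K2.5 | 61 | 38 | 31 | 29 | 23 | 36.2%
Qwen 3.5 35B | 44 | 42 | 40 | 32 | 23 | 36.2%
Gemini 3 Flash (Preview, Reasoning) | 40 | 38 | 36 | 35 | 25 | 34.8%
GPT-5.4 Nano | 46 | 44 | 32 | 25 | 22 | 33.7%
Qwen 3.5 Plus (2026-02-15) | 52 | 37 | 32 | 31 | 15 | 33.6%
Gemini 3 Pro (Preview) | 48 | 44 | 32 | 29 | 14 | 33.3%
Hermes 3 405B | 60 | 54 | 39 | 13 | 0 | 33.2%
DeepSeek V3 (2025-03-24) | 66 | 53 | 39 | 7 | 0 | 32.9%
Gemini 3 Flash (Preview) | 51 | 38 | 29 | 28 | 20 | 32.9%
Qwen3 235B A22B Instruct 2507 | 51 | 39 | 30 | 28 | 11 | 31.8%
Claude 3.7 Sonnet | 41 | 39 | 35 | 18 | 14 | 29.6%
Grok 4.20 (Beta, Reasoning) | 47 | 37 | 18 | 17 | 15 | 27.0%
WizardLM 2 8x22b | 45 | 41 | 27 | 15 | 0 | 25.8%
Writer: Palmyra X5 | 38 | 30 | 25 | 19 | 14 | 25.2%
Aion 2.0 | 34 | 34 | 29 | 20 | 9 | 25.1%
DeepSeek V3 (2024-12-26) | 67 | 52 | 0 | 0 | 0 | 23.8%
Mistral Small Creative | 50 | 29 | 25 | 9 | 4 | 23.5%
DeepSeek V3.2 | 36 | 34 | 27 | 16 | 0 | 22.5%
Mistral Medium 3.1 | 32 | 30 | 23 | 16 | 9 | 22.1%
GPT-4.1 | 33 | 26 | 19 | 18 | 14 | 22.0%
Claude Sonnet 4 | 46 | 37 | 26 | 0 | 0 | 21.9%
Gemini 3.1 Pro (Preview) | 46 | 40 | 19 | 3 | 0 | 21.7%
Stealth: Healer Alpha | 48 | 26 | 13 | 11 | 5 | 20.6%
Claude 3.5 Sonnet | 40 | 35 | 21 | 0 | 0 | 19.3%
Llama 3.1 8B | 51 | 21 | 17 | 0 | 0 | 18.1%
Hermes 3 70B | 61 | 19 | 0 | 0 | 0 | 16.2%
Mistral Large 2 | 49 | 19 | 9 | 0 | 0 | 15.4%
DeepSeek-V2 Chat | 75 | 0 | 0 | 0 | 0 | 15.0%
Mistral Large | 42 | 17 | 8 | 5 | 0 | 14.6%
Mistral Small 4 | 28 | 22 | 18 | 0 | 0 | 13.5%
Z.AI GLM 4.5 | 34 | 16 | 15 | 1 | 0 | 13.3%
Llama 3.1 70B | 62 | 0 | 0 | 0 | 0 | 12.4%
Mistral Small 4 (Reasoning) | 33 | 18 | 6 | 4 | 0 | 11.9%
Stealth: Hunter Alpha | 34 | 12 | 9 | 1 | 0 | 11.2%
Ministral 3 8B | 27 | 19 | 9 | 0 | 0 | 10.8%
Gemini 2.5 Pro | 27 | 26 | 0 | 0 | 0 | 10.6%
Grok 4.1 Fast | 27 | 13 | 10 | 0 | 0 | 10.0%
Gemini 3.1 Flash Lite (Preview) | 24 | 15 | 8 | 0 | 0 | 9.3%
Gemma 3 27B | 28 | 12 | 0 | 0 | 0 | 8.1%
Gemma 3 12B | 25 | 0 | 0 | 0 | 0 | 5.1%
Mistral Small 3.2 24B | 17 | 9 | 0 | 0 | 0 | 5.1%
Ministral 3 3B | 18 | 6 | 0 | 0 | 0 | 4.7%
DeepSeek V3.1 | 17 | 6 | 0 | 0 | 0 | 4.6%
Ministral 3 14B | 13 | 10 | 0 | 0 | 0 | 4.5%
Z.AI GLM 4.6 | 22 | 0 | 0 | 0 | 0 | 4.5%
Stealth: Aurora Alpha | 9 | 0 | 0 | 0 | 0 | 1.8%
Nemotron 3 Nano | 8 | 0 | 0 | 0 | 0 | 1.5%
Mistral Large 3 | 5 | 1 | 0 | 0 | 0 | 1.2%
Cohere Command R+ (Aug. 2024) | 5 | 0 | 0 | 0 | 0 | 1.0%
Gemini 2.5 Flash | 5 | 0 | 0 | 0 | 0 | 1.0%
LFM2 24B | 3 | 0 | 0 | 0 | 0 | 0.7%
Gemma 3 4B | 2 | 0 | 0 | 0 | 0 | 0.4%
Gemini 2.5 Flash (Reasoning) | 1 | 0 | 0 | 0 | 0 | 0.1%
o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Nemotron 3 Super | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%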
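Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-5 | 92 | 88 | 87 | 84 | 79 | 86.2%
ByteDance Seed 2.0 Lite | 92 | 90 | 89 | 85 | 65 | 84.2%
ByteDance Seed 1.6 Flash | 94 | 91 | 89 | 76 | 72 | 84.2%
GPT-5.4 | 91 | 86 | 85 | 81 | 74 | 83.6%
Claude Sonnet 4.6 (Reasoning) | 97 | 91 | 79 | 72 | 67 | 81.1%
GPT-5.4 (Reasoning) | 83 | 82 | 82 | 80 | 74 | 80.4%
GPT-5.4 (Reasoning, Low) | 88 | 84 | 78 | 77 | 69 | 79.3%
GPT-5.4 Mini (Reasoning) | 82 | 82 | 79 | 77 | 72 | 78.3%
Qwen 3.5 27B | 83 | 79 | 78 | 77 | 70 | 77.4%
GPT-5.1 | 82 | 82 | 82 | 72 | 69 | 77.3%
Claude Haiku 4.5 | 87 | 80 | 77 | 74 | 66 | 76.9%
GPT-5.4 Mini (Reasoning, Low) | 86 | 85 | 77 | 71 | 65 | 76.8%
GPT-5.4 Mini | 89 | 83 | 77 | 69 | 63 | 76.3%
MiniMax M2.7 | 93 | 79 | 78 | 63 | 61 | 74.9%
GPT-5.2 | 82 | 78 | 78 | 69 | 66 | 74.5%
Claude Opus 4.6 (Reasoning) | 81 | 80 | 75 | 73 | 63 | 74.1%
GPT-5 Mini | 88 | 76 | 71 | 68 | 60 | 72.9%
Claude Sonnet 4.6 | 85 | 77 | 73 | 71 | 57 | 72.4%
Claude Opus 4.5 | 85 | 77 | 67 | 66 | 62 | 71.6%
Claude Sonnet 4.5 | 89 | 79 | 69 | 59 | 56 | 70.5%
Z.AI GLM 5 Turbo | 78 | 75 | 75 | 64 | 57 | 69.7%
Claude Opus 4.6 | 78 | 77 | 72 | 61 | 61 | 69.7%
Hermes 3 405B | 93 | 77 | 68 | 54 | 52 | 68.7%
Claude Opus 4 | 81 | 79 | 68 | 63 | 51 | 68.5%
Z.AI GLM 4.7 | 81 | 73 | 73 | 64 | 51 | 68.4%
ByteDance Seed 2.0 Mini | 74 | 69 | 65 | 65 | 64 | 67.4%
GPT-5.4 Nano (Reasoning, Low) | 78 | 73 | 69 | 58 | 58 | 67.3%
GPT-5.4 Nano | 71 | 68 | 67 | 64 | 62 | 66.5%
Qwen 3.5 35B | 84 | 80 | 71 | 65 | 32 | 66.4%
Qwen 3.5 397B A17B | 80 | 71 | 69 | 58 | 53 | 66.4%
Z.AI GLM 5 | 72 | 70 | 67 | 63 | 58 | 66.1%
Grok 4.20 (Beta, Reasoning) | 86 | 64 | 63 | 61 | 55 | 65.9%
Gemini 3 Pro (Preview) | 75 | 69 | 67 | 59 | 56 | 65.3%
GPT-5.4 Nano (Reasoning) | 73 | 67 | 63 | 62 | 60 | 64.9%
Qwen 3.5 122B | 84 | 80 | 72 | 44 | 44 | 64.8%
MiniMax M2.5 | 80 | 69 | 69 | 67 | 28 | 62.6%
Qwen 3.5 Flash | 75 | 63 | 62 | 58 | 54 | 62.4%
Z.AI GLM 4.7 Flash | 79 | 70 | 65 | 54 | 40 | 61.6%
Writer: Palmyra X5 | 74 | 72 | 61 | 58 | 39 | 60.7%
ByteDance Seed 1.6 | 77 | 69 | 66 | 47 | 44 | 60.7%
Qwen 3.5 9B | 76 | 73 | 59 | 54 | 40 | 60.4%
Stealth: Healer Alpha | 78 | 78 | 49 | 44 | 43 | 58.3%
DeepSeek V3.2 | 67 | 63 | 62 | 56 | 40 | 57.6%
MoonshotAI: Kimi K2.5 | 68 | 60 | 58 | 56 | 45 | 57.2%
Claude 3.7 Sonnet | 76 | 74 | 52 | 41 | 40 | 56.6%
Gemini 3 Flash (Preview) | 67 | 63 | 57 | 57 | 36 | 56.0%
Qwen3 235B A22B Instruct 2507 | 74 | 61 | 53 | 46 | 44 | 55.7%
Z.AI GLM 4.5 | 70 | 55 | 55 | 51 | 46 | 55.6%
Aion 2.0 | 73 | 63 | 62 | 38 | 38 | 54.8%
Mistral Small 4 (Reasoning) | 67 | 62 | 61 | 42 | 38 | 54.0%
Qwen 3.5 Plus (2026-02-15) | 69 | 57 | 54 | 51 | 37 | 53.7%
Mistral Small 4 | 61 | 58 | 56 | 50 | 43 | 53.5%
GPT-4.1 | 79 | 65 | 53 | 39 | 30 | 53.2%
Claude Sonnet 4 | 67 | 65 | 53 | 51 | 31 | 53.2%
Grok 4.20 (Beta) | 67 | 58 | 52 | 51 | 38 | 53.2%
WizardLM 2 8x22b | 66 | 57 | 53 | 47 | 42 | 53.1%
DeepSeek V3 (2025-03-24) | 71 | 66 | 55 | 54 | 19 | 53.0%
Mistral Large 3 | 66 | 58 | 58 | 41 | 38 | 52.0%
Mistral Large 2 | 69 | 68 | 53 | 43 | 20 | 50.7%
Claude 3.5 Sonnet | 70 | 64 | 60 | 47 | 12 | 50.5%
Gemini 3 Flash (Preview, Reasoning) | 54 | 50 | 50 | 48 | 45 | 49.7%
Ministral 8B | 77 | 51 | 50 | 36 | 33 | 49.3%
Stealth: Hunter Alpha | 55 | 52 | 48 | 48 | 43 | 49.1%
Qwen 3 32B | 71 | 63 | 63 | 22 | 20 | 47.5%
Gemini 3.1 Pro (Preview) | 60 | 53 | 44 | 41 | 35 | 46.6%
Rocinante 12B | 100 | 56 | 39 | 34 | 2 | 46.3%
GPT-5 Nano | 50 | 48 | 45 | 45 | 44 | 46.3%
Mistral Medium 3.1 | 69 | 59 | 43 | 31 | 30 | 46.1%
Ministral 3 8B | 59 | 58 | 54 | 54 | 0 | 45.1%
DeepSeek V3.1 | 56 | 52 | 48 | 35 | 27 | 43.7%
Llama 3.1 8B | 56 | 56 | 50 | 30 | 23 | 43.1%
Mistral Large | 58 | 43 | 42 | 37 | 34 | 42.8%
Gemini 2.5 Pro | 64 | 48 | 45 | 32 | 17 | 41.3%
DeepSeek V3 (2024-12-26) | 78 | 60 | 36 | 20 | 10 | 40.9%
Hermes 3 70B | 85 | 44 | 37 | 36 | 0 | 40.5%
Gemma 3 27B | 45 | 39 | 38 | 37 | 36 | 39.0%
Ministral 3 14B | 59 | 38 | 34 | 34 | 25 | 37.9%
Arcee AI: Trinity Large (Preview) | 67 | 34 | 31 | 30 | 26 | 37.7%
Grok 4 | 54 | 42 | 38 | 28 | 21 | 36.8%
Z.AI GLM 4.6 | 72 | 30 | 30 | 30 | 18 | 35.8%
o4 Mini | 45 | 34 | 33 | 33 | 22 | 33.4%
Mistral Small Creative | 50 | 50 | 40 | 21 | 1 | 32.2%
Inception Mercury | 83 | 48 | 28 | 0 | 0 | 31.9%
Grok 4.1 Fast | 54 | 41 | 39 | 20 | 0 | 31.0%
Gemini 3.1 Flash Lite (Preview) | 53 | 39 | 29 | 26 | 10 | 31.0%
Nemotron 3 Super | 48 | 45 | 21 | 21 | 0 | 27.3%
Grok 4 Fast | 33 | 32 | 24 | 21 | 14 | 24.8%
Arcee AI: Trinity Mini | 59 | 45 | 20 | 0 | 0 | 24.8%
DeepSeek-V2 Chat | 34 | 32 | 25 | 20 | 0 | 22.4%
Cohere Command R+ (Aug. 2024) | 59 | 44 | 0 | 0 | 0 | 20.7%
Gemma 3 12B | 37 | 22 | 21 | 18 | 3 | 20.3%
Gemini 2.5 Flash (Reasoning) | 47 | 38 | 9 | 0 | 0 | 18.9%
Llama 3.1 70B | 71 | 20 | 0 | 0 | 0 | 18.3%
GPT-4.1 Mini | 52 | 21 | 16 | 1 | 0 | 18.1%
Gemini 2.5 Flash Lite | 29 | 26 | 18 | 12 | 0 | 16.9%
Ministral 3B | 52 | 31 | 0 | 0 | 0 | 16.6%
Gemini 2.5 Flash | 44 | 32 | 5 | 0 | 0 | 16.4%
Mistral NeMO | 55 | 11 | 9 | 3 | 0 | 15.7%
LFM2 24B | 48 | 17 | 12 | 0 | 0 | 15.6%
GPT-4o, May 13th (temp=1) | 32 | 17 | 15 | 8 | 0 | 14.5%
Claude 3.5 Haiku | 39 | 32 | 0 | 0 | 0 | 14.1%
Nemotron 3 Nano | 32 | 32 | 4 | 0 | 0 | 13.5%
Gemini 2.5 Flash Lite (Reasoning) | 39 | 15 | 12 | 0 | 0 | 13.2%
Mistral Small 3.2 24B | 62 | 0 | 0 | 0 | 0 | 12.5%
Claude 3 Haiku | 52 | 5 | 0 | 0 | 0 | 11.4%
o4 Mini High | 38 | 10 | 8 | 0 | 0 | 11.1%
Ministral 3 3B | 38 | 9 | 0 | 0 | 0 | 9.5%
Gemma 3 4B | 38 | 0 | 0 | 0 | 0 | 7.6%
Qwen 2.5 72B | 14 | 0 | 0 | 0 | 0 | 2.9%
Inception Mercury 2 | 13 | 0 | 0 | 0 | 0 | 2.7%
GPT-4o, Aug. 6th (temp=0) | 11 | 0 | 0 | 0 | 0 | 2.2%
GPT-4o, Aug. 6th (temp=1) | 7 | 4 | 0 | 0 | 0 | 2.2%
GPT-4.1 Nano | 8 | 1 | 0 | 0 | 0 | 2.0%
GPT-4o Mini (temp=1) | 7 | 2 | 0 | 0 | 0 | 1.8%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%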
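Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-5.4 Mini (Reasoning) | 88 | 87 | 86 | 86 | 80 | 85.4%
Claude Sonnet 4.6 | 95 | 90 | 86 | 86 | 67 | 84.9%
GPT-5.4 (Reasoning) | 97 | 88 | 79 | 79 | 76 | 83.8%
GPT-5.4 | 93 | 88 | 81 | 73 | 73 | 81.8%
GPT-5.4 (Reasoning, Low) | 82 | 81 | 81 | 80 | 78 | 80.5%
ByteDance Seed 2.0 Lite | 84 | 84 | 75 | 75 | 73 | 78.2%
GPT-5 | 83 | 82 | 76 | 72 | 72 | 77.0%
Claude Sonnet 4.6 (Reasoning) | 96 | 79 | 79 | 74 | 56 | 76.8%
GPT-5.4 Mini (Reasoning, Low) | 83 | 83 | 75 | 75 | 68 | 76.8%
GPT-5.4 Mini | 84 | 81 | 79 | 78 | 60 | 76.3%
GPT-5.4 Nano (Reasoning) | 85 | 76 | 76 | 69 | 66 | 74.5%
Claude Opus 4.6 (Reasoning) | 83 | 77 | 76 | 68 | 66 | 74.1%
ByteDance Seed 1.6 Flash | 92 | 86 | 79 | 52 | 50 | 71.6%
Claude Haiku 4.5 | 80 | 77 | 73 | 69 | 55 | 70.8%
GPT-5.2 | 78 | 74 | 71 | 65 | 64 | 70.5%
Claude Opus 4.6 | 82 | 79 | 73 | 62 | 56 | 70.4%
GPT-5.4 Nano (Reasoning, Low) | 76 | 72 | 71 | 68 | 65 | 70.2%
Qwen 3.5 9B | 85 | 77 | 71 | 60 | 57 | 70.0%
GPT-5 Mini | 72 | 71 | 70 | 70 | 64 | 69.4%
GPT-5.1 | 75 | 74 | 67 | 63 | 62 | 68.2%
GPT-5.4 Nano | 78 | 76 | 70 | 60 | 54 | 67.6%
Claude Opus 4 | 72 | 71 | 71 | 64 | 58 | 67.2%
MiniMax M2.5 | 87 | 76 | 68 | 53 | 52 | 67.1%
Z.AI GLM 5 | 71 | 70 | 67 | 65 | 59 | 66.4%
Claude Sonnet 4.5 | 78 | 71 | 70 | 65 | 44 | 65.4%
Claude 3.7 Sonnet | 81 | 79 | 65 | 52 | 50 | 65.2%
Qwen 3.5 35B | 82 | 65 | 64 | 61 | 53 | 64.9%
Claude Opus 4.5 | 85 | 84 | 59 | 50 | 44 | 64.5%
Mistral Large 2 | 69 | 64 | 64 | 61 | 57 | 62.9%
Z.AI GLM 5 Turbo | 81 | 81 | 71 | 69 | 7 | 61.8%
Grok 4.20 (Beta) | 72 | 64 | 61 | 59 | 51 | 61.2%
ByteDance Seed 1.6 | 76 | 71 | 67 | 47 | 42 | 60.8%
WizardLM 2 8x22b | 73 | 65 | 56 | 55 | 50 | 59.8%
ByteDance Seed 2.0 Mini | 75 | 65 | 62 | 54 | 40 | 59.1%
MiniMax M2.7 | 85 | 66 | 65 | 44 | 35 | 59.1%
Hermes 3 405B | 92 | 60 | 60 | 54 | 29 | 58.9%
Writer: Palmyra X5 | 72 | 63 | 63 | 54 | 42 | 58.8%
Claude Sonnet 4 | 72 | 70 | 58 | 47 | 47 | 58.4%
Stealth: Hunter Alpha | 80 | 75 | 74 | 33 | 29 | 58.2%
Mistral Large | 71 | 68 | 58 | 56 | 35 | 57.5%
Rocinante 12B | 73 | 65 | 57 | 52 | 37 | 56.6%
Mistral Small 4 (Reasoning) | 69 | 62 | 56 | 51 | 46 | 56.5%
Qwen 3.5 Flash | 66 | 58 | 53 | 52 | 44 | 54.5%
Stealth: Healer Alpha | 64 | 63 | 54 | 45 | 44 | 54.0%
Z.AI GLM 4.7 | 64 | 57 | 54 | 47 | 47 | 53.7%
Qwen 3.5 122B | 66 | 55 | 55 | 49 | 44 | 53.7%
Qwen3 235B A22B Instruct 2507 | 79 | 63 | 61 | 43 | 20 | 53.3%
Arcee AI: Trinity Large (Preview) | 87 | 54 | 47 | 41 | 32 | 52.3%
Z.AI GLM 4.5 | 82 | 53 | 49 | 49 | 29 | 52.3%
Grok 4.20 (Beta, Reasoning) | 72 | 53 | 50 | 47 | 39 | 52.1%
MoonshotAI: Kimi K2.5 | 60 | 59 | 50 | 43 | 37 | 49.8%
Gemini 3 Pro (Preview) | 63 | 53 | 46 | 44 | 40 | 49.4%
Cohere Command R+ (Aug. 2024) | 62 | 58 | 54 | 40 | 29 | 48.6%
Z.AI GLM 4.6 | 75 | 50 | 49 | 40 | 29 | 48.4%
GPT-5 Nano | 53 | 52 | 44 | 44 | 43 | 47.2%
Mistral Medium 3.1 | 57 | 51 | 50 | 44 | 32 | 47.1%
Z.AI GLM 4.7 Flash | 55 | 51 | 48 | 43 | 39 | 46.9%
Mistral Large 3 | 64 | 64 | 49 | 29 | 26 | 46.4%
DeepSeek V3.1 | 56 | 51 | 47 | 44 | 33 | 46.2%
Gemini 3 Flash (Preview) | 65 | 49 | 44 | 42 | 29 | 45.7%
Mistral Small Creative | 52 | 44 | 44 | 41 | 39 | 44.2%
Qwen 3.5 27B | 78 | 49 | 35 | 30 | 28 | 43.7%
Gemma 3 27B | 60 | 46 | 46 | 35 | 28 | 43.0%
Ministral 3 14B | 83 | 65 | 27 | 27 | 12 | 42.6%
Qwen 3.5 397B A17B | 60 | 45 | 42 | 37 | 27 | 42.4%
Gemini 3 Flash (Preview, Reasoning) | 53 | 51 | 43 | 42 | 23 | 42.3%
Mistral Small 3.2 24B | 97 | 77 | 36 | 0 | 0 | 41.9%
Mistral Small 4 | 60 | 39 | 38 | 37 | 26 | 40.2%
DeepSeek V3.2 | 67 | 50 | 41 | 23 | 18 | 39.8%
Qwen 3 32B | 45 | 42 | 37 | 36 | 34 | 38.9%
Aion 2.0 | 54 | 53 | 41 | 24 | 20 | 38.6%
Llama 3.1 8B | 65 | 55 | 39 | 32 | 0 | 38.1%
GPT-4.1 | 52 | 40 | 38 | 37 | 21 | 37.7%
Gemini 3.1 Pro (Preview) | 56 | 38 | 36 | 26 | 22 | 35.8%
Qwen 3.5 Plus (2026-02-15) | 46 | 39 | 33 | 29 | 26 | 34.5%
Gemini 2.5 Pro | 44 | 38 | 36 | 35 | 17 | 34.2%
Gemini 3.1 Flash Lite (Preview) | 46 | 39 | 39 | 34 | 10 | 33.5%
Gemma 3 12B | 50 | 43 | 29 | 23 | 21 | 33.1%
Hermes 3 70B | 89 | 40 | 34 | 0 | 0 | 32.4%
Gemini 2.5 Flash | 45 | 30 | 30 | 29 | 28 | 32.3%
Grok 4 | 52 | 31 | 31 | 17 | 17 | 29.6%
Ministral 3B | 79 | 37 | 18 | 11 | 0 | 29.1%
GPT-4o, May 13th (temp=1) | 49 | 45 | 24 | 16 | 8 | 28.3%
DeepSeek-V2 Chat | 47 | 30 | 27 | 18 | 13 | 27.2%
Ministral 8B | 42 | 41 | 30 | 16 | 4 | 26.6%
Ministral 3 8B | 69 | 33 | 28 | 0 | 0 | 25.8%
Claude 3 Haiku | 55 | 31 | 27 | 6 | 1 | 24.0%
o4 Mini High | 36 | 25 | 23 | 17 | 17 | 23.5%
DeepSeek V3 (2024-12-26) | 30 | 29 | 26 | 17 | 15 | 23.5%
o4 Mini | 40 | 33 | 20 | 19 | 0 | 22.6%
Inception Mercury | 73 | 38 | 0 | 0 | 0 | 22.2%
Grok 4.1 Fast | 34 | 26 | 22 | 22 | 0 | 21.0%
DeepSeek V3 (2025-03-24) | 49 | 28 | 14 | 13 | 0 | 20.9%
Ministral 3 3B | 33 | 24 | 23 | 18 | 0 | 19.6%
Nemotron 3 Super | 55 | 21 | 17 | 4 | 0 | 19.3%
Arcee AI: Trinity Mini | 47 | 26 | 23 | 0 | 0 | 19.3%
Gemini 2.5 Flash Lite | 51 | 37 | 0 | 0 | 0 | 17.6%
Gemini 2.5 Flash (Reasoning) | 36 | 23 | 14 | 8 | 1 | 16.3%
GPT-4.1 Mini | 42 | 26 | 12 | 0 | 0 | 16.0%
Gemini 2.5 Flash Lite (Reasoning) | 49 | 12 | 10 | 0 | 0 | 14.1%
Grok 4 Fast | 23 | 23 | 18 | 7 | 0 | 14.1%
GPT-4.1 Nano | 37 | 17 | 15 | 0 | 0 | 14.0%
Mistral NeMO | 33 | 19 | 17 | 0 | 0 | 13.8%
Claude 3.5 Sonnet | 36 | 17 | 14 | 1 | 0 | 13.5%
GPT-4o, May 13th (temp=0) | 26 | 24 | 7 | 5 | 0 | 12.5%
Gemma 3 4B | 21 | 15 | 10 | 10 | 2 | 11.6%
Llama 3.1 70B | 32 | 15 | 0 | 0 | 0 | 9.3%
Nemotron 3 Nano | 17 | 16 | 7 | 5 | 0 | 9.1%
GPT-4o Mini (temp=0) | 26 | 19 | 0 | 0 | 0 | 8.9%
Inception Mercury 2 | 35 | 2 | 2 | 0 | 0 | 7.8%
Qwen 2.5 72B | 30 | 2 | 0 | 0 | 0 | 6.4%
LFM2 24B | 23 | 4 | 1 | 0 | 0 | 5.6%
GPT-4o, Aug. 6th (temp=1) | 17 | 4 | 3 | 0 | 0 | 4.9%
GPT-4o Mini (temp=1) | 15 | 0 | 0 | 0 | 0 | 3.0%
Stealth: Aurora Alpha | 7 | 5 | 2 | 0 | 0 | 2.9%
GPT-4o, Aug. 6th (temp=0) | 10 | 2 | 0 | 0 | 0 | 2.5%
Llama 3.1 Nemotron 70B | 10 | 2 | 0 | 0 | 0 | 2.3%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%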
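Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Sonnet 4.6 (Reasoning) | 100 | 93 | 92 | 91 | 81 | 91.3%
GPT-5.4 Mini (Reasoning) | 79 | 78 | 78 | 77 | 74 | 77.1%
Claude Sonnet 4.6 | 88 | 80 | 76 | 65 | 65 | 74.8%
ByteDance Seed 2.0 Lite | 80 | 79 | 76 | 69 | 69 | 74.6%
Z.AI GLM 5 Turbo | 85 | 80 | 74 | 67 | 65 | 74.0%
ByteDance Seed 2.0 Mini | 85 | 84 | 76 | 66 | 58 | 73.8%
GPT-5.4 (Reasoning, Low) | 78 | 78 | 71 | 68 | 68 | 72.4%
Claude Opus 4.6 | 84 | 82 | 74 | 62 | 57 | 71.8%
ByteDance Seed 1.6 Flash | 79 | 78 | 74 | 71 | 56 | 71.7%
Claude Haiku 4.5 | 86 | 83 | 79 | 64 | 45 | 71.4%
Qwen 3.5 9B | 88 | 83 | 73 | 68 | 43 | 71.2%
GPT-5.4 (Reasoning) | 77 | 74 | 72 | 66 | 66 | 71.0%
GPT-5.4 Mini (Reasoning, Low) | 82 | 71 | 70 | 68 | 64 | 71.0%
GPT-5.4 | 79 | 78 | 75 | 68 | 52 | 70.4%
GPT-5 | 81 | 73 | 68 | 64 | 63 | 69.7%
Hermes 3 405B | 95 | 94 | 57 | 55 | 48 | 69.6%
GPT-5.1 | 79 | 77 | 65 | 64 | 59 | 68.6%
ByteDance Seed 1.6 | 91 | 77 | 63 | 56 | 54 | 68.2%
GPT-5.4 Mini | 81 | 72 | 66 | 65 | 55 | 67.9%
Claude Opus 4.5 | 77 | 74 | 68 | 62 | 54 | 67.1%
Claude Opus 4.6 (Reasoning) | 78 | 75 | 74 | 66 | 43 | 67.0%
GPT-5 Mini | 75 | 70 | 69 | 62 | 57 | 66.9%
Qwen 3.5 35B | 85 | 82 | 78 | 54 | 36 | 66.9%
MiniMax M2.7 | 70 | 68 | 67 | 66 | 62 | 66.5%
WizardLM 2 8x22b | 79 | 74 | 69 | 59 | 49 | 66.0%
MiniMax M2.5 | 69 | 69 | 68 | 65 | 56 | 65.4%
Z.AI GLM 5 | 79 | 65 | 62 | 61 | 59 | 65.1%
Mistral Large 2 | 71 | 69 | 68 | 66 | 50 | 64.8%
Claude Opus 4 | 75 | 74 | 70 | 51 | 51 | 64.4%
Qwen 3.5 397B A17B | 77 | 67 | 65 | 55 | 54 | 63.8%
Qwen 3.5 122B | 84 | 64 | 62 | 54 | 52 | 63.1%
GPT-5.4 Nano | 65 | 65 | 61 | 60 | 56 | 61.5%
Qwen 3 32B | 66 | 64 | 61 | 60 | 51 | 60.3%
Writer: Palmyra X5 | 74 | 73 | 66 | 58 | 25 | 59.2%
Claude Sonnet 4.5 | 74 | 73 | 71 | 49 | 29 | 59.2%
GPT-5.2 | 66 | 64 | 57 | 55 | 51 | 58.6%
GPT-5.4 Nano (Reasoning, Low) | 61 | 60 | 59 | 58 | 52 | 58.2%
Gemini 3 Pro (Preview) | 66 | 63 | 60 | 50 | 49 | 57.6%
Arcee AI: Trinity Large (Preview) | 82 | 75 | 52 | 47 | 29 | 57.2%
Qwen 3.5 Flash | 76 | 65 | 61 | 53 | 30 | 57.1%
Mistral Small 4 (Reasoning) | 73 | 67 | 57 | 47 | 41 | 56.9%
GPT-5.4 Nano (Reasoning) | 68 | 68 | 54 | 47 | 43 | 55.8%
Mistral Small 4 | 73 | 65 | 62 | 42 | 37 | 55.7%
MoonshotAI: Kimi K2.5 | 76 | 65 | 53 | 49 | 34 | 55.5%
Grok 4.20 (Beta, Reasoning) | 61 | 55 | 54 | 53 | 49 | 54.4%
Z.AI GLM 4.7 | 59 | 57 | 57 | 53 | 42 | 53.8%
Z.AI GLM 4.7 Flash | 70 | 62 | 58 | 47 | 30 | 53.3%
Gemini 3 Flash (Preview) | 61 | 60 | 50 | 50 | 41 | 52.3%
Qwen 3.5 27B | 67 | 62 | 51 | 45 | 36 | 52.1%
Claude Sonnet 4 | 69 | 62 | 56 | 46 | 27 | 52.0%
Qwen 3.5 Plus (2026-02-15) | 59 | 57 | 51 | 47 | 45 | 52.0%
Stealth: Hunter Alpha | 71 | 54 | 53 | 47 | 25 | 50.3%
Mistral Small Creative | 68 | 63 | 57 | 38 | 24 | 49.9%
Gemini 2.5 Pro | 62 | 53 | 52 | 48 | 32 | 49.5%
Grok 4.20 (Beta) | 66 | 56 | 48 | 44 | 31 | 48.8%
Aion 2.0 | 55 | 53 | 53 | 49 | 32 | 48.5%
Gemini 3.1 Pro (Preview) | 65 | 50 | 49 | 42 | 35 | 48.1%
Hermes 3 70B | 67 | 47 | 41 | 40 | 40 | 47.1%
Rocinante 12B | 86 | 85 | 39 | 21 | 0 | 46.2%
Z.AI GLM 4.5 | 58 | 56 | 41 | 37 | 36 | 45.5%
DeepSeek V3.1 | 70 | 59 | 45 | 31 | 22 | 45.4%
Claude 3.7 Sonnet | 60 | 49 | 48 | 40 | 29 | 45.2%
Gemini 3 Flash (Preview, Reasoning) | 55 | 48 | 46 | 44 | 33 | 45.0%
Ministral 3 14B | 74 | 45 | 40 | 33 | 31 | 44.7%
Llama 3.1 8B | 67 | 62 | 56 | 30 | 4 | 43.7%
Qwen3 235B A22B Instruct 2507 | 63 | 48 | 42 | 34 | 31 | 43.6%
DeepSeek V3.2 | 57 | 55 | 40 | 38 | 27 | 43.6%
Ministral 3B | 75 | 57 | 37 | 26 | 21 | 43.1%
GPT-5 Nano | 70 | 46 | 38 | 37 | 25 | 43.1%
DeepSeek V3 (2025-03-24) | 69 | 64 | 41 | 40 | 0 | 42.8%
Mistral Large 3 | 75 | 39 | 37 | 34 | 27 | 42.2%
Mistral Large | 69 | 58 | 45 | 28 | 4 | 40.6%
Inception Mercury | 93 | 89 | 20 | 0 | 0 | 40.4%
GPT-4.1 | 46 | 42 | 37 | 37 | 33 | 39.1%
Mistral Medium 3.1 | 45 | 41 | 37 | 32 | 28 | 36.6%
Stealth: Healer Alpha | 60 | 45 | 40 | 34 | 1 | 36.1%
Z.AI GLM 4.6 | 46 | 42 | 39 | 33 | 19 | 35.9%
Claude 3.5 Sonnet | 62 | 50 | 32 | 17 | 0 | 32.1%
DeepSeek V3 (2024-12-26) | 48 | 37 | 27 | 26 | 19 | 31.6%
Ministral 8B | 60 | 48 | 32 | 9 | 7 | 30.9%
Gemini 3.1 Flash Lite (Preview) | 57 | 36 | 32 | 17 | 11 | 30.6%
Nemotron 3 Super | 39 | 38 | 28 | 20 | 19 | 28.8%
Grok 4.1 Fast | 48 | 40 | 31 | 13 | 11 | 28.5%
Arcee AI: Trinity Mini | 76 | 30 | 25 | 8 | 0 | 27.8%
Grok 4 | 40 | 39 | 25 | 18 | 13 | 27.1%
Mistral Small 3.2 24B | 48 | 44 | 21 | 20 | 0 | 26.5%
Mistral NeMO | 51 | 35 | 32 | 5 | 2 | 24.7%
Grok 4 Fast | 47 | 36 | 20 | 15 | 0 | 23.9%
Ministral 3 3B | 53 | 45 | 13 | 0 | 0 | 22.2%
Gemma 3 12B | 61 | 45 | 2 | 0 | 0 | 21.8%
Gemma 3 27B | 45 | 26 | 26 | 3 | 3 | 20.7%
Ministral 3 8B | 50 | 28 | 21 | 2 | 0 | 20.2%
Cohere Command R+ (Aug. 2024) | 73 | 22 | 5 | 0 | 0 | 19.9%
Gemini 2.5 Flash (Reasoning) | 37 | 29 | 17 | 8 | 6 | 19.4%
DeepSeek-V2 Chat | 66 | 10 | 8 | 7 | 0 | 18.3%
o4 Mini High | 31 | 23 | 23 | 0 | 0 | 15.5%
GPT-4.1 Nano | 43 | 21 | 8 | 1 | 0 | 14.5%
LFM2 24B | 42 | 12 | 9 | 7 | 0 | 14.0%
Claude 3.5 Haiku | 50 | 13 | 7 | 0 | 0 | 13.9%
Gemini 2.5 Flash | 36 | 24 | 8 | 0 | 0 | 13.6%
o4 Mini | 38 | 24 | 1 | 0 | 0 | 12.4%
Gemini 2.5 Flash Lite (Reasoning) | 39 | 10 | 4 | 0 | 0 | 10.6%
Claude 3 Haiku | 21 | 13 | 8 | 6 | 0 | 9.6%
GPT-4o, May 13th (temp=1) | 23 | 16 | 7 | 2 | 0 | 9.6%
GPT-4.1 Mini | 26 | 8 | 7 | 0 | 0 | 8.2%
GPT-4o, Aug. 6th (temp=0) | 38 | 0 | 0 | 0 | 0 | 7.6%
GPT-4o, May 13th (temp=0) | 37 | 0 | 0 | 0 | 0 | 7.5%
Llama 3.1 70B | 35 | 0 | 0 | 0 | 0 | 6.9%
Nemotron 3 Nano | 11 | 10 | 2 | 0 | 0 | 4.5%
GPT-4o Mini (temp=1) | 12 | 0 | 0 | 0 | 0 | 2.5%
Gemini 2.5 Flash Lite | 11 | 0 | 0 | 0 | 0 | 2.3%
Llama 3.1 Nemotron 70B | 9 | 0 | 0 | 0 | 0 | 1.9%
Stealth: Aurora Alpha | 6 | 0 | 0 | 0 | 0 | 1.2%
Qwen 2.5 72B | 3 | 0 | 0 | 0 | 0 | 0.5%
GPT-4o, Aug. 6th (temp=1) | 3 | 0 | 0 | 0 | 0 | 0.5%
Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.1%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%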
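Model | #1 | #2 | #3 | #4 | #5 | Avg
ByteDance Seed 2.0 Lite | 89 | 86 | 80 | 79 | 72 | 81.3%
GPT-5.4 Mini (Reasoning) | 88 | 86 | 79 | 75 | 68 | 79.1%
GPT-5.4 (Reasoning) | 87 | 80 | 78 | 73 | 72 | 78.1%
GPT-5 | 85 | 82 | 80 | 75 | 67 | 77.7%
GPT-5.4 Mini (Reasoning, Low) | 87 | 79 | 76 | 75 | 70 | 77.3%
GPT-5.4 (Reasoning, Low) | 90 | 81 | 77 | 70 | 66 | 77.0%
GPT-5.4 | 83 | 80 | 76 | 73 | 72 | 77.0%
MiniMax M2.5 | 91 | 80 | 80 | 73 | 56 | 76.2%
GPT-5 Mini | 84 | 75 | 74 | 72 | 71 | 75.4%
Claude Sonnet 4.6 | 91 | 77 | 69 | 67 | 66 | 74.0%
GPT-5.1 | 88 | 82 | 72 | 65 | 62 | 74.0%
GPT-5.4 Mini | 78 | 76 | 74 | 70 | 69 | 73.2%
ByteDance Seed 1.6 Flash | 79 | 73 | 71 | 71 | 68 | 72.6%
Claude Sonnet 4.6 (Reasoning) | 95 | 82 | 65 | 64 | 55 | 72.3%
Z.AI GLM 5 | 89 | 83 | 82 | 59 | 43 | 71.2%
GPT-5.2 | 75 | 71 | 70 | 70 | 69 | 70.9%
Claude Opus 4 | 88 | 73 | 67 | 60 | 57 | 69.3%
Claude Opus 4.5 | 79 | 79 | 72 | 63 | 50 | 68.7%
GPT-5.4 Nano | 69 | 69 | 67 | 66 | 64 | 67.1%
Claude Sonnet 4.5 | 79 | 78 | 76 | 71 | 27 | 66.3%
GPT-5.4 Nano (Reasoning, Low) | 71 | 65 | 64 | 64 | 63 | 65.3%
Claude Opus 4.6 | 80 | 66 | 62 | 60 | 57 | 65.0%
GPT-5.4 Nano (Reasoning) | 72 | 69 | 66 | 63 | 52 | 64.6%
Claude Opus 4.6 (Reasoning) | 75 | 68 | 68 | 64 | 48 | 64.5%
Z.AI GLM 5 Turbo | 91 | 64 | 57 | 51 | 49 | 62.5%
Grok 4.20 (Beta, Reasoning) | 78 | 64 | 57 | 57 | 56 | 62.5%
ByteDance Seed 1.6 | 93 | 67 | 52 | 51 | 46 | 61.7%
Qwen 3.5 397B A17B | 77 | 67 | 59 | 55 | 48 | 61.3%
Claude 3.7 Sonnet | 85 | 68 | 61 | 52 | 38 | 60.9%
MiniMax M2.7 | 74 | 64 | 63 | 57 | 46 | 60.8%
Qwen 3.5 27B | 85 | 69 | 62 | 50 | 37 | 60.7%
Claude Haiku 4.5 | 87 | 59 | 55 | 53 | 47 | 60.2%
ByteDance Seed 2.0 Mini | 70 | 67 | 55 | 52 | 49 | 58.5%
Gemini 3 Pro (Preview) | 71 | 59 | 51 | 51 | 44 | 55.3%
Qwen 3.5 122B | 78 | 66 | 58 | 53 | 19 | 54.8%
Z.AI GLM 4.7 Flash | 71 | 60 | 54 | 47 | 40 | 54.5%
Z.AI GLM 4.7 | 62 | 58 | 56 | 51 | 44 | 54.3%
Qwen3 235B A22B Instruct 2507 | 64 | 54 | 52 | 49 | 46 | 53.0%
Qwen 3.5 Flash | 71 | 54 | 53 | 50 | 29 | 51.5%
Qwen 3 32B | 79 | 69 | 56 | 53 | 0 | 51.2%
Qwen 3.5 35B | 71 | 63 | 44 | 41 | 34 | 50.4%
Writer: Palmyra X5 | 59 | 57 | 49 | 46 | 39 | 50.1%
GPT-4.1 | 62 | 60 | 43 | 43 | 35 | 48.7%
MoonshotAI: Kimi K2.5 | 72 | 58 | 52 | 38 | 23 | 48.6%
GPT-5 Nano | 56 | 55 | 48 | 42 | 41 | 48.5%
Stealth: Hunter Alpha | 65 | 64 | 54 | 38 | 21 | 48.3%
Claude Sonnet 4 | 80 | 54 | 38 | 36 | 33 | 48.2%
Rocinante 12B | 68 | 58 | 57 | 37 | 21 | 48.1%
Qwen 3.5 9B | 63 | 63 | 59 | 45 | 7 | 47.5%
Grok 4.20 (Beta) | 55 | 50 | 49 | 42 | 40 | 47.2%
DeepSeek V3 (2025-03-24) | 78 | 51 | 48 | 36 | 19 | 46.4%
Llama 3.1 8B | 58 | 55 | 52 | 51 | 15 | 46.3%
Stealth: Healer Alpha | 62 | 58 | 53 | 42 | 16 | 46.1%
Mistral Medium 3.1 | 54 | 49 | 44 | 43 | 40 | 46.1%
Mistral Small Creative | 73 | 41 | 41 | 40 | 34 | 45.7%
Z.AI GLM 4.5 | 61 | 50 | 48 | 46 | 22 | 45.6%
DeepSeek V3.2 | 57 | 46 | 45 | 42 | 32 | 44.5%
WizardLM 2 8x22b | 54 | 47 | 46 | 46 | 29 | 44.3%
Mistral Small 4 (Reasoning) | 65 | 55 | 40 | 33 | 26 | 43.9%
DeepSeek V3.1 | 54 | 54 | 47 | 39 | 25 | 43.7%
Z.AI GLM 4.6 | 75 | 43 | 38 | 31 | 31 | 43.7%
Qwen 3.5 Plus (2026-02-15) | 56 | 49 | 46 | 39 | 28 | 43.4%
Aion 2.0 | 58 | 43 | 42 | 41 | 24 | 41.4%
Gemini 3 Flash (Preview) | 56 | 50 | 40 | 32 | 22 | 39.9%
Ministral 8B | 58 | 44 | 32 | 31 | 19 | 36.7%
Arcee AI: Trinity Large (Preview) | 65 | 55 | 29 | 20 | 14 | 36.6%
Mistral Large 2 | 59 | 37 | 37 | 30 | 20 | 36.5%
Gemini 3 Flash (Preview, Reasoning) | 44 | 40 | 33 | 32 | 29 | 35.7%
Mistral Large | 58 | 46 | 44 | 27 | 0 | 35.0%
DeepSeek V3 (2024-12-26) | 59 | 48 | 46 | 20 | 0 | 34.7%
Claude 3 Haiku | 51 | 43 | 37 | 32 | 5 | 33.7%
Mistral Small 4 | 58 | 36 | 27 | 26 | 20 | 33.5%
Mistral Large 3 | 50 | 48 | 24 | 23 | 21 | 33.3%
Grok 4 | 52 | 43 | 27 | 27 | 13 | 32.5%
Ministral 3 14B | 44 | 43 | 36 | 36 | 2 | 32.3%
Gemini 3.1 Pro (Preview) | 43 | 38 | 29 | 28 | 23 | 32.1%
Hermes 3 405B | 46 | 43 | 38 | 25 | 0 | 30.4%
Claude 3.5 Sonnet | 65 | 48 | 38 | 0 | 0 | 30.2%
Gemini 2.5 Pro | 49 | 47 | 29 | 22 | 0 | 29.4%
DeepSeek-V2 Chat | 59 | 34 | 31 | 19 | 1 | 28.7%
Gemma 3 27B | 42 | 38 | 29 | 29 | 0 | 27.6%
o4 Mini High | 39 | 32 | 31 | 17 | 12 | 26.4%
Gemini 3.1 Flash Lite (Preview) | 39 | 39 | 37 | 10 | 0 | 25.0%
Llama 3.1 Nemotron 70B | 51 | 50 | 15 | 0 | 0 | 23.2%
Ministral 3 3B | 44 | 27 | 23 | 17 | 0 | 22.1%
Claude 3.5 Haiku | 55 | 44 | 0 | 0 | 0 | 19.8%
GPT-4o, May 13th (temp=0) | 55 | 25 | 14 | 0 | 0 | 18.8%
Cohere Command R+ (Aug. 2024) | 42 | 39 | 5 | 2 | 2 | 17.9%
Ministral 3 8B | 61 | 26 | 1 | 0 | 0 | 17.5%
Gemma 3 12B | 42 | 21 | 14 | 6 | 0 | 16.4%
Arcee AI: Trinity Mini | 37 | 35 | 8 | 1 | 0 | 16.2%
Ministral 3B | 47 | 21 | 5 | 1 | 0 | 15.0%
Hermes 3 70B | 47 | 17 | 7 | 0 | 0 | 14.2%
Qwen 2.5 72B | 28 | 20 | 15 | 0 | 0 | 12.7%
Gemma 3 4B | 34 | 13 | 11 | 5 | 0 | 12.5%
Grok 4 Fast | 24 | 21 | 6 | 6 | 5 | 12.2%
o4 Mini | 26 | 16 | 8 | 0 | 0 | 9.8%
Gemini 2.5 Flash (Reasoning) | 20 | 19 | 10 | 0 | 0 | 9.8%
Llama 3.1 70B | 25 | 13 | 6 | 0 | 0 | 9.0%
Grok 4.1 Fast | 13 | 12 | 10 | 8 | 0 | 8.7%
Gemini 2.5 Flash Lite | 21 | 11 | 2 | 0 | 0 | 7.0%
Inception Mercury | 32 | 0 | 0 | 0 | 0 | 6.4%
Gemini 2.5 Flash | 26 | 2 | 0 | 0 | 0 | 5.7%
Gemini 2.5 Flash Lite (Reasoning) | 19 | 8 | 1 | 0 | 0 | 5.6%
Mistral Small 3.2 24B | 22 | 5 | 0 | 0 | 0 | 5.5%
LFM2 24B | 13 | 6 | 4 | 0 | 0 | 4.5%
Nemotron 3 Super | 13 | 9 | 0 | 0 | 0 | 4.3%
Mistral NeMO | 13 | 8 | 0 | 0 | 0 | 4.3%
GPT-4o Mini (temp=1) | 19 | 0 | 0 | 0 | 0 | 3.7%
Nemotron 3 Nano | 14 | 1 | 0 | 0 | 0 | 3.2%
GPT-4o, May 13th (temp=1) | 9 | 4 | 3 | 0 | 0 | 3.1%
Stealth: Aurora Alpha | 5 | 0 | 0 | 0 | 0 | 1.0%
Inception Mercury 2 | 3 | 0 | 0 | 0 | 0 | 0.6%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%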

genre

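Model | #1 | #2 | #3 | #4 | #5 | Avg
ByteDance Seed 2.0 Lite | 75 | 72 | 71 | 62 | 56 | 67.3%
GPT-5.4 (Reasoning, Low) | 75 | 67 | 66 | 62 | 55 | 65.2%
GPT-5 | 67 | 65 | 65 | 63 | 58 | 63.7%
Claude Sonnet 4.6 (Reasoning) | 76 | 69 | 67 | 52 | 43 | 61.3%
GPT-5.4 | 70 | 66 | 65 | 55 | 48 | 60.8%
Claude Sonnet 4.6 | 70 | 61 | 56 | 51 | 50 | 57.5%
GPT-5.4 (Reasoning) | 60 | 59 | 56 | 55 | 51 | 56.0%
ByteDance Seed 1.6 Flash | 73 | 63 | 52 | 46 | 41 | 54.9%
GPT-5 Mini | 71 | 67 | 48 | 44 | 36 | 53.4%
Claude Haiku 4.5 | 78 | 73 | 50 | 48 | 18 | 53.3%
GPT-5.2 | 58 | 54 | 52 | 51 | 51 | 53.0%
GPT-5.4 Mini | 63 | 61 | 58 | 53 | 30 | 53.0%
ByteDance Seed 2.0 Mini | 59 | 57 | 56 | 46 | 44 | 52.4%
Claude Opus 4.6 (Reasoning) | 61 | 59 | 51 | 50 | 41 | 52.2%
GPT-5.4 Mini (Reasoning, Low) | 54 | 51 | 50 | 49 | 48 | 50.3%
GPT-5.4 Mini (Reasoning) | 64 | 55 | 48 | 44 | 39 | 50.1%
MiniMax M2.5 | 59 | 55 | 47 | 44 | 42 | 49.5%
Z.AI GLM 4.5 | 58 | 56 | 47 | 46 | 39 | 49.2%
GPT-5.1 | 59 | 51 | 47 | 39 | 36 | 46.4%
Claude Sonnet 4.5 | 56 | 53 | 48 | 43 | 30 | 46.2%
Claude Opus 4.5 | 58 | 46 | 45 | 44 | 37 | 46.0%
Claude 3.7 Sonnet | 58 | 55 | 47 | 44 | 23 | 45.3%
Claude Opus 4.6 | 61 | 56 | 42 | 39 | 27 | 45.0%
Z.AI GLM 4.7 Flash | 50 | 46 | 44 | 38 | 26 | 40.8%
Z.AI GLM 5 Turbo | 56 | 49 | 35 | 30 | 30 | 40.2%
GPT-5 Nano | 61 | 40 | 39 | 27 | 25 | 38.5%
Hermes 3 405B | 55 | 53 | 50 | 26 | 9 | 38.4%
MiniMax M2.7 | 52 | 49 | 46 | 28 | 15 | 38.1%
Qwen 3.5 9B | 46 | 43 | 37 | 32 | 32 | 38.0%
Qwen 3.5 Plus (2026-02-15) | 53 | 38 | 38 | 36 | 23 | 37.7%
Qwen3 235B A22B Instruct 2507 | 54 | 39 | 35 | 33 | 28 | 37.7%
Arcee AI: Trinity Large (Preview) | 49 | 38 | 37 | 34 | 31 | 37.6%
Grok 4.20 (Beta, Reasoning) | 47 | 40 | 35 | 34 | 31 | 37.5%
Grok 4.20 (Beta) | 50 | 40 | 34 | 33 | 28 | 37.2%
GPT-5.4 Nano | 52 | 41 | 37 | 36 | 19 | 37.2%
Claude Opus 4 | 48 | 39 | 39 | 28 | 25 | 35.8%
GPT-5.4 Nano (Reasoning, Low) | 47 | 44 | 37 | 29 | 23 | 35.7%
ByteDance Seed 1.6 | 54 | 48 | 27 | 25 | 21 | 35.0%
Z.AI GLM 5 | 53 | 45 | 39 | 27 | 10 | 34.6%
MoonshotAI: Kimi K2.5 | 49 | 47 | 30 | 23 | 22 | 34.3%
Mistral Small 4 (Reasoning) | 51 | 38 | 34 | 25 | 21 | 33.7%
Rocinante 12B | 50 | 40 | 40 | 36 | 0 | 33.3%
Z.AI GLM 4.7 | 42 | 31 | 30 | 29 | 28 | 31.9%
Mistral Medium 3.1 | 45 | 39 | 30 | 21 | 19 | 30.7%
Mistral Small Creative | 40 | 35 | 30 | 29 | 20 | 30.7%
Gemini 3.1 Pro (Preview) | 41 | 35 | 34 | 31 | 11 | 30.3%
Writer: Palmyra X5 | 37 | 33 | 32 | 32 | 12 | 29.3%
Aion 2.0 | 47 | 27 | 26 | 24 | 20 | 28.7%
GPT-5.4 Nano (Reasoning) | 32 | 32 | 31 | 23 | 22 | 28.2%
Qwen 3.5 35B | 50 | 39 | 36 | 12 | 0 | 27.3%
Gemini 3 Pro (Preview) | 34 | 33 | 30 | 27 | 13 | 27.3%
Qwen 3.5 27B | 41 | 39 | 30 | 15 | 8 | 26.6%
Mistral Small 4 | 39 | 34 | 27 | 20 | 12 | 26.4%
GPT-4.1 | 43 | 34 | 20 | 18 | 17 | 26.3%
Llama 3.1 8B | 55 | 46 | 30 | 0 | 0 | 26.3%
Mistral Large 2 | 31 | 31 | 24 | 23 | 21 | 26.0%
Gemini 3 Flash (Preview, Reasoning) | 48 | 37 | 23 | 16 | 0 | 24.8%
Claude 3.5 Sonnet | 46 | 28 | 28 | 20 | 0 | 24.4%
Ministral 3 8B | 56 | 37 | 25 | 1 | 0 | 23.8%
Qwen 3 32B | 49 | 27 | 24 | 17 | 0 | 23.3%
Mistral Large 3 | 34 | 29 | 23 | 15 | 11 | 22.5%
Mistral Large | 33 | 26 | 21 | 18 | 11 | 21.7%
Claude Sonnet 4 | 37 | 22 | 16 | 15 | 15 | 20.9%
Gemini 3.1 Flash Lite (Preview) | 40 | 32 | 12 | 5 | 0 | 17.9%
DeepSeek V3.2 | 27 | 24 | 21 | 15 | 0 | 17.5%
Qwen 3.5 Flash | 44 | 14 | 13 | 9 | 7 | 17.5%
Qwen 3.5 122B | 55 | 19 | 11 | 2 | 0 | 17.5%
Gemini 3 Flash (Preview) | 38 | 25 | 22 | 1 | 0 | 17.2%
LFM2 24B | 31 | 21 | 21 | 11 | 0 | 16.7%
Hermes 3 70B | 52 | 25 | 0 | 0 | 0 | 15.3%
Ministral 3 14B | 24 | 24 | 7 | 6 | 0 | 12.3%
Z.AI GLM 4.6 | 30 | 19 | 1 | 0 | 0 | 10.0%
Qwen 3.5 397B A17B | 25 | 12 | 8 | 5 | 0 | 10.0%
Stealth: Hunter Alpha | 23 | 16 | 10 | 0 | 0 | 9.7%
DeepSeek V3.1 | 20 | 13 | 10 | 0 | 0 | 8.5%
Gemini 2.5 Pro | 25 | 17 | 0 | 0 | 0 | 8.4%
DeepSeek-V2 Chat | 30 | 10 | 2 | 0 | 0 | 8.3%
Nemotron 3 Super | 18 | 15 | 6 | 2 | 0 | 8.1%
DeepSeek V3 (2024-12-26) | 16 | 16 | 0 | 0 | 0 | 6.3%
Grok 4 | 14 | 14 | 0 | 0 | 0 | 5.6%
Cohere Command R+ (Aug. 2024) | 15 | 7 | 4 | 0 | 0 | 5.3%
Qwen 2.5 72B | 25 | 0 | 0 | 0 | 0 | 5.0%
o4 Mini | 24 | 1 | 0 | 0 | 0 | 4.9%
Llama 3.1 70B | 22 | 0 | 0 | 0 | 0 | 4.5%
DeepSeek V3 (2025-03-24) | 21 | 0 | 0 | 0 | 0 | 4.2%
Gemma 3 12B | 20 | 0 | 0 | 0 | 0 | 4.1%
Mistral NeMO | 20 | 0 | 0 | 0 | 0 | 4.0%
Claude 3 Haiku | 18 | 0 | 0 | 0 | 0 | 3.6%
Gemini 2.5 Flash | 11 | 4 | 3 | 0 | 0 | 3.6%
Mistral Small 3.2 24B | 17 | 0 | 0 | 0 | 0 | 3.5%
Stealth: Healer Alpha | 15 | 1 | 0 | 0 | 0 | 3.4%
Ministral 8B | 7 | 5 | 4 | 0 | 0 | 3.3%
Grok 4 Fast | 16 | 0 | 0 | 0 | 0 | 3.1%
Gemini 2.5 Flash Lite | 14 | 0 | 0 | 0 | 0 | 2.7%
Inception Mercury | 12 | 0 | 0 | 0 | 0 | 2.4%
GPT-4o, Aug. 6th (temp=1) | 12 | 0 | 0 | 0 | 0 | 2.3%
Ministral 3B | 4 | 3 | 2 | 0 | 0 | 1.9%
Nemotron 3 Nano | 9 | 0 | 0 | 0 | 0 | 1.8%
Ministral 3 3B | 8 | 0 | 0 | 0 | 0 | 1.6%
Grok 4.1 Fast | 6 | 0 | 0 | 0 | 0 | 1.2%
Gemma 3 27B | 2 | 1 | 0 | 0 | 0 | 0.5%
o4 Mini High | 2 | 0 | 0 | 0 | 0 | 0.4%
Claude 3.5 Haiku | 2 | 0 | 0 | 0 | 0 | 0.4%
GPT-4o, May 13th (temp=0) | 1 | 0 | 0 | 0 | 0 | 0.3%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%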
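Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
GPT-5 | 75 | 74 | 74 | 73 | 69 | 72.9%
ByteDance Seed 2.0 Mini | 77 | 76 | 74 | 65 | 55 | 69.3%
ByteDance Seed 2.0 Lite | 81 | 75 | 57 | 57 | 52 | 64.2%
Claude Opus 4.6 | 75 | 67 | 63 | 53 | 50 | 61.4%
GPT-5.4 (Reasoning, Low) | 70 | 67 | 62 | 51 | 45 | 58.8%
GPT-5.4 (Reasoning) | 63 | 61 | 57 | 55 | 45 | 56.3%
GPT-5.4 | 63 | 57 | 52 | 51 | 39 | 52.3%
Claude Sonnet 4.6 | 63 | 54 | 49 | 49 | 39 | 50.6%
MiniMax M2.7 | 55 | 54 | 51 | 47 | 46 | 50.5%
ByteDance Seed 1.6 Flash | 61 | 56 | 51 | 48 | 35 | 50.4%
GPT-5 Mini | 57 | 54 | 53 | 41 | 37 | 48.7%
GPT-5.4 Mini (Reasoning) | 67 | 53 | 46 | 43 | 31 | 48.2%
Grok 4.20 (Beta) | 60 | 57 | 44 | 43 | 35 | 47.9%
Qwen 3.5 9B | 67 | 55 | 47 | 42 | 25 | 47.2%
MiniMax M2.5 | 62 | 54 | 53 | 36 | 30 | 47.0%
Z.AI GLM 5 Turbo | 66 | 52 | 50 | 49 | 14 | 46.2%
Claude Sonnet 4.6 (Reasoning) | 53 | 51 | 50 | 42 | 35 | 46.1%
Claude Opus 4.6 (Reasoning) | 59 | 50 | 41 | 36 | 35 | 44.2%
GPT-5.4 Mini | 62 | 51 | 46 | 32 | 28 | 43.9%
GPT-5.2 | 51 | 47 | 46 | 43 | 32 | 43.6%
GPT-5.4 Mini (Reasoning, Low) | 54 | 45 | 40 | 40 | 34 | 42.9%
GPT-5 Nano | 46 | 45 | 44 | 41 | 36 | 42.4%
GPT-5.1 | 45 | 41 | 40 | 40 | 38 | 40.8%
Claude Opus 4 | 58 | 57 | 40 | 36 | 9 | 39.9%
Grok 4.20 (Beta, Reasoning) | 66 | 46 | 33 | 26 | 26 | 39.4%
Claude Haiku 4.5 | 55 | 41 | 34 | 34 | 31 | 39.1%
Mistral Small 4 | 58 | 42 | 34 | 33 | 26 | 38.6%
Qwen 3.5 122B | 54 | 47 | 37 | 32 | 23 | 38.5%
GPT-5.4 Nano (Reasoning, Low) | 54 | 50 | 44 | 30 | 14 | 38.4%
Gemini 3 Pro (Preview) | 56 | 46 | 39 | 37 | 13 | 38.2%
Z.AI GLM 5 | 50 | 44 | 39 | 34 | 23 | 37.8%
GPT-5.4 Nano | 44 | 42 | 39 | 31 | 31 | 37.4%
Mistral Medium 3.1 | 54 | 38 | 33 | 30 | 29 | 36.9%
Qwen 3.5 27B | 49 | 42 | 38 | 34 | 22 | 36.8%
Llama 3.1 8B | 61 | 49 | 37 | 19 | 17 | 36.5%
Qwen 3.5 Flash | 51 | 41 | 35 | 27 | 22 | 35.4%
GPT-4.1 | 68 | 35 | 31 | 25 | 17 | 35.3%
Qwen 3.5 397B A17B | 45 | 41 | 40 | 31 | 17 | 34.7%
Gemini 3 Flash (Preview) | 45 | 35 | 35 | 32 | 26 | 34.6%
GPT-5.4 Nano (Reasoning) | 47 | 41 | 35 | 26 | 23 | 34.4%
Claude 3.7 Sonnet | 43 | 41 | 38 | 38 | 10 | 34.1%
Claude Sonnet 4.5 | 73 | 46 | 27 | 15 | 2 | 32.5%
MoonshotAI: Kimi K2.5 | 45 | 41 | 27 | 25 | 16 | 30.7%
Z.AI GLM 4.7 Flash | 43 | 33 | 32 | 27 | 18 | 30.4%
Mistral Small 4 (Reasoning) | 44 | 34 | 27 | 22 | 16 | 28.5%
Z.AI GLM 4.7 | 33 | 31 | 29 | 27 | 20 | 28.1%
Gemini 3.1 Pro (Preview) | 40 | 35 | 29 | 22 | 14 | 27.9%
Claude Opus 4.5 | 52 | 33 | 33 | 12 | 8 | 27.6%
Qwen 3 32B | 51 | 46 | 22 | 14 | 0 | 26.8%
Rocinante 12B | 48 | 40 | 27 | 9 | 0 | 24.8%
Gemini 3 Flash (Preview, Reasoning) | 40 | 31 | 27 | 15 | 10 | 24.6%
DeepSeek V3.2 | 37 | 33 | 24 | 19 | 10 | 24.3%
Qwen 3.5 35B | 34 | 25 | 24 | 21 | 18 | 24.3%
DeepSeek V3 (2025-03-24) | 34 | 34 | 32 | 19 | 0 | 23.8%
Z.AI GLM 4.5 | 53 | 37 | 10 | 9 | 0 | 21.8%
Ministral 3 14B | 42 | 26 | 24 | 9 | 1 | 20.5%
Qwen 2.5 72B | 100 | 0 | 0 | 0 | 0 | 20.0%
Qwen 3.5 Plus (2026-02-15) | 36 | 30 | 15 | 12 | 5 | 19.6%
ByteDance Seed 1.6 | 39 | 33 | 9 | 8 | 7 | 19.1%
Writer: Palmyra X5 | 36 | 30 | 25 | 4 | 0 | 18.7%
Z.AI GLM 4.6 | 47 | 20 | 16 | 8 | 0 | 18.3%
Qwen3 235B A22B Instruct 2507 | 40 | 13 | 13 | 12 | 11 | 17.6%
Claude 3.5 Sonnet | 35 | 31 | 12 | 8 | 0 | 17.3%
Mistral Small Creative | 31 | 24 | 17 | 10 | 4 | 17.0%
Mistral Large 2 | 39 | 15 | 12 | 9 | 0 | 14.9%
Stealth: Hunter Alpha | 41 | 23 | 6 | 4 | 0 | 14.7%
Aion 2.0 | 24 | 23 | 16 | 10 | 0 | 14.5%
Arcee AI: Trinity Large (Preview) | 45 | 11 | 8 | 2 | 0 | 13.4%
DeepSeek V3.1 | 29 | 19 | 17 | 0 | 0 | 13.0%
Hermes 3 70B | 28 | 14 | 7 | 0 | 0 | 10.0%
Stealth: Healer Alpha | 18 | 18 | 10 | 0 | 0 | 9.3%
Gemma 3 27B | 32 | 8 | 0 | 0 | 0 | 8.1%
Mistral Large 3 | 29 | 6 | 4 | 0 | 0 | 7.8%
Gemini 3.1 Flash Lite (Preview) | 21 | 14 | 0 | 0 | 0 | 6.9%
Claude Sonnet 4 | 18 | 13 | 2 | 0 | 0 | 6.6%
Grok 4 Fast | 29 | 1 | 0 | 0 | 0 | 6.0%
Hermes 3 405B | 27 | 3 | 0 | 0 | 0 | 5.9%
Gemini 2.5 Pro | 13 | 9 | 7 | 0 | 0 | 5.8%
LFM2 24B | 16 | 12 | 0 | 0 | 0 | 5.7%
Claude 3.5 Haiku | 26 | 0 | 0 | 0 | 0 | 5.2%
Ministral 3B | 18 | 6 | 0 | 0 | 0 | 5.0%
Mistral Large | 19 | 5 | 0 | 0 | 0 | 4.8%
o4 Mini | 12 | 6 | 0 | 0 | 0 | 3.8%
Ministral 3 8B | 12 | 3 | 0 | 0 | 0 | 3.0%
Grok 4.1 Fast | 11 | 4 | 0 | 0 | 0 | 3.0%
Ministral 8B | 6 | 5 | 4 | 0 | 0 | 2.9%
Grok 4 | 12 | 0 | 0 | 0 | 0 | 2.4%
Gemini 2.5 Flash Lite (Reasoning) | 11 | 0 | 0 | 0 | 0 | 2.3%
Gemma 3 12B | 9 | 0 | 0 | 0 | 0 | 1.7%
DeepSeek V3 (2024-12-26) | 9 | 0 | 0 | 0 | 0 | 1.7%
Cohere Command R+ (Aug. 2024) | 8 | 0 | 0 | 0 | 0 | 1.6%
Ministral 3 3B | 7 | 0 | 0 | 0 | 0 | 1.4%
Mistral Small 3.2 24B | 7 | 0 | 0 | 0 | 0 | 1.3%
Nemotron 3 Super | 6 | 0 | 0 | 0 | 0 | 1.1%
Inception Mercury | 5 | 0 | 0 | 0 | 0 | 1.0%
o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek-V2 Chat | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Nemotron 3 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%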
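Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
GPT-5.4 | 92 | 88 | 83 | 82 | 79 | 84.8%
Claude Sonnet 4.6 (Reasoning) | 87 | 85 | 80 | 80 | 78 | 81.7%
ByteDance Seed 2.0 Lite | 84 | 83 | 82 | 81 | 79 | 81.7%
GPT-5.4 (Reasoning) | 86 | 82 | 82 | 82 | 75 | 81.5%
ByteDance Seed 1.6 Flash | 87 | 87 | 85 | 78 | 68 | 81.1%
GPT-5.4 Mini (Reasoning) | 87 | 81 | 79 | 78 | 77 | 80.4%
ByteDance Seed 2.0 Mini | 87 | 86 | 79 | 76 | 74 | 80.4%
GPT-5 | 82 | 81 | 80 | 80 | 78 | 80.3%
Z.AI GLM 5 Turbo | 92 | 86 | 75 | 74 | 70 | 79.6%
GPT-5.4 (Reasoning, Low) | 86 | 83 | 82 | 73 | 73 | 79.4%
Claude Sonnet 4.6 | 85 | 81 | 76 | 76 | 75 | 78.5%
GPT-5.1 | 81 | 79 | 75 | 74 | 73 | 76.4%
MiniMax M2.7 | 87 | 86 | 82 | 64 | 63 | 76.3%
Claude Opus 4.6 (Reasoning) | 81 | 78 | 75 | 75 | 70 | 75.9%
GPT-5.4 Mini (Reasoning, Low) | 84 | 77 | 75 | 73 | 67 | 75.1%
GPT-5.4 Mini | 82 | 79 | 73 | 72 | 70 | 75.0%
Grok 4.20 (Beta) | 84 | 74 | 73 | 71 | 68 | 74.2%
GPT-5 Mini | 80 | 78 | 75 | 75 | 59 | 73.5%
Grok 4.20 (Beta, Reasoning) | 77 | 73 | 71 | 67 | 56 | 68.8%
GPT-5.4 Nano (Reasoning, Low) | 71 | 71 | 69 | 66 | 64 | 68.1%
Claude Opus 4.6 | 76 | 68 | 66 | 64 | 62 | 67.4%
MiniMax M2.5 | 73 | 72 | 70 | 68 | 49 | 66.5%
Claude Haiku 4.5 | 89 | 65 | 63 | 59 | 56 | 66.2%
GPT-5.2 | 76 | 70 | 68 | 66 | 48 | 65.4%
ByteDance Seed 1.6 | 69 | 69 | 64 | 63 | 55 | 63.9%
GPT-5.4 Nano (Reasoning) | 69 | 67 | 62 | 61 | 59 | 63.8%
GPT-5.4 Nano | 66 | 65 | 62 | 61 | 59 | 62.6%
MoonshotAI: Kimi K2.5 | 72 | 71 | 62 | 54 | 54 | 62.3%
Claude Opus 4.5 | 69 | 68 | 64 | 58 | 51 | 62.0%
Claude Sonnet 4.5 | 71 | 64 | 64 | 57 | 51 | 61.4%
Qwen 3 32B | 73 | 64 | 63 | 57 | 48 | 61.0%
Writer: Palmyra X5 | 68 | 63 | 61 | 57 | 55 | 60.9%
Ministral 3 8B | 82 | 71 | 59 | 44 | 44 | 60.0%
Claude Opus 4 | 64 | 61 | 58 | 57 | 54 | 59.0%
GPT-4.1 | 69 | 64 | 61 | 61 | 39 | 58.7%
Aion 2.0 | 68 | 67 | 60 | 48 | 45 | 57.7%
Z.AI GLM 5 | 65 | 62 | 59 | 54 | 46 | 57.1%
Claude 3.5 Sonnet | 65 | 65 | 54 | 52 | 48 | 56.6%
Z.AI GLM 4.7 | 71 | 56 | 55 | 54 | 43 | 55.8%
Ministral 8B | 72 | 64 | 57 | 46 | 37 | 55.4%
Claude 3.7 Sonnet | 69 | 67 | 55 | 47 | 38 | 55.2%
GPT-5 Nano | 70 | 63 | 49 | 46 | 42 | 54.0%
Gemini 3 Flash (Preview) | 60 | 57 | 56 | 49 | 47 | 53.7%
Qwen3 235B A22B Instruct 2507 | 79 | 61 | 42 | 42 | 41 | 52.7%
Mistral Small 4 (Reasoning) | 78 | 51 | 48 | 46 | 39 | 52.4%
Mistral Large | 68 | 61 | 54 | 48 | 29 | 52.2%
Mistral Small 4 | 62 | 56 | 56 | 50 | 35 | 51.8%
Ministral 3 14B | 58 | 53 | 51 | 47 | 47 | 51.3%
DeepSeek V3 (2025-03-24) | 78 | 58 | 56 | 47 | 16 | 51.3%
Gemini 3 Pro (Preview) | 60 | 60 | 46 | 42 | 40 | 49.4%
Gemini 2.5 Pro | 57 | 55 | 47 | 44 | 42 | 49.0%
Qwen 3.5 9B | 76 | 65 | 46 | 34 | 23 | 48.8%
Arcee AI: Trinity Large (Preview) | 62 | 60 | 46 | 44 | 32 | 48.7%
DeepSeek V3.2 | 56 | 56 | 53 | 41 | 35 | 48.1%
Mistral Medium 3.1 | 68 | 61 | 49 | 30 | 29 | 47.4%
Z.AI GLM 4.5 | 76 | 56 | 38 | 35 | 24 | 45.7%
Claude Sonnet 4 | 57 | 48 | 44 | 42 | 35 | 45.2%
Gemini 3 Flash (Preview, Reasoning) | 62 | 53 | 51 | 38 | 22 | 45.1%
Stealth: Healer Alpha | 64 | 55 | 41 | 32 | 31 | 44.5%
Mistral Large 2 | 69 | 57 | 45 | 32 | 20 | 44.4%
Z.AI GLM 4.7 Flash | 53 | 51 | 46 | 39 | 32 | 44.3%
Qwen 3.5 Flash | 55 | 44 | 43 | 40 | 38 | 44.3%
Z.AI GLM 4.6 | 52 | 47 | 44 | 38 | 36 | 43.5%
o4 Mini | 65 | 50 | 48 | 31 | 14 | 41.7%
Mistral Small Creative | 64 | 53 | 49 | 38 | 4 | 41.6%
Gemini 3.1 Pro (Preview) | 50 | 50 | 49 | 33 | 25 | 41.4%
DeepSeek V3.1 | 48 | 46 | 40 | 40 | 32 | 41.3%
Mistral Large 3 | 48 | 46 | 45 | 44 | 21 | 41.0%
Rocinante 12B | 68 | 59 | 57 | 20 | 0 | 40.9%
Qwen 3.5 397B A17B | 61 | 51 | 36 | 25 | 24 | 39.4%
Qwen 3.5 122B | 54 | 49 | 45 | 32 | 12 | 38.5%
Qwen 3.5 35B | 51 | 39 | 37 | 33 | 13 | 34.5%
Gemini 3.1 Flash Lite (Preview) | 51 | 37 | 31 | 27 | 27 | 34.5%
Qwen 3.5 Plus (2026-02-15) | 52 | 45 | 35 | 21 | 19 | 34.4%
o4 Mini High | 47 | 43 | 35 | 25 | 22 | 34.2%
Inception Mercury | 93 | 75 | 0 | 0 | 0 | 33.6%
Llama 3.1 8B | 58 | 55 | 53 | 0 | 0 | 33.4%
Qwen 3.5 27B | 48 | 47 | 26 | 22 | 22 | 33.1%
Stealth: Hunter Alpha | 47 | 32 | 28 | 27 | 19 | 30.6%
DeepSeek V3 (2024-12-26) | 56 | 50 | 25 | 10 | 0 | 28.3%
Hermes 3 405B | 57 | 37 | 24 | 11 | 10 | 27.8%
Nemotron 3 Super | 32 | 30 | 25 | 23 | 15 | 25.1%
Gemma 3 27B | 59 | 29 | 19 | 17 | 0 | 24.8%
Grok 4 Fast | 34 | 23 | 21 | 19 | 18 | 23.1%
Grok 4 | 32 | 29 | 23 | 14 | 10 | 21.8%
Llama 3.1 70B | 60 | 35 | 13 | 0 | 0 | 21.7%
Claude 3.5 Haiku | 31 | 30 | 22 | 21 | 0 | 20.9%
Ministral 3 3B | 53 | 33 | 15 | 3 | 0 | 20.7%
LFM2 24B | 40 | 31 | 22 | 10 | 0 | 20.6%
Ministral 3B | 31 | 28 | 27 | 17 | 0 | 20.6%
Hermes 3 70B | 36 | 35 | 25 | 5 | 0 | 20.3%
Gemma 3 12B | 25 | 23 | 22 | 20 | 11 | 20.2%
Grok 4.1 Fast | 35 | 26 | 20 | 12 | 6 | 19.8%
Gemini 2.5 Flash Lite (Reasoning) | 43 | 33 | 15 | 6 | 0 | 19.4%
GPT-4.1 Mini | 37 | 28 | 14 | 8 | 0 | 17.3%
Mistral Small 3.2 24B | 43 | 41 | 0 | 0 | 0 | 16.8%
DeepSeek-V2 Chat | 60 | 11 | 9 | 3 | 0 | 16.3%
Gemini 2.5 Flash Lite | 38 | 23 | 8 | 6 | 0 | 15.1%
Mistral NeMO | 31 | 24 | 17 | 1 | 0 | 14.6%
Gemini 2.5 Flash | 27 | 23 | 21 | 3 | 0 | 14.6%
WizardLM 2 8x22b | 38 | 29 | 0 | 0 | 0 | 13.4%
Gemini 2.5 Flash (Reasoning) | 25 | 17 | 16 | 6 | 0 | 12.6%
Nemotron 3 Nano | 35 | 16 | 8 | 2 | 1 | 12.3%
GPT-4o, May 13th (temp=1) | 43 | 17 | 0 | 0 | 0 | 11.9%
Stealth: Aurora Alpha | 21 | 15 | 8 | 4 | 0 | 9.8%
Arcee AI: Trinity Mini | 25 | 24 | 0 | 0 | 0 | 9.8%
Cohere Command R+ (Aug. 2024) | 17 | 12 | 7 | 3 | 0 | 7.8%
Gemma 3 4B | 17 | 9 | 8 | 0 | 0 | 6.8%
GPT-4.1 Nano | 33 | 0 | 0 | 0 | 0 | 6.5%
Claude 3 Haiku | 23 | 3 | 0 | 0 | 0 | 5.1%
Inception Mercury 2 | 15 | 2 | 0 | 0 | 0 | 3.4%
Qwen 2.5 72B | 8 | 1 | 0 | 0 | 0 | 1.9%
GPT-4o Mini (temp=1) | 3 | 0 | 0 | 0 | 0 | 0.6%
Llama 3.1 Nemotron 70B | 1 | 0 | 0 | 0 | 0 | 0.2%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%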
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
GPT-5.4 Mini (Reasoning)918583818083.9%
GPT-5.4928983787683.7%
GPT-5.4 (Reasoning)898381787581.2%
Claude Sonnet 4.6 (Reasoning)908783747180.9%
GPT-5.4 Mini848482797080.0%
Claude Sonnet 4.6917877767479.3%
GPT-5.4 (Reasoning, Low)828281797279.2%
ByteDance Seed 2.0 Lite858181777179.1%
GPT-5 Mini858079767278.4%
GPT-5.4 Mini (Reasoning, Low)918581695576.0%
GPT-5857675726875.2%
Claude Opus 4.5898170696775.0%
Claude Opus 4.6837776726674.7%
Claude Opus 4.6 (Reasoning)878174615972.3%
GPT-5.2777373726471.6%
ByteDance Seed 2.0 Mini817765634967.1%
GPT-5.1727269645766.8%
GPT-5.4 Nano (Reasoning, Low)796867635666.7%
ByteDance Seed 1.6787267605666.6%
ByteDance Seed 1.6 Flash806864616066.6%
Z.AI GLM 4.5787566585666.3%
Grok 4.20 (Beta)867166555365.9%
GPT-5.4 Nano747366595264.8%
Z.AI GLM 5756863615464.2%
GPT-5.4 Nano (Reasoning)676563636163.7%
MoonshotAI: Kimi K2.5736662565361.9%
Claude Haiku 4.5797466484361.9%
GPT-5 Nano706161595661.4%
Grok 4.20 (Beta, Reasoning)706959575160.9%
Z.AI GLM 5 Turbo686461575360.6%
Claude Opus 4756462564660.4%
MiniMax M2.7807159484360.3%
MiniMax M2.5736962484759.9%
Gemini 3 Pro (Preview)706658555059.7%
Claude 3.7 Sonnet666464624259.5%
Claude Sonnet 4.5826348474657.1%
Claude 3.5 Sonnet875553523756.7%
Qwen 3.5 397B A17B636156515156.3%
Gemini 3.1 Pro (Preview)656563443955.2%
Rocinante 12B716564502455.0%
Qwen 3.5 Flash646260454454.9%
DeepSeek-V2 Chat765555513354.2%
Qwen3 235B A22B Instruct 2507817253461453.0%
Claude Sonnet 4626156503552.8%
Qwen 3.5 9B665752464152.4%
Stealth: Healer Alpha635552484051.5%
Qwen 3.5 122B575352504651.5%
Hermes 3 405B795250403551.2%
Ministral 3 8B707062342151.2%
Hermes 3 70B756739373650.9%
Gemini 3.1 Flash Lite (Preview)716547392950.2%
LFM2 24B615653443449.6%
Qwen 3.5 27B665350493149.4%
Gemini 3 Flash (Preview)535151444448.8%
Ministral 3 14B656542373047.7%
Stealth: Hunter Alpha655352392847.4%
Z.AI GLM 4.7 Flash545150413947.1%
Mistral Large 2656047382546.9%
Qwen 3.5 Plus (2026-02-15)605952342946.9%
Gemini 2.5 Pro545248463346.8%
Arcee AI: Trinity Large (Preview)656046322946.5%
Mistral Large 3575454412746.3%
Mistral Medium 3.1605345393446.2%
Writer: Palmyra X5595248393246.0%
Mistral Large534948473345.9%
Qwen 3.5 35B515045413845.1%
Z.AI GLM 4.7565047403144.7%
Qwen 3 32B654741353043.6%
Mistral Small Creative574540363542.6%
Gemini 3 Flash (Preview, Reasoning)514842363542.5%
DeepSeek V3.2585040343042.3%
Z.AI GLM 4.6595143361741.2%
Aion 2.0535047292741.1%
Mistral Small 4 (Reasoning)584943381740.9%
Gemini 2.5 Flash565143361439.9%
DeepSeek V3.1544941411339.6%
GPT-4.1594738282238.8%
Llama 3.1 8B646333201438.6%
Ministral 8B544827272636.7%
Ministral 3B524535302036.4%
Gemini 2.5 Flash (Reasoning)564039261134.4%
Grok 4474333252334.1%
DeepSeek V3 (2025-03-24)393633321631.2%
Nemotron 3 Nano533926221330.7%
Mistral Small 4513226252030.6%
Grok 4.1 Fast483532231330.0%
Gemma 3 4B543832131329.8%
DeepSeek V3 (2024-12-26)54492712729.8%
Llama 3.1 70B6664150029.1%
GPT-4.1 Mini63333214028.5%
Gemini 2.5 Flash Lite (Reasoning)47393614027.1%
Grok 4 Fast423333161127.0%
Cohere Command R+ (Aug. 2024)6460100026.9%
Gemma 3 27B41392819225.9%
o4 Mini36323030125.7%
GPT-4o, May 13th (temp=0)48342016424.3%
Nemotron 3 Super313128171123.5%
GPT-4o, May 13th (temp=1)37341816722.4%
Ministral 3 3B38312515021.9%
Gemma 3 12B57201714121.7%
Claude 3 Haiku39352110020.8%
Arcee AI: Trinity Mini39262513020.5%
Mistral NeMO29232214718.9%
o4 Mini High3830178018.7%
Stealth: Aurora Alpha262018141218.0%
GPT-4o Mini (temp=1)3632150016.5%
Inception Mercury 23222207016.3%
Gemini 2.5 Flash Lite3922135115.9%
Mistral Small 3.2 24B2921203014.7%
Llama 3.1 Nemotron 70B30141413014.3%
GPT-4.1 Nano29141411013.6%
Qwen 2.5 72B2722119013.6%
WizardLM 2 8x22b2921170013.4%
GPT-4o, Aug. 6th (temp=0)251993011.1%
GPT-4o, Aug. 6th (temp=1)21126108.0%
Claude 3.5 Haiku1700003.5%
GPT-4o Mini (temp=0)1310002.9%
Inception Mercury1300002.7%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude Sonnet 4.6919190886384.9%
ByteDance Seed 1.6 Flash918785837684.4%
ByteDance Seed 2.0 Lite918681797782.9%
Claude Sonnet 4.6 (Reasoning)948282737180.3%
ByteDance Seed 2.0 Mini888482816680.1%
GPT-5817877767176.6%
GPT-5 Mini817873736974.9%
Claude Opus 4.6847670706472.7%
GPT-5.4 (Reasoning, Low)787773706172.1%
GPT-5.4 (Reasoning)777372706871.8%
GPT-5.4787569676670.9%
Claude Opus 4.6 (Reasoning)767672686270.9%
MiniMax M2.7807770665970.6%
ByteDance Seed 1.6917568655370.4%
GPT-5.1887067605968.8%
Z.AI GLM 5 Turbo827364615867.6%
GPT-5.4 Mini747170565364.9%
Claude Haiku 4.5717068625164.4%
MiniMax M2.5747169574964.0%
GPT-5.4 Mini (Reasoning)736766605263.5%
Arcee AI: Trinity Large (Preview)876960534763.1%
Claude Opus 4.5717064624762.7%
Claude Sonnet 4.5776864633962.4%
GPT-5.4 Nano666664635362.4%
Grok 4.20 (Beta, Reasoning)787063534762.2%
GPT-5.4 Nano (Reasoning)796557565362.0%
GPT-5.4 Nano (Reasoning, Low)797057525061.7%
Writer: Palmyra X5727164544461.1%
GPT-5.4 Mini (Reasoning, Low)726459565461.1%
Claude Opus 4756060594860.3%
Z.AI GLM 5736355555360.0%
GPT-5.2666462594960.0%
DeepSeek V3 (2025-03-24)1005947413957.1%
Claude Sonnet 4825857543557.1%
Mistral Large717057493355.8%
GPT-5 Nano615959484855.0%
Qwen 3.5 397B A17B666458483554.0%
Mistral Large 3696952424054.0%
Grok 4.20 (Beta)595756544253.4%
Rocinante 12B916049422253.0%
Llama 3.1 8B635454534052.9%
Qwen3 235B A22B Instruct 2507655550494352.7%
Claude 3.5 Sonnet635755473751.8%
Claude 3.7 Sonnet595452473950.2%
GPT-4.1626149403850.1%
Qwen 3 32B666452353350.0%
Ministral 3 14B656456333249.9%
Gemini 3 Pro (Preview)635348454149.7%
Z.AI GLM 4.7575652463849.7%
Aion 2.0605856383749.5%
Qwen 3.5 Flash615958402949.5%
Gemini 3 Flash (Preview, Reasoning)706244392848.9%
Qwen 3.5 122B595049453948.3%
Z.AI GLM 4.5595553413348.1%
Mistral Large 2605151463147.8%
Gemini 3 Flash (Preview)674947453147.7%
MoonshotAI: Kimi K2.5575648443347.6%
Mistral Small 4 (Reasoning)655941393347.3%
Claude 3.5 Haiku786352261847.3%
Ministral 3 8B716540342647.2%
Qwen 3.5 9B555246433846.9%
Mistral Small 4585452373346.8%
Qwen 3.5 35B695742312945.4%
Qwen 3.5 Plus (2026-02-15)595044383345.1%
Hermes 3 70B684943372444.2%
Z.AI GLM 4.7 Flash574641393743.8%
Mistral Medium 3.1494847453043.7%
DeepSeek V3.2534945382642.1%
Stealth: Healer Alpha624739342541.3%
Stealth: Hunter Alpha574843322440.7%
Qwen 3.5 27B504643371939.1%
LFM2 24B514539322838.8%
Mistral Small Creative534840282238.2%
Gemini 3.1 Pro (Preview)474241362437.9%
Z.AI GLM 4.6524029271933.4%
Gemini 3.1 Flash Lite (Preview)474338201733.1%
DeepSeek V3 (2024-12-26)54443815431.3%
DeepSeek V3.1493131241630.3%
DeepSeek-V2 Chat473730241229.9%
Ministral 3 3B474030181429.8%
Grok 4423532211729.4%
Gemini 2.5 Pro473424211728.8%
Hermes 3 405B5448189927.4%
Ministral 3B482918171225.1%
Ministral 8B44302821024.8%
Cohere Command R+ (Aug. 2024)6937180024.8%
o4 Mini High352826191624.7%
Mistral NeMO552817121224.7%
Grok 4 Fast38272624924.6%
Nemotron 3 Super36342617423.4%
o4 Mini352620201523.4%
GPT-4o, May 13th (temp=0)3431288721.9%
GPT-4.1 Mini604210020.7%
Gemma 3 27B5026148119.9%
Grok 4.1 Fast34221716418.6%
Mistral Small 3.2 24B4524211018.1%
Arcee AI: Trinity Mini5423100017.5%
Llama 3.1 70B632400017.5%
Gemini 2.5 Flash32231411416.7%
Inception Mercury611500015.2%
Gemini 2.5 Flash Lite3018110011.7%
Qwen 2.5 72B471000011.5%
Nemotron 3 Nano211693110.0%
Gemini 2.5 Flash (Reasoning)24183009.1%
Inception Mercury 226172008.8%
GPT-4.1 Nano181610008.8%
Stealth: Aurora Alpha19175008.2%
GPT-4o, Aug. 6th (temp=1)23123107.8%
Gemma 3 4B2040004.8%
Gemma 3 12B1670004.5%
Gemini 2.5 Flash Lite (Reasoning)965004.0%
GPT-4o, Aug. 6th (temp=0)1500003.1%
GPT-4o, May 13th (temp=1)1400002.9%
Claude 3 Haiku300000.5%
GPT-4o Mini (temp=1)000000.0%
GPT-4o Mini (temp=0)000000.0%
Llama 3.1 Nemotron 70B000000.0%
WizardLM 2 8x22b000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude Sonnet 4.61008577777683.0%
ByteDance Seed 2.0 Lite929087737082.4%
Claude Opus 4.6 (Reasoning)888275727077.7%
GPT-5808079737277.0%
GPT-5.4 (Reasoning, Low)868277716776.8%
GPT-5.4848173727176.2%
GPT-5.4 Mini (Reasoning, Low)797776736974.7%
GPT-5.4 (Reasoning)827876716474.5%
Claude Sonnet 4.6 (Reasoning)817874696974.4%
MiniMax M2.5817674706673.6%
GPT-5.4 Mini797570706772.2%
MiniMax M2.7877472666372.2%
GPT-5 Mini837569666672.0%
GPT-5.4 Mini (Reasoning)807473686471.9%
Claude Opus 4.5777369676770.4%
ByteDance Seed 2.0 Mini757471666169.3%
Z.AI GLM 5 Turbo847664616069.0%
Claude Haiku 4.5777470665568.5%
GPT-5.4 Nano (Reasoning, Low)727266666367.6%
Claude Opus 4887262615567.6%
ByteDance Seed 1.6 Flash888171504867.5%
Claude Sonnet 4.5787164636067.3%
ByteDance Seed 1.6856968604765.9%
GPT-5.4 Nano726766616065.1%
GPT-5.1746868575564.3%
Z.AI GLM 5816867564463.2%
GPT-5.4 Nano (Reasoning)796360575663.0%
Z.AI GLM 4.7777464543961.5%
Claude 3.7 Sonnet696462595060.9%
WizardLM 2 8x22b10010066281060.8%
GPT-5.2796661573659.8%
Claude Sonnet 4757159484559.4%
Claude Opus 4.6686157554757.8%
DeepSeek V3 (2025-03-24)835755543857.5%
Grok 4.20 (Beta, Reasoning)686060504657.2%
MoonshotAI: Kimi K2.5776160464157.0%
Claude 3.5 Sonnet726758552956.3%
Llama 3.1 8B706155494155.2%
Rocinante 12B926149433055.2%
Gemini 3 Flash (Preview, Reasoning)746356453654.8%
Qwen 3.5 122B635655464452.8%
Gemini 3 Flash (Preview)675752434352.3%
Grok 4.20 (Beta)555452514852.2%
Gemini 3 Pro (Preview)676353443251.9%
Z.AI GLM 4.5725951413551.6%
Z.AI GLM 4.7 Flash685648463350.2%
Qwen 3.5 Flash695555353249.4%
Qwen 3.5 397B A17B595746404048.6%
DeepSeek-V2 Chat745046393348.5%
Ministral 3 14B615648443148.1%
Hermes 3 70B676147372948.1%
Mistral Small 4 (Reasoning)655148433047.1%
Qwen 3.5 9B615242413846.7%
Aion 2.0646241382946.5%
Mistral Large 3705852282346.2%
Stealth: Hunter Alpha595048422945.7%
GPT-5 Nano554745423945.7%
Arcee AI: Trinity Large (Preview)615855312245.2%
Qwen 3.5 35B574945373344.1%
DeepSeek V3.1615437352943.2%
DeepSeek V3.2504947392542.0%
Gemini 3.1 Flash Lite (Preview)575244322541.9%
Qwen3 235B A22B Instruct 2507595736342241.7%
Gemini 3.1 Pro (Preview)515143362641.7%
Qwen 3.5 Plus (2026-02-15)514837353140.3%
Writer: Palmyra X5615351261040.3%
Mistral Small 4544537333340.3%
Stealth: Healer Alpha695043231539.9%
Mistral Medium 3.1504637343039.5%
Mistral Large 2574643321738.8%
Ministral 3 8B504637332738.6%
Mistral Small Creative504834332638.3%
Qwen 3.5 27B474443401437.5%
Gemini 2.5 Pro614931261736.8%
DeepSeek V3 (2024-12-26)62373725633.6%
Qwen 3 32B50434130032.9%
Ministral 8B504632221132.2%
Z.AI GLM 4.6494031211731.5%
Gemma 3 27B383732291930.9%
Mistral Large542726202029.6%
Gemma 3 4B51472910829.1%
o4 Mini48432721027.6%
GPT-4.1423823221027.4%
o4 Mini High403026211726.7%
Hermes 3 405B6144270026.3%
Gemini 2.5 Flash46412913025.7%
Grok 4.1 Fast492120201625.1%
Ministral 3B6032284024.7%
Grok 44734287023.2%
Gemini 2.5 Flash Lite60321211023.1%
Llama 3.1 70B37322118021.6%
Claude 3.5 Haiku4240158021.1%
Ministral 3 3B5823118020.0%
Cohere Command R+ (Aug. 2024)521270014.2%
GPT-4o, May 13th (temp=0)481600012.8%
Grok 4 Fast20181511012.7%
Gemini 2.5 Flash Lite (Reasoning)3019104012.5%
LFM2 24B3314120011.8%
Arcee AI: Trinity Mini191713009.9%
Gemini 2.5 Flash (Reasoning)221512009.8%
Llama 3.1 Nemotron 70B23204009.5%
Gemma 3 12B23136208.9%
GPT-4o, May 13th (temp=1)29130008.5%
Claude 3 Haiku3273008.4%
Nemotron 3 Super22109008.2%
GPT-4.1 Nano25142008.1%
Mistral NeMO2932107.1%
Qwen 2.5 72B2700005.5%
Inception Mercury 21700003.3%
GPT-4o, Aug. 6th (temp=0)1500003.0%
GPT-4.1 Mini760002.7%
GPT-4o, Aug. 6th (temp=1)1100002.3%
Inception Mercury900001.8%
Nemotron 3 Nano420001.2%
GPT-4o Mini (temp=1)500001.0%
Stealth: Aurora Alpha000000.0%
Mistral Small 3.2 24B000000.0%
GPT-4o Mini (temp=0)000000.0%

Novelcrafter Default Prompt

Model # 1 # 2 # 3 # 4 # 5 Avg ▼
GPT-5.4 (Reasoning)777469686871.1%
ByteDance Seed 2.0 Lite797164625967.0%
GPT-5.4 (Reasoning, Low)796665565564.4%
Claude Sonnet 4.6 (Reasoning)786461595162.8%
GPT-5.4 Mini756857565462.1%
GPT-5.4 Mini (Reasoning)696363595561.9%
GPT-5.4746963525161.8%
ByteDance Seed 1.6 Flash656462585160.1%
GPT-5.4 Mini (Reasoning, Low)656158585459.4%
GPT-5736863543959.3%
ByteDance Seed 2.0 Mini756855463355.6%
GPT-5.1595753513951.8%
Claude Sonnet 4.6595550423948.8%
Z.AI GLM 5 Turbo625247393847.6%
GPT-5 Mini555448453247.0%
GPT-5.2595244413846.8%
Claude Opus 4.6565148383345.0%
Z.AI GLM 5635547342544.9%
Claude Haiku 4.5626050272043.9%
Qwen 3.5 9B764441322343.2%
Grok 4.20 (Beta, Reasoning)60555148042.9%
Claude 3.7 Sonnet604440382741.8%
Writer: Palmyra X5544342422040.2%
LFM2 24B84692521039.8%
MiniMax M2.7464339363239.3%
GPT-5.4 Nano524537352739.2%
Llama 3.1 8B64474242038.9%
Qwen 3.5 Flash674645221037.9%
DeepSeek V3.2524841301637.4%
GPT-5.4 Nano (Reasoning, Low)484035342737.1%
GPT-5.4 Nano (Reasoning)584237321637.0%
Rocinante 12B100432314036.0%
Z.AI GLM 4.7 Flash545041211235.6%
MiniMax M2.5585129221735.2%
Ministral 3 14B494336262134.9%
Claude Opus 4.5504343241234.5%
Mistral Medium 3.1424137272334.0%
Claude Sonnet 4.5393736352033.5%
Z.AI GLM 4.560453117732.1%
GPT-5 Nano403631292432.1%
Qwen 3.5 Plus (2026-02-15)49483428031.9%
Claude Opus 4.6 (Reasoning)55464019031.9%
Qwen3 235B A22B Instruct 2507514627201631.9%
Stealth: Hunter Alpha58393913931.8%
ByteDance Seed 1.6584126171631.8%
Grok 4.20 (Beta)513829221531.1%
Qwen 3.5 35B45424024030.2%
Mistral Large52322823928.7%
Qwen 3.5 27B52412320728.4%
Qwen 3.5 122B50332816726.7%
Mistral Small Creative37323019825.3%
Gemini 3 Pro (Preview)40343418025.3%
MoonshotAI: Kimi K2.54335316323.5%
Z.AI GLM 4.748411411022.9%
Claude Opus 445252420022.9%
Qwen 3 32B48301815022.3%
Claude 3.5 Sonnet5031300022.3%
Mistral Small 438222016820.9%
Aion 2.05625148020.6%
Claude Sonnet 442281614019.9%
Ministral 8B40271513019.0%
Qwen 3.5 397B A17B302116151319.0%
Cohere Command R+ (Aug. 2024)4131172018.4%
Arcee AI: Trinity Large (Preview)3734119018.2%
Gemini 3 Flash (Preview, Reasoning)3027249017.9%
Hermes 3 70B4621180017.1%
DeepSeek V3 (2025-03-24)3129165016.2%
Mistral Small 4 (Reasoning)2927170014.5%
Z.AI GLM 4.62419170012.0%
Mistral Large 3242490011.4%
Stealth: Healer Alpha332100010.8%
GPT-4.1311480010.7%
WizardLM 2 8x22b221475510.6%
Nemotron 3 Super27138009.5%
Gemini 3.1 Pro (Preview)181111809.5%
Mistral Large 226119009.3%
Ministral 3 8B32131009.2%
Gemini 2.5 Pro161514009.0%
Hermes 3 405B23192008.8%
Gemini 3 Flash (Preview)22155008.5%
DeepSeek-V2 Chat21114007.3%
Gemini 3.1 Flash Lite (Preview)2273006.5%
Ministral 3B19100005.8%
o4 Mini1783105.6%
Ministral 3 3B1590004.8%
DeepSeek V3.11850004.7%
Gemma 3 27B1460003.9%
Mistral NeMO1200002.4%
Grok 4 Fast1010002.3%
Gemini 2.5 Flash Lite (Reasoning)1100002.2%
Qwen 2.5 72B730002.1%
DeepSeek V3 (2024-12-26)620001.6%
GPT-4.1 Mini700001.5%
Gemini 2.5 Flash Lite500001.0%
o4 Mini High500001.0%
Gemini 2.5 Flash400000.8%
Grok 4400000.7%
Claude 3.5 Haiku200000.3%
Grok 4.1 Fast000000.0%
Gemini 2.5 Flash (Reasoning)000000.0%
GPT-4o, May 13th (temp=0)000000.0%
Inception Mercury 2000000.0%
GPT-4o, May 13th (temp=1)000000.0%
Stealth: Aurora Alpha000000.0%
GPT-4o, Aug. 6th (temp=1)000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
Inception Mercury000000.0%
GPT-4o Mini (temp=1)000000.0%
Mistral Small 3.2 24B000000.0%
Gemma 3 12B000000.0%
Llama 3.1 70B000000.0%
GPT-4o Mini (temp=0)000000.0%
Nemotron 3 Nano000000.0%
Llama 3.1 Nemotron 70B000000.0%
GPT-4.1 Nano000000.0%
Claude 3 Haiku000000.0%
Arcee AI: Trinity Mini000000.0%
Gemma 3 4B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
ByteDance Seed 2.0 Lite847669665770.3%
GPT-5776766635966.3%
ByteDance Seed 2.0 Mini797167565064.8%
GPT-5.4 (Reasoning)727065615464.3%
GPT-5.4 Mini (Reasoning)666463615461.4%
GPT-5.4 (Reasoning, Low)726559504958.9%
GPT-5.4595956554254.1%
GPT-5.4 Mini595554534853.6%
GPT-5.4 Mini (Reasoning, Low)706352443653.0%
ByteDance Seed 1.6 Flash605851463850.4%
Llama 3.1 8B715454492450.4%
MiniMax M2.5645445443748.9%
GPT-5.1595552463248.9%
Claude Opus 4.6 (Reasoning)655048423948.6%
Claude Sonnet 4.6 (Reasoning)555448434348.5%
Claude Sonnet 4.6745248441446.6%
Qwen 3.5 9B98583329945.3%
GPT-5.2534944403844.9%
Grok 4.20 (Beta)575244422944.8%
Qwen 3.5 Flash515139393543.0%
GPT-5 Mini584948342442.4%
Claude Opus 4.6595440302942.4%
Grok 4.20 (Beta, Reasoning)595352242342.1%
MiniMax M2.7535038363341.9%
Z.AI GLM 5504342403041.2%
Qwen 3.5 35B454242402939.7%
Z.AI GLM 5 Turbo62545320839.5%
Z.AI GLM 4.7 Flash534039352438.0%
MoonshotAI: Kimi K2.556545225037.3%
GPT-5.4 Nano (Reasoning, Low)484338312737.2%
ByteDance Seed 1.6564341311236.6%
Claude Opus 4.5554038321836.5%
GPT-5.4 Nano424138342134.9%
Stealth: Hunter Alpha48473635734.6%
Mistral Small 4 (Reasoning)554728241633.8%
Gemini 3 Flash (Preview, Reasoning)514228282033.8%
Z.AI GLM 4.7564037201533.7%
Claude 3.7 Sonnet57482726933.3%
GPT-4.1483828282433.3%
GPT-5 Nano474230242233.2%
Mistral Small 457403424531.9%
Qwen 3 32B51473026030.8%
GPT-5.4 Nano (Reasoning)50353431030.1%
Gemini 3 Flash (Preview)383734231830.1%
Qwen 3.5 122B402727272328.9%
Gemini 3 Pro (Preview)424125161628.2%
Claude Haiku 4.542402519626.3%
Claude Opus 462221817925.4%
Qwen3 235B A22B Instruct 2507382323222025.1%
Qwen 3.5 27B393119181725.0%
Qwen 3.5 397B A17B52322913024.9%
Mistral Large 343402313023.6%
Claude Sonnet 4.549272221023.6%
Aion 2.05036192021.4%
Stealth: Healer Alpha5123216020.3%
Writer: Palmyra X540341410019.7%
Mistral Small 3.2 24B97000019.5%
Mistral Medium 3.128212019017.6%
Mistral Large5114107016.5%
Qwen 3.5 Plus (2026-02-15)3319199016.0%
Rocinante 12B25201810014.6%
Gemini 3.1 Pro (Preview)422500013.5%
WizardLM 2 8x22b381587013.5%
Z.AI GLM 4.5272500010.3%
DeepSeek V3.238102009.9%
Claude Sonnet 426164009.4%
Gemma 3 27B25156009.2%
DeepSeek V3.14120008.6%
DeepSeek V3 (2025-03-24)16118407.9%
Grok 4 Fast27100007.3%
Ministral 3 8B3420007.0%
Z.AI GLM 4.62465006.9%
Hermes 3 70B18123006.7%
Grok 4.1 Fast16123006.2%
Claude 3.5 Sonnet2820005.9%
Arcee AI: Trinity Mini2900005.8%
Gemini 3.1 Flash Lite (Preview)16100005.2%
DeepSeek-V2 Chat2400004.8%
Mistral Small Creative12101004.5%
Arcee AI: Trinity Large (Preview)985104.4%
Mistral Large 21840004.4%
Ministral 8B765003.7%
Hermes 3 405B1700003.3%
Ministral 3B1600003.1%
DeepSeek V3 (2024-12-26)930002.5%
Ministral 3 14B700001.4%
Claude 3 Haiku700001.3%
Nemotron 3 Super600001.3%
Gemma 3 12B500000.9%
Gemini 2.5 Pro320000.9%
o4 Mini High400000.9%
o4 Mini400000.8%
Gemma 3 4B300000.7%
Qwen 2.5 72B300000.6%
Cohere Command R+ (Aug. 2024)200000.5%
Grok 4100000.2%
LFM2 24B100000.2%
Gemini 2.5 Flash (Reasoning)000000.0%
Gemini 2.5 Flash Lite (Reasoning)000000.0%
GPT-4o, May 13th (temp=0)000000.0%
Inception Mercury 2000000.0%
GPT-4o, May 13th (temp=1)000000.0%
Stealth: Aurora Alpha000000.0%
Claude 3.5 Haiku000000.0%
GPT-4.1 Mini000000.0%
GPT-4o, Aug. 6th (temp=1)000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
Gemini 2.5 Flash Lite000000.0%
Gemini 2.5 Flash000000.0%
Inception Mercury000000.0%
GPT-4o Mini (temp=1)000000.0%
Llama 3.1 70B000000.0%
GPT-4o Mini (temp=0)000000.0%
Nemotron 3 Nano000000.0%
Llama 3.1 Nemotron 70B000000.0%
GPT-4.1 Nano000000.0%
Ministral 3 3B000000.0%
Mistral NeMO000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
GPT-5.4 (Reasoning, Low)888886868686.6%
Claude Sonnet 4.6 (Reasoning)969087797885.9%
GPT-5.4 Mini (Reasoning)908784818084.5%
GPT-5868684838183.9%
GPT-5.4868582828083.0%
GPT-5.4 (Reasoning)888484817682.8%
Claude Sonnet 4.6918681767682.2%
GPT-5.4 Mini (Reasoning, Low)898181807881.8%
GPT-5.1868280797480.2%
ByteDance Seed 2.0 Lite898280796979.6%
Claude Opus 4.6898482736578.4%
ByteDance Seed 1.6 Flash938274707078.1%
GPT-5.4 Mini858477737178.0%
Claude Opus 4.6 (Reasoning)818080747377.6%
ByteDance Seed 2.0 Mini888582745877.6%
Z.AI GLM 5 Turbo878579725776.2%
GPT-5 Mini838180746376.2%
GPT-5.2797772717073.7%
Claude Haiku 4.5797372727273.3%
Z.AI GLM 5797773666171.2%
Claude Opus 4.5747270706570.4%
GPT-5.4 Nano (Reasoning)767568666369.6%
MiniMax M2.7797874655369.6%
Qwen 3.5 397B A17B737271706169.5%
MiniMax M2.5817468655568.7%
GPT-5.4 Nano (Reasoning, Low)767272615667.6%
Claude Sonnet 4.5776969645867.6%
GPT-5.4 Nano747068666067.5%
Stealth: Healer Alpha917768504766.5%
Writer: Palmyra X5786963605364.6%
Grok 4.20 (Beta, Reasoning)816661545162.5%
Qwen 3.5 122B796559585062.3%
Claude Opus 4696563605462.2%
Qwen 3.5 9B827857523761.4%
Qwen 3.5 35B786964534261.3%
Qwen3 235B A22B Instruct 2507656260565559.9%
GPT-5 Nano616058585558.4%
Qwen 3.5 Flash666561514657.5%
Grok 4.20 (Beta)746455474657.2%
Mistral Large705857505057.2%
ByteDance Seed 1.6716564543056.6%
Mistral Small 4 (Reasoning)686352494856.1%
Gemini 3 Pro (Preview)615857564855.9%
Claude 3.7 Sonnet705649474754.0%
DeepSeek V3.2726159473154.0%
Stealth: Hunter Alpha656152464253.2%
Qwen 3.5 27B805852383753.1%
Qwen 3.5 Plus (2026-02-15)676158433652.8%
Gemini 3 Flash (Preview, Reasoning)775049454152.3%
WizardLM 2 8x22b636249434352.0%
GPT-4.1646055423250.7%
MoonshotAI: Kimi K2.5675151473450.1%
Z.AI GLM 4.6645953452949.9%
Gemini 3.1 Pro (Preview)726158322649.8%
Aion 2.0625346434249.1%
Gemini 3 Flash (Preview)615453383848.9%
Mistral Large 3746657321448.5%
Hermes 3 70B674947433347.5%
Claude 3.5 Sonnet595448393547.2%
Mistral Small 4704343423947.1%
DeepSeek V3 (2025-03-24)645245373245.9%
Z.AI GLM 4.7 Flash545048453145.7%
Ministral 3 14B72594642945.5%
Z.AI GLM 4.7696337312645.3%
Mistral Small Creative625545362244.0%
Ministral 8B666161201043.6%
Mistral Large 2694438372542.6%
Z.AI GLM 4.5534738373542.1%
Mistral Medium 3.1555243362241.6%
Arcee AI: Trinity Large (Preview)67484343441.1%
Qwen 3 32B61603936840.8%
DeepSeek V3.1664543252340.4%
Claude Sonnet 4514642352740.3%
o4 Mini High58484240037.6%
Grok 4.1 Fast444434322235.1%
Cohere Command R+ (Aug. 2024)70393429034.5%
Gemma 3 27B433531231729.9%
Grok 4 Fast413735191629.6%
Hermes 3 405B53422420729.5%
Grok 4403527242229.4%
Nemotron 3 Super60413214029.3%
GPT-4.1 Mini49422827029.2%
Gemini 2.5 Pro403937161429.1%
Gemini 3.1 Flash Lite (Preview)483624221328.6%
Gemini 2.5 Flash (Reasoning)393731221528.6%
Mistral NeMO5843430028.6%
Llama 3.1 8B52383616028.4%
DeepSeek V3 (2024-12-26)6432318027.0%
o4 Mini342827241224.9%
Gemini 2.5 Flash Lite (Reasoning)363222161123.6%
Ministral 3B4835330023.2%
Ministral 3 8B422921141023.2%
GPT-4.1 Nano34262319922.3%
DeepSeek-V2 Chat41262221022.0%
Rocinante 12B5031240020.9%
Qwen 2.5 72B4637180020.1%
Inception Mercury26251510015.3%
LFM2 24B372196315.2%
Gemma 3 12B3024164014.8%
GPT-4o, Aug. 6th (temp=1)3115126012.9%
Gemini 2.5 Flash Lite22220008.6%
Llama 3.1 70B3700007.5%
Stealth: Aurora Alpha22131007.1%
Ministral 3 3B2870006.9%
Gemma 3 4B17162006.9%
Gemini 2.5 Flash17160006.7%
Mistral Small 3.2 24B2920006.1%
GPT-4o, May 13th (temp=1)2260005.6%
Nemotron 3 Nano1286005.2%
Arcee AI: Trinity Mini1461104.3%
Claude 3 Haiku1800003.5%
GPT-4o, Aug. 6th (temp=0)1043003.4%
Inception Mercury 2943003.3%
GPT-4o Mini (temp=1)400000.8%
GPT-4o Mini (temp=0)100000.1%
GPT-4o, May 13th (temp=0)000000.1%
Claude 3.5 Haiku000000.0%
Llama 3.1 Nemotron 70B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude Sonnet 4.6969584818187.4%
GPT-5.4939185857686.0%
GPT-5.4 Mini (Reasoning)898888828285.7%
GPT-5.4 (Reasoning)939283837785.7%
GPT-5.4 (Reasoning, Low)928987807484.5%
Claude Sonnet 4.6 (Reasoning)878787837784.2%
GPT-5.4 Mini (Reasoning, Low)918984797282.8%
Z.AI GLM 5 Turbo928575747480.1%
GPT-5.4 Mini838281787679.8%
GPT-5868379737278.8%
Qwen 3.5 9B1008481754977.7%
GPT-5 Mini868380696877.3%
Claude Opus 4.6837978786777.0%
ByteDance Seed 1.6 Flash888179736276.7%
ByteDance Seed 2.0 Lite807877766976.1%
GPT-5.4 Nano (Reasoning)878176686775.8%
GPT-5.4 Nano (Reasoning, Low)817876746975.7%
Claude Opus 4.6 (Reasoning)908376725675.6%
GPT-5.1818173696874.3%
Claude Sonnet 4.5837574696873.9%
MiniMax M2.5827978695472.3%
GPT-5.2837971646272.1%
ByteDance Seed 1.6777474705770.2%
Claude Haiku 4.5866969615067.2%
GPT-5.4 Nano736766646466.8%
ByteDance Seed 2.0 Mini727266635866.3%
Claude Opus 4838163594265.9%
Claude Opus 4.5717064636265.9%
MiniMax M2.7847772494865.8%
Mistral Small 4 (Reasoning)777066583761.3%
Qwen 3.5 Flash787458494460.7%
Stealth: Hunter Alpha777461454560.4%
Z.AI GLM 4.7 Flash716756555360.3%
Z.AI GLM 5666563634360.1%
Qwen 3.5 122B797959443959.9%
Rocinante 12B686857554959.6%
Claude Sonnet 4766564632759.0%
Claude 3.7 Sonnet737259583258.9%
Qwen 3.5 27B696461504457.6%
GPT-5 Nano595857565556.9%
WizardLM 2 8x22b635956555156.8%
DeepSeek V3.2686059583956.7%
MoonshotAI: Kimi K2.5716457503856.2%
Aion 2.0645959504756.0%
Mistral Medium 3.1716864393555.4%
Grok 4.20 (Beta, Reasoning)856554412654.3%
Stealth: Healer Alpha817152412453.9%
Grok 4.20 (Beta)665251464251.5%
Z.AI GLM 4.7555350494650.6%
Mistral Large 3706262471150.5%
Qwen 3.5 35B696153353450.3%
GPT-4.1656341413949.9%
Z.AI GLM 4.5614948464449.9%
Qwen 3.5 397B A17B715647353448.9%
Ministral 3 8B605251452947.4%
Gemini 3 Flash (Preview)595248403847.3%
Writer: Palmyra X5645838373646.5%
Ministral 8B594942403845.8%
Mistral Small Creative754237363445.0%
Qwen 3 32B685140352744.3%
GPT-4.1 Mini584741343342.8%
Gemini 3 Flash (Preview, Reasoning)545351302442.6%
Z.AI GLM 4.6564541412942.4%
Mistral Large684936362042.1%
Mistral Large 2 66 66 42 36 0 41.7%
Gemini 3 Pro (Preview) 61 54 43 38 11 41.5%
Cohere Command R+ (Aug. 2024) 63 48 41 40 4 39.5%
Llama 3.1 70B 92 55 17 16 15 39.1%
Gemini 3.1 Pro (Preview) 75 35 33 25 22 37.9%
Qwen3 235B A22B Instruct 2507 46 42 37 35 29 37.7%
Llama 3.1 8B 64 57 38 30 0 37.7%
Arcee AI: Trinity Mini 64 53 42 26 0 36.9%
DeepSeek V3.1 56 54 45 26 0 36.5%
Nemotron 3 Super 57 46 45 21 11 36.4%
o4 Mini 49 42 38 31 20 36.2%
Gemini 2.5 Flash 49 47 43 40 1 35.9%
Mistral Small 4 49 45 35 23 21 34.6%
Hermes 3 70B 57 37 35 33 11 34.4%
Gemma 3 27B 43 41 38 31 18 34.3%
Claude 3.5 Sonnet 51 49 37 19 12 33.5%
Gemini 2.5 Flash (Reasoning) 47 36 32 24 20 31.9%
GPT-4.1 Nano 49 41 33 18 18 31.7%
DeepSeek-V2 Chat 46 37 34 24 10 30.3%
DeepSeek V3 (2025-03-24) 46 38 34 18 16 30.2%
Gemini 2.5 Pro 42 40 29 21 19 30.2%
Arcee AI: Trinity Large (Preview) 58 46 45 0 0 29.8%
Qwen 3.5 Plus (2026-02-15) 46 39 36 19 10 29.7%
Gemini 3.1 Flash Lite (Preview) 44 28 27 27 22 29.7%
GPT-4o, May 13th (temp=0) 42 40 34 25 4 29.0%
Hermes 3 405B 46 35 28 20 15 28.6%
Claude 3 Haiku 46 42 30 19 6 28.4%
DeepSeek V3 (2024-12-26) 37 35 32 27 0 26.1%
Ministral 3B 41 31 31 25 0 25.6%
Grok 4.1 Fast 40 31 27 20 10 25.5%
Mistral Small 3.2 24B 98 27 0 0 0 25.0%
Gemma 3 12B 50 35 22 14 0 24.0%
Grok 4 32 30 25 22 9 23.5%
Ministral 3 14B 41 39 16 15 3 23.0%
Inception Mercury 2 29 28 26 16 15 22.8%
GPT-4o, Aug. 6th (temp=0) 51 30 18 14 0 22.6%
Llama 3.1 Nemotron 70B 57 24 16 13 0 22.3%
Grok 4 Fast 36 32 29 14 0 22.2%
GPT-4o, May 13th (temp=1) 38 35 25 6 2 21.0%
o4 Mini High 51 22 16 8 0 19.5%
LFM2 24B 32 30 19 11 4 19.4%
Gemini 2.5 Flash Lite (Reasoning) 41 33 5 0 0 15.9%
Gemini 2.5 Flash Lite 40 10 10 8 5 14.7%
Qwen 2.5 72B 24 15 10 9 1 11.8%
Nemotron 3 Nano 31 28 0 0 0 11.8%
Ministral 3 3B 27 22 8 0 0 11.5%
Stealth: Aurora Alpha 24 14 12 8 0 11.4%
GPT-4o Mini (temp=1) 27 12 0 0 0 7.7%
Gemma 3 4B 13 6 5 4 0 5.5%
Mistral NeMO 20 7 0 0 0 5.4%
Claude 3.5 Haiku 19 2 0 0 0 4.1%
Inception Mercury 19 1 0 0 0 3.9%
GPT-4o Mini (temp=0) 7 0 0 0 0 1.5%
GPT-4o, Aug. 6th (temp=1) 0 0 0 0 0 0.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
ByteDance Seed 2.0 Mini 93 90 89 87 85 88.6%
ByteDance Seed 2.0 Lite 88 87 84 83 81 84.9%
GPT-5.4 89 83 83 83 80 83.6%
Claude Sonnet 4.6 100 90 79 74 74 83.5%
Qwen 3.5 9B 88 87 85 85 71 83.1%
GPT-5 84 83 78 77 77 79.8%
GPT-5 Mini 84 82 78 78 69 78.2%
GPT-5.4 Mini (Reasoning, Low) 82 77 76 76 75 77.2%
GPT-5.4 (Reasoning, Low) 86 80 79 74 66 76.8%
GPT-5.4 (Reasoning) 89 81 78 68 66 76.4%
Claude Sonnet 4.6 (Reasoning) 86 80 77 74 62 75.9%
ByteDance Seed 1.6 Flash 84 79 77 72 67 75.6%
GPT-5.4 Mini (Reasoning) 81 78 77 77 64 75.4%
MiniMax M2.5 81 76 76 75 58 73.1%
ByteDance Seed 1.6 89 75 70 69 63 73.1%
GPT-5.4 Nano (Reasoning, Low) 76 74 72 71 71 72.8%
Claude Opus 4.6 82 82 71 68 60 72.4%
GPT-5.4 Mini 77 77 74 69 61 71.8%
Z.AI GLM 5 Turbo 85 82 64 61 59 70.4%
GPT-5.4 Nano (Reasoning) 77 72 68 65 64 69.2%
Claude Opus 4.6 (Reasoning) 84 68 68 64 61 69.1%
GPT-5.4 Nano 74 68 67 67 62 67.6%
GPT-5.1 75 70 68 66 54 66.6%
Qwen 3.5 397B A17B 77 70 67 59 55 65.5%
Claude Opus 4.5 80 72 67 62 46 65.3%
GPT-5.2 69 68 65 63 60 64.9%
Claude Haiku 4.5 78 78 66 57 44 64.5%
Writer: Palmyra X5 74 72 72 63 41 64.5%
Qwen3 235B A22B Instruct 2507 72 69 67 63 51 64.2%
Rocinante 12B 82 80 70 62 23 63.2%
Z.AI GLM 4.7 68 68 68 57 51 62.3%
Claude Sonnet 4.5 84 72 56 53 45 62.2%
Mistral Large 77 64 58 54 52 61.3%
MiniMax M2.7 75 73 65 64 24 60.5%
Z.AI GLM 5 71 60 59 58 50 59.7%
Stealth: Hunter Alpha 81 63 57 48 42 58.2%
LFM2 24B 82 80 66 32 28 57.8%
Grok 4.20 (Beta, Reasoning) 64 63 62 53 47 57.7%
Claude Opus 4 71 65 56 56 40 57.6%
Mistral Large 3 72 55 54 53 48 56.4%
Qwen 3.5 122B 69 69 62 49 32 56.2%
Mistral Small Creative 66 63 59 47 45 56.1%
Mistral Small 4 61 59 58 56 46 55.8%
MoonshotAI: Kimi K2.5 76 62 49 47 40 54.9%
WizardLM 2 8x22b 68 62 55 53 35 54.7%
Qwen 3.5 Flash 65 65 55 49 37 54.1%
GPT-5 Nano 63 57 51 51 49 54.0%
Ministral 3 14B 78 76 47 36 32 53.8%
Z.AI GLM 4.5 70 66 58 42 33 53.6%
Aion 2.0 62 58 58 48 40 53.1%
Gemini 3 Flash (Preview) 65 59 52 45 41 52.2%
Mistral Large 2 81 60 50 39 31 52.2%
GPT-4.1 72 52 46 44 42 51.2%
Qwen 3 32B 61 56 54 51 32 50.9%
Stealth: Healer Alpha 67 65 48 38 33 50.3%
DeepSeek V3.2 51 50 49 47 41 47.7%
Grok 4.20 (Beta) 57 56 49 41 35 47.5%
Gemini 3 Pro (Preview) 72 57 43 42 23 47.4%
Qwen 3.5 35B 68 60 42 38 28 47.2%
DeepSeek V3 (2025-03-24) 68 58 45 35 29 47.0%
Mistral Medium 3.1 72 57 43 39 23 46.8%
DeepSeek V3.1 64 61 51 29 28 46.6%
Z.AI GLM 4.6 60 58 47 34 34 46.6%
Qwen 3.5 Plus (2026-02-15) 73 53 41 37 25 45.8%
Qwen 3.5 27B 67 47 44 40 31 45.8%
Mistral Small 4 (Reasoning) 58 51 48 39 28 44.9%
DeepSeek V3 (2024-12-26) 63 52 45 38 25 44.7%
Z.AI GLM 4.7 Flash 51 48 39 38 32 41.7%
Claude Sonnet 4 60 54 33 27 17 38.2%
Claude 3.7 Sonnet 62 38 37 27 21 36.8%
Gemini 3 Flash (Preview, Reasoning) 48 37 35 32 28 35.9%
Claude 3.5 Sonnet 64 55 28 17 15 35.8%
Ministral 3 8B 58 49 46 19 8 35.7%
Hermes 3 70B 64 44 38 30 0 35.2%
Hermes 3 405B 66 37 36 25 8 34.6%
Grok 4 50 40 30 29 23 34.5%
o4 Mini 51 45 25 25 25 34.4%
Nemotron 3 Super 47 37 34 29 15 32.4%
Cohere Command R+ (Aug. 2024) 64 58 28 10 0 31.8%
Ministral 8B 48 43 38 21 9 31.8%
o4 Mini High 49 43 25 22 8 29.4%
Gemini 2.5 Pro 43 39 38 17 9 29.3%
Llama 3.1 8B 77 62 5 0 0 28.8%
DeepSeek-V2 Chat 40 34 33 25 12 28.8%
Inception Mercury 78 49 0 0 0 25.4%
Gemini 3.1 Pro (Preview) 58 43 20 5 0 25.1%
Grok 4.1 Fast 32 29 25 25 13 24.8%
Gemini 3.1 Flash Lite (Preview) 40 38 27 16 0 24.0%
Ministral 3 3B 46 40 32 0 0 23.6%
Grok 4 Fast 42 33 17 15 11 23.4%
Gemma 3 27B 32 28 25 25 1 22.1%
Arcee AI: Trinity Large (Preview) 55 22 20 5 0 20.5%
Stealth: Aurora Alpha 26 22 20 19 14 20.3%
Inception Mercury 2 42 34 25 0 0 20.3%
Mistral NeMO 51 23 19 6 1 20.0%
Arcee AI: Trinity Mini 47 22 17 4 2 18.1%
Qwen 2.5 72B 47 21 14 6 0 17.7%
Gemma 3 4B 44 28 13 0 0 17.1%
Claude 3.5 Haiku 70 12 0 0 0 16.5%
Llama 3.1 70B 77 0 0 0 0 15.4%
Gemma 3 12B 27 21 19 6 1 14.9%
GPT-4o, May 13th (temp=0) 38 15 14 0 0 13.3%
Gemini 2.5 Flash 29 22 14 1 0 13.0%
Gemini 2.5 Flash Lite (Reasoning) 32 19 0 0 0 10.0%
GPT-4o, Aug. 6th (temp=1) 24 22 0 0 0 9.2%
GPT-4.1 Nano 23 10 9 0 0 8.4%
GPT-4.1 Mini 26 10 5 0 0 8.2%
Gemini 2.5 Flash (Reasoning) 24 8 1 0 0 6.6%
Mistral Small 3.2 24B 22 10 0 0 0 6.3%
Claude 3 Haiku 21 5 0 0 0 5.3%
Gemini 2.5 Flash Lite 13 2 0 0 0 3.0%
Nemotron 3 Nano 9 4 2 0 0 2.9%
GPT-4o, May 13th (temp=1) 10 3 0 0 0 2.5%
Ministral 3B 11 0 0 0 0 2.2%
GPT-4o, Aug. 6th (temp=0) 0 0 0 0 0 0.0%
GPT-4o Mini (temp=1) 0 0 0 0 0 0.0%
GPT-4o Mini (temp=0) 0 0 0 0 0 0.0%
Llama 3.1 Nemotron 70B 0 0 0 0 0 0.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
GPT-5.4 (Reasoning) 93 88 86 82 82 86.1%
Claude Sonnet 4.6 90 86 80 76 76 81.7%
Claude Sonnet 4.6 (Reasoning) 95 86 80 73 71 81.0%
GPT-5.4 89 82 81 80 71 80.4%
ByteDance Seed 2.0 Lite 90 81 77 76 73 79.4%
GPT-5.4 (Reasoning, Low) 85 84 82 82 63 79.3%
GPT-5.4 Mini (Reasoning) 90 77 77 76 74 78.9%
GPT-5 85 84 79 75 69 78.5%
GPT-5 Mini 86 79 78 68 67 75.6%
GPT-5.4 Mini 81 80 74 70 68 74.6%
ByteDance Seed 2.0 Mini 85 84 78 64 55 73.3%
GPT-5.2 77 74 74 73 66 72.9%
GPT-5.4 Nano (Reasoning, Low) 82 79 73 70 55 71.6%
ByteDance Seed 1.6 Flash 86 77 72 64 56 71.2%
Claude Opus 4.5 76 76 70 69 65 71.2%
GPT-5.4 Nano (Reasoning) 77 75 72 72 55 70.2%
GPT-5.4 Mini (Reasoning, Low) 86 82 70 59 53 70.2%
Claude Opus 4.6 82 77 72 67 53 70.1%
MiniMax M2.5 84 72 69 67 53 69.0%
Qwen 3.5 9B 94 66 64 57 54 66.9%
Claude Opus 4 82 69 66 59 54 66.0%
MiniMax M2.7 85 76 71 59 36 65.4%
Qwen 3.5 35B 82 75 66 60 44 65.4%
Z.AI GLM 5 Turbo 91 69 65 53 44 64.3%
GPT-5.4 Nano 78 66 61 61 53 63.8%
GPT-5.1 74 69 69 60 47 63.7%
Z.AI GLM 4.7 75 66 66 57 54 63.7%
Claude Opus 4.6 (Reasoning) 74 70 68 56 47 63.1%
Claude Sonnet 4.5 69 66 62 55 53 61.0%
Grok 4.20 (Beta, Reasoning) 80 66 63 62 30 60.2%
WizardLM 2 8x22b 89 59 59 52 37 59.1%
ByteDance Seed 1.6 62 61 58 57 56 58.7%
Grok 4.20 (Beta) 69 58 55 55 49 57.2%
Z.AI GLM 5 75 63 54 50 37 56.0%
Claude 3.7 Sonnet 66 65 65 58 24 55.6%
Aion 2.0 64 54 53 52 49 54.5%
Writer: Palmyra X5 67 61 54 53 37 54.2%
Qwen 3.5 397B A17B 64 63 62 47 33 53.8%
DeepSeek V3.2 66 65 57 47 33 53.6%
Claude Sonnet 4 58 57 56 51 45 53.5%
Mistral Small Creative 64 62 51 46 40 52.8%
Llama 3.1 8B 64 63 54 52 31 52.7%
Z.AI GLM 4.7 Flash 78 65 49 38 32 52.4%
Stealth: Hunter Alpha 68 60 55 50 28 52.1%
Qwen 3.5 27B 64 61 51 43 39 51.5%
Z.AI GLM 4.5 73 50 49 44 41 51.3%
Z.AI GLM 4.6 71 65 46 45 21 49.5%
Gemini 3 Pro (Preview) 69 64 37 36 34 48.2%
GPT-5 Nano 51 50 48 47 45 48.0%
Claude Haiku 4.5 66 55 46 39 32 47.5%
Stealth: Healer Alpha 71 44 44 42 34 47.2%
Qwen 3.5 122B 54 54 43 40 39 46.0%
DeepSeek V3 (2025-03-24) 63 50 45 43 21 44.3%
Mistral Large 2 61 47 39 36 35 43.5%
Qwen 3.5 Flash 59 53 42 38 22 42.8%
Gemini 3.1 Pro (Preview) 63 51 37 32 31 42.7%
MoonshotAI: Kimi K2.5 85 46 37 36 10 42.6%
Qwen 3.5 Plus (2026-02-15) 59 58 44 27 25 42.5%
Gemini 3 Flash (Preview) 52 49 46 35 30 42.4%
Qwen3 235B A22B Instruct 2507 67 55 38 29 18 41.4%
DeepSeek V3.1 54 42 39 36 33 40.8%
Claude 3.5 Sonnet 73 52 42 23 11 40.1%
Mistral Medium 3.1 54 49 34 33 28 39.6%
Ministral 8B 58 55 37 22 18 38.2%
Arcee AI: Trinity Large (Preview) 65 62 27 21 9 36.6%
Mistral Small 4 (Reasoning) 60 47 42 20 13 36.4%
Cohere Command R+ (Aug. 2024) 66 43 36 33 0 35.7%
Gemini 3 Flash (Preview, Reasoning) 35 34 33 33 31 33.1%
Rocinante 12B 78 49 22 16 0 32.8%
Qwen 3 32B 70 38 33 16 0 31.4%
GPT-4.1 39 38 31 24 23 31.0%
Hermes 3 70B 44 42 27 26 13 30.3%
Ministral 3 8B 85 42 22 1 0 29.9%
Mistral Large 59 59 13 10 3 28.6%
Ministral 3 14B 42 38 26 14 10 26.0%
Gemma 3 27B 50 35 31 11 3 25.8%
Mistral Small 4 42 22 22 19 13 23.7%
DeepSeek-V2 Chat 49 28 21 14 5 23.3%
Mistral Large 3 39 35 14 10 8 21.3%
Gemma 3 12B 34 24 23 14 11 21.0%
o4 Mini High 38 32 20 12 0 20.4%
Ministral 3 3B 49 28 15 8 0 20.2%
Gemini 2.5 Pro 33 21 20 13 12 20.0%
DeepSeek V3 (2024-12-26) 40 32 15 12 0 19.7%
GPT-4o, May 13th (temp=0) 43 31 22 0 0 19.1%
Claude 3 Haiku 25 22 22 17 3 17.6%
GPT-4.1 Mini 40 32 15 0 0 17.5%
Ministral 3B 33 26 19 4 0 16.4%
Gemma 3 4B 41 11 9 8 4 14.9%
Grok 4 25 19 18 11 1 14.8%
GPT-4.1 Nano 37 19 18 0 0 14.8%
Nemotron 3 Super 27 17 15 10 4 14.7%
Arcee AI: Trinity Mini 32 16 14 9 0 14.1%
Llama 3.1 Nemotron 70B 31 17 14 4 4 13.9%
Grok 4 Fast 22 22 12 9 4 13.8%
Claude 3.5 Haiku 53 8 0 0 0 12.3%
Mistral Small 3.2 24B 30 28 0 0 0 11.6%
Hermes 3 405B 31 14 11 1 0 11.4%
LFM2 24B 29 26 2 0 0 11.3%
Qwen 2.5 72B 30 14 8 0 0 10.3%
Llama 3.1 70B 32 13 5 0 0 10.0%
Gemini 3.1 Flash Lite (Preview) 19 13 9 7 0 9.5%
Grok 4.1 Fast 16 14 12 2 0 8.8%
o4 Mini 25 18 0 0 0 8.6%
Inception Mercury 30 9 1 0 0 7.8%
Nemotron 3 Nano 20 0 0 0 0 4.1%
Gemini 2.5 Flash Lite 12 3 2 0 0 3.3%
GPT-4o Mini (temp=1) 16 0 0 0 0 3.3%
Gemini 2.5 Flash (Reasoning) 13 0 0 0 0 2.6%
Inception Mercury 2 5 5 0 0 0 2.1%
GPT-4o, May 13th (temp=1) 4 3 2 0 0 1.8%
Mistral NeMO 9 0 0 0 0 1.8%
GPT-4o Mini (temp=0) 5 0 0 0 0 1.0%
Gemini 2.5 Flash 4 0 0 0 0 0.8%
Gemini 2.5 Flash Lite (Reasoning) 0 0 0 0 0 0.0%
Stealth: Aurora Alpha 0 0 0 0 0 0.0%
GPT-4o, Aug. 6th (temp=1) 0 0 0 0 0 0.0%
GPT-4o, Aug. 6th (temp=0) 0 0 0 0 0 0.0%