Useless dialogue additions

Test: Bad Writing Habits

Avg. Score: 45.4%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Claude Sonnet 4.6 | 92.4% | $0.031 | 39.3s | 60%
2 | Qwen 3.5 397B A17B | 96.4% | $0.014 | 3.0m | 64%
3 | Claude Haiku 4.5 | 79.8% | $0.011 | 21.6s | 38%
4 | Minimax M2.5 | 80.7% | $0.0034 | 1.3m | 44%
5 | Claude Opus 4.6 | 90.2% | $0.078 | 1.2m | 56%
6 | DeepSeek V3 (2025-03-24) | 79.4% | $0.0014 | 39.4s | 32%
7 | GPT-5 Mini | 76.6% | $0.0100 | 57.4s | 38%
8 | GPT-5.1 | 86.9% | $0.054 | 1.8m | 54%
9 | GPT-5 | 89.7% | $0.065 | 2.8m | 63%
10 | Claude Sonnet 4.5 | 78.6% | $0.035 | 38.1s | 35%
11 | Z.AI GLM 5 | 77.9% | $0.0084 | 1.2m | 31%
12 | Writer: Palmyra X5 | 67.9% | $0.011 | 22.0s | 28%
13 | Claude Opus 4.5 | 77.4% | $0.070 | 53.4s | 40%
14 | Claude 3.5 Haiku | 70.9% | $0.0035 | 10.8s | 10%
15 | Claude Sonnet 4 | 69.2% | $0.032 | 43.7s | 24%
16 | o4 Mini | 61.8% | $0.015 | 25.7s | 19%
17 | Rocinante 12B | 67.1% | $0.0014 | 38.4s | 13%
18 | Claude 3.7 Sonnet | 65.6% | $0.042 | 46.7s | 26%
19 | Claude 3.5 Sonnet | 68.2% | $0.048 | 35.5s | 21%
20 | Mistral Medium 3.1 | 57.0% | $0.0048 | 36.5s | 14%
21 | o4 Mini High | 62.9% | $0.025 | 47.2s | 15%
22 | MoonshotAI: Kimi K2.5 | 75.9% | $0.019 | 3.2m | 31%
23 | Grok 4.1 Fast | 52.3% | $0.0018 | 37.8s | 11%
24 | Gemini 3.1 Pro (Preview) | 85.6% | $0.107 | 1.8m | 34%
25 | Mistral Large 2 | 55.9% | $0.013 | 29.4s | 9%
26 | Hermes 3 405B | 56.1% | $0.0032 | 53.2s | 10%
27 | Llama 3.1 8B | 60.0% | $0.0003 | 1.3m | 10%
28 | ByteDance Seed 1.6 Flash | 43.9% | $0.0013 | 27.3s | 11%
29 | GPT-4.1 | 51.5% | $0.018 | 44.7s | 14%
30 | GPT-5.2 | 66.4% | $0.056 | 1.5m | 24%
31 | Mistral Large 3 | 50.0% | $0.0033 | 30.3s | 5%
32 | Mistral Small Creative | 41.1% | $0.0007 | 9.1s | 4%
33 | Ministral 3 14B | 40.6% | $0.0007 | 11.7s | 4%
34 | Z.AI GLM 4.5 | 44.6% | $0.0051 | 42.1s | 9%
35 | Mistral Large | 48.4% | $0.014 | 30.9s | 5%
36 | GPT-4o, Aug. 6th (temp=1) | 43.2% | $0.018 | 24.4s | 6%
37 | ByteDance Seed 1.6 | 62.9% | $0.013 | 2.5m | 14%
38 | Gemini 2.5 Flash | 37.2% | $0.0052 | 10.6s | 1%
39 | Grok 4 Fast | 34.9% | $0.0017 | 24.1s | 4%
40 | Hermes 3 70B | 43.6% | $0.0010 | 1.2m | 5%
41 | Claude Opus 4 | 82.3% | $0.209 | 1.4m | 45%
42 | Llama 3.1 70B | 36.6% | $0.0015 | 29.4s | 0%
43 | Ministral 3B | 29.4% | $0.0001 | 8.1s | 0%
44 | Z.AI GLM 4.7 Flash | 39.1% | $0.0017 | 1.2m | 6%
45 | WizardLM 2 8x22b | 48.6% | $0.0026 | 1.8m | 7%
46 | Z.AI GLM 4.7 | 40.4% | $0.010 | 1.4m | 10%
47 | Ministral 8B | 28.1% | $0.0004 | 10.4s | 0%
48 | GPT-5 Nano | 37.2% | $0.0042 | 1.4m | 8%
49 | Arcee AI: Trinity Mini | 24.8% | $0.0003 | 9.2s | 0%
50 | Llama 3.1 Nemotron 70B | 31.5% | $0.0038 | 31.7s | 0%
51 | Gemini 2.5 Flash Lite | 23.7% | $0.0009 | 9.5s | 0%
52 | Z.AI GLM 4.6 | 38.0% | $0.0065 | 51.5s | 0%
53 | GPT-4.1 Nano | 24.0% | $0.0007 | 13.3s | 0%
54 | Ministral 3 8B | 25.5% | $0.0008 | 19.6s | 0%
55 | Ministral 3 3B | 22.4% | $0.0005 | 11.1s | 0%
56 | Gemini 3 Pro (Preview) | 44.0% | $0.055 | 54.4s | 10%
57 | Arcee AI: Trinity Large (Preview) | 29.8% | $0.0000 | 43.6s | 0%
58 | Gemini 2.5 Pro | 42.2% | $0.036 | 36.2s | 1%
59 | DeepSeek V3 (2024-12-26) | 33.1% | $0.0021 | 54.6s | 0%
60 | DeepSeek-V2 Chat | 32.0% | $0.0021 | 53.3s | 0%
61 | Claude 3 Haiku | 20.7% | $0.0025 | 14.9s | 0%
62 | Gemini 3 Flash (Preview) | 23.3% | $0.0078 | 19.6s | 0%
63 | GPT-4.1 Mini | 20.3% | $0.0027 | 19.0s | 0%
64 | DeepSeek V3.2 | 38.2% | $0.0014 | 1.9m | 5%
65 | Qwen 3.5 Plus (2026-02-15) | 20.3% | $0.0060 | 31.5s | 0%
66 | GPT-4o, May 13th (temp=1) | 27.8% | $0.033 | 14.4s | 0%
67 | Mistral NeMO | 10.9% | $0.0005 | 10.1s | 0%
68 | GPT-4o Mini (temp=1) | 16.7% | $0.0012 | 34.8s | 0%
69 | Cohere Command R+ (Aug. 2024) | 25.6% | $0.020 | 52.5s | 0%
70 | Gemma 3 27B | 15.1% | $0.0006 | 52.6s | 0%
71 | Stealth: Aurora Alpha | 2.3% | $0.0000 | 9.8s | 0%
72 | Gemma 3 4B | 3.1% | $0.0002 | 20.0s | 0%
73 | DeepSeek V3.1 | 29.2% | $0.0020 | 1.8m | 0%
74 | Gemma 3 12B | 7.3% | $0.0004 | 41.3s | 0%
75 | Qwen 2.5 72B | 3.6% | $0.0010 | 36.7s | 0%
76 | GPT-4o, Aug. 6th (temp=0) | 7.5% | $0.023 | 22.7s | 0%
77 | GPT-4o Mini (temp=0) | 0.0% | $0.0012 | 34.8s | 0%
78 | GPT-4o, May 13th (temp=0) | 8.7% | $0.035 | 14.1s | 0%
79 | Grok 4 | 31.3% | $0.048 | 1.7m | 1%
80 | Mistral Small 3.2 24B | 16.1% | $0.0069 | 5.7m | 0%

Individual Scenarios

Detailed Writing Rules

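Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 97 | 96 | 98.6%
GPT-5.1 | 100 | 100 | 100 | 100 | 78 | 95.6%
GPT-5 | 100 | 100 | 100 | 85 | 83 | 93.7%
Claude Opus 4 | 100 | 100 | 100 | 83 | 48 | 86.3%
Claude Sonnet 4.6 | 100 | 100 | 80 | 80 | 63 | 84.3%
Claude Opus 4.6 | 100 | 100 | 92 | 89 | 34 | 83.0%
GPT-5.2 | 100 | 100 | 98 | 64 | 48 | 82.1%
Claude Haiku 4.5 | 100 | 100 | 89 | 72 | 37 | 79.7%
Claude 3.5 Sonnet | 97 | 83 | 72 | 69 | 63 | 77.0%
Rocinante 12B | 100 | 100 | 100 | 80 | 0 | 75.9%
GPT-5 Mini | 100 | 100 | 95 | 66 | 16 | 75.3%
Claude Opus 4.5 | 100 | 96 | 71 | 63 | 46 | 75.1%
ByteDance Seed 1.6 | 100 | 100 | 100 | 69 | 0 | 73.9%
Z.AI GLM 5 | 100 | 100 | 69 | 54 | 46 | 73.8%
Grok 4.1 Fast | 100 | 100 | 75 | 69 | 0 | 68.9%
DeepSeek-V2 Chat | 100 | 100 | 80 | 63 | 0 | 68.4%
o4 Mini | 100 | 94 | 83 | 63 | 0 | 67.9%
Writer: Palmyra X5 | 100 | 75 | 69 | 54 | 37 | 67.0%
Claude 3.7 Sonnet | 86 | 82 | 72 | 34 | 28 | 60.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 66 | 25 | 0 | 58.2%
DeepSeek V3 (2024-12-26) | 100 | 100 | 42 | 42 | 0 | 56.7%
DeepSeek V3 (2025-03-24) | 100 | 100 | 54 | 25 | 0 | 55.7%
WizardLM 2 8x22b | 100 | 97 | 42 | 25 | 14 | 55.6%
o4 Mini High | 100 | 89 | 66 | 14 | 0 | 53.9%
GPT-4.1 | 87 | 75 | 69 | 37 | 0 | 53.5%
Gemini 2.5 Flash | 100 | 100 | 37 | 0 | 0 | 47.4%
Z.AI GLM 4.7 | 87 | 75 | 38 | 31 | 0 | 46.2%
MoonshotAI: Kimi K2.5 | 100 | 87 | 42 | 0 | 0 | 45.6%
Grok 4 Fast | 100 | 100 | 25 | 0 | 0 | 45.0%
Claude Sonnet 4 | 89 | 87 | 31 | 0 | 0 | 41.4%
Claude 3.5 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0%
Hermes 3 70B | 100 | 100 | 0 | 0 | 0 | 40.0%
Grok 4 | 100 | 75 | 14 | 0 | 0 | 37.8%
DeepSeek V3.1 | 100 | 75 | 0 | 0 | 0 | 35.0%
Cohere Command R+ (Aug. 2024) | 100 | 54 | 18 | 0 | 0 | 34.3%
Llama 3.1 8B | 100 | 69 | 0 | 0 | 0 | 33.9%
Ministral 3 14B | 100 | 54 | 0 | 0 | 0 | 30.7%
Minimax M2.5 | 54 | 46 | 46 | 0 | 0 | 29.1%
Ministral 3 8B | 100 | 25 | 14 | 0 | 0 | 27.8%
Mistral Large 3 | 100 | 34 | 0 | 0 | 0 | 26.8%
Mistral Large | 69 | 34 | 25 | 0 | 0 | 25.7%
Gemini 3 Pro (Preview) | 49 | 32 | 21 | 16 | 0 | 23.8%
ByteDance Seed 1.6 Flash | 51 | 39 | 25 | 0 | 0 | 22.9%
Z.AI GLM 4.5 | 63 | 49 | 0 | 0 | 0 | 22.3%
Mistral Medium 3.1 | 63 | 42 | 0 | 0 | 0 | 20.8%
Qwen 3.5 Plus (2026-02-15) | 38 | 34 | 20 | 8 | 0 | 20.0%
Z.AI GLM 4.6 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, May 13th (temp=1) | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, Aug. 6th (temp=1) | 54 | 42 | 0 | 0 | 0 | 19.0%
Arcee AI: Trinity Large (Preview) | 63 | 25 | 5 | 0 | 0 | 18.5%
Hermes 3 405B | 80 | 7 | 0 | 0 | 0 | 17.4%
Llama 3.1 70B | 80 | 0 | 0 | 0 | 0 | 15.9%
GPT-5 Nano | 67 | 10 | 0 | 0 | 0 | 15.4%
Llama 3.1 Nemotron 70B | 54 | 0 | 0 | 0 | 0 | 10.7%
Ministral 3B | 54 | 0 | 0 | 0 | 0 | 10.7%
GPT-4.1 Mini | 25 | 18 | 10 | 0 | 0 | 10.5%
GPT-4o Mini (temp=1) | 36 | 10 | 6 | 0 | 0 | 10.3%
Gemini 3 Flash (Preview) | 46 | 0 | 0 | 0 | 0 | 9.2%
Claude 3 Haiku | 34 | 10 | 0 | 0 | 0 | 8.7%
DeepSeek V3.2 | 25 | 14 | 0 | 0 | 0 | 7.8%
Mistral Large 2 | 34 | 0 | 0 | 0 | 0 | 6.8%
Gemma 3 12B | 25 | 0 | 0 | 0 | 0 | 5.0%
Mistral Small Creative | 25 | 0 | 0 | 0 | 0 | 5.0%
Ministral 8B | 25 | 0 | 0 | 0 | 0 | 5.0%
GPT-4.1 Nano | 14 | 0 | 0 | 0 | 0 | 2.8%
Gemini 2.5 Pro | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%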
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
Mistral Large100100100100100100.0%
GPT-5.1100100100927793.8%
Gemini 3.1 Pro (Preview)1001001001005490.7%
Claude Opus 410010099835587.5%
Gemini 3 Pro (Preview)10010083806986.5%
Claude 3.5 Sonnet100100100755485.7%
Arcee AI: Trinity Mini1001001001002585.0%
Minimax M2.5100100100695484.6%
Claude Opus 4.6100100100100080.0%
o4 Mini High100100100100080.0%
MoonshotAI: Kimi K2.5100100100100080.0%
Claude Opus 4.5100100100752580.0%
Z.AI GLM 5100100100100080.0%
DeepSeek V3 (2025-03-24)100100100100080.0%
Claude Haiku 4.5100100100100080.0%
Ministral 3 3B100100100100080.0%
ByteDance Seed 1.610010010080075.9%
Writer: Palmyra X510010010069073.9%
Hermes 3 405B10010010069073.9%
Llama 3.1 8B10010010063072.5%
o4 Mini10010010042068.3%
Claude 3.7 Sonnet10010071501467.0%
Ministral 8B10010010025065.0%
GPT-5 Mini99897354063.0%
Rocinante 12B1001001007061.4%
GPT-4.11001001000060.0%
Grok 4 Fast1001001000060.0%
Mistral Large 31001001000060.0%
DeepSeek V3.21001001000060.0%
Mistral Medium 3.11001001000060.0%
DeepSeek V3.11001001000060.0%
Mistral Large 21001001000060.0%
Llama 3.1 70B1001001000060.0%
Mistral Small Creative1001001000060.0%
GPT-4.1 Nano1001001000060.0%
WizardLM 2 8x22b1001001000060.0%
ByteDance Seed 1.6 Flash100776948059.0%
GPT-5.21001005425055.7%
GPT-5 Nano836057423455.1%
Z.AI GLM 4.71001002525050.0%
Grok 4100100420048.3%
GPT-4.1 Mini10080540046.6%
GPT-4o, Aug. 6th (temp=1)100634225045.8%
Grok 4.1 Fast100100250045.0%
Ministral 3 14B100100250045.0%
Qwen 3.5 Plus (2026-02-15)83803425044.4%
DeepSeek V3 (2024-12-26)10010000040.0%
GPT-4o, May 13th (temp=0)10010000040.0%
DeepSeek-V2 Chat10010000040.0%
Hermes 3 70B10010000040.0%
Claude 3 Haiku10010000040.0%
Ministral 3 8B10010000040.0%
Ministral 3B10010000040.0%
Arcee AI: Trinity Large (Preview)9642257033.9%
Gemini 3 Flash (Preview)42424242033.3%
Z.AI GLM 4.51005400030.7%
GPT-4o Mini (temp=1)1004200028.3%
Z.AI GLM 4.7 Flash1002500025.0%
Gemma 3 27B1002500025.0%
Gemini 2.5 Flash100000020.0%
Gemini 2.5 Flash Lite100000020.0%
Mistral NeMO100000020.0%
GPT-4o, May 13th (temp=1)423400015.2%
Gemma 3 4B2525140012.8%
Cohere Command R+ (Aug. 2024)54000010.7%
Llama 3.1 Nemotron 70B2500005.0%
Stealth: Aurora Alpha1800003.6%
GPT-4o, Aug. 6th (temp=0)000000.0%
GPT-4o Mini (temp=0)000000.0%
Gemma 3 12B000000.0%
Qwen 2.5 72B000000.0%
Mistral Small 3.2 24B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5.1100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
DeepSeek V3 (2025-03-24)1001001001009799.4%
GPT-51001001001009799.3%
GPT-5.21001001001009398.5%
Z.AI GLM 5100100100928395.0%
Minimax M2.5100100100898394.5%
Claude 3.5 Sonnet1001001001006993.9%
Mistral Medium 3.11001001001005490.7%
Claude Opus 4100100100836790.1%
Claude Haiku 4.510010096916389.7%
MoonshotAI: Kimi K2.510010097806989.2%
Claude Opus 4.61001001001003486.8%
Hermes 3 405B1001001001003486.8%
Mistral Large 310010097894686.5%
Claude Sonnet 41001001001003186.3%
Claude 3.7 Sonnet10010089766385.6%
Writer: Palmyra X51009788696784.4%
Claude Opus 4.5100100100665483.9%
o4 Mini1008383806682.5%
Claude 3.5 Haiku100100100100080.0%
Z.AI GLM 4.7 Flash10010010082076.3%
GPT-5 Mini908881794275.9%
Arcee AI: Trinity Large (Preview)1001009483075.4%
Gemini 3 Pro (Preview)968980732873.3%
Z.AI GLM 4.5968573565472.7%
o4 Mini High10010071542570.0%
Mistral Large 299978766069.7%
ByteDance Seed 1.6 Flash10010091341868.6%
WizardLM 2 8x22b1001007850666.8%
ByteDance Seed 1.610010010025065.0%
Ministral 3 14B1007563421158.1%
Rocinante 12B100876934058.0%
GPT-4o, Aug. 6th (temp=1)928054302555.9%
Llama 3.1 8B10097600051.4%
Z.AI GLM 4.777726342050.6%
Mistral Small Creative10075637049.0%
GPT-4.180726614046.4%
Ministral 3 8B9175547045.3%
Grok 4.1 Fast75544842043.7%
Mistral Large836342181043.0%
DeepSeek V3 (2024-12-26)9654540040.5%
Ministral 3B10010000040.0%
GPT-5 Nano10071214039.3%
Llama 3.1 Nemotron 70B1008700037.3%
Claude 3 Haiku928900036.2%
DeepSeek-V2 Chat1007500035.0%
Gemma 3 12B66503011031.5%
DeepSeek V3.266483011031.1%
Qwen 3.5 Plus (2026-02-15)5651375029.7%
Hermes 3 70B1004800029.6%
Gemini 2.5 Flash1004400028.8%
DeepSeek V3.110025140027.8%
GPT-4o Mini (temp=1)54361711023.4%
Llama 3.1 70B100000020.0%
GPT-4o, May 13th (temp=1)514500019.2%
Grok 4 Fast3837146018.9%
Gemma 3 27B543800018.3%
Qwen 2.5 72B80000015.9%
Grok 475400015.9%
Z.AI GLM 4.63425140014.6%
Gemini 3 Flash (Preview)3122180014.1%
GPT-4.1 Mini69000013.9%
Arcee AI: Trinity Mini63000012.5%
Cohere Command R+ (Aug. 2024)63000012.5%
Gemma 3 4B63000012.5%
Mistral NeMO58000011.7%
Ministral 8B54000010.7%
GPT-4o, May 13th (temp=0)252500010.0%
Gemini 2.5 Flash Lite2500005.0%
Gemini 2.5 Pro000000.0%
Stealth: Aurora Alpha000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
GPT-4o Mini (temp=0)000000.0%
Mistral Small 3.2 24B000000.0%
Ministral 3 3B000000.0%
GPT-4.1 Nano000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
GPT-51001001001009999.9%
GPT-4.1100100100969698.2%
Qwen 3.5 397B A17B1001001001008396.7%
GPT-5.2100100100928395.0%
Claude Sonnet 4.61001001001007595.0%
GPT-5.11001001001007194.3%
Claude Sonnet 41001001001006993.9%
DeepSeek V3 (2025-03-24)1001001001006993.9%
o4 Mini100100100927593.3%
Claude Haiku 4.510010099975890.8%
Claude Opus 4.6100100100836389.2%
Claude 3.7 Sonnet10010089886788.9%
GPT-5 Mini100100100835888.2%
Arcee AI: Trinity Large (Preview)1009489877188.2%
Rocinante 12B10010080756383.4%
Z.AI GLM 510010099883083.3%
Claude 3.5 Sonnet10010075756683.2%
Claude Sonnet 4.51009292804882.2%
Mistral Large1009483755882.1%
Hermes 3 405B100100100545481.4%
Writer: Palmyra X510010099832581.4%
WizardLM 2 8x22b100100100565081.2%
ByteDance Seed 1.6100100100100080.0%
Grok 4.1 Fast10010075634275.8%
Claude 3.5 Haiku100100100542575.7%
Mistral Large 210010010063072.5%
Claude Opus 4.51008980662572.0%
MoonshotAI: Kimi K2.51001009658070.8%
Hermes 3 70B10010010042068.3%
o4 Mini High1001006756064.7%
Mistral Medium 3.189807563061.3%
Z.AI GLM 4.5100726560059.4%
Minimax M2.5100896634057.9%
GPT-4.1 Mini875450484656.8%
Gemini 3 Pro (Preview)756957483256.5%
DeepSeek-V2 Chat1001006020055.9%
ByteDance Seed 1.6 Flash100875825054.0%
DeepSeek V3 (2024-12-26)1001003014048.7%
Grok 480754825045.5%
GPT-4o, Aug. 6th (temp=1)96754214045.2%
Claude 3 Haiku10066600045.2%
Ministral 3 14B100692525043.9%
GPT-4o, May 13th (temp=1)10097180043.0%
Z.AI GLM 4.7 Flash95604810042.4%
Cohere Command R+ (Aug. 2024)8769250036.2%
Llama 3.1 Nemotron 70B8054420035.0%
GPT-4o Mini (temp=1)7265360034.6%
Mistral Large 39448300034.3%
Llama 3.1 8B8354200031.3%
DeepSeek V3.2825800028.0%
Z.AI GLM 4.75754290027.8%
Z.AI GLM 4.68034180026.3%
Mistral Small Creative1002500025.0%
Ministral 3 8B1001470024.2%
Mistral NeMO1001800023.6%
Gemini 2.5 Flash Lite694500022.9%
Grok 4 Fast4842180021.5%
Gemini 3 Flash (Preview)574250020.7%
Gemini 2.5 Pro100000020.0%
Llama 3.1 70B100000020.0%
DeepSeek V3.14834100018.4%
Gemini 2.5 Flash632800018.1%
GPT-4.1 Nano87000017.3%
Ministral 8B87000017.3%
GPT-4o, May 13th (temp=0)5812100016.0%
Gemma 3 12B3018160012.8%
Stealth: Aurora Alpha4800009.6%
GPT-4o, Aug. 6th (temp=0)4800009.6%
Qwen 2.5 72B4800009.6%
Qwen 3.5 Plus (2026-02-15)31124009.4%
Gemma 3 27B2560006.2%
Ministral 3 3B2500005.0%
Arcee AI: Trinity Mini1800003.6%
GPT-5 Nano000000.0%
GPT-4o Mini (temp=0)000000.0%
Mistral Small 3.2 24B000000.0%
Gemma 3 4B000000.0%
Ministral 3B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
Claude Opus 4100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
GPT-5.2100100100999197.8%
GPT-51001001001008697.1%
DeepSeek V3 (2025-03-24)1001001001008396.7%
GPT-5.11001001001007895.6%
Claude Sonnet 4.6100100100898795.2%
Writer: Palmyra X5100100100838393.3%
MoonshotAI: Kimi K2.5100100100896991.7%
Minimax M2.5100100100696987.8%
Claude Opus 4.51009994726385.5%
Claude Opus 4.6100100100635483.2%
Claude 3.5 Sonnet1009687755081.4%
WizardLM 2 8x22b969182805680.9%
Claude 3.5 Haiku100100100100080.0%
Claude Haiku 4.51009480724578.1%
Rocinante 12B10010010091078.1%
Llama 3.1 8B10010010089077.9%
Z.AI GLM 51001008989075.7%
Claude Sonnet 41001009475073.8%
Claude 3.7 Sonnet1009675573572.7%
GPT-5 Mini948474693871.9%
GPT-4o, Aug. 6th (temp=1)97979463771.6%
o4 Mini High1008575751870.6%
Hermes 3 405B1001009656070.3%
Gemini 3 Pro (Preview)1008178543870.1%
Grok 4.1 Fast100996969067.5%
ByteDance Seed 1.610010010025065.0%
DeepSeek V3.21007266461459.7%
Qwen 3.5 Plus (2026-02-15)100100780055.6%
Mistral Medium 3.11001003130052.2%
GPT-4.110075690048.9%
ByteDance Seed 1.6 Flash87575131045.2%
Z.AI GLM 4.591694814044.4%
o4 Mini100100200043.9%
Mistral Large 2100632525042.5%
Z.AI GLM 4.7 Flash10066460042.4%
Gemini 2.5 Flash82564217840.9%
Grok 480544225040.0%
Mistral Large 369633416036.5%
Ministral 3 14B1008000035.9%
Hermes 3 70B1008000035.9%
Mistral Small Creative877770034.3%
DeepSeek V3 (2024-12-26)8542420033.7%
DeepSeek-V2 Chat757500030.0%
Claude 3 Haiku1004600029.2%
Z.AI GLM 4.7785880029.0%
GPT-4.1 Mini575400022.1%
Gemma 3 27B575400022.1%
GPT-4o, May 13th (temp=1)5431250022.0%
Mistral Large5031250021.3%
Cohere Command R+ (Aug. 2024)663400020.1%
Llama 3.1 70B100000020.0%
Grok 4 Fast5418140017.1%
Gemini 2.5 Pro25201818016.1%
Gemini 3 Flash (Preview)75000015.0%
Llama 3.1 Nemotron 70B421800011.9%
Gemma 3 12B342100011.0%
GPT-4.1 Nano54000010.7%
Ministral 3B54000010.7%
DeepSeek V3.134110009.1%
Arcee AI: Trinity Large (Preview)4500009.0%
GPT-5 Nano25180008.6%
Gemma 3 4B3000006.0%
Ministral 3 8B1600003.2%
Z.AI GLM 4.61400002.8%
GPT-4o Mini (temp=1)1000001.9%
Arcee AI: Trinity Mini1000001.9%
Stealth: Aurora Alpha000000.0%
GPT-4o, May 13th (temp=0)000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
GPT-4o Mini (temp=0)000000.0%
Gemini 2.5 Flash Lite000000.0%
Qwen 2.5 72B000000.0%
Mistral Small 3.2 24B000000.0%
Ministral 3 3B000000.0%
Mistral NeMO000000.0%
Ministral 8B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
Claude Opus 4100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-51001001001009498.9%
Claude 3.7 Sonnet1001001001008997.9%
Mistral Medium 3.11001001001007595.0%
GPT-5 Mini10010096898594.0%
Hermes 3 405B1001001001006392.5%
Claude 3.5 Sonnet1001001001005490.7%
Gemini 2.5 Flash Lite1001001001005490.7%
GPT-5.11009794827589.5%
Z.AI GLM 51001001001004288.3%
Rocinante 12B1001001001004288.3%
Z.AI GLM 4.7100100100804885.5%
Claude Sonnet 4.6100100100100080.0%
Z.AI GLM 4.5100100100100080.0%
MoonshotAI: Kimi K2.5100100100633479.3%
Claude Haiku 4.5100100100424276.7%
Gemini 2.5 Flash10010010054070.7%
Minimax M2.5100878780070.5%
Writer: Palmyra X5100100100252570.0%
Llama 3.1 8B10010010037067.4%
GPT-5.21001006960065.8%
Z.AI GLM 4.610010010025065.0%
ByteDance Seed 1.6 Flash1001006348062.1%
WizardLM 2 8x22b1001008025060.9%
Gemini 2.5 Pro1001001000060.0%
DeepSeek V3 (2025-03-24)1001001000060.0%
Llama 3.1 70B1001001000060.0%
Mistral Large1001001000060.0%
Hermes 3 70B1001001000060.0%
Arcee AI: Trinity Mini1001001000060.0%
DeepSeek V3.21001005425055.7%
Z.AI GLM 4.7 Flash10010025252555.0%
Mistral Small 3.2 24B100100630052.5%
ByteDance Seed 1.6100100340046.8%
Grok 4 Fast10063540043.2%
GPT-4o, May 13th (temp=1)8369630043.1%
o4 Mini High10010000040.0%
Grok 410010000040.0%
Mistral Large 310010000040.0%
GPT-4o, Aug. 6th (temp=1)10010000040.0%
Mistral NeMO10010000040.0%
GPT-4.1 Nano10010000040.0%
Ministral 3B10010000040.0%
Mistral Small Creative10063340039.3%
GPT-4o Mini (temp=1)100582011037.9%
Grok 4.1 Fast10069140036.7%
Ministral 3 14B6969250032.8%
Gemini 3 Pro (Preview)9748180032.6%
Llama 3.1 Nemotron 70B1005400030.7%
o4 Mini54423414028.6%
Arcee AI: Trinity Large (Preview)1004200028.3%
GPT-5 Nano34342918023.0%
DeepSeek-V2 Chat100000020.0%
GPT-4.1 Mini100000020.0%
Mistral Large 2100000020.0%
Ministral 3 3B100000020.0%
DeepSeek V3.1692500018.9%
Claude 3 Haiku542500015.7%
Gemini 3 Flash (Preview)482500014.6%
Qwen 3.5 Plus (2026-02-15)71000014.3%
Qwen 2.5 72B69000013.9%
GPT-4.1422500013.3%
Cohere Command R+ (Aug. 2024)422500013.3%
Gemma 3 27B2525140012.8%
Ministral 8B54000010.7%
DeepSeek V3 (2024-12-26)2500005.0%
Gemma 3 12B2500005.0%
GPT-4o, May 13th (temp=0)1400002.8%
Stealth: Aurora Alpha000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
GPT-4o Mini (temp=0)000000.0%
Ministral 3 8B000000.0%
Gemma 3 4B000000.0%

genre

Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude Sonnet 4.6100100100100100100.0%
GPT-5.1999983777586.8%
Minimax M2.51008280807282.6%
Claude Opus 4.610010082635780.2%
Qwen 3.5 397B A17B100100100100080.0%
Rocinante 12B100100100100080.0%
Claude Opus 41009385733677.3%
GPT-594917474066.3%
Z.AI GLM 51001008934064.7%
Claude Haiku 4.51001006654063.9%
GPT-5 Mini807269632561.8%
GPT-4o, Aug. 6th (temp=1)100757554060.7%
MoonshotAI: Kimi K2.5100967234060.4%
Writer: Palmyra X51006360422557.8%
Hermes 3 70B1001006614056.0%
Hermes 3 405B1001005810053.6%
GPT-5.2716864441853.2%
Claude Sonnet 4.5755858373753.0%
Ministral 8B100100540050.7%
Z.AI GLM 4.7 Flash83635845049.8%
ByteDance Seed 1.6635454423749.6%
Claude Sonnet 480665418043.4%
DeepSeek-V2 Chat63634834041.4%
GPT-4.18063630040.9%
GPT-5 Nano75605116040.5%
Gemini 3.1 Pro (Preview)10075250040.0%
Mistral Large10054250035.7%
o4 Mini High10063140035.3%
DeepSeek V3 (2025-03-24)75424214034.4%
Claude Opus 4.563572518633.6%
Claude 3.7 Sonnet58474212031.8%
Mistral Medium 3.183312518031.5%
o4 Mini807500030.9%
Llama 3.1 8B1005400030.7%
GPT-4o, May 13th (temp=0)5454420029.8%
GPT-4o Mini (temp=1)995000029.7%
Mistral Large 310025180028.6%
Grok 41003100026.3%
Mistral Large 210014140025.6%
Claude 3.5 Sonnet754800024.6%
Ministral 3 14B695400024.6%
Z.AI GLM 4.56034250023.8%
Claude 3.5 Haiku100000020.0%
Llama 3.1 70B100000020.0%
Llama 3.1 Nemotron 70B100000020.0%
Ministral 3 8B100000020.0%
Gemini 3 Flash (Preview)583700019.0%
Gemini 3 Pro (Preview)4242110018.9%
GPT-4.1 Mini751800018.6%
ByteDance Seed 1.6 Flash573600018.5%
DeepSeek V3 (2024-12-26)484200017.9%
Grok 4.1 Fast252570011.5%
Qwen 3.5 Plus (2026-02-15)46600010.4%
Mistral Small Creative34100008.7%
Z.AI GLM 4.73400006.8%
Gemini 2.5 Flash3400006.8%
Claude 3 Haiku3000006.0%
Grok 4 Fast2500005.0%
DeepSeek V3.22500005.0%
Cohere Command R+ (Aug. 2024)2500005.0%
GPT-4o, May 13th (temp=1)600001.2%
Gemini 2.5 Pro000000.0%
Z.AI GLM 4.6000000.0%
Stealth: Aurora Alpha000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
DeepSeek V3.1000000.0%
GPT-4o Mini (temp=0)000000.0%
Gemma 3 12B000000.0%
Gemini 2.5 Flash Lite000000.0%
Gemma 3 27B000000.0%
Qwen 2.5 72B000000.0%
Mistral Small 3.2 24B000000.0%
Arcee AI: Trinity Large (Preview)000000.0%
Arcee AI: Trinity Mini000000.0%
Ministral 3 3B000000.0%
Mistral NeMO000000.0%
GPT-4.1 Nano000000.0%
Gemma 3 4B000000.0%
WizardLM 2 8x22b000000.0%
Ministral 3B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Ministral 3B100100100100100100.0%
Rocinante 12B100100100100100100.0%
Claude Opus 4.51001001001009498.8%
GPT-51009796876989.8%
DeepSeek V3 (2025-03-24)1001001001002585.0%
Claude Opus 4100100100694282.2%
Qwen 3.5 397B A17B100100100100080.0%
MoonshotAI: Kimi K2.5100100100100080.0%
Gemini 2.5 Pro100100100100080.0%
Minimax M2.5100100100100080.0%
Mistral Large 3100100100100080.0%
Claude Haiku 4.5100100100100080.0%
DeepSeek V3.2100100100100080.0%
DeepSeek V3.1100100100100080.0%
Llama 3.1 70B100100100100080.0%
Ministral 3 14B100100100100080.0%
Ministral 8B100100100100080.0%
ByteDance Seed 1.610010010042068.3%
Grok 4 Fast10010010042068.3%
GPT-5 Nano10010010025065.0%
Writer: Palmyra X510010010025065.0%
o4 Mini High1001001000060.0%
Z.AI GLM 4.61001001000060.0%
Gemini 2.5 Flash1001001000060.0%
Llama 3.1 Nemotron 70B1001001000060.0%
Mistral Large1001001000060.0%
Claude 3 Haiku1001001000060.0%
Ministral 3 8B1001001000060.0%
Mistral NeMO1001001000060.0%
GPT-4.1 Nano1001001000060.0%
GPT-5.1100897534059.7%
GPT-5 Mini87806363058.2%
Mistral Small Creative1001005425055.7%
Claude Sonnet 41001006314055.3%
Gemini 3 Pro (Preview)1001005414053.5%
Hermes 3 70B1001004225053.3%
ByteDance Seed 1.6 Flash77635450048.7%
Gemini 3 Flash (Preview)100100420048.3%
Hermes 3 405B100100420048.3%
o4 Mini100100340046.8%
Mistral Medium 3.1100100250045.0%
Grok 410010000040.0%
GPT-4.110010000040.0%
GPT-4o, Aug. 6th (temp=0)10010000040.0%
GPT-4o Mini (temp=1)10010000040.0%
WizardLM 2 8x22b10010000040.0%
GPT-5.210075140037.8%
Grok 4.1 Fast10042140031.1%
Claude Sonnet 4.51004200028.3%
GPT-4o, May 13th (temp=1)1003400026.8%
Claude 3.5 Sonnet1002500025.0%
DeepSeek-V2 Chat1002500025.0%
Gemma 3 4B1002500025.0%
Claude 3.7 Sonnet42252525023.3%
Z.AI GLM 4.7 Flash100000020.0%
Qwen 3.5 Plus (2026-02-15)100000020.0%
DeepSeek V3 (2024-12-26)100000020.0%
GPT-4.1 Mini100000020.0%
Gemma 3 12B100000020.0%
Gemma 3 27B100000020.0%
Qwen 2.5 72B100000020.0%
Mistral Small 3.2 24B100000020.0%
Z.AI GLM 4.53400006.8%
Z.AI GLM 4.72500005.0%
Cohere Command R+ (Aug. 2024)2500005.0%
Stealth: Aurora Alpha000000.0%
GPT-4o, May 13th (temp=0)000000.0%
GPT-4o Mini (temp=0)000000.0%
Arcee AI: Trinity Large (Preview)000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Qwen 3.5 397B A17B100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100979297.8%
Minimax M2.5100100100978195.6%
Claude Sonnet 4.5100100100858393.7%
Claude Haiku 4.5100100100896390.4%
Claude 3.7 Sonnet1009997816688.7%
GPT-5.1999689876887.8%
MoonshotAI: Kimi K2.5100100100755485.7%
Claude Opus 4.510010095696485.7%
Claude 3.5 Sonnet10010075696782.4%
Z.AI GLM 5100100100633779.9%
Mistral Medium 3.1100959189074.9%
Writer: Palmyra X51009283772074.4%
GPT-5978574555372.8%
GPT-5.2998266595872.7%
GPT-5 Mini1009573692572.5%
Claude Sonnet 410010065454470.9%
Z.AI GLM 4.7858369633867.7%
o4 Mini948787541466.9%
Claude Opus 4100858260666.8%
Grok 496926663063.2%
Gemini 3.1 Pro (Preview)10010010011062.3%
o4 Mini High1007766541462.2%
GPT-4.1966054543860.1%
Claude 3.5 Haiku1001001000060.0%
Llama 3.1 70B1001001000060.0%
Llama 3.1 8B100100890057.9%
GPT-4o, Aug. 6th (temp=1)975854343054.6%
ByteDance Seed 1.6 Flash100757522054.3%
Arcee AI: Trinity Mini8075750045.9%
Gemini 3 Flash (Preview)9687460045.6%
Mistral Small Creative1006337161045.0%
Ministral 3 3B100100250045.0%
Mistral Large 283585820043.9%
Arcee AI: Trinity Large (Preview)10060545043.8%
Z.AI GLM 4.7 Flash575447422043.7%
ByteDance Seed 1.68075500040.9%
Mistral Large 310010000040.0%
DeepSeek-V2 Chat10010000040.0%
GPT-4.1 Nano10087100039.2%
Rocinante 12B1008770038.8%
Hermes 3 405B1009200038.3%
Gemini 3 Pro (Preview)67583916737.6%
Mistral Small 3.2 24B1008300036.7%
Grok 4.1 Fast8054480036.2%
GPT-5 Nano7160420034.7%
DeepSeek V3 (2024-12-26)10057100033.3%
Ministral 8B69632014033.1%
Ministral 3 14B63423411029.9%
Ministral 3B1004800029.6%
Qwen 3.5 Plus (2026-02-15)665800024.9%
Mistral Large6642140024.3%
WizardLM 2 8x22b1001600023.3%
Cohere Command R+ (Aug. 2024)6634140022.8%
Z.AI GLM 4.5574760022.1%
Llama 3.1 Nemotron 70B100000020.0%
DeepSeek V3.25722160018.9%
DeepSeek V3.13432215018.5%
GPT-4o, May 13th (temp=1)92000018.3%
GPT-4.1 Mini80000015.9%
Grok 4 Fast582000015.6%
Ministral 3 8B581000013.6%
GPT-4o Mini (temp=1)54000010.7%
Gemma 3 27B11110004.5%
Hermes 3 70B1470004.2%
Gemma 3 12B1900003.8%
Gemini 2.5 Pro1060003.1%
GPT-4o, Aug. 6th (temp=0)600001.2%
Z.AI GLM 4.6000000.0%
Stealth: Aurora Alpha000000.0%
GPT-4o, May 13th (temp=0)000000.0%
Gemini 2.5 Flash000000.0%
GPT-4o Mini (temp=0)000000.0%
Gemini 2.5 Flash Lite000000.0%
Qwen 2.5 72B000000.0%
Claude 3 Haiku000000.0%
Mistral NeMO000000.0%
Gemma 3 4B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude Sonnet 4.61001001001008797.3%
Claude Opus 4.61001001001007795.5%
GPT-5.11001001001007194.1%
GPT-51001001001006993.9%
Claude 3.7 Sonnet10010094888092.3%
Claude Opus 4.5100100100926390.8%
Minimax M2.51009996656384.4%
Claude Opus 41009694893482.5%
Claude Sonnet 4.51001001001001081.9%
o4 Mini10010080635880.1%
Qwen 3.5 397B A17B100100100100080.0%
Rocinante 12B100100100100080.0%
Claude Haiku 4.51009287635478.9%
GPT-5 Mini10010091383472.5%
Z.AI GLM 5100969263871.6%
GPT-4.1100876963765.2%
Claude 3.5 Sonnet1007569542564.6%
Mistral Large 21001008337064.0%
Mistral Large100898734062.0%
GPT-5.2817569443961.7%
Claude Sonnet 41008954372560.9%
Hermes 3 405B100877542060.6%
Mistral Medium 3.11009663251459.4%
o4 Mini High89896354058.9%
Mistral Large 3100695454055.3%
Grok 4.1 Fast94807214051.9%
DeepSeek V3 (2025-03-24)1001004214051.1%
Ministral 3 8B100100340046.8%
Writer: Palmyra X583424242743.1%
MoonshotAI: Kimi K2.583585418042.6%
Llama 3.1 8B10068450042.6%
Gemini 3.1 Pro (Preview)10010000040.0%
DeepSeek-V2 Chat7554420034.0%
Claude 3 Haiku10037310033.6%
GPT-5 Nano6864360033.6%
Llama 3.1 Nemotron 70B1005400030.7%
GPT-4o, May 13th (temp=1)8046250030.1%
Gemini 2.5 Pro875400028.0%
Mistral Small Creative1003700027.4%
Z.AI GLM 4.742392817025.2%
Cohere Command R+ (Aug. 2024)1002500025.0%
Gemini 3 Pro (Preview)7725117524.9%
ByteDance Seed 1.6 Flash685400024.3%
Arcee AI: Trinity Large (Preview)1002000023.9%
Grok 4 Fast4842180021.5%
Grok 45834140021.3%
DeepSeek V3 (2024-12-26)692570020.4%
Stealth: Aurora Alpha100000020.0%
Claude 3.5 Haiku100000020.0%
Llama 3.1 70B100000020.0%
Z.AI GLM 4.528252516018.9%
GPT-4o Mini (temp=1)661800016.8%
ByteDance Seed 1.683000016.7%
Mistral NeMO80000015.9%
GPT-4.1 Mini422570014.8%
Hermes 3 70B63000012.5%
Ministral 3 14B54000010.7%
WizardLM 2 8x22b4200008.3%
Ministral 3B4200008.3%
Ministral 8B25140007.8%
Arcee AI: Trinity Mini3400006.8%
DeepSeek V3.23100006.3%
Z.AI GLM 4.7 Flash3000006.0%
Gemma 3 27B1870005.0%
Qwen 2.5 72B2500005.0%
GPT-4.1 Nano2500005.0%
Mistral Small 3.2 24B2000003.9%
Gemini 3 Flash (Preview)1800003.6%
Gemini 2.5 Flash1070003.4%
GPT-4o, Aug. 6th (temp=1)1400002.8%
Z.AI GLM 4.6000000.0%
Qwen 3.5 Plus (2026-02-15)000000.0%
GPT-4o, May 13th (temp=0)000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
DeepSeek V3.1000000.0%
GPT-4o Mini (temp=0)000000.0%
Gemma 3 12B000000.0%
Gemini 2.5 Flash Lite000000.0%
Ministral 3 3B000000.0%
Gemma 3 4B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Qwen 3.5 397B A17B1001001001009699.1%
Claude Sonnet 4.6100100100968295.4%
Z.AI GLM 510010099837391.1%
Llama 3.1 8B10010096877290.8%
GPT-5 Mini100100100925489.0%
Claude Opus 4.6100100100974287.8%
DeepSeek V3 (2025-03-24)100100100756387.5%
GPT-5.11009283767284.4%
Minimax M2.5100100100693781.2%
Claude 3.5 Haiku100100100100080.0%
Rocinante 12B100100100100080.0%
GPT-5100100100742579.8%
Claude 3.7 Sonnet898783804576.7%
Claude 3.5 Sonnet10010083633776.5%
Claude Sonnet 41008580674876.0%
Claude Opus 41008775644273.5%
Claude Opus 4.51008977494872.8%
Claude Sonnet 4.51007367543064.8%
MoonshotAI: Kimi K2.5100896358062.0%
Llama 3.1 Nemotron 70B1007563342559.3%
Claude Haiku 4.5807572571159.0%
Gemini 3.1 Pro (Preview)100100660053.2%
GPT-5.2805855343251.7%
Mistral Large 2877534201445.9%
Hermes 3 70B87664234045.7%
GPT-4.1806642311045.7%
Grok 4 Fast10069540044.6%
Arcee AI: Trinity Large (Preview)100634218044.4%
Hermes 3 405B69695814743.7%
Mistral Small Creative10075250040.0%
Ministral 3 14B10080200039.9%
GPT-5 Nano56544219535.1%
ByteDance Seed 1.610051140033.0%
Gemini 3 Pro (Preview)543430161128.9%
Mistral Medium 3.17254180028.8%
Writer: Palmyra X566421814027.9%
Mistral Large 35448340027.1%
GPT-4o, Aug. 6th (temp=1)6942200026.2%
ByteDance Seed 1.6 Flash6336280025.2%
Z.AI GLM 4.75130298023.6%
DeepSeek V3 (2024-12-26)1001800023.6%
Grok 4.1 Fast4637320023.0%
o4 Mini544800020.3%
Grok 4100000020.0%
GPT-4o, May 13th (temp=1)100000020.0%
WizardLM 2 8x22b89000017.9%
Gemma 3 27B543100017.0%
Gemini 2.5 Flash71000014.3%
Z.AI GLM 4.5511400013.1%
Z.AI GLM 4.7 Flash3216140012.5%
Llama 3.1 70B63000012.5%
DeepSeek V3.254700012.2%
Mistral Large54000010.7%
Gemini 2.5 Pro4600009.2%
GPT-4.1 Nano2500005.0%
Ministral 3B2500005.0%
o4 Mini High2000003.9%
DeepSeek V3.12000003.9%
GPT-4.1 Mini10100003.8%
Cohere Command R+ (Aug. 2024)1800003.6%
Gemini 2.5 Flash Lite1400002.8%
Qwen 3.5 Plus (2026-02-15)600001.2%
Z.AI GLM 4.6000000.0%
Gemini 3 Flash (Preview)000000.0%
Stealth: Aurora Alpha000000.0%
GPT-4o, May 13th (temp=0)000000.0%
DeepSeek-V2 Chat000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
GPT-4o Mini (temp=1)000000.0%
GPT-4o Mini (temp=0)000000.0%
Gemma 3 12B000000.0%
Qwen 2.5 72B000000.0%
Mistral Small 3.2 24B000000.0%
Claude 3 Haiku000000.0%
Ministral 3 8B000000.0%
Arcee AI: Trinity Mini000000.0%
Ministral 3 3B000000.0%
Mistral NeMO000000.0%
Gemma 3 4B000000.0%
Ministral 8B000000.0%
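
Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 69 | 93.9%
Claude Opus 4 | 100 | 100 | 94 | 92 | 75 | 92.1%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 42 | 88.3%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4 Fast | 100 | 100 | 100 | 75 | 25 | 80.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 66 | 25 | 78.2%
GPT-5 Nano | 100 | 100 | 63 | 48 | 46 | 71.3%
GPT-5 | 100 | 92 | 66 | 58 | 36 | 70.4%
o4 Mini High | 100 | 87 | 63 | 54 | 42 | 68.9%
Claude 3.5 Sonnet | 100 | 100 | 100 | 42 | 0 | 68.3%
Writer: Palmyra X5 | 92 | 83 | 80 | 63 | 10 | 65.3%
GPT-5.1 | 100 | 100 | 67 | 42 | 6 | 63.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 14 | 0 | 62.8%
GPT-5 Mini | 100 | 100 | 75 | 34 | 0 | 61.8%
GPT-4.1 | 100 | 100 | 89 | 14 | 0 | 60.6%
Z.AI GLM 5 | 100 | 100 | 100 | 0 | 0 | 60.0%
Hermes 3 405B | 100 | 100 | 100 | 0 | 0 | 60.0%
Llama 3.1 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 0 | 0 | 60.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
Ministral 3 14B | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude Opus 4.5 | 100 | 100 | 87 | 0 | 0 | 57.3%
Rocinante 12B | 100 | 100 | 87 | 0 | 0 | 57.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 69 | 0 | 0 | 53.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 42 | 25 | 0 | 53.3%
Ministral 8B | 100 | 100 | 54 | 0 | 0 | 50.7%
GPT-5.2 | 100 | 72 | 63 | 0 | 0 | 47.0%
Grok 4.1 Fast | 77 | 75 | 63 | 14 | 0 | 45.8%
o4 Mini | 63 | 63 | 54 | 25 | 0 | 40.7%
Mistral Large 3 | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4.1 Mini | 100 | 100 | 0 | 0 | 0 | 40.0%
DeepSeek V3.1 | 100 | 100 | 0 | 0 | 0 | 40.0%
Mistral Large | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemma 3 27B | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4.1 Nano | 100 | 100 | 0 | 0 | 0 | 40.0%
Grok 4 | 100 | 34 | 25 | 0 | 0 | 31.8%
ByteDance Seed 1.6 Flash | 96 | 31 | 31 | 0 | 0 | 31.8%
Mistral Small 3.2 24B | 100 | 42 | 0 | 0 | 0 | 28.3%
Cohere Command R+ (Aug. 2024) | 100 | 42 | 0 | 0 | 0 | 28.3%
Claude 3.7 Sonnet | 77 | 45 | 18 | 0 | 0 | 28.0%
DeepSeek V3.2 | 100 | 25 | 0 | 0 | 0 | 25.0%
GPT-4o, May 13th (temp=1) | 63 | 63 | 0 | 0 | 0 | 25.0%
Mistral Large 2 | 100 | 25 | 0 | 0 | 0 | 25.0%
Mistral Medium 3.1 | 72 | 34 | 0 | 0 | 0 | 21.3%
ByteDance Seed 1.6 | 100 | 0 | 0 | 0 | 0 | 20.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek-V2 Chat | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 8B | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
Gemini 3 Flash (Preview) | 87 | 10 | 0 | 0 | 0 | 19.2%
Arcee AI: Trinity Large (Preview) | 54 | 25 | 0 | 0 | 0 | 15.7%
Z.AI GLM 4.7 | 63 | 10 | 0 | 0 | 0 | 14.4%
Z.AI GLM 4.7 Flash | 66 | 0 | 0 | 0 | 0 | 13.2%
Hermes 3 70B | 63 | 0 | 0 | 0 | 0 | 12.5%
GPT-4o Mini (temp=1) | 25 | 0 | 0 | 0 | 0 | 5.0%
Mistral NeMO | 25 | 0 | 0 | 0 | 0 | 5.0%
Gemma 3 12B | 14 | 0 | 0 | 0 | 0 | 2.8%
Mistral Small Creative | 14 | 0 | 0 | 0 | 0 | 2.8%
Gemini 3 Pro (Preview) | 7 | 0 | 0 | 0 | 0 | 1.5%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%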

Novelcrafter Default Prompt

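Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 80 | 95.9%
GPT-5.1 | 100 | 100 | 100 | 92 | 80 | 94.2%
Claude Haiku 4.5 | 100 | 100 | 89 | 87 | 69 | 89.1%
Claude Opus 4.6 | 100 | 100 | 92 | 92 | 58 | 88.3%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 83 | 48 | 86.3%
GPT-5.2 | 100 | 100 | 78 | 77 | 65 | 83.9%
GPT-5 | 100 | 100 | 87 | 78 | 50 | 82.9%
Minimax M2.5 | 100 | 97 | 75 | 69 | 66 | 81.6%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Medium 3.1 | 100 | 100 | 87 | 87 | 25 | 79.6%
o4 Mini High | 100 | 100 | 100 | 75 | 0 | 75.0%
Llama 3.1 8B | 100 | 100 | 100 | 54 | 0 | 70.7%
o4 Mini | 100 | 100 | 100 | 25 | 18 | 68.6%
Mistral Large 2 | 100 | 100 | 83 | 54 | 0 | 67.4%
Grok 4.1 Fast | 100 | 100 | 100 | 34 | 0 | 66.8%
Writer: Palmyra X5 | 89 | 69 | 66 | 54 | 25 | 60.7%
GPT-5 Mini | 100 | 95 | 56 | 38 | 12 | 60.1%
Mistral Large 3 | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude Sonnet 4.6 | 100 | 80 | 63 | 54 | 0 | 59.1%
Claude Opus 4.5 | 99 | 63 | 63 | 42 | 0 | 53.1%
Gemini 3 Flash (Preview) | 100 | 100 | 48 | 10 | 0 | 51.5%
Ministral 8B | 100 | 100 | 54 | 0 | 0 | 50.7%
Claude Sonnet 4.5 | 69 | 66 | 58 | 48 | 7 | 49.9%
Z.AI GLM 5 | 100 | 69 | 54 | 25 | 0 | 49.6%
Claude Opus 4 | 96 | 63 | 54 | 25 | 0 | 47.3%
Llama 3.1 70B | 100 | 100 | 25 | 0 | 0 | 45.0%
Z.AI GLM 4.7 Flash | 54 | 54 | 51 | 38 | 22 | 43.5%
Claude 3.7 Sonnet | 69 | 57 | 54 | 25 | 0 | 41.0%
Claude Sonnet 4 | 87 | 83 | 25 | 10 | 0 | 40.9%
GPT-4.1 Nano | 100 | 63 | 42 | 0 | 0 | 40.8%
Grok 4 | 100 | 100 | 0 | 0 | 0 | 40.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 0 | 0 | 0 | 40.0%
Mistral Large | 100 | 100 | 0 | 0 | 0 | 40.0%
Ministral 3 3B | 100 | 100 | 0 | 0 | 0 | 40.0%
Llama 3.1 Nemotron 70B | 100 | 54 | 42 | 0 | 0 | 39.0%
ByteDance Seed 1.6 Flash | 82 | 72 | 25 | 6 | 0 | 37.0%
Z.AI GLM 4.7 | 100 | 38 | 37 | 0 | 0 | 35.0%
Gemini 3 Pro (Preview) | 75 | 54 | 36 | 6 | 0 | 34.0%
DeepSeek V3.2 | 100 | 69 | 0 | 0 | 0 | 33.9%
WizardLM 2 8x22b | 92 | 72 | 0 | 0 | 0 | 32.8%
Ministral 3 8B | 100 | 58 | 0 | 0 | 0 | 31.7%
Rocinante 12B | 100 | 37 | 0 | 0 | 0 | 27.4%
Mistral Small Creative | 75 | 54 | 0 | 0 | 0 | 25.7%
Arcee AI: Trinity Large (Preview) | 69 | 58 | 0 | 0 | 0 | 25.6%
GPT-5 Nano | 91 | 35 | 0 | 0 | 0 | 25.1%
Claude 3.5 Sonnet | 83 | 42 | 0 | 0 | 0 | 25.0%
DeepSeek V3.1 | 100 | 18 | 0 | 0 | 0 | 23.6%
GPT-4.1 | 75 | 42 | 0 | 0 | 0 | 23.3%
Z.AI GLM 4.5 | 75 | 42 | 0 | 0 | 0 | 23.3%
Gemini 2.5 Pro | 100 | 0 | 0 | 0 | 0 | 20.0%
Gemini 2.5 Flash | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 70B | 92 | 0 | 0 | 0 | 0 | 18.3%
GPT-4o, May 13th (temp=1) | 88 | 0 | 0 | 0 | 0 | 17.6%
Grok 4 Fast | 83 | 0 | 0 | 0 | 0 | 16.7%
GPT-4.1 Mini | 63 | 0 | 0 | 0 | 0 | 12.5%
Ministral 3 14B | 54 | 0 | 0 | 0 | 0 | 10.7%
Z.AI GLM 4.6 | 42 | 0 | 0 | 0 | 0 | 8.3%
GPT-4o, May 13th (temp=0) | 25 | 0 | 0 | 0 | 0 | 5.0%
GPT-4o, Aug. 6th (temp=0) | 25 | 0 | 0 | 0 | 0 | 5.0%
GPT-4o, Aug. 6th (temp=1) | 18 | 0 | 0 | 0 | 0 | 3.6%
Gemma 3 12B | 7 | 0 | 0 | 0 | 0 | 1.5%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 3.5 Plus (2026-02-15) | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek-V2 Chat | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Cohere Command R+ (Aug. 2024) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%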
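
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 54 | 90.7%
GPT-5.1 | 100 | 100 | 100 | 100 | 48 | 89.6%
Ministral 3B | 100 | 100 | 100 | 100 | 42 | 88.3%
GPT-5 | 100 | 100 | 96 | 80 | 54 | 85.7%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 25 | 85.0%
o4 Mini High | 100 | 100 | 100 | 100 | 14 | 82.8%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 0 | 80.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 8B | 100 | 100 | 100 | 75 | 0 | 75.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 69 | 0 | 73.9%
Mistral Large | 100 | 100 | 100 | 34 | 0 | 66.8%
Mistral Small Creative | 100 | 100 | 100 | 25 | 0 | 65.0%
GPT-5 Mini | 100 | 100 | 64 | 58 | 0 | 64.5%
ByteDance Seed 1.6 | 100 | 100 | 42 | 42 | 25 | 61.7%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 0 | 0 | 60.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 0 | 0 | 60.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 0 | 0 | 60.0%
Hermes 3 405B | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 0 | 0 | 60.0%
Ministral 3 8B | 100 | 100 | 100 | 0 | 0 | 60.0%
Ministral 8B | 100 | 100 | 100 | 0 | 0 | 60.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 0 | 0 | 60.0%
Ministral 3 14B | 100 | 100 | 69 | 0 | 0 | 53.9%
Writer: Palmyra X5 | 100 | 100 | 42 | 25 | 0 | 53.3%
GPT-5 Nano | 91 | 89 | 45 | 42 | 0 | 53.3%
Claude Sonnet 4 | 100 | 100 | 42 | 0 | 0 | 48.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 25 | 0 | 0 | 45.0%
Hermes 3 70B | 100 | 100 | 25 | 0 | 0 | 45.0%
o4 Mini | 100 | 42 | 25 | 18 | 18 | 40.5%
Z.AI GLM 4.7 | 100 | 100 | 0 | 0 | 0 | 40.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 0 | 0 | 0 | 40.0%
DeepSeek-V2 Chat | 100 | 100 | 0 | 0 | 0 | 40.0%
Llama 3.1 70B | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Flash Lite | 100 | 100 | 0 | 0 | 0 | 40.0%
Claude 3 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0%
Ministral 3 3B | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4.1 Nano | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-5.2 | 100 | 54 | 34 | 7 | 0 | 39.0%
ByteDance Seed 1.6 Flash | 100 | 66 | 25 | 0 | 0 | 38.2%
Llama 3.1 Nemotron 70B | 100 | 54 | 25 | 0 | 0 | 35.7%
Grok 4.1 Fast | 100 | 48 | 25 | 0 | 0 | 34.6%
Claude Opus 4 | 100 | 72 | 0 | 0 | 0 | 34.5%
GPT-4o, May 13th (temp=1) | 100 | 69 | 0 | 0 | 0 | 33.9%
Claude 3.7 Sonnet | 63 | 50 | 34 | 18 | 0 | 32.9%
Grok 4 Fast | 100 | 34 | 25 | 0 | 0 | 31.8%
GPT-4.1 Mini | 100 | 42 | 0 | 0 | 0 | 28.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 25 | 14 | 0 | 0 | 27.8%
GPT-4o, Aug. 6th (temp=0) | 100 | 0 | 0 | 0 | 0 | 20.0%
Gemma 3 27B | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4.1 | 54 | 25 | 0 | 0 | 0 | 15.7%
GPT-4o Mini (temp=1) | 58 | 0 | 0 | 0 | 0 | 11.7%
Grok 4 | 14 | 0 | 0 | 0 | 0 | 2.8%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Large (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%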
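
Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 99 | 99.8%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 99 | 99.7%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 96 | 99.1%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 92 | 98.3%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 89 | 97.9%
Claude 3.7 Sonnet | 100 | 100 | 100 | 89 | 89 | 95.7%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 75 | 95.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 65 | 93.1%
Writer: Palmyra X5 | 100 | 100 | 100 | 97 | 66 | 92.7%
Claude Opus 4 | 100 | 100 | 89 | 82 | 82 | 90.5%
GPT-5.1 | 100 | 100 | 92 | 82 | 76 | 90.1%
Claude Opus 4.5 | 100 | 100 | 100 | 87 | 63 | 89.8%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 89 | 54 | 88.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 42 | 88.3%
Claude Sonnet 4.5 | 100 | 100 | 100 | 66 | 66 | 86.5%
Mistral Large 2 | 100 | 100 | 100 | 87 | 25 | 82.3%
o4 Mini | 100 | 100 | 94 | 87 | 25 | 81.1%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.5 Sonnet | 100 | 100 | 94 | 92 | 10 | 79.0%
Llama 3.1 8B | 100 | 100 | 100 | 94 | 0 | 78.8%
GPT-5.2 | 100 | 81 | 74 | 63 | 58 | 75.1%
GPT-4.1 | 100 | 87 | 75 | 72 | 25 | 71.8%
o4 Mini High | 100 | 100 | 100 | 25 | 14 | 67.8%
Z.AI GLM 4.6 | 100 | 100 | 63 | 48 | 25 | 67.1%
GPT-5 Nano | 100 | 100 | 100 | 31 | 0 | 66.3%
Claude Sonnet 4 | 100 | 100 | 75 | 50 | 0 | 65.0%
Hermes 3 70B | 100 | 100 | 87 | 25 | 0 | 62.3%
Z.AI GLM 4.5 | 100 | 100 | 69 | 25 | 16 | 62.1%
WizardLM 2 8x22b | 100 | 100 | 57 | 31 | 12 | 60.0%
Mistral Medium 3.1 | 99 | 75 | 58 | 42 | 25 | 59.7%
Grok 4.1 Fast | 100 | 100 | 54 | 42 | 0 | 59.0%
Mistral Small Creative | 100 | 100 | 89 | 0 | 0 | 57.9%
Gemini 3 Pro (Preview) | 100 | 97 | 87 | 0 | 0 | 56.8%
Hermes 3 405B | 100 | 77 | 63 | 34 | 0 | 54.8%
Z.AI GLM 4.7 Flash | 100 | 94 | 54 | 14 | 0 | 52.2%
DeepSeek V3.2 | 85 | 77 | 57 | 37 | 0 | 51.2%
ByteDance Seed 1.6 Flash | 91 | 89 | 34 | 21 | 20 | 50.9%
Ministral 3 14B | 80 | 63 | 54 | 42 | 0 | 47.5%
Mistral Large 3 | 100 | 80 | 54 | 0 | 0 | 46.6%
Ministral 3 8B | 100 | 42 | 37 | 34 | 20 | 46.5%
Arcee AI: Trinity Large (Preview) | 100 | 89 | 42 | 0 | 0 | 46.2%
GPT-4o, May 13th (temp=1) | 100 | 54 | 30 | 30 | 14 | 45.4%
GPT-4o, Aug. 6th (temp=1) | 100 | 80 | 42 | 0 | 0 | 44.2%
Gemini 3 Flash (Preview) | 100 | 63 | 54 | 0 | 0 | 43.2%
Llama 3.1 Nemotron 70B | 100 | 63 | 48 | 0 | 0 | 42.1%
Ministral 3B | 100 | 100 | 0 | 0 | 0 | 40.0%
Cohere Command R+ (Aug. 2024) | 100 | 80 | 14 | 0 | 0 | 38.7%
Z.AI GLM 4.7 | 83 | 54 | 25 | 14 | 0 | 35.2%
Gemini 2.5 Pro | 89 | 75 | 0 | 0 | 0 | 32.9%
Ministral 3 3B | 100 | 63 | 0 | 0 | 0 | 32.5%
Qwen 3.5 Plus (2026-02-15) | 77 | 75 | 5 | 0 | 0 | 31.5%
Gemma 3 27B | 63 | 50 | 28 | 0 | 0 | 28.1%
DeepSeek-V2 Chat | 83 | 31 | 25 | 0 | 0 | 27.9%
DeepSeek V3.1 | 63 | 37 | 20 | 14 | 0 | 26.6%
Gemini 2.5 Flash | 66 | 54 | 0 | 0 | 0 | 23.9%
Mistral Large | 100 | 10 | 0 | 0 | 0 | 21.9%
Grok 4 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4.1 Nano | 100 | 0 | 0 | 0 | 0 | 20.0%
Grok 4 Fast | 96 | 0 | 0 | 0 | 0 | 19.1%
Ministral 8B | 54 | 42 | 0 | 0 | 0 | 19.0%
Gemma 3 12B | 54 | 0 | 0 | 0 | 0 | 10.7%
Arcee AI: Trinity Mini | 46 | 0 | 0 | 0 | 0 | 9.2%
Mistral Small 3.2 24B | 42 | 0 | 0 | 0 | 0 | 8.3%
Rocinante 12B | 34 | 7 | 0 | 0 | 0 | 8.3%
GPT-4o Mini (temp=1) | 18 | 0 | 0 | 0 | 0 | 3.6%
GPT-4.1 Mini | 7 | 7 | 0 | 0 | 0 | 2.9%
Llama 3.1 70B | 14 | 0 | 0 | 0 | 0 | 2.8%
Gemini 2.5 Flash Lite | 10 | 0 | 0 | 0 | 0 | 1.9%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2024-12-26) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%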
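
Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 99 | 89 | 97.6%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 83 | 96.7%
Mistral Large 2 | 100 | 100 | 100 | 100 | 69 | 93.9%
Claude Sonnet 4.6 | 100 | 100 | 100 | 92 | 75 | 93.3%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 63 | 92.5%
GPT-5 Mini | 96 | 93 | 91 | 86 | 82 | 89.4%
o4 Mini High | 100 | 100 | 96 | 87 | 63 | 88.9%
o4 Mini | 100 | 100 | 80 | 80 | 75 | 86.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 25 | 85.0%
Claude Opus 4 | 100 | 100 | 96 | 89 | 37 | 84.3%
DeepSeek V3 (2025-03-24) | 100 | 100 | 75 | 75 | 54 | 80.7%
Writer: Palmyra X5 | 100 | 92 | 87 | 75 | 48 | 80.3%
Mistral Large | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 46 | 42 | 77.5%
Claude Sonnet 4.5 | 100 | 100 | 100 | 80 | 0 | 75.9%
Llama 3.1 70B | 100 | 100 | 100 | 63 | 0 | 72.5%
Rocinante 12B | 100 | 100 | 100 | 63 | 0 | 72.5%
Claude Haiku 4.5 | 100 | 97 | 83 | 80 | 0 | 72.0%
GPT-5.1 | 100 | 100 | 91 | 66 | 0 | 71.4%
Z.AI GLM 4.5 | 100 | 100 | 75 | 66 | 0 | 68.2%
Claude Opus 4.5 | 87 | 83 | 80 | 51 | 34 | 66.9%
Claude Sonnet 4 | 100 | 100 | 75 | 42 | 14 | 66.1%
ByteDance Seed 1.6 Flash | 100 | 94 | 73 | 29 | 25 | 64.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 18 | 0 | 63.6%
MoonshotAI: Kimi K2.5 | 100 | 97 | 69 | 25 | 21 | 62.5%
Hermes 3 405B | 100 | 92 | 75 | 42 | 0 | 61.7%
Claude 3.7 Sonnet | 99 | 91 | 69 | 42 | 6 | 61.3%
Mistral Large 3 | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-4.1 | 100 | 75 | 54 | 48 | 14 | 58.1%
Gemini 2.5 Pro | 100 | 100 | 63 | 25 | 0 | 57.5%
Z.AI GLM 5 | 100 | 100 | 72 | 14 | 0 | 57.3%
WizardLM 2 8x22b | 69 | 65 | 54 | 54 | 42 | 56.7%
Claude 3.5 Sonnet | 100 | 100 | 42 | 34 | 0 | 55.2%
Minimax M2.5 | 100 | 63 | 57 | 42 | 7 | 53.7%
GPT-5.2 | 100 | 92 | 25 | 25 | 18 | 51.9%
Z.AI GLM 4.7 | 75 | 69 | 66 | 34 | 14 | 51.7%
Mistral Small Creative | 100 | 63 | 54 | 42 | 0 | 51.5%
GPT-4.1 Nano | 100 | 63 | 63 | 31 | 0 | 51.2%
GPT-4o, May 13th (temp=1) | 87 | 72 | 50 | 37 | 0 | 49.1%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 42 | 0 | 0 | 48.3%
Llama 3.1 8B | 100 | 100 | 37 | 0 | 0 | 47.4%
Mistral Medium 3.1 | 75 | 69 | 42 | 42 | 0 | 45.6%
Gemini 3 Pro (Preview) | 72 | 71 | 58 | 6 | 4 | 42.5%
Claude 3.5 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0%
Grok 4 Fast | 100 | 63 | 34 | 0 | 0 | 39.3%
Claude 3 Haiku | 100 | 50 | 42 | 0 | 0 | 38.3%
DeepSeek V3 (2024-12-26) | 100 | 63 | 25 | 0 | 0 | 37.5%
Grok 4 | 100 | 60 | 7 | 7 | 0 | 34.9%
Ministral 3 14B | 100 | 63 | 0 | 0 | 0 | 32.5%
Z.AI GLM 4.6 | 100 | 42 | 18 | 0 | 0 | 31.9%
GPT-4.1 Mini | 69 | 69 | 18 | 0 | 0 | 31.3%
DeepSeek V3.2 | 63 | 54 | 10 | 0 | 0 | 25.1%
Llama 3.1 Nemotron 70B | 75 | 25 | 14 | 0 | 0 | 22.8%
GPT-5 Nano | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 8B | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 8B | 75 | 25 | 0 | 0 | 0 | 20.0%
Z.AI GLM 4.7 Flash | 47 | 25 | 16 | 8 | 0 | 19.3%
Gemma 3 27B | 66 | 20 | 0 | 0 | 0 | 17.2%
DeepSeek-V2 Chat | 83 | 0 | 0 | 0 | 0 | 16.7%
Qwen 3.5 Plus (2026-02-15) | 46 | 28 | 8 | 0 | 0 | 16.5%
Gemini 2.5 Flash | 48 | 14 | 14 | 0 | 0 | 15.2%
GPT-4o Mini (temp=1) | 48 | 20 | 0 | 0 | 0 | 13.6%
DeepSeek V3.1 | 42 | 25 | 0 | 0 | 0 | 13.3%
GPT-4o, May 13th (temp=0) | 30 | 25 | 10 | 0 | 0 | 12.9%
Gemma 3 12B | 30 | 30 | 0 | 0 | 0 | 11.9%
Gemini 2.5 Flash Lite | 58 | 0 | 0 | 0 | 0 | 11.7%
Gemini 3 Flash (Preview) | 30 | 18 | 0 | 0 | 0 | 9.5%
Stealth: Aurora Alpha | 14 | 0 | 0 | 0 | 0 | 2.8%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%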
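
Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 97 | 82 | 95.9%
GPT-5 Mini | 100 | 100 | 100 | 100 | 75 | 95.0%
GPT-5 | 100 | 100 | 100 | 100 | 72 | 94.5%
Claude Opus 4.6 | 100 | 100 | 100 | 92 | 75 | 93.3%
o4 Mini | 100 | 100 | 100 | 87 | 80 | 93.2%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 50 | 90.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 87 | 63 | 89.8%
Z.AI GLM 5 | 100 | 100 | 100 | 99 | 46 | 88.9%
DeepSeek V3 (2025-03-24) | 100 | 100 | 92 | 80 | 69 | 88.1%
Claude Opus 4.5 | 100 | 100 | 82 | 80 | 51 | 82.4%
Minimax M2.5 | 100 | 100 | 80 | 67 | 60 | 81.3%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 70B | 100 | 97 | 89 | 63 | 42 | 78.1%
Rocinante 12B | 100 | 100 | 98 | 89 | 0 | 77.5%
o4 Mini High | 100 | 100 | 100 | 54 | 31 | 77.0%
GPT-5.2 | 100 | 98 | 94 | 75 | 0 | 73.3%
Claude 3.7 Sonnet | 99 | 88 | 66 | 54 | 49 | 71.1%
Mistral Small Creative | 100 | 100 | 100 | 42 | 0 | 68.3%
Claude Sonnet 4.5 | 100 | 100 | 97 | 38 | 0 | 67.1%
MoonshotAI: Kimi K2.5 | 96 | 89 | 83 | 37 | 5 | 62.0%
Gemini 3 Pro (Preview) | 100 | 100 | 85 | 21 | 0 | 61.2%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude Sonnet 4 | 100 | 75 | 48 | 37 | 34 | 58.8%
Mistral Large | 100 | 100 | 75 | 14 | 0 | 57.8%
Claude 3.5 Sonnet | 99 | 92 | 63 | 25 | 0 | 55.6%
Writer: Palmyra X5 | 99 | 66 | 54 | 34 | 25 | 55.5%
Ministral 3 14B | 100 | 100 | 25 | 25 | 25 | 55.0%
GPT-5 Nano | 100 | 100 | 73 | 0 | 0 | 54.7%
DeepSeek V3.2 | 92 | 75 | 63 | 42 | 0 | 54.2%
Mistral Medium 3.1 | 87 | 77 | 69 | 25 | 0 | 51.7%
GPT-4.1 | 92 | 83 | 75 | 0 | 0 | 50.0%
Z.AI GLM 4.6 | 100 | 100 | 42 | 0 | 0 | 48.3%
Grok 4.1 Fast | 87 | 84 | 44 | 16 | 0 | 46.3%
Mistral Large 2 | 87 | 75 | 69 | 0 | 0 | 46.2%
Grok 4 Fast | 100 | 83 | 25 | 0 | 0 | 41.7%
DeepSeek-V2 Chat | 100 | 87 | 18 | 0 | 0 | 40.9%
ByteDance Seed 1.6 Flash | 100 | 72 | 31 | 0 | 0 | 40.7%
Llama 3.1 8B | 100 | 100 | 0 | 0 | 0 | 40.0%
Z.AI GLM 4.5 | 100 | 69 | 14 | 14 | 0 | 39.4%
Mistral Large 3 | 100 | 69 | 25 | 0 | 0 | 38.9%
Ministral 3B | 100 | 87 | 0 | 0 | 0 | 37.3%
Z.AI GLM 4.7 | 69 | 58 | 54 | 0 | 0 | 36.3%
Hermes 3 405B | 100 | 69 | 10 | 0 | 0 | 35.8%
Llama 3.1 Nemotron 70B | 63 | 63 | 25 | 14 | 0 | 32.8%
Gemini 2.5 Pro | 87 | 48 | 25 | 0 | 0 | 31.9%
Gemini 2.5 Flash | 75 | 65 | 16 | 0 | 0 | 31.3%
Llama 3.1 70B | 100 | 54 | 0 | 0 | 0 | 30.7%
GPT-4o Mini (temp=1) | 72 | 42 | 18 | 0 | 0 | 26.4%
Cohere Command R+ (Aug. 2024) | 94 | 37 | 0 | 0 | 0 | 26.1%
Arcee AI: Trinity Mini | 69 | 42 | 0 | 0 | 0 | 22.2%
Ministral 3 3B | 54 | 54 | 0 | 0 | 0 | 21.4%
Grok 4 | 80 | 20 | 7 | 0 | 0 | 21.3%
Qwen 3.5 Plus (2026-02-15) | 63 | 39 | 0 | 0 | 0 | 20.3%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Small 3.2 24B | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 8B | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, May 13th (temp=1) | 89 | 10 | 0 | 0 | 0 | 19.8%
Z.AI GLM 4.7 Flash | 58 | 25 | 14 | 0 | 0 | 19.4%
GPT-4o, Aug. 6th (temp=1) | 87 | 0 | 0 | 0 | 0 | 17.3%
WizardLM 2 8x22b | 37 | 31 | 18 | 0 | 0 | 17.2%
Gemini 2.5 Flash Lite | 30 | 25 | 0 | 0 | 0 | 11.0%
Gemma 3 27B | 46 | 0 | 0 | 0 | 0 | 9.2%
GPT-4.1 Mini | 34 | 0 | 0 | 0 | 0 | 6.8%
Arcee AI: Trinity Large (Preview) | 34 | 0 | 0 | 0 | 0 | 6.8%
Stealth: Aurora Alpha | 25 | 0 | 0 | 0 | 0 | 5.0%
DeepSeek V3.1 | 7 | 0 | 0 | 0 | 0 | 1.5%
Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%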
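
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 87 | 97.3%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 83 | 96.7%
GPT-5 Mini | 100 | 100 | 100 | 82 | 77 | 91.9%
Claude Opus 4 | 100 | 100 | 100 | 100 | 54 | 90.7%
GPT-5 | 100 | 100 | 87 | 82 | 80 | 89.6%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 25 | 85.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 89 | 0 | 77.9%
Writer: Palmyra X5 | 100 | 80 | 69 | 69 | 63 | 76.2%
GPT-4.1 | 100 | 100 | 100 | 75 | 0 | 75.0%
GPT-5.1 | 100 | 100 | 89 | 83 | 0 | 74.5%
o4 Mini | 100 | 100 | 83 | 80 | 0 | 72.6%
o4 Mini High | 100 | 100 | 100 | 63 | 0 | 72.5%
Claude Sonnet 4.5 | 100 | 100 | 100 | 54 | 0 | 70.7%
Mistral Medium 3.1 | 100 | 100 | 100 | 42 | 0 | 68.3%
DeepSeek V3.1 | 100 | 100 | 100 | 42 | 0 | 68.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 42 | 0 | 68.3%
Claude 3.7 Sonnet | 100 | 94 | 89 | 34 | 14 | 66.2%
Claude Opus 4.5 | 100 | 72 | 69 | 48 | 34 | 64.8%
Z.AI GLM 4.7 Flash | 100 | 100 | 54 | 42 | 25 | 64.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Large 2 | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 0 | 0 | 60.0%
Rocinante 12B | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Small 3.2 24B | 100 | 100 | 97 | 0 | 0 | 59.4%
DeepSeek V3.2 | 100 | 100 | 69 | 0 | 0 | 53.9%
WizardLM 2 8x22b | 100 | 100 | 69 | 0 | 0 | 53.9%
GPT-5.2 | 100 | 72 | 54 | 37 | 0 | 52.5%
DeepSeek V3 (2025-03-24) | 100 | 100 | 54 | 0 | 0 | 50.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 54 | 0 | 0 | 50.7%
Hermes 3 405B | 100 | 100 | 54 | 0 | 0 | 50.7%
Z.AI GLM 5 | 100 | 100 | 42 | 0 | 0 | 48.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 18 | 14 | 0 | 46.3%
ByteDance Seed 1.6 Flash | 100 | 63 | 42 | 22 | 0 | 45.1%
Hermes 3 70B | 100 | 69 | 54 | 0 | 0 | 44.6%
ByteDance Seed 1.6 | 100 | 100 | 0 | 0 | 0 | 40.0%
Mistral Large 3 | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 0 | 0 | 0 | 40.0%
DeepSeek-V2 Chat | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 0 | 0 | 0 | 40.0%
Mistral Large | 100 | 100 | 0 | 0 | 0 | 40.0%
Ministral 3 14B | 100 | 100 | 0 | 0 | 0 | 40.0%
Grok 4 Fast | 100 | 54 | 42 | 0 | 0 | 39.0%
GPT-5 Nano | 100 | 42 | 0 | 0 | 0 | 28.3%
Arcee AI: Trinity Large (Preview) | 100 | 42 | 0 | 0 | 0 | 28.3%
Gemma 3 27B | 69 | 63 | 0 | 0 | 0 | 26.4%
Llama 3.1 8B | 75 | 42 | 0 | 0 | 0 | 23.3%
DeepSeek V3 (2024-12-26) | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Cohere Command R+ (Aug. 2024) | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
Grok 4 | 66 | 10 | 0 | 0 | 0 | 15.2%
Gemma 3 12B | 75 | 0 | 0 | 0 | 0 | 15.0%
Llama 3.1 Nemotron 70B | 54 | 14 | 0 | 0 | 0 | 13.5%
Gemini 3 Flash (Preview) | 66 | 0 | 0 | 0 | 0 | 13.2%
Gemini 3 Pro (Preview) | 34 | 14 | 0 | 0 | 0 | 9.6%
Qwen 3.5 Plus (2026-02-15) | 38 | 0 | 0 | 0 | 0 | 7.6%
GPT-4o Mini (temp=1) | 31 | 0 | 0 | 0 | 0 | 6.3%
Ministral 8B | 25 | 0 | 0 | 0 | 0 | 5.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%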