Em-dash & semicolon overuse

Test: Bad Writing Habits

Avg. Score: 40.5%
Scenarios: 18

Overall Performance

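| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
| --- | --- | --- | --- | --- | --- |
| 1 | Mistral NeMO | 95.0% | $0.0005 | 10.1s | 63% |
| 2 | Qwen 2.5 72B | 93.0% | $0.0010 | 36.7s | 64% |
| 3 | GPT-4o, May 13th (temp=0) | 92.9% | $0.035 | 14.1s | 64% |
| 4 | Gemini 2.5 Flash | 86.5% | $0.0052 | 10.6s | 50% |
| 5 | Gemini 2.5 Flash Lite | 82.2% | $0.0009 | 9.5s | 48% |
| 6 | Qwen 3.5 397B A17B | 98.3% | $0.014 | 3.0m | 78% |
| 7 | Z.AI GLM 4.6 | 84.0% | $0.0065 | 51.5s | 43% |
| 8 | Gemini 2.5 Pro | 83.7% | $0.036 | 36.2s | 50% |
| 9 | GPT-4o, Aug. 6th (temp=0) | 79.3% | $0.023 | 22.7s | 30% |
| 10 | Gemini 3.1 Pro (Preview) | 95.9% | $0.107 | 1.8m | 67% |
| 11 | Gemma 3 27B | 73.5% | $0.0006 | 52.6s | 30% |
| 12 | Hermes 3 405B | 74.3% | $0.0032 | 53.2s | 25% |
| 13 | Qwen 3.5 Plus (2026-02-15) | 71.3% | $0.0060 | 31.5s | 22% |
| 14 | GPT-4o Mini (temp=0) | 66.3% | $0.0012 | 34.8s | 24% |
| 15 | Claude 3 Haiku | 60.2% | $0.0025 | 14.9s | 12% |
| 16 | Stealth: Aurora Alpha | 50.7% | $0.0000 | 9.8s | 7% |
| 17 | Z.AI GLM 4.7 | 64.3% | $0.010 | 1.4m | 17% |
| 18 | Llama 3.1 70B | 54.4% | $0.0015 | 29.4s | 6% |
| 19 | DeepSeek V3.2 | 60.4% | $0.0014 | 1.9m | 19% |
| 20 | Cohere Command R+ (Aug. 2024) | 60.2% | $0.020 | 52.5s | 12% |
| 21 | DeepSeek V3.1 | 59.2% | $0.0020 | 1.8m | 19% |
| 22 | Hermes 3 70B | 57.3% | $0.0010 | 1.2m | 11% |
| 23 | Rocinante 12B | 46.1% | $0.0014 | 38.4s | 4% |
| 24 | Gemma 3 12B | 41.2% | $0.0004 | 41.3s | 8% |
| 25 | Mistral Small 3.2 24B | 88.0% | $0.0069 | 5.7m | 43% |
| 26 | GPT-5.1 | 69.6% | $0.054 | 1.8m | 18% |
| 27 | Z.AI GLM 4.7 Flash | 46.9% | $0.0017 | 1.2m | 8% |
| 28 | WizardLM 2 8x22b | 54.3% | $0.0026 | 1.8m | 11% |
| 29 | Arcee AI: Trinity Mini | 32.0% | $0.0003 | 9.2s | 2% |
| 30 | Arcee AI: Trinity Large (Preview) | 39.9% | $0.0000 | 43.6s | 3% |
| 31 | Llama 3.1 Nemotron 70B | 34.5% | $0.0038 | 31.7s | 3% |
| 32 | GPT-4o, May 13th (temp=1) | 40.0% | $0.033 | 14.4s | 5% |
| 33 | o4 Mini | 38.1% | $0.015 | 25.7s | 2% |
| 34 | Claude 3.5 Haiku | 26.7% | $0.0035 | 10.8s | 0% |
| 35 | Claude Sonnet 4.5 | 46.5% | $0.035 | 38.1s | 2% |
| 36 | o4 Mini High | 42.8% | $0.025 | 47.2s | 4% |
| 37 | Ministral 3 3B | 22.9% | $0.0005 | 11.1s | 0% |
| 38 | Z.AI GLM 5 | 41.9% | $0.0084 | 1.2m | 0% |
| 39 | Gemini 3 Flash (Preview) | 25.7% | $0.0078 | 19.6s | 0% |
| 40 | Z.AI GLM 4.5 | 28.3% | $0.0051 | 42.1s | 2% |
| 41 | Claude Haiku 4.5 | 27.3% | $0.011 | 21.6s | 0% |
| 42 | Ministral 3B | 17.7% | $0.0001 | 8.1s | 0% |
| 43 | Ministral 3 8B | 20.2% | $0.0008 | 19.6s | 0% |
| 44 | Mistral Large 3 | 23.8% | $0.0033 | 30.3s | 0% |
| 45 | GPT-5.2 | 55.9% | $0.056 | 1.5m | 7% |
| 46 | Mistral Large 2 | 25.7% | $0.013 | 29.4s | 0% |
| 47 | Ministral 8B | 14.0% | $0.0004 | 10.4s | 0% |
| 48 | Claude Sonnet 4.6 | 36.5% | $0.031 | 39.3s | 0% |
| 49 | DeepSeek V3 (2024-12-26) | 27.1% | $0.0021 | 54.6s | 0% |
| 50 | Mistral Large | 25.2% | $0.014 | 30.9s | 0% |
| 51 | GPT-5 | 69.0% | $0.065 | 2.8m | 17% |
| 52 | Grok 4.1 Fast | 21.2% | $0.0018 | 37.8s | 0% |
| 53 | Mistral Small Creative | 12.0% | $0.0007 | 9.1s | 0% |
| 54 | Claude Sonnet 4 | 36.4% | $0.032 | 43.7s | 0% |
| 55 | Claude 3.5 Sonnet | 40.3% | $0.048 | 35.5s | 0% |
| 56 | DeepSeek-V2 Chat | 24.0% | $0.0021 | 53.3s | 0% |
| 57 | Llama 3.1 8B | 29.3% | $0.0003 | 1.3m | 0% |
| 58 | GPT-5 Mini | 27.9% | $0.0100 | 57.4s | 0% |
| 59 | Gemma 3 4B | 11.5% | $0.0002 | 20.0s | 0% |
| 60 | Ministral 3 14B | 8.0% | $0.0007 | 11.7s | 0% |
| 61 | Minimax M2.5 | 24.0% | $0.0034 | 1.3m | 0% |
| 62 | GPT-4o Mini (temp=1) | 10.0% | $0.0012 | 34.8s | 0% |
| 63 | GPT-4.1 Mini | 5.0% | $0.0027 | 19.0s | 0% |
| 64 | GPT-4.1 Nano | 2.2% | $0.0007 | 13.3s | 0% |
| 65 | Writer: Palmyra X5 | 8.0% | $0.011 | 22.0s | 0% |
| 66 | Mistral Medium 3.1 | 9.1% | $0.0048 | 36.5s | 0% |
| 67 | ByteDance Seed 1.6 Flash | 4.7% | $0.0013 | 27.3s | 0% |
| 68 | Gemini 3 Pro (Preview) | 33.2% | $0.055 | 54.4s | 3% |
| 69 | DeepSeek V3 (2025-03-24) | 6.3% | $0.0014 | 39.4s | 0% |
| 70 | Grok 4 Fast | 1.4% | $0.0017 | 24.1s | 0% |
| 71 | GPT-4o, Aug. 6th (temp=1) | 7.0% | $0.018 | 24.4s | 0% |
| 72 | Claude Opus 4.5 | 36.6% | $0.070 | 53.4s | 0% |
| 73 | GPT-4.1 | 5.6% | $0.018 | 44.7s | 0% |
| 74 | Claude 3.7 Sonnet | 9.6% | $0.042 | 46.7s | 0% |
| 75 | Claude Opus 4.6 | 33.4% | $0.078 | 1.2m | 0% |
| 76 | GPT-5 Nano | 0.8% | $0.0042 | 1.4m | 0% |
| 77 | MoonshotAI: Kimi K2.5 | 37.9% | $0.019 | 3.2m | 0% |
| 78 | Grok 4 | 5.5% | $0.048 | 1.7m | 0% |
| 79 | ByteDance Seed 1.6 | 0.0% | $0.013 | 2.5m | 0% |
| 80 | Claude Opus 4 | 34.6% | $0.209 | 1.4m | 0% |

Average score: 40.45%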

Individual Scenarios

Detailed Writing Rules

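| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 97 | 99.4% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 94 | 98.8% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 69 | 93.8% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 70 | 62 | 86.4% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 99 | 18 | 83.3% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 0 | 79.9% |
| GPT-4o Mini (temp=0) | 100 | 100 | 91 | 83 | 18 | 78.4% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 50 | 38 | 77.6% |
| Claude Opus 4 | 100 | 100 | 100 | 64 | 19 | 76.5% |
| Z.AI GLM 4.6 | 100 | 100 | 75 | 65 | 38 | 75.5% |
| o4 Mini High | 100 | 100 | 96 | 66 | 11 | 74.6% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 37 | 29 | 73.1% |
| Gemini 2.5 Flash Lite | 100 | 100 | 60 | 49 | 19 | 65.4% |
| Claude Sonnet 4 | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Hermes 3 70B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Gemma 3 27B | 100 | 61 | 58 | 56 | 13 | 57.7% |
| GPT-5 Mini | 100 | 83 | 71 | 22 | 10 | 57.1% |
| DeepSeek V3.2 | 100 | 95 | 67 | 24 | 0 | 57.1% |
| GPT-4o, May 13th (temp=1) | 100 | 69 | 66 | 11 | 0 | 49.0% |
| Llama 3.1 70B | 100 | 100 | 43 | 0 | 0 | 48.5% |
| DeepSeek V3.1 | 86 | 48 | 43 | 40 | 25 | 48.4% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 20 | 0 | 0 | 44.1% |
| o4 Mini | 100 | 61 | 37 | 20 | 0 | 43.6% |
| WizardLM 2 8x22b | 78 | 66 | 54 | 6 | 0 | 40.6% |
| Z.AI GLM 4.7 Flash | 100 | 60 | 43 | 0 | 0 | 40.4% |
| Minimax M2.5 | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Claude 3.5 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Hermes 3 405B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Llama 3.1 8B | 100 | 92 | 0 | 0 | 0 | 38.4% |
| Claude Haiku 4.5 | 100 | 76 | 0 | 0 | 0 | 35.3% |
| Llama 3.1 Nemotron 70B | 100 | 48 | 13 | 0 | 0 | 32.1% |
| Claude 3.5 Sonnet | 90 | 59 | 0 | 0 | 0 | 29.8% |
| Rocinante 12B | 100 | 43 | 0 | 0 | 0 | 28.6% |
| DeepSeek-V2 Chat | 80 | 36 | 0 | 0 | 0 | 23.2% |
| Arcee AI: Trinity Large (Preview) | 48 | 44 | 0 | 0 | 0 | 18.5% |
| DeepSeek V3 (2024-12-26) | 76 | 7 | 0 | 0 | 0 | 16.6% |
| GPT-4.1 | 78 | 0 | 0 | 0 | 0 | 15.7% |
| Z.AI GLM 4.5 | 43 | 24 | 0 | 0 | 0 | 13.5% |
| Gemma 3 12B | 52 | 2 | 0 | 0 | 0 | 10.7% |
| Arcee AI: Trinity Mini | 25 | 24 | 0 | 0 | 0 | 9.8% |
| Grok 4.1 Fast | 32 | 0 | 0 | 0 | 0 | 6.4% |
| Gemini 3 Pro (Preview) | 31 | 0 | 0 | 0 | 0 | 6.2% |
| GPT-4o, Aug. 6th (temp=1) | 30 | 0 | 0 | 0 | 0 | 6.0% |
| Mistral Large 2 | 22 | 0 | 0 | 0 | 0 | 4.4% |
| Mistral Large 3 | 11 | 0 | 0 | 0 | 0 | 2.2% |
| Claude 3.7 Sonnet | 3 | 0 | 0 | 0 | 0 | 0.7% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |

| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 99 | 99.8% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 83 | 96.7% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 80 | 96.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 68 | 93.7% |
| Hermes 3 405B | 100 | 100 | 99 | 94 | 75 | 93.5% |
| Gemma 3 12B | 100 | 100 | 100 | 94 | 63 | 91.3% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 86 | 66 | 90.5% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 99 | 47 | 89.2% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 25 | 84.9% |
| DeepSeek V3.2 | 100 | 100 | 100 | 85 | 19 | 81.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Rocinante 12B | 100 | 100 | 100 | 96 | 0 | 79.1% |
| Hermes 3 70B | 100 | 100 | 100 | 90 | 0 | 78.0% |
| Gemma 3 27B | 100 | 100 | 100 | 71 | 18 | 77.8% |
| Claude Opus 4 | 100 | 100 | 100 | 87 | 0 | 77.4% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 70 | 11 | 76.2% |
| Z.AI GLM 4.7 | 94 | 89 | 77 | 68 | 36 | 73.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 93 | 41 | 0 | 66.8% |
| DeepSeek V3.1 | 100 | 89 | 77 | 29 | 24 | 63.8% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 13 | 0 | 62.6% |
| o4 Mini | 100 | 98 | 52 | 50 | 0 | 60.0% |
| WizardLM 2 8x22b | 100 | 100 | 93 | 2 | 0 | 59.0% |
| o4 Mini High | 100 | 100 | 74 | 12 | 0 | 57.3% |
| Z.AI GLM 4.5 | 100 | 100 | 49 | 35 | 2 | 57.1% |
| GPT-5 Mini | 100 | 96 | 46 | 41 | 0 | 56.5% |
| Llama 3.1 8B | 100 | 98 | 75 | 0 | 0 | 54.6% |
| Llama 3.1 70B | 100 | 100 | 68 | 0 | 0 | 53.5% |
| Gemini 3 Pro (Preview) | 100 | 86 | 77 | 0 | 0 | 52.6% |
| GPT-4o, May 13th (temp=1) | 81 | 81 | 52 | 41 | 0 | 51.0% |
| GPT-4o Mini (temp=0) | 100 | 80 | 53 | 0 | 0 | 46.5% |
| Cohere Command R+ (Aug. 2024) | 100 | 66 | 44 | 0 | 0 | 42.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 7 | 0 | 0 | 41.4% |
| Gemini 3 Flash (Preview) | 64 | 38 | 37 | 17 | 0 | 31.1% |
| Claude 3 Haiku | 100 | 52 | 0 | 0 | 0 | 30.4% |
| Mistral Large | 54 | 32 | 32 | 0 | 0 | 23.5% |
| DeepSeek V3 (2024-12-26) | 62 | 41 | 0 | 0 | 0 | 20.6% |
| Stealth: Aurora Alpha | 84 | 0 | 0 | 0 | 0 | 16.8% |
| Minimax M2.5 | 74 | 0 | 0 | 0 | 0 | 14.9% |
| GPT-4.1 | 57 | 0 | 0 | 0 | 0 | 11.4% |
| Ministral 3 3B | 51 | 3 | 0 | 0 | 0 | 10.8% |
| Ministral 8B | 26 | 0 | 0 | 0 | 0 | 5.2% |
| Ministral 3 14B | 14 | 0 | 0 | 0 | 0 | 2.8% |
| Claude 3.7 Sonnet | 12 | 0 | 0 | 0 | 0 | 2.5% |
| DeepSeek-V2 Chat | 12 | 0 | 0 | 0 | 0 | 2.4% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |

| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 89 | 97.8% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 83 | 96.7% |
| Rocinante 12B | 100 | 100 | 100 | 95 | 85 | 96.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 93 | 84 | 95.4% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 76 | 95.3% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 76 | 95.3% |
| Z.AI GLM 4.5 | 100 | 100 | 94 | 85 | 83 | 92.6% |
| WizardLM 2 8x22b | 100 | 100 | 96 | 89 | 75 | 92.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 82 | 76 | 91.7% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 58 | 91.6% |
| o4 Mini | 100 | 100 | 92 | 78 | 78 | 89.6% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 39 | 87.8% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 95 | 37 | 86.4% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 32 | 86.4% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 30 | 86.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 74 | 47 | 84.1% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 60 | 29 | 77.9% |
| Gemma 3 27B | 100 | 100 | 99 | 71 | 0 | 74.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 66 | 54 | 51 | 74.0% |
| Ministral 3 3B | 100 | 100 | 100 | 70 | 0 | 73.9% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 26 | 21 | 69.4% |
| Gemini 3 Flash (Preview) | 100 | 100 | 77 | 28 | 26 | 66.3% |
| GPT-4o Mini (temp=0) | 100 | 100 | 85 | 46 | 0 | 66.2% |
| GPT-5 Mini | 100 | 100 | 68 | 57 | 0 | 65.0% |
| o4 Mini High | 100 | 100 | 87 | 35 | 0 | 64.4% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 16 | 0 | 63.2% |
| Mistral Large 3 | 82 | 63 | 55 | 54 | 42 | 59.0% |
| Llama 3.1 70B | 100 | 100 | 93 | 0 | 0 | 58.5% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 73 | 14 | 0 | 57.5% |
| Gemini 3 Pro (Preview) | 83 | 58 | 51 | 49 | 34 | 54.9% |
| Writer: Palmyra X5 | 100 | 98 | 33 | 25 | 2 | 51.6% |
| Llama 3.1 Nemotron 70B | 100 | 89 | 68 | 0 | 0 | 51.3% |
| Gemma 3 12B | 100 | 75 | 46 | 32 | 0 | 50.7% |
| Mistral Small Creative | 100 | 77 | 59 | 11 | 5 | 50.3% |
| DeepSeek-V2 Chat | 100 | 89 | 40 | 17 | 0 | 49.2% |
| Grok 4 | 100 | 73 | 60 | 13 | 0 | 49.1% |
| GPT-4.1 | 100 | 100 | 30 | 13 | 0 | 48.7% |
| Claude 3.7 Sonnet | 100 | 100 | 28 | 0 | 0 | 45.6% |
| Mistral Large 2 | 61 | 56 | 55 | 44 | 2 | 43.9% |
| Mistral Large | 100 | 59 | 55 | 0 | 0 | 42.8% |
| Ministral 3B | 68 | 60 | 37 | 34 | 0 | 39.8% |
| Ministral 3 8B | 64 | 51 | 39 | 7 | 5 | 33.3% |
| Ministral 3 14B | 100 | 56 | 7 | 0 | 0 | 32.6% |
| Ministral 8B | 85 | 37 | 28 | 12 | 0 | 32.3% |
| Mistral Medium 3.1 | 70 | 38 | 0 | 0 | 0 | 21.7% |
| DeepSeek V3 (2025-03-24) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-4.1 Mini | 45 | 32 | 22 | 0 | 0 | 19.9% |
| ByteDance Seed 1.6 Flash | 69 | 27 | 0 | 0 | 0 | 19.2% |
| Gemma 3 4B | 59 | 37 | 0 | 0 | 0 | 19.2% |
| Llama 3.1 8B | 54 | 24 | 0 | 0 | 0 | 15.5% |
| GPT-4o Mini (temp=1) | 66 | 0 | 0 | 0 | 0 | 13.3% |
| Grok 4 Fast | 15 | 0 | 0 | 0 | 0 | 3.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |

| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 93 | 98.5% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 92 | 98.4% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 75 | 95.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 88 | 78 | 93.1% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 63 | 92.6% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 79 | 71 | 90.2% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 44 | 88.9% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 73 | 63 | 87.2% |
| DeepSeek V3.2 | 100 | 100 | 81 | 77 | 72 | 86.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 89 | 39 | 85.6% |
| o4 Mini | 100 | 100 | 100 | 91 | 32 | 84.6% |
| Claude Opus 4 | 100 | 100 | 100 | 59 | 57 | 83.2% |
| Gemma 3 27B | 100 | 100 | 100 | 95 | 11 | 81.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 89 | 65 | 39 | 78.5% |
| GPT-5 Mini | 100 | 100 | 100 | 82 | 10 | 78.4% |
| DeepSeek V3.1 | 100 | 100 | 100 | 79 | 0 | 75.7% |
| Rocinante 12B | 100 | 100 | 100 | 44 | 11 | 71.1% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 43 | 5 | 69.4% |
| Minimax M2.5 | 100 | 100 | 92 | 36 | 16 | 69.0% |
| GPT-4o, May 13th (temp=1) | 100 | 86 | 73 | 70 | 13 | 68.3% |
| Claude 3.5 Haiku | 100 | 100 | 61 | 44 | 29 | 66.8% |
| Llama 3.1 70B | 100 | 100 | 100 | 7 | 0 | 61.4% |
| Mistral Large 3 | 100 | 91 | 62 | 54 | 0 | 61.3% |
| Hermes 3 70B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Stealth: Aurora Alpha | 100 | 100 | 59 | 31 | 0 | 57.9% |
| Llama 3.1 8B | 100 | 100 | 79 | 0 | 0 | 55.9% |
| Llama 3.1 Nemotron 70B | 100 | 73 | 68 | 16 | 13 | 53.9% |
| Z.AI GLM 4.5 | 100 | 100 | 44 | 15 | 9 | 53.7% |
| GPT-4o Mini (temp=0) | 100 | 83 | 48 | 16 | 0 | 49.4% |
| Mistral Large 2 | 94 | 55 | 40 | 38 | 16 | 48.5% |
| Gemini 3 Flash (Preview) | 73 | 62 | 47 | 32 | 0 | 42.7% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 99 | 0 | 0 | 0 | 39.7% |
| Gemini 3 Pro (Preview) | 100 | 71 | 14 | 2 | 0 | 37.3% |
| Arcee AI: Trinity Mini | 100 | 59 | 22 | 0 | 0 | 36.2% |
| Mistral Large | 100 | 32 | 32 | 0 | 0 | 32.8% |
| Arcee AI: Trinity Large (Preview) | 74 | 58 | 0 | 0 | 0 | 26.4% |
| DeepSeek-V2 Chat | 100 | 27 | 2 | 2 | 0 | 26.3% |
| ByteDance Seed 1.6 Flash | 79 | 42 | 0 | 0 | 0 | 24.2% |
| DeepSeek V3 (2024-12-26) | 70 | 27 | 5 | 0 | 0 | 20.5% |
| Gemma 3 12B | 53 | 48 | 0 | 0 | 0 | 20.2% |
| Ministral 3 3B | 54 | 37 | 3 | 0 | 0 | 18.8% |
| Ministral 3 8B | 73 | 0 | 0 | 0 | 0 | 14.6% |
| Writer: Palmyra X5 | 29 | 14 | 0 | 0 | 0 | 8.4% |
| DeepSeek V3 (2025-03-24) | 35 | 0 | 0 | 0 | 0 | 7.0% |
| GPT-4.1 Mini | 35 | 0 | 0 | 0 | 0 | 7.0% |
| Mistral Small Creative | 27 | 3 | 2 | 0 | 0 | 6.5% |
| Gemma 3 4B | 31 | 0 | 0 | 0 | 0 | 6.2% |
| Claude 3.7 Sonnet | 26 | 0 | 0 | 0 | 0 | 5.2% |
| Ministral 3B | 19 | 0 | 0 | 0 | 0 | 3.7% |
| Ministral 8B | 17 | 0 | 0 | 0 | 0 | 3.4% |
| Mistral Medium 3.1 | 1 | 0 | 0 | 0 | 0 | 0.3% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |

| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 99 | 99.8% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 89 | 97.9% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 87 | 97.5% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 85 | 96.9% |
| o4 Mini | 100 | 100 | 100 | 100 | 81 | 96.3% |
| Gemini 2.5 Pro | 100 | 100 | 97 | 92 | 86 | 95.1% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 51 | 90.1% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 94 | 55 | 90.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 94 | 39 | 86.7% |
| Mistral NeMO | 100 | 100 | 100 | 87 | 45 | 86.6% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 71 | 57 | 85.5% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 21 | 84.3% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 97 | 88 | 35 | 83.9% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 14 | 82.8% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 79 | 67 | 57 | 80.4% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Gemma 3 12B | 100 | 100 | 77 | 74 | 43 | 79.0% |
| GPT-5 Mini | 100 | 87 | 74 | 69 | 61 | 78.1% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 82 | 0 | 76.4% |
| DeepSeek V3.1 | 100 | 86 | 73 | 68 | 54 | 76.3% |
| Hermes 3 70B | 100 | 100 | 100 | 79 | 0 | 75.9% |
| DeepSeek V3.2 | 100 | 100 | 100 | 43 | 34 | 75.3% |
| o4 Mini High | 100 | 100 | 100 | 60 | 0 | 72.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 54 | 0 | 70.9% |
| Stealth: Aurora Alpha | 100 | 100 | 94 | 44 | 14 | 70.4% |
| Llama 3.1 70B | 100 | 100 | 98 | 48 | 0 | 69.0% |
| Minimax M2.5 | 100 | 100 | 100 | 34 | 0 | 66.9% |
| Claude 3 Haiku | 100 | 100 | 100 | 29 | 3 | 66.4% |
| Gemma 3 27B | 100 | 89 | 61 | 56 | 19 | 64.9% |
| DeepSeek V3 (2024-12-26) | 97 | 89 | 65 | 33 | 27 | 62.1% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 55 | 54 | 0 | 61.9% |
| Gemini 3 Pro (Preview) | 100 | 72 | 69 | 65 | 0 | 61.2% |
| DeepSeek-V2 Chat | 100 | 81 | 73 | 26 | 15 | 58.9% |
| Ministral 3 3B | 98 | 84 | 52 | 43 | 0 | 55.3% |
| Mistral Large 2 | 89 | 68 | 43 | 40 | 19 | 51.6% |
| Mistral Large | 100 | 65 | 63 | 0 | 0 | 45.5% |
| Ministral 8B | 100 | 65 | 39 | 22 | 0 | 45.3% |
| Ministral 3 8B | 98 | 54 | 47 | 16 | 3 | 43.5% |
| Z.AI GLM 4.5 | 100 | 63 | 25 | 14 | 8 | 42.2% |
| Claude Haiku 4.5 | 100 | 63 | 41 | 0 | 0 | 40.7% |
| Rocinante 12B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Claude 3.5 Haiku | 68 | 44 | 41 | 0 | 0 | 30.6% |
| Writer: Palmyra X5 | 60 | 53 | 25 | 5 | 0 | 28.4% |
| Llama 3.1 8B | 100 | 29 | 10 | 0 | 0 | 27.7% |
| Mistral Large 3 | 63 | 50 | 11 | 4 | 0 | 25.4% |
| Gemini 3 Flash (Preview) | 63 | 46 | 12 | 0 | 0 | 24.2% |
| GPT-4o Mini (temp=1) | 79 | 32 | 0 | 0 | 0 | 22.2% |
| Ministral 3 14B | 60 | 25 | 18 | 5 | 0 | 21.6% |
| Claude 3.7 Sonnet | 100 | 5 | 0 | 0 | 0 | 21.0% |
| Grok 4 Fast | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-4o, May 13th (temp=1) | 92 | 7 | 0 | 0 | 0 | 19.7% |
| Ministral 3B | 55 | 36 | 0 | 0 | 0 | 18.2% |
| Mistral Small Creative | 37 | 33 | 11 | 1 | 0 | 16.5% |
| Arcee AI: Trinity Mini | 37 | 32 | 10 | 0 | 0 | 15.7% |
| ByteDance Seed 1.6 Flash | 29 | 20 | 0 | 0 | 0 | 9.9% |
| GPT-4.1 Mini | 34 | 0 | 0 | 0 | 0 | 6.8% |
| Mistral Medium 3.1 | 29 | 0 | 0 | 0 | 0 | 5.8% |
| Gemma 3 4B | 23 | 0 | 0 | 0 | 0 | 4.5% |
| GPT-4.1 | 14 | 4 | 0 | 0 | 0 | 3.7% |
| DeepSeek V3 (2025-03-24) | 11 | 0 | 0 | 0 | 0 | 2.2% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |

| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 94 | 98.7% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 88 | 97.6% |
| Qwen 2.5 72B | 100 | 100 | 100 | 97 | 90 | 97.3% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 95 | 77 | 94.4% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 68 | 93.5% |
| Gemma 3 27B | 100 | 94 | 94 | 82 | 60 | 86.2% |
| DeepSeek V3.2 | 100 | 100 | 100 | 96 | 35 | 86.2% |
| GPT-5 Mini | 100 | 100 | 100 | 88 | 20 | 81.6% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Claude 3.5 Haiku | 100 | 100 | 82 | 79 | 33 | 78.9% |
| GPT-4o Mini (temp=0) | 100 | 98 | 83 | 76 | 16 | 74.9% |
| Mistral Small 3.2 24B | 100 | 100 | 98 | 75 | 0 | 74.5% |
| Claude Opus 4 | 100 | 100 | 100 | 53 | 0 | 70.6% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 46 | 0 | 69.2% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 91 | 45 | 7 | 68.5% |
| Gemini 2.5 Flash Lite | 100 | 87 | 76 | 58 | 16 | 67.5% |
| DeepSeek V3.1 | 100 | 100 | 74 | 45 | 12 | 66.2% |
| Llama 3.1 70B | 100 | 100 | 90 | 21 | 0 | 62.2% |
| Hermes 3 405B | 100 | 100 | 88 | 0 | 0 | 57.6% |
| Minimax M2.5 | 100 | 100 | 63 | 12 | 0 | 54.9% |
| o4 Mini | 93 | 74 | 59 | 35 | 0 | 52.1% |
| Llama 3.1 Nemotron 70B | 100 | 76 | 71 | 3 | 0 | 50.3% |
| Claude 3 Haiku | 100 | 100 | 37 | 2 | 2 | 48.2% |
| WizardLM 2 8x22b | 100 | 81 | 59 | 1 | 0 | 48.2% |
| Gemini 3 Pro (Preview) | 97 | 78 | 55 | 6 | 0 | 47.1% |
| Rocinante 12B | 100 | 99 | 31 | 0 | 0 | 46.0% |
| o4 Mini High | 100 | 59 | 45 | 25 | 0 | 45.7% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 13 | 0 | 0 | 42.6% |
| Stealth: Aurora Alpha | 100 | 91 | 0 | 0 | 0 | 38.1% |
| Hermes 3 70B | 81 | 56 | 29 | 0 | 0 | 33.1% |
| Gemini 3 Flash (Preview) | 73 | 36 | 28 | 13 | 0 | 29.9% |
| Z.AI GLM 4.5 | 97 | 25 | 13 | 12 | 0 | 29.4% |
| Mistral Large | 65 | 49 | 29 | 0 | 0 | 28.5% |
| Ministral 3 8B | 87 | 51 | 1 | 0 | 0 | 27.9% |
| Arcee AI: Trinity Large (Preview) | 85 | 54 | 0 | 0 | 0 | 27.8% |
| Gemma 3 12B | 67 | 52 | 14 | 0 | 0 | 26.7% |
| Grok 4.1 Fast | 100 | 0 | 0 | 0 | 0 | 20.0% |
| DeepSeek V3 (2024-12-26) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 8B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Writer: Palmyra X5 | 45 | 28 | 27 | 0 | 0 | 20.0% |
| DeepSeek-V2 Chat | 97 | 0 | 0 | 0 | 0 | 19.4% |
| Ministral 3B | 88 | 0 | 0 | 0 | 0 | 17.5% |
| Arcee AI: Trinity Mini | 63 | 0 | 0 | 0 | 0 | 12.7% |
| GPT-4o, May 13th (temp=1) | 52 | 0 | 0 | 0 | 0 | 10.3% |
| Mistral Small Creative | 43 | 0 | 0 | 0 | 0 | 8.5% |
| Ministral 3 3B | 39 | 0 | 0 | 0 | 0 | 7.8% |
| Claude 3.7 Sonnet | 29 | 2 | 0 | 0 | 0 | 6.1% |
| Ministral 3 14B | 19 | 0 | 0 | 0 | 0 | 3.7% |
| Mistral Medium 3.1 | 18 | 0 | 0 | 0 | 0 | 3.6% |
| ByteDance Seed 1.6 Flash | 15 | 0 | 0 | 0 | 0 | 3.0% |
| Ministral 8B | 10 | 0 | 0 | 0 | 0 | 2.0% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0% |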

genre

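| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.9% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 98 | 68 | 93.2% |
| Mistral NeMO | 100 | 100 | 89 | 88 | 85 | 92.5% |
| Qwen 2.5 72B | 100 | 100 | 100 | 83 | 71 | 91.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 73 | 56 | 85.9% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 71 | 38 | 82.0% |
| Gemini 2.5 Flash | 100 | 97 | 91 | 72 | 47 | 81.4% |
| Claude 3 Haiku | 100 | 100 | 100 | 63 | 30 | 78.7% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 81 | 0 | 76.1% |
| Gemini 2.5 Flash Lite | 100 | 100 | 88 | 34 | 34 | 71.3% |
| GPT-4o, May 13th (temp=1) | 100 | 85 | 80 | 60 | 0 | 65.0% |
| Gemma 3 27B | 100 | 100 | 100 | 10 | 2 | 62.3% |
| Gemini 3.1 Pro (Preview) | 100 | 99 | 61 | 20 | 19 | 59.8% |
| Hermes 3 405B | 100 | 100 | 56 | 35 | 0 | 58.3% |
| Gemini 2.5 Pro | 89 | 75 | 63 | 61 | 0 | 57.7% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 83 | 0 | 0 | 56.5% |
| Llama 3.1 Nemotron 70B | 83 | 71 | 63 | 41 | 0 | 51.8% |
| DeepSeek V3.2 | 100 | 82 | 43 | 14 | 0 | 47.9% |
| Hermes 3 70B | 100 | 80 | 15 | 0 | 0 | 39.0% |
| Z.AI GLM 4.6 | 76 | 69 | 40 | 0 | 0 | 36.8% |
| WizardLM 2 8x22b | 83 | 48 | 37 | 2 | 2 | 34.5% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 64 | 0 | 0 | 0 | 32.8% |
| Arcee AI: Trinity Mini | 76 | 29 | 0 | 0 | 0 | 21.0% |
| DeepSeek V3.1 | 84 | 19 | 0 | 0 | 0 | 20.5% |
| Claude 3.5 Sonnet | 94 | 0 | 0 | 0 | 0 | 18.9% |
| GPT-4o Mini (temp=1) | 71 | 0 | 0 | 0 | 0 | 14.1% |
| Rocinante 12B | 56 | 10 | 0 | 0 | 0 | 13.2% |
| Ministral 3 8B | 30 | 11 | 0 | 0 | 0 | 8.3% |
| DeepSeek V3 (2024-12-26) | 36 | 0 | 0 | 0 | 0 | 7.2% |
| Z.AI GLM 4.7 | 27 | 8 | 0 | 0 | 0 | 6.9% |
| Gemma 3 12B | 29 | 0 | 0 | 0 | 0 | 5.7% |
| o4 Mini High | 25 | 0 | 0 | 0 | 0 | 4.9% |
| Gemini 3 Pro (Preview) | 23 | 0 | 0 | 0 | 0 | 4.6% |
| GPT-5.1 | 15 | 6 | 0 | 0 | 0 | 4.1% |
| Stealth: Aurora Alpha | 19 | 0 | 0 | 0 | 0 | 3.7% |
| Arcee AI: Trinity Large (Preview) | 13 | 0 | 0 | 0 | 0 | 2.6% |
| GPT-5 | 7 | 0 | 0 | 0 | 0 | 1.4% |
| DeepSeek-V2 Chat | 4 | 0 | 0 | 0 | 0 | 0.8% |
| GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| MoonshotAI: Kimi K2.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.2 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Minimax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 4.7 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Haiku 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |

| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 64 | 92.8% |
| Gemma 3 27B | 100 | 100 | 100 | 93 | 70 | 92.5% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 49 | 89.8% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 44 | 88.9% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 75 | 56 | 86.2% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 70B | 100 | 100 | 90 | 86 | 0 | 75.1% |
| Z.AI GLM 4.6 | 100 | 94 | 93 | 84 | 0 | 74.1% |
| GPT-4o, May 13th (temp=0) | 100 | 86 | 66 | 59 | 44 | 71.1% |
| Gemini 2.5 Pro | 100 | 84 | 57 | 56 | 29 | 65.2% |
| WizardLM 2 8x22b | 98 | 94 | 91 | 0 | 0 | 56.5% |
| Gemini 2.5 Flash Lite | 74 | 67 | 56 | 51 | 27 | 55.0% |
| Claude 3 Haiku | 100 | 100 | 54 | 7 | 0 | 52.1% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 50 | 8 | 0 | 51.6% |
| Hermes 3 70B | 100 | 76 | 62 | 0 | 0 | 47.8% |
| DeepSeek V3.1 | 100 | 72 | 33 | 20 | 9 | 46.8% |
| Hermes 3 405B | 100 | 78 | 51 | 0 | 0 | 45.7% |
| Gemma 3 12B | 79 | 65 | 33 | 32 | 16 | 45.0% |
| GPT-4o Mini (temp=0) | 76 | 69 | 44 | 0 | 0 | 37.7% |
| Arcee AI: Trinity Large (Preview) | 100 | 70 | 0 | 0 | 0 | 33.9% |
| Ministral 3 3B | 66 | 55 | 26 | 0 | 0 | 29.4% |
| DeepSeek V3.2 | 39 | 34 | 33 | 23 | 13 | 28.4% |
| GPT-5 | 67 | 34 | 14 | 6 | 3 | 24.8% |
| Rocinante 12B | 57 | 53 | 9 | 0 | 0 | 23.8% |
| Claude Sonnet 4.5 | 65 | 30 | 0 | 0 | 0 | 19.1% |
| GPT-5.1 | 36 | 25 | 17 | 12 | 0 | 18.0% |
| Arcee AI: Trinity Mini | 44 | 44 | 0 | 0 | 0 | 17.7% |
| Qwen 3.5 Plus (2026-02-15) | 78 | 0 | 0 | 0 | 0 | 15.7% |
| Llama 3.1 Nemotron 70B | 73 | 0 | 0 | 0 | 0 | 14.6% |
| Llama 3.1 8B | 52 | 19 | 0 | 0 | 0 | 14.2% |
| Stealth: Aurora Alpha | 53 | 0 | 0 | 0 | 0 | 10.6% |
| Ministral 8B | 23 | 14 | 0 | 0 | 0 | 7.4% |
| GPT-4o, May 13th (temp=1) | 34 | 0 | 0 | 0 | 0 | 6.9% |
| GPT-5.2 | 34 | 0 | 0 | 0 | 0 | 6.8% |
| Mistral Small Creative | 17 | 15 | 0 | 0 | 0 | 6.3% |
| Z.AI GLM 4.7 | 28 | 0 | 0 | 0 | 0 | 5.6% |
| Z.AI GLM 4.7 Flash | 28 | 0 | 0 | 0 | 0 | 5.5% |
| Z.AI GLM 5 | 25 | 0 | 0 | 0 | 0 | 5.0% |
| DeepSeek V3 (2024-12-26) | 18 | 0 | 0 | 0 | 0 | 3.5% |
| Claude Opus 4 | 17 | 0 | 0 | 0 | 0 | 3.4% |
| DeepSeek-V2 Chat | 4 | 0 | 0 | 0 | 0 | 0.8% |
| Claude 3.7 Sonnet | 1 | 0 | 0 | 0 | 0 | 0.2% |
| GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0% |
| MoonshotAI: Kimi K2.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Minimax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Haiku 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |

| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 90 | 98.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 48 | 89.4% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 32 | 86.5% |
| Z.AI GLM 4.6 | 100 | 100 | 92 | 85 | 53 | 86.0% |
| Gemini 2.5 Flash | 100 | 94 | 90 | 88 | 46 | 83.4% |
| GPT-4o Mini (temp=0) | 100 | 100 | 94 | 84 | 37 | 82.9% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 3 | 80.7% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 60 | 38 | 79.8% |
| Hermes 3 70B | 100 | 100 | 100 | 92 | 0 | 78.4% |
| Gemma 3 27B | 100 | 100 | 93 | 81 | 0 | 74.8% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 54 | 0 | 70.7% |
| DeepSeek V3.2 | 100 | 100 | 73 | 47 | 31 | 70.2% |
| DeepSeek V3.1 | 97 | 80 | 63 | 47 | 19 | 61.3% |
| Claude 3.5 Sonnet | 100 | 100 | 69 | 34 | 0 | 60.6% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Ministral 3 3B | 100 | 63 | 51 | 48 | 33 | 59.1% |
| Claude 3 Haiku | 100 | 68 | 52 | 46 | 0 | 53.2% |
| Z.AI GLM 4.5 | 100 | 100 | 36 | 24 | 0 | 51.9% |
| Rocinante 12B | 88 | 79 | 76 | 0 | 0 | 48.5% |
| Gemma 3 12B | 100 | 100 | 27 | 11 | 0 | 47.5% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 85 | 29 | 19 | 0 | 46.4% |
| Ministral 3B | 85 | 74 | 71 | 0 | 0 | 45.9% |
| Cohere Command R+ (Aug. 2024) | 66 | 54 | 47 | 34 | 9 | 41.8% |
| Llama 3.1 8B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Mistral Small Creative | 69 | 63 | 61 | 0 | 0 | 38.7% |
| Arcee AI: Trinity Mini | 100 | 43 | 43 | 0 | 0 | 37.0% |
| GPT-5.1 | 67 | 58 | 28 | 17 | 0 | 34.2% |
| GPT-4o, May 13th (temp=1) | 100 | 45 | 20 | 0 | 0 | 33.1% |
| Gemma 3 4B | 79 | 77 | 3 | 0 | 0 | 31.9% |
| Claude 3.7 Sonnet | 88 | 39 | 29 | 0 | 0 | 31.3% |
| GPT-5 | 83 | 39 | 28 | 0 | 0 | 30.1% |
| Llama 3.1 70B | 100 | 30 | 5 | 0 | 0 | 26.9% |
| Llama 3.1 Nemotron 70B | 100 | 19 | 7 | 0 | 0 | 25.1% |
| Z.AI GLM 4.7 Flash | 63 | 59 | 0 | 0 | 0 | 24.5% |
| DeepSeek V3 (2024-12-26) | 100 | 21 | 0 | 0 | 0 | 24.3% |
| Mistral Large | 70 | 47 | 0 | 0 | 0 | 23.3% |
| Claude Sonnet 4 | 66 | 22 | 14 | 13 | 0 | 22.9% |
| Ministral 3 8B | 58 | 34 | 21 | 0 | 0 | 22.8% |
| Ministral 8B | 69 | 45 | 0 | 0 | 0 | 22.8% |
| Stealth: Aurora Alpha | 81 | 28 | 0 | 0 | 0 | 21.8% |
| Z.AI GLM 4.7 | 76 | 32 | 0 | 0 | 0 | 21.7% |
| MoonshotAI: Kimi K2.5 | 47 | 35 | 10 | 0 | 0 | 18.3% |
| Mistral Large 2 | 65 | 16 | 1 | 0 | 0 | 16.3% |
| Gemini 3 Pro (Preview) | 63 | 13 | 4 | 0 | 0 | 16.1% |
| GPT-4o Mini (temp=1) | 56 | 11 | 0 | 0 | 0 | 13.6% |
| DeepSeek-V2 Chat | 65 | 0 | 0 | 0 | 0 | 13.0% |
| Gemini 3 Flash (Preview) | 65 | 0 | 0 | 0 | 0 | 13.0% |
| Mistral Medium 3.1 | 58 | 4 | 0 | 0 | 0 | 12.5% |
| Claude Opus 4 | 40 | 19 | 0 | 0 | 0 | 11.7% |
| Mistral Large 3 | 37 | 18 | 0 | 0 | 0 | 11.0% |
| Writer: Palmyra X5 | 48 | 0 | 0 | 0 | 0 | 9.6% |
| ByteDance Seed 1.6 Flash | 47 | 0 | 0 | 0 | 0 | 9.3% |
| Claude 3.5 Haiku | 41 | 0 | 0 | 0 | 0 | 8.2% |
| Ministral 3 14B | 30 | 8 | 0 | 0 | 0 | 7.8% |
| Claude Sonnet 4.6 | 22 | 8 | 0 | 0 | 0 | 6.0% |
| o4 Mini High | 22 | 0 | 0 | 0 | 0 | 4.4% |
| o4 Mini | 16 | 0 | 0 | 0 | 0 | 3.2% |
| Minimax M2.5 | 13 | 0 | 0 | 0 | 0 | 2.6% |
| GPT-4o, Aug. 6th (temp=1) | 9 | 0 | 0 | 0 | 0 | 1.8% |
| GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.2 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Haiku 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |

| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 93 | 98.6% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 86 | 97.3% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 81 | 96.2% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 66 | 93.1% |
| Hermes 3 405B | 100 | 100 | 95 | 85 | 81 | 92.2% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 80 | 78 | 91.6% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 51 | 90.3% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 95 | 55 | 41 | 78.2% |
| DeepSeek V3.2 | 89 | 84 | 82 | 70 | 58 | 76.6% |
| Mistral NeMO | 100 | 100 | 93 | 68 | 0 | 72.2% |
| Gemma 3 12B | 100 | 93 | 83 | 74 | 5 | 71.2% |
| Gemma 3 27B | 100 | 100 | 77 | 74 | 0 | 70.1% |
| DeepSeek V3.1 | 100 | 100 | 80 | 49 | 1 | 66.0% |
| Claude 3 Haiku | 100 | 98 | 77 | 24 | 0 | 59.8% |
| Arcee AI: Trinity Mini | 100 | 100 | 60 | 21 | 0 | 56.3% |
| Z.AI GLM 4.7 Flash | 99 | 83 | 82 | 14 | 0 | 55.7% |
| GPT-4o Mini (temp=0) | 100 | 100 | 48 | 22 | 0 | 54.0% |
| Z.AI GLM 4.7 | 100 | 100 | 29 | 19 | 0 | 49.7% |
| Llama 3.1 70B | 100 | 100 | 40 | 0 | 0 | 47.9% |
| Mistral Large 3 | 100 | 62 | 40 | 31 | 0 | 46.6% |
| Stealth: Aurora Alpha | 100 | 70 | 42 | 0 | 0 | 42.4% |
| Hermes 3 70B | 100 | 61 | 44 | 0 | 0 | 41.1% |
| Ministral 3 8B | 100 | 42 | 27 | 20 | 0 | 37.8% |
| Gemini 3 Pro (Preview) | 88 | 74 | 26 | 0 | 0 | 37.6% |
| Mistral Large | 85 | 56 | 11 | 7 | 0 | 31.9% |
| Mistral Large 2 | 100 | 50 | 5 | 0 | 0 | 30.9% |
| Mistral Small 3.2 24B | 78 | 50 | 25 | 0 | 0 | 30.7% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 33 | 15 | 0 | 0 | 29.7% |
| Rocinante 12B | 100 | 40 | 0 | 0 | 0 | 27.9% |
| Ministral 3 3B | 100 | 32 | 0 | 0 | 0 | 26.3% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 24 | 7 | 0 | 0 | 26.1% |
| Claude 3.5 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Arcee AI: Trinity Large (Preview) | 56 | 27 | 0 | 0 | 0 | 16.7% |
| GPT-5 | 62 | 16 | 1 | 0 | 0 | 15.8% |
| Claude Haiku 4.5 | 78 | 0 | 0 | 0 | 0 | 15.6% |
| GPT-4o, May 13th (temp=1) | 72 | 0 | 0 | 0 | 0 | 14.5% |
| GPT-5.1 | 47 | 20 | 3 | 0 | 0 | 13.9% |
| DeepSeek V3 (2025-03-24) | 60 | 0 | 0 | 0 | 0 | 11.9% |
| WizardLM 2 8x22b | 37 | 15 | 0 | 0 | 0 | 10.4% |
| Claude Opus 4.5 | 51 | 0 | 0 | 0 | 0 | 10.3% |
| Minimax M2.5 | 45 | 0 | 0 | 0 | 0 | 9.0% |
| Ministral 3 14B | 35 | 5 | 0 | 0 | 0 | 8.0% |
| Mistral Small Creative | 34 | 0 | 0 | 0 | 0 | 6.8% |
| Claude Opus 4 | 32 | 0 | 0 | 0 | 0 | 6.4% |
| o4 Mini | 29 | 0 | 0 | 0 | 0 | 5.7% |
| Gemini 3 Flash (Preview) | 26 | 0 | 0 | 0 | 0 | 5.2% |
| Z.AI GLM 4.5 | 25 | 0 | 0 | 0 | 0 | 5.0% |
| GPT-5.2 | 21 | 2 | 0 | 0 | 0 | 4.7% |
| MoonshotAI: Kimi K2.5 | 19 | 0 | 0 | 0 | 0 | 3.7% |
| Llama 3.1 8B | 19 | 0 | 0 | 0 | 0 | 3.7% |
| Claude Sonnet 4.5 | 12 | 0 | 0 | 0 | 0 | 2.4% |
| Z.AI GLM 5 | 10 | 0 | 0 | 0 | 0 | 1.9% |
| Mistral Medium 3.1 | 8 | 0 | 0 | 0 | 0 | 1.5% |
| o4 Mini High | 7 | 0 | 0 | 0 | 0 | 1.4% |
| GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2024-12-26) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek-V2 Chat | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0% |

| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 92 | 98.5% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 91 | 98.2% |
| Qwen 2.5 72B | 100 | 100 | 100 | 86 | 58 | 88.7% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 68 | 68 | 87.2% |
| Gemini 2.5 Flash | 100 | 100 | 94 | 86 | 40 | 84.2% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 96 | 74 | 51 | 84.1% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 74 | 47 | 84.1% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 13 | 82.6% |
| Gemini 2.5 Flash Lite | 100 | 100 | 82 | 58 | 49 | 77.8% |
| Hermes 3 405B | 100 | 100 | 100 | 44 | 30 | 74.8% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 75 | 63 | 0 | 67.7% |
| DeepSeek V3.1 | 100 | 100 | 100 | 37 | 0 | 67.4% |
| Llama 3.1 70B | 100 | 100 | 100 | 37 | 0 | 67.4% |
| Gemini 2.5 Pro | 100 | 91 | 78 | 40 | 23 | 66.3% |
| Ministral 3 8B | 100 | 87 | 86 | 3 | 0 | 55.0% |
| GPT-4o Mini (temp=0) | 87 | 81 | 61 | 24 | 0 | 50.8% |
| DeepSeek V3.2 | 91 | 90 | 61 | 10 | 0 | 50.3% |
| Arcee AI: Trinity Large (Preview) | 100 | 96 | 51 | 0 | 0 | 49.3% |
| Mistral Large 2 | 94 | 74 | 39 | 27 | 0 | 46.8% |
| Z.AI GLM 4.6 | 100 | 96 | 29 | 3 | 0 | 45.7% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 55 | 54 | 11 | 0 | 44.3% |
| Gemma 3 4B | 74 | 60 | 53 | 20 | 0 | 41.4% |
| Claude 3 Haiku | 100 | 56 | 33 | 9 | 0 | 39.6% |
| Arcee AI: Trinity Mini | 100 | 49 | 35 | 5 | 0 | 37.8% |
| Z.AI GLM 4.7 | 76 | 61 | 26 | 14 | 10 | 37.4% |
| Llama 3.1 Nemotron 70B | 100 | 75 | 7 | 3 | 0 | 37.0% |
| Rocinante 12B | 100 | 65 | 0 | 0 | 0 | 32.9% |
| Stealth: Aurora Alpha | 100 | 50 | 13 | 0 | 0 | 32.7% |
| Ministral 8B | 60 | 45 | 30 | 0 | 0 | 27.0% |
| Mistral Large | 70 | 61 | 0 | 0 | 0 | 26.2% |
| Gemma 3 12B | 100 | 31 | 0 | 0 | 0 | 26.2% |
| GPT-5 | 83 | 45 | 0 | 0 | 0 | 25.8% |
| GPT-5.1 | 65 | 58 | 0 | 0 | 0 | 24.6% |
| Claude 3.5 Sonnet | 72 | 38 | 9 | 0 | 0 | 23.9% |
| Claude Opus 4 | 42 | 38 | 29 | 0 | 0 | 21.8% |
| DeepSeek-V2 Chat | 52 | 37 | 17 | 0 | 0 | 21.2% |
| DeepSeek V3 (2024-12-26) | 37 | 36 | 29 | 3 | 0 | 20.9% |
| Z.AI GLM 5 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| DeepSeek V3 (2025-03-24) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3 3B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| WizardLM 2 8x22b | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Claude Sonnet 4.5 | 86 | 0 | 0 | 0 | 0 | 17.3% |
| Ministral 3B | 80 | 0 | 0 | 0 | 0 | 16.0% |
| Hermes 3 70B | 41 | 37 | 0 | 0 | 0 | 15.6% |
| GPT-5.2 | 61 | 5 | 0 | 0 | 0 | 13.1% |
| Mistral Small Creative | 59 | 0 | 0 | 0 | 0 | 11.8% |
| Z.AI GLM 4.7 Flash | 39 | 19 | 0 | 0 | 0 | 11.6% |
| Mistral Large 3 | 26 | 24 | 0 | 0 | 0 | 9.9% |
| GPT-4o, May 13th (temp=1) | 42 | 0 | 0 | 0 | 0 | 8.4% |
| Z.AI GLM 4.5 | 42 | 0 | 0 | 0 | 0 | 8.3% |
| Claude Sonnet 4.6 | 40 | 0 | 0 | 0 | 0 | 8.0% |
| Claude Opus 4.5 | 27 | 0 | 0 | 0 | 0 | 5.3% |
| Gemini 3 Pro (Preview) | 11 | 8 | 3 | 0 | 0 | 4.6% |
| Mistral Medium 3.1 | 20 | 0 | 0 | 0 | 0 | 4.1% |
| Writer: Palmyra X5 | 19 | 0 | 0 | 0 | 0 | 3.7% |
| Claude 3.7 Sonnet | 15 | 0 | 0 | 0 | 0 | 3.0% |
| GPT-4o Mini (temp=1) | 10 | 0 | 0 | 0 | 0 | 2.0% |
| GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0% |
| MoonshotAI: Kimi K2.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Minimax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Haiku 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |

| Model | #1 | #2 | #3 | #4 | #5 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 81 | 96.1% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 76 | 95.3% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 42 | 88.3% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 4 | 80.8% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Gemini 2.5 Pro | 100 | 100 | 72 | 65 | 55 | 78.3% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 86 | 0 | 77.3% |
| Gemini 2.5 Flash Lite | 100 | 100 | 90 | 64 | 27 | 76.3% |
| Gemma 3 27B | 100 | 94 | 68 | 51 | 37 | 69.9% |
| Hermes 3 70B | 100 | 100 | 95 | 41 | 0 | 67.2% |
| Z.AI GLM 4.6 | 100 | 82 | 55 | 54 | 43 | 67.0% |
| DeepSeek V3.1 | 75 | 71 | 48 | 19 | 12 | 45.2% |
| GPT-4o, Aug. 6th (temp=0) | 97 | 63 | 42 | 22 | 0 | 44.9% |
| Claude 3 Haiku | 100 | 94 | 17 | 2 | 0 | 42.8% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 7 | 0 | 0 | 41.4% |
| Gemma 3 12B | 100 | 24 | 19 | 0 | 0 | 28.5% |
| Qwen 3.5 Plus (2026-02-15) | 61 | 60 | 0 | 0 | 0 | 24.3% |
| Z.AI GLM 4.7 | 53 | 52 | 0 | 0 | 0 | 20.9% |
| DeepSeek V3.2 | 68 | 28 | 9 | 0 | 0 | 20.8% |
| GPT-5 | 56 | 47 | 0 | 0 | 0 | 20.5% |
| Grok 4 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Z.AI GLM 4.7 Flash | 44 | 41 | 11 | 1 | 0 | 19.3% |
| Arcee AI: Trinity Mini | 91 | 0 | 0 | 0 | 0 | 18.2% |
| Rocinante 12B | 68 | 16 | 0 | 0 | 0 | 16.7% |
| Ministral 3B | 80 | 2 | 0 | 0 | 0 | 16.5% |
| Claude Sonnet 4 | 67 | 0 | 0 | 0 | 0 | 13.3% |
| DeepSeek-V2 Chat | 63 | 0 | 0 | 0 | 0 | 12.6% |
| Ministral 3 3B | 42 | 19 | 0 | 0 | 0 | 12.1% |
| GPT-4o Mini (temp=0) | 24 | 18 | 0 | 0 | 0 | 8.3% |
| Arcee AI: Trinity Large (Preview) | 36 | 0 | 0 | 0 | 0 | 7.2% |
| Llama 3.1 Nemotron 70B | 13 | 13 | 0 | 0 | 0 | 5.2% |
| Mistral Large 2 | 16 | 0 | 0 | 0 | 0 | 3.3% |
| GPT-4o, May 13th (temp=1) | 14 | 0 | 0 | 0 | 0 | 2.9% |
| GPT-5.1 | 8 | 0 | 0 | 0 | 0 | 1.7% |
| Mistral Large | 2 | 0 | 0 | 0 | 0 | 0.4% |
| GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0% |
| MoonshotAI: Kimi K2.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.2 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Minimax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Sonnet 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3 (2024-12-26) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Haiku 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0% |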

Novelcrafter Default Prompt

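Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 99 | 99.8%
Mistral NeMO | 100 | 100 | 100 | 100 | 94 | 98.9%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 98 | 91 | 97.7%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 73 | 94.6%
Claude 3 Haiku | 100 | 100 | 100 | 79 | 60 | 87.9%
Hermes 3 405B | 100 | 100 | 100 | 93 | 35 | 85.6%
GPT-5.1 | 100 | 100 | 100 | 76 | 33 | 81.9%
Z.AI GLM 4.6 | 100 | 100 | 98 | 56 | 48 | 80.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o Mini (temp=0) | 100 | 95 | 91 | 60 | 52 | 79.4%
Gemini 2.5 Pro | 100 | 100 | 77 | 65 | 38 | 76.0%
Z.AI GLM 4.7 | 96 | 92 | 85 | 69 | 27 | 73.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 75 | 48 | 23 | 68.5%
Gemini 2.5 Flash Lite | 100 | 100 | 67 | 56 | 0 | 64.6%
Gemini 2.5 Flash | 100 | 100 | 79 | 25 | 0 | 60.7%
Llama 3.1 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
DeepSeek V3.2 | 100 | 88 | 57 | 49 | 3 | 59.5%
Stealth: Aurora Alpha | 93 | 87 | 76 | 38 | 0 | 58.8%
Rocinante 12B | 100 | 100 | 72 | 0 | 0 | 54.4%
Llama 3.1 8B | 100 | 100 | 66 | 0 | 0 | 53.1%
GPT-5 | 100 | 100 | 45 | 20 | 0 | 53.1%
Hermes 3 70B | 100 | 100 | 65 | 0 | 0 | 52.9%
Arcee AI: Trinity Mini | 89 | 87 | 87 | 0 | 0 | 52.5%
DeepSeek-V2 Chat | 100 | 87 | 56 | 0 | 0 | 48.6%
o4 Mini High | 99 | 58 | 46 | 21 | 0 | 44.7%
GPT-5.2 | 68 | 64 | 38 | 24 | 0 | 39.0%
Llama 3.1 Nemotron 70B | 100 | 54 | 13 | 0 | 0 | 33.3%
DeepSeek V3.1 | 79 | 69 | 14 | 0 | 0 | 32.5%
GPT-4o Mini (temp=1) | 76 | 40 | 17 | 13 | 0 | 29.3%
Gemma 3 27B | 76 | 48 | 14 | 0 | 0 | 27.5%
DeepSeek V3 (2024-12-26) | 100 | 33 | 0 | 0 | 0 | 26.6%
o4 Mini | 69 | 57 | 0 | 0 | 0 | 25.3%
Z.AI GLM 4.7 Flash | 62 | 32 | 24 | 0 | 0 | 23.6%
Gemini 3 Flash (Preview) | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 8B | 100 | 0 | 0 | 0 | 0 | 20.0%
WizardLM 2 8x22b | 85 | 0 | 0 | 0 | 0 | 17.1%
Gemma 3 12B | 25 | 19 | 0 | 0 | 0 | 8.7%
Claude Haiku 4.5 | 43 | 0 | 0 | 0 | 0 | 8.6%
Gemma 3 4B | 43 | 0 | 0 | 0 | 0 | 8.6%
Arcee AI: Trinity Large (Preview) | 41 | 0 | 0 | 0 | 0 | 8.2%
Claude Sonnet 4 | 24 | 0 | 0 | 0 | 0 | 4.8%
GPT-4o, Aug. 6th (temp=1) | 17 | 0 | 0 | 0 | 0 | 3.4%
Ministral 3B | 17 | 0 | 0 | 0 | 0 | 3.4%
Z.AI GLM 4.5 | 5 | 0 | 0 | 0 | 0 | 1.0%
GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
MoonshotAI: Kimi K2.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 3 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Minimax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%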
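Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 97 | 99.4%
Z.AI GLM 4.6 | 100 | 100 | 100 | 96 | 77 | 94.7%
Gemma 3 27B | 100 | 100 | 100 | 100 | 71 | 94.3%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 88 | 82 | 94.0%
Gemini 2.5 Pro | 100 | 100 | 96 | 86 | 43 | 85.0%
GPT-5.1 | 100 | 100 | 100 | 61 | 61 | 84.4%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 94 | 94 | 25 | 82.4%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 70B | 100 | 100 | 100 | 85 | 0 | 76.9%
Gemini 2.5 Flash | 100 | 100 | 79 | 53 | 43 | 74.8%
GPT-5.2 | 100 | 82 | 80 | 63 | 48 | 74.5%
Hermes 3 405B | 100 | 92 | 81 | 80 | 0 | 70.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 91 | 74 | 41 | 0 | 61.2%
o4 Mini High | 80 | 77 | 72 | 72 | 0 | 60.2%
WizardLM 2 8x22b | 100 | 70 | 69 | 62 | 0 | 60.1%
Stealth: Aurora Alpha | 100 | 100 | 88 | 0 | 0 | 57.7%
MoonshotAI: Kimi K2.5 | 100 | 99 | 59 | 24 | 0 | 56.5%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 69 | 5 | 0 | 54.7%
Rocinante 12B | 100 | 100 | 55 | 0 | 0 | 51.0%
Z.AI GLM 4.7 Flash | 100 | 89 | 61 | 0 | 0 | 50.0%
Z.AI GLM 4.7 | 93 | 60 | 56 | 37 | 0 | 49.2%
DeepSeek V3.1 | 98 | 86 | 42 | 0 | 0 | 45.1%
Hermes 3 70B | 98 | 91 | 37 | 0 | 0 | 45.1%
Claude Sonnet 4.5 | 100 | 79 | 34 | 0 | 0 | 42.7%
Gemini 3 Pro (Preview) | 81 | 57 | 57 | 18 | 0 | 42.5%
Gemma 3 12B | 66 | 47 | 43 | 35 | 21 | 42.4%
DeepSeek V3.2 | 93 | 55 | 32 | 31 | 0 | 42.2%
GPT-4o, May 13th (temp=1) | 78 | 74 | 32 | 25 | 0 | 41.9%
Claude 3.5 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0%
Arcee AI: Trinity Mini | 100 | 61 | 0 | 0 | 0 | 32.2%
Claude 3 Haiku | 100 | 46 | 7 | 0 | 0 | 30.6%
Z.AI GLM 5 | 100 | 51 | 0 | 0 | 0 | 30.3%
Gemini 3 Flash (Preview) | 74 | 20 | 16 | 15 | 0 | 24.8%
Z.AI GLM 4.5 | 84 | 34 | 0 | 0 | 0 | 23.6%
Llama 3.1 Nemotron 70B | 89 | 29 | 0 | 0 | 0 | 23.5%
DeepSeek V3 (2024-12-26) | 61 | 45 | 0 | 0 | 0 | 21.3%
Cohere Command R+ (Aug. 2024) | 55 | 51 | 0 | 0 | 0 | 21.1%
DeepSeek-V2 Chat | 58 | 36 | 8 | 0 | 0 | 20.5%
Mistral Large 2 | 74 | 21 | 5 | 0 | 0 | 20.1%
Ministral 3 3B | 75 | 21 | 0 | 0 | 0 | 19.2%
Mistral Large 3 | 74 | 0 | 0 | 0 | 0 | 14.7%
Mistral Large | 57 | 0 | 0 | 0 | 0 | 11.4%
o4 Mini | 56 | 0 | 0 | 0 | 0 | 11.3%
Mistral Small Creative | 55 | 0 | 0 | 0 | 0 | 11.0%
Ministral 8B | 29 | 25 | 0 | 0 | 0 | 10.8%
Grok 4 | 39 | 0 | 0 | 0 | 0 | 7.7%
Claude 3.5 Sonnet | 37 | 0 | 0 | 0 | 0 | 7.4%
Gemma 3 4B | 32 | 0 | 0 | 0 | 0 | 6.4%
Claude Opus 4.5 | 16 | 15 | 0 | 0 | 0 | 6.3%
Ministral 3 14B | 26 | 0 | 0 | 0 | 0 | 5.2%
Claude Sonnet 4 | 13 | 0 | 0 | 0 | 0 | 2.6%
GPT-5 Mini | 10 | 0 | 0 | 0 | 0 | 2.1%
Ministral 3B | 9 | 0 | 0 | 0 | 0 | 1.8%
Ministral 3 8B | 5 | 0 | 0 | 0 | 0 | 0.9%
Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Minimax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Haiku 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%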
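Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 99.9%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 88 | 97.7%
GPT-5.1 | 100 | 100 | 100 | 100 | 88 | 97.6%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 85 | 96.9%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 83 | 96.7%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 69 | 93.7%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 69 | 93.7%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 94 | 73 | 93.2%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 64 | 92.8%
GPT-5 | 100 | 100 | 100 | 100 | 63 | 92.6%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 59 | 91.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 82 | 71 | 90.7%
Z.AI GLM 4.7 | 100 | 100 | 100 | 82 | 64 | 89.2%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 73 | 68 | 88.2%
o4 Mini High | 100 | 100 | 100 | 100 | 40 | 88.0%
Hermes 3 405B | 100 | 100 | 100 | 89 | 39 | 85.6%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3 Haiku | 100 | 98 | 86 | 68 | 48 | 79.8%
GPT-5.2 | 100 | 100 | 100 | 95 | 3 | 79.6%
Gemini 3 Pro (Preview) | 100 | 98 | 88 | 62 | 35 | 76.5%
Gemma 3 27B | 100 | 100 | 64 | 61 | 51 | 75.4%
Hermes 3 70B | 100 | 100 | 73 | 44 | 44 | 72.4%
WizardLM 2 8x22b | 100 | 100 | 68 | 56 | 30 | 70.8%
DeepSeek V3.1 | 100 | 100 | 94 | 48 | 0 | 68.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 86 | 32 | 19 | 67.3%
Mistral Large 2 | 83 | 77 | 69 | 68 | 17 | 62.9%
DeepSeek V3.2 | 100 | 85 | 81 | 36 | 0 | 60.4%
Llama 3.1 8B | 100 | 100 | 100 | 0 | 0 | 60.0%
Rocinante 12B | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemma 3 12B | 100 | 73 | 69 | 46 | 0 | 57.6%
o4 Mini | 100 | 100 | 73 | 0 | 0 | 54.6%
Claude Sonnet 4.5 | 100 | 100 | 73 | 0 | 0 | 54.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 47 | 25 | 0 | 54.4%
Z.AI GLM 4.7 Flash | 100 | 71 | 69 | 18 | 14 | 54.3%
Z.AI GLM 4.5 | 82 | 71 | 64 | 54 | 0 | 54.1%
Mistral Large 3 | 100 | 100 | 40 | 19 | 7 | 53.1%
Llama 3.1 Nemotron 70B | 92 | 82 | 35 | 29 | 26 | 52.8%
Mistral Large | 66 | 60 | 43 | 43 | 41 | 50.5%
GPT-5 Mini | 96 | 63 | 57 | 32 | 0 | 49.7%
Mistral Medium 3.1 | 91 | 56 | 53 | 47 | 0 | 49.4%
Ministral 3 3B | 100 | 64 | 56 | 11 | 6 | 47.3%
Arcee AI: Trinity Mini | 90 | 62 | 43 | 29 | 0 | 44.6%
Gemma 3 4B | 100 | 100 | 20 | 0 | 0 | 44.0%
DeepSeek-V2 Chat | 100 | 99 | 11 | 9 | 0 | 43.9%
Ministral 3 8B | 58 | 53 | 52 | 46 | 0 | 41.8%
Claude Sonnet 4.6 | 58 | 45 | 41 | 38 | 26 | 41.7%
Claude 3.5 Sonnet | 100 | 98 | 7 | 0 | 0 | 40.9%
GPT-4o Mini (temp=1) | 100 | 100 | 0 | 0 | 0 | 40.0%
Claude Sonnet 4 | 72 | 50 | 36 | 20 | 17 | 38.8%
DeepSeek V3 (2024-12-26) | 100 | 52 | 27 | 0 | 0 | 35.7%
GPT-4.1 Nano | 100 | 43 | 27 | 0 | 0 | 34.0%
Llama 3.1 70B | 100 | 65 | 0 | 0 | 0 | 32.9%
DeepSeek V3 (2025-03-24) | 100 | 33 | 14 | 0 | 0 | 29.5%
Ministral 3B | 87 | 57 | 0 | 0 | 0 | 28.9%
Ministral 8B | 70 | 25 | 25 | 9 | 0 | 25.8%
Claude 3.5 Haiku | 100 | 29 | 0 | 0 | 0 | 25.7%
Mistral Small Creative | 49 | 44 | 23 | 11 | 0 | 25.3%
Minimax M2.5 | 79 | 32 | 5 | 0 | 0 | 23.2%
GPT-4.1 Mini | 74 | 32 | 0 | 0 | 0 | 21.1%
Grok 4 | 41 | 37 | 26 | 0 | 0 | 20.8%
Ministral 3 14B | 59 | 27 | 14 | 0 | 0 | 19.9%
Claude Haiku 4.5 | 85 | 0 | 0 | 0 | 0 | 17.0%
GPT-5 Nano | 37 | 21 | 13 | 0 | 0 | 14.3%
GPT-4.1 | 55 | 0 | 0 | 0 | 0 | 11.0%
ByteDance Seed 1.6 Flash | 55 | 0 | 0 | 0 | 0 | 11.0%
Claude Opus 4 | 54 | 0 | 0 | 0 | 0 | 10.7%
Claude 3.7 Sonnet | 21 | 20 | 10 | 0 | 0 | 10.1%
Writer: Palmyra X5 | 24 | 0 | 0 | 0 | 0 | 4.8%
Claude Opus 4.5 | 18 | 0 | 0 | 0 | 0 | 3.6%
Grok 4.1 Fast | 15 | 0 | 0 | 0 | 0 | 3.1%
Grok 4 Fast | 8 | 0 | 0 | 0 | 0 | 1.5%
Claude Opus 4.6 | 4 | 0 | 0 | 0 | 0 | 0.8%
MoonshotAI: Kimi K2.5 | 3 | 0 | 0 | 0 | 0 | 0.6%
Z.AI GLM 5 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%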
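Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 91 | 98.3%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 89 | 97.9%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 85 | 97.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 92 | 91 | 96.7%
Stealth: Aurora Alpha | 100 | 100 | 98 | 91 | 82 | 94.1%
DeepSeek V3.1 | 100 | 100 | 100 | 87 | 78 | 93.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 54 | 90.9%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 37 | 87.4%
Hermes 3 405B | 100 | 100 | 100 | 90 | 43 | 86.6%
Hermes 3 70B | 100 | 100 | 100 | 68 | 59 | 85.3%
Gemma 3 27B | 100 | 100 | 90 | 81 | 49 | 84.0%
GPT-5 | 100 | 100 | 100 | 61 | 57 | 83.6%
Mistral NeMO | 100 | 100 | 100 | 100 | 11 | 82.2%
GPT-5.2 | 100 | 100 | 100 | 55 | 48 | 80.6%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 2 | 80.5%
Claude 3 Haiku | 100 | 100 | 100 | 58 | 44 | 80.4%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 78 | 0 | 75.6%
GPT-4o Mini (temp=0) | 100 | 100 | 86 | 53 | 36 | 74.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 46 | 11 | 71.3%
DeepSeek V3.2 | 100 | 96 | 62 | 58 | 12 | 65.7%
Gemini 3 Pro (Preview) | 100 | 100 | 75 | 45 | 0 | 64.1%
Z.AI GLM 4.7 Flash | 96 | 93 | 61 | 49 | 17 | 63.3%
Mistral Large 2 | 100 | 65 | 56 | 51 | 32 | 60.7%
Mistral Large 3 | 100 | 100 | 59 | 33 | 0 | 58.4%
MoonshotAI: Kimi K2.5 | 100 | 100 | 82 | 0 | 0 | 56.4%
Llama 3.1 8B | 100 | 88 | 85 | 0 | 0 | 54.5%
o4 Mini | 100 | 97 | 65 | 0 | 0 | 52.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 76 | 67 | 0 | 0 | 48.6%
Mistral Large | 100 | 86 | 42 | 7 | 0 | 47.0%
Gemma 3 12B | 92 | 67 | 40 | 29 | 0 | 45.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 14 | 8 | 0 | 44.6%
WizardLM 2 8x22b | 73 | 56 | 48 | 43 | 0 | 44.0%
Llama 3.1 70B | 100 | 100 | 19 | 0 | 0 | 43.7%
Claude Opus 4 | 81 | 65 | 60 | 10 | 0 | 43.1%
Z.AI GLM 4.5 | 68 | 66 | 50 | 27 | 3 | 42.9%
Claude Sonnet 4.5 | 86 | 65 | 48 | 11 | 0 | 42.0%
Gemini 3 Flash (Preview) | 100 | 53 | 52 | 0 | 0 | 40.9%
o4 Mini High | 96 | 70 | 17 | 17 | 0 | 40.1%
GPT-4o, May 13th (temp=1) | 100 | 69 | 23 | 0 | 0 | 38.3%
Z.AI GLM 5 | 81 | 64 | 15 | 0 | 0 | 32.0%
DeepSeek-V2 Chat | 90 | 44 | 23 | 0 | 0 | 31.4%
Ministral 3B | 64 | 56 | 35 | 0 | 0 | 31.1%
Rocinante 12B | 100 | 49 | 0 | 0 | 0 | 29.7%
Mistral Medium 3.1 | 65 | 37 | 25 | 0 | 0 | 25.3%
Claude 3.5 Sonnet | 93 | 21 | 0 | 0 | 0 | 22.8%
GPT-4o, Aug. 6th (temp=1) | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Haiku 4.5 | 99 | 0 | 0 | 0 | 0 | 19.7%
Claude 3.7 Sonnet | 54 | 42 | 0 | 0 | 0 | 19.1%
Llama 3.1 Nemotron 70B | 68 | 24 | 0 | 0 | 0 | 18.3%
GPT-5 Mini | 44 | 34 | 0 | 0 | 0 | 15.7%
Arcee AI: Trinity Large (Preview) | 43 | 33 | 0 | 0 | 0 | 15.2%
Ministral 3 8B | 74 | 0 | 0 | 0 | 0 | 14.8%
Mistral Small Creative | 54 | 14 | 0 | 0 | 0 | 13.6%
Gemma 3 4B | 55 | 9 | 0 | 0 | 0 | 12.8%
Claude Opus 4.5 | 60 | 0 | 0 | 0 | 0 | 12.0%
GPT-4.1 Mini | 35 | 19 | 0 | 0 | 0 | 10.7%
GPT-4o Mini (temp=1) | 33 | 0 | 0 | 0 | 0 | 6.6%
Ministral 8B | 30 | 0 | 0 | 0 | 0 | 6.0%
GPT-4.1 Nano | 20 | 0 | 0 | 0 | 0 | 4.0%
Minimax M2.5 | 10 | 1 | 0 | 0 | 0 | 2.2%
Ministral 3 3B | 9 | 0 | 0 | 0 | 0 | 1.8%
Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%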
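Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-5 | 100 | 100 | 100 | 100 | 96 | 99.1%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 72 | 94.5%
GPT-5.1 | 100 | 100 | 100 | 100 | 69 | 93.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 95 | 89 | 85 | 93.8%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 68 | 93.7%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 53 | 90.5%
Qwen 2.5 72B | 100 | 100 | 100 | 79 | 72 | 90.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 90 | 55 | 89.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 33 | 86.6%
Gemini 2.5 Flash Lite | 100 | 100 | 99 | 86 | 34 | 83.9%
Mistral Large | 100 | 100 | 91 | 69 | 52 | 82.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 70 | 40 | 81.8%
GPT-5.2 | 100 | 94 | 92 | 80 | 36 | 80.4%
Gemma 3 27B | 100 | 100 | 100 | 52 | 45 | 79.4%
Z.AI GLM 4.7 | 100 | 100 | 82 | 76 | 33 | 78.3%
Ministral 3B | 100 | 100 | 95 | 93 | 0 | 77.6%
Claude 3.5 Sonnet | 99 | 96 | 94 | 91 | 0 | 75.9%
Mistral Small 3.2 24B | 100 | 100 | 100 | 77 | 0 | 75.4%
Mistral Large 2 | 100 | 100 | 100 | 63 | 0 | 72.7%
Gemini 2.5 Pro | 100 | 100 | 69 | 52 | 36 | 71.3%
Llama 3.1 8B | 100 | 100 | 100 | 56 | 0 | 71.3%
Hermes 3 405B | 100 | 100 | 100 | 37 | 16 | 70.6%
WizardLM 2 8x22b | 100 | 83 | 62 | 51 | 39 | 67.1%
o4 Mini High | 100 | 100 | 73 | 49 | 0 | 64.4%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 17 | 0 | 63.4%
DeepSeek V3 (2024-12-26) | 81 | 70 | 63 | 60 | 38 | 62.5%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 5 | 0 | 60.9%
Hermes 3 70B | 87 | 85 | 63 | 54 | 15 | 60.7%
GPT-4o Mini (temp=0) | 96 | 81 | 50 | 41 | 36 | 60.6%
Claude Sonnet 4.5 | 100 | 84 | 67 | 49 | 0 | 59.9%
DeepSeek V3.1 | 96 | 73 | 60 | 32 | 31 | 58.4%
o4 Mini | 100 | 100 | 58 | 12 | 7 | 55.4%
DeepSeek V3.2 | 67 | 66 | 51 | 45 | 38 | 53.6%
Llama 3.1 70B | 100 | 79 | 71 | 0 | 0 | 50.2%
Gemini 3 Pro (Preview) | 100 | 74 | 39 | 38 | 0 | 50.1%
Z.AI GLM 5 | 100 | 56 | 51 | 23 | 0 | 45.9%
Llama 3.1 Nemotron 70B | 86 | 75 | 61 | 7 | 0 | 45.7%
Ministral 3 14B | 93 | 54 | 38 | 29 | 0 | 42.7%
Claude 3 Haiku | 100 | 71 | 24 | 15 | 0 | 42.0%
GPT-4o Mini (temp=1) | 100 | 54 | 29 | 7 | 0 | 37.8%
Gemma 3 12B | 84 | 60 | 24 | 16 | 0 | 36.7%
Mistral Medium 3.1 | 72 | 71 | 23 | 17 | 0 | 36.6%
Z.AI GLM 4.5 | 77 | 43 | 30 | 20 | 0 | 34.2%
Ministral 8B | 85 | 43 | 35 | 0 | 0 | 32.5%
Rocinante 12B | 100 | 52 | 3 | 0 | 0 | 31.0%
Ministral 3 3B | 100 | 52 | 2 | 0 | 0 | 30.9%
Z.AI GLM 4.7 Flash | 63 | 30 | 24 | 18 | 14 | 29.8%
Claude Sonnet 4 | 62 | 44 | 20 | 19 | 0 | 29.2%
Claude 3.7 Sonnet | 74 | 48 | 19 | 3 | 0 | 28.7%
Gemma 3 4B | 57 | 42 | 20 | 19 | 0 | 27.6%
MoonshotAI: Kimi K2.5 | 78 | 51 | 5 | 0 | 0 | 26.8%
Minimax M2.5 | 100 | 30 | 0 | 0 | 0 | 25.9%
Ministral 3 8B | 71 | 29 | 18 | 0 | 0 | 23.6%
DeepSeek V3 (2025-03-24) | 61 | 56 | 0 | 0 | 0 | 23.5%
Gemini 3 Flash (Preview) | 77 | 32 | 3 | 0 | 0 | 22.3%
DeepSeek-V2 Chat | 48 | 42 | 8 | 3 | 1 | 20.6%
Claude Opus 4.5 | 55 | 48 | 0 | 0 | 0 | 20.6%
Writer: Palmyra X5 | 71 | 13 | 7 | 0 | 0 | 18.2%
GPT-5 Mini | 52 | 15 | 0 | 0 | 0 | 13.5%
GPT-4.1 Mini | 67 | 0 | 0 | 0 | 0 | 13.3%
Mistral Small Creative | 41 | 16 | 0 | 0 | 0 | 11.3%
GPT-4.1 | 51 | 0 | 0 | 0 | 0 | 10.3%
Arcee AI: Trinity Mini | 36 | 7 | 0 | 0 | 0 | 8.5%
ByteDance Seed 1.6 Flash | 38 | 0 | 0 | 0 | 0 | 7.5%
Claude Opus 4 | 11 | 5 | 0 | 0 | 0 | 3.2%
Claude Haiku 4.5 | 11 | 0 | 0 | 0 | 0 | 2.3%
GPT-4.1 Nano | 11 | 0 | 0 | 0 | 0 | 2.3%
Claude Sonnet 4.6 | 10 | 0 | 0 | 0 | 0 | 2.0%
Grok 4 | 7 | 2 | 0 | 0 | 0 | 1.7%
Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%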
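Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 | 100 | 100 | 100 | 100 | 79 | 95.9%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 76 | 95.1%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 76 | 73 | 89.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 45 | 89.1%
Gemini 2.5 Flash Lite | 100 | 100 | 93 | 82 | 49 | 84.7%
Qwen 2.5 72B | 100 | 100 | 100 | 98 | 9 | 81.4%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 83 | 17 | 80.0%
Rocinante 12B | 100 | 100 | 100 | 95 | 0 | 79.0%
Z.AI GLM 4.7 | 100 | 94 | 72 | 69 | 29 | 72.8%
GPT-4o Mini (temp=0) | 100 | 100 | 50 | 45 | 44 | 67.9%
Gemini 2.5 Flash | 100 | 100 | 76 | 34 | 0 | 62.2%
Stealth: Aurora Alpha | 100 | 100 | 52 | 35 | 0 | 57.4%
Gemma 3 27B | 93 | 91 | 80 | 0 | 0 | 52.9%
o4 Mini | 100 | 96 | 52 | 11 | 0 | 51.8%
Gemma 3 12B | 100 | 83 | 48 | 13 | 0 | 48.7%
Gemini 3 Flash (Preview) | 82 | 71 | 60 | 30 | 0 | 48.5%
o4 Mini High | 100 | 74 | 53 | 11 | 0 | 47.5%
Claude 3 Haiku | 100 | 60 | 37 | 29 | 0 | 45.1%
DeepSeek V3 (2024-12-26) | 97 | 72 | 47 | 0 | 0 | 43.1%
Gemini 3 Pro (Preview) | 97 | 61 | 54 | 0 | 0 | 42.5%
Z.AI GLM 4.7 Flash | 100 | 85 | 24 | 1 | 0 | 42.1%
Hermes 3 405B | 100 | 100 | 7 | 0 | 0 | 41.4%
DeepSeek-V2 Chat | 100 | 79 | 21 | 0 | 0 | 40.0%
Llama 3.1 70B | 100 | 82 | 16 | 0 | 0 | 39.6%
Hermes 3 70B | 100 | 72 | 19 | 0 | 0 | 38.2%
Arcee AI: Trinity Mini | 79 | 70 | 38 | 0 | 0 | 37.5%
Ministral 8B | 100 | 59 | 0 | 0 | 0 | 31.8%
DeepSeek V3.1 | 70 | 58 | 22 | 5 | 3 | 31.5%
Minimax M2.5 | 72 | 43 | 24 | 0 | 0 | 27.9%
GPT-5.2 | 53 | 43 | 30 | 11 | 0 | 27.4%
Z.AI GLM 5 | 100 | 30 | 0 | 0 | 0 | 26.1%
WizardLM 2 8x22b | 72 | 58 | 0 | 0 | 0 | 26.0%
DeepSeek V3.2 | 75 | 30 | 25 | 0 | 0 | 25.8%
Claude Opus 4 | 60 | 52 | 0 | 0 | 0 | 22.5%
Claude Sonnet 4.6 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 8B | 98 | 0 | 0 | 0 | 0 | 19.6%
Ministral 3B | 49 | 47 | 0 | 0 | 0 | 19.1%
Claude Sonnet 4.5 | 81 | 11 | 0 | 0 | 0 | 18.3%
Llama 3.1 8B | 90 | 0 | 0 | 0 | 0 | 18.0%
Claude Haiku 4.5 | 46 | 21 | 3 | 0 | 0 | 13.9%
GPT-4o, May 13th (temp=1) | 58 | 0 | 0 | 0 | 0 | 11.6%
GPT-4.1 Mini | 54 | 0 | 0 | 0 | 0 | 10.7%
Mistral Small Creative | 46 | 0 | 0 | 0 | 0 | 9.2%
Mistral Large | 29 | 7 | 0 | 0 | 0 | 7.1%
Cohere Command R+ (Aug. 2024) | 24 | 2 | 0 | 0 | 0 | 5.1%
Arcee AI: Trinity Large (Preview) | 25 | 0 | 0 | 0 | 0 | 5.1%
GPT-5 Mini | 23 | 0 | 0 | 0 | 0 | 4.5%
Gemma 3 4B | 22 | 0 | 0 | 0 | 0 | 4.4%
Mistral Medium 3.1 | 12 | 0 | 0 | 0 | 0 | 2.5%
GPT-4o Mini (temp=1) | 10 | 0 | 0 | 0 | 0 | 2.0%
Mistral Large 2 | 5 | 0 | 0 | 0 | 0 | 0.9%
Llama 3.1 Nemotron 70B | 3 | 0 | 0 | 0 | 0 | 0.7%
Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
MoonshotAI: Kimi K2.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%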