Subject-first sentence starts

Test: Bad Writing Habits

Avg. Score: 36.5%
Scenarios: 18

Overall Performance

Rank ▲ | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Writer: Palmyra X5 | 83.2% | $0.011 | 22.0s | 50%
2 | Rocinante 12B | 77.8% | $0.0014 | 38.4s | 38%
3 | Llama 3.1 8B | 71.0% | $0.0003 | 1.3m | 29%
4 | Ministral 3 14B | 50.5% | $0.0007 | 11.7s | 25%
5 | Claude Sonnet 4.5 | 63.6% | $0.035 | 38.1s | 27%
6 | Grok 4.1 Fast | 54.6% | $0.0018 | 37.8s | 24%
7 | Mistral Small Creative | 49.3% | $0.0007 | 9.1s | 22%
8 | Z.AI GLM 5 | 58.9% | $0.0084 | 1.2m | 26%
9 | Claude Haiku 4.5 | 52.5% | $0.011 | 21.6s | 21%
10 | Claude 3.5 Haiku | 51.4% | $0.0035 | 10.8s | 17%
11 | Mistral Medium 3.1 | 43.3% | $0.0048 | 36.5s | 25%
12 | Hermes 3 70B | 57.9% | $0.0010 | 1.2m | 20%
13 | Grok 4 Fast | 45.4% | $0.0017 | 24.1s | 20%
14 | Minimax M2.5 | 52.7% | $0.0034 | 1.3m | 23%
15 | Hermes 3 405B | 53.5% | $0.0032 | 53.2s | 19%
16 | Claude Sonnet 4.6 | 53.3% | $0.031 | 39.3s | 21%
17 | Claude Sonnet 4 | 53.0% | $0.032 | 43.7s | 22%
18 | Llama 3.1 Nemotron 70B | 45.5% | $0.0038 | 31.7s | 18%
19 | Arcee AI: Trinity Large (Preview) | 44.0% | $0.0000 | 43.6s | 19%
20 | Llama 3.1 70B | 44.4% | $0.0015 | 29.4s | 17%
21 | Claude 3 Haiku | 45.6% | $0.0025 | 14.9s | 14%
22 | Claude Opus 4.5 | 58.8% | $0.070 | 53.4s | 27%
23 | Gemini 2.5 Flash Lite | 34.5% | $0.0009 | 9.5s | 18%
24 | ByteDance Seed 1.6 Flash | 38.0% | $0.0013 | 27.3s | 18%
25 | Ministral 8B | 37.6% | $0.0004 | 10.4s | 15%
26 | GPT-4o, Aug. 6th (temp=1) | 44.8% | $0.018 | 24.4s | 17%
27 | Ministral 3 8B | 38.5% | $0.0008 | 19.6s | 14%
28 | DeepSeek V3 (2025-03-24) | 42.0% | $0.0014 | 39.4s | 14%
29 | GPT-4.1 | 39.4% | $0.018 | 44.7s | 21%
30 | Gemma 3 12B | 36.6% | $0.0004 | 41.3s | 17%
31 | Mistral Large 2 | 40.1% | $0.013 | 29.4s | 16%
32 | Mistral Large 3 | 36.9% | $0.0033 | 30.3s | 16%
33 | GPT-4o Mini (temp=1) | 38.9% | $0.0012 | 34.8s | 13%
34 | Gemma 3 27B | 39.8% | $0.0006 | 52.6s | 14%
35 | GPT-4.1 Nano | 34.2% | $0.0007 | 13.3s | 12%
36 | Claude Opus 4.6 | 56.4% | $0.078 | 1.2m | 25%
37 | Cohere Command R+ (Aug. 2024) | 45.9% | $0.020 | 52.5s | 13%
38 | Mistral Large | 37.3% | $0.014 | 30.9s | 13%
39 | Gemini 2.5 Flash | 28.1% | $0.0052 | 10.6s | 13%
40 | Gemma 3 4B | 29.4% | $0.0002 | 20.0s | 11%
41 | Mistral NeMO | 28.6% | $0.0005 | 10.1s | 10%
42 | Z.AI GLM 4.6 | 28.9% | $0.0065 | 51.5s | 17%
43 | GPT-5.1 | 51.4% | $0.054 | 1.8m | 22%
44 | GPT-4o, May 13th (temp=1) | 30.5% | $0.033 | 14.4s | 15%
45 | Ministral 3B | 23.3% | $0.0001 | 8.1s | 10%
46 | o4 Mini | 25.8% | $0.015 | 25.7s | 14%
47 | WizardLM 2 8x22b | 38.5% | $0.0026 | 1.8m | 16%
48 | GPT-4.1 Mini | 24.8% | $0.0027 | 19.0s | 10%
49 | Z.AI GLM 4.5 | 29.0% | $0.0051 | 42.1s | 12%
50 | DeepSeek V3.2 | 33.7% | $0.0014 | 1.9m | 17%
51 | Gemini 2.5 Pro | 29.5% | $0.036 | 36.2s | 16%
52 | Claude 3.7 Sonnet | 32.1% | $0.042 | 46.7s | 17%
53 | Qwen 3.5 Plus (2026-02-15) | 22.7% | $0.0060 | 31.5s | 12%
54 | o4 Mini High | 28.1% | $0.025 | 47.2s | 14%
55 | MoonshotAI: Kimi K2.5 | 43.7% | $0.019 | 3.2m | 22%
56 | Grok 4 | 43.0% | $0.048 | 1.7m | 16%
57 | Ministral 3 3B | 18.7% | $0.0005 | 11.1s | 5%
58 | Claude 3.5 Sonnet | 32.1% | $0.048 | 35.5s | 11%
59 | Gemini 3 Flash (Preview) | 17.4% | $0.0078 | 19.6s | 7%
60 | DeepSeek V3.1 | 26.6% | $0.0020 | 1.8m | 13%
61 | DeepSeek V3 (2024-12-26) | 22.0% | $0.0021 | 54.6s | 7%
62 | DeepSeek-V2 Chat | 21.7% | $0.0021 | 53.3s | 6%
63 | GPT-4o, Aug. 6th (temp=0) | 17.5% | $0.023 | 22.7s | 8%
64 | GPT-4o, May 13th (temp=0) | 18.6% | $0.035 | 14.1s | 8%
65 | Z.AI GLM 4.7 Flash | 16.8% | $0.0017 | 1.2m | 8%
66 | GPT-5 Mini | 17.7% | $0.0100 | 57.4s | 7%
67 | Arcee AI: Trinity Mini | 11.6% | $0.0003 | 9.2s | 0%
68 | Qwen 3.5 397B A17B | 33.8% | $0.014 | 3.0m | 15%
69 | Z.AI GLM 4.7 | 16.6% | $0.010 | 1.4m | 8%
70 | Qwen 2.5 72B | 8.8% | $0.0010 | 36.7s | 2%
71 | GPT-4o Mini (temp=0) | 9.5% | $0.0012 | 34.8s | 0%
72 | Gemini 3 Pro (Preview) | 20.1% | $0.055 | 54.4s | 9%
73 | Claude Opus 4 | 57.5% | $0.209 | 1.4m | 28%
74 | Stealth: Aurora Alpha | 0.7% | $0.0000 | 9.8s | 0%
75 | GPT-5.2 | 22.9% | $0.056 | 1.5m | 11%
76 | GPT-5 Nano | 13.5% | $0.0042 | 1.4m | 2%
77 | GPT-5 | 29.2% | $0.065 | 2.8m | 13%
78 | ByteDance Seed 1.6 | 11.8% | $0.013 | 2.5m | 0%
79 | Gemini 3.1 Pro (Preview) | 23.9% | $0.107 | 1.8m | 6%
80 | Mistral Small 3.2 24B | 25.5% | $0.0069 | 5.7m | 9%
Average score: 36.46%

Individual Scenarios

Detailed Writing Rules

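Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Writer: Palmyra X5 | 100 | 100 | 96 | 88 | 80 | 92.7%
Rocinante 12B | 100 | 100 | 98 | 83 | 78 | 91.8%
Claude Sonnet 4.6 | 96 | 79 | 78 | 77 | 63 | 78.4%
Claude Sonnet 4.5 | 100 | 87 | 71 | 67 | 64 | 77.9%
Gemma 3 4B | 100 | 78 | 76 | 42 | 39 | 67.1%
Hermes 3 70B | 100 | 96 | 60 | 56 | 23 | 66.9%
Claude Opus 4.5 | 83 | 81 | 72 | 65 | 29 | 65.9%
Llama 3.1 Nemotron 70B | 85 | 73 | 63 | 41 | 35 | 59.2%
Cohere Command R+ (Aug. 2024) | 99 | 82 | 40 | 36 | 34 | 58.2%
Claude Opus 4 | 75 | 74 | 71 | 42 | 24 | 57.3%
Claude Opus 4.6 | 76 | 70 | 50 | 46 | 44 | 57.3%
WizardLM 2 8x22b | 97 | 69 | 53 | 39 | 27 | 57.0%
Gemma 3 12B | 87 | 73 | 53 | 41 | 31 | 56.9%
Z.AI GLM 5 | 100 | 78 | 67 | 20 | 13 | 55.5%
Llama 3.1 8B | 100 | 100 | 56 | 18 | 0 | 54.8%
Claude 3.5 Haiku | 100 | 67 | 37 | 37 | 31 | 54.5%
Claude Haiku 4.5 | 100 | 61 | 43 | 36 | 33 | 54.4%
Hermes 3 405B | 100 | 100 | 45 | 14 | 11 | 54.0%
MoonshotAI: Kimi K2.5 | 74 | 60 | 54 | 48 | 29 | 53.2%
Gemini 2.5 Pro | 63 | 61 | 61 | 45 | 11 | 48.1%
Gemini 2.5 Flash | 70 | 48 | 42 | 41 | 30 | 46.3%
Gemma 3 27B | 59 | 56 | 55 | 43 | 18 | 46.2%
GPT-4o Mini (temp=1) | 90 | 45 | 42 | 38 | 15 | 46.0%
Arcee AI: Trinity Large (Preview) | 68 | 50 | 47 | 39 | 22 | 45.0%
Gemini 2.5 Flash Lite | 52 | 49 | 43 | 42 | 37 | 44.7%
Ministral 8B | 79 | 60 | 46 | 23 | 15 | 44.5%
Ministral 3 14B | 75 | 50 | 43 | 28 | 18 | 43.1%
Claude Sonnet 4 | 68 | 54 | 49 | 29 | 11 | 42.1%
DeepSeek V3 (2025-03-24) | 94 | 56 | 26 | 26 | 0 | 40.3%
GPT-5.1 | 54 | 49 | 38 | 36 | 25 | 40.2%
DeepSeek V3.2 | 54 | 52 | 43 | 39 | 12 | 40.1%
GPT-4o, Aug. 6th (temp=1) | 75 | 58 | 29 | 18 | 0 | 36.2%
Mistral Small Creative | 53 | 46 | 43 | 17 | 13 | 34.6%
ByteDance Seed 1.6 Flash | 79 | 34 | 29 | 23 | 8 | 34.6%
Mistral Small 3.2 24B | 81 | 35 | 29 | 28 | 0 | 34.6%
GPT-4.1 | 62 | 53 | 30 | 27 | 0 | 34.4%
Ministral 3B | 65 | 60 | 37 | 0 | 0 | 32.4%
Z.AI GLM 4.6 | 49 | 41 | 28 | 23 | 22 | 32.4%
Mistral Large 2 | 39 | 37 | 32 | 29 | 18 | 31.2%
Claude 3 Haiku | 73 | 26 | 21 | 21 | 14 | 30.9%
Llama 3.1 70B | 100 | 31 | 10 | 7 | 0 | 29.6%
Mistral Medium 3.1 | 46 | 42 | 31 | 15 | 13 | 29.2%
GPT-5.2 | 42 | 41 | 37 | 16 | 5 | 28.2%
Minimax M2.5 | 73 | 26 | 25 | 16 | 0 | 28.0%
Grok 4 Fast | 43 | 31 | 31 | 21 | 13 | 28.0%
Gemini 3 Pro (Preview) | 49 | 36 | 27 | 20 | 7 | 27.6%
Claude 3.7 Sonnet | 50 | 40 | 28 | 20 | 0 | 27.6%
Gemini 3 Flash (Preview) | 38 | 37 | 36 | 24 | 0 | 27.0%
Mistral Large 3 | 46 | 31 | 20 | 19 | 17 | 26.4%
GPT-4o, May 13th (temp=1) | 60 | 26 | 16 | 14 | 12 | 25.7%
Mistral Large | 48 | 31 | 27 | 20 | 1 | 25.4%
Grok 4.1 Fast | 48 | 27 | 26 | 17 | 6 | 24.8%
Claude 3.5 Sonnet | 70 | 23 | 14 | 8 | 7 | 24.1%
DeepSeek V3.1 | 33 | 32 | 25 | 20 | 9 | 24.0%
DeepSeek-V2 Chat | 54 | 22 | 18 | 11 | 10 | 22.9%
Ministral 3 8B | 60 | 39 | 12 | 2 | 0 | 22.6%
Ministral 3 3B | 45 | 41 | 13 | 10 | 0 | 21.8%
Qwen 3.5 397B A17B | 40 | 31 | 28 | 8 | 0 | 21.4%
Z.AI GLM 4.7 | 28 | 27 | 23 | 14 | 2 | 18.9%
o4 Mini High | 46 | 25 | 9 | 7 | 0 | 17.4%
DeepSeek V3 (2024-12-26) | 38 | 25 | 21 | 3 | 0 | 17.4%
GPT-5 | 40 | 18 | 10 | 9 | 7 | 16.9%
GPT-4.1 Mini | 41 | 23 | 18 | 0 | 0 | 16.2%
Grok 4 | 30 | 28 | 10 | 6 | 2 | 15.3%
GPT-4o, Aug. 6th (temp=0) | 28 | 23 | 17 | 4 | 0 | 14.3%
Mistral NeMO | 32 | 27 | 8 | 0 | 0 | 13.5%
GPT-4.1 Nano | 25 | 20 | 16 | 2 | 0 | 12.3%
Arcee AI: Trinity Mini | 43 | 17 | 0 | 0 | 0 | 12.0%
Z.AI GLM 4.7 Flash | 25 | 15 | 14 | 0 | 0 | 10.9%
Qwen 2.5 72B | 24 | 19 | 5 | 3 | 2 | 10.7%
Gemini 3.1 Pro (Preview) | 23 | 16 | 11 | 3 | 0 | 10.5%
Qwen 3.5 Plus (2026-02-15) | 21 | 14 | 7 | 3 | 0 | 9.0%
Z.AI GLM 4.5 | 23 | 13 | 5 | 4 | 0 | 9.0%
GPT-4o, May 13th (temp=0) | 24 | 16 | 5 | 0 | 0 | 8.8%
GPT-5 Mini | 23 | 8 | 5 | 0 | 0 | 7.2%
GPT-5 Nano | 17 | 9 | 6 | 0 | 0 | 6.3%
o4 Mini | 11 | 10 | 6 | 2 | 1 | 6.3%
GPT-4o Mini (temp=0) | 27 | 3 | 1 | 0 | 0 | 6.1%
ByteDance Seed 1.6 | 14 | 0 | 0 | 0 | 0 | 2.8%
Stealth: Aurora Alpha | 5 | 0 | 0 | 0 | 0 | 1.0%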
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Writer: Palmyra X5100100100100100100.0%
Gemma 3 27B1001001001008597.0%
Claude Sonnet 4.5100100100997695.1%
Claude Haiku 4.510010099947793.9%
GPT-4o Mini (temp=1)10010095957693.0%
Claude Opus 4100100100885789.1%
Rocinante 12B100100100856088.9%
Claude Sonnet 4.6999089847687.5%
Claude 3.5 Haiku10010092905487.2%
Claude Opus 4.61001001001002184.3%
GPT-4o, Aug. 6th (temp=1)10010095685784.1%
Claude 3 Haiku10010090852980.7%
Claude Sonnet 410010088595480.0%
Grok 41009690654679.5%
DeepSeek V3 (2025-03-24)10010090603777.4%
Llama 3.1 8B1001009977876.8%
Z.AI GLM 51008574625675.4%
WizardLM 2 8x22b958969635974.7%
Claude Opus 4.510010082444273.7%
Mistral Large 21008973564873.1%
Gemma 3 4B998181515072.4%
Z.AI GLM 4.5949481553171.0%
DeepSeek V3.2907266665870.6%
Mistral Small Creative998579523770.3%
Minimax M2.5927368665170.1%
Arcee AI: Trinity Large (Preview)1009177481666.4%
Ministral 3 14B966866504665.3%
GPT-5.1907171524165.2%
Mistral Large 310010048452663.8%
Mistral Large937169532762.7%
Ministral 8B827465603162.4%
Gemini 2.5 Flash Lite827460414159.6%
Cohere Command R+ (Aug. 2024)100827243059.5%
Mistral Medium 3.1876057484459.1%
Hermes 3 70B100926238158.5%
Gemma 3 12B716763593258.3%
Grok 4.1 Fast914947474455.6%
GPT-4.1865951383754.1%
Gemini 2.5 Pro676060493454.0%
Gemini 3 Pro (Preview)875751482453.5%
MoonshotAI: Kimi K2.5776159402853.0%
Llama 3.1 70B776560491353.0%
Claude 3.7 Sonnet676454473252.6%
DeepSeek V3.1928039312152.6%
Gemini 2.5 Flash1004843363351.8%
DeepSeek V3 (2024-12-26)1007247191851.2%
ByteDance Seed 1.6 Flash635552513551.1%
GPT-4.1 Nano725650393550.4%
GPT-4o, May 13th (temp=1)685953363249.6%
Grok 4 Fast71696529648.2%
Ministral 3 8B754848373448.2%
GPT-4.1 Mini665958322648.1%
Claude 3.5 Sonnet100484741147.4%
Mistral NeMO100593935347.2%
Hermes 3 405B93843311044.2%
Llama 3.1 Nemotron 70B100532917641.1%
DeepSeek-V2 Chat100641913439.8%
Z.AI GLM 4.6514836322939.2%
Z.AI GLM 4.7743232282838.7%
GPT-5554636331537.0%
GPT-5.2574336321736.9%
Arcee AI: Trinity Mini10041236334.5%
Ministral 3 3B694025161432.6%
Ministral 3B54383624932.0%
Qwen 3.5 Plus (2026-02-15)582725251329.7%
Gemini 3.1 Pro (Preview)423025201927.4%
Gemini 3 Flash (Preview)39322823926.3%
o4 Mini403423161625.8%
GPT-4o Mini (temp=0)373028181525.4%
Z.AI GLM 4.7 Flash333022201724.4%
o4 Mini High392925151123.7%
ByteDance Seed 1.6252323151219.3%
GPT-5 Mini30121010012.6%
GPT-4o, May 13th (temp=0)54520012.4%
GPT-4o, Aug. 6th (temp=0)351642011.4%
Qwen 2.5 72B36966011.2%
GPT-5 Nano2317122111.1%
Qwen 3.5 397B A17B161586610.1%
Mistral Small 3.2 24B1311003.1%
Stealth: Aurora Alpha000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Writer: Palmyra X51001001001007895.7%
Llama 3.1 8B100100100834084.5%
Claude Opus 4100100100625082.5%
Rocinante 12B10010010052270.8%
Claude Opus 4.5836643433153.5%
Claude Haiku 4.51005346353453.4%
Claude Sonnet 41005647372553.0%
Mistral Large 2856747342852.3%
Ministral 3 8B1009129231952.3%
Claude Sonnet 4.5757261291750.9%
Hermes 3 405B88664539047.5%
Hermes 3 70B74694742046.4%
Mistral Medium 3.168615545045.8%
Minimax M2.571565045645.8%
Mistral Large 31006224191744.6%
Ministral 3 14B85584331544.2%
Llama 3.1 70B824841241742.4%
Claude Sonnet 4.674585512040.0%
Claude Opus 4.6574848242339.8%
Mistral Large75454227438.6%
Z.AI GLM 5544844271537.6%
Llama 3.1 Nemotron 70B69434131036.9%
Grok 4.1 Fast71484811035.8%
MoonshotAI: Kimi K2.5544128272635.0%
GPT-5975483032.4%
ByteDance Seed 1.6 Flash553130232332.3%
Qwen 3.5 397B A17B50453429031.6%
Ministral 8B46453320930.8%
Arcee AI: Trinity Large (Preview)886050030.6%
Claude 3.5 Haiku48433427030.4%
Gemma 3 12B61352920029.2%
DeepSeek V3 (2025-03-24)5549310027.1%
Gemma 3 27B493725121127.0%
GPT-4.16041270025.5%
GPT-4o, Aug. 6th (temp=1)42392717025.0%
WizardLM 2 8x22b4339313023.3%
Grok 4 Fast5331186522.8%
GPT-4o Mini (temp=1)4323237620.6%
Claude 3 Haiku7114130019.6%
Mistral Small Creative4437112019.0%
DeepSeek V3 (2024-12-26)6020100018.1%
GPT-5.14525143017.2%
Claude 3.5 Sonnet4816119016.8%
Gemma 3 4B25231414516.2%
Mistral NeMO3918160014.7%
GPT-4.1 Nano3022192014.5%
GPT-4o, May 13th (temp=1)531900014.3%
DeepSeek-V2 Chat3814117013.8%
Claude 3.7 Sonnet342600012.0%
Ministral 3B2518130011.1%
Gemini 2.5 Flash Lite311154010.3%
GPT-4o, May 13th (temp=0)311630010.2%
GPT-5 Nano51000010.2%
Cohere Command R+ (Aug. 2024)411000010.2%
GPT-4.1 Mini28145009.6%
Mistral Small 3.2 24B4150009.2%
DeepSeek V3.223106007.9%
Z.AI GLM 4.62396007.7%
Z.AI GLM 4.53600007.3%
Ministral 3 3B3410007.1%
ByteDance Seed 1.63130007.0%
Qwen 3.5 Plus (2026-02-15)17150006.4%
Gemini 2.5 Flash3100006.3%
o4 Mini15115006.1%
Grok 42600005.1%
o4 Mini High1294005.0%
Gemini 3 Pro (Preview)13110004.7%
GPT-5.22300004.5%
Qwen 2.5 72B1700003.4%
Arcee AI: Trinity Mini1160003.4%
GPT-4o, Aug. 6th (temp=0)1600003.1%
Gemini 3.1 Pro (Preview)1300002.6%
GPT-4o Mini (temp=0)920002.2%
Gemini 2.5 Pro1000002.1%
Z.AI GLM 4.7 Flash1000002.1%
GPT-5 Mini711001.6%
Z.AI GLM 4.7000000.0%
Gemini 3 Flash (Preview)000000.0%
Stealth: Aurora Alpha000000.0%
DeepSeek V3.1000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Writer: Palmyra X5100100100100100100.0%
Claude Sonnet 4.510010094804583.7%
Claude Sonnet 4.61009877604676.2%
GPT-5.11009668675076.1%
Hermes 3 405B1009672585476.0%
Claude Sonnet 41007772625773.7%
Rocinante 12B10010088463173.0%
Grok 4.1 Fast958474535171.4%
Claude Opus 4.610010059514571.0%
Z.AI GLM 51007265645270.5%
Claude Opus 4938273613568.8%
MoonshotAI: Kimi K2.51007065633967.4%
Mistral Small Creative1007167573466.0%
Llama 3.1 8B10010060501665.0%
Mistral Large92756663760.7%
Ministral 3 14B95776354959.6%
Hermes 3 70B1001006530059.0%
Claude Opus 4.5909045392958.5%
Gemma 3 27B1007157442058.4%
Minimax M2.5818167312457.1%
Claude 3.5 Haiku926555471654.9%
Arcee AI: Trinity Large (Preview)876256422454.2%
DeepSeek V3.1854739363247.9%
Mistral Medium 3.1726959251347.6%
GPT-5846935331246.4%
Claude 3.7 Sonnet814140383146.2%
WizardLM 2 8x22b86784025046.0%
Claude 3.5 Sonnet100574519745.7%
Gemma 3 4B605344432545.0%
Mistral Large 279625330044.8%
Claude Haiku 4.564595148144.5%
Grok 4665936312843.9%
GPT-4.1 Nano67574943043.1%
DeepSeek V3 (2025-03-24)1004336211342.5%
Gemma 3 12B88504828042.5%
ByteDance Seed 1.6 Flash1004823212042.3%
Gemini 2.5 Flash Lite785143191741.6%
Qwen 3.5 397B A17B73675711041.6%
GPT-4o, Aug. 6th (temp=1)544238381437.1%
Mistral Small 3.2 24B8455440036.6%
Mistral Large 3843428201636.5%
Gemini 2.5 Flash45453734032.1%
Ministral 3 8B593228251131.1%
GPT-4o Mini (temp=1)57503712031.1%
Ministral 8B8742145029.8%
GPT-4o, May 13th (temp=1)69342215027.9%
Gemini 3 Pro (Preview)59402512127.3%
o4 Mini High7922188025.6%
DeepSeek V3.240383415125.6%
Cohere Command R+ (Aug. 2024)635344024.9%
Mistral NeMO423022191124.9%
GPT-4.14139380023.6%
Grok 4 Fast48332311123.2%
Claude 3 Haiku7711109322.0%
Arcee AI: Trinity Mini752390021.2%
Z.AI GLM 4.76128122020.7%
Z.AI GLM 4.65232115220.5%
Ministral 3B5226169020.4%
Gemini 3 Flash (Preview)37311813019.9%
Ministral 3 3B38241612819.5%
Gemini 2.5 Pro51231111219.4%
GPT-4o, May 13th (temp=0)4429194019.1%
o4 Mini3530217018.6%
Llama 3.1 70B3129250017.1%
Z.AI GLM 4.7 Flash3932103017.0%
Z.AI GLM 4.5472592016.5%
GPT-5.22826199016.3%
DeepSeek V3 (2024-12-26)4025100015.0%
Qwen 3.5 Plus (2026-02-15)2726201014.8%
GPT-4o, Aug. 6th (temp=0)3521100013.3%
Qwen 2.5 72B47840011.9%
Gemini 3.1 Pro (Preview)2317153111.7%
Llama 3.1 Nemotron 70B46500010.3%
GPT-5 Mini22127008.1%
ByteDance Seed 1.620180007.5%
DeepSeek-V2 Chat984004.1%
GPT-4.1 Mini900001.7%
GPT-4o Mini (temp=0)300000.6%
Stealth: Aurora Alpha000000.0%
GPT-5 Nano000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Rocinante 12B100100100786187.8%
Writer: Palmyra X51009682764379.4%
Llama 3.1 8B10010066605776.7%
Hermes 3 70B1009393772076.6%
Cohere Command R+ (Aug. 2024)1001008074070.8%
Claude 3 Haiku838171673868.1%
Mistral Medium 3.1868474513766.4%
Claude Opus 4987854474364.0%
Claude 3.5 Haiku1001006554063.8%
Claude Opus 4.5868660433562.1%
Claude Sonnet 41007261422059.0%
Claude Opus 4.6826248351949.4%
Grok 4.1 Fast886932292749.0%
Mistral NeMO75665145648.7%
Minimax M2.5605149373446.1%
Mistral Small Creative100584210142.3%
Z.AI GLM 5814332322142.0%
Ministral 3 8B81694510041.0%
Grok 4 Fast615339331940.9%
Mistral Small 3.2 24B100353430140.0%
Llama 3.1 70B100432817137.7%
Claude Sonnet 4.566483633036.6%
GPT-4o, Aug. 6th (temp=1)60514131036.6%
Mistral Large8176164035.5%
Qwen 3.5 397B A17B544336231333.8%
Claude Haiku 4.5574137191333.5%
Mistral Large 360523815033.0%
Ministral 3 14B71453118032.9%
Qwen 3.5 Plus (2026-02-15)8948260032.5%
GPT-4o, May 13th (temp=0)474629271432.4%
Ministral 8B63473119032.0%
Claude Sonnet 4.610035119031.0%
Gemma 3 27B55432320930.0%
Gemini 2.5 Flash Lite49402722929.5%
Hermes 3 405B54502813029.0%
Gemma 3 12B363431211627.7%
GPT-4.1 Nano48332925027.0%
o4 Mini712119121126.7%
MoonshotAI: Kimi K2.54643329026.1%
Arcee AI: Trinity Large (Preview)46413112025.9%
GPT-4o, May 13th (temp=1)423123171325.2%
Gemma 3 4B61282311024.5%
ByteDance Seed 1.6 Flash48292520024.4%
Llama 3.1 Nemotron 70B56251816624.1%
Ministral 3 3B932620024.1%
DeepSeek V3.24541330023.9%
Gemini 2.5 Flash6030160021.2%
WizardLM 2 8x22b40231918019.8%
GPT-5 Nano741184019.3%
Grok 434321212519.1%
DeepSeek V3 (2025-03-24)38271513018.3%
GPT-5.14233115018.2%
Mistral Large 235241614017.9%
Arcee AI: Trinity Mini4518176017.1%
Claude 3.7 Sonnet562061016.7%
Claude 3.5 Sonnet3731150016.6%
GPT-4o, Aug. 6th (temp=0)37171414016.3%
Z.AI GLM 4.6402030012.6%
Ministral 3B51730012.2%
ByteDance Seed 1.6303000012.0%
GPT-4.1281765011.2%
o4 Mini High4210008.6%
GPT-5 Mini27140008.1%
Gemini 3 Pro (Preview)15128107.2%
GPT-4o Mini (temp=1)2560006.2%
Z.AI GLM 4.519120006.1%
DeepSeek V3 (2024-12-26)16101005.4%
Gemini 3.1 Pro (Preview)2430005.3%
Gemini 2.5 Pro2020004.5%
GPT-51074004.4%
GPT-4.1 Mini955003.7%
Z.AI GLM 4.7 Flash1331003.5%
Gemini 3 Flash (Preview)1400002.7%
Z.AI GLM 4.7750002.6%
Qwen 2.5 72B533002.5%
DeepSeek-V2 Chat1100002.1%
DeepSeek V3.1630001.8%
GPT-4o Mini (temp=0)500000.9%
GPT-5.2300000.6%
Stealth: Aurora Alpha000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Writer: Palmyra X51001001001008196.2%
Rocinante 12B100100100938295.0%
Z.AI GLM 5100100100844986.6%
Llama 3.1 8B100100100784885.2%
Hermes 3 405B10010093913684.1%
Claude Haiku 4.51009084694377.2%
Claude Opus 4.5928475646475.9%
Hermes 3 70B10010085741875.3%
Claude Sonnet 4.61009078564674.2%
Claude Sonnet 4.51009465574071.4%
Claude 3.5 Haiku1009071484370.6%
Claude Sonnet 4978264544769.0%
Mistral Medium 3.1918867483666.2%
Cohere Command R+ (Aug. 2024)1007573512364.6%
GPT-4o, Aug. 6th (temp=1)1008355424264.5%
Claude Opus 4.6827672533764.0%
Arcee AI: Trinity Large (Preview)858570572063.5%
WizardLM 2 8x22b866960504962.8%
Qwen 3.5 397B A17B1006755454261.9%
Grok 4.1 Fast747061594561.6%
GPT-4o Mini (temp=1)967053483860.9%
Mistral Large 3726765642859.2%
Gemma 3 27B886052444056.8%
GPT-4.1876561412756.2%
Claude 3 Haiku1001004529054.8%
Minimax M2.51007143302754.1%
GPT-5.1785951483454.1%
Gemini 2.5 Flash Lite706853502753.9%
GPT-4.1 Nano767368272653.7%
Mistral Large99866120053.4%
Claude Opus 4796850382652.3%
GPT-4o, May 13th (temp=1)755857462351.7%
DeepSeek V3.2695752413450.5%
Mistral Large 2905331312946.8%
MoonshotAI: Kimi K2.5625440383846.2%
Grok 4 Fast605651312645.0%
Mistral Small Creative816536281344.6%
DeepSeek V3 (2025-03-24)696743271644.4%
Z.AI GLM 4.6645235342642.3%
Gemini 2.5 Flash675343311541.9%
Grok 4675532312441.7%
Z.AI GLM 4.5604747282641.6%
Llama 3.1 70B604340382441.1%
Ministral 3B754831271940.2%
o4 Mini High605242281539.5%
GPT-5575044261638.6%
Ministral 3 14B912927252038.4%
GPT-4.1 Mini605232301738.1%
Ministral 3 3B9749319337.9%
Claude 3.5 Sonnet6865560037.7%
Gemini 2.5 Pro56554911234.5%
Gemma 3 4B753431161534.1%
Llama 3.1 Nemotron 70B574837141333.8%
Z.AI GLM 4.7 Flash575423191333.2%
Ministral 3 8B8166130032.2%
o4 Mini63413117731.7%
Claude 3.7 Sonnet68373214831.6%
DeepSeek V3.1474126231630.7%
Mistral Small 3.2 24B60452222029.7%
ByteDance Seed 1.6 Flash45413513427.5%
Gemma 3 12B44343423227.4%
Qwen 3.5 Plus (2026-02-15)43363213125.0%
Ministral 8B5337311024.6%
Gemini 3.1 Pro (Preview)333121131021.7%
DeepSeek-V2 Chat42252219021.6%
Mistral NeMO4340210020.9%
Gemini 3 Pro (Preview)34212018018.6%
GPT-4o, Aug. 6th (temp=0)5918150018.3%
Z.AI GLM 4.720171613013.2%
GPT-4o Mini (temp=0)2516150011.4%
DeepSeek V3 (2024-12-26)2199408.7%
Qwen 2.5 72B19166008.1%
GPT-5.23070007.4%
GPT-5 Nano12109406.9%
Gemini 3 Flash (Preview)1875306.4%
Arcee AI: Trinity Mini13101004.7%
ByteDance Seed 1.61751004.7%
GPT-5 Mini1820003.9%
GPT-4o, May 13th (temp=0)930002.4%
Stealth: Aurora Alpha000000.0%

genre

Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Rocinante 12B100100100645082.7%
Llama 3.1 8B10010098623879.6%
Llama 3.1 Nemotron 70B10010091514978.2%
GPT-5.11008563575271.5%
Writer: Palmyra X51008863594470.9%
Llama 3.1 70B898278484368.0%
WizardLM 2 8x22b878253533862.5%
GPT-4o Mini (temp=1)936564503060.4%
GPT-4o, Aug. 6th (temp=1)937666481459.5%
Minimax M2.5917052483559.2%
Claude Sonnet 4.5796856524159.2%
Grok 4 Fast907752403759.1%
Qwen 3.5 397B A17B857869382559.0%
Gemini 3.1 Pro (Preview)787454503658.3%
GPT-4.1656359553956.2%
Claude Sonnet 4.6786449473855.1%
Hermes 3 405B737251502855.0%
ByteDance Seed 1.6 Flash100776529054.4%
DeepSeek V3 (2025-03-24)1005045433454.4%
Z.AI GLM 4.5786451413854.4%
Claude Sonnet 4816337343149.4%
DeepSeek V3.2646258441648.9%
GPT-4o, May 13th (temp=1)874638383248.1%
Gemini 2.5 Pro684943423447.1%
GPT-5 Mini564645444046.3%
Claude Opus 4.672714731946.1%
Gemini 3 Flash (Preview)796637242345.9%
Cohere Command R+ (Aug. 2024)70604843845.8%
Qwen 3.5 Plus (2026-02-15)665939313045.0%
Grok 4695043422145.0%
Z.AI GLM 5824746311544.1%
Grok 4.1 Fast644543402744.0%
GPT-5.2525146383043.2%
Claude 3.7 Sonnet64545444043.2%
GPT-5634539353242.8%
MoonshotAI: Kimi K2.5754336322642.5%
Hermes 3 70B534340383441.7%
Claude 3 Haiku66535330040.4%
Mistral Small 3.2 24B10052420038.8%
Mistral Medium 3.1504136362637.8%
DeepSeek-V2 Chat74572523737.1%
Claude Haiku 4.564623016936.1%
Ministral 3 14B685432161035.9%
GPT-4.1 Nano6863376035.0%
GPT-4.1 Mini70362928032.7%
Claude Opus 4693023231932.7%
Mistral Large 3413928262531.9%
GPT-4o, May 13th (temp=0)554228181331.2%
Gemma 3 12B46393931131.0%
Z.AI GLM 4.6643029201130.8%
DeepSeek V3 (2024-12-26)535227111130.7%
Gemini 2.5 Flash574519161530.5%
Claude Opus 4.554523011029.3%
ByteDance Seed 1.66360230029.0%
Gemini 3 Pro (Preview)62382914028.6%
Gemma 3 4B46432623328.2%
Mistral Large 24945318728.0%
o4 Mini51471716727.7%
Z.AI GLM 4.7423128261127.7%
GPT-4o Mini (temp=0)363229231226.8%
GPT-4o, Aug. 6th (temp=0)42383119426.7%
Claude 3.5 Sonnet7931184126.6%
Mistral Large363524231226.1%
Mistral NeMO423519161325.0%
DeepSeek V3.14342229824.9%
o4 Mini High323128221124.6%
Gemma 3 27B36332821424.5%
Claude 3.5 Haiku51282516023.8%
Mistral Small Creative46371610923.6%
Gemini 2.5 Flash Lite312519161320.9%
Ministral 8B4232240019.5%
Z.AI GLM 4.7 Flash32231514618.0%
Ministral 3B4334111017.9%
Qwen 2.5 72B35221611016.8%
Ministral 3 8B413040014.9%
Arcee AI: Trinity Large (Preview)372600012.7%
Ministral 3 3B1400002.8%
GPT-5 Nano553002.7%
Arcee AI: Trinity Mini1200002.3%
Stealth: Aurora Alpha000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Writer: Palmyra X5100100100100100100.0%
GPT-5.1100100100988696.8%
Grok 4.1 Fast100100100986292.0%
Rocinante 12B100100100817591.3%
Claude Sonnet 4.51009886857488.6%
Claude Opus 4.5100100100795286.2%
Claude Sonnet 41009590766585.3%
Minimax M2.5100100100883083.6%
Claude Haiku 4.51009790795183.5%
GPT-4o, Aug. 6th (temp=1)10010097764383.3%
Claude Opus 4.61009675747183.3%
DeepSeek V3 (2025-03-24)1009087874982.7%
GPT-4o Mini (temp=1)928585767382.3%
Z.AI GLM 510010088675582.0%
Claude 3.5 Haiku100100100603478.8%
DeepSeek V3.2918980686578.8%
Mistral Small Creative979181793376.1%
Grok 41009089633776.0%
Llama 3.1 8B100100100472774.8%
Ministral 3 8B10010066594273.5%
Gemma 3 27B828174656573.4%
GPT-5.2837776666372.8%
Grok 4 Fast1009272702571.7%
Gemma 3 12B797871646170.4%
Claude 3.5 Sonnet1009360564070.0%
GPT-4.1 Nano10010076393469.6%
Gemini 2.5 Flash1007265574868.2%
Arcee AI: Trinity Large (Preview)1009758532967.5%
Z.AI GLM 4.5796964646267.4%
GPT-4.1979070483167.3%
Claude Sonnet 4.6846963595566.0%
Ministral 3 14B878555534965.8%
Cohere Command R+ (Aug. 2024)1007764542764.3%
Gemma 3 4B836557555463.0%
Ministral 8B908272462562.9%
Hermes 3 70B948757363461.7%
Ministral 3B100977731060.9%
Claude Opus 4787760542659.1%
Qwen 3.5 Plus (2026-02-15)736554494657.4%
Hermes 3 405B100855843057.3%
GPT-5716362543356.8%
ByteDance Seed 1.6 Flash1007556282256.2%
GPT-4o, May 13th (temp=1)847859332856.2%
MoonshotAI: Kimi K2.5836160502656.1%
GPT-5 Mini615955554955.9%
Llama 3.1 Nemotron 70B100755251055.6%
Gemini 2.5 Pro885551493154.8%
Claude 3 Haiku1009131281853.7%
Gemini 3.1 Pro (Preview)897240373053.6%
Z.AI GLM 4.7846554382653.5%
Claude 3.7 Sonnet805653413653.3%
DeepSeek-V2 Chat100746320652.6%
Mistral Medium 3.1656149473451.4%
Mistral NeMO95795032051.3%
Gemini 2.5 Flash Lite636053413851.2%
Mistral Large76585749849.4%
GPT-4.1 Mini595553413648.8%
Mistral Large 2625448413547.8%
DeepSeek V3.1525145424046.0%
Gemini 3 Pro (Preview)825350331345.9%
Llama 3.1 70B1004038341745.6%
Gemini 3 Flash (Preview)736444251945.0%
o4 Mini High727038261744.6%
Qwen 3.5 397B A17B575349313144.2%
o4 Mini575637292641.0%
Ministral 3 3B74713616239.8%
Z.AI GLM 4.7 Flash514743322439.5%
Mistral Large 365463836337.5%
GPT-4o Mini (temp=0)645636181337.5%
DeepSeek V3 (2024-12-26)10049316037.3%
Z.AI GLM 4.6584734241635.7%
WizardLM 2 8x22b694530151133.9%
GPT-4o, May 13th (temp=0)58482910930.9%
Arcee AI: Trinity Mini54423812830.8%
Mistral Small 3.2 24B4847446028.9%
GPT-5 Nano353531181426.5%
ByteDance Seed 1.642353118025.5%
GPT-4o, Aug. 6th (temp=0)6620129322.0%
Qwen 2.5 72B3910007.9%
Stealth: Aurora Alpha000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Llama 3.1 8B100100100975690.7%
Mistral Large979082683274.1%
Writer: Palmyra X51009657523768.3%
Llama 3.1 Nemotron 70B858054535164.4%
Mistral Large 21007370522764.3%
Grok 4 Fast878166493563.4%
Llama 3.1 70B10010063401263.2%
Claude 3.5 Haiku1007364382660.1%
Minimax M2.5886858551957.8%
Mistral Small Creative817150473657.2%
Ministral 3 14B726760511953.7%
Grok 4745752443853.0%
GPT-5.1656161502352.1%
Claude Opus 4726558461851.7%
Grok 4.1 Fast925340393451.6%
DeepSeek V3 (2025-03-24)716643423651.6%
Ministral 8B897641262150.6%
Mistral Medium 3.1856554341450.5%
Claude Sonnet 4.51006053221750.3%
Claude 3 Haiku1007033291148.5%
Rocinante 12B100774617148.3%
Claude Opus 4.583754535247.7%
Hermes 3 70B604847383545.5%
Hermes 3 405B1004034252544.6%
Ministral 3 8B1004234271543.5%
Gemini 3.1 Pro (Preview)86563830342.6%
MoonshotAI: Kimi K2.5767227181742.1%
Claude Opus 4.6604241392841.9%
Claude Haiku 4.578503524839.1%
GPT-5 Mini604641271938.6%
Qwen 3.5 397B A17B48474743838.5%
Z.AI GLM 5575149211438.3%
o4 Mini47474633535.6%
ByteDance Seed 1.6 Flash534237271935.5%
Arcee AI: Trinity Large (Preview)10040245334.4%
Claude Sonnet 4.6692827212033.3%
WizardLM 2 8x22b10040149032.7%
Claude Sonnet 449444029032.4%
Gemma 3 12B53503515832.3%
GPT-4.151433716931.4%
DeepSeek V3 (2024-12-26)54383620029.8%
Mistral NeMO582520181727.7%
Z.AI GLM 4.5353525211827.0%
Ministral 3B54342119726.7%
GPT-5.2482724201526.6%
GPT-4o Mini (temp=1)342727232126.1%
GPT-4o, Aug. 6th (temp=1)433626121125.6%
Mistral Large 36141251025.6%
o4 Mini High56262511524.4%
Cohere Command R+ (Aug. 2024)6437117023.8%
GPT-4o, May 13th (temp=0)46252116021.4%
Mistral Small 3.2 24B464133018.6%
Qwen 3.5 Plus (2026-02-15)3720147115.7%
Gemini 2.5 Pro29221612015.6%
GPT-52525195415.6%
Qwen 2.5 72B2721189015.1%
DeepSeek-V2 Chat342585014.5%
GPT-4.1 Mini3324160014.4%
Gemini 2.5 Flash Lite2716147413.7%
Z.AI GLM 4.62423174013.5%
Z.AI GLM 4.7 Flash29141312013.4%
GPT-4o, Aug. 6th (temp=0)3220140013.3%
Ministral 3 3B382232013.0%
GPT-4o, May 13th (temp=1)282590012.4%
Claude 3.5 Sonnet47550011.6%
Gemma 3 4B232366011.4%
Claude 3.7 Sonnet2916120011.2%
DeepSeek V3.230143009.3%
GPT-4.1 Nano2496508.8%
Gemini 2.5 Flash23143208.6%
GPT-5 Nano14108337.5%
GPT-4o Mini (temp=0)17140006.1%
Gemini 3 Pro (Preview)1670004.6%
DeepSeek V3.11093004.5%
Gemma 3 27B1432003.7%
Z.AI GLM 4.71700003.5%
Gemini 3 Flash (Preview)210000.7%
Arcee AI: Trinity Mini100000.2%
ByteDance Seed 1.6000000.0%
Stealth: Aurora Alpha000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude Sonnet 4.510010096957092.4%
Grok 4 Fast1009898887892.3%
Writer: Palmyra X51001001001005891.5%
Grok 410010074695780.1%
Ministral 3 14B10010084634077.4%
Grok 4.1 Fast1007571625272.0%
Qwen 3.5 397B A17B1008368663169.9%
GPT-5.1967058585467.3%
Claude Opus 4917566653867.0%
Claude Sonnet 4.61008371473166.5%
Minimax M2.51006560554765.4%
MoonshotAI: Kimi K2.5866563555264.3%
Claude 3.5 Sonnet978556493364.0%
Llama 3.1 8B10010010016463.9%
Rocinante 12B100968825863.6%
Hermes 3 70B10010076201662.3%
Llama 3.1 Nemotron 70B968243412156.7%
Claude Opus 4.51005744423154.9%
GPT-5716057533454.8%
Mistral NeMO925949403354.5%
Gemma 3 12B96706129953.3%
GPT-4.1 Nano835651462351.8%
Mistral Large 294756022851.7%
Claude Sonnet 4736954431751.4%
Mistral Large 3925348451350.2%
Claude Opus 4.686624743849.3%
Mistral Large64625753849.0%
GPT-4.1767255201948.3%
Gemini 3 Flash (Preview)806944271847.6%
Mistral Small Creative71575553047.1%
Mistral Medium 3.1774747343147.0%
Hermes 3 405B1005527242345.6%
Gemini 3.1 Pro (Preview)100754012045.3%
DeepSeek V3 (2025-03-24)79706214045.1%
Claude 3.7 Sonnet815049221142.7%
GPT-4o, Aug. 6th (temp=1)565537343042.5%
Z.AI GLM 5554745323242.3%
DeepSeek V3.280524037041.8%
Arcee AI: Trinity Large (Preview)624636313041.1%
Claude Haiku 4.5605452221440.2%
GPT-5.2625438281840.0%
Ministral 8B634837341539.4%
Gemma 3 27B635832241638.7%
Claude 3.5 Haiku100352825037.5%
Mistral Small 3.2 24B7169434037.4%
Ministral 3 8B7769380036.7%
GPT-4o Mini (temp=1)69494618136.7%
WizardLM 2 8x22b654838161135.4%
Z.AI GLM 4.6593429272635.1%
Claude 3 Haiku71464016034.6%
ByteDance Seed 1.687492011033.5%
DeepSeek-V2 Chat75433311032.6%
ByteDance Seed 1.6 Flash57383821832.3%
Gemini 2.5 Pro673123201431.0%
Ministral 3B53432927030.4%
Gemini 2.5 Flash Lite55482420229.8%
o4 Mini High393932231529.4%
Gemini 2.5 Flash64322619529.1%
o4 Mini40363528628.8%
GPT-5 Mini453523201427.5%
Qwen 3.5 Plus (2026-02-15)47352718726.7%
Ministral 3 3B48392712025.2%
Z.AI GLM 4.752392312025.1%
Llama 3.1 70B5130287524.2%
GPT-4.1 Mini36353111022.6%
Cohere Command R+ (Aug. 2024)57241810021.9%
GPT-4o, Aug. 6th (temp=0)4938145421.8%
DeepSeek V3.15435134021.1%
Z.AI GLM 4.556211211019.9%
GPT-4o, May 13th (temp=1)31212014117.2%
GPT-4o, May 13th (temp=0)4325160017.0%
Gemini 3 Pro (Preview)2625139515.5%
Gemma 3 4B3022197015.5%
DeepSeek V3 (2024-12-26)4611102013.7%
Qwen 2.5 72B3117137013.6%
Z.AI GLM 4.7 Flash22171311513.5%
GPT-5 Nano23109809.8%
GPT-4o Mini (temp=0)21140007.0%
Arcee AI: Trinity Mini3400006.7%
Stealth: Aurora Alpha000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Rocinante 12B100100100876991.3%
Llama 3.1 8B10010010093980.4%
Claude 3.5 Haiku10010075511969.0%
Ministral 3 14B887765523864.0%
Llama 3.1 70B1008049471357.9%
Grok 4877942403857.3%
Mistral Large 31007245363156.8%
Writer: Palmyra X5100876032055.9%
GPT-4.1716560493355.5%
Grok 4.1 Fast816050503655.3%
Claude Sonnet 4.5857268391054.9%
Grok 4 Fast756952413454.4%
Claude Opus 4.5806153423253.8%
Hermes 3 405B1005248451852.7%
Claude Opus 4.6904949432851.9%
GPT-5.1695349423850.3%
Claude Opus 4925943271948.1%
ByteDance Seed 1.6 Flash635756422047.5%
DeepSeek V3 (2025-03-24)100683434047.3%
Gemini 3.1 Pro (Preview)90813327046.2%
Qwen 3.5 397B A17B685956301044.5%
Claude Haiku 4.5776037341444.4%
Claude 3 Haiku645856222044.0%
Z.AI GLM 5775244311544.0%
Cohere Command R+ (Aug. 2024)664339373343.6%
MoonshotAI: Kimi K2.560565136040.5%
Mistral Small Creative74454238039.7%
Ministral 3 8B69564327039.2%
Minimax M2.590423427038.7%
Ministral 8B66595110437.8%
Gemini 2.5 Pro584634242437.2%
GPT-5 Mini504031302735.7%
GPT-4o Mini (temp=1)583633301935.5%
Claude Sonnet 463552824034.3%
Mistral Medium 3.1434139272034.0%
Arcee AI: Trinity Large (Preview)56463021731.9%
DeepSeek-V2 Chat78272420530.8%
DeepSeek V3 (2024-12-26)513030261330.1%
GPT-4o, May 13th (temp=0)612928201029.8%
Mistral Large5047466029.6%
Gemini 2.5 Flash Lite56393811029.0%
Hermes 3 70B72332711028.8%
Mistral Large 269362611128.7%
Mistral Small 3.2 24B494121181428.3%
Z.AI GLM 4.548322624928.0%
Claude 3.5 Sonnet313028252427.6%
o4 Mini4947279327.1%
GPT-4.1 Nano7535190026.0%
Llama 3.1 Nemotron 70B512919171125.5%
DeepSeek V3.16128258024.4%
o4 Mini High5434270023.0%
Ministral 3B4038289022.9%
GPT-4.1 Mini45361716122.8%
GPT-5.2352919161322.4%
DeepSeek V3.24740174322.2%
Gemma 3 27B48312111022.0%
GPT-4o, May 13th (temp=1)5636170021.8%
Ministral 3 3B3733228019.9%
GPT-4o, Aug. 6th (temp=1)3430259019.6%
GPT-5362218111019.3%
Mistral NeMO5520190018.8%
Gemini 2.5 Flash4523157318.5%
Z.AI GLM 4.63128257018.3%
Claude 3.7 Sonnet3827201017.0%
WizardLM 2 8x22b581282016.1%
ByteDance Seed 1.63718139315.9%
GPT-4o, Aug. 6th (temp=0)382760014.3%
GPT-4o Mini (temp=0)432000012.7%
Stealth: Aurora Alpha60000012.0%
Gemma 3 4B311863011.8%
Gemini 3 Flash (Preview)2815100010.5%
Qwen 2.5 72B311380010.5%
Z.AI GLM 4.73173008.3%
Claude Sonnet 4.63154008.1%
Gemma 3 12B23125008.0%
Gemini 3 Pro (Preview)14109607.7%
Z.AI GLM 4.7 Flash1087004.8%
Arcee AI: Trinity Mini1900003.8%
GPT-5 Nano1032003.0%
Qwen 3.5 Plus (2026-02-15)1010002.2%
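
Model | #1 | #2 | #3 | #4 | #5 | Avg
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 64 | 92.6%
Claude Sonnet 4.5 | 100 | 93 | 69 | 67 | 66 | 79.1%
GPT-5.1 | 100 | 87 | 81 | 62 | 62 | 78.4%
Claude 3 Haiku | 95 | 88 | 70 | 69 | 67 | 77.9%
Hermes 3 70B | 100 | 100 | 90 | 63 | 17 | 74.2%
Claude 3.5 Haiku | 100 | 85 | 68 | 58 | 57 | 73.6%
GPT-4o Mini (temp=1) | 100 | 79 | 71 | 66 | 52 | 73.5%
Z.AI GLM 4.5 | 85 | 79 | 79 | 66 | 57 | 72.9%
Z.AI GLM 5 | 100 | 91 | 63 | 62 | 45 | 72.2%
Claude Opus 4.5 | 100 | 90 | 68 | 63 | 39 | 72.1%
Gemini 2.5 Flash Lite | 100 | 82 | 76 | 66 | 35 | 71.9%
Llama 3.1 70B | 90 | 82 | 77 | 56 | 51 | 71.2%
Claude Sonnet 4 | 95 | 93 | 81 | 43 | 40 | 70.7%
MoonshotAI: Kimi K2.5 | 87 | 79 | 70 | 63 | 51 | 70.1%
Claude Haiku 4.5 | 100 | 71 | 62 | 60 | 52 | 68.9%
GPT-4.1 Mini | 100 | 87 | 56 | 52 | 44 | 67.9%
Minimax M2.5 | 100 | 77 | 65 | 51 | 46 | 67.7%
Rocinante 12B | 100 | 62 | 57 | 52 | 48 | 63.6%
DeepSeek V3 (2025-03-24) | 99 | 67 | 66 | 46 | 34 | 62.4%
GPT-4.1 | 99 | 63 | 57 | 51 | 41 | 62.1%
Claude Opus 4 | 100 | 69 | 55 | 43 | 36 | 60.6%
Hermes 3 405B | 100 | 88 | 53 | 47 | 8 | 59.2%
Ministral 3 8B | 88 | 74 | 48 | 45 | 40 | 58.9%
GPT-4.1 Nano | 99 | 83 | 73 | 26 | 11 | 58.4%
o4 Mini High | 79 | 77 | 48 | 44 | 41 | 57.8%
Qwen 3.5 397B A17B | 84 | 68 | 55 | 49 | 33 | 57.7%
GPT-4o, Aug. 6th (temp=1) | 92 | 85 | 54 | 40 | 18 | 57.7%
Llama 3.1 Nemotron 70B | 78 | 62 | 58 | 48 | 39 | 56.9%
Gemini 3.1 Pro (Preview) | 85 | 60 | 51 | 46 | 40 | 56.4%
Grok 4 Fast | 81 | 78 | 57 | 43 | 17 | 55.2%
Claude 3.5 Sonnet | 91 | 65 | 53 | 36 | 23 | 53.6%
Grok 4 | 85 | 59 | 49 | 36 | 34 | 52.5%
GPT-4o, Aug. 6th (temp=0) | 92 | 77 | 41 | 32 | 16 | 51.7%
Ministral 8B | 100 | 70 | 58 | 16 | 11 | 51.0%
o4 Mini | 69 | 65 | 51 | 36 | 31 | 50.6%
Gemma 3 12B | 75 | 53 | 48 | 40 | 36 | 50.3%
Grok 4.1 Fast | 65 | 53 | 48 | 41 | 38 | 49.1%
Claude Sonnet 4.6 | 96 | 52 | 40 | 30 | 27 | 48.9%
GPT-5.2 | 61 | 59 | 45 | 43 | 36 | 48.8%
Claude 3.7 Sonnet | 70 | 68 | 49 | 33 | 21 | 48.2%
Mistral Medium 3.1 | 63 | 57 | 43 | 41 | 32 | 47.1%
Claude Opus 4.6 | 61 | 46 | 46 | 43 | 25 | 44.0%
Ministral 3 3B | 90 | 50 | 45 | 22 | 7 | 42.9%
Z.AI GLM 4.6 | 71 | 54 | 37 | 31 | 16 | 41.8%
Gemini 2.5 Pro | 51 | 48 | 48 | 36 | 26 | 41.8%
Z.AI GLM 4.7 Flash | 64 | 56 | 45 | 31 | 12 | 41.7%
Mistral NeMO | 84 | 62 | 38 | 12 | 12 | 41.6%
Qwen 3.5 Plus (2026-02-15) | 53 | 51 | 46 | 38 | 20 | 41.6%
GPT-4o, May 13th (temp=1) | 65 | 49 | 40 | 34 | 12 | 39.9%
Ministral 3 14B | 80 | 42 | 36 | 29 | 11 | 39.7%
Mistral Large 3 | 61 | 49 | 46 | 29 | 12 | 39.5%
Gemini 2.5 Flash | 71 | 63 | 36 | 19 | 8 | 39.2%
Mistral Small Creative | 64 | 48 | 43 | 34 | 6 | 39.0%
Gemma 3 27B | 66 | 47 | 39 | 35 | 8 | 38.8%
DeepSeek V3.2 | 59 | 44 | 36 | 32 | 20 | 38.1%
Cohere Command R+ (Aug. 2024) | 77 | 33 | 32 | 28 | 19 | 38.0%
DeepSeek V3.1 | 54 | 54 | 36 | 35 | 10 | 37.8%
Arcee AI: Trinity Large (Preview) | 55 | 50 | 45 | 37 | 0 | 37.5%
DeepSeek-V2 Chat | 76 | 34 | 34 | 23 | 12 | 35.8%
ByteDance Seed 1.6 Flash | 54 | 36 | 35 | 27 | 25 | 35.4%
Mistral Large | 51 | 43 | 37 | 27 | 3 | 32.3%
WizardLM 2 8x22b | 100 | 53 | 4 | 0 | 0 | 31.4%
GPT-5 | 56 | 33 | 30 | 25 | 9 | 30.6%
GPT-4o, May 13th (temp=0) | 70 | 54 | 25 | 3 | 0 | 30.5%
Mistral Large 2 | 43 | 42 | 34 | 31 | 0 | 30.0%
DeepSeek V3 (2024-12-26) | 96 | 19 | 17 | 17 | 0 | 29.8%
GPT-5 Mini | 53 | 39 | 26 | 16 | 9 | 28.4%
Ministral 3B | 68 | 20 | 18 | 17 | 2 | 25.0%
Mistral Small 3.2 24B | 72 | 21 | 19 | 0 | 0 | 22.5%
Gemma 3 4B | 48 | 29 | 18 | 14 | 0 | 21.8%
Gemini 3 Flash (Preview) | 33 | 24 | 19 | 17 | 13 | 21.3%
Arcee AI: Trinity Mini | 40 | 31 | 20 | 9 | 5 | 21.1%
GPT-4o Mini (temp=0) | 25 | 23 | 14 | 13 | 11 | 17.2%
Gemini 3 Pro (Preview) | 32 | 24 | 18 | 4 | 0 | 15.5%
Qwen 2.5 72B | 49 | 16 | 11 | 0 | 0 | 15.4%
ByteDance Seed 1.6 | 45 | 12 | 10 | 6 | 0 | 14.5%
GPT-5 Nano | 23 | 15 | 12 | 8 | 0 | 11.7%
Z.AI GLM 4.7 | 28 | 14 | 12 | 2 | 0 | 11.2%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%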

Novelcrafter Default Prompt

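Model | #1 | #2 | #3 | #4 | #5 | Avg
Hermes 3 70B | 100 | 100 | 94 | 81 | 42 | 83.4%
Rocinante 12B | 100 | 93 | 73 | 60 | 60 | 77.2%
Writer: Palmyra X5 | 100 | 83 | 78 | 65 | 43 | 74.0%
Llama 3.1 8B | 100 | 100 | 74 | 56 | 23 | 70.6%
Claude Sonnet 4.6 | 100 | 79 | 77 | 46 | 23 | 64.9%
Llama 3.1 Nemotron 70B | 100 | 100 | 73 | 27 | 13 | 62.4%
Claude Opus 4.6 | 76 | 67 | 66 | 46 | 41 | 59.2%
Hermes 3 405B | 98 | 82 | 63 | 23 | 16 | 56.1%
WizardLM 2 8x22b | 80 | 62 | 57 | 55 | 21 | 55.2%
Mistral Small Creative | 94 | 70 | 48 | 33 | 29 | 55.0%
GPT-5.1 | 72 | 55 | 49 | 43 | 41 | 52.3%
Claude 3 Haiku | 73 | 64 | 43 | 40 | 35 | 51.2%
Claude Sonnet 4.5 | 69 | 49 | 46 | 45 | 42 | 50.2%
Gemma 3 12B | 100 | 51 | 43 | 39 | 16 | 50.0%
Gemma 3 4B | 63 | 60 | 53 | 47 | 20 | 48.6%
Llama 3.1 70B | 100 | 43 | 41 | 37 | 21 | 48.4%
Claude Haiku 4.5 | 78 | 57 | 43 | 27 | 16 | 44.3%
Z.AI GLM 5 | 85 | 50 | 36 | 25 | 25 | 44.1%
Claude Opus 4.5 | 82 | 55 | 34 | 26 | 21 | 43.5%
Claude Opus 4 | 67 | 64 | 42 | 34 | 0 | 41.6%
ByteDance Seed 1.6 Flash | 67 | 58 | 43 | 25 | 12 | 40.8%
Claude Sonnet 4 | 72 | 42 | 39 | 33 | 18 | 40.7%
Z.AI GLM 4.5 | 66 | 50 | 36 | 27 | 17 | 39.4%
Grok 4.1 Fast | 75 | 53 | 25 | 21 | 20 | 39.0%
Minimax M2.5 | 100 | 48 | 39 | 5 | 3 | 39.0%
Gemini 2.5 Flash | 58 | 51 | 50 | 26 | 4 | 37.7%
DeepSeek V3.1 | 63 | 51 | 37 | 21 | 15 | 37.4%
Ministral 3 14B | 60 | 43 | 36 | 25 | 20 | 36.9%
Claude 3.7 Sonnet | 70 | 50 | 34 | 20 | 10 | 36.7%
GPT-4o, Aug. 6th (temp=1) | 65 | 62 | 21 | 21 | 10 | 35.8%
DeepSeek V3 (2025-03-24) | 71 | 71 | 14 | 5 | 3 | 32.8%
Cohere Command R+ (Aug. 2024) | 100 | 41 | 21 | 0 | 0 | 32.5%
Gemini 2.5 Pro | 70 | 40 | 32 | 2 | 0 | 29.0%
Gemini 2.5 Flash Lite | 46 | 44 | 31 | 21 | 2 | 28.9%
GPT-4o, Aug. 6th (temp=0) | 74 | 34 | 25 | 7 | 3 | 28.5%
DeepSeek V3.2 | 37 | 33 | 27 | 25 | 20 | 28.3%
Grok 4 | 51 | 33 | 27 | 23 | 7 | 28.2%
GPT-4o, May 13th (temp=1) | 39 | 33 | 26 | 25 | 15 | 27.3%
GPT-4o Mini (temp=1) | 47 | 45 | 21 | 13 | 10 | 27.0%
GPT-4.1 | 59 | 36 | 19 | 19 | 0 | 26.6%
Ministral 3 8B | 81 | 34 | 16 | 0 | 0 | 26.3%
Mistral Medium 3.1 | 46 | 29 | 28 | 16 | 12 | 26.1%
Z.AI GLM 4.6 | 45 | 25 | 23 | 20 | 17 | 26.1%
MoonshotAI: Kimi K2.5 | 43 | 37 | 29 | 16 | 0 | 25.0%
Mistral Large 2 | 48 | 38 | 25 | 13 | 0 | 24.6%
o4 Mini | 40 | 38 | 23 | 21 | 0 | 24.4%
Qwen 3.5 397B A17B | 43 | 31 | 27 | 12 | 7 | 24.1%
o4 Mini High | 64 | 22 | 17 | 15 | 3 | 24.1%
GPT-4o, May 13th (temp=0) | 41 | 35 | 26 | 15 | 0 | 23.3%
Grok 4 Fast | 48 | 27 | 25 | 9 | 6 | 22.9%
GPT-5.2 | 45 | 21 | 20 | 19 | 5 | 22.2%
Arcee AI: Trinity Large (Preview) | 42 | 29 | 20 | 9 | 8 | 21.4%
Gemini 3 Flash (Preview) | 51 | 29 | 12 | 9 | 3 | 20.7%
Claude 3.5 Sonnet | 38 | 38 | 26 | 0 | 0 | 20.4%
Claude 3.5 Haiku | 37 | 31 | 20 | 10 | 0 | 19.7%
Mistral NeMO | 50 | 30 | 11 | 0 | 0 | 18.1%
GPT-4.1 Mini | 45 | 28 | 9 | 7 | 2 | 18.1%
Mistral Large 3 | 45 | 23 | 16 | 5 | 2 | 18.1%
Gemini 3 Pro (Preview) | 34 | 26 | 14 | 7 | 4 | 17.1%
GPT-5 | 31 | 23 | 17 | 14 | 0 | 17.0%
Mistral Large | 38 | 27 | 19 | 0 | 0 | 16.6%
Gemma 3 27B | 56 | 21 | 3 | 0 | 0 | 16.1%
GPT-4.1 Nano | 23 | 17 | 16 | 16 | 7 | 15.7%
Ministral 8B | 36 | 23 | 16 | 0 | 0 | 14.9%
Z.AI GLM 4.7 | 29 | 23 | 12 | 0 | 0 | 12.9%
GPT-5 Mini | 37 | 18 | 7 | 0 | 0 | 12.5%
DeepSeek V3 (2024-12-26) | 27 | 14 | 13 | 8 | 0 | 12.3%
Qwen 2.5 72B | 24 | 15 | 9 | 0 | 0 | 9.5%
Ministral 3B | 27 | 18 | 1 | 0 | 0 | 9.0%
Mistral Small 3.2 24B | 24 | 13 | 6 | 0 | 0 | 8.7%
Qwen 3.5 Plus (2026-02-15) | 42 | 1 | 0 | 0 | 0 | 8.6%
Z.AI GLM 4.7 Flash | 24 | 4 | 0 | 0 | 0 | 5.5%
DeepSeek-V2 Chat | 9 | 6 | 0 | 0 | 0 | 3.0%
Gemini 3.1 Pro (Preview) | 10 | 2 | 1 | 0 | 0 | 2.7%
GPT-4o Mini (temp=0) | 6 | 5 | 0 | 0 | 0 | 2.2%
Ministral 3 3B | 3 | 2 | 0 | 0 | 0 | 1.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%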
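
Model | #1 | #2 | #3 | #4 | #5 | Avg
Rocinante 12B | 100 | 100 | 100 | 100 | 97 | 99.4%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 91 | 98.2%
Claude Sonnet 4.5 | 100 | 100 | 100 | 99 | 86 | 97.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 95 | 91 | 85 | 94.2%
Claude Opus 4.5 | 99 | 92 | 91 | 84 | 84 | 89.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 94 | 89 | 87 | 78 | 89.4%
Claude 3 Haiku | 100 | 100 | 97 | 87 | 55 | 87.8%
Gemma 3 27B | 99 | 91 | 89 | 86 | 65 | 86.2%
Grok 4.1 Fast | 100 | 100 | 89 | 84 | 57 | 86.2%
Llama 3.1 8B | 100 | 100 | 100 | 79 | 51 | 86.0%
Claude Sonnet 4 | 100 | 90 | 83 | 83 | 64 | 84.0%
Hermes 3 70B | 100 | 100 | 100 | 69 | 47 | 83.2%
Claude Opus 4.6 | 100 | 100 | 74 | 72 | 64 | 82.0%
Minimax M2.5 | 96 | 93 | 81 | 68 | 57 | 79.2%
Hermes 3 405B | 100 | 97 | 71 | 59 | 56 | 76.6%
Claude Opus 4 | 100 | 100 | 92 | 57 | 31 | 76.0%
Ministral 8B | 100 | 99 | 81 | 67 | 3 | 69.9%
Z.AI GLM 5 | 88 | 82 | 66 | 63 | 44 | 68.6%
GPT-5.1 | 100 | 81 | 65 | 54 | 39 | 67.7%
Claude 3.5 Haiku | 96 | 92 | 79 | 47 | 23 | 67.3%
Claude Haiku 4.5 | 100 | 98 | 57 | 42 | 34 | 66.1%
Claude Sonnet 4.6 | 92 | 71 | 70 | 61 | 0 | 59.1%
Mistral Large 2 | 83 | 70 | 65 | 47 | 22 | 57.3%
Llama 3.1 Nemotron 70B | 79 | 70 | 58 | 45 | 31 | 56.6%
ByteDance Seed 1.6 Flash | 73 | 65 | 63 | 55 | 27 | 56.5%
Grok 4 | 88 | 64 | 47 | 44 | 38 | 56.2%
Arcee AI: Trinity Large (Preview) | 75 | 60 | 52 | 47 | 46 | 56.0%
DeepSeek V3 (2025-03-24) | 89 | 63 | 54 | 40 | 30 | 55.1%
Ministral 3 14B | 84 | 70 | 66 | 41 | 7 | 53.5%
Mistral Small Creative | 87 | 66 | 46 | 35 | 31 | 53.0%
WizardLM 2 8x22b | 81 | 76 | 44 | 34 | 29 | 52.9%
GPT-4.1 Nano | 61 | 57 | 52 | 51 | 34 | 51.2%
Grok 4 Fast | 69 | 66 | 48 | 41 | 31 | 51.0%
Z.AI GLM 4.6 | 84 | 64 | 48 | 31 | 23 | 50.1%
Gemma 3 12B | 65 | 53 | 49 | 42 | 38 | 49.5%
Mistral Large 3 | 79 | 64 | 49 | 23 | 23 | 47.6%
GPT-4.1 | 50 | 49 | 49 | 48 | 40 | 47.4%
Ministral 3 8B | 78 | 52 | 48 | 40 | 12 | 45.9%
Mistral Large | 96 | 51 | 41 | 33 | 7 | 45.6%
Gemini 2.5 Flash Lite | 56 | 51 | 45 | 36 | 33 | 44.2%
DeepSeek V3.2 | 54 | 45 | 41 | 40 | 38 | 43.7%
GPT-4.1 Mini | 77 | 46 | 39 | 38 | 17 | 43.6%
Mistral Medium 3.1 | 59 | 52 | 51 | 43 | 11 | 43.4%
GPT-4o, May 13th (temp=1) | 51 | 50 | 42 | 36 | 36 | 43.0%
Claude 3.7 Sonnet | 81 | 63 | 37 | 24 | 8 | 42.7%
GPT-4o Mini (temp=1) | 70 | 60 | 31 | 23 | 21 | 40.9%
Qwen 3.5 Plus (2026-02-15) | 52 | 40 | 36 | 32 | 29 | 37.8%
Gemini 2.5 Pro | 78 | 44 | 38 | 18 | 9 | 37.5%
DeepSeek V3 (2024-12-26) | 100 | 53 | 23 | 3 | 0 | 35.7%
Claude 3.5 Sonnet | 60 | 45 | 45 | 24 | 0 | 34.6%
DeepSeek-V2 Chat | 85 | 60 | 28 | 0 | 0 | 34.5%
GPT-5 | 84 | 40 | 35 | 7 | 6 | 34.3%
Llama 3.1 70B | 60 | 57 | 24 | 16 | 11 | 33.5%
Z.AI GLM 4.7 | 51 | 46 | 29 | 20 | 18 | 32.7%
Ministral 3B | 60 | 51 | 24 | 11 | 10 | 31.2%
Z.AI GLM 4.7 Flash | 59 | 42 | 36 | 8 | 5 | 30.3%
Arcee AI: Trinity Mini | 55 | 51 | 35 | 8 | 0 | 29.8%
Gemini 3 Pro (Preview) | 46 | 31 | 28 | 28 | 11 | 28.8%
DeepSeek V3.1 | 50 | 29 | 26 | 24 | 13 | 28.5%
GPT-5.2 | 40 | 40 | 29 | 19 | 6 | 26.8%
Mistral NeMO | 35 | 27 | 26 | 23 | 21 | 26.4%
Gemma 3 4B | 45 | 38 | 21 | 18 | 0 | 24.5%
o4 Mini High | 52 | 43 | 17 | 8 | 0 | 24.1%
Mistral Small 3.2 24B | 67 | 33 | 21 | 0 | 0 | 24.1%
Gemini 2.5 Flash | 52 | 27 | 21 | 20 | 0 | 24.1%
MoonshotAI: Kimi K2.5 | 43 | 30 | 15 | 8 | 7 | 20.7%
Z.AI GLM 4.5 | 35 | 30 | 26 | 0 | 0 | 18.1%
Ministral 3 3B | 61 | 16 | 8 | 0 | 0 | 16.9%
o4 Mini | 39 | 24 | 14 | 7 | 0 | 16.7%
GPT-5 Mini | 33 | 28 | 13 | 0 | 0 | 14.8%
Gemini 3 Flash (Preview) | 30 | 28 | 5 | 5 | 4 | 14.4%
Qwen 3.5 397B A17B | 20 | 20 | 10 | 7 | 0 | 11.5%
GPT-4o, Aug. 6th (temp=0) | 28 | 22 | 0 | 0 | 0 | 9.9%
Qwen 2.5 72B | 40 | 6 | 1 | 0 | 0 | 9.5%
Gemini 3.1 Pro (Preview) | 15 | 13 | 2 | 0 | 0 | 5.9%
GPT-5 Nano | 21 | 7 | 0 | 0 | 0 | 5.6%
GPT-4o, May 13th (temp=0) | 12 | 5 | 0 | 0 | 0 | 3.3%
GPT-4o Mini (temp=0) | 7 | 5 | 3 | 0 | 0 | 3.0%
ByteDance Seed 1.6 | 3 | 0 | 0 | 0 | 0 | 0.5%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%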
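
Model | #1 | #2 | #3 | #4 | #5 | Avg
Llama 3.1 8B | 100 | 100 | 72 | 51 | 40 | 72.6%
Writer: Palmyra X5 | 99 | 82 | 66 | 56 | 18 | 64.2%
Llama 3.1 Nemotron 70B | 100 | 100 | 74 | 25 | 11 | 62.0%
Ministral 3 14B | 100 | 83 | 58 | 36 | 24 | 60.3%
GPT-5 Nano | 100 | 100 | 100 | 0 | 0 | 60.1%
Mistral Large 2 | 98 | 74 | 60 | 51 | 13 | 59.2%
Llama 3.1 70B | 83 | 63 | 62 | 45 | 42 | 59.0%
Grok 4.1 Fast | 100 | 92 | 59 | 40 | 0 | 58.2%
Z.AI GLM 5 | 100 | 58 | 56 | 35 | 21 | 54.0%
Mistral Small Creative | 85 | 64 | 54 | 38 | 12 | 50.6%
Claude Opus 4.5 | 56 | 54 | 47 | 41 | 21 | 43.8%
Mistral NeMO | 100 | 100 | 16 | 0 | 0 | 43.1%
Rocinante 12B | 100 | 100 | 11 | 0 | 0 | 42.2%
Cohere Command R+ (Aug. 2024) | 85 | 68 | 56 | 0 | 0 | 41.9%
Hermes 3 405B | 80 | 68 | 49 | 0 | 0 | 39.2%
o4 Mini High | 79 | 75 | 35 | 4 | 0 | 38.4%
Claude Sonnet 4.6 | 79 | 58 | 24 | 24 | 6 | 38.2%
Claude Sonnet 4.5 | 95 | 48 | 24 | 23 | 0 | 38.0%
Ministral 3 8B | 94 | 46 | 24 | 18 | 4 | 37.5%
Mistral Medium 3.1 | 58 | 47 | 41 | 37 | 3 | 37.1%
Arcee AI: Trinity Large (Preview) | 100 | 67 | 16 | 0 | 0 | 36.5%
Claude Sonnet 4 | 66 | 55 | 53 | 5 | 0 | 36.0%
Grok 4 Fast | 57 | 46 | 29 | 23 | 21 | 35.1%
Mistral Large 3 | 52 | 43 | 40 | 25 | 14 | 34.5%
Hermes 3 70B | 89 | 35 | 23 | 20 | 0 | 33.2%
Claude 3.5 Haiku | 75 | 65 | 0 | 0 | 0 | 28.1%
Claude Opus 4 | 35 | 33 | 25 | 22 | 11 | 25.1%
Claude Haiku 4.5 | 52 | 49 | 9 | 6 | 0 | 23.1%
Claude 3.7 Sonnet | 53 | 25 | 21 | 3 | 0 | 20.2%
Minimax M2.5 | 70 | 15 | 14 | 1 | 0 | 20.0%
Claude Opus 4.6 | 40 | 24 | 19 | 16 | 1 | 19.8%
Mistral Small 3.2 24B | 80 | 19 | 0 | 0 | 0 | 19.8%
MoonshotAI: Kimi K2.5 | 51 | 20 | 16 | 4 | 3 | 18.7%
GPT-4o, May 13th (temp=1) | 73 | 16 | 1 | 0 | 0 | 18.1%
Z.AI GLM 4.6 | 53 | 23 | 12 | 1 | 0 | 18.0%
DeepSeek V3 (2024-12-26) | 48 | 16 | 13 | 3 | 0 | 15.9%
GPT-4o, Aug. 6th (temp=1) | 29 | 26 | 19 | 3 | 0 | 15.2%
Gemma 3 27B | 55 | 14 | 5 | 0 | 0 | 14.8%
o4 Mini | 31 | 16 | 16 | 10 | 0 | 14.4%
GPT-5 | 36 | 33 | 0 | 0 | 0 | 13.9%
GPT-4.1 | 23 | 22 | 22 | 0 | 0 | 13.1%
Z.AI GLM 4.5 | 48 | 5 | 2 | 0 | 0 | 10.9%
Mistral Large | 28 | 19 | 3 | 3 | 0 | 10.6%
DeepSeek V3 (2025-03-24) | 35 | 17 | 0 | 0 | 0 | 10.5%
GPT-4o, May 13th (temp=0) | 41 | 11 | 0 | 0 | 0 | 10.3%
DeepSeek-V2 Chat | 25 | 20 | 0 | 0 | 0 | 9.0%
Ministral 8B | 35 | 10 | 0 | 0 | 0 | 8.9%
ByteDance Seed 1.6 Flash | 25 | 16 | 0 | 0 | 0 | 8.1%
DeepSeek V3.1 | 39 | 0 | 0 | 0 | 0 | 7.7%
Grok 4 | 35 | 2 | 0 | 0 | 0 | 7.3%
Ministral 3B | 16 | 14 | 6 | 0 | 0 | 7.2%
GPT-4o Mini (temp=1) | 20 | 12 | 0 | 0 | 0 | 6.4%
Gemini 2.5 Pro | 20 | 8 | 4 | 0 | 0 | 6.3%
Gemini 3 Pro (Preview) | 20 | 11 | 0 | 0 | 0 | 6.3%
Qwen 2.5 72B | 30 | 0 | 0 | 0 | 0 | 6.0%
Qwen 3.5 397B A17B | 14 | 6 | 5 | 0 | 0 | 5.1%
Claude 3 Haiku | 23 | 3 | 0 | 0 | 0 | 5.0%
Gemma 3 4B | 14 | 8 | 0 | 0 | 0 | 4.3%
Z.AI GLM 4.7 | 19 | 0 | 0 | 0 | 0 | 3.8%
Gemma 3 12B | 16 | 2 | 0 | 0 | 0 | 3.7%
WizardLM 2 8x22b | 8 | 7 | 0 | 0 | 0 | 3.1%
Z.AI GLM 4.7 Flash | 16 | 0 | 0 | 0 | 0 | 3.1%
Claude 3.5 Sonnet | 16 | 0 | 0 | 0 | 0 | 3.1%
GPT-4.1 Mini | 16 | 0 | 0 | 0 | 0 | 3.1%
GPT-5.1 | 11 | 3 | 1 | 0 | 0 | 3.1%
DeepSeek V3.2 | 6 | 5 | 3 | 0 | 0 | 3.0%
Gemini 3 Flash (Preview) | 14 | 0 | 0 | 0 | 0 | 2.9%
GPT-4.1 Nano | 5 | 4 | 0 | 0 | 0 | 1.9%
ByteDance Seed 1.6 | 9 | 0 | 0 | 0 | 0 | 1.8%
Ministral 3 3B | 8 | 0 | 0 | 0 | 0 | 1.6%
Qwen 3.5 Plus (2026-02-15) | 8 | 0 | 0 | 0 | 0 | 1.5%
Gemini 2.5 Flash Lite | 3 | 1 | 0 | 0 | 0 | 0.8%
GPT-5 Mini | 3 | 0 | 0 | 0 | 0 | 0.5%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.2 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%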
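
Model | #1 | #2 | #3 | #4 | #5 | Avg
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 39 | 87.8%
Rocinante 12B | 100 | 100 | 100 | 95 | 22 | 83.2%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 64 | 10 | 74.9%
Z.AI GLM 5 | 100 | 84 | 78 | 46 | 38 | 69.2%
Claude Sonnet 4.6 | 100 | 90 | 70 | 36 | 36 | 66.5%
GPT-5.1 | 87 | 81 | 67 | 49 | 43 | 65.3%
Grok 4 | 100 | 100 | 78 | 37 | 10 | 65.0%
Claude Opus 4.6 | 94 | 81 | 51 | 51 | 34 | 62.3%
Claude Opus 4 | 82 | 79 | 56 | 54 | 11 | 56.4%
Arcee AI: Trinity Large (Preview) | 85 | 65 | 60 | 56 | 15 | 56.3%
Mistral Small Creative | 75 | 63 | 57 | 46 | 38 | 55.9%
Claude Haiku 4.5 | 92 | 71 | 68 | 25 | 5 | 52.1%
MoonshotAI: Kimi K2.5 | 75 | 74 | 55 | 22 | 10 | 47.0%
Claude Opus 4.5 | 76 | 66 | 36 | 34 | 20 | 46.5%
GPT-5 | 94 | 67 | 26 | 22 | 19 | 45.6%
Claude 3.5 Sonnet | 100 | 55 | 48 | 23 | 0 | 45.0%
Claude Sonnet 4 | 83 | 56 | 40 | 35 | 0 | 42.8%
Grok 4.1 Fast | 71 | 63 | 54 | 25 | 0 | 42.5%
Hermes 3 405B | 90 | 71 | 42 | 8 | 0 | 42.3%
GPT-4.1 | 62 | 51 | 43 | 31 | 22 | 42.1%
o4 Mini | 100 | 46 | 34 | 17 | 2 | 39.9%
WizardLM 2 8x22b | 65 | 56 | 31 | 24 | 23 | 39.8%
Mistral Large 2 | 72 | 48 | 34 | 34 | 8 | 39.4%
Grok 4 Fast | 80 | 41 | 37 | 33 | 0 | 38.3%
DeepSeek V3.1 | 72 | 51 | 46 | 16 | 0 | 37.1%
Ministral 3 14B | 54 | 51 | 34 | 23 | 23 | 36.8%
Mistral Medium 3.1 | 64 | 56 | 41 | 24 | 0 | 36.8%
o4 Mini High | 57 | 51 | 47 | 23 | 5 | 36.7%
Minimax M2.5 | 51 | 45 | 39 | 26 | 21 | 36.3%
ByteDance Seed 1.6 Flash | 100 | 45 | 21 | 15 | 0 | 36.0%
GPT-4o, May 13th (temp=0) | 54 | 54 | 43 | 27 | 0 | 35.7%
DeepSeek V3.2 | 62 | 46 | 27 | 23 | 11 | 33.9%
Z.AI GLM 4.6 | 47 | 42 | 40 | 26 | 9 | 33.0%
Claude Sonnet 4.5 | 51 | 50 | 49 | 5 | 2 | 31.5%
Hermes 3 70B | 74 | 43 | 36 | 3 | 0 | 31.2%
Gemini 2.5 Pro | 35 | 34 | 31 | 29 | 24 | 30.6%
Gemini 2.5 Flash Lite | 61 | 42 | 33 | 14 | 0 | 30.2%
DeepSeek V3 (2025-03-24) | 42 | 39 | 31 | 27 | 11 | 30.0%
Llama 3.1 70B | 100 | 41 | 2 | 1 | 0 | 28.7%
Ministral 3B | 65 | 48 | 27 | 0 | 0 | 28.0%
Claude 3.7 Sonnet | 43 | 40 | 29 | 21 | 1 | 26.9%
Claude 3.5 Haiku | 75 | 34 | 23 | 0 | 0 | 26.4%
Qwen 3.5 Plus (2026-02-15) | 53 | 27 | 27 | 23 | 0 | 26.0%
GPT-4o, Aug. 6th (temp=1) | 47 | 42 | 40 | 1 | 0 | 25.9%
Ministral 3 8B | 65 | 29 | 18 | 17 | 0 | 25.8%
Mistral Large | 82 | 39 | 8 | 0 | 0 | 25.7%
Ministral 8B | 60 | 50 | 17 | 0 | 0 | 25.4%
ByteDance Seed 1.6 | 53 | 40 | 17 | 13 | 0 | 24.6%
Gemma 3 27B | 55 | 34 | 20 | 5 | 5 | 24.0%
Mistral Small 3.2 24B | 71 | 31 | 5 | 0 | 0 | 21.6%
Gemini 2.5 Flash | 43 | 24 | 23 | 19 | 0 | 21.5%
Qwen 3.5 397B A17B | 53 | 26 | 25 | 0 | 0 | 20.8%
Claude 3 Haiku | 33 | 31 | 28 | 0 | 0 | 18.4%
Gemini 3 Pro (Preview) | 42 | 31 | 16 | 0 | 0 | 17.7%
Gemma 3 4B | 53 | 19 | 8 | 7 | 0 | 17.5%
Llama 3.1 Nemotron 70B | 28 | 27 | 26 | 5 | 0 | 17.1%
DeepSeek V3 (2024-12-26) | 55 | 23 | 0 | 0 | 0 | 15.5%
Gemini 3 Flash (Preview) | 32 | 23 | 11 | 11 | 0 | 15.4%
Z.AI GLM 4.7 Flash | 55 | 13 | 3 | 2 | 0 | 14.4%
Gemma 3 12B | 32 | 22 | 16 | 0 | 0 | 13.9%
GPT-4o Mini (temp=1) | 28 | 26 | 7 | 5 | 3 | 13.9%
GPT-4o, Aug. 6th (temp=0) | 58 | 7 | 3 | 0 | 0 | 13.7%
GPT-4o, May 13th (temp=1) | 25 | 17 | 16 | 7 | 2 | 13.1%
Llama 3.1 8B | 28 | 21 | 10 | 4 | 0 | 12.7%
GPT-5.2 | 23 | 15 | 12 | 9 | 0 | 11.9%
GPT-4.1 Nano | 23 | 22 | 9 | 1 | 0 | 10.8%
Mistral NeMO | 28 | 10 | 4 | 3 | 0 | 8.9%
Mistral Large 3 | 28 | 14 | 2 | 0 | 0 | 8.8%
Z.AI GLM 4.7 | 17 | 11 | 7 | 0 | 0 | 6.9%
GPT-4o Mini (temp=0) | 34 | 0 | 0 | 0 | 0 | 6.7%
DeepSeek-V2 Chat | 25 | 8 | 0 | 0 | 0 | 6.6%
Gemini 3.1 Pro (Preview) | 13 | 9 | 8 | 0 | 0 | 6.1%
GPT-5 Mini | 12 | 5 | 1 | 0 | 0 | 3.6%
Arcee AI: Trinity Mini | 17 | 0 | 0 | 0 | 0 | 3.4%
GPT-4.1 Mini | 14 | 2 | 0 | 0 | 0 | 3.1%
Ministral 3 3B | 13 | 0 | 0 | 0 | 0 | 2.6%
GPT-5 Nano | 9 | 2 | 0 | 0 | 0 | 2.2%
Z.AI GLM 4.5 | 3 | 0 | 0 | 0 | 0 | 0.6%
Qwen 2.5 72B | 1 | 0 | 0 | 0 | 0 | 0.2%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%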
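
Model | #1 | #2 | #3 | #4 | #5 | Avg
Rocinante 12B | 100 | 98 | 96 | 69 | 0 | 72.6%
Mistral Small Creative | 100 | 72 | 65 | 62 | 31 | 66.2%
Claude Opus 4 | 97 | 58 | 46 | 40 | 36 | 55.4%
Ministral 8B | 83 | 47 | 47 | 42 | 39 | 51.8%
Ministral 3 14B | 75 | 63 | 59 | 46 | 7 | 50.2%
Hermes 3 405B | 100 | 56 | 33 | 28 | 25 | 48.4%
Z.AI GLM 5 | 96 | 54 | 41 | 25 | 17 | 46.8%
Claude Opus 4.6 | 71 | 45 | 42 | 38 | 33 | 45.8%
Arcee AI: Trinity Large (Preview) | 78 | 68 | 57 | 10 | 7 | 44.0%
Claude 3.5 Haiku | 100 | 60 | 51 | 2 | 0 | 42.5%
Writer: Palmyra X5 | 82 | 59 | 40 | 15 | 12 | 41.4%
Minimax M2.5 | 55 | 47 | 39 | 36 | 25 | 40.5%
GPT-5 Nano | 100 | 83 | 5 | 0 | 0 | 37.5%
Claude Opus 4.5 | 58 | 49 | 31 | 24 | 23 | 36.9%
Llama 3.1 70B | 56 | 45 | 38 | 34 | 8 | 36.1%
Llama 3.1 Nemotron 70B | 66 | 45 | 41 | 17 | 8 | 35.6%
ByteDance Seed 1.6 Flash | 59 | 41 | 36 | 28 | 14 | 35.4%
Ministral 3 8B | 76 | 38 | 31 | 19 | 0 | 32.9%
Llama 3.1 8B | 69 | 69 | 12 | 5 | 0 | 31.2%
Claude 3 Haiku | 45 | 45 | 35 | 23 | 5 | 30.7%
o4 Mini High | 50 | 43 | 32 | 19 | 0 | 28.9%
Mistral Large 3 | 77 | 44 | 17 | 0 | 0 | 27.8%
Hermes 3 70B | 71 | 31 | 21 | 12 | 0 | 27.2%
Grok 4 Fast | 66 | 31 | 24 | 14 | 0 | 27.2%
DeepSeek V3 (2024-12-26) | 63 | 29 | 27 | 6 | 0 | 25.1%
Z.AI GLM 4.6 | 64 | 24 | 19 | 11 | 0 | 23.6%
Grok 4.1 Fast | 35 | 35 | 32 | 8 | 8 | 23.5%
MoonshotAI: Kimi K2.5 | 49 | 39 | 16 | 8 | 0 | 22.3%
GPT-4.1 Nano | 33 | 23 | 17 | 16 | 12 | 20.2%
GPT-4o, Aug. 6th (temp=1) | 37 | 30 | 26 | 7 | 0 | 20.1%
o4 Mini | 43 | 28 | 27 | 0 | 0 | 19.8%
Claude Sonnet 4.5 | 56 | 28 | 14 | 0 | 0 | 19.7%
Gemma 3 27B | 31 | 24 | 22 | 20 | 2 | 19.5%
WizardLM 2 8x22b | 79 | 11 | 6 | 0 | 0 | 19.3%
Claude Haiku 4.5 | 50 | 23 | 15 | 6 | 0 | 18.8%
GPT-4o Mini (temp=1) | 46 | 17 | 16 | 10 | 2 | 18.3%
DeepSeek V3 (2025-03-24) | 35 | 25 | 20 | 10 | 0 | 18.0%
GPT-5 | 45 | 43 | 2 | 0 | 0 | 18.0%
DeepSeek V3.1 | 66 | 17 | 5 | 0 | 0 | 17.6%
Qwen 3.5 397B A17B | 26 | 23 | 21 | 17 | 0 | 17.4%
DeepSeek V3.2 | 30 | 30 | 24 | 0 | 0 | 16.7%
GPT-4o, May 13th (temp=1) | 27 | 22 | 17 | 17 | 0 | 16.5%
GPT-5.1 | 42 | 14 | 9 | 8 | 7 | 16.0%
Mistral Small 3.2 24B | 50 | 18 | 9 | 0 | 0 | 15.5%
Gemma 3 12B | 32 | 23 | 16 | 5 | 0 | 15.4%
Claude Sonnet 4 | 28 | 27 | 11 | 3 | 2 | 14.2%
Mistral Large | 29 | 25 | 16 | 0 | 0 | 13.9%
Mistral Medium 3.1 | 26 | 22 | 21 | 0 | 0 | 13.7%
Cohere Command R+ (Aug. 2024) | 34 | 19 | 16 | 0 | 0 | 13.6%
Gemini 2.5 Flash Lite | 31 | 18 | 11 | 3 | 0 | 12.7%
Gemini 2.5 Pro | 39 | 18 | 6 | 0 | 0 | 12.6%
Grok 4 | 21 | 14 | 13 | 7 | 5 | 12.0%
GPT-4.1 Mini | 30 | 26 | 1 | 0 | 0 | 11.3%
Mistral Large 2 | 57 | 0 | 0 | 0 | 0 | 11.3%
GPT-4o, Aug. 6th (temp=0) | 39 | 10 | 5 | 0 | 0 | 10.9%
GPT-4.1 | 22 | 16 | 14 | 2 | 0 | 10.7%
Mistral NeMO | 42 | 11 | 1 | 0 | 0 | 10.7%
Gemini 3 Pro (Preview) | 41 | 10 | 0 | 0 | 0 | 10.1%
Claude Sonnet 4.6 | 17 | 15 | 13 | 0 | 0 | 8.9%
Claude 3.5 Sonnet | 16 | 13 | 11 | 0 | 0 | 7.9%
Z.AI GLM 4.5 | 26 | 11 | 0 | 0 | 0 | 7.3%
Gemini 2.5 Flash | 34 | 0 | 0 | 0 | 0 | 6.7%
Claude 3.7 Sonnet | 16 | 14 | 0 | 0 | 0 | 5.9%
Z.AI GLM 4.7 | 29 | 0 | 0 | 0 | 0 | 5.7%
Gemma 3 4B | 16 | 9 | 0 | 0 | 0 | 4.9%
GPT-4o, May 13th (temp=0) | 10 | 7 | 0 | 0 | 0 | 3.5%
ByteDance Seed 1.6 | 13 | 3 | 0 | 0 | 0 | 3.1%
DeepSeek-V2 Chat | 15 | 0 | 0 | 0 | 0 | 2.9%
GPT-4o Mini (temp=0) | 13 | 0 | 0 | 0 | 0 | 2.5%
Ministral 3 3B | 12 | 0 | 0 | 0 | 0 | 2.3%
GPT-5.2 | 11 | 0 | 0 | 0 | 0 | 2.2%
Arcee AI: Trinity Mini | 9 | 0 | 0 | 0 | 0 | 1.8%
Qwen 2.5 72B | 4 | 0 | 0 | 0 | 0 | 0.8%
Ministral 3B | 2 | 0 | 0 | 0 | 0 | 0.5%
Z.AI GLM 4.7 Flash | 1 | 0 | 0 | 0 | 0 | 0.2%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 3.5 Plus (2026-02-15) | 0 | 0 | 0 | 0 | 0 | 0.0%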
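
Model | #1 | #2 | #3 | #4 | #5 | Avg
Writer: Palmyra X5 | 100 | 100 | 100 | 81 | 61 | 88.3%
Z.AI GLM 5 | 100 | 99 | 96 | 90 | 54 | 87.8%
Hermes 3 70B | 100 | 100 | 93 | 72 | 71 | 87.3%
Rocinante 12B | 100 | 100 | 71 | 58 | 55 | 76.7%
Claude Haiku 4.5 | 100 | 100 | 75 | 60 | 25 | 71.9%
Llama 3.1 8B | 100 | 100 | 65 | 62 | 31 | 71.8%
Grok 4.1 Fast | 100 | 83 | 78 | 57 | 40 | 71.6%
Claude Sonnet 4.5 | 89 | 75 | 75 | 57 | 49 | 69.1%
Arcee AI: Trinity Large (Preview) | 87 | 87 | 59 | 53 | 53 | 67.7%
GPT-4.1 Nano | 91 | 73 | 68 | 46 | 45 | 64.4%
Claude Opus 4.6 | 100 | 79 | 73 | 39 | 29 | 64.0%
Claude Opus 4.5 | 88 | 73 | 72 | 44 | 41 | 63.7%
Minimax M2.5 | 73 | 69 | 58 | 54 | 48 | 60.6%
Claude Sonnet 4.6 | 85 | 79 | 55 | 47 | 19 | 57.1%
MoonshotAI: Kimi K2.5 | 94 | 89 | 55 | 28 | 16 | 56.4%
Claude 3 Haiku | 83 | 82 | 67 | 19 | 10 | 52.3%
Hermes 3 405B | 70 | 66 | 49 | 47 | 23 | 51.0%
Ministral 3 14B | 87 | 68 | 47 | 31 | 20 | 50.4%
Gemini 2.5 Flash Lite | 78 | 61 | 42 | 38 | 23 | 48.2%
GPT-4o, Aug. 6th (temp=1) | 69 | 56 | 40 | 38 | 36 | 47.9%
Mistral Small Creative | 82 | 56 | 39 | 39 | 20 | 47.3%
Claude Opus 4 | 63 | 54 | 41 | 39 | 36 | 46.7%
Cohere Command R+ (Aug. 2024) | 71 | 65 | 42 | 28 | 15 | 44.4%
GPT-4.1 | 56 | 45 | 44 | 37 | 36 | 43.7%
Claude 3.7 Sonnet | 59 | 49 | 47 | 39 | 21 | 42.9%
Llama 3.1 70B | 90 | 48 | 35 | 23 | 13 | 41.8%
Mistral Small 3.2 24B | 85 | 65 | 30 | 29 | 0 | 41.7%
Llama 3.1 Nemotron 70B | 65 | 50 | 37 | 30 | 28 | 41.7%
GPT-4o, May 13th (temp=1) | 67 | 63 | 41 | 35 | 0 | 41.2%
GPT-4.1 Mini | 74 | 57 | 48 | 21 | 2 | 40.5%
Z.AI GLM 4.6 | 50 | 42 | 40 | 37 | 32 | 40.4%
Mistral Medium 3.1 | 66 | 51 | 31 | 28 | 23 | 39.9%
Gemma 3 27B | 62 | 56 | 38 | 24 | 15 | 39.0%
Gemma 3 12B | 61 | 49 | 44 | 21 | 17 | 38.6%
Grok 4 Fast | 68 | 42 | 38 | 22 | 22 | 38.5%
Claude 3.5 Haiku | 73 | 51 | 47 | 10 | 8 | 37.7%
Grok 4 | 63 | 46 | 42 | 20 | 13 | 36.9%
Claude Sonnet 4 | 100 | 55 | 14 | 10 | 0 | 35.8%
DeepSeek V3.1 | 57 | 49 | 44 | 15 | 7 | 34.6%
ByteDance Seed 1.6 Flash | 60 | 52 | 34 | 18 | 8 | 34.4%
Gemini 3.1 Pro (Preview) | 74 | 47 | 32 | 13 | 5 | 34.3%
GPT-5.1 | 48 | 39 | 38 | 26 | 18 | 33.7%
o4 Mini High | 50 | 38 | 26 | 21 | 14 | 29.7%
Ministral 3 8B | 96 | 26 | 14 | 12 | 0 | 29.6%
Qwen 3.5 Plus (2026-02-15) | 59 | 43 | 23 | 19 | 0 | 28.8%
DeepSeek-V2 Chat | 43 | 40 | 34 | 16 | 3 | 27.2%
Z.AI GLM 4.7 Flash | 43 | 30 | 27 | 19 | 13 | 26.5%
WizardLM 2 8x22b | 46 | 31 | 25 | 24 | 6 | 26.5%
Gemini 2.5 Pro | 51 | 29 | 22 | 14 | 12 | 25.6%
Claude 3.5 Sonnet | 43 | 42 | 36 | 7 | 0 | 25.5%
GPT-4o, Aug. 6th (temp=0) | 39 | 37 | 22 | 20 | 8 | 25.1%
Ministral 3 3B | 43 | 35 | 24 | 12 | 8 | 24.6%
Gemini 3 Pro (Preview) | 46 | 34 | 30 | 9 | 2 | 24.3%
DeepSeek V3.2 | 41 | 26 | 23 | 19 | 12 | 24.2%
Z.AI GLM 4.5 | 42 | 27 | 25 | 16 | 11 | 24.0%
o4 Mini | 51 | 33 | 16 | 15 | 5 | 23.8%
GPT-5 Nano | 100 | 14 | 0 | 0 | 0 | 22.9%
Mistral Large | 70 | 17 | 16 | 8 | 1 | 22.5%
GPT-4o Mini (temp=1) | 40 | 22 | 19 | 19 | 11 | 22.3%
Gemini 2.5 Flash | 35 | 28 | 27 | 19 | 1 | 21.9%
Mistral Large 3 | 36 | 29 | 25 | 11 | 8 | 21.8%
Ministral 8B | 58 | 31 | 10 | 1 | 0 | 19.9%
Gemma 3 4B | 26 | 25 | 14 | 14 | 14 | 18.7%
Mistral NeMO | 33 | 28 | 17 | 15 | 0 | 18.6%
DeepSeek V3 (2025-03-24) | 34 | 31 | 17 | 2 | 0 | 16.9%
Arcee AI: Trinity Mini | 28 | 23 | 16 | 14 | 0 | 16.2%
Qwen 3.5 397B A17B | 35 | 35 | 5 | 0 | 0 | 14.9%
Z.AI GLM 4.7 | 28 | 15 | 11 | 8 | 5 | 13.4%
Mistral Large 2 | 31 | 16 | 9 | 8 | 0 | 12.6%
GPT-5 Mini | 37 | 16 | 5 | 4 | 0 | 12.5%
GPT-4o, May 13th (temp=0) | 28 | 20 | 12 | 0 | 0 | 12.0%
Ministral 3B | 27 | 21 | 11 | 2 | 0 | 12.0%
ByteDance Seed 1.6 | 41 | 8 | 0 | 0 | 0 | 9.9%
Gemini 3 Flash (Preview) | 12 | 10 | 4 | 3 | 0 | 5.9%
Qwen 2.5 72B | 15 | 11 | 1 | 0 | 0 | 5.1%
DeepSeek V3 (2024-12-26) | 18 | 0 | 0 | 0 | 0 | 3.6%
GPT-4o Mini (temp=0) | 11 | 0 | 0 | 0 | 0 | 2.2%
GPT-5 | 11 | 0 | 0 | 0 | 0 | 2.2%
GPT-5.2 | 9 | 0 | 0 | 0 | 0 | 1.8%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%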