Subordinate conjunction sentence starts

Test: Bad Writing Habits

Avg. Score: 32.0%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | ByteDance Seed 1.6 Flash | 45.7% | $0.0013 | 27.3s | 11%
2 | Gemini 2.5 Flash Lite | 43.8% | $0.0009 | 9.5s | 11%
3 | Writer: Palmyra X5 | 44.4% | $0.011 | 22.0s | 9%
4 | Gemma 3 4B | 42.9% | $0.0002 | 20.0s | 9%
5 | Gemma 3 12B | 37.9% | $0.0004 | 41.3s | 10%
6 | Rocinante 12B | 48.7% | $0.0014 | 38.4s | 6%
7 | Z.AI GLM 5 | 41.3% | $0.0084 | 1.2m | 10%
8 | Qwen 3.5 Plus (2026-02-15) | 38.5% | $0.0060 | 31.5s | 9%
9 | Mistral Small Creative | 36.9% | $0.0007 | 9.1s | 7%
10 | Qwen 3.5 397B A17B | 54.2% | $0.014 | 3.0m | 10%
11 | Gemma 3 27B | 39.2% | $0.0006 | 52.6s | 8%
12 | o4 Mini | 29.5% | $0.015 | 25.7s | 9%
13 | Claude Sonnet 4 | 45.8% | $0.032 | 43.7s | 7%
14 | Claude Opus 4.6 | 34.7% | $0.078 | 1.2m | 14%
15 | GPT-5 Nano | 38.3% | $0.0042 | 1.4m | 8%
16 | Z.AI GLM 4.7 | 36.5% | $0.010 | 1.4m | 9%
17 | GPT-4o, Aug. 6th (temp=1) | 45.8% | $0.018 | 24.4s | 4%
18 | Gemini 3 Pro (Preview) | 40.3% | $0.055 | 54.4s | 9%
19 | GPT-5 Mini | 33.0% | $0.0100 | 57.4s | 8%
20 | GPT-5.1 | 35.4% | $0.054 | 1.8m | 12%
21 | Cohere Command R+ (Aug. 2024) | 47.1% | $0.020 | 52.5s | 3%
22 | Hermes 3 70B | 47.9% | $0.0010 | 1.2m | 3%
23 | Mistral NeMO | 38.7% | $0.0005 | 10.1s | 0%
24 | GPT-4o Mini (temp=1) | 40.9% | $0.0012 | 34.8s | 0%
25 | GPT-4.1 Nano | 36.0% | $0.0007 | 13.3s | 0%
26 | Llama 3.1 Nemotron 70B | 37.6% | $0.0038 | 31.7s | 0%
27 | Ministral 3 14B | 33.2% | $0.0007 | 11.7s | 0%
28 | Llama 3.1 8B | 41.4% | $0.0003 | 1.3m | 0%
29 | o4 Mini High | 26.5% | $0.025 | 47.2s | 5%
30 | GPT-5 | 34.6% | $0.065 | 2.8m | 10%
31 | Claude Haiku 4.5 | 35.3% | $0.011 | 21.6s | 0%
32 | Arcee AI: Trinity Mini | 30.9% | $0.0003 | 9.2s | 0%
33 | Claude 3 Haiku | 31.4% | $0.0025 | 14.9s | 0%
34 | Gemini 2.5 Flash | 31.2% | $0.0052 | 10.6s | 0%
35 | Grok 4 Fast | 31.9% | $0.0017 | 24.1s | 0%
36 | Mistral Large | 35.1% | $0.014 | 30.9s | 0%
37 | GPT-5.2 | 24.8% | $0.056 | 1.5m | 8%
38 | Z.AI GLM 4.7 Flash | 36.3% | $0.0017 | 1.2m | 0%
39 | Claude Sonnet 4.6 | 37.6% | $0.031 | 39.3s | 0%
40 | Ministral 3 8B | 27.6% | $0.0008 | 19.6s | 0%
41 | Z.AI GLM 4.5 | 31.8% | $0.0051 | 42.1s | 0%
42 | GPT-4o, May 13th (temp=1) | 33.2% | $0.033 | 14.4s | 0%
43 | Z.AI GLM 4.6 | 32.0% | $0.0065 | 51.5s | 0%
44 | Mistral Large 3 | 28.0% | $0.0033 | 30.3s | 0%
45 | Ministral 8B | 23.9% | $0.0004 | 10.4s | 0%
46 | GPT-4.1 Mini | 25.4% | $0.0027 | 19.0s | 0%
47 | Mistral Medium 3.1 | 28.3% | $0.0048 | 36.5s | 0%
48 | GPT-4o Mini (temp=0) | 27.0% | $0.0012 | 34.8s | 0%
49 | Arcee AI: Trinity Large (Preview) | 27.9% | $0.0000 | 43.6s | 0%
50 | Hermes 3 405B | 29.9% | $0.0032 | 53.2s | 0%
51 | Gemini 3 Flash (Preview) | 25.2% | $0.0078 | 19.6s | 0%
52 | Claude 3.7 Sonnet | 36.6% | $0.042 | 46.7s | 0%
53 | Mistral Large 2 | 26.5% | $0.013 | 29.4s | 0%
54 | Llama 3.1 70B | 23.0% | $0.0015 | 29.4s | 0%
55 | DeepSeek-V2 Chat | 26.5% | $0.0021 | 53.3s | 0%
56 | DeepSeek V3.2 | 34.6% | $0.0014 | 1.9m | 0%
57 | Minimax M2.5 | 29.0% | $0.0034 | 1.3m | 0%
58 | Ministral 3B | 18.4% | $0.0001 | 8.1s | 0%
59 | DeepSeek V3 (2024-12-26) | 25.4% | $0.0021 | 54.6s | 0%
60 | Qwen 2.5 72B | 22.0% | $0.0010 | 36.7s | 0%
61 | Gemini 2.5 Pro | 29.5% | $0.036 | 36.2s | 0%
62 | DeepSeek V3.1 | 31.7% | $0.0020 | 1.8m | 0%
63 | GPT-4.1 | 25.9% | $0.018 | 44.7s | 0%
64 | GPT-4o, Aug. 6th (temp=0) | 23.4% | $0.023 | 22.7s | 0%
65 | GPT-4o, May 13th (temp=0) | 23.3% | $0.035 | 14.1s | 0%
66 | Claude Sonnet 4.5 | 26.0% | $0.035 | 38.1s | 0%
67 | ByteDance Seed 1.6 | 36.2% | $0.013 | 2.5m | 0%
68 | DeepSeek V3 (2025-03-24) | 17.4% | $0.0014 | 39.4s | 0%
69 | Ministral 3 3B | 13.0% | $0.0005 | 11.1s | 0%
70 | WizardLM 2 8x22b | 27.8% | $0.0026 | 1.8m | 0%
71 | Claude Opus 4.5 | 35.1% | $0.070 | 53.4s | 0%
72 | Stealth: Aurora Alpha | 12.5% | $0.0000 | 9.8s | 0%
73 | Claude 3.5 Haiku | 12.2% | $0.0035 | 10.8s | 0%
74 | Grok 4.1 Fast | 14.6% | $0.0018 | 37.8s | 0%
75 | Claude Opus 4 | 39.1% | $0.209 | 1.4m | 8%
76 | MoonshotAI: Kimi K2.5 | 37.5% | $0.019 | 3.2m | 0%
77 | Grok 4 | 28.4% | $0.048 | 1.7m | 0%
78 | Claude 3.5 Sonnet | 16.6% | $0.048 | 35.5s | 0%
79 | Gemini 3.1 Pro (Preview) | 30.7% | $0.107 | 1.8m | 0%
80 | Mistral Small 3.2 24B | 5.9% | $0.0069 | 5.7m | 0%
Average score: 32.00%

Individual Scenarios

Detailed Writing Rules

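Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemma 3 27B | 100 | 100 | 100 | 100 | 64 | 92.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash Lite | 100 | 100 | 70 | 64 | 60 | 78.8%
Hermes 3 70B | 100 | 100 | 100 | 71 | 0 | 74.3%
Gemma 3 4B | 100 | 100 | 100 | 71 | 0 | 74.3%
DeepSeek V3.1 | 100 | 100 | 100 | 62 | 0 | 72.3%
Gemini 2.5 Pro | 100 | 100 | 88 | 70 | 0 | 71.6%
Gemma 3 12B | 100 | 100 | 79 | 71 | 0 | 70.2%
GPT-4o Mini (temp=0) | 100 | 100 | 79 | 65 | 0 | 68.9%
Mistral Large 2 | 100 | 88 | 72 | 63 | 0 | 64.5%
Gemini 3 Pro (Preview) | 100 | 85 | 52 | 51 | 31 | 63.6%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 0 | 0 | 60.0%
Rocinante 12B | 100 | 100 | 100 | 0 | 0 | 60.0%
Llama 3.1 8B | 100 | 100 | 96 | 0 | 0 | 59.2%
Z.AI GLM 4.7 | 100 | 62 | 53 | 42 | 0 | 51.3%
Claude Sonnet 4.6 | 100 | 85 | 71 | 0 | 0 | 51.2%
Writer: Palmyra X5 | 100 | 100 | 49 | 0 | 0 | 49.7%
Ministral 3 8B | 100 | 70 | 67 | 0 | 0 | 47.4%
DeepSeek V3.2 | 100 | 83 | 49 | 0 | 0 | 46.4%
Z.AI GLM 5 | 100 | 60 | 60 | 0 | 0 | 44.0%
Arcee AI: Trinity Large (Preview) | 100 | 56 | 48 | 0 | 0 | 40.9%
Ministral 3 14B | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4.1 | 50 | 49 | 49 | 47 | 0 | 38.8%
MoonshotAI: Kimi K2.5 | 100 | 88 | 0 | 0 | 0 | 37.5%
Grok 4 | 100 | 83 | 0 | 0 | 0 | 36.7%
ByteDance Seed 1.6 Flash | 100 | 50 | 28 | 0 | 0 | 35.4%
GPT-4o, May 13th (temp=0) | 93 | 75 | 0 | 0 | 0 | 33.4%
Z.AI GLM 4.5 | 100 | 65 | 0 | 0 | 0 | 33.0%
Claude Opus 4 | 70 | 64 | 30 | 0 | 0 | 32.8%
Mistral Large | 98 | 38 | 27 | 0 | 0 | 32.5%
Grok 4 Fast | 100 | 61 | 0 | 0 | 0 | 32.2%
Ministral 8B | 67 | 52 | 42 | 0 | 0 | 32.0%
Claude 3 Haiku | 89 | 70 | 0 | 0 | 0 | 31.9%
WizardLM 2 8x22b | 72 | 42 | 40 | 0 | 0 | 30.9%
Gemini 2.5 Flash | 78 | 68 | 0 | 0 | 0 | 29.2%
Claude Sonnet 4.5 | 100 | 42 | 0 | 0 | 0 | 28.5%
GPT-4o, May 13th (temp=1) | 71 | 70 | 0 | 0 | 0 | 28.4%
Mistral Small Creative | 71 | 67 | 0 | 0 | 0 | 27.6%
Mistral NeMO | 96 | 37 | 0 | 0 | 0 | 26.6%
GPT-5 Mini | 100 | 30 | 0 | 0 | 0 | 26.0%
Claude Opus 4.6 | 58 | 55 | 0 | 0 | 0 | 22.6%
GPT-5 | 63 | 24 | 20 | 0 | 0 | 21.5%
ByteDance Seed 1.6 | 100 | 0 | 0 | 0 | 0 | 20.0%
Z.AI GLM 4.7 Flash | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3 (2025-03-24) | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3 (2024-12-26) | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Large 3 | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek-V2 Chat | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Haiku 4.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4.1 Nano | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, Aug. 6th (temp=0) | 88 | 0 | 0 | 0 | 0 | 17.5%
Claude Sonnet 4 | 83 | 0 | 0 | 0 | 0 | 16.7%
Qwen 2.5 72B | 75 | 0 | 0 | 0 | 0 | 14.9%
GPT-4.1 Mini | 69 | 0 | 0 | 0 | 0 | 13.9%
Claude Opus 4.5 | 65 | 0 | 0 | 0 | 0 | 13.0%
Qwen 3.5 Plus (2026-02-15) | 59 | 0 | 0 | 0 | 0 | 11.8%
Gemini 3 Flash (Preview) | 51 | 0 | 0 | 0 | 0 | 10.1%
Qwen 3.5 397B A17B | 28 | 22 | 0 | 0 | 0 | 9.9%
Mistral Medium 3.1 | 48 | 0 | 0 | 0 | 0 | 9.6%
o4 Mini | 47 | 0 | 0 | 0 | 0 | 9.4%
Gemini 3.1 Pro (Preview) | 47 | 0 | 0 | 0 | 0 | 9.3%
GPT-5.1 | 24 | 22 | 0 | 0 | 0 | 9.1%
GPT-5 Nano | 24 | 21 | 0 | 0 | 0 | 9.1%
Claude 3.7 Sonnet | 40 | 0 | 0 | 0 | 0 | 7.9%
o4 Mini High | 31 | 0 | 0 | 0 | 0 | 6.3%
GPT-5.2 | 30 | 0 | 0 | 0 | 0 | 6.0%
Minimax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
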
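Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 2.5 Flash Lite | 100 | 100 | 95 | 74 | 71 | 88.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 61 | 0 | 72.2%
GPT-5 Mini | 100 | 100 | 75 | 36 | 29 | 68.1%
ByteDance Seed 1.6 Flash | 100 | 97 | 45 | 42 | 37 | 64.2%
Mistral Medium 3.1 | 100 | 68 | 53 | 51 | 47 | 63.8%
Rocinante 12B | 100 | 100 | 65 | 52 | 0 | 63.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 69 | 47 | 0 | 63.3%
GPT-5 | 73 | 70 | 69 | 64 | 38 | 62.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemma 3 27B | 100 | 100 | 52 | 46 | 0 | 59.5%
Minimax M2.5 | 100 | 94 | 51 | 47 | 0 | 58.4%
Mistral Large 3 | 100 | 100 | 49 | 43 | 0 | 58.3%
Writer: Palmyra X5 | 100 | 100 | 44 | 41 | 0 | 57.1%
Claude Sonnet 4.5 | 100 | 100 | 43 | 41 | 0 | 56.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 68 | 0 | 0 | 53.5%
WizardLM 2 8x22b | 100 | 68 | 36 | 35 | 26 | 53.0%
Gemma 3 4B | 100 | 61 | 57 | 46 | 0 | 52.9%
GPT-5.1 | 100 | 51 | 45 | 41 | 27 | 52.7%
GPT-4.1 Nano | 100 | 100 | 57 | 0 | 0 | 51.5%
Claude Haiku 4.5 | 93 | 78 | 46 | 38 | 0 | 51.2%
Mistral Large | 100 | 83 | 71 | 0 | 0 | 51.0%
Grok 4 | 100 | 100 | 52 | 0 | 0 | 50.4%
Gemini 3 Flash (Preview) | 100 | 100 | 45 | 0 | 0 | 49.0%
Gemini 3 Pro (Preview) | 100 | 79 | 65 | 0 | 0 | 48.9%
Claude Opus 4.6 | 100 | 82 | 34 | 28 | 0 | 48.9%
Claude Sonnet 4 | 100 | 100 | 43 | 0 | 0 | 48.7%
Claude 3.7 Sonnet | 83 | 72 | 45 | 43 | 0 | 48.5%
Z.AI GLM 4.5 | 100 | 71 | 61 | 0 | 0 | 46.5%
GPT-4.1 Mini | 100 | 70 | 59 | 0 | 0 | 45.8%
Ministral 8B | 100 | 64 | 63 | 0 | 0 | 45.5%
Z.AI GLM 5 | 100 | 74 | 50 | 0 | 0 | 44.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 56 | 32 | 27 | 0 | 43.1%
GPT-4o Mini (temp=1) | 100 | 58 | 52 | 0 | 0 | 42.0%
Mistral NeMO | 86 | 62 | 61 | 0 | 0 | 41.8%
Mistral Large 2 | 74 | 67 | 56 | 0 | 0 | 39.3%
Hermes 3 70B | 100 | 94 | 0 | 0 | 0 | 38.9%
GPT-4.1 | 90 | 52 | 51 | 0 | 0 | 38.5%
GPT-5.2 | 83 | 58 | 50 | 0 | 0 | 38.1%
Gemma 3 12B | 89 | 51 | 38 | 0 | 0 | 35.7%
DeepSeek V3.2 | 100 | 34 | 32 | 0 | 0 | 33.3%
MoonshotAI: Kimi K2.5 | 100 | 62 | 0 | 0 | 0 | 32.3%
GPT-5 Nano | 52 | 38 | 38 | 34 | 0 | 32.3%
Ministral 3 8B | 100 | 55 | 0 | 0 | 0 | 31.0%
Gemini 2.5 Pro | 100 | 49 | 0 | 0 | 0 | 29.8%
Grok 4.1 Fast | 100 | 43 | 0 | 0 | 0 | 28.7%
Mistral Small Creative | 54 | 41 | 36 | 0 | 0 | 26.4%
Gemini 2.5 Flash | 83 | 49 | 0 | 0 | 0 | 26.3%
DeepSeek-V2 Chat | 88 | 44 | 0 | 0 | 0 | 26.3%
Claude Opus 4.5 | 59 | 54 | 0 | 0 | 0 | 22.5%
o4 Mini | 73 | 34 | 0 | 0 | 0 | 21.4%
Claude Opus 4 | 44 | 39 | 19 | 0 | 0 | 20.4%
ByteDance Seed 1.6 | 100 | 0 | 0 | 0 | 0 | 20.0%
Z.AI GLM 4.7 Flash | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3 (2024-12-26) | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3 Haiku | 89 | 0 | 0 | 0 | 0 | 17.9%
o4 Mini High | 29 | 29 | 27 | 0 | 0 | 17.0%
Ministral 3 14B | 76 | 0 | 0 | 0 | 0 | 15.2%
Claude 3.5 Sonnet | 67 | 0 | 0 | 0 | 0 | 13.3%
DeepSeek V3.1 | 66 | 0 | 0 | 0 | 0 | 13.2%
GPT-4o Mini (temp=0) | 64 | 0 | 0 | 0 | 0 | 12.8%
GPT-4o, May 13th (temp=0) | 60 | 0 | 0 | 0 | 0 | 12.0%
Z.AI GLM 4.6 | 57 | 0 | 0 | 0 | 0 | 11.5%
Arcee AI: Trinity Large (Preview) | 55 | 0 | 0 | 0 | 0 | 11.0%
Hermes 3 405B | 54 | 0 | 0 | 0 | 0 | 10.8%
Qwen 2.5 72B | 51 | 0 | 0 | 0 | 0 | 10.2%
Llama 3.1 8B | 43 | 0 | 0 | 0 | 0 | 8.7%
Gemini 3.1 Pro (Preview) | 41 | 0 | 0 | 0 | 0 | 8.3%
Z.AI GLM 4.7 | 38 | 0 | 0 | 0 | 0 | 7.6%
GPT-4o, Aug. 6th (temp=0) | 7 | 0 | 0 | 0 | 0 | 1.4%
Qwen 3.5 397B A17B | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
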
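Model | #1 | #2 | #3 | #4 | #5 | Avg
Llama 3.1 8B | 100 | 100 | 82 | 82 | 0 | 72.8%
GPT-4.1 Nano | 100 | 100 | 83 | 77 | 0 | 72.1%
Claude Haiku 4.5 | 100 | 100 | 100 | 57 | 0 | 71.4%
Claude Opus 4 | 100 | 100 | 100 | 56 | 0 | 71.2%
Qwen 3.5 397B A17B | 100 | 100 | 86 | 36 | 20 | 68.6%
Mistral Medium 3.1 | 100 | 100 | 80 | 43 | 0 | 64.6%
ByteDance Seed 1.6 Flash | 97 | 69 | 59 | 44 | 36 | 61.1%
Mistral Large | 100 | 100 | 100 | 0 | 0 | 60.0%
Writer: Palmyra X5 | 100 | 96 | 74 | 0 | 0 | 54.0%
MoonshotAI: Kimi K2.5 | 100 | 85 | 82 | 0 | 0 | 53.3%
GPT-4o, Aug. 6th (temp=1) | 96 | 91 | 79 | 0 | 0 | 53.3%
Rocinante 12B | 100 | 94 | 67 | 0 | 0 | 52.2%
Gemma 3 4B | 100 | 100 | 54 | 0 | 0 | 50.9%
o4 Mini High | 100 | 86 | 35 | 31 | 0 | 50.5%
Claude Sonnet 4 | 100 | 64 | 58 | 0 | 0 | 44.4%
GPT-5.1 | 81 | 53 | 43 | 36 | 0 | 42.7%
Qwen 2.5 72B | 74 | 71 | 65 | 0 | 0 | 42.0%
Llama 3.1 70B | 100 | 100 | 0 | 0 | 0 | 40.0%
Z.AI GLM 4.6 | 82 | 57 | 53 | 0 | 0 | 38.4%
DeepSeek V3.2 | 100 | 88 | 0 | 0 | 0 | 37.7%
Z.AI GLM 5 | 83 | 55 | 50 | 0 | 0 | 37.6%
Claude Opus 4.5 | 100 | 88 | 0 | 0 | 0 | 37.5%
Mistral Large 3 | 100 | 46 | 41 | 0 | 0 | 37.4%
Gemini 3 Pro (Preview) | 100 | 86 | 0 | 0 | 0 | 37.2%
Arcee AI: Trinity Large (Preview) | 100 | 83 | 0 | 0 | 0 | 36.7%
Gemini 2.5 Pro | 75 | 67 | 42 | 0 | 0 | 36.7%
GPT-5 Nano | 97 | 43 | 38 | 0 | 0 | 35.7%
Grok 4 | 100 | 78 | 0 | 0 | 0 | 35.6%
GPT-4o Mini (temp=1) | 100 | 69 | 0 | 0 | 0 | 33.9%
Ministral 3 14B | 100 | 64 | 0 | 0 | 0 | 32.8%
Ministral 3B | 88 | 75 | 0 | 0 | 0 | 32.5%
Qwen 3.5 Plus (2026-02-15) | 56 | 55 | 50 | 0 | 0 | 32.1%
GPT-4o, May 13th (temp=1) | 88 | 71 | 0 | 0 | 0 | 31.8%
DeepSeek V3 (2024-12-26) | 86 | 71 | 0 | 0 | 0 | 31.5%
GPT-5 | 58 | 56 | 43 | 0 | 0 | 31.3%
Ministral 8B | 100 | 56 | 0 | 0 | 0 | 31.2%
Grok 4 Fast | 100 | 55 | 0 | 0 | 0 | 31.0%
Claude Sonnet 4.6 | 82 | 71 | 0 | 0 | 0 | 30.7%
Mistral Small Creative | 100 | 42 | 0 | 0 | 0 | 28.4%
Claude Opus 4.6 | 63 | 46 | 0 | 0 | 0 | 21.8%
Claude 3.7 Sonnet | 52 | 51 | 0 | 0 | 0 | 20.6%
ByteDance Seed 1.6 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4.1 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Large 2 | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0%
Gemma 3 12B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
WizardLM 2 8x22b | 48 | 42 | 0 | 0 | 0 | 17.9%
GPT-4o, Aug. 6th (temp=0) | 82 | 0 | 0 | 0 | 0 | 16.4%
Hermes 3 70B | 79 | 0 | 0 | 0 | 0 | 15.9%
o4 Mini | 42 | 34 | 0 | 0 | 0 | 15.3%
GPT-4o, May 13th (temp=0) | 71 | 0 | 0 | 0 | 0 | 14.3%
Arcee AI: Trinity Mini | 66 | 0 | 0 | 0 | 0 | 13.2%
GPT-5 Mini | 34 | 31 | 0 | 0 | 0 | 13.0%
Grok 4.1 Fast | 64 | 0 | 0 | 0 | 0 | 12.8%
Gemini 2.5 Flash Lite | 62 | 0 | 0 | 0 | 0 | 12.3%
GPT-5.2 | 29 | 29 | 0 | 0 | 0 | 11.5%
GPT-4.1 Mini | 54 | 0 | 0 | 0 | 0 | 10.9%
Claude Sonnet 4.5 | 51 | 0 | 0 | 0 | 0 | 10.2%
Gemini 3.1 Pro (Preview) | 49 | 0 | 0 | 0 | 0 | 9.8%
Stealth: Aurora Alpha | 49 | 0 | 0 | 0 | 0 | 9.8%
Mistral Small 3.2 24B | 45 | 2 | 0 | 0 | 0 | 9.4%
DeepSeek V3.1 | 46 | 0 | 0 | 0 | 0 | 9.2%
Gemma 3 27B | 42 | 0 | 0 | 0 | 0 | 8.5%
Z.AI GLM 4.7 | 34 | 0 | 0 | 0 | 0 | 6.8%
Ministral 3 8B | 33 | 0 | 0 | 0 | 0 | 6.6%
Minimax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.7 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek-V2 Chat | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Cohere Command R+ (Aug. 2024) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%
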
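Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-4o Mini (temp=1) | 100 | 100 | 96 | 93 | 75 | 92.7%
Mistral NeMO | 100 | 100 | 85 | 82 | 47 | 82.7%
Claude Haiku 4.5 | 100 | 100 | 68 | 52 | 45 | 73.0%
Z.AI GLM 5 | 100 | 100 | 100 | 0 | 0 | 60.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 0 | 0 | 60.0%
Writer: Palmyra X5 | 100 | 88 | 84 | 0 | 0 | 54.5%
Claude Sonnet 4 | 100 | 100 | 69 | 0 | 0 | 53.9%
Gemma 3 27B | 100 | 100 | 65 | 0 | 0 | 53.0%
Gemma 3 12B | 75 | 68 | 63 | 56 | 0 | 52.2%
Z.AI GLM 4.5 | 100 | 82 | 78 | 0 | 0 | 52.0%
GPT-4o Mini (temp=0) | 100 | 82 | 66 | 0 | 0 | 49.6%
Gemini 3 Pro (Preview) | 100 | 100 | 41 | 0 | 0 | 48.3%
GPT-5 | 100 | 43 | 41 | 28 | 25 | 47.3%
Grok 4 | 100 | 63 | 56 | 0 | 0 | 43.8%
Claude 3.7 Sonnet | 100 | 56 | 50 | 0 | 0 | 41.1%
Claude Opus 4.5 | 100 | 100 | 0 | 0 | 0 | 40.0%
Z.AI GLM 4.6 | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 0 | 0 | 0 | 40.0%
Hermes 3 70B | 100 | 100 | 0 | 0 | 0 | 40.0%
Llama 3.1 8B | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4o, May 13th (temp=0) | 100 | 98 | 0 | 0 | 0 | 39.6%
Gemma 3 4B | 71 | 69 | 56 | 0 | 0 | 39.4%
o4 Mini High | 85 | 75 | 36 | 0 | 0 | 39.2%
Ministral 3 14B | 100 | 94 | 0 | 0 | 0 | 38.9%
Mistral Large | 74 | 70 | 50 | 0 | 0 | 38.8%
Cohere Command R+ (Aug. 2024) | 100 | 93 | 0 | 0 | 0 | 38.5%
GPT-4.1 Nano | 100 | 81 | 0 | 0 | 0 | 36.1%
Gemini 2.5 Flash Lite | 61 | 56 | 56 | 0 | 0 | 34.5%
Mistral Large 2 | 74 | 60 | 36 | 0 | 0 | 34.1%
DeepSeek V3.1 | 100 | 69 | 0 | 0 | 0 | 33.9%
GPT-4o, May 13th (temp=1) | 93 | 71 | 0 | 0 | 0 | 32.8%
Qwen 3.5 397B A17B | 97 | 27 | 23 | 15 | 0 | 32.3%
Claude Sonnet 4.6 | 82 | 78 | 0 | 0 | 0 | 32.0%
Rocinante 12B | 98 | 58 | 0 | 0 | 0 | 31.2%
Gemini 3 Flash (Preview) | 100 | 55 | 0 | 0 | 0 | 31.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 45 | 0 | 0 | 0 | 28.9%
Mistral Medium 3.1 | 59 | 49 | 35 | 0 | 0 | 28.5%
o4 Mini | 100 | 34 | 0 | 0 | 0 | 26.8%
GPT-5.1 | 37 | 32 | 31 | 30 | 0 | 26.0%
Grok 4 Fast | 68 | 59 | 0 | 0 | 0 | 25.3%
Mistral Small Creative | 72 | 42 | 0 | 0 | 0 | 23.0%
Z.AI GLM 4.7 Flash | 60 | 50 | 0 | 0 | 0 | 22.0%
DeepSeek-V2 Chat | 61 | 49 | 0 | 0 | 0 | 22.0%
GPT-5 Nano | 66 | 22 | 22 | 0 | 0 | 22.0%
Z.AI GLM 4.7 | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3 (2025-03-24) | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
Grok 4.1 Fast | 59 | 39 | 0 | 0 | 0 | 19.6%
Mistral Small 3.2 24B | 87 | 6 | 4 | 0 | 0 | 19.4%
Claude Sonnet 4.5 | 96 | 0 | 0 | 0 | 0 | 19.2%
Arcee AI: Trinity Large (Preview) | 91 | 0 | 0 | 0 | 0 | 18.2%
Mistral Large 3 | 88 | 0 | 0 | 0 | 0 | 17.7%
GPT-5 Mini | 32 | 28 | 27 | 0 | 0 | 17.5%
GPT-4.1 Mini | 86 | 0 | 0 | 0 | 0 | 17.2%
Claude Opus 4 | 48 | 37 | 0 | 0 | 0 | 16.9%
DeepSeek V3 (2024-12-26) | 81 | 0 | 0 | 0 | 0 | 16.1%
Minimax M2.5 | 71 | 0 | 0 | 0 | 0 | 14.3%
MoonshotAI: Kimi K2.5 | 68 | 0 | 0 | 0 | 0 | 13.7%
GPT-4o, Aug. 6th (temp=0) | 68 | 0 | 0 | 0 | 0 | 13.7%
Claude Opus 4.6 | 65 | 0 | 0 | 0 | 0 | 13.0%
Gemini 2.5 Pro | 63 | 0 | 0 | 0 | 0 | 12.5%
ByteDance Seed 1.6 Flash | 57 | 0 | 0 | 0 | 0 | 11.4%
Ministral 3 8B | 55 | 0 | 0 | 0 | 0 | 11.0%
GPT-5.2 | 27 | 26 | 0 | 0 | 0 | 10.7%
DeepSeek V3.2 | 50 | 0 | 0 | 0 | 0 | 10.0%
GPT-4.1 | 49 | 0 | 0 | 0 | 0 | 9.7%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0.0%
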
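Model | #1 | #2 | #3 | #4 | #5 | Avg
Cohere Command R+ (Aug. 2024) | 100 | 100 | 98 | 85 | 69 | 90.4%
DeepSeek V3.2 | 100 | 100 | 97 | 95 | 0 | 78.5%
Gemma 3 4B | 100 | 100 | 100 | 85 | 0 | 76.9%
Writer: Palmyra X5 | 100 | 100 | 100 | 68 | 0 | 73.5%
Qwen 3.5 397B A17B | 100 | 96 | 69 | 63 | 26 | 70.9%
GPT-4o Mini (temp=1) | 100 | 88 | 81 | 81 | 0 | 69.8%
Arcee AI: Trinity Mini | 100 | 94 | 77 | 76 | 0 | 69.4%
Z.AI GLM 4.6 | 100 | 79 | 75 | 71 | 0 | 65.1%
Claude Sonnet 4 | 100 | 100 | 100 | 0 | 0 | 60.0%
Hermes 3 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
ByteDance Seed 1.6 | 100 | 100 | 91 | 0 | 0 | 58.2%
Gemini 2.5 Flash Lite | 100 | 100 | 68 | 0 | 0 | 53.7%
Claude 3 Haiku | 93 | 78 | 71 | 0 | 0 | 48.4%
GPT-4o Mini (temp=0) | 85 | 81 | 70 | 0 | 0 | 47.2%
Grok 4 | 100 | 75 | 52 | 0 | 0 | 45.3%
GPT-4o, May 13th (temp=1) | 88 | 75 | 57 | 0 | 0 | 44.0%
Gemini 2.5 Flash | 100 | 70 | 44 | 0 | 0 | 42.9%
Z.AI GLM 4.7 Flash | 100 | 59 | 54 | 0 | 0 | 42.6%
Gemma 3 12B | 100 | 56 | 55 | 0 | 0 | 42.2%
Mistral Large 2 | 69 | 68 | 68 | 0 | 0 | 41.1%
Claude Haiku 4.5 | 75 | 66 | 64 | 0 | 0 | 40.9%
Z.AI GLM 5 | 100 | 60 | 42 | 0 | 0 | 40.4%
Claude Sonnet 4.5 | 100 | 100 | 0 | 0 | 0 | 40.0%
Ministral 8B | 100 | 100 | 0 | 0 | 0 | 40.0%
Llama 3.1 Nemotron 70B | 100 | 93 | 0 | 0 | 0 | 38.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 48 | 39 | 0 | 0 | 37.3%
GPT-5 Mini | 100 | 43 | 42 | 0 | 0 | 36.9%
Mistral Large | 90 | 89 | 0 | 0 | 0 | 35.9%
Claude Opus 4.5 | 67 | 55 | 53 | 0 | 0 | 34.8%
GPT-4.1 Mini | 98 | 75 | 0 | 0 | 0 | 34.5%
Gemma 3 27B | 100 | 70 | 0 | 0 | 0 | 34.1%
Claude Opus 4 | 70 | 56 | 37 | 0 | 0 | 32.8%
Z.AI GLM 4.5 | 88 | 74 | 0 | 0 | 0 | 32.2%
Mistral Small Creative | 100 | 59 | 0 | 0 | 0 | 31.8%
Gemini 3.1 Pro (Preview) | 89 | 64 | 0 | 0 | 0 | 30.7%
Claude Sonnet 4.6 | 83 | 68 | 0 | 0 | 0 | 30.4%
Claude Opus 4.6 | 54 | 51 | 46 | 0 | 0 | 30.2%
ByteDance Seed 1.6 Flash | 57 | 46 | 46 | 0 | 0 | 29.8%
DeepSeek V3 (2024-12-26) | 81 | 68 | 0 | 0 | 0 | 29.6%
Minimax M2.5 | 74 | 68 | 0 | 0 | 0 | 28.5%
Gemini 3 Pro (Preview) | 100 | 42 | 0 | 0 | 0 | 28.4%
Rocinante 12B | 69 | 64 | 0 | 0 | 0 | 26.7%
o4 Mini | 79 | 43 | 0 | 0 | 0 | 24.4%
Mistral Medium 3.1 | 51 | 50 | 0 | 0 | 0 | 20.0%
MoonshotAI: Kimi K2.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-5 Nano | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Large 3 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, May 13th (temp=0) | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 14B | 100 | 0 | 0 | 0 | 0 | 20.0%
Qwen 2.5 72B | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Small 3.2 24B | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 20.0%
Z.AI GLM 4.7 | 91 | 0 | 0 | 0 | 0 | 18.2%
Stealth: Aurora Alpha | 77 | 0 | 0 | 0 | 0 | 15.4%
GPT-4.1 Nano | 77 | 0 | 0 | 0 | 0 | 15.4%
Llama 3.1 8B | 76 | 0 | 0 | 0 | 0 | 15.2%
DeepSeek V3.1 | 75 | 0 | 0 | 0 | 0 | 14.9%
Ministral 3B | 71 | 0 | 0 | 0 | 0 | 14.2%
GPT-5.2 | 36 | 29 | 0 | 0 | 0 | 13.0%
DeepSeek-V2 Chat | 63 | 0 | 0 | 0 | 0 | 12.7%
GPT-5 | 30 | 29 | 0 | 0 | 0 | 11.8%
Gemini 2.5 Pro | 54 | 0 | 0 | 0 | 0 | 10.9%
Claude 3.7 Sonnet | 54 | 0 | 0 | 0 | 0 | 10.8%
Gemini 3 Flash (Preview) | 46 | 0 | 0 | 0 | 0 | 9.3%
Grok 4 Fast | 45 | 0 | 0 | 0 | 0 | 9.0%
o4 Mini High | 33 | 0 | 0 | 0 | 0 | 6.5%
WizardLM 2 8x22b | 27 | 0 | 0 | 0 | 0 | 5.3%
GPT-5.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
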
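Model | #1 | #2 | #3 | #4 | #5 | Avg
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 91 | 0 | 78.2%
Rocinante 12B | 100 | 100 | 100 | 79 | 0 | 75.7%
Gemma 3 4B | 100 | 100 | 100 | 59 | 0 | 71.8%
Claude 3 Haiku | 100 | 98 | 77 | 72 | 0 | 69.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 61 | 52 | 0 | 62.6%
GPT-4o Mini (temp=0) | 100 | 99 | 57 | 50 | 0 | 61.2%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral NeMO | 100 | 100 | 96 | 0 | 0 | 59.2%
Qwen 3.5 397B A17B | 100 | 95 | 55 | 20 | 20 | 57.9%
Gemma 3 27B | 99 | 77 | 53 | 43 | 0 | 54.3%
GPT-4o, May 13th (temp=0) | 97 | 59 | 58 | 55 | 0 | 53.8%
Gemma 3 12B | 100 | 97 | 69 | 0 | 0 | 53.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 63 | 48 | 46 | 0 | 51.3%
WizardLM 2 8x22b | 85 | 48 | 47 | 33 | 29 | 48.4%
Z.AI GLM 4.7 | 74 | 66 | 64 | 38 | 0 | 48.2%
Writer: Palmyra X5 | 100 | 90 | 42 | 0 | 0 | 46.4%
Arcee AI: Trinity Large (Preview) | 100 | 69 | 60 | 0 | 0 | 45.8%
Gemini 2.5 Flash Lite | 100 | 60 | 60 | 0 | 0 | 44.1%
DeepSeek-V2 Chat | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4o Mini (temp=1) | 100 | 100 | 0 | 0 | 0 | 40.0%
Llama 3.1 8B | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-5 | 70 | 61 | 21 | 20 | 18 | 38.3%
Minimax M2.5 | 75 | 60 | 53 | 0 | 0 | 37.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 75 | 0 | 0 | 0 | 34.9%
Claude 3.5 Sonnet | 100 | 74 | 0 | 0 | 0 | 34.7%
GPT-4.1 Nano | 100 | 67 | 0 | 0 | 0 | 33.3%
Claude Sonnet 4.6 | 100 | 66 | 0 | 0 | 0 | 33.2%
Claude 3.7 Sonnet | 85 | 40 | 40 | 0 | 0 | 33.0%
Mistral Large 2 | 100 | 65 | 0 | 0 | 0 | 33.0%
DeepSeek V3.1 | 94 | 70 | 0 | 0 | 0 | 32.9%
Z.AI GLM 4.7 Flash | 85 | 72 | 0 | 0 | 0 | 31.3%
Gemini 3.1 Pro (Preview) | 45 | 41 | 38 | 32 | 0 | 31.3%
Claude Opus 4.6 | 74 | 48 | 34 | 0 | 0 | 31.2%
Claude Sonnet 4 | 56 | 55 | 40 | 0 | 0 | 30.1%
GPT-5 Nano | 61 | 40 | 29 | 20 | 0 | 30.0%
o4 Mini High | 85 | 34 | 30 | 0 | 0 | 29.9%
Qwen 2.5 72B | 63 | 46 | 41 | 0 | 0 | 29.8%
Gemini 2.5 Pro | 64 | 47 | 37 | 0 | 0 | 29.6%
Mistral Medium 3.1 | 100 | 42 | 0 | 0 | 0 | 28.4%
Mistral Small Creative | 100 | 39 | 0 | 0 | 0 | 27.8%
Ministral 8B | 71 | 67 | 0 | 0 | 0 | 27.6%
Z.AI GLM 4.6 | 87 | 47 | 0 | 0 | 0 | 26.8%
GPT-4.1 | 45 | 44 | 42 | 0 | 0 | 26.3%
Gemini 2.5 Flash | 79 | 50 | 0 | 0 | 0 | 25.9%
GPT-5 Mini | 43 | 29 | 28 | 22 | 0 | 24.4%
Stealth: Aurora Alpha | 52 | 40 | 29 | 0 | 0 | 24.2%
Claude Opus 4 | 55 | 31 | 31 | 0 | 0 | 23.5%
DeepSeek V3 (2024-12-26) | 57 | 54 | 0 | 0 | 0 | 22.4%
GPT-5.1 | 44 | 23 | 21 | 20 | 0 | 21.8%
MoonshotAI: Kimi K2.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
ByteDance Seed 1.6 | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3 (2025-03-24) | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3.2 | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 98 | 0 | 0 | 0 | 0 | 19.6%
Grok 4 | 97 | 0 | 0 | 0 | 0 | 19.4%
Ministral 3 8B | 49 | 44 | 0 | 0 | 0 | 18.6%
Claude Opus 4.5 | 92 | 0 | 0 | 0 | 0 | 18.3%
Gemini 3 Flash (Preview) | 47 | 41 | 0 | 0 | 0 | 17.5%
Z.AI GLM 5 | 45 | 43 | 0 | 0 | 0 | 17.5%
Claude Sonnet 4.5 | 47 | 37 | 0 | 0 | 0 | 16.9%
Hermes 3 405B | 83 | 0 | 0 | 0 | 0 | 16.7%
Claude Haiku 4.5 | 40 | 40 | 0 | 0 | 0 | 16.1%
Grok 4 Fast | 47 | 32 | 0 | 0 | 0 | 15.9%
o4 Mini | 29 | 26 | 21 | 0 | 0 | 15.3%
Gemini 3 Pro (Preview) | 42 | 30 | 0 | 0 | 0 | 14.4%
Z.AI GLM 4.5 | 67 | 0 | 0 | 0 | 0 | 13.3%
Mistral Large | 63 | 0 | 0 | 0 | 0 | 12.7%
GPT-4.1 Mini | 63 | 0 | 0 | 0 | 0 | 12.5%
Ministral 3 14B | 54 | 0 | 0 | 0 | 0 | 10.9%
Mistral Large 3 | 53 | 0 | 0 | 0 | 0 | 10.5%
Grok 4.1 Fast | 45 | 0 | 0 | 0 | 0 | 9.1%
ByteDance Seed 1.6 Flash | 38 | 0 | 0 | 0 | 0 | 7.6%
GPT-5.2 | 23 | 0 | 0 | 0 | 0 | 4.7%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%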

Genre

Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Qwen 3.5 397B A17B1001001001006091.9%
Gemini 3.1 Pro (Preview)100100100635283.1%
GPT-4o Mini (temp=0)1007977756980.1%
Z.AI GLM 4.5100100100100080.0%
GPT-5.11009085793277.3%
Claude Opus 410010066535274.1%
Claude Sonnet 41001008671071.5%
Claude 3.7 Sonnet10010010052070.4%
GPT-4o Mini (temp=1)1001007866068.8%
Z.AI GLM 4.71001009546068.3%
ByteDance Seed 1.6 Flash10010010038067.5%
Qwen 3.5 Plus (2026-02-15)1001007062066.4%
GPT-5.21009977282765.9%
Z.AI GLM 4.7 Flash1001006357064.0%
Claude Sonnet 4.579777768060.3%
Z.AI GLM 4.61001001000060.0%
Hermes 3 405B1001001000060.0%
Gemini 2.5 Flash Lite1001001000060.0%
Rocinante 12B1001001000060.0%
Z.AI GLM 5100716861060.0%
Gemini 2.5 Pro100865653058.9%
Claude Opus 4.5100716359058.7%
Gemini 2.5 Flash1005050474558.2%
GPT-5 Mini1007544373457.9%
GPT-4o, Aug. 6th (temp=1)10096890057.1%
Arcee AI: Trinity Mini100100750054.9%
Cohere Command R+ (Aug. 2024)10098680053.3%
Claude Opus 4.6100565149051.1%
GPT-4.1 Mini8585760049.0%
Mistral Large100100430048.6%
MoonshotAI: Kimi K2.59578600046.6%
Gemma 3 12B10066540044.0%
Mistral Large 310076440044.0%
Claude Sonnet 4.610061590044.0%
GPT-4.110061550043.2%
Minimax M2.510059570043.1%
DeepSeek V3 (2024-12-26)8364620041.8%
Claude 3.5 Sonnet10010000040.0%
DeepSeek V3.110010000040.0%
Llama 3.1 70B10010000040.0%
Llama 3.1 Nemotron 70B10010000040.0%
Gemini 3 Pro (Preview)10051420038.5%
Hermes 3 70B1008500036.9%
GPT-4.1 Nano868300033.9%
Writer: Palmyra X55854520032.9%
Ministral 3 14B6859370032.9%
Gemini 3 Flash (Preview)1006400032.8%
DeepSeek V3 (2025-03-24)1006200032.3%
DeepSeek-V2 Chat1005700031.4%
Claude 3 Haiku817600031.3%
Mistral Large 21005600031.1%
GPT-4o, Aug. 6th (temp=0)797400030.6%
Claude Haiku 4.5706500027.1%
o4 Mini874800026.9%
GPT-5 Nano7528240025.4%
Ministral 3 8B6842100023.9%
Mistral NeMO635600023.7%
Mistral Medium 3.1634400021.5%
Grok 4.1 Fast535200020.9%
DeepSeek V3.2515000020.1%
ByteDance Seed 1.6100000020.0%
Llama 3.1 8B100000020.0%
Ministral 3B100000020.0%
Ministral 3 3B96000019.2%
WizardLM 2 8x22b93000018.5%
GPT-5494300018.3%
Gemma 3 4B86000017.2%
Mistral Small Creative81000016.1%
o4 Mini High403400014.8%
Arcee AI: Trinity Large (Preview)74000014.7%
Ministral 8B71000014.2%
Gemma 3 27B69000013.9%
GPT-4o, May 13th (temp=1)60000012.0%
Grok 4 Fast54000010.9%
GPT-4o, May 13th (temp=0)54000010.9%
Mistral Small 3.2 24B200000.4%
Grok 4000000.0%
Stealth: Aurora Alpha000000.0%
Claude 3.5 Haiku000000.0%
Qwen 2.5 72B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Z.AI GLM 4.71001001001009098.1%
Writer: Palmyra X51001001001007294.4%
Claude Opus 4.5100100100795586.9%
Ministral 3 14B10010094736285.8%
Claude Opus 4.610010074717083.1%
Z.AI GLM 510010091753480.1%
Gemma 3 27B10010091534577.9%
Qwen 3.5 397B A17B10010074664877.6%
GPT-5 Nano10010088722276.4%
Qwen 3.5 Plus (2026-02-15)100100100413875.7%
GPT-51009865555474.4%
Gemini 3 Pro (Preview)1009383563573.4%
GPT-5.11009474534272.6%
Gemini 3.1 Pro (Preview)1009769642971.9%
GPT-4.110010010044068.8%
Llama 3.1 8B1001007568068.4%
GPT-4o, Aug. 6th (temp=1)1001007268068.2%
Mistral Small Creative1001009835066.6%
Gemini 2.5 Flash1001009733066.0%
Mistral Medium 3.11001006960065.9%
GPT-5 Mini10010010027065.4%
Claude Sonnet 4686866615664.0%
DeepSeek V3 (2025-03-24)100796760061.1%
GPT-5.21001006739061.0%
Ministral 3 8B1006453434160.3%
Gemini 3 Flash (Preview)878443434259.9%
Llama 3.1 Nemotron 70B100100960059.2%
GPT-4o Mini (temp=1)100726360058.9%
Gemma 3 12B100965345058.8%
MoonshotAI: Kimi K2.5100756051057.2%
Grok 41001004341056.8%
Claude Sonnet 4.695646352054.8%
Mistral Large 2100636342053.5%
GPT-4.1 Nano10075720049.4%
Mistral Large 310081650049.1%
GPT-4o, Aug. 6th (temp=0)10075700049.0%
DeepSeek V3.1100504240046.3%
Stealth: Aurora Alpha100100310046.2%
Gemma 3 4B10065630045.6%
ByteDance Seed 1.610068570045.2%
GPT-4o, May 13th (temp=1)10067590045.1%
Mistral Large10072500044.4%
Arcee AI: Trinity Mini7968650042.6%
ByteDance Seed 1.6 Flash10064480042.3%
o4 Mini100393635041.9%
Gemini 2.5 Pro8684360041.3%
Claude 3.5 Haiku10010000040.0%
Rocinante 12B10010000040.0%
Cohere Command R+ (Aug. 2024)8361560040.0%
GPT-4o Mini (temp=0)7264590039.1%
Arcee AI: Trinity Large (Preview)7663530038.3%
Gemini 2.5 Flash Lite10047430038.0%
Z.AI GLM 4.7 Flash8755440037.2%
GPT-4.1 Mini6355540034.5%
Z.AI GLM 4.56154520033.4%
DeepSeek V3 (2024-12-26)1006700033.3%
o4 Mini High60413430032.9%
Hermes 3 70B1005800031.6%
Ministral 8B1005800031.6%
Mistral NeMO826600029.6%
Claude Sonnet 4.51004500028.9%
Z.AI GLM 4.65847380028.5%
Ministral 3 3B706300026.7%
Grok 4 Fast4843390026.0%
Claude 3.7 Sonnet4843360025.5%
GPT-4o, May 13th (temp=0)685500024.5%
DeepSeek-V2 Chat545200021.2%
Hermes 3 405B100000020.0%
Llama 3.1 70B100000020.0%
Claude Haiku 4.5544200019.2%
DeepSeek V3.23530280018.7%
Minimax M2.5494300018.3%
Claude 3.5 Sonnet88000017.5%
Grok 4.1 Fast533400017.2%
WizardLM 2 8x22b57000011.5%
Claude Opus 4000000.0%
Qwen 2.5 72B000000.0%
Mistral Small 3.2 24B000000.0%
Claude 3 Haiku000000.0%
Ministral 3B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Mistral Small Creative1009582715480.6%
Ministral 3 8B100100100100080.0%
Mistral Large10010010096079.2%
Llama 3.1 Nemotron 70B10010010091078.2%
ByteDance Seed 1.6 Flash100969690076.5%
Gemini 3.1 Pro (Preview)1008180544471.8%
Ministral 8B10010010048069.5%
Grok 4 Fast10010010046069.2%
Qwen 3.5 397B A17B1008353493964.9%
MoonshotAI: Kimi K2.51001006161064.4%
Mistral Medium 3.11001001000060.0%
GPT-5.21006447422054.7%
Claude Opus 4100100660053.2%
Rocinante 12B82786637052.6%
Grok 4100545450051.6%
Claude Sonnet 4100100570051.5%
Z.AI GLM 4.7100534947049.7%
Ministral 3 14B100100420048.5%
o4 Mini10086430045.9%
Stealth: Aurora Alpha65625050045.2%
Claude 3.7 Sonnet10057520041.8%
GPT-5 Mini10064420041.1%
GPT-575683327040.6%
Gemini 2.5 Flash Lite10054480040.5%
ByteDance Seed 1.610010000040.0%
Claude 3.5 Haiku10010000040.0%
Arcee AI: Trinity Large (Preview)10010000040.0%
Arcee AI: Trinity Mini10010000040.0%
Ministral 3B10010000040.0%
Mistral Large 210052480039.8%
Grok 4.1 Fast1009900039.8%
Llama 3.1 8B1009800039.6%
Hermes 3 70B1009600039.2%
Claude Opus 4.69653450038.8%
Hermes 3 405B1009300038.5%
GPT-4.1 Mini1009100038.2%
Qwen 2.5 72B1008200036.4%
GPT-4o, Aug. 6th (temp=1)868300033.9%
GPT-5.1712523232232.7%
DeepSeek V3 (2025-03-24)937000032.6%
GPT-4o, May 13th (temp=1)827200030.9%
o4 Mini High5151500030.2%
Claude Sonnet 4.6786800029.3%
Gemini 3 Pro (Preview)1004600029.3%
DeepSeek V3.15049430028.3%
Gemini 2.5 Flash5447390028.1%
GPT-5 Nano6140340026.8%
Mistral Large 3686000025.4%
DeepSeek V3.2893700025.2%
Mistral NeMO824100024.5%
Gemma 3 27B535200021.1%
Claude Opus 4.5100000020.0%
Z.AI GLM 5100000020.0%
Minimax M2.5100000020.0%
GPT-4o Mini (temp=1)100000020.0%
Llama 3.1 70B100000020.0%
Gemma 3 4B100000020.0%
WizardLM 2 8x22b100000020.0%
Z.AI GLM 4.7 Flash494800019.4%
Cohere Command R+ (Aug. 2024)96000019.2%
Claude 3 Haiku83000016.7%
GPT-4o, Aug. 6th (temp=0)81000016.1%
Gemini 2.5 Pro80000016.0%
Qwen 3.5 Plus (2026-02-15)75000014.9%
Writer: Palmyra X569000013.9%
Claude Haiku 4.568000013.7%
Z.AI GLM 4.568000013.7%
Gemini 3 Flash (Preview)66000013.2%
GPT-4o, May 13th (temp=0)66000013.2%
DeepSeek-V2 Chat60000012.0%
Claude Sonnet 4.559000011.8%
Gemma 3 12B58000011.6%
GPT-4.157000011.4%
DeepSeek V3 (2024-12-26)56000011.1%
Mistral Small 3.2 24B56000011.1%
Z.AI GLM 4.652000010.3%
Ministral 3 3B3100006.2%
Claude 3.5 Sonnet000000.0%
GPT-4o Mini (temp=0)000000.0%
GPT-4.1 Nano000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Qwen 3.5 397B A17B1001001001004889.5%
Claude Sonnet 4.6100100100726988.4%
Mistral NeMO10010010081076.1%
GPT-5 Mini1008875644073.5%
GPT-4o Mini (temp=1)100918682071.8%
Gemini 3.1 Pro (Preview)1001009452069.2%
Grok 4 Fast1001007263067.0%
Claude 3.7 Sonnet1001001000060.0%
Hermes 3 70B1001001000060.0%
GPT-4o, Aug. 6th (temp=1)100100930058.5%
Claude 3 Haiku100100930058.5%
Mistral Small Creative100686757058.2%
Rocinante 12B100100710054.3%
ByteDance Seed 1.6100100680053.7%
GPT-4.1 Nano10086760052.4%
GPT-5 Nano100745730052.2%
Claude Opus 4100100600052.0%
Gemini 2.5 Flash100664745051.6%
GPT-5965750292551.5%
ByteDance Seed 1.6 Flash686449373249.9%
Qwen 3.5 Plus (2026-02-15)75635951049.4%
DeepSeek V3.2100100440048.8%
GPT-4o, May 13th (temp=0)10079600047.8%
Gemini 3 Flash (Preview)72575650047.2%
Z.AI GLM 4.58685640047.0%
o4 Mini High88525040046.0%
Claude Sonnet 4.510069570045.4%
Gemini 2.5 Flash Lite9279530044.7%
Mistral Large 310067520043.6%
Grok 4.1 Fast78544240042.7%
Writer: Palmyra X557575345042.5%
MoonshotAI: Kimi K2.510010000040.0%
Z.AI GLM 4.7 Flash10010000040.0%
Claude 3.5 Sonnet10010000040.0%
DeepSeek-V2 Chat10010000040.0%
Llama 3.1 Nemotron 70B10010000040.0%
GPT-5.2100512523039.7%
Z.AI GLM 51009400038.9%
o4 Mini8954450037.6%
Gemma 3 12B6861580037.3%
DeepSeek V3 (2024-12-26)1008300036.7%
Gemini 2.5 Pro49464540036.0%
Qwen 2.5 72B1007800035.6%
Cohere Command R+ (Aug. 2024)1007700035.4%
Z.AI GLM 4.66160550035.2%
Ministral 3 8B6854460033.7%
DeepSeek V3.11006600033.2%
Hermes 3 405B1006600033.2%
GPT-4o Mini (temp=0)887700032.9%
Arcee AI: Trinity Large (Preview)1006200032.3%
GPT-4.11005300030.5%
Ministral 3 14B1005300030.5%
GPT-4o, May 13th (temp=1)777500030.3%
Gemini 3 Pro (Preview)5951410030.1%
Grok 41005000030.0%
Gemma 3 27B895400028.7%
GPT-5.163292625028.7%
Z.AI GLM 4.75346420028.2%
Claude Opus 4.64946420027.4%
Claude Haiku 4.5785700027.1%
Mistral Small 3.2 24B645680025.5%
Minimax M2.5725300025.0%
Mistral Large753900022.8%
GPT-4o, Aug. 6th (temp=0)100000020.0%
Llama 3.1 70B100000020.0%
Arcee AI: Trinity Mini100000020.0%
Llama 3.1 8B100000020.0%
WizardLM 2 8x22b100000020.0%
Claude Opus 4.5524400019.2%
Mistral Medium 3.1464400018.0%
Ministral 3B83000016.7%
Gemma 3 4B72000014.5%
Mistral Large 271000014.3%
Ministral 8B68000013.5%
Stealth: Aurora Alpha363000013.4%
Ministral 3 3B55000011.0%
Claude Sonnet 4000000.0%
DeepSeek V3 (2025-03-24)000000.0%
Claude 3.5 Haiku000000.0%
GPT-4.1 Mini000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Llama 3.1 8B10010010093078.5%
Gemma 3 12B10010010065073.0%
MoonshotAI: Kimi K2.510010010054070.9%
ByteDance Seed 1.6 Flash10010010054070.9%
Qwen 3.5 397B A17B10010010032066.3%
Rocinante 12B1001007952066.3%
GPT-5.11008452472461.3%
Grok 4 Fast1001005449060.7%
Grok 494916355060.5%
Hermes 3 70B1001001000060.0%
Mistral Small Creative100776548058.0%
Hermes 3 405B100100770055.4%
DeepSeek-V2 Chat100100690053.9%
Gemma 3 27B100595651053.2%
Gemini 2.5 Pro100874138053.2%
Gemini 3.1 Pro (Preview)100100560051.2%
DeepSeek V3.2100100550051.0%
ByteDance Seed 1.610077760050.5%
Claude Opus 4.563636362050.3%
Ministral 3 8B100962917048.4%
GPT-5 Nano864241333046.4%
Z.AI GLM 4.510068600045.6%
Mistral Large10075510045.1%
Qwen 3.5 Plus (2026-02-15)7871690043.8%
Claude 3.7 Sonnet10056540042.0%
Ministral 3B1009800039.6%
GPT-5.2656228211938.9%
o4 Mini61504934038.6%
GPT-4o, Aug. 6th (temp=1)1009300038.5%
Ministral 8B78494321037.9%
Claude 3 Haiku1008900037.9%
Stealth: Aurora Alpha1007500035.0%
Writer: Palmyra X51006700033.3%
DeepSeek V3 (2024-12-26)1006100032.2%
Mistral Large 31006100032.2%
Claude Sonnet 41006000031.9%
Ministral 3 3B916800031.9%
GPT-5 Mini7545350030.8%
Qwen 2.5 72B797100030.2%
Z.AI GLM 4.65353410029.2%
Z.AI GLM 4.71004100028.2%
GPT-5825400027.1%
GPT-4.1675800025.0%
Arcee AI: Trinity Large (Preview)685000023.5%
Gemini 2.5 Flash Lite644800022.4%
Claude Opus 4.6575000021.3%
DeepSeek V3.1545100020.9%
Claude Opus 4544800020.4%
Z.AI GLM 5544700020.2%
Claude Sonnet 4.5100000020.0%
GPT-4o, May 13th (temp=1)100000020.0%
Claude 3.5 Haiku100000020.0%
GPT-4.1 Mini100000020.0%
Llama 3.1 70B100000020.0%
Llama 3.1 Nemotron 70B100000020.0%
Ministral 3 14B100000020.0%
Cohere Command R+ (Aug. 2024)100000020.0%
Mistral NeMO100000020.0%
Gemma 3 4B100000020.0%
WizardLM 2 8x22b100000020.0%
o4 Mini High524700019.9%
Arcee AI: Trinity Mini98000019.6%
DeepSeek V3 (2025-03-24)94000018.9%
GPT-4.1 Nano94000018.9%
Claude 3.5 Sonnet89000017.9%
Z.AI GLM 4.7 Flash79000015.9%
Gemini 3 Pro (Preview)413300014.8%
GPT-4o Mini (temp=1)70000014.1%
GPT-4o Mini (temp=0)69000013.9%
GPT-4o, May 13th (temp=0)68000013.5%
Claude Haiku 4.567000013.3%
Gemini 3 Flash (Preview)63000012.5%
Grok 4.1 Fast4500008.9%
Mistral Medium 3.14200008.3%
Minimax M2.5000000.0%
Claude Sonnet 4.6000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
Mistral Large 2000000.0%
Gemini 2.5 Flash000000.0%
Mistral Small 3.2 24B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Nano10010092766586.6%
o4 Mini High100100100684181.7%
Hermes 3 405B100100100100080.0%
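Hermes 3 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 97 | 0 | 79.4%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 96 | 0 | 79.2%
Claude Sonnet 4 | 100 | 100 | 100 | 83 | 0 | 76.7%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 77 | 0 | 75.4%
o4 Mini | 100 | 99 | 83 | 42 | 39 | 72.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 57 | 0 | 71.5%
Gemini 3 Pro (Preview) | 100 | 85 | 78 | 44 | 40 | 69.5%
ByteDance Seed 1.6 | 93 | 86 | 83 | 70 | 0 | 66.5%
Claude Opus 4.5 | 100 | 100 | 86 | 42 | 0 | 65.6%
GPT-5.1 | 100 | 96 | 61 | 41 | 26 | 64.7%
Gemma 3 4B | 100 | 88 | 68 | 68 | 0 | 64.6%
Z.AI GLM 4.7 | 93 | 76 | 74 | 71 | 0 | 62.9%
GPT-4o, Aug. 6th (temp=0) | 88 | 81 | 72 | 62 | 0 | 60.5%
Claude Opus 4 | 100 | 100 | 100 | 0 | 0 | 60.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 0 | 0 | 60.0%
Llama 3.1 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-4.1 Nano | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral NeMO | 100 | 100 | 97 | 0 | 0 | 59.4%
GPT-4.1 | 100 | 91 | 54 | 52 | 0 | 59.2%
Ministral 3 8B | 100 | 67 | 59 | 58 | 0 | 56.7%
Ministral 3B | 100 | 100 | 83 | 0 | 0 | 56.6%
GPT-5 | 100 | 78 | 51 | 23 | 18 | 54.1%
Gemini 2.5 Pro | 100 | 44 | 42 | 39 | 39 | 52.9%
Gemini 3 Flash (Preview) | 100 | 100 | 61 | 0 | 0 | 52.2%
Arcee AI: Trinity Mini | 100 | 100 | 61 | 0 | 0 | 52.2%
Claude 3 Haiku | 100 | 100 | 58 | 0 | 0 | 51.6%
Claude Opus 4.6 | 100 | 69 | 45 | 42 | 0 | 51.2%
Z.AI GLM 4.6 | 100 | 93 | 56 | 0 | 0 | 49.6%
Z.AI GLM 4.5 | 100 | 85 | 63 | 0 | 0 | 49.4%
Gemma 3 27B | 100 | 92 | 46 | 0 | 0 | 47.6%
Claude Sonnet 4.6 | 100 | 77 | 47 | 0 | 0 | 44.8%
Minimax M2.5 | 100 | 64 | 58 | 0 | 0 | 44.4%
GPT-4o, May 13th (temp=0) | 100 | 68 | 54 | 0 | 0 | 44.4%
DeepSeek V3 (2024-12-26) | 100 | 71 | 47 | 0 | 0 | 43.7%
Qwen 2.5 72B | 93 | 63 | 62 | 0 | 0 | 43.5%
GPT-5 Mini | 99 | 63 | 55 | 0 | 0 | 43.4%
Claude Haiku 4.5 | 100 | 59 | 48 | 0 | 0 | 41.4%
DeepSeek V3.1 | 100 | 57 | 47 | 0 | 0 | 40.9%
Writer: Palmyra X5 | 89 | 61 | 52 | 0 | 0 | 40.4%
Ministral 8B | 100 | 52 | 50 | 0 | 0 | 40.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 0 | 0 | 0 | 40.0%
ByteDance Seed 1.6 Flash | 100 | 38 | 32 | 26 | 0 | 39.2%
Grok 4 Fast | 100 | 89 | 0 | 0 | 0 | 37.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 88 | 0 | 0 | 0 | 37.5%
GPT-4.1 Mini | 100 | 83 | 0 | 0 | 0 | 36.7%
DeepSeek V3 (2025-03-24) | 100 | 76 | 0 | 0 | 0 | 35.2%
Claude 3.5 Sonnet | 93 | 82 | 0 | 0 | 0 | 34.9%
GPT-5.2 | 80 | 67 | 23 | 0 | 0 | 34.1%
Llama 3.1 8B | 91 | 79 | 0 | 0 | 0 | 34.1%
Mistral Small Creative | 87 | 40 | 38 | 0 | 0 | 33.1%
Mistral Large | 65 | 61 | 35 | 0 | 0 | 32.2%
GPT-4o Mini (temp=1) | 83 | 69 | 0 | 0 | 0 | 30.6%
Gemini 2.5 Flash Lite | 100 | 51 | 0 | 0 | 0 | 30.1%
Gemma 3 12B | 100 | 50 | 0 | 0 | 0 | 30.0%
Z.AI GLM 5 | 57 | 46 | 42 | 0 | 0 | 29.2%
GPT-4o Mini (temp=0) | 63 | 54 | 0 | 0 | 0 | 23.4%
Claude Sonnet 4.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.7 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Large 2 | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 14B | 59 | 39 | 0 | 0 | 0 | 19.6%
DeepSeek V3.2 | 58 | 38 | 0 | 0 | 0 | 19.2%
Grok 4.1 Fast | 78 | 0 | 0 | 0 | 0 | 15.6%
Rocinante 12B | 75 | 0 | 0 | 0 | 0 | 14.9%
Stealth: Aurora Alpha | 46 | 28 | 0 | 0 | 0 | 14.7%
Ministral 3 3B | 68 | 0 | 0 | 0 | 0 | 13.5%
GPT-4o, May 13th (temp=1) | 57 | 0 | 0 | 0 | 0 | 11.5%
Mistral Medium 3.1 | 31 | 0 | 0 | 0 | 0 | 6.3%
Mistral Small 3.2 24B | 2 | 0 | 0 | 0 | 0 | 0.5%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0.0%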

Novelcrafter Default Prompt

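Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Gemma 3 4B | 79 | 75 | 72 | 67 | 63 | 71.1%
DeepSeek V3.1 | 100 | 76 | 76 | 70 | 0 | 64.4%
ByteDance Seed 1.6 Flash | 100 | 100 | 68 | 46 | 0 | 62.8%
Claude 3.7 Sonnet | 100 | 100 | 100 | 0 | 0 | 60.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 0 | 0 | 60.0%
Minimax M2.5 | 100 | 98 | 79 | 0 | 0 | 55.5%
Hermes 3 70B | 100 | 100 | 71 | 0 | 0 | 54.3%
Claude Haiku 4.5 | 100 | 100 | 67 | 0 | 0 | 53.3%
Gemini 2.5 Flash | 82 | 65 | 55 | 54 | 0 | 51.2%
WizardLM 2 8x22b | 100 | 67 | 45 | 41 | 0 | 50.6%
Z.AI GLM 5 | 100 | 64 | 54 | 0 | 0 | 43.7%
Mistral Small Creative | 98 | 70 | 41 | 0 | 0 | 41.8%
Gemini 2.5 Flash Lite | 78 | 71 | 56 | 0 | 0 | 41.1%
Claude 3.5 Sonnet | 100 | 100 | 0 | 0 | 0 | 40.0%
Hermes 3 405B | 100 | 100 | 0 | 0 | 0 | 40.0%
Arcee AI: Trinity Mini | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4.1 | 100 | 98 | 0 | 0 | 0 | 39.6%
Z.AI GLM 4.5 | 100 | 96 | 0 | 0 | 0 | 39.2%
Arcee AI: Trinity Large (Preview) | 100 | 96 | 0 | 0 | 0 | 39.2%
Llama 3.1 8B | 100 | 96 | 0 | 0 | 0 | 39.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 93 | 0 | 0 | 0 | 38.5%
Ministral 3 14B | 85 | 60 | 44 | 0 | 0 | 37.8%
Z.AI GLM 4.6 | 100 | 86 | 0 | 0 | 0 | 37.2%
Claude Opus 4 | 69 | 68 | 45 | 0 | 0 | 36.3%
Gemma 3 12B | 61 | 60 | 57 | 0 | 0 | 35.5%
DeepSeek V3 (2024-12-26) | 100 | 72 | 0 | 0 | 0 | 34.5%
GPT-4o, May 13th (temp=1) | 100 | 59 | 0 | 0 | 0 | 31.8%
DeepSeek V3.2 | 60 | 52 | 45 | 0 | 0 | 31.4%
GPT-5.1 | 60 | 47 | 26 | 22 | 0 | 30.7%
GPT-4o Mini (temp=1) | 76 | 72 | 0 | 0 | 0 | 29.6%
Writer: Palmyra X5 | 54 | 50 | 42 | 0 | 0 | 29.1%
Gemini 2.5 Pro | 68 | 60 | 0 | 0 | 0 | 25.7%
Gemini 3 Pro (Preview) | 44 | 41 | 38 | 0 | 0 | 24.8%
MoonshotAI: Kimi K2.5 | 69 | 51 | 0 | 0 | 0 | 24.0%
Z.AI GLM 4.7 | 57 | 56 | 0 | 0 | 0 | 22.7%
Gemini 3 Flash (Preview) | 63 | 49 | 0 | 0 | 0 | 22.3%
Qwen 3.5 397B A17B | 62 | 31 | 17 | 0 | 0 | 22.1%
GPT-5.2 | 50 | 30 | 24 | 0 | 0 | 20.8%
GPT-5 Mini | 63 | 37 | 0 | 0 | 0 | 20.0%
Claude Opus 4.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Sonnet 4 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4.1 Mini | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 20.0%
Z.AI GLM 4.7 Flash | 97 | 0 | 0 | 0 | 0 | 19.4%
DeepSeek-V2 Chat | 62 | 33 | 0 | 0 | 0 | 18.9%
Claude 3 Haiku | 94 | 0 | 0 | 0 | 0 | 18.9%
Claude Sonnet 4.6 | 89 | 0 | 0 | 0 | 0 | 17.9%
DeepSeek V3 (2025-03-24) | 89 | 0 | 0 | 0 | 0 | 17.9%
GPT-4.1 Nano | 89 | 0 | 0 | 0 | 0 | 17.9%
Claude Opus 4.6 | 45 | 44 | 0 | 0 | 0 | 17.8%
Mistral Medium 3.1 | 43 | 43 | 0 | 0 | 0 | 17.2%
GPT-4o, Aug. 6th (temp=0) | 76 | 0 | 0 | 0 | 0 | 15.2%
o4 Mini | 45 | 28 | 0 | 0 | 0 | 14.5%
Ministral 8B | 60 | 0 | 0 | 0 | 0 | 11.9%
GPT-4o Mini (temp=0) | 57 | 0 | 0 | 0 | 0 | 11.4%
GPT-5 | 28 | 26 | 0 | 0 | 0 | 10.8%
ByteDance Seed 1.6 | 48 | 0 | 0 | 0 | 0 | 9.6%
Mistral Large 2 | 44 | 0 | 0 | 0 | 0 | 8.8%
Grok 4.1 Fast | 40 | 0 | 0 | 0 | 0 | 8.1%
Grok 4 | 38 | 0 | 0 | 0 | 0 | 7.7%
Rocinante 12B | 33 | 0 | 0 | 0 | 0 | 6.7%
o4 Mini High | 29 | 0 | 0 | 0 | 0 | 5.7%
GPT-5 Nano | 25 | 0 | 0 | 0 | 0 | 5.0%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 3.5 Plus (2026-02-15) | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%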
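Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 76 | 74 | 70 | 83.9%
Mistral NeMO | 100 | 94 | 82 | 76 | 57 | 81.9%
Hermes 3 70B | 100 | 100 | 96 | 85 | 0 | 76.2%
Llama 3.1 8B | 100 | 100 | 100 | 60 | 0 | 71.9%
WizardLM 2 8x22b | 100 | 84 | 76 | 37 | 35 | 66.3%
GPT-4.1 Nano | 100 | 100 | 65 | 61 | 0 | 65.2%
Gemini 3 Pro (Preview) | 100 | 100 | 93 | 0 | 0 | 58.7%
Claude Haiku 4.5 | 100 | 100 | 86 | 0 | 0 | 57.2%
DeepSeek V3.2 | 100 | 95 | 89 | 0 | 0 | 56.9%
Gemini 2.5 Flash Lite | 100 | 67 | 63 | 53 | 0 | 56.6%
GPT-4o, Aug. 6th (temp=0) | 82 | 77 | 67 | 57 | 0 | 56.5%
Rocinante 12B | 100 | 100 | 81 | 0 | 0 | 56.1%
Claude Sonnet 4 | 100 | 100 | 75 | 0 | 0 | 54.9%
Arcee AI: Trinity Mini | 100 | 100 | 74 | 0 | 0 | 54.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 66 | 38 | 36 | 32 | 54.3%
Ministral 3 14B | 100 | 63 | 58 | 48 | 0 | 53.7%
ByteDance Seed 1.6 Flash | 100 | 68 | 53 | 43 | 0 | 52.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 62 | 0 | 0 | 52.3%
Z.AI GLM 5 | 97 | 85 | 76 | 0 | 0 | 51.6%
Minimax M2.5 | 100 | 100 | 55 | 0 | 0 | 51.0%
Gemma 3 12B | 100 | 52 | 52 | 50 | 0 | 50.6%
Claude Opus 4 | 100 | 94 | 54 | 0 | 0 | 49.6%
Z.AI GLM 4.7 Flash | 100 | 72 | 40 | 32 | 0 | 48.9%
GPT-5 | 100 | 65 | 27 | 25 | 19 | 47.3%
Cohere Command R+ (Aug. 2024) | 100 | 69 | 63 | 0 | 0 | 46.4%
Claude Opus 4.6 | 100 | 94 | 33 | 0 | 0 | 45.3%
GPT-4o, May 13th (temp=1) | 100 | 69 | 54 | 0 | 0 | 44.8%
Claude Sonnet 4.6 | 74 | 72 | 62 | 0 | 0 | 41.5%
Gemma 3 4B | 100 | 58 | 43 | 0 | 0 | 40.2%
GPT-4o Mini (temp=1) | 100 | 100 | 0 | 0 | 0 | 40.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemma 3 27B | 100 | 100 | 0 | 0 | 0 | 40.0%
Writer: Palmyra X5 | 78 | 72 | 46 | 0 | 0 | 39.1%
Gemini 2.5 Flash | 100 | 88 | 0 | 0 | 0 | 37.5%
GPT-4.1 Mini | 100 | 86 | 0 | 0 | 0 | 37.2%
DeepSeek V3 (2024-12-26) | 69 | 61 | 55 | 0 | 0 | 37.1%
Claude Opus 4.5 | 100 | 83 | 0 | 0 | 0 | 36.5%
GPT-5 Nano | 97 | 78 | 0 | 0 | 0 | 34.9%
Ministral 8B | 100 | 71 | 0 | 0 | 0 | 34.3%
Gemini 3 Flash (Preview) | 41 | 40 | 34 | 34 | 20 | 33.7%
Mistral Medium 3.1 | 60 | 57 | 50 | 0 | 0 | 33.3%
Mistral Large 3 | 57 | 56 | 49 | 0 | 0 | 32.3%
GPT-4.1 | 100 | 60 | 0 | 0 | 0 | 31.9%
o4 Mini High | 66 | 35 | 27 | 25 | 0 | 30.6%
DeepSeek V3.1 | 53 | 50 | 43 | 0 | 0 | 29.1%
Qwen 3.5 397B A17B | 100 | 26 | 19 | 0 | 0 | 29.0%
Gemini 3.1 Pro (Preview) | 38 | 38 | 37 | 30 | 0 | 28.7%
Hermes 3 405B | 79 | 64 | 0 | 0 | 0 | 28.7%
Claude Sonnet 4.5 | 46 | 46 | 46 | 0 | 0 | 27.8%
Z.AI GLM 4.6 | 52 | 49 | 35 | 0 | 0 | 27.1%
Grok 4 | 79 | 54 | 0 | 0 | 0 | 26.7%
Grok 4 Fast | 88 | 40 | 0 | 0 | 0 | 25.8%
Qwen 2.5 72B | 64 | 63 | 0 | 0 | 0 | 25.3%
GPT-5.1 | 75 | 44 | 0 | 0 | 0 | 23.8%
GPT-5.2 | 57 | 55 | 0 | 0 | 0 | 22.5%
Z.AI GLM 4.7 | 40 | 37 | 33 | 0 | 0 | 21.9%
DeepSeek-V2 Chat | 60 | 41 | 0 | 0 | 0 | 20.1%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Large 2 | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 8B | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.7 Sonnet | 98 | 0 | 0 | 0 | 0 | 19.6%
Claude 3 Haiku | 96 | 0 | 0 | 0 | 0 | 19.2%
MoonshotAI: Kimi K2.5 | 48 | 42 | 0 | 0 | 0 | 18.1%
GPT-5 Mini | 88 | 0 | 0 | 0 | 0 | 17.5%
Mistral Large | 71 | 0 | 0 | 0 | 0 | 14.3%
Gemini 2.5 Pro | 54 | 0 | 0 | 0 | 0 | 10.9%
Z.AI GLM 4.5 | 51 | 0 | 0 | 0 | 0 | 10.1%
Mistral Small Creative | 48 | 0 | 0 | 0 | 0 | 9.5%
GPT-4o, May 13th (temp=0) | 47 | 0 | 0 | 0 | 0 | 9.4%
Grok 4.1 Fast | 32 | 0 | 0 | 0 | 0 | 6.4%
o4 Mini | 29 | 0 | 0 | 0 | 0 | 5.8%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%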
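Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Grok 4 Fast | 100 | 100 | 100 | 92 | 49 | 88.1%
Ministral 3 14B | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5 Nano | 100 | 100 | 100 | 67 | 0 | 73.4%
Claude Sonnet 4 | 100 | 100 | 76 | 67 | 0 | 68.5%
Claude Opus 4 | 100 | 100 | 77 | 63 | 0 | 68.0%
Gemini 2.5 Flash Lite | 100 | 88 | 86 | 58 | 0 | 66.4%
Qwen 3.5 397B A17B | 100 | 85 | 76 | 46 | 23 | 66.0%
Z.AI GLM 4.7 Flash | 100 | 93 | 62 | 58 | 0 | 62.5%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
Llama 3.1 8B | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Small Creative | 100 | 100 | 87 | 0 | 0 | 57.4%
Hermes 3 405B | 100 | 100 | 81 | 0 | 0 | 56.1%
MoonshotAI: Kimi K2.5 | 100 | 100 | 74 | 0 | 0 | 54.7%
Mistral Large 3 | 100 | 83 | 69 | 0 | 0 | 50.6%
Z.AI GLM 5 | 100 | 68 | 63 | 0 | 0 | 46.2%
ByteDance Seed 1.6 Flash | 62 | 60 | 56 | 50 | 0 | 45.4%
Z.AI GLM 4.6 | 100 | 67 | 52 | 0 | 0 | 43.6%
Claude 3.7 Sonnet | 100 | 100 | 0 | 0 | 0 | 40.0%
Ministral 3 8B | 100 | 100 | 0 | 0 | 0 | 40.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Flash | 81 | 60 | 57 | 0 | 0 | 39.7%
Ministral 3B | 100 | 93 | 0 | 0 | 0 | 38.5%
GPT-4.1 Nano | 100 | 88 | 0 | 0 | 0 | 37.5%
GPT-4.1 Mini | 93 | 91 | 0 | 0 | 0 | 36.7%
Ministral 3 3B | 60 | 44 | 42 | 31 | 0 | 35.5%
GPT-4o Mini (temp=1) | 86 | 86 | 0 | 0 | 0 | 34.5%
o4 Mini | 100 | 40 | 33 | 0 | 0 | 34.5%
Mistral Large 2 | 100 | 71 | 0 | 0 | 0 | 34.3%
Z.AI GLM 4.7 | 100 | 70 | 0 | 0 | 0 | 34.1%
Grok 4.1 Fast | 71 | 60 | 36 | 0 | 0 | 33.5%
Mistral Medium 3.1 | 67 | 54 | 45 | 0 | 0 | 33.1%
GPT-4o, Aug. 6th (temp=0) | 85 | 78 | 0 | 0 | 0 | 32.6%
WizardLM 2 8x22b | 92 | 68 | 0 | 0 | 0 | 32.0%
DeepSeek-V2 Chat | 93 | 67 | 0 | 0 | 0 | 31.9%
Mistral Large | 100 | 57 | 0 | 0 | 0 | 31.4%
Writer: Palmyra X5 | 100 | 56 | 0 | 0 | 0 | 31.1%
Qwen 3.5 Plus (2026-02-15) | 77 | 74 | 0 | 0 | 0 | 30.1%
o4 Mini High | 100 | 37 | 0 | 0 | 0 | 27.5%
GPT-4o, May 13th (temp=0) | 68 | 63 | 0 | 0 | 0 | 26.2%
Gemma 3 27B | 68 | 56 | 0 | 0 | 0 | 24.7%
Claude Opus 4.6 | 58 | 50 | 0 | 0 | 0 | 21.6%
GPT-5.1 | 39 | 37 | 29 | 0 | 0 | 21.1%
ByteDance Seed 1.6 | 54 | 49 | 0 | 0 | 0 | 20.7%
Claude Opus 4.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Sonnet 4.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Haiku 4.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, May 13th (temp=1) | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3.1 | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 20.0%
Gemini 3 Flash (Preview) | 54 | 41 | 0 | 0 | 0 | 19.1%
Claude Sonnet 4.6 | 91 | 0 | 0 | 0 | 0 | 18.2%
Llama 3.1 70B | 88 | 0 | 0 | 0 | 0 | 17.5%
DeepSeek V3 (2025-03-24) | 83 | 0 | 0 | 0 | 0 | 16.7%
Claude 3 Haiku | 81 | 0 | 0 | 0 | 0 | 16.1%
Hermes 3 70B | 76 | 0 | 0 | 0 | 0 | 15.2%
GPT-5 Mini | 42 | 33 | 0 | 0 | 0 | 15.0%
Rocinante 12B | 75 | 0 | 0 | 0 | 0 | 14.9%
Z.AI GLM 4.5 | 70 | 0 | 0 | 0 | 0 | 14.1%
Gemma 3 12B | 70 | 0 | 0 | 0 | 0 | 14.1%
Gemma 3 4B | 67 | 0 | 0 | 0 | 0 | 13.3%
DeepSeek V3.2 | 62 | 0 | 0 | 0 | 0 | 12.3%
Stealth: Aurora Alpha | 59 | 0 | 0 | 0 | 0 | 11.8%
Minimax M2.5 | 58 | 0 | 0 | 0 | 0 | 11.6%
Grok 4 | 58 | 0 | 0 | 0 | 0 | 11.6%
Gemini 3 Pro (Preview) | 51 | 0 | 0 | 0 | 0 | 10.2%
GPT-5 | 29 | 15 | 0 | 0 | 0 | 8.7%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Pro | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.2 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2024-12-26) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%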
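Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
ByteDance Seed 1.6 | 100 | 100 | 100 | 86 | 0 | 77.2%
Writer: Palmyra X5 | 100 | 100 | 67 | 61 | 0 | 65.5%
Mistral NeMO | 100 | 85 | 66 | 48 | 0 | 59.6%
Claude 3 Haiku | 100 | 100 | 91 | 0 | 0 | 58.2%
Hermes 3 70B | 100 | 100 | 86 | 0 | 0 | 57.2%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 79 | 0 | 0 | 55.9%
Rocinante 12B | 100 | 100 | 77 | 0 | 0 | 55.4%
GPT-4o, Aug. 6th (temp=0) | 100 | 98 | 72 | 0 | 0 | 54.1%
Claude Sonnet 4.6 | 100 | 93 | 74 | 0 | 0 | 53.2%
Claude 3.7 Sonnet | 100 | 76 | 76 | 0 | 0 | 50.3%
Z.AI GLM 5 | 86 | 54 | 52 | 43 | 0 | 46.9%
Gemma 3 27B | 86 | 79 | 57 | 0 | 0 | 44.5%
GPT-5.1 | 80 | 64 | 59 | 0 | 0 | 40.5%
Claude Sonnet 4 | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemini 3 Flash (Preview) | 100 | 96 | 0 | 0 | 0 | 39.2%
Claude Opus 4.5 | 100 | 51 | 45 | 0 | 0 | 39.0%
o4 Mini | 79 | 76 | 39 | 0 | 0 | 38.9%
GPT-4o, May 13th (temp=1) | 100 | 94 | 0 | 0 | 0 | 38.9%
Qwen 2.5 72B | 100 | 94 | 0 | 0 | 0 | 38.9%
DeepSeek V3 (2025-03-24) | 100 | 91 | 0 | 0 | 0 | 38.2%
GPT-5 | 100 | 51 | 39 | 0 | 0 | 37.9%
Gemma 3 4B | 81 | 58 | 50 | 0 | 0 | 37.7%
GPT-4o, Aug. 6th (temp=1) | 96 | 89 | 0 | 0 | 0 | 37.1%
Gemini 3 Pro (Preview) | 100 | 47 | 37 | 0 | 0 | 36.9%
DeepSeek-V2 Chat | 83 | 79 | 0 | 0 | 0 | 32.5%
Grok 4 Fast | 100 | 61 | 0 | 0 | 0 | 32.2%
GPT-4o, May 13th (temp=0) | 94 | 66 | 0 | 0 | 0 | 32.0%
Z.AI GLM 4.6 | 100 | 58 | 0 | 0 | 0 | 31.6%
GPT-5 Nano | 100 | 36 | 21 | 0 | 0 | 31.6%
Mistral Large | 89 | 64 | 0 | 0 | 0 | 30.7%
Claude Opus 4.6 | 53 | 51 | 42 | 0 | 0 | 29.1%
Claude Opus 4 | 85 | 60 | 0 | 0 | 0 | 29.0%
ByteDance Seed 1.6 Flash | 100 | 43 | 0 | 0 | 0 | 28.7%
Qwen 3.5 397B A17B | 100 | 21 | 0 | 0 | 0 | 24.2%
Claude Haiku 4.5 | 57 | 53 | 0 | 0 | 0 | 21.9%
DeepSeek V3.2 | 57 | 48 | 0 | 0 | 0 | 21.1%
Minimax M2.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
Grok 4 | 100 | 0 | 0 | 0 | 0 | 20.0%
Z.AI GLM 4.7 Flash | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0%
Z.AI GLM 4.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Medium 3.1 | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3.1 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o Mini (temp=0) | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Small 3.2 24B | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 100 | 0 | 0 | 0 | 0 | 20.0%
WizardLM 2 8x22b | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 405B | 91 | 0 | 0 | 0 | 0 | 18.2%
Ministral 3 14B | 89 | 0 | 0 | 0 | 0 | 17.9%
Ministral 3 3B | 89 | 0 | 0 | 0 | 0 | 17.9%
Llama 3.1 8B | 88 | 0 | 0 | 0 | 0 | 17.5%
Llama 3.1 70B | 83 | 0 | 0 | 0 | 0 | 16.7%
GPT-5.2 | 28 | 25 | 20 | 0 | 0 | 14.8%
GPT-4o Mini (temp=1) | 74 | 0 | 0 | 0 | 0 | 14.7%
GPT-4.1 Nano | 69 | 0 | 0 | 0 | 0 | 13.9%
MoonshotAI: Kimi K2.5 | 68 | 0 | 0 | 0 | 0 | 13.5%
Mistral Small Creative | 68 | 0 | 0 | 0 | 0 | 13.5%
Gemma 3 12B | 60 | 0 | 0 | 0 | 0 | 12.0%
Gemini 2.5 Flash Lite | 60 | 0 | 0 | 0 | 0 | 12.0%
Gemini 2.5 Flash | 57 | 0 | 0 | 0 | 0 | 11.4%
Mistral Large 2 | 54 | 0 | 0 | 0 | 0 | 10.8%
Qwen 3.5 Plus (2026-02-15) | 53 | 0 | 0 | 0 | 0 | 10.5%
Z.AI GLM 4.7 | 52 | 0 | 0 | 0 | 0 | 10.4%
Mistral Large 3 | 52 | 0 | 0 | 0 | 0 | 10.4%
Stealth: Aurora Alpha | 43 | 0 | 0 | 0 | 0 | 8.7%
GPT-5 Mini | 29 | 0 | 0 | 0 | 0 | 5.7%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Pro | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2024-12-26) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%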
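Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Claude Sonnet 4 | 100 | 100 | 85 | 72 | 0 | 71.4%
Mistral Small Creative | 100 | 100 | 66 | 59 | 0 | 64.9%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 20 | 0 | 63.9%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
ByteDance Seed 1.6 Flash | 100 | 62 | 62 | 40 | 0 | 52.7%
Gemini 3 Pro (Preview) | 100 | 80 | 40 | 38 | 0 | 51.7%
Rocinante 12B | 100 | 100 | 57 | 0 | 0 | 51.4%
Claude Opus 4 | 100 | 61 | 60 | 0 | 0 | 44.1%
Gemma 3 12B | 72 | 70 | 67 | 0 | 0 | 41.9%
Mistral Large 3 | 100 | 100 | 0 | 0 | 0 | 40.0%
Hermes 3 70B | 100 | 100 | 0 | 0 | 0 | 40.0%
Llama 3.1 8B | 100 | 100 | 0 | 0 | 0 | 40.0%
Cohere Command R+ (Aug. 2024) | 100 | 98 | 0 | 0 | 0 | 39.6%
Qwen 2.5 72B | 100 | 88 | 0 | 0 | 0 | 37.5%
WizardLM 2 8x22b | 100 | 78 | 0 | 0 | 0 | 35.6%
GPT-4o, Aug. 6th (temp=1) | 96 | 78 | 0 | 0 | 0 | 34.9%
Minimax M2.5 | 100 | 72 | 0 | 0 | 0 | 34.5%
Claude Sonnet 4.5 | 96 | 76 | 0 | 0 | 0 | 34.4%
o4 Mini | 89 | 46 | 35 | 0 | 0 | 34.2%
GPT-5 | 93 | 78 | 0 | 0 | 0 | 34.1%
Claude 3.7 Sonnet | 100 | 69 | 0 | 0 | 0 | 33.9%
Z.AI GLM 4.7 | 63 | 60 | 46 | 0 | 0 | 33.7%
GPT-4o Mini (temp=1) | 86 | 81 | 0 | 0 | 0 | 33.4%
Ministral 3B | 100 | 66 | 0 | 0 | 0 | 33.2%
Claude 3 Haiku | 83 | 79 | 0 | 0 | 0 | 32.5%
Gemma 3 27B | 83 | 77 | 0 | 0 | 0 | 32.1%
ByteDance Seed 1.6 | 100 | 53 | 0 | 0 | 0 | 30.6%
DeepSeek V3.2 | 93 | 58 | 0 | 0 | 0 | 30.3%
GPT-5 Nano | 100 | 45 | 0 | 0 | 0 | 28.9%
Z.AI GLM 4.7 Flash | 75 | 61 | 0 | 0 | 0 | 27.1%
Z.AI GLM 5 | 63 | 63 | 0 | 0 | 0 | 25.0%
o4 Mini High | 90 | 32 | 0 | 0 | 0 | 24.4%
Gemini 2.5 Pro | 63 | 49 | 0 | 0 | 0 | 22.2%
MoonshotAI: Kimi K2.5 | 61 | 48 | 0 | 0 | 0 | 21.8%
Gemini 2.5 Flash | 53 | 50 | 0 | 0 | 0 | 20.5%
Claude Opus 4.6 | 55 | 46 | 0 | 0 | 0 | 20.2%
GPT-4.1 Mini | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Large | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 8B | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Mini | 96 | 0 | 0 | 0 | 0 | 19.2%
GPT-4o, May 13th (temp=1) | 94 | 0 | 0 | 0 | 0 | 18.9%
Grok 4 Fast | 63 | 31 | 0 | 0 | 0 | 18.7%
DeepSeek V3 (2024-12-26) | 86 | 0 | 0 | 0 | 0 | 17.2%
Claude Haiku 4.5 | 75 | 0 | 0 | 0 | 0 | 14.9%
Z.AI GLM 4.5 | 72 | 0 | 0 | 0 | 0 | 14.5%
Arcee AI: Trinity Large (Preview) | 65 | 0 | 0 | 0 | 0 | 13.0%
Qwen 3.5 Plus (2026-02-15) | 64 | 0 | 0 | 0 | 0 | 12.8%
Writer: Palmyra X5 | 64 | 0 | 0 | 0 | 0 | 12.8%
Mistral Large 2 | 63 | 0 | 0 | 0 | 0 | 12.5%
DeepSeek V3.1 | 54 | 0 | 0 | 0 | 0 | 10.9%
GPT-5 Mini | 39 | 0 | 0 | 0 | 0 | 7.8%
GPT-5.1 | 32 | 0 | 0 | 0 | 0 | 6.4%
GPT-5.2 | 21 | 0 | 0 | 0 | 0 | 4.3%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek-V2 Chat | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%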
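Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Rocinante 12B | 100 | 100 | 100 | 91 | 81 | 94.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 88 | 85 | 66 | 87.7%
GPT-4.1 Nano | 100 | 100 | 77 | 71 | 0 | 69.7%
Gemini 2.5 Flash Lite | 100 | 69 | 69 | 49 | 35 | 64.6%
Hermes 3 70B | 100 | 83 | 81 | 51 | 0 | 62.9%
Gemma 3 4B | 100 | 100 | 54 | 54 | 0 | 61.7%
DeepSeek V3.2 | 100 | 81 | 74 | 51 | 0 | 61.1%
Minimax M2.5 | 100 | 98 | 57 | 44 | 0 | 60.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 88 | 0 | 0 | 57.5%
Claude Haiku 4.5 | 97 | 63 | 57 | 50 | 0 | 53.4%
GPT-5 Nano | 100 | 71 | 51 | 39 | 0 | 52.1%
Ministral 3 3B | 100 | 83 | 75 | 0 | 0 | 51.6%
Qwen 3.5 Plus (2026-02-15) | 99 | 76 | 45 | 34 | 0 | 50.8%
Claude Opus 4.6 | 100 | 72 | 42 | 38 | 0 | 50.5%
WizardLM 2 8x22b | 100 | 62 | 47 | 40 | 0 | 49.7%
Claude Opus 4.5 | 92 | 73 | 43 | 40 | 0 | 49.6%
Z.AI GLM 4.7 | 79 | 75 | 44 | 36 | 0 | 46.9%
Gemini 3 Pro (Preview) | 100 | 89 | 44 | 0 | 0 | 46.5%
GPT-4o, Aug. 6th (temp=1) | 88 | 76 | 57 | 0 | 0 | 44.1%
Z.AI GLM 4.6 | 100 | 66 | 45 | 0 | 0 | 42.2%
GPT-4o Mini (temp=1) | 74 | 72 | 62 | 0 | 0 | 41.5%
Qwen 3.5 397B A17B | 80 | 50 | 29 | 27 | 18 | 40.9%
ByteDance Seed 1.6 | 100 | 100 | 0 | 0 | 0 | 40.0%
DeepSeek V3.1 | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4o, Aug. 6th (temp=0) | 68 | 61 | 60 | 0 | 0 | 37.8%
Z.AI GLM 5 | 87 | 61 | 41 | 0 | 0 | 37.8%
Claude 3 Haiku | 98 | 89 | 0 | 0 | 0 | 37.5%
Claude Sonnet 4.6 | 100 | 79 | 0 | 0 | 0 | 35.9%
DeepSeek-V2 Chat | 100 | 74 | 0 | 0 | 0 | 34.7%
GPT-4o, May 13th (temp=1) | 100 | 68 | 0 | 0 | 0 | 33.5%
Claude 3.7 Sonnet | 100 | 63 | 0 | 0 | 0 | 32.7%
Mistral Large | 81 | 81 | 0 | 0 | 0 | 32.3%
Qwen 2.5 72B | 100 | 56 | 0 | 0 | 0 | 31.2%
Mistral NeMO | 100 | 55 | 0 | 0 | 0 | 31.0%
GPT-5 Mini | 100 | 28 | 26 | 0 | 0 | 30.9%
GPT-4.1 Mini | 88 | 65 | 0 | 0 | 0 | 30.5%
DeepSeek V3 (2024-12-26) | 60 | 52 | 38 | 0 | 0 | 30.0%
Arcee AI: Trinity Mini | 81 | 68 | 0 | 0 | 0 | 29.6%
Writer: Palmyra X5 | 89 | 53 | 0 | 0 | 0 | 28.5%
Z.AI GLM 4.5 | 72 | 69 | 0 | 0 | 0 | 28.4%
Claude Sonnet 4.5 | 82 | 59 | 0 | 0 | 0 | 28.2%
o4 Mini | 58 | 52 | 27 | 0 | 0 | 27.5%
GPT-4o Mini (temp=0) | 68 | 63 | 0 | 0 | 0 | 26.2%
GPT-5.1 | 42 | 33 | 30 | 23 | 0 | 25.6%
ByteDance Seed 1.6 Flash | 95 | 28 | 0 | 0 | 0 | 24.7%
GPT-4o, May 13th (temp=0) | 78 | 43 | 0 | 0 | 0 | 24.2%
Grok 4 Fast | 49 | 37 | 34 | 0 | 0 | 24.1%
GPT-4.1 | 64 | 53 | 0 | 0 | 0 | 23.5%
Gemini 2.5 Pro | 62 | 54 | 0 | 0 | 0 | 23.2%
Gemma 3 27B | 54 | 48 | 0 | 0 | 0 | 20.4%
Claude Opus 4 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Sonnet 4 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 8B | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 85 | 0 | 0 | 0 | 0 | 16.9%
MoonshotAI: Kimi K2.5 | 76 | 0 | 0 | 0 | 0 | 15.2%
Grok 4 | 75 | 0 | 0 | 0 | 0 | 14.9%
o4 Mini High | 69 | 0 | 0 | 0 | 0 | 13.8%
Ministral 3 14B | 68 | 0 | 0 | 0 | 0 | 13.5%
Mistral Large 3 | 63 | 0 | 0 | 0 | 0 | 12.7%
Gemini 2.5 Flash | 61 | 0 | 0 | 0 | 0 | 12.2%
Mistral Medium 3.1 | 52 | 0 | 0 | 0 | 0 | 10.4%
Gemini 3.1 Pro (Preview) | 41 | 0 | 0 | 0 | 0 | 8.2%
GPT-5.2 | 27 | 0 | 0 | 0 | 0 | 5.5%
GPT-5 | 27 | 0 | 0 | 0 | 0 | 5.3%
Gemini 3 Flash (Preview) | 22 | 0 | 0 | 0 | 0 | 4.3%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%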