AI-ism word frequency

Test: Bad Writing Habits

Avg. Score: 31.8%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | ByteDance Seed 1.6 Flash | 69.1% | $0.0013 | 27.3s | 51%
2 | GPT-5 Mini | 66.0% | $0.0100 | 57.4s | 49%
3 | Claude Sonnet 4.6 | 70.1% | $0.031 | 39.3s | 48%
4 | Minimax M2.5 | 60.6% | $0.0034 | 1.3m | 44%
5 | Claude Haiku 4.5 | 57.6% | $0.011 | 21.6s | 38%
6 | GPT-5 | 74.3% | $0.065 | 2.8m | 62%
7 | Z.AI GLM 5 | 54.9% | $0.0084 | 1.2m | 39%
8 | GPT-5.2 | 61.7% | $0.056 | 1.5m | 48%
9 | GPT-5.1 | 63.5% | $0.054 | 1.8m | 48%
10 | Claude Opus 4.6 | 63.8% | $0.078 | 1.2m | 47%
11 | Claude Sonnet 4.5 | 54.8% | $0.035 | 38.1s | 35%
12 | Writer: Palmyra X5 | 46.9% | $0.011 | 22.0s | 32%
13 | Gemini 3 Flash (Preview) | 41.4% | $0.0078 | 19.6s | 30%
14 | Claude Opus 4.5 | 58.1% | $0.070 | 53.4s | 41%
15 | Z.AI GLM 4.7 Flash | 46.0% | $0.0017 | 1.2m | 34%
16 | GPT-5 Nano | 46.7% | $0.0042 | 1.4m | 36%
17 | Z.AI GLM 4.7 | 48.0% | $0.010 | 1.4m | 34%
18 | Z.AI GLM 4.5 | 43.2% | $0.0051 | 42.1s | 29%
19 | Claude 3.7 Sonnet | 49.3% | $0.042 | 46.7s | 33%
20 | Mistral Small Creative | 36.1% | $0.0007 | 9.1s | 24%
21 | Mistral Medium 3.1 | 39.2% | $0.0048 | 36.5s | 28%
22 | Qwen 3.5 Plus (2026-02-15) | 38.9% | $0.0060 | 31.5s | 26%
23 | ByteDance Seed 1.6 | 54.9% | $0.013 | 2.5m | 35%
24 | Rocinante 12B | 43.4% | $0.0014 | 38.4s | 20%
25 | Ministral 3 14B | 33.4% | $0.0007 | 11.7s | 20%
26 | Llama 3.1 8B | 38.9% | $0.0003 | 1.3m | 26%
27 | GPT-4.1 | 38.3% | $0.018 | 44.7s | 25%
28 | DeepSeek V3 (2025-03-24) | 36.2% | $0.0014 | 39.4s | 21%
29 | Mistral Large 2 | 37.0% | $0.013 | 29.4s | 21%
30 | Gemini 3 Pro (Preview) | 45.3% | $0.055 | 54.4s | 31%
31 | Mistral Large 3 | 35.0% | $0.0033 | 30.3s | 20%
32 | Mistral Large | 36.6% | $0.014 | 30.9s | 21%
33 | Arcee AI: Trinity Large (Preview) | 35.8% | $0.0000 | 43.6s | 19%
34 | Claude Sonnet 4 | 39.3% | $0.032 | 43.7s | 24%
35 | DeepSeek V3.2 | 39.5% | $0.0014 | 1.9m | 26%
36 | Ministral 8B | 27.2% | $0.0004 | 10.4s | 15%
37 | Ministral 3 8B | 28.4% | $0.0008 | 19.6s | 14%
38 | Z.AI GLM 4.6 | 31.6% | $0.0065 | 51.5s | 19%
39 | Qwen 3.5 397B A17B | 46.9% | $0.014 | 3.0m | 31%
40 | Hermes 3 70B | 30.1% | $0.0010 | 1.2m | 18%
41 | Hermes 3 405B | 32.1% | $0.0032 | 53.2s | 14%
42 | MoonshotAI: Kimi K2.5 | 46.7% | $0.019 | 3.2m | 31%
43 | Claude 3.5 Sonnet | 34.1% | $0.048 | 35.5s | 19%
44 | WizardLM 2 8x22b | 33.9% | $0.0026 | 1.8m | 17%
45 | DeepSeek V3.1 | 30.5% | $0.0020 | 1.8m | 19%
46 | Gemma 3 27B | 21.2% | $0.0006 | 52.6s | 16%
47 | DeepSeek V3 (2024-12-26) | 23.9% | $0.0021 | 54.6s | 13%
48 | Gemini 2.5 Pro | 26.8% | $0.036 | 36.2s | 17%
49 | DeepSeek-V2 Chat | 22.0% | $0.0021 | 53.3s | 12%
50 | Ministral 3B | 17.4% | $0.0001 | 8.1s | 7%
51 | Grok 4 Fast | 15.5% | $0.0017 | 24.1s | 11%
52 | o4 Mini | 17.8% | $0.015 | 25.7s | 12%
53 | Grok 4.1 Fast | 17.5% | $0.0018 | 37.8s | 10%
54 | Ministral 3 3B | 13.6% | $0.0005 | 11.1s | 5%
55 | Arcee AI: Trinity Mini | 13.1% | $0.0003 | 9.2s | 1%
56 | Gemini 2.5 Flash | 12.8% | $0.0052 | 10.6s | 2%
57 | Gemma 3 12B | 13.1% | $0.0004 | 41.3s | 5%
58 | Mistral NeMO | 10.7% | $0.0005 | 10.1s | 1%
59 | o4 Mini High | 16.3% | $0.025 | 47.2s | 10%
60 | GPT-4.1 Mini | 11.8% | $0.0027 | 19.0s | 0%
61 | Claude 3.5 Haiku | 10.1% | $0.0035 | 10.8s | 0%
62 | Claude Opus 4 | 54.8% | $0.209 | 1.4m | 37%
63 | Llama 3.1 70B | 13.4% | $0.0015 | 29.4s | 0%
64 | Cohere Command R+ (Aug. 2024) | 20.1% | $0.020 | 52.5s | 4%
65 | Gemini 2.5 Flash Lite | 8.1% | $0.0009 | 9.5s | 0%
66 | Claude 3 Haiku | 9.7% | $0.0025 | 14.9s | 0%
67 | GPT-4.1 Nano | 8.1% | $0.0007 | 13.3s | 0%
68 | Gemma 3 4B | 8.3% | $0.0002 | 20.0s | 0%
69 | Stealth: Aurora Alpha | 4.5% | $0.0000 | 9.8s | 0%
70 | Gemini 3.1 Pro (Preview) | 35.0% | $0.107 | 1.8m | 23%
71 | Qwen 2.5 72B | 8.0% | $0.0010 | 36.7s | 0%
72 | Grok 4 | 19.4% | $0.048 | 1.7m | 14%
73 | Llama 3.1 Nemotron 70B | 5.0% | $0.0038 | 31.7s | 0%
74 | GPT-4o Mini (temp=1) | 2.3% | $0.0012 | 34.8s | 0%
75 | GPT-4o, May 13th (temp=0) | 8.9% | $0.035 | 14.1s | 0%
76 | GPT-4o Mini (temp=0) | 0.8% | $0.0012 | 34.8s | 0%
77 | GPT-4o, May 13th (temp=1) | 7.5% | $0.033 | 14.4s | 0%
78 | GPT-4o, Aug. 6th (temp=1) | 2.9% | $0.018 | 24.4s | 0%
79 | GPT-4o, Aug. 6th (temp=0) | 3.1% | $0.023 | 22.7s | 0%
80 | Mistral Small 3.2 24B | 12.0% | $0.0069 | 5.7m | 0%
Average score: 31.76%

Individual Scenarios

Detailed Writing Rules

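Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
ByteDance Seed 1.6 Flash | 82 | 77 | 69 | 55 | 48 | 66.2%
GPT-5 | 75 | 63 | 63 | 56 | 56 | 62.5%
GPT-5 Mini | 76 | 60 | 59 | 51 | 48 | 58.8%
GPT-5.2 | 68 | 66 | 59 | 51 | 45 | 57.5%
Claude Sonnet 4.6 | 66 | 65 | 51 | 50 | 29 | 52.0%
GPT-5.1 | 68 | 60 | 43 | 42 | 37 | 49.9%
Claude Opus 4.6 | 64 | 49 | 46 | 45 | 44 | 49.8%
Claude Opus 4.5 | 64 | 54 | 54 | 40 | 16 | 45.4%
Claude Opus 4 | 61 | 51 | 47 | 35 | 32 | 45.1%
ByteDance Seed 1.6 | 63 | 50 | 37 | 36 | 31 | 43.5%
Minimax M2.5 | 67 | 62 | 34 | 28 | 25 | 43.2%
Claude 3.7 Sonnet | 53 | 39 | 39 | 39 | 35 | 41.0%
Z.AI GLM 4.7 | 53 | 46 | 40 | 33 | 30 | 40.5%
Z.AI GLM 4.7 Flash | 60 | 54 | 47 | 41 | 0 | 40.2%
Writer: Palmyra X5 | 63 | 58 | 55 | 21 | 0 | 39.4%
Z.AI GLM 4.5 | 54 | 54 | 48 | 21 | 19 | 39.3%
Qwen 3.5 397B A17B | 56 | 52 | 43 | 25 | 20 | 39.0%
Qwen 3.5 Plus (2026-02-15) | 69 | 46 | 32 | 24 | 10 | 36.3%
GPT-5 Nano | 41 | 41 | 36 | 34 | 27 | 35.9%
DeepSeek V3 (2025-03-24) | 55 | 44 | 39 | 33 | 0 | 34.0%
Claude Haiku 4.5 | 45 | 35 | 32 | 32 | 25 | 33.7%
Rocinante 12B | 59 | 58 | 39 | 11 | 0 | 33.5%
Z.AI GLM 5 | 43 | 35 | 27 | 25 | 18 | 29.7%
Gemini 3 Flash (Preview) | 45 | 39 | 28 | 20 | 15 | 29.6%
MoonshotAI: Kimi K2.5 | 62 | 25 | 24 | 22 | 14 | 29.4%
Hermes 3 70B | 58 | 34 | 28 | 25 | 0 | 28.9%
Claude Sonnet 4.5 | 40 | 35 | 30 | 24 | 13 | 28.5%
Gemini 3 Pro (Preview) | 52 | 31 | 31 | 13 | 11 | 27.7%
Llama 3.1 8B | 54 | 43 | 38 | 0 | 0 | 27.1%
Mistral Medium 3.1 | 35 | 32 | 27 | 18 | 12 | 25.0%
WizardLM 2 8x22b | 54 | 38 | 32 | 0 | 0 | 24.6%
Gemini 3.1 Pro (Preview) | 44 | 40 | 27 | 11 | 0 | 24.4%
Mistral Large 3 | 34 | 33 | 18 | 17 | 13 | 23.1%
Ministral 3 14B | 49 | 43 | 21 | 1 | 0 | 22.8%
DeepSeek V3.2 | 48 | 25 | 21 | 19 | 0 | 22.5%
Mistral Large | 41 | 23 | 21 | 14 | 8 | 21.6%
Hermes 3 405B | 39 | 28 | 24 | 14 | 0 | 21.1%
DeepSeek V3 (2024-12-26) | 41 | 39 | 25 | 0 | 0 | 21.1%
GPT-4.1 | 31 | 21 | 20 | 15 | 14 | 20.1%
Mistral Large 2 | 41 | 22 | 19 | 16 | 0 | 19.7%
Gemini 2.5 Pro | 33 | 28 | 14 | 14 | 0 | 17.8%
Llama 3.1 70B | 88 | 0 | 0 | 0 | 0 | 17.6%
Ministral 3B | 67 | 20 | 0 | 0 | 0 | 17.5%
Claude 3.5 Sonnet | 27 | 24 | 16 | 16 | 2 | 17.0%
Mistral Small Creative | 24 | 23 | 14 | 14 | 0 | 15.2%
Arcee AI: Trinity Large (Preview) | 40 | 31 | 2 | 0 | 0 | 14.4%
Ministral 3 8B | 28 | 27 | 16 | 0 | 0 | 14.1%
Z.AI GLM 4.6 | 20 | 15 | 13 | 6 | 0 | 10.6%
Cohere Command R+ (Aug. 2024) | 51 | 0 | 0 | 0 | 0 | 10.2%
Claude Sonnet 4 | 37 | 5 | 3 | 0 | 0 | 9.1%
DeepSeek-V2 Chat | 30 | 15 | 0 | 0 | 0 | 8.9%
Grok 4.1 Fast | 18 | 14 | 11 | 0 | 0 | 8.6%
DeepSeek V3.1 | 19 | 12 | 7 | 0 | 0 | 7.6%
Arcee AI: Trinity Mini | 37 | 0 | 0 | 0 | 0 | 7.5%
Ministral 8B | 29 | 6 | 0 | 0 | 0 | 6.9%
Mistral NeMO | 11 | 10 | 0 | 0 | 0 | 4.3%
Gemma 3 27B | 13 | 8 | 0 | 0 | 0 | 4.2%
Grok 4 Fast | 11 | 1 | 0 | 0 | 0 | 2.4%
Ministral 3 3B | 12 | 0 | 0 | 0 | 0 | 2.4%
Gemini 2.5 Flash | 6 | 0 | 0 | 0 | 0 | 1.1%
GPT-4o, May 13th (temp=1) | 4 | 0 | 0 | 0 | 0 | 0.8%
Gemma 3 4B | 4 | 0 | 0 | 0 | 0 | 0.8%
Mistral Small 3.2 24B | 3 | 0 | 0 | 0 | 0 | 0.7%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.1%
o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%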
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
GPT-5747370696570.2%
ByteDance Seed 1.6 Flash756358544859.6%
Rocinante 12B1007553361455.7%
Claude Opus 4.6706749464154.7%
GPT-5.1735950493653.7%
GPT-5.2655150501947.2%
GPT-5 Mini636150352546.6%
Minimax M2.5675754272746.5%
Claude Opus 4656038373045.7%
Claude Sonnet 4.5755347351845.4%
Qwen 3.5 397B A17B585344403145.3%
Claude Haiku 4.5694941363145.2%
ByteDance Seed 1.6464541363640.7%
Z.AI GLM 4.7565245361340.2%
Arcee AI: Trinity Large (Preview)724632232239.1%
Claude Opus 4.5484440323138.9%
Z.AI GLM 4.7 Flash515140331838.6%
GPT-5 Nano504038353038.5%
Claude Sonnet 4.6603736332638.3%
Z.AI GLM 5644038252137.5%
MoonshotAI: Kimi K2.5613832292436.8%
Qwen 3.5 Plus (2026-02-15)523733311633.8%
Gemini 3 Flash (Preview)513830292033.7%
Gemini 3 Pro (Preview)484432301433.6%
Hermes 3 405B60543913033.3%
DeepSeek V3 (2025-03-24)6753397033.1%
Claude 3.7 Sonnet414035181529.8%
WizardLM 2 8x22b46422815026.2%
Writer: Palmyra X5393026201525.8%
DeepSeek V3 (2024-12-26)675200024.0%
Mistral Small Creative50292510423.6%
DeepSeek V3.236342717022.9%
GPT-4.1342719191522.7%
Mistral Medium 3.132312416922.4%
Gemini 3.1 Pro (Preview)4741203022.2%
Claude Sonnet 44638260022.0%
Claude 3.5 Sonnet4035210019.4%
Llama 3.1 8B5222180018.2%
Hermes 3 70B622000016.2%
Mistral Large 2501990015.4%
DeepSeek-V2 Chat75000015.0%
Mistral Large421785014.6%
Z.AI GLM 4.53416162013.6%
Llama 3.1 70B62000012.4%
Ministral 3 8B271990010.8%
Gemini 2.5 Pro272700010.8%
Grok 4.1 Fast2814100010.4%
Gemma 3 27B28120008.1%
Gemma 3 12B2600005.1%
Mistral Small 3.2 24B1790005.1%
DeepSeek V3.11770004.8%
Ministral 3 3B1860004.8%
Ministral 3 14B13100004.6%
Z.AI GLM 4.62300004.5%
Stealth: Aurora Alpha900001.8%
Mistral Large 3510001.3%
Cohere Command R+ (Aug. 2024)600001.1%
Gemini 2.5 Flash500001.0%
Gemma 3 4B200000.4%
o4 Mini High000000.0%
o4 Mini000000.0%
Grok 4000000.0%
Grok 4 Fast000000.0%
GPT-4o, May 13th (temp=0)000000.0%
GPT-4o, Aug. 6th (temp=1)000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
GPT-4o, May 13th (temp=1)000000.0%
Claude 3.5 Haiku000000.0%
GPT-4.1 Mini000000.0%
GPT-4o Mini (temp=1)000000.0%
GPT-4o Mini (temp=0)000000.0%
Gemini 2.5 Flash Lite000000.0%
Llama 3.1 Nemotron 70B000000.0%
Qwen 2.5 72B000000.0%
Claude 3 Haiku000000.0%
Arcee AI: Trinity Mini000000.0%
Mistral NeMO000000.0%
GPT-4.1 Nano000000.0%
Ministral 8B000000.0%
Ministral 3B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
GPT-5928887847986.2%
ByteDance Seed 1.6 Flash949189767284.2%
GPT-5.1828282726977.3%
Claude Haiku 4.5878077746677.0%
GPT-5.2827878696674.6%
GPT-5 Mini887671696073.0%
Claude Sonnet 4.6857773715772.5%
Claude Opus 4.5857767676371.7%
Claude Sonnet 4.5897970595670.5%
Claude Opus 4.6787772616169.9%
Hermes 3 405B937768545268.8%
Claude Opus 4817968635268.6%
Z.AI GLM 4.7817373645168.5%
Qwen 3.5 397B A17B807169595366.5%
Z.AI GLM 5727068635966.2%
Gemini 3 Pro (Preview)757067595665.4%
Minimax M2.5807069672862.7%
Z.AI GLM 4.7 Flash797065544061.8%
ByteDance Seed 1.6786966484561.1%
Writer: Palmyra X5747262583960.8%
DeepSeek V3.2676362564057.8%
MoonshotAI: Kimi K2.5686058564657.4%
Claude 3.7 Sonnet767552424056.7%
Gemini 3 Flash (Preview)676358573756.3%
Z.AI GLM 4.5705655524655.7%
Qwen 3.5 Plus (2026-02-15)695754513853.9%
GPT-4.1796653393053.5%
WizardLM 2 8x22b665853474353.4%
Claude Sonnet 4676553523153.4%
DeepSeek V3 (2025-03-24)716655541953.1%
Mistral Large 3665858413952.1%
Mistral Large 2696853432150.9%
Claude 3.5 Sonnet706460471250.6%
Ministral 8B785150363349.4%
Gemini 3.1 Pro (Preview)605444423547.1%
Rocinante 12B100574035246.7%
GPT-5 Nano504945454446.5%
Mistral Medium 3.1695943313046.3%
Ministral 3 8B59585554045.2%
DeepSeek V3.1565248352843.9%
Llama 3.1 8B575651302343.4%
Mistral Large594342373442.9%
Gemini 2.5 Pro654846321741.6%
DeepSeek V3 (2024-12-26)786037201041.0%
Hermes 3 70B85453736040.6%
Gemma 3 27B453938383639.2%
Ministral 3 14B593834342538.0%
Arcee AI: Trinity Large (Preview)673431302637.7%
Grok 4554339292137.2%
Z.AI GLM 4.6723130301836.2%
o4 Mini453534332233.7%
Mistral Small Creative50504021132.2%
Grok 4.1 Fast55424021031.5%
Grok 4 Fast333224221525.2%
Arcee AI: Trinity Mini5945200024.9%
DeepSeek-V2 Chat35332520022.6%
Cohere Command R+ (Aug. 2024)594400020.8%
Gemma 3 12B37222119420.6%
GPT-4.1 Mini5321171018.4%
Llama 3.1 70B712100018.3%
Gemini 2.5 Flash Lite29261812017.0%
Ministral 3B523100016.6%
Gemini 2.5 Flash443350016.5%
Mistral NeMO561194015.8%
GPT-4o, May 13th (temp=1)3317158014.6%
Claude 3.5 Haiku403200014.3%
Mistral Small 3.2 24B62000012.5%
Claude 3 Haiku52500011.5%
o4 Mini High391080011.3%
Ministral 3 3B38100009.6%
Gemma 3 4B3800007.7%
Qwen 2.5 72B1500002.9%
GPT-4o, Aug. 6th (temp=1)840002.4%
GPT-4o, Aug. 6th (temp=0)1200002.3%
GPT-4.1 Nano910002.0%
GPT-4o Mini (temp=1)820002.0%
Stealth: Aurora Alpha000000.0%
GPT-4o, May 13th (temp=0)000000.0%
GPT-4o Mini (temp=0)000000.0%
Llama 3.1 Nemotron 70B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude Sonnet 4.6959186866784.9%
GPT-5838276727277.1%
ByteDance Seed 1.6 Flash928679525171.8%
Claude Haiku 4.5807773695570.9%
GPT-5.2787471656570.6%
Claude Opus 4.6827973625670.6%
GPT-5 Mini727170706469.4%
GPT-5.1757468636268.3%
Claude Opus 4727171655867.3%
Minimax M2.5877768535267.2%
Z.AI GLM 5717067656066.5%
Claude Sonnet 4.5787170654465.6%
Claude 3.7 Sonnet817965525065.3%
Claude Opus 4.5858459504464.6%
Mistral Large 2696464615763.0%
ByteDance Seed 1.6767167484361.1%
WizardLM 2 8x22b736656555060.0%
Hermes 3 405B926160543059.1%
Writer: Palmyra X5726363544259.0%
Claude Sonnet 4727058474758.5%
Mistral Large716858563557.6%
Rocinante 12B736557523857.0%
Z.AI GLM 4.7645754474753.9%
Z.AI GLM 4.5825350492952.4%
Arcee AI: Trinity Large (Preview)875547413252.4%
MoonshotAI: Kimi K2.5605950443750.0%
Gemini 3 Pro (Preview)645446454049.6%
Cohere Command R+ (Aug. 2024)625854412948.7%
Z.AI GLM 4.6755049402948.6%
GPT-5 Nano535244444347.3%
Mistral Medium 3.1575151453347.2%
Z.AI GLM 4.7 Flash555148433947.1%
Mistral Large 3646449292646.5%
DeepSeek V3.1565148443346.5%
Gemini 3 Flash (Preview)654944422946.0%
Mistral Small Creative524444413944.2%
Gemma 3 27B604746352843.2%
Ministral 3 14B836527271242.9%
Qwen 3.5 397B A17B604542372742.5%
Mistral Small 3.2 24B9777360041.9%
DeepSeek V3.2675041231940.1%
Llama 3.1 8B65553932038.3%
GPT-4.1534138382238.1%
Gemini 3.1 Pro (Preview)563937272336.3%
Qwen 3.5 Plus (2026-02-15)463933292634.6%
Gemini 2.5 Pro453936361734.5%
Gemma 3 12B504429232133.3%
Gemini 2.5 Flash453130292832.5%
Hermes 3 70B8940340032.4%
Grok 4533131181729.8%
Ministral 3B79371911029.3%
GPT-4o, May 13th (temp=1)49452416828.6%
DeepSeek-V2 Chat473127191327.4%
Ministral 8B43413016526.8%
Ministral 3 8B6933280025.9%
Claude 3 Haiku5532277124.3%
o4 Mini High362523181723.9%
DeepSeek V3 (2024-12-26)302926171523.7%
o4 Mini40332120022.8%
Grok 4.1 Fast35272323021.6%
DeepSeek V3 (2025-03-24)50281513021.2%
Ministral 3 3B33242318019.7%
Arcee AI: Trinity Mini4726230019.4%
Gemini 2.5 Flash Lite513700017.7%
GPT-4.1 Mini4226120016.2%
Grok 4 Fast2323197014.4%
GPT-4.1 Nano3817160014.1%
Mistral NeMO3319170013.9%
Claude 3.5 Sonnet3617141013.6%
GPT-4o, May 13th (temp=0)262485012.6%
Gemma 3 4B21151010211.8%
Llama 3.1 70B32150009.4%
GPT-4o Mini (temp=0)26190009.0%
Qwen 2.5 72B3030006.4%
GPT-4o, Aug. 6th (temp=1)1743005.0%
GPT-4o Mini (temp=1)1500003.0%
Stealth: Aurora Alpha752002.9%
GPT-4o, Aug. 6th (temp=0)1020002.5%
Llama 3.1 Nemotron 70B1020002.4%
Claude 3.5 Haiku000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude Sonnet 4.6888076656574.9%
Claude Opus 4.6848274625871.9%
ByteDance Seed 1.6 Flash797875715671.7%
Claude Haiku 4.5868380644571.5%
GPT-5817368646369.8%
Hermes 3 405B959457554869.7%
GPT-5.1797765645968.7%
ByteDance Seed 1.6917763565568.4%
Claude Opus 4.5777468625467.2%
GPT-5 Mini757069625866.9%
WizardLM 2 8x22b797469604966.1%
Minimax M2.5706968655665.5%
Z.AI GLM 5796562615965.2%
Mistral Large 2716968665064.9%
Claude Opus 4757470515164.5%
Qwen 3.5 397B A17B776765555463.8%
Claude Sonnet 4.5747371493059.4%
Writer: Palmyra X5747367582559.4%
GPT-5.2666457555158.7%
Gemini 3 Pro (Preview)666361504957.8%
Arcee AI: Trinity Large (Preview)827553472957.2%
MoonshotAI: Kimi K2.5766654493455.7%
Z.AI GLM 4.7595857534254.0%
Z.AI GLM 4.7 Flash706258473053.4%
Gemini 3 Flash (Preview)616051504152.6%
Claude Sonnet 4696256472852.3%
Qwen 3.5 Plus (2026-02-15)595751484652.1%
Mistral Small Creative686357392450.1%
Gemini 2.5 Pro625452483249.7%
Gemini 3.1 Pro (Preview)655049423648.6%
Hermes 3 70B674841414047.2%
Rocinante 12B86854022046.5%
Z.AI GLM 4.5585641373645.8%
DeepSeek V3.1705946312245.7%
Claude 3.7 Sonnet604948402945.4%
Ministral 3 14B754540333144.8%
Llama 3.1 8B67625630444.0%
DeepSeek V3.2575540392843.8%
GPT-5 Nano704638382543.2%
Ministral 3B755737262143.2%
DeepSeek V3 (2025-03-24)70644141043.1%
Mistral Large 3753937342742.3%
Mistral Large69584528440.7%
GPT-4.1464338373339.3%
Mistral Medium 3.1454137322836.8%
Z.AI GLM 4.6474239342036.2%
Claude 3.5 Sonnet62503217032.2%
DeepSeek V3 (2024-12-26)493727272031.9%
Ministral 8B6048329731.0%
Grok 4.1 Fast484032141229.1%
Arcee AI: Trinity Mini7630259028.1%
Grok 4404025191327.5%
Mistral Small 3.2 24B48442120026.6%
Mistral NeMO5135325225.0%
Grok 4 Fast48372116024.3%
Ministral 3 3B5345130022.2%
Gemma 3 12B614530021.9%
Gemma 3 27B4526264320.8%
Ministral 3 8B5028212020.3%
Cohere Command R+ (Aug. 2024)732250020.1%
DeepSeek-V2 Chat661097018.5%
o4 Mini High3224230015.7%
GPT-4.1 Nano432182014.7%
Claude 3.5 Haiku521370014.4%
Gemini 2.5 Flash362490013.8%
o4 Mini382410012.6%
GPT-4o, May 13th (temp=1)23168309.8%
Claude 3 Haiku22138609.8%
GPT-4.1 Mini2787108.6%
GPT-4o, Aug. 6th (temp=0)3800007.6%
GPT-4o, May 13th (temp=0)3800007.5%
Llama 3.1 70B3500007.1%
GPT-4o Mini (temp=1)1300002.5%
Gemini 2.5 Flash Lite1200002.4%
Llama 3.1 Nemotron 70B1000002.0%
Stealth: Aurora Alpha600001.2%
GPT-4o, Aug. 6th (temp=1)300000.6%
Qwen 2.5 72B300000.6%
GPT-4o Mini (temp=0)000000.0%
Gemma 3 4B000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
GPT-5858280756777.8%
Minimax M2.5918080735676.2%
GPT-5 Mini847574727275.5%
Claude Sonnet 4.6917769676674.1%
GPT-5.1888272656274.0%
ByteDance Seed 1.6 Flash797372716972.7%
Z.AI GLM 5898382594371.3%
GPT-5.2757170706971.0%
Claude Opus 4887467605869.4%
Claude Opus 4.5797973635068.9%
Claude Sonnet 4.5797876712766.3%
Claude Opus 4.6806662605865.1%
ByteDance Seed 1.6936753524662.2%
Qwen 3.5 397B A17B786759554861.3%
Claude 3.7 Sonnet856862523961.1%
Claude Haiku 4.5875955534860.4%
Gemini 3 Pro (Preview)725952514455.5%
Z.AI GLM 4.7 Flash716154474054.6%
Z.AI GLM 4.7625857514554.4%
Writer: Palmyra X5605749473950.3%
GPT-4.1636143433549.0%
MoonshotAI: Kimi K2.5725953382348.9%
GPT-5 Nano575548434148.7%
Rocinante 12B685857382248.5%
Claude Sonnet 4805438373348.4%
DeepSeek V3 (2025-03-24)785149371946.8%
Llama 3.1 8B585552521546.5%
Mistral Medium 3.1544944434046.2%
Mistral Small Creative734241403445.8%
Z.AI GLM 4.5615148472245.8%
DeepSeek V3.2584646423344.8%
WizardLM 2 8x22b544746462944.6%
DeepSeek V3.1545447392544.0%
Z.AI GLM 4.6754338323143.9%
Qwen 3.5 Plus (2026-02-15)564946392843.5%
Gemini 3 Flash (Preview)575040322240.2%
Ministral 8B584433311936.8%
Mistral Large 2593837302036.8%
Arcee AI: Trinity Large (Preview)655529211436.7%
Mistral Large58474427035.2%
DeepSeek V3 (2024-12-26)59484620034.8%
Claude 3 Haiku51433832533.9%
Mistral Large 3514824232133.5%
Grok 4534327271432.7%
Gemini 3.1 Pro (Preview)443830282432.7%
Ministral 3 14B44433736232.4%
Hermes 3 405B47433826030.6%
Claude 3.5 Sonnet6548380030.4%
Gemini 2.5 Pro49472922029.7%
DeepSeek-V2 Chat60343119128.9%
Gemma 3 27B43382929027.9%
o4 Mini High403332181226.9%
Llama 3.1 Nemotron 70B5250160023.4%
Ministral 3 3B44272317022.3%
Claude 3.5 Haiku554500020.0%
GPT-4o, May 13th (temp=0)5525140018.9%
Cohere Command R+ (Aug. 2024)423953218.1%
Ministral 3 8B612610017.6%
Gemma 3 12B4221146016.6%
Arcee AI: Trinity Mini373591016.3%
Ministral 3B472161015.1%
Hermes 3 70B471780014.5%
Grok 4 Fast242176512.8%
Qwen 2.5 72B2820150012.7%
Gemma 3 4B3413115012.7%
o4 Mini261680010.0%
Grok 4.1 Fast1413111109.9%
Llama 3.1 70B25147009.3%
Gemini 2.5 Flash Lite22113007.2%
Gemini 2.5 Flash2720005.9%
Mistral Small 3.2 24B2250005.5%
Mistral NeMO1390004.4%
GPT-4o Mini (temp=1)1900003.8%
GPT-4o, May 13th (temp=1)943003.3%
Stealth: Aurora Alpha500001.0%
GPT-4o, Aug. 6th (temp=1)000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
GPT-4.1 Mini000000.0%
GPT-4o Mini (temp=0)000000.0%
GPT-4.1 Nano000000.0%

genre

Model # 1 # 2 # 3 # 4 # 5 Avg ▼
GPT-5676666635863.9%
Claude Sonnet 4.6706156515057.7%
ByteDance Seed 1.6 Flash736352464155.1%
GPT-5 Mini716748453653.6%
Claude Haiku 4.5787350481853.5%
GPT-5.2585452515153.2%
Minimax M2.5595647444349.8%
Z.AI GLM 4.5585647474049.5%
GPT-5.1605147393646.7%
Claude Sonnet 4.5565349443146.5%
Claude Opus 4.5594646443746.3%
Claude 3.7 Sonnet585547442445.6%
Claude Opus 4.6615643402745.4%
Z.AI GLM 4.7 Flash504744382641.1%
GPT-5 Nano614039282538.7%
Hermes 3 405B555350261038.7%
Qwen 3.5 Plus (2026-02-15)543838372438.0%
Arcee AI: Trinity Large (Preview)493837353137.7%
Claude Opus 4493939292536.2%
ByteDance Seed 1.6544828262235.7%
Z.AI GLM 5534539271034.8%
MoonshotAI: Kimi K2.5504730242334.7%
Rocinante 12B51414038034.0%
Z.AI GLM 4.7423230292932.3%
Mistral Medium 3.1453930211931.0%
Mistral Small Creative413530302030.9%
Gemini 3.1 Pro (Preview)413634321230.9%
Writer: Palmyra X5383333321329.6%
Gemini 3 Pro (Preview)353330271327.7%
GPT-4.1443521181826.9%
Llama 3.1 8B5546300026.4%
Mistral Large 2323225232126.4%
Claude 3.5 Sonnet47292921025.0%
Ministral 3 8B5637251023.9%
Mistral Large 3342923151222.8%
Mistral Large332721181122.0%
Claude Sonnet 4372316161521.3%
DeepSeek V3.228252215017.9%
Gemini 3 Flash (Preview)3925222017.7%
Hermes 3 70B522500015.4%
Ministral 3 14B242476012.4%
Z.AI GLM 4.6311920010.4%
Qwen 3.5 397B A17B251296010.3%
DeepSeek V3.1201410008.8%
Gemini 2.5 Pro26170008.7%
DeepSeek-V2 Chat31102008.6%
DeepSeek V3 (2024-12-26)16160006.4%
Grok 415141006.0%
Cohere Command R+ (Aug. 2024)1585005.4%
o4 Mini2420005.1%
Qwen 2.5 72B2500005.1%
DeepSeek V3 (2025-03-24)2110004.6%
Llama 3.1 70B2300004.6%
Gemma 3 12B2000004.1%
Mistral NeMO2000004.0%
Gemini 2.5 Flash1244004.0%
Claude 3 Haiku1800003.7%
Mistral Small 3.2 24B1700003.5%
Ministral 8B765003.4%
Grok 4 Fast1600003.2%
Gemini 2.5 Flash Lite1400002.8%
GPT-4o, Aug. 6th (temp=1)1200002.4%
Ministral 3B532002.0%
Ministral 3 3B800001.6%
Grok 4.1 Fast700001.4%
Gemma 3 27B210000.6%
Claude 3.5 Haiku300000.6%
o4 Mini High300000.6%
GPT-4o, May 13th (temp=0)200000.3%
Arcee AI: Trinity Mini100000.1%
Stealth: Aurora Alpha000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
GPT-4o, May 13th (temp=1)000000.0%
GPT-4.1 Mini000000.0%
GPT-4o Mini (temp=1)000000.0%
GPT-4o Mini (temp=0)000000.0%
Llama 3.1 Nemotron 70B000000.0%
GPT-4.1 Nano000000.0%
Gemma 3 4B000000.0%
WizardLM 2 8x22b000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
GPT-5757574736972.9%
Claude Opus 4.6756763535061.5%
Claude Sonnet 4.6635449493950.8%
ByteDance Seed 1.6 Flash615651483650.5%
GPT-5 Mini585453413848.8%
Minimax M2.5625453363047.1%
GPT-5.2514746443243.8%
GPT-5 Nano464644413642.5%
GPT-5.1454141403841.1%
Claude Opus 458574036940.1%
Claude Haiku 4.5554135343139.3%
Gemini 3 Pro (Preview)564639371438.4%
Z.AI GLM 5504439342337.9%
Mistral Medium 3.1543933312937.1%
Llama 3.1 8B614937191836.7%
GPT-4.1683631251835.6%
Gemini 3 Flash (Preview)463535332634.9%
Qwen 3.5 397B A17B454240311734.8%
Claude 3.7 Sonnet444138381134.3%
Claude Sonnet 4.573462715232.6%
MoonshotAI: Kimi K2.5464227251631.2%
Z.AI GLM 4.7 Flash433332271830.7%
Z.AI GLM 4.7333129282128.3%
Gemini 3.1 Pro (Preview)413529231428.3%
Claude Opus 4.552333312827.8%
Rocinante 12B48402710125.3%
DeepSeek V3.2383325191024.9%
DeepSeek V3 (2025-03-24)34343219023.9%
Z.AI GLM 4.55337109022.0%
Ministral 3 14B4226249220.5%
Qwen 2.5 72B100000020.0%
Qwen 3.5 Plus (2026-02-15)36301512519.8%
ByteDance Seed 1.6393398819.6%
Writer: Palmyra X53630254018.9%
Z.AI GLM 4.64721178118.6%
Claude 3.5 Sonnet3532128017.5%
Mistral Small Creative31241710417.2%
Mistral Large 23915129015.0%
DeepSeek V3.12920171013.5%
Arcee AI: Trinity Large (Preview)451192013.5%
Hermes 3 70B291470010.1%
Gemma 3 27B3380008.1%
Mistral Large 32964007.8%
Claude Sonnet 418142006.8%
Gemini 2.5 Pro1398006.3%
Grok 4 Fast2920006.3%
Hermes 3 405B2730006.0%
Claude 3.5 Haiku2600005.3%
Ministral 3B1860005.0%
Mistral Large1950004.8%
o4 Mini1371004.0%
Grok 4.1 Fast1250003.3%
Ministral 3 8B1230003.0%
Ministral 8B654003.0%
Grok 41200002.4%
Gemma 3 12B900001.8%
Cohere Command R+ (Aug. 2024)900001.8%
DeepSeek V3 (2024-12-26)900001.7%
Ministral 3 3B700001.4%
Mistral Small 3.2 24B700001.4%
o4 Mini High000000.0%
Stealth: Aurora Alpha000000.0%
GPT-4o, May 13th (temp=0)000000.0%
DeepSeek-V2 Chat000000.0%
GPT-4o, Aug. 6th (temp=1)000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
GPT-4o, May 13th (temp=1)000000.0%
GPT-4.1 Mini000000.0%
Gemini 2.5 Flash000000.0%
GPT-4o Mini (temp=1)000000.0%
GPT-4o Mini (temp=0)000000.0%
Llama 3.1 70B000000.0%
Gemini 2.5 Flash Lite000000.0%
Llama 3.1 Nemotron 70B000000.0%
Claude 3 Haiku000000.0%
Arcee AI: Trinity Mini000000.0%
Mistral NeMO000000.0%
GPT-4.1 Nano000000.0%
Gemma 3 4B000000.0%
WizardLM 2 8x22b000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
ByteDance Seed 1.6 Flash878785786881.1%
GPT-5828181807880.4%
Claude Sonnet 4.6858176767578.6%
GPT-5.1817975747476.5%
GPT-5 Mini807975755973.6%
Claude Opus 4.6766966656267.5%
Minimax M2.5737270684966.6%
Claude Haiku 4.5896563595666.3%
GPT-5.2767068664865.5%
ByteDance Seed 1.6706964635564.1%
MoonshotAI: Kimi K2.5727162545462.5%
Claude Opus 4.5696864595262.1%
Claude Sonnet 4.5716464585161.6%
Writer: Palmyra X5686361575561.0%
Ministral 3 8B827159444460.1%
Claude Opus 4646158585559.1%
GPT-4.1696461613958.9%
Z.AI GLM 5656359544657.3%
Claude 3.5 Sonnet656554524856.8%
Z.AI GLM 4.7715656544356.0%
Ministral 8B726457473755.5%
Claude 3.7 Sonnet696855473855.4%
GPT-5 Nano706349464254.1%
Gemini 3 Flash (Preview)605756494854.0%
Mistral Large686154483052.4%
DeepSeek V3 (2025-03-24)785956481952.0%
Ministral 3 14B585351474751.4%
Gemini 3 Pro (Preview)606046424149.7%
Gemini 2.5 Pro575547454249.3%
Arcee AI: Trinity Large (Preview)626046443248.8%
DeepSeek V3.2565653413548.3%
Mistral Medium 3.1686149303047.5%
Z.AI GLM 4.5765638352446.0%
Claude Sonnet 4574844423545.3%
Mistral Large 2695745322044.5%
Z.AI GLM 4.7 Flash545246393244.5%
Z.AI GLM 4.6524844383743.8%
o4 Mini655148321542.0%
Gemini 3.1 Pro (Preview)505050342641.8%
Mistral Small Creative64534939441.8%
DeepSeek V3.1484641403341.6%
Mistral Large 3494645442241.2%
Rocinante 12B68595721041.0%
Qwen 3.5 397B A17B625136252439.7%
Qwen 3.5 Plus (2026-02-15)524536211934.8%
o4 Mini High484335252234.5%
Llama 3.1 8B5955540033.5%
DeepSeek V3 (2024-12-26)57502510028.3%
Hermes 3 405B583824111028.2%
Gemma 3 27B59291918025.1%
Grok 4 Fast352421201923.7%
Grok 4332924151122.3%
Llama 3.1 70B6036130021.7%
Claude 3.5 Haiku31312222021.2%
Ministral 3 3B5333153020.8%
Ministral 3B31282718020.7%
Gemma 3 12B252423201220.7%
Hermes 3 70B3635265020.5%
Grok 4.1 Fast36262112720.5%
GPT-4.1 Mini3729149017.7%
Mistral Small 3.2 24B434100016.9%
DeepSeek-V2 Chat601193016.6%
Gemini 2.5 Flash Lite382396115.4%
Gemini 2.5 Flash2723213014.9%
Mistral NeMO3125171014.9%
WizardLM 2 8x22b382900013.5%
GPT-4o, May 13th (temp=1)431700012.1%
Arcee AI: Trinity Mini25250009.9%
Stealth: Aurora Alpha21158409.8%
Cohere Command R+ (Aug. 2024)17137408.1%
Gemma 3 4B1798006.9%
GPT-4.1 Nano3300006.6%
Claude 3 Haiku2330005.2%
Qwen 2.5 72B910002.0%
GPT-4o Mini (temp=1)300000.7%
Llama 3.1 Nemotron 70B200000.3%
GPT-4o, May 13th (temp=0)000000.0%
GPT-4o, Aug. 6th (temp=1)000000.0%
GPT-4o, Aug. 6th (temp=0)000000.0%
GPT-4o Mini (temp=0)000000.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude Sonnet 4.6917977767479.4%
GPT-5 Mini858079767278.4%
GPT-5857675726875.3%
Claude Opus 4.5898170696775.1%
Claude Opus 4.6837776726674.8%
GPT-5.2777373726471.7%
GPT-5.1727270645867.0%
ByteDance Seed 1.6797267605666.8%
ByteDance Seed 1.6 Flash806864616066.7%
Z.AI GLM 4.5787566585666.3%
Z.AI GLM 5766863615464.3%
MoonshotAI: Kimi K2.5736662575362.2%
Claude Haiku 4.5797466484361.9%
GPT-5 Nano706161595661.4%
Claude Opus 4756462564660.5%
Minimax M2.5736962484760.0%
Gemini 3 Pro (Preview)706658555059.8%
Claude 3.7 Sonnet666464624259.6%
Claude Sonnet 4.5826448474657.4%
Claude 3.5 Sonnet875553533756.8%
Qwen 3.5 397B A17B636156515156.4%
Gemini 3.1 Pro (Preview)666563444055.4%
Rocinante 12B726565512455.3%
DeepSeek-V2 Chat765655523354.3%
Claude Sonnet 4636156503553.0%
Hermes 3 405B795250403651.3%
Ministral 3 8B707062342151.3%
Hermes 3 70B756739373650.9%
Gemini 3 Flash (Preview)545251454449.1%
Ministral 3 14B656542373047.9%
Z.AI GLM 4.7 Flash545150413947.3%
Mistral Large 2656047392547.1%
Qwen 3.5 Plus (2026-02-15)605953342947.1%
Gemini 2.5 Pro545248473447.0%
Arcee AI: Trinity Large (Preview)656046332946.6%
Mistral Large 3575454412746.5%
Mistral Medium 3.1605346403446.4%
Writer: Palmyra X5605348403246.3%
Mistral Large534948473346.1%
Z.AI GLM 4.7565047403145.0%
Mistral Small Creative574540363642.8%
DeepSeek V3.2585041343042.6%
Z.AI GLM 4.6595143361841.5%
Gemini 2.5 Flash565143361440.2%
DeepSeek V3.1544941411339.8%
GPT-4.1594738292339.1%
Llama 3.1 8B646333201538.8%
Ministral 8B544827272736.8%
Ministral 3B524535302036.4%
Grok 4474333262334.3%
DeepSeek V3 (2025-03-24)393733321631.4%
Grok 4.1 Fast483532241430.6%
DeepSeek V3 (2024-12-26)54492712729.9%
Gemma 3 4B543832141329.9%
Llama 3.1 70B6665160029.2%
GPT-4.1 Mini63333215028.6%
Grok 4 Fast423333171227.4%
Cohere Command R+ (Aug. 2024)6561100027.0%
o4 Mini36323030226.1%
Gemma 3 27B42402819226.0%
GPT-4o, May 13th (temp=0)48342016424.5%
GPT-4o, May 13th (temp=1)37341817722.7%
Ministral 3 3B39312516022.1%
Gemma 3 12B57201714121.8%
Claude 3 Haiku39352110021.0%
Arcee AI: Trinity Mini39262513020.6%
o4 Mini High3930179019.0%
Mistral NeMO29232215719.0%
Stealth: Aurora Alpha262018141218.0%
GPT-4o Mini (temp=1)3632160016.7%
Gemini 2.5 Flash Lite3922145116.1%
Mistral Small 3.2 24B2921214014.8%
Llama 3.1 Nemotron 70B30151413014.5%
GPT-4.1 Nano30141411013.7%
Qwen 2.5 72B2722119013.7%
WizardLM 2 8x22b2921170013.4%
GPT-4o, Aug. 6th (temp=0)251993011.1%
GPT-4o, Aug. 6th (temp=1)21136108.2%
Claude 3.5 Haiku1800003.6%
GPT-4o Mini (temp=0)1310003.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude Sonnet 4.6919190886484.9%
ByteDance Seed 1.6 Flash918785837684.4%
GPT-5817877767176.7%
GPT-5 Mini817873736975.0%
Claude Opus 4.6847670706472.8%
ByteDance Seed 1.6917569655370.5%
GPT-5.1887167605969.0%
Claude Haiku 4.5717068625164.5%
Minimax M2.5747169574964.1%
Arcee AI: Trinity Large (Preview)876960534763.1%
Claude Opus 4.5717064624762.8%
Claude Sonnet 4.5776865643962.6%
Writer: Palmyra X5737165544461.3%
Claude Opus 4756060604860.5%
Z.AI GLM 5736456555360.1%
GPT-5.2666462594960.1%
DeepSeek V3 (2025-03-24)1005947414057.4%
Claude Sonnet 4825857543557.2%
Mistral Large717057493355.9%
GPT-5 Nano615959484855.1%
Qwen 3.5 397B A17B666458483554.2%
Mistral Large 3696952424054.1%
Rocinante 12B916049432353.3%
Llama 3.1 8B635454534153.1%
Claude 3.5 Sonnet635755473851.9%
GPT-4.1636150403950.4%
Claude 3.7 Sonnet605452483950.4%
Ministral 3 14B656456333250.0%
Gemini 3 Pro (Preview)635348454149.9%
Z.AI GLM 4.7575652463849.8%
Z.AI GLM 4.5595553413348.3%
Gemini 3 Flash (Preview)674947453248.0%
Mistral Large 2605151463248.0%
MoonshotAI: Kimi K2.5575648453347.9%
Claude 3.5 Haiku786353261847.8%
Ministral 3 8B716540342647.3%
Qwen 3.5 Plus (2026-02-15)595145383445.3%
Hermes 3 70B684943382444.4%
Z.AI GLM 4.7 Flash574641393744.1%
Mistral Medium 3.1494848453043.9%
DeepSeek V3.2534945382642.4%
Gemini 3.1 Pro (Preview)484242362438.4%
Mistral Small Creative534840282238.3%
Z.AI GLM 4.6524129271933.7%
DeepSeek V3 (2024-12-26)54453915431.5%
DeepSeek V3.1493231251730.6%
Ministral 3 3B474030181430.0%
DeepSeek-V2 Chat473830241230.0%
Grok 4433532221829.8%
Gemini 2.5 Pro473425221829.1%
Hermes 3 405B54481810927.7%
o4 Mini High362926191625.3%
Grok 4 Fast38272724925.1%
Ministral 3B482918171225.1%
Mistral NeMO552817131224.9%
Ministral 8B44302822024.9%
Cohere Command R+ (Aug. 2024)6937190024.9%
o4 Mini352721211623.8%
GPT-4o, May 13th (temp=0)3431299822.2%
GPT-4.1 Mini614220020.9%
Gemma 3 27B5026159220.2%
Grok 4.1 Fast35231717519.4%
Mistral Small 3.2 24B4524211018.3%
Arcee AI: Trinity Mini5424110017.8%
Llama 3.1 70B632500017.6%
Gemini 2.5 Flash32231411517.0%
Gemini 2.5 Flash Lite3018110011.9%
Qwen 2.5 72B481000011.6%
GPT-4.1 Nano181710009.0%
Stealth: Aurora Alpha19175008.2%
GPT-4o, Aug. 6th (temp=1)23124108.1%
Gemma 3 4B2050004.9%
Gemma 3 12B1670004.6%
GPT-4o, Aug. 6th (temp=0)1600003.1%
GPT-4o, May 13th (temp=1)1500003.0%
Claude 3 Haiku300000.5%
GPT-4o Mini (temp=1)000000.0%
GPT-4o Mini (temp=0)000000.0%
Llama 3.1 Nemotron 70B000000.0%
WizardLM 2 8x22b000000.0%
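
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Claude Sonnet 4.6 | 100 | 85 | 77 | 77 | 76 | 83.1%
GPT-5 | 81 | 80 | 79 | 74 | 73 | 77.1%
Minimax M2.5 | 81 | 76 | 74 | 70 | 66 | 73.7%
GPT-5 Mini | 83 | 75 | 69 | 66 | 66 | 72.1%
Claude Opus 4.5 | 77 | 73 | 69 | 67 | 67 | 70.5%
Claude Haiku 4.5 | 77 | 74 | 70 | 67 | 55 | 68.6%
Claude Opus 4 | 88 | 72 | 62 | 61 | 55 | 67.7%
ByteDance Seed 1.6 Flash | 88 | 81 | 71 | 50 | 48 | 67.6%
Claude Sonnet 4.5 | 78 | 71 | 64 | 63 | 60 | 67.4%
ByteDance Seed 1.6 | 85 | 70 | 69 | 61 | 47 | 66.2%
GPT-5.1 | 74 | 68 | 68 | 57 | 55 | 64.5%
Z.AI GLM 5 | 81 | 68 | 67 | 56 | 45 | 63.3%
Z.AI GLM 4.7 | 77 | 74 | 65 | 54 | 39 | 61.6%
Claude 3.7 Sonnet | 69 | 64 | 62 | 59 | 51 | 61.0%
WizardLM 2 8x22b | 100 | 100 | 66 | 28 | 10 | 60.9%
GPT-5.2 | 79 | 66 | 61 | 57 | 36 | 59.9%
Claude Sonnet 4 | 75 | 71 | 59 | 48 | 45 | 59.6%
Claude Opus 4.6 | 68 | 62 | 57 | 56 | 48 | 58.0%
DeepSeek V3 (2025-03-24) | 83 | 57 | 55 | 55 | 39 | 57.7%
MoonshotAI: Kimi K2.5 | 78 | 61 | 60 | 46 | 42 | 57.4%
Claude 3.5 Sonnet | 72 | 67 | 58 | 55 | 29 | 56.4%
Rocinante 12B | 92 | 61 | 50 | 44 | 31 | 55.6%
Llama 3.1 8B | 70 | 61 | 55 | 49 | 41 | 55.4%
Gemini 3 Flash (Preview) | 67 | 57 | 52 | 44 | 43 | 52.6%
Gemini 3 Pro (Preview) | 67 | 64 | 53 | 45 | 32 | 52.1%
Z.AI GLM 4.5 | 73 | 60 | 51 | 41 | 35 | 51.9%
Z.AI GLM 4.7 Flash | 69 | 57 | 48 | 46 | 33 | 50.3%
Qwen 3.5 397B A17B | 59 | 57 | 46 | 41 | 41 | 48.9%
DeepSeek-V2 Chat | 74 | 50 | 47 | 39 | 33 | 48.7%
Ministral 3 14B | 61 | 56 | 48 | 44 | 32 | 48.2%
Hermes 3 70B | 67 | 61 | 47 | 37 | 29 | 48.2%
Mistral Large 3 | 70 | 58 | 52 | 28 | 23 | 46.3%
GPT-5 Nano | 55 | 47 | 45 | 43 | 39 | 45.8%
Arcee AI: Trinity Large (Preview) | 61 | 59 | 55 | 31 | 22 | 45.4%
DeepSeek V3.1 | 61 | 55 | 37 | 36 | 29 | 43.6%
DeepSeek V3.2 | 50 | 49 | 48 | 40 | 25 | 42.5%
Gemini 3.1 Pro (Preview) | 52 | 52 | 43 | 37 | 27 | 42.1%
Writer: Palmyra X5 | 61 | 53 | 52 | 27 | 10 | 40.6%
Qwen 3.5 Plus (2026-02-15) | 51 | 48 | 37 | 36 | 31 | 40.6%
Mistral Medium 3.1 | 50 | 47 | 37 | 34 | 31 | 39.7%
Mistral Large 2 | 57 | 46 | 43 | 32 | 17 | 39.0%
Ministral 3 8B | 50 | 46 | 37 | 33 | 28 | 38.8%
Mistral Small Creative | 51 | 49 | 34 | 34 | 26 | 38.6%
Gemini 2.5 Pro | 62 | 49 | 31 | 27 | 18 | 37.2%
DeepSeek V3 (2024-12-26) | 62 | 38 | 37 | 26 | 6 | 33.8%
Ministral 8B | 51 | 46 | 32 | 22 | 11 | 32.4%
Z.AI GLM 4.6 | 50 | 40 | 31 | 21 | 17 | 31.9%
Gemma 3 27B | 38 | 37 | 33 | 29 | 19 | 31.1%
Mistral Large | 54 | 28 | 26 | 20 | 20 | 29.7%
Gemma 3 4B | 51 | 48 | 30 | 11 | 8 | 29.4%
o4 Mini | 48 | 43 | 27 | 21 | 1 | 28.1%
GPT-4.1 | 43 | 39 | 24 | 23 | 11 | 27.8%
o4 Mini High | 41 | 30 | 26 | 21 | 18 | 27.2%
Hermes 3 405B | 61 | 44 | 27 | 0 | 0 | 26.5%
Gemini 2.5 Flash | 46 | 41 | 29 | 14 | 0 | 26.0%
Grok 4.1 Fast | 49 | 22 | 21 | 21 | 17 | 25.9%
Ministral 3B | 60 | 32 | 28 | 4 | 0 | 24.9%
Grok 4 | 47 | 34 | 29 | 7 | 0 | 23.4%
Gemini 2.5 Flash Lite | 60 | 33 | 13 | 11 | 0 | 23.3%
Llama 3.1 70B | 37 | 33 | 22 | 18 | 0 | 21.9%
Claude 3.5 Haiku | 43 | 40 | 16 | 9 | 0 | 21.6%
Ministral 3 3B | 58 | 23 | 11 | 8 | 0 | 20.1%
Cohere Command R+ (Aug. 2024) | 52 | 12 | 7 | 0 | 0 | 14.2%
Grok 4 Fast | 20 | 19 | 15 | 11 | 0 | 13.1%
GPT-4o, May 13th (temp=0) | 48 | 17 | 0 | 0 | 0 | 12.9%
Arcee AI: Trinity Mini | 20 | 18 | 13 | 0 | 0 | 10.1%
Llama 3.1 Nemotron 70B | 24 | 20 | 5 | 0 | 0 | 9.7%
Gemma 3 12B | 24 | 13 | 7 | 2 | 0 | 9.2%
GPT-4o, May 13th (temp=1) | 30 | 13 | 0 | 0 | 0 | 8.6%
Claude 3 Haiku | 32 | 7 | 3 | 0 | 0 | 8.5%
GPT-4.1 Nano | 25 | 14 | 2 | 0 | 0 | 8.2%
Mistral NeMO | 29 | 3 | 2 | 2 | 0 | 7.4%
Qwen 2.5 72B | 28 | 0 | 0 | 0 | 0 | 5.5%
GPT-4o, Aug. 6th (temp=0) | 15 | 0 | 0 | 0 | 0 | 3.0%
GPT-4.1 Mini | 8 | 6 | 0 | 0 | 0 | 2.8%
GPT-4o, Aug. 6th (temp=1) | 12 | 0 | 0 | 0 | 0 | 2.3%
GPT-4o Mini (temp=1) | 5 | 0 | 0 | 0 | 0 | 1.1%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%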

Novelcrafter Default Prompt

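Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
ByteDance Seed 1.6 Flash | 65 | 64 | 62 | 58 | 51 | 60.2%
GPT-5 | 73 | 68 | 63 | 54 | 39 | 59.5%
GPT-5.1 | 59 | 57 | 53 | 51 | 39 | 52.0%
Claude Sonnet 4.6 | 59 | 55 | 50 | 42 | 39 | 49.0%
GPT-5 Mini | 55 | 54 | 48 | 46 | 32 | 47.1%
GPT-5.2 | 59 | 52 | 45 | 42 | 38 | 47.0%
Claude Opus 4.6 | 56 | 51 | 48 | 38 | 33 | 45.3%
Z.AI GLM 5 | 63 | 56 | 47 | 34 | 25 | 45.1%
Claude Haiku 4.5 | 62 | 60 | 50 | 28 | 20 | 44.2%
Claude 3.7 Sonnet | 60 | 44 | 41 | 39 | 27 | 42.0%
Writer: Palmyra X5 | 54 | 43 | 42 | 42 | 20 | 40.6%
Llama 3.1 8B | 64 | 47 | 42 | 42 | 0 | 39.1%
DeepSeek V3.2 | 52 | 48 | 41 | 31 | 17 | 37.8%
Rocinante 12B | 100 | 43 | 23 | 15 | 0 | 36.4%
Z.AI GLM 4.7 Flash | 54 | 50 | 41 | 22 | 12 | 35.8%
Minimax M2.5 | 58 | 51 | 29 | 22 | 17 | 35.6%
Ministral 3 14B | 50 | 43 | 36 | 26 | 21 | 35.0%
Claude Opus 4.5 | 51 | 44 | 43 | 24 | 13 | 34.9%
Mistral Medium 3.1 | 43 | 42 | 37 | 27 | 23 | 34.3%
Claude Sonnet 4.5 | 40 | 37 | 36 | 35 | 21 | 33.9%
Z.AI GLM 4.5 | 60 | 45 | 32 | 17 | 8 | 32.3%
GPT-5 Nano | 40 | 36 | 32 | 29 | 24 | 32.3%
ByteDance Seed 1.6 | 59 | 42 | 27 | 17 | 17 | 32.3%
Qwen 3.5 Plus (2026-02-15) | 49 | 48 | 34 | 28 | 0 | 32.1%
Mistral Large | 52 | 32 | 28 | 23 | 9 | 28.8%
Mistral Small Creative | 37 | 32 | 31 | 19 | 9 | 25.6%
Gemini 3 Pro (Preview) | 40 | 35 | 34 | 18 | 0 | 25.5%
MoonshotAI: Kimi K2.5 | 43 | 35 | 31 | 7 | 3 | 23.9%
Claude Opus 4 | 46 | 25 | 25 | 20 | 0 | 23.2%
Z.AI GLM 4.7 | 49 | 42 | 14 | 11 | 0 | 23.1%
Claude 3.5 Sonnet | 50 | 32 | 31 | 0 | 0 | 22.5%
Claude Sonnet 4 | 42 | 29 | 16 | 14 | 0 | 20.2%
Qwen 3.5 397B A17B | 31 | 22 | 16 | 15 | 13 | 19.3%
Ministral 8B | 40 | 27 | 15 | 13 | 0 | 19.2%
Cohere Command R+ (Aug. 2024) | 41 | 32 | 17 | 3 | 0 | 18.6%
Arcee AI: Trinity Large (Preview) | 37 | 35 | 11 | 9 | 0 | 18.5%
Hermes 3 70B | 47 | 22 | 18 | 0 | 0 | 17.3%
DeepSeek V3 (2025-03-24) | 31 | 30 | 16 | 6 | 0 | 16.6%
Z.AI GLM 4.6 | 25 | 19 | 18 | 0 | 0 | 12.3%
Mistral Large 3 | 24 | 24 | 9 | 0 | 0 | 11.5%
WizardLM 2 8x22b | 22 | 15 | 7 | 6 | 6 | 11.1%
GPT-4.1 | 31 | 15 | 9 | 0 | 0 | 11.1%
Gemini 3.1 Pro (Preview) | 19 | 12 | 11 | 9 | 0 | 10.0%
Mistral Large 2 | 26 | 12 | 9 | 0 | 0 | 9.4%
Ministral 3 8B | 33 | 13 | 1 | 0 | 0 | 9.4%
Gemini 2.5 Pro | 17 | 15 | 14 | 0 | 0 | 9.2%
Hermes 3 405B | 24 | 19 | 3 | 0 | 0 | 9.0%
Gemini 3 Flash (Preview) | 23 | 16 | 6 | 0 | 0 | 8.8%
DeepSeek-V2 Chat | 21 | 12 | 5 | 0 | 0 | 7.6%
o4 Mini | 17 | 8 | 3 | 1 | 0 | 5.9%
Ministral 3B | 19 | 10 | 0 | 0 | 0 | 5.9%
Ministral 3 3B | 15 | 9 | 0 | 0 | 0 | 4.9%
DeepSeek V3.1 | 18 | 6 | 0 | 0 | 0 | 4.8%
Gemma 3 27B | 14 | 6 | 0 | 0 | 0 | 4.0%
Mistral NeMO | 12 | 0 | 0 | 0 | 0 | 2.5%
Grok 4 Fast | 11 | 1 | 1 | 0 | 0 | 2.5%
Qwen 2.5 72B | 7 | 3 | 0 | 0 | 0 | 2.1%
DeepSeek V3 (2024-12-26) | 6 | 3 | 0 | 0 | 0 | 1.7%
GPT-4.1 Mini | 8 | 0 | 0 | 0 | 0 | 1.5%
o4 Mini High | 6 | 0 | 0 | 0 | 0 | 1.1%
Gemini 2.5 Flash Lite | 5 | 0 | 0 | 0 | 0 | 1.1%
Gemini 2.5 Flash | 4 | 0 | 0 | 0 | 0 | 0.8%
Grok 4 | 4 | 0 | 0 | 0 | 0 | 0.8%
Claude 3.5 Haiku | 2 | 0 | 0 | 0 | 0 | 0.4%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%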
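
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
GPT-5 | 77 | 67 | 66 | 63 | 59 | 66.4%
ByteDance Seed 1.6 Flash | 60 | 58 | 51 | 46 | 38 | 50.6%
Llama 3.1 8B | 71 | 54 | 54 | 49 | 24 | 50.5%
Minimax M2.5 | 64 | 55 | 46 | 44 | 37 | 49.1%
GPT-5.1 | 59 | 56 | 52 | 46 | 32 | 49.0%
Claude Sonnet 4.6 | 74 | 53 | 48 | 45 | 14 | 46.7%
GPT-5.2 | 53 | 49 | 45 | 40 | 39 | 45.1%
Claude Opus 4.6 | 59 | 55 | 40 | 30 | 29 | 42.6%
GPT-5 Mini | 58 | 49 | 48 | 34 | 24 | 42.5%
Z.AI GLM 5 | 50 | 43 | 43 | 40 | 30 | 41.3%
Z.AI GLM 4.7 Flash | 53 | 40 | 39 | 35 | 24 | 38.2%
MoonshotAI: Kimi K2.5 | 56 | 54 | 52 | 25 | 0 | 37.4%
ByteDance Seed 1.6 | 56 | 43 | 42 | 31 | 13 | 37.1%
Claude Opus 4.5 | 55 | 40 | 38 | 32 | 19 | 36.7%
Z.AI GLM 4.7 | 56 | 40 | 38 | 21 | 15 | 33.9%
GPT-4.1 | 48 | 38 | 29 | 28 | 24 | 33.5%
Claude 3.7 Sonnet | 57 | 48 | 27 | 26 | 9 | 33.4%
GPT-5 Nano | 47 | 42 | 30 | 25 | 22 | 33.3%
Gemini 3 Flash (Preview) | 39 | 37 | 34 | 23 | 19 | 30.4%
Gemini 3 Pro (Preview) | 42 | 41 | 25 | 17 | 17 | 28.4%
Claude Haiku 4.5 | 43 | 40 | 26 | 19 | 6 | 26.5%
Claude Opus 4 | 62 | 23 | 18 | 17 | 9 | 25.7%
Qwen 3.5 397B A17B | 52 | 32 | 29 | 13 | 0 | 25.0%
Claude Sonnet 4.5 | 49 | 27 | 22 | 21 | 0 | 23.8%
Mistral Large 3 | 43 | 40 | 24 | 13 | 0 | 23.8%
Writer: Palmyra X5 | 40 | 34 | 14 | 11 | 0 | 20.0%
Mistral Small 3.2 24B | 97 | 0 | 0 | 0 | 0 | 19.5%
Mistral Medium 3.1 | 28 | 21 | 20 | 19 | 0 | 17.8%
Mistral Large | 51 | 14 | 10 | 7 | 0 | 16.5%
Qwen 3.5 Plus (2026-02-15) | 33 | 19 | 19 | 9 | 0 | 16.2%
Rocinante 12B | 26 | 20 | 18 | 10 | 0 | 14.8%
Gemini 3.1 Pro (Preview) | 42 | 26 | 0 | 0 | 0 | 13.7%
WizardLM 2 8x22b | 38 | 15 | 8 | 7 | 0 | 13.7%
Z.AI GLM 4.5 | 27 | 25 | 0 | 0 | 0 | 10.4%
DeepSeek V3.2 | 38 | 10 | 2 | 0 | 0 | 10.0%
Claude Sonnet 4 | 26 | 17 | 4 | 0 | 0 | 9.4%
Gemma 3 27B | 25 | 15 | 7 | 0 | 0 | 9.4%
DeepSeek V3.1 | 42 | 2 | 0 | 0 | 0 | 8.7%
DeepSeek V3 (2025-03-24) | 16 | 11 | 9 | 5 | 0 | 8.1%
Grok 4 Fast | 27 | 10 | 0 | 0 | 0 | 7.5%
Z.AI GLM 4.6 | 24 | 6 | 5 | 0 | 0 | 7.1%
Ministral 3 8B | 34 | 2 | 0 | 0 | 0 | 7.1%
Hermes 3 70B | 18 | 13 | 3 | 0 | 0 | 6.8%
Grok 4.1 Fast | 16 | 13 | 4 | 0 | 0 | 6.5%
Claude 3.5 Sonnet | 28 | 2 | 0 | 0 | 0 | 6.0%
Arcee AI: Trinity Mini | 29 | 0 | 0 | 0 | 0 | 5.8%
DeepSeek-V2 Chat | 24 | 0 | 0 | 0 | 0 | 4.8%
Mistral Small Creative | 12 | 10 | 1 | 0 | 0 | 4.7%
Arcee AI: Trinity Large (Preview) | 9 | 8 | 5 | 1 | 0 | 4.5%
Mistral Large 2 | 18 | 4 | 0 | 0 | 0 | 4.5%
Ministral 8B | 8 | 6 | 5 | 0 | 0 | 3.8%
Hermes 3 405B | 17 | 0 | 0 | 0 | 0 | 3.3%
Ministral 3B | 16 | 0 | 0 | 0 | 0 | 3.1%
DeepSeek V3 (2024-12-26) | 9 | 3 | 0 | 0 | 0 | 2.5%
Ministral 3 14B | 7 | 0 | 0 | 0 | 0 | 1.5%
Claude 3 Haiku | 7 | 0 | 0 | 0 | 0 | 1.3%
Gemini 2.5 Pro | 3 | 2 | 0 | 0 | 0 | 1.1%
Gemma 3 12B | 5 | 0 | 0 | 0 | 0 | 0.9%
o4 Mini High | 5 | 0 | 0 | 0 | 0 | 0.9%
o4 Mini | 4 | 0 | 0 | 0 | 0 | 0.9%
Gemma 3 4B | 4 | 0 | 0 | 0 | 0 | 0.7%
Qwen 2.5 72B | 3 | 0 | 0 | 0 | 0 | 0.6%
Cohere Command R+ (Aug. 2024) | 3 | 0 | 0 | 0 | 0 | 0.5%
Grok 4 | 1 | 0 | 0 | 0 | 0 | 0.3%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%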
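
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
GPT-5 | 86 | 86 | 84 | 83 | 81 | 83.9%
Claude Sonnet 4.6 | 91 | 86 | 81 | 77 | 76 | 82.2%
GPT-5.1 | 86 | 82 | 80 | 79 | 74 | 80.2%
Claude Opus 4.6 | 89 | 84 | 82 | 73 | 66 | 78.5%
ByteDance Seed 1.6 Flash | 93 | 82 | 74 | 70 | 70 | 78.1%
GPT-5 Mini | 83 | 81 | 80 | 74 | 63 | 76.2%
GPT-5.2 | 79 | 77 | 73 | 71 | 70 | 73.8%
Claude Haiku 4.5 | 79 | 73 | 72 | 72 | 72 | 73.4%
Z.AI GLM 5 | 79 | 77 | 73 | 66 | 61 | 71.3%
Claude Opus 4.5 | 75 | 72 | 71 | 70 | 65 | 70.5%
Qwen 3.5 397B A17B | 73 | 72 | 71 | 71 | 61 | 69.6%
Minimax M2.5 | 81 | 74 | 68 | 65 | 55 | 68.8%
Claude Sonnet 4.5 | 78 | 69 | 69 | 64 | 58 | 67.8%
Writer: Palmyra X5 | 79 | 69 | 63 | 60 | 53 | 64.7%
Claude Opus 4 | 69 | 65 | 63 | 60 | 55 | 62.3%
GPT-5 Nano | 61 | 60 | 58 | 58 | 56 | 58.5%
Mistral Large | 70 | 58 | 57 | 50 | 50 | 57.3%
ByteDance Seed 1.6 | 71 | 65 | 64 | 54 | 30 | 57.0%
Gemini 3 Pro (Preview) | 61 | 58 | 57 | 56 | 48 | 56.0%
Claude 3.7 Sonnet | 71 | 56 | 50 | 47 | 47 | 54.2%
DeepSeek V3.2 | 72 | 61 | 59 | 47 | 32 | 54.1%
Qwen 3.5 Plus (2026-02-15) | 67 | 61 | 58 | 44 | 36 | 53.1%
WizardLM 2 8x22b | 63 | 62 | 49 | 43 | 43 | 52.2%
GPT-4.1 | 64 | 60 | 55 | 43 | 33 | 50.9%
MoonshotAI: Kimi K2.5 | 68 | 51 | 51 | 47 | 35 | 50.3%
Z.AI GLM 4.6 | 64 | 59 | 53 | 45 | 29 | 50.2%
Gemini 3.1 Pro (Preview) | 72 | 61 | 58 | 33 | 27 | 50.2%
Gemini 3 Flash (Preview) | 61 | 54 | 54 | 39 | 38 | 49.0%
Mistral Large 3 | 74 | 66 | 57 | 32 | 14 | 48.6%
Hermes 3 70B | 67 | 49 | 47 | 43 | 33 | 47.8%
Claude 3.5 Sonnet | 59 | 55 | 48 | 39 | 36 | 47.4%
DeepSeek V3 (2025-03-24) | 64 | 52 | 45 | 37 | 32 | 46.1%
Z.AI GLM 4.7 Flash | 55 | 50 | 48 | 45 | 31 | 45.9%
Ministral 3 14B | 72 | 59 | 46 | 42 | 10 | 45.6%
Z.AI GLM 4.7 | 69 | 63 | 38 | 31 | 26 | 45.4%
Mistral Small Creative | 62 | 55 | 45 | 36 | 22 | 44.0%
Ministral 8B | 66 | 61 | 61 | 20 | 11 | 43.7%
Mistral Large 2 | 69 | 44 | 39 | 37 | 25 | 42.8%
Z.AI GLM 4.5 | 54 | 48 | 38 | 37 | 35 | 42.3%
Mistral Medium 3.1 | 55 | 52 | 43 | 36 | 23 | 41.8%
Arcee AI: Trinity Large (Preview) | 67 | 48 | 43 | 43 | 4 | 41.3%
DeepSeek V3.1 | 66 | 45 | 44 | 25 | 23 | 40.6%
Claude Sonnet 4 | 51 | 46 | 43 | 36 | 27 | 40.6%
o4 Mini High | 58 | 48 | 42 | 41 | 0 | 37.7%
Grok 4.1 Fast | 45 | 44 | 34 | 32 | 23 | 35.8%
Cohere Command R+ (Aug. 2024) | 71 | 39 | 34 | 30 | 0 | 34.7%
Gemma 3 27B | 43 | 35 | 31 | 23 | 18 | 30.1%
Hermes 3 405B | 54 | 43 | 24 | 21 | 9 | 30.1%
Grok 4 Fast | 41 | 37 | 35 | 20 | 17 | 30.0%
Grok 4 | 40 | 35 | 27 | 24 | 22 | 29.7%
Gemini 2.5 Pro | 41 | 39 | 37 | 16 | 14 | 29.4%
GPT-4.1 Mini | 49 | 42 | 28 | 27 | 0 | 29.4%
Mistral NeMO | 58 | 43 | 43 | 0 | 0 | 28.8%
Llama 3.1 8B | 52 | 39 | 37 | 16 | 0 | 28.6%
DeepSeek V3 (2024-12-26) | 64 | 32 | 32 | 8 | 0 | 27.1%
o4 Mini | 34 | 28 | 28 | 24 | 12 | 25.2%
Ministral 3 8B | 42 | 29 | 21 | 14 | 10 | 23.4%
Ministral 3B | 48 | 35 | 33 | 0 | 0 | 23.3%
GPT-4.1 Nano | 35 | 26 | 23 | 20 | 9 | 22.5%
DeepSeek-V2 Chat | 41 | 26 | 22 | 21 | 0 | 22.1%
Rocinante 12B | 50 | 32 | 24 | 0 | 0 | 21.1%
Qwen 2.5 72B | 46 | 37 | 18 | 0 | 0 | 20.2%
Gemma 3 12B | 30 | 24 | 16 | 5 | 0 | 15.1%
GPT-4o, Aug. 6th (temp=1) | 31 | 16 | 13 | 6 | 0 | 13.2%
Gemini 2.5 Flash Lite | 22 | 22 | 0 | 0 | 0 | 8.7%
Llama 3.1 70B | 38 | 0 | 0 | 0 | 0 | 7.5%
Stealth: Aurora Alpha | 22 | 13 | 1 | 0 | 0 | 7.1%
Gemma 3 4B | 17 | 16 | 2 | 0 | 0 | 7.0%
Ministral 3 3B | 28 | 7 | 0 | 0 | 0 | 6.9%
Gemini 2.5 Flash | 17 | 17 | 0 | 0 | 0 | 6.8%
Mistral Small 3.2 24B | 29 | 2 | 0 | 0 | 0 | 6.2%
GPT-4o, May 13th (temp=1) | 22 | 6 | 0 | 0 | 0 | 5.7%
Arcee AI: Trinity Mini | 14 | 6 | 2 | 1 | 0 | 4.7%
GPT-4o, Aug. 6th (temp=0) | 10 | 5 | 3 | 0 | 0 | 3.6%
Claude 3 Haiku | 18 | 0 | 0 | 0 | 0 | 3.5%
GPT-4o Mini (temp=1) | 4 | 0 | 0 | 0 | 0 | 0.8%
GPT-4o Mini (temp=0) | 1 | 0 | 0 | 0 | 0 | 0.2%
GPT-4o, May 13th (temp=0) | 1 | 0 | 0 | 0 | 0 | 0.1%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%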
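
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Claude Sonnet 4.6 | 96 | 95 | 84 | 81 | 81 | 87.4%
GPT-5 | 86 | 83 | 80 | 73 | 72 | 78.8%
GPT-5 Mini | 86 | 83 | 80 | 69 | 68 | 77.4%
Claude Opus 4.6 | 83 | 79 | 78 | 78 | 67 | 77.1%
ByteDance Seed 1.6 Flash | 88 | 81 | 79 | 73 | 62 | 76.7%
GPT-5.1 | 81 | 81 | 73 | 69 | 68 | 74.4%
Claude Sonnet 4.5 | 83 | 75 | 74 | 69 | 68 | 74.0%
Minimax M2.5 | 82 | 79 | 78 | 69 | 54 | 72.4%
GPT-5.2 | 83 | 80 | 71 | 65 | 62 | 72.2%
ByteDance Seed 1.6 | 77 | 74 | 74 | 71 | 57 | 70.4%
Claude Haiku 4.5 | 86 | 70 | 69 | 61 | 51 | 67.3%
Claude Opus 4 | 83 | 81 | 63 | 60 | 43 | 66.0%
Claude Opus 4.5 | 71 | 70 | 64 | 63 | 62 | 66.0%
Z.AI GLM 4.7 Flash | 71 | 67 | 56 | 55 | 53 | 60.4%
Z.AI GLM 5 | 67 | 66 | 63 | 63 | 43 | 60.3%
Rocinante 12B | 68 | 68 | 57 | 56 | 49 | 59.7%
Claude Sonnet 4 | 76 | 65 | 64 | 64 | 27 | 59.1%
Claude 3.7 Sonnet | 74 | 72 | 59 | 58 | 32 | 59.1%
GPT-5 Nano | 59 | 58 | 57 | 56 | 55 | 57.0%
WizardLM 2 8x22b | 63 | 59 | 56 | 55 | 51 | 57.0%
DeepSeek V3.2 | 68 | 60 | 59 | 58 | 40 | 56.9%
MoonshotAI: Kimi K2.5 | 71 | 64 | 57 | 51 | 39 | 56.4%
Mistral Medium 3.1 | 71 | 69 | 64 | 39 | 35 | 55.5%
Z.AI GLM 4.7 | 55 | 53 | 50 | 49 | 46 | 50.8%
Mistral Large 3 | 70 | 63 | 62 | 47 | 11 | 50.7%
GPT-4.1 | 66 | 64 | 41 | 41 | 39 | 50.2%
Z.AI GLM 4.5 | 62 | 50 | 49 | 46 | 44 | 50.0%
Qwen 3.5 397B A17B | 71 | 56 | 47 | 35 | 34 | 48.9%
Gemini 3 Flash (Preview) | 59 | 52 | 48 | 41 | 38 | 47.6%
Ministral 3 8B | 60 | 52 | 52 | 45 | 29 | 47.5%
Writer: Palmyra X5 | 64 | 58 | 38 | 37 | 36 | 46.7%
Ministral 8B | 60 | 49 | 42 | 40 | 38 | 46.0%
Mistral Small Creative | 75 | 43 | 37 | 36 | 35 | 45.2%
GPT-4.1 Mini | 59 | 47 | 42 | 35 | 33 | 43.0%
Z.AI GLM 4.6 | 57 | 46 | 41 | 41 | 29 | 42.7%
Mistral Large | 68 | 49 | 37 | 36 | 20 | 42.3%
Mistral Large 2 | 66 | 66 | 42 | 36 | 0 | 41.9%
Gemini 3 Pro (Preview) | 61 | 55 | 43 | 38 | 11 | 41.7%
Cohere Command R+ (Aug. 2024) | 63 | 49 | 42 | 40 | 4 | 39.6%
Llama 3.1 70B | 92 | 55 | 17 | 17 | 15 | 39.4%
Gemini 3.1 Pro (Preview) | 75 | 35 | 33 | 26 | 23 | 38.3%
Llama 3.1 8B | 64 | 57 | 38 | 30 | 0 | 37.8%
Arcee AI: Trinity Mini | 64 | 53 | 42 | 26 | 0 | 37.0%
DeepSeek V3.1 | 57 | 55 | 46 | 27 | 0 | 36.7%
o4 Mini | 50 | 42 | 38 | 32 | 20 | 36.5%
Gemini 2.5 Flash | 49 | 47 | 43 | 40 | 1 | 36.2%
Hermes 3 70B | 57 | 37 | 35 | 33 | 11 | 34.6%
Gemma 3 27B | 44 | 41 | 38 | 31 | 18 | 34.4%
Claude 3.5 Sonnet | 52 | 49 | 37 | 19 | 12 | 33.8%
GPT-4.1 Nano | 49 | 41 | 33 | 18 | 18 | 31.9%
Gemini 2.5 Pro | 43 | 40 | 29 | 21 | 20 | 30.6%
DeepSeek V3 (2025-03-24) | 46 | 38 | 34 | 18 | 17 | 30.5%
DeepSeek-V2 Chat | 46 | 37 | 35 | 24 | 10 | 30.5%
Qwen 3.5 Plus (2026-02-15) | 46 | 39 | 36 | 19 | 10 | 29.8%
Arcee AI: Trinity Large (Preview) | 58 | 46 | 45 | 0 | 0 | 29.8%
GPT-4o, May 13th (temp=0) | 42 | 40 | 34 | 25 | 5 | 29.2%
Hermes 3 405B | 46 | 35 | 28 | 20 | 15 | 28.7%
Claude 3 Haiku | 46 | 42 | 30 | 19 | 6 | 28.6%
DeepSeek V3 (2024-12-26) | 37 | 36 | 32 | 27 | 0 | 26.3%
Grok 4.1 Fast | 41 | 31 | 27 | 21 | 10 | 26.1%
Ministral 3B | 41 | 31 | 31 | 26 | 0 | 25.8%
Mistral Small 3.2 24B | 98 | 27 | 0 | 0 | 0 | 25.0%
Gemma 3 12B | 50 | 35 | 22 | 14 | 0 | 24.1%
Grok 4 | 32 | 30 | 25 | 22 | 9 | 23.7%
Ministral 3 14B | 42 | 39 | 16 | 16 | 4 | 23.3%
GPT-4o, Aug. 6th (temp=0) | 51 | 30 | 18 | 14 | 0 | 22.7%
Llama 3.1 Nemotron 70B | 58 | 25 | 17 | 13 | 0 | 22.6%
Grok 4 Fast | 36 | 33 | 30 | 14 | 0 | 22.5%
GPT-4o, May 13th (temp=1) | 38 | 35 | 25 | 6 | 3 | 21.3%
o4 Mini High | 51 | 23 | 16 | 9 | 0 | 19.8%
Gemini 2.5 Flash Lite | 41 | 10 | 10 | 8 | 5 | 14.9%
Qwen 2.5 72B | 24 | 15 | 10 | 9 | 1 | 11.9%
Ministral 3 3B | 27 | 22 | 8 | 0 | 0 | 11.5%
Stealth: Aurora Alpha | 24 | 14 | 12 | 8 | 0 | 11.4%
GPT-4o Mini (temp=1) | 27 | 12 | 0 | 0 | 0 | 7.7%
Gemma 3 4B | 13 | 6 | 5 | 4 | 0 | 5.6%
Mistral NeMO | 20 | 8 | 0 | 0 | 0 | 5.6%
Claude 3.5 Haiku | 19 | 2 | 0 | 0 | 0 | 4.3%
GPT-4o Mini (temp=0) | 7 | 0 | 0 | 0 | 0 | 1.5%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%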
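
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Claude Sonnet 4.6 | 100 | 90 | 79 | 74 | 74 | 83.5%
GPT-5 | 84 | 83 | 78 | 77 | 77 | 79.8%
GPT-5 Mini | 84 | 82 | 78 | 78 | 69 | 78.2%
ByteDance Seed 1.6 Flash | 84 | 79 | 77 | 72 | 67 | 75.7%
ByteDance Seed 1.6 | 89 | 75 | 70 | 69 | 64 | 73.2%
Minimax M2.5 | 81 | 76 | 76 | 75 | 58 | 73.2%
Claude Opus 4.6 | 82 | 82 | 71 | 68 | 60 | 72.5%
GPT-5.1 | 75 | 70 | 68 | 66 | 54 | 66.7%
Qwen 3.5 397B A17B | 77 | 70 | 67 | 59 | 55 | 65.6%
Claude Opus 4.5 | 80 | 72 | 67 | 62 | 46 | 65.5%
GPT-5.2 | 69 | 68 | 65 | 63 | 60 | 64.9%
Writer: Palmyra X5 | 74 | 72 | 72 | 63 | 42 | 64.7%
Claude Haiku 4.5 | 78 | 78 | 66 | 57 | 44 | 64.6%
Rocinante 12B | 82 | 80 | 70 | 62 | 23 | 63.3%
Z.AI GLM 4.7 | 68 | 68 | 68 | 57 | 51 | 62.4%
Claude Sonnet 4.5 | 84 | 73 | 56 | 53 | 46 | 62.3%
Mistral Large | 77 | 65 | 59 | 55 | 52 | 61.4%
Z.AI GLM 5 | 71 | 60 | 60 | 58 | 50 | 59.8%
Claude Opus 4 | 71 | 65 | 57 | 56 | 40 | 57.7%
Mistral Large 3 | 72 | 55 | 54 | 54 | 49 | 56.5%
Mistral Small Creative | 66 | 64 | 59 | 47 | 46 | 56.3%
MoonshotAI: Kimi K2.5 | 77 | 63 | 50 | 47 | 40 | 55.1%
WizardLM 2 8x22b | 68 | 62 | 55 | 53 | 35 | 54.8%
GPT-5 Nano | 63 | 57 | 51 | 51 | 49 | 54.1%
Z.AI GLM 4.5 | 70 | 66 | 58 | 42 | 33 | 53.9%
Ministral 3 14B | 78 | 76 | 47 | 36 | 32 | 53.8%
Gemini 3 Flash (Preview) | 65 | 59 | 52 | 45 | 41 | 52.4%
Mistral Large 2 | 81 | 60 | 50 | 39 | 31 | 52.4%
GPT-4.1 | 72 | 52 | 46 | 45 | 43 | 51.5%
DeepSeek V3.2 | 52 | 50 | 49 | 47 | 41 | 47.8%
Gemini 3 Pro (Preview) | 72 | 57 | 43 | 42 | 23 | 47.6%
DeepSeek V3 (2025-03-24) | 68 | 58 | 45 | 35 | 30 | 47.2%
Mistral Medium 3.1 | 73 | 57 | 43 | 39 | 23 | 47.0%
Z.AI GLM 4.6 | 60 | 59 | 47 | 34 | 34 | 46.9%
DeepSeek V3.1 | 64 | 62 | 51 | 29 | 28 | 46.9%
Qwen 3.5 Plus (2026-02-15) | 73 | 53 | 41 | 37 | 25 | 46.0%
DeepSeek V3 (2024-12-26) | 64 | 52 | 46 | 38 | 25 | 44.9%
Z.AI GLM 4.7 Flash | 51 | 48 | 39 | 39 | 32 | 41.9%
Claude Sonnet 4 | 60 | 54 | 33 | 27 | 18 | 38.3%
Claude 3.7 Sonnet | 62 | 38 | 38 | 27 | 21 | 37.1%
Claude 3.5 Sonnet | 64 | 55 | 28 | 18 | 15 | 36.1%
Ministral 3 8B | 58 | 49 | 46 | 19 | 8 | 35.8%
Hermes 3 70B | 64 | 44 | 39 | 30 | 0 | 35.4%
Hermes 3 405B | 66 | 38 | 36 | 26 | 9 | 34.8%
Grok 4 | 51 | 40 | 30 | 29 | 23 | 34.8%
o4 Mini | 52 | 46 | 26 | 26 | 25 | 34.7%
Ministral 8B | 48 | 43 | 38 | 22 | 9 | 32.0%
Cohere Command R+ (Aug. 2024) | 64 | 58 | 28 | 10 | 0 | 31.9%
o4 Mini High | 49 | 43 | 25 | 22 | 8 | 29.6%
Gemini 2.5 Pro | 44 | 39 | 38 | 17 | 10 | 29.5%
DeepSeek-V2 Chat | 41 | 34 | 33 | 25 | 12 | 29.0%
Llama 3.1 8B | 77 | 63 | 5 | 0 | 0 | 28.9%
Gemini 3.1 Pro (Preview) | 59 | 43 | 20 | 5 | 0 | 25.5%
Grok 4.1 Fast | 32 | 29 | 26 | 25 | 13 | 25.2%
Ministral 3 3B | 46 | 40 | 32 | 0 | 0 | 23.7%
Grok 4 Fast | 42 | 33 | 17 | 16 | 11 | 23.7%
Gemma 3 27B | 32 | 29 | 25 | 25 | 1 | 22.3%
Arcee AI: Trinity Large (Preview) | 55 | 22 | 20 | 5 | 0 | 20.5%
Stealth: Aurora Alpha | 26 | 22 | 20 | 19 | 14 | 20.3%
Mistral NeMO | 51 | 23 | 20 | 6 | 1 | 20.2%
Arcee AI: Trinity Mini | 47 | 22 | 17 | 4 | 2 | 18.4%
Qwen 2.5 72B | 47 | 21 | 15 | 7 | 0 | 17.9%
Gemma 3 4B | 44 | 29 | 14 | 0 | 0 | 17.3%
Claude 3.5 Haiku | 70 | 13 | 0 | 0 | 0 | 16.6%
Llama 3.1 70B | 77 | 0 | 0 | 0 | 0 | 15.4%
Gemma 3 12B | 28 | 22 | 19 | 6 | 1 | 15.1%
GPT-4o, May 13th (temp=0) | 38 | 15 | 14 | 0 | 0 | 13.4%
Gemini 2.5 Flash | 29 | 23 | 14 | 1 | 0 | 13.3%
GPT-4o, Aug. 6th (temp=1) | 24 | 22 | 0 | 0 | 0 | 9.3%
GPT-4.1 Nano | 24 | 10 | 9 | 0 | 0 | 8.5%
GPT-4.1 Mini | 26 | 10 | 5 | 0 | 0 | 8.3%
Mistral Small 3.2 24B | 22 | 10 | 0 | 0 | 0 | 6.5%
Claude 3 Haiku | 21 | 5 | 0 | 0 | 0 | 5.3%
Gemini 2.5 Flash Lite | 14 | 2 | 0 | 0 | 0 | 3.1%
GPT-4o, May 13th (temp=1) | 10 | 3 | 0 | 0 | 0 | 2.6%
Ministral 3B | 11 | 0 | 0 | 0 | 0 | 2.2%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%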
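
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Claude Sonnet 4.6 | 90 | 86 | 80 | 76 | 76 | 81.7%
GPT-5 | 85 | 84 | 79 | 75 | 69 | 78.6%
GPT-5 Mini | 86 | 79 | 78 | 68 | 67 | 75.7%
GPT-5.2 | 77 | 75 | 74 | 73 | 66 | 72.9%
ByteDance Seed 1.6 Flash | 86 | 77 | 72 | 64 | 57 | 71.4%
Claude Opus 4.5 | 76 | 76 | 70 | 69 | 65 | 71.3%
Claude Opus 4.6 | 82 | 77 | 72 | 67 | 53 | 70.2%
Minimax M2.5 | 84 | 72 | 70 | 67 | 53 | 69.1%
Claude Opus 4 | 82 | 70 | 66 | 59 | 54 | 66.1%
GPT-5.1 | 74 | 69 | 69 | 61 | 47 | 63.8%
Z.AI GLM 4.7 | 75 | 66 | 66 | 58 | 54 | 63.8%
Claude Sonnet 4.5 | 69 | 66 | 62 | 55 | 53 | 61.1%
WizardLM 2 8x22b | 89 | 59 | 59 | 52 | 37 | 59.3%
ByteDance Seed 1.6 | 62 | 61 | 59 | 57 | 56 | 59.0%
Z.AI GLM 5 | 75 | 63 | 54 | 51 | 37 | 56.1%
Claude 3.7 Sonnet | 67 | 65 | 65 | 58 | 24 | 55.8%
Writer: Palmyra X5 | 67 | 61 | 54 | 53 | 37 | 54.4%
Qwen 3.5 397B A17B | 64 | 63 | 62 | 47 | 34 | 53.9%
DeepSeek V3.2 | 66 | 65 | 57 | 48 | 33 | 53.8%
Claude Sonnet 4 | 59 | 57 | 56 | 52 | 45 | 53.7%
Mistral Small Creative | 64 | 63 | 52 | 46 | 40 | 52.9%
Llama 3.1 8B | 64 | 63 | 54 | 52 | 31 | 52.9%
Z.AI GLM 4.7 Flash | 78 | 65 | 50 | 39 | 32 | 52.6%
Z.AI GLM 4.5 | 73 | 50 | 49 | 45 | 41 | 51.5%
Z.AI GLM 4.6 | 71 | 65 | 46 | 46 | 21 | 49.7%
Gemini 3 Pro (Preview) | 70 | 64 | 38 | 36 | 34 | 48.4%
GPT-5 Nano | 51 | 50 | 48 | 47 | 45 | 48.1%
Claude Haiku 4.5 | 66 | 55 | 46 | 39 | 33 | 47.7%
DeepSeek V3 (2025-03-24) | 64 | 50 | 45 | 43 | 21 | 44.6%
Mistral Large 2 | 61 | 47 | 39 | 37 | 35 | 43.7%
Gemini 3.1 Pro (Preview) | 63 | 52 | 37 | 33 | 32 | 43.2%
MoonshotAI: Kimi K2.5 | 85 | 46 | 37 | 36 | 10 | 42.9%
Qwen 3.5 Plus (2026-02-15) | 59 | 58 | 44 | 27 | 25 | 42.7%
Gemini 3 Flash (Preview) | 52 | 49 | 46 | 35 | 30 | 42.6%
DeepSeek V3.1 | 54 | 43 | 39 | 36 | 33 | 41.1%
Claude 3.5 Sonnet | 73 | 52 | 42 | 24 | 11 | 40.4%
Mistral Medium 3.1 | 55 | 49 | 35 | 33 | 29 | 39.8%
Ministral 8B | 58 | 56 | 37 | 23 | 19 | 38.5%
Arcee AI: Trinity Large (Preview) | 65 | 62 | 27 | 21 | 9 | 36.7%
Cohere Command R+ (Aug. 2024) | 66 | 43 | 36 | 33 | 0 | 35.7%
Rocinante 12B | 78 | 50 | 22 | 16 | 0 | 33.2%
GPT-4.1 | 40 | 38 | 31 | 25 | 24 | 31.5%
Hermes 3 70B | 44 | 42 | 27 | 26 | 14 | 30.5%
Ministral 3 8B | 85 | 42 | 22 | 2 | 0 | 30.1%
Mistral Large | 59 | 59 | 13 | 10 | 3 | 28.8%
Ministral 3 14B | 42 | 38 | 27 | 15 | 10 | 26.3%
Gemma 3 27B | 50 | 35 | 31 | 11 | 4 | 26.1%
DeepSeek-V2 Chat | 49 | 28 | 21 | 14 | 5 | 23.5%
Mistral Large 3 | 39 | 35 | 14 | 11 | 8 | 21.6%
Gemma 3 12B | 34 | 24 | 23 | 14 | 11 | 21.2%
o4 Mini High | 38 | 33 | 20 | 12 | 0 | 20.7%
Gemini 2.5 Pro | 34 | 21 | 20 | 14 | 13 | 20.4%
Ministral 3 3B | 50 | 28 | 16 | 9 | 0 | 20.3%
DeepSeek V3 (2024-12-26) | 40 | 32 | 15 | 12 | 0 | 19.9%
GPT-4o, May 13th (temp=0) | 43 | 31 | 22 | 0 | 0 | 19.3%
Claude 3 Haiku | 25 | 22 | 22 | 17 | 3 | 17.8%
GPT-4.1 Mini | 40 | 33 | 15 | 0 | 0 | 17.6%
Ministral 3B | 34 | 26 | 19 | 4 | 0 | 16.5%
Grok 4 | 25 | 20 | 18 | 11 | 2 | 15.2%
Gemma 3 4B | 42 | 11 | 9 | 9 | 5 | 15.1%
GPT-4.1 Nano | 37 | 19 | 18 | 1 | 0 | 14.9%
Arcee AI: Trinity Mini | 32 | 16 | 15 | 9 | 0 | 14.4%
Llama 3.1 Nemotron 70B | 32 | 17 | 15 | 4 | 4 | 14.3%
Grok 4 Fast | 23 | 22 | 13 | 9 | 4 | 14.3%
Claude 3.5 Haiku | 54 | 8 | 0 | 0 | 0 | 12.4%
Hermes 3 405B | 31 | 15 | 11 | 1 | 0 | 11.7%
Mistral Small 3.2 24B | 30 | 28 | 0 | 0 | 0 | 11.6%
Qwen 2.5 72B | 30 | 14 | 8 | 0 | 0 | 10.5%
Llama 3.1 70B | 32 | 13 | 5 | 0 | 0 | 10.1%
Grok 4.1 Fast | 17 | 15 | 14 | 3 | 0 | 9.5%
o4 Mini | 25 | 19 | 1 | 0 | 0 | 8.9%
Gemini 2.5 Flash Lite | 13 | 3 | 2 | 0 | 0 | 3.6%
GPT-4o Mini (temp=1) | 17 | 0 | 0 | 0 | 0 | 3.3%
GPT-4o, May 13th (temp=1) | 5 | 3 | 3 | 0 | 0 | 2.2%
Mistral NeMO | 9 | 0 | 0 | 0 | 0 | 1.9%
GPT-4o Mini (temp=0) | 5 | 0 | 0 | 0 | 0 | 1.0%
Gemini 2.5 Flash | 5 | 0 | 0 | 0 | 0 | 1.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%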