Purple prose (modifier overload)

Test: Bad Writing Habits

Avg. Score: 95.7%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Stealth: Aurora Alpha | 99.3% | $0.0000 | 9.8s | 95%
2 | ByteDance Seed 1.6 Flash | 99.1% | $0.0013 | 27.3s | 95%
3 | GPT-4o Mini (temp=0) | 99.0% | $0.0012 | 34.8s | 95%
4 | Grok 4 Fast | 98.3% | $0.0017 | 24.1s | 94%
5 | Mistral NeMO | 97.8% | $0.0005 | 10.1s | 93%
6 | GPT-4o, Aug. 6th (temp=0) | 98.6% | $0.023 | 22.7s | 95%
7 | Ministral 3B | 97.7% | $0.0001 | 8.1s | 91%
8 | Qwen 2.5 72B | 98.0% | $0.0010 | 36.7s | 92%
9 | Mistral Large | 98.0% | $0.014 | 30.9s | 93%
10 | o4 Mini | 97.8% | $0.015 | 25.7s | 93%
11 | Arcee AI: Trinity Mini | 96.6% | $0.0003 | 9.2s | 91%
12 | Grok 4.1 Fast | 97.5% | $0.0018 | 37.8s | 92%
13 | GPT-4o Mini (temp=1) | 97.6% | $0.0012 | 34.8s | 91%
14 | Hermes 3 70B | 97.6% | $0.0010 | 1.2m | 93%
15 | GPT-5 Mini | 97.6% | $0.0100 | 57.4s | 93%
16 | Arcee AI: Trinity Large (Preview) | 97.0% | $0.0000 | 43.6s | 91%
17 | Cohere Command R+ (Aug. 2024) | 97.9% | $0.020 | 52.5s | 92%
18 | Mistral Large 3 | 97.0% | $0.0033 | 30.3s | 90%
19 | o4 Mini High | 97.8% | $0.025 | 47.2s | 93%
20 | DeepSeek V3 (2025-03-24) | 97.0% | $0.0014 | 39.4s | 90%
21 | Ministral 8B | 96.0% | $0.0004 | 10.4s | 89%
22 | Mistral Small Creative | 95.9% | $0.0007 | 9.1s | 89%
23 | Qwen 3.5 Plus (2026-02-15) | 96.3% | $0.0060 | 31.5s | 90%
24 | Mistral Large 2 | 96.7% | $0.013 | 29.4s | 90%
25 | Claude 3 Haiku | 96.6% | $0.0025 | 14.9s | 87%
26 | Ministral 3 14B | 96.0% | $0.0007 | 11.7s | 87%
27 | GPT-4o, Aug. 6th (temp=1) | 96.9% | $0.018 | 24.4s | 88%
28 | Llama 3.1 Nemotron 70B | 96.0% | $0.0038 | 31.7s | 88%
29 | Grok 4 | 99.0% | $0.048 | 1.7m | 94%
30 | Z.AI GLM 5 | 97.2% | $0.0084 | 1.2m | 89%
31 | Ministral 3 3B | 95.9% | $0.0005 | 11.1s | 86%
32 | Ministral 3 8B | 95.6% | $0.0008 | 19.6s | 87%
33 | Mistral Medium 3.1 | 95.8% | $0.0048 | 36.5s | 89%
34 | Hermes 3 405B | 96.5% | $0.0032 | 53.2s | 88%
35 | GPT-4o, May 13th (temp=0) | 96.4% | $0.035 | 14.1s | 89%
36 | DeepSeek V3 (2024-12-26) | 96.1% | $0.0021 | 54.6s | 88%
37 | DeepSeek-V2 Chat | 95.9% | $0.0021 | 53.3s | 88%
38 | WizardLM 2 8x22b | 96.9% | $0.0026 | 1.8m | 91%
39 | GPT-4.1 | 96.2% | $0.018 | 44.7s | 88%
40 | Claude 3.5 Sonnet | 97.2% | $0.048 | 35.5s | 89%
41 | Rocinante 12B | 95.9% | $0.0014 | 38.4s | 86%
42 | Z.AI GLM 4.7 Flash | 96.2% | $0.0017 | 1.2m | 88%
43 | Claude Haiku 4.5 | 95.1% | $0.011 | 21.6s | 87%
44 | Writer: Palmyra X5 | 95.4% | $0.011 | 22.0s | 86%
45 | Claude Sonnet 4.5 | 96.0% | $0.035 | 38.1s | 89%
46 | Minimax M2.5 | 95.9% | $0.0034 | 1.3m | 88%
47 | Qwen 3.5 397B A17B | 97.6% | $0.014 | 3.0m | 94%
48 | ByteDance Seed 1.6 | 97.4% | $0.013 | 2.5m | 91%
49 | Claude Sonnet 4 | 95.8% | $0.032 | 43.7s | 87%
50 | Claude Opus 4.6 | 97.2% | $0.078 | 1.2m | 92%
51 | MoonshotAI: Kimi K2.5 | 97.8% | $0.019 | 3.2m | 92%
52 | Z.AI GLM 4.7 | 95.9% | $0.010 | 1.4m | 86%
53 | Z.AI GLM 4.6 | 94.8% | $0.0065 | 51.5s | 85%
54 | Llama 3.1 70B | 95.5% | $0.0015 | 29.4s | 80%
55 | GPT-5 Nano | 95.0% | $0.0042 | 1.4m | 86%
56 | Z.AI GLM 4.5 | 94.6% | $0.0051 | 42.1s | 83%
57 | GPT-4o, May 13th (temp=1) | 94.9% | $0.033 | 14.4s | 83%
58 | GPT-5.2 | 96.3% | $0.056 | 1.5m | 90%
59 | GPT-4.1 Mini | 93.3% | $0.0027 | 19.0s | 83%
60 | Gemini 2.5 Flash Lite | 93.7% | $0.0009 | 9.5s | 81%
61 | Gemini 3 Flash (Preview) | 94.0% | $0.0078 | 19.6s | 81%
62 | Gemini 3 Pro (Preview) | 95.5% | $0.055 | 54.4s | 85%
63 | Claude Sonnet 4.6 | 94.1% | $0.031 | 39.3s | 83%
64 | DeepSeek V3.2 | 94.9% | $0.0014 | 1.9m | 83%
65 | Gemini 2.5 Pro | 94.3% | $0.036 | 36.2s | 82%
66 | Claude 3.5 Haiku | 93.3% | $0.0035 | 10.8s | 77%
67 | GPT-5 | 97.2% | $0.065 | 2.8m | 91%
68 | Gemma 3 12B | 92.9% | $0.0004 | 41.3s | 79%
69 | Gemini 2.5 Flash | 91.5% | $0.0052 | 10.6s | 80%
70 | Claude Opus 4.5 | 94.8% | $0.070 | 53.4s | 84%
71 | Claude 3.7 Sonnet | 93.4% | $0.042 | 46.7s | 82%
72 | DeepSeek V3.1 | 93.2% | $0.0020 | 1.8m | 81%
73 | GPT-5.1 | 94.4% | $0.054 | 1.8m | 82%
74 | Gemma 3 27B | 91.7% | $0.0006 | 52.6s | 75%
75 | Claude Opus 4 | 96.7% | $0.209 | 1.4m | 90%
76 | Gemma 3 4B | 88.7% | $0.0002 | 20.0s | 71%
77 | GPT-4.1 Nano | 87.5% | $0.0007 | 13.3s | 73%
78 | Mistral Small 3.2 24B | 95.3% | $0.0069 | 5.7m | 83%
79 | Llama 3.1 8B | 87.7% | $0.0003 | 1.3m | 66%
80 | Gemini 3.1 Pro (Preview) | 86.8% | $0.107 | 1.8m | 73%

Average score: 95.73%

Individual Scenarios

Detailed Writing Rules

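Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 98 | 97 | 99.0%
Z.AI GLM 4.5 | 100 | 100 | 99 | 98 | 97 | 98.9%
o4 Mini High | 100 | 100 | 100 | 100 | 94 | 98.8%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
Ministral 3 14B | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral NeMO | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Medium 3.1 | 100 | 100 | 100 | 99 | 94 | 98.6%
Claude Opus 4 | 100 | 100 | 100 | 98 | 94 | 98.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 96 | 94 | 98.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 96 | 94 | 98.0%
Ministral 8B | 100 | 100 | 100 | 100 | 89 | 97.9%
Ministral 3B | 100 | 100 | 100 | 99 | 90 | 97.8%
Mistral Large 3 | 100 | 100 | 100 | 94 | 94 | 97.7%
GPT-5 Mini | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude Opus 4.6 | 100 | 100 | 100 | 94 | 94 | 97.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 88 | 97.6%
GPT-5 | 100 | 100 | 100 | 94 | 94 | 97.6%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 88 | 97.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 94 | 94 | 97.6%
Hermes 3 70B | 100 | 100 | 100 | 94 | 90 | 96.8%
Llama 3.1 8B | 100 | 100 | 96 | 95 | 92 | 96.5%
o4 Mini | 100 | 100 | 94 | 94 | 94 | 96.4%
GPT-5.2 | 100 | 100 | 94 | 94 | 94 | 96.4%
Z.AI GLM 4.7 | 100 | 100 | 100 | 94 | 88 | 96.4%
ByteDance Seed 1.6 | 100 | 100 | 94 | 94 | 94 | 96.4%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 82 | 96.4%
DeepSeek V3 (2025-03-24) | 100 | 100 | 94 | 94 | 94 | 96.4%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 94 | 88 | 96.4%
GPT-4.1 | 100 | 100 | 96 | 94 | 92 | 96.3%
Ministral 3 8B | 100 | 100 | 100 | 94 | 87 | 96.2%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 94 | 86 | 96.1%
Hermes 3 405B | 100 | 100 | 100 | 100 | 80 | 96.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 94 | 83 | 95.5%
Gemma 3 12B | 100 | 95 | 94 | 94 | 94 | 95.4%
Gemini 3 Pro (Preview) | 100 | 94 | 94 | 94 | 94 | 95.2%
Z.AI GLM 4.7 Flash | 100 | 94 | 94 | 94 | 94 | 95.2%
Qwen 3.5 Plus (2026-02-15) | 100 | 94 | 94 | 94 | 94 | 95.2%
Mistral Large 2 | 100 | 94 | 94 | 94 | 94 | 95.2%
Claude Sonnet 4.5 | 99 | 97 | 97 | 94 | 88 | 95.0%
Gemma 3 27B | 100 | 100 | 100 | 88 | 87 | 95.0%
Claude Sonnet 4.6 | 100 | 100 | 98 | 88 | 88 | 94.8%
Claude Sonnet 4 | 100 | 100 | 94 | 93 | 85 | 94.4%
Mistral Small Creative | 100 | 100 | 94 | 94 | 82 | 94.0%
Qwen 3.5 397B A17B | 100 | 94 | 94 | 94 | 88 | 94.0%
Qwen 2.5 72B | 100 | 100 | 94 | 88 | 88 | 94.0%
Gemini 2.5 Flash Lite | 100 | 98 | 94 | 90 | 88 | 93.9%
Claude 3.7 Sonnet | 100 | 100 | 100 | 89 | 78 | 93.4%
Claude Haiku 4.5 | 100 | 95 | 94 | 91 | 87 | 93.4%
Gemini 2.5 Flash | 100 | 94 | 93 | 92 | 88 | 93.4%
GPT-4.1 Nano | 100 | 94 | 93 | 91 | 89 | 93.4%
GPT-4.1 Mini | 94 | 94 | 94 | 94 | 90 | 93.1%
Minimax M2.5 | 100 | 98 | 97 | 92 | 77 | 92.9%
DeepSeek V3.2 | 100 | 100 | 100 | 82 | 82 | 92.8%
Claude Opus 4.5 | 100 | 96 | 94 | 92 | 78 | 92.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 60 | 92.0%
WizardLM 2 8x22b | 94 | 94 | 94 | 90 | 87 | 91.7%
Z.AI GLM 4.6 | 100 | 100 | 88 | 88 | 82 | 91.6%
Z.AI GLM 5 | 100 | 98 | 97 | 96 | 66 | 91.5%
GPT-5 Nano | 100 | 100 | 99 | 82 | 76 | 91.4%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 94 | 60 | 90.8%
Ministral 3 3B | 100 | 100 | 94 | 88 | 70 | 90.4%
Claude 3.5 Haiku | 100 | 100 | 100 | 78 | 73 | 90.2%
GPT-5.1 | 100 | 94 | 88 | 88 | 76 | 89.2%
Gemini 2.5 Pro | 100 | 88 | 88 | 88 | 82 | 89.2%
Gemini 3 Flash (Preview) | 100 | 94 | 88 | 82 | 76 | 88.0%
DeepSeek V3.1 | 100 | 88 | 88 | 88 | 70 | 86.8%
Gemini 3.1 Pro (Preview) | 94 | 94 | 82 | 82 | 76 | 85.6%
Gemma 3 4B | 92 | 92 | 86 | 83 | 74 | 85.4%
Rocinante 12B | 100 | 94 | 87 | 68 | 67 | 83.2%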
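
Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 98 | 99.5%
Claude 3 Haiku | 100 | 100 | 100 | 98 | 98 | 99.1%
Ministral 3 8B | 100 | 100 | 100 | 100 | 96 | 99.1%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 94 | 98.8%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 94 | 98.7%
Mistral Large | 100 | 100 | 100 | 100 | 93 | 98.6%
Ministral 3 14B | 100 | 100 | 100 | 98 | 91 | 97.9%
ByteDance Seed 1.6 | 100 | 100 | 100 | 95 | 94 | 97.8%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 96 | 93 | 97.7%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-5 | 100 | 100 | 100 | 94 | 94 | 97.6%
o4 Mini | 100 | 100 | 100 | 94 | 94 | 97.6%
Z.AI GLM 4.7 | 100 | 100 | 100 | 94 | 94 | 97.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 94 | 94 | 97.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 88 | 97.6%
Mistral Medium 3.1 | 100 | 100 | 100 | 94 | 93 | 97.5%
Hermes 3 70B | 100 | 100 | 99 | 94 | 94 | 97.3%
Claude Opus 4 | 100 | 100 | 99 | 98 | 89 | 97.2%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 96 | 94 | 94 | 96.9%
Rocinante 12B | 100 | 100 | 100 | 96 | 88 | 96.8%
DeepSeek V3.1 | 100 | 100 | 100 | 94 | 89 | 96.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 94 | 88 | 96.4%
Grok 4.1 Fast | 100 | 100 | 94 | 94 | 94 | 96.4%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 94 | 88 | 96.4%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 82 | 96.4%
Qwen 2.5 72B | 100 | 100 | 100 | 94 | 88 | 96.4%
Claude Sonnet 4 | 100 | 100 | 97 | 94 | 91 | 96.3%
Mistral Small 3.2 24B | 100 | 100 | 100 | 94 | 87 | 96.2%
Mistral NeMO | 100 | 100 | 100 | 100 | 81 | 96.2%
Mistral Small Creative | 100 | 100 | 98 | 92 | 91 | 96.2%
GPT-4o Mini (temp=1) | 100 | 99 | 95 | 94 | 91 | 95.9%
Claude Opus 4.5 | 100 | 98 | 96 | 94 | 91 | 95.8%
Mistral Large 2 | 100 | 100 | 95 | 94 | 89 | 95.6%
GPT-5.1 | 100 | 100 | 94 | 94 | 90 | 95.6%
Z.AI GLM 5 | 100 | 100 | 100 | 98 | 80 | 95.6%
o4 Mini High | 100 | 100 | 94 | 94 | 88 | 95.2%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 88 | 88 | 95.2%
Writer: Palmyra X5 | 100 | 100 | 100 | 94 | 81 | 95.0%
WizardLM 2 8x22b | 100 | 97 | 95 | 94 | 88 | 94.7%
Claude 3.5 Haiku | 100 | 100 | 100 | 88 | 85 | 94.6%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 73 | 94.5%
Claude 3.7 Sonnet | 100 | 95 | 95 | 93 | 89 | 94.4%
Minimax M2.5 | 100 | 100 | 94 | 93 | 85 | 94.2%
Claude Haiku 4.5 | 100 | 100 | 92 | 90 | 89 | 94.2%
DeepSeek V3 (2024-12-26) | 100 | 100 | 97 | 90 | 84 | 94.1%
Grok 4 | 100 | 100 | 100 | 94 | 77 | 94.1%
DeepSeek V3.2 | 100 | 100 | 94 | 94 | 82 | 94.0%
Llama 3.1 Nemotron 70B | 99 | 99 | 94 | 93 | 82 | 93.6%
Z.AI GLM 4.6 | 100 | 94 | 93 | 92 | 88 | 93.4%
Claude Sonnet 4.5 | 100 | 99 | 95 | 86 | 86 | 93.2%
Claude Sonnet 4.6 | 100 | 100 | 95 | 94 | 77 | 93.1%
GPT-4o, May 13th (temp=1) | 100 | 94 | 93 | 91 | 88 | 93.1%
Claude 3.5 Sonnet | 100 | 99 | 94 | 94 | 77 | 92.8%
Gemini 2.5 Pro | 100 | 94 | 94 | 88 | 87 | 92.6%
Gemini 3 Flash (Preview) | 94 | 94 | 94 | 92 | 88 | 92.5%
GPT-5.2 | 100 | 94 | 94 | 88 | 83 | 91.8%
Gemini 2.5 Flash Lite | 100 | 100 | 96 | 89 | 70 | 91.1%
GPT-4.1 Nano | 94 | 94 | 93 | 92 | 81 | 91.0%
Mistral Large 3 | 95 | 93 | 92 | 89 | 84 | 90.7%
GPT-4.1 | 100 | 94 | 94 | 91 | 74 | 90.6%
GPT-4.1 Mini | 96 | 94 | 94 | 86 | 82 | 90.4%
Z.AI GLM 4.5 | 95 | 94 | 93 | 87 | 83 | 90.2%
Gemma 3 4B | 100 | 98 | 96 | 80 | 76 | 89.9%
Llama 3.1 8B | 100 | 100 | 100 | 70 | 67 | 87.4%
DeepSeek-V2 Chat | 94 | 89 | 88 | 84 | 81 | 87.1%
Gemma 3 12B | 100 | 96 | 94 | 76 | 64 | 86.2%
Gemma 3 27B | 100 | 91 | 90 | 75 | 71 | 85.6%
Ministral 3 3B | 100 | 99 | 82 | 78 | 68 | 85.5%
Gemini 2.5 Flash | 91 | 89 | 86 | 81 | 77 | 84.6%
Gemini 3.1 Pro (Preview) | 100 | 88 | 82 | 82 | 70 | 84.4%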
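
Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 99.9%
Mistral Large 2 | 100 | 100 | 100 | 100 | 99 | 99.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 97 | 99.3%
Mistral Small Creative | 100 | 100 | 100 | 100 | 96 | 99.3%
Mistral Large 3 | 100 | 100 | 100 | 99 | 97 | 99.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 96 | 99.2%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 94 | 98.9%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 94 | 98.8%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral NeMO | 100 | 100 | 100 | 100 | 94 | 98.8%
WizardLM 2 8x22b | 100 | 100 | 100 | 99 | 94 | 98.7%
Claude Sonnet 4.5 | 100 | 100 | 100 | 99 | 94 | 98.5%
Ministral 3B | 100 | 100 | 100 | 100 | 92 | 98.5%
Ministral 8B | 100 | 100 | 100 | 100 | 91 | 98.2%
GPT-5 Nano | 100 | 100 | 99 | 97 | 95 | 98.2%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 97 | 93 | 98.1%
Mistral Large | 100 | 100 | 100 | 100 | 90 | 97.9%
Claude Sonnet 4.6 | 100 | 100 | 98 | 98 | 93 | 97.9%
Ministral 3 3B | 100 | 100 | 100 | 100 | 89 | 97.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 94 | 94 | 97.6%
o4 Mini | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-5.2 | 100 | 100 | 100 | 94 | 94 | 97.6%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 88 | 97.6%
GPT-4.1 | 100 | 100 | 100 | 94 | 94 | 97.6%
Gemini 2.5 Flash | 100 | 100 | 100 | 99 | 90 | 97.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 99 | 94 | 94 | 97.4%
Z.AI GLM 4.5 | 100 | 100 | 99 | 94 | 94 | 97.4%
Mistral Medium 3.1 | 100 | 100 | 100 | 94 | 93 | 97.3%
Hermes 3 405B | 100 | 100 | 100 | 94 | 93 | 97.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 99 | 88 | 97.2%
GPT-4.1 Mini | 100 | 100 | 100 | 94 | 93 | 97.2%
Z.AI GLM 4.7 | 100 | 100 | 97 | 94 | 94 | 97.0%
o4 Mini High | 100 | 100 | 94 | 94 | 94 | 96.4%
Claude Opus 4.6 | 100 | 100 | 94 | 94 | 94 | 96.4%
GPT-5.1 | 100 | 100 | 94 | 94 | 94 | 96.4%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 94 | 94 | 94 | 96.4%
Qwen 2.5 72B | 100 | 100 | 100 | 94 | 88 | 96.4%
Claude Sonnet 4 | 100 | 100 | 100 | 94 | 87 | 96.2%
Hermes 3 70B | 100 | 98 | 94 | 94 | 94 | 96.0%
Claude Opus 4 | 100 | 100 | 96 | 92 | 90 | 95.7%
Rocinante 12B | 100 | 100 | 100 | 91 | 87 | 95.5%
Writer: Palmyra X5 | 100 | 95 | 94 | 94 | 94 | 95.4%
Claude 3.5 Haiku | 100 | 94 | 94 | 94 | 94 | 95.2%
Z.AI GLM 4.6 | 100 | 94 | 94 | 94 | 94 | 95.2%
DeepSeek V3.2 | 100 | 100 | 100 | 94 | 82 | 95.2%
Llama 3.1 8B | 100 | 100 | 100 | 91 | 84 | 95.0%
Llama 3.1 70B | 100 | 100 | 100 | 88 | 86 | 94.7%
Gemma 3 12B | 100 | 100 | 98 | 91 | 85 | 94.6%
Gemini 3 Pro (Preview) | 100 | 100 | 94 | 88 | 88 | 94.0%
Ministral 3 8B | 99 | 95 | 94 | 94 | 87 | 93.9%
Gemini 2.5 Flash Lite | 100 | 95 | 94 | 93 | 88 | 93.9%
GPT-4o, May 13th (temp=0) | 100 | 94 | 94 | 94 | 87 | 93.7%
Ministral 3 14B | 100 | 99 | 96 | 93 | 77 | 93.1%
Minimax M2.5 | 98 | 95 | 92 | 90 | 90 | 93.1%
DeepSeek V3.1 | 100 | 94 | 94 | 89 | 88 | 93.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 88 | 76 | 92.8%
GPT-4o, May 13th (temp=1) | 99 | 96 | 96 | 94 | 77 | 92.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 98 | 82 | 74 | 90.8%
Claude Opus 4.5 | 98 | 92 | 92 | 90 | 79 | 90.5%
Gemini 2.5 Pro | 100 | 100 | 94 | 82 | 76 | 90.4%
Llama 3.1 Nemotron 70B | 99 | 94 | 94 | 94 | 71 | 90.3%
Claude 3 Haiku | 100 | 100 | 94 | 88 | 65 | 89.5%
GPT-4o Mini (temp=1) | 100 | 94 | 94 | 84 | 72 | 88.8%
Claude Haiku 4.5 | 99 | 95 | 86 | 82 | 78 | 88.0%
GPT-4.1 Nano | 91 | 90 | 85 | 81 | 75 | 84.3%
Gemini 3.1 Pro (Preview) | 88 | 88 | 82 | 82 | 76 | 83.2%
Gemma 3 27B | 93 | 92 | 83 | 78 | 54 | 80.0%
Gemma 3 4B | 92 | 84 | 82 | 79 | 50 | 77.4%
Mistral Small 3.2 24B | 100 | 82 | 66 | 62 | 59 | 73.7%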
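
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 99 | 99.9%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 99 | 99.9%
Z.AI GLM 4.5 | 100 | 100 | 100 | 99 | 99 | 99.6%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 97 | 99.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 97 | 99.4%
Claude Haiku 4.5 | 100 | 100 | 100 | 99 | 98 | 99.3%
Claude Opus 4 | 100 | 100 | 100 | 100 | 94 | 98.9%
GPT-5 Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
o4 Mini High | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 | 100 | 100 | 100 | 100 | 94 | 98.8%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 94 | 98.8%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 94 | 98.8%
Hermes 3 70B | 100 | 100 | 100 | 99 | 94 | 98.6%
Mistral Large 3 | 100 | 100 | 100 | 98 | 94 | 98.3%
Minimax M2.5 | 100 | 100 | 100 | 100 | 91 | 98.1%
Ministral 3 3B | 100 | 100 | 100 | 95 | 94 | 97.8%
Claude Sonnet 4.6 | 100 | 100 | 100 | 95 | 94 | 97.7%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 89 | 97.7%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-5.1 | 100 | 100 | 100 | 94 | 94 | 97.6%
Grok 4.1 Fast | 100 | 100 | 100 | 94 | 94 | 97.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 94 | 94 | 97.6%
WizardLM 2 8x22b | 100 | 100 | 100 | 94 | 94 | 97.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 97 | 91 | 97.6%
Writer: Palmyra X5 | 100 | 100 | 98 | 96 | 94 | 97.5%
Claude Opus 4.5 | 100 | 100 | 100 | 94 | 94 | 97.5%
Mistral Large 2 | 100 | 100 | 100 | 94 | 93 | 97.5%
Claude Sonnet 4.5 | 100 | 100 | 100 | 96 | 91 | 97.5%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 94 | 92 | 97.2%
Mistral Medium 3.1 | 100 | 100 | 99 | 95 | 91 | 96.9%
Ministral 8B | 100 | 100 | 97 | 94 | 92 | 96.8%
Rocinante 12B | 100 | 100 | 100 | 94 | 90 | 96.7%
Claude 3 Haiku | 100 | 100 | 97 | 94 | 92 | 96.5%
DeepSeek V3 (2024-12-26) | 100 | 99 | 99 | 94 | 90 | 96.4%
Z.AI GLM 4.7 | 100 | 100 | 94 | 94 | 94 | 96.4%
ByteDance Seed 1.6 | 100 | 100 | 94 | 94 | 94 | 96.4%
GPT-5 Nano | 100 | 100 | 94 | 94 | 94 | 96.4%
Grok 4 Fast | 100 | 100 | 94 | 94 | 94 | 96.4%
GPT-4o, May 13th (temp=0) | 100 | 100 | 94 | 94 | 94 | 96.4%
DeepSeek V3.1 | 100 | 100 | 94 | 94 | 94 | 96.4%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 94 | 88 | 96.4%
Ministral 3 14B | 100 | 100 | 98 | 92 | 91 | 96.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 81 | 96.2%
Claude Sonnet 4 | 100 | 100 | 96 | 93 | 88 | 95.5%
Z.AI GLM 4.6 | 100 | 100 | 94 | 94 | 88 | 95.2%
GPT-4.1 | 100 | 100 | 94 | 94 | 88 | 95.2%
DeepSeek V3.2 | 100 | 100 | 94 | 94 | 88 | 95.2%
Arcee AI: Trinity Large (Preview) | 100 | 98 | 98 | 94 | 85 | 95.0%
Gemma 3 12B | 100 | 100 | 99 | 92 | 83 | 95.0%
Claude 3.7 Sonnet | 100 | 100 | 95 | 94 | 85 | 94.9%
DeepSeek V3 (2025-03-24) | 100 | 98 | 94 | 93 | 88 | 94.6%
Z.AI GLM 4.7 Flash | 100 | 97 | 94 | 94 | 87 | 94.5%
Hermes 3 405B | 100 | 100 | 99 | 95 | 78 | 94.3%
Claude 3.5 Haiku | 100 | 100 | 100 | 99 | 70 | 93.9%
Ministral 3 8B | 100 | 100 | 94 | 88 | 87 | 93.7%
GPT-4.1 Mini | 100 | 97 | 94 | 88 | 87 | 93.2%
Gemini 2.5 Flash | 100 | 100 | 93 | 89 | 85 | 93.1%
Gemma 3 27B | 100 | 100 | 96 | 85 | 81 | 92.4%
GPT-4.1 Nano | 93 | 93 | 91 | 88 | 82 | 89.3%
Gemini 3 Flash (Preview) | 100 | 94 | 94 | 82 | 76 | 89.2%
Gemini 3.1 Pro (Preview) | 94 | 88 | 82 | 82 | 64 | 82.0%
Gemma 3 4B | 92 | 90 | 84 | 75 | 69 | 81.9%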
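
Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 99 | 99.9%
Mistral Large | 100 | 100 | 100 | 100 | 99 | 99.8%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 98 | 99.6%
Minimax M2.5 | 100 | 100 | 100 | 100 | 98 | 99.5%
o4 Mini High | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 | 100 | 100 | 100 | 100 | 94 | 98.8%
o4 Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 93 | 98.6%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 96 | 96 | 98.4%
Claude 3 Haiku | 100 | 100 | 100 | 98 | 93 | 98.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 91 | 98.3%
Claude Opus 4.6 | 100 | 100 | 100 | 97 | 94 | 98.2%
Ministral 3 14B | 100 | 100 | 100 | 96 | 94 | 97.9%
Ministral 3B | 100 | 100 | 100 | 96 | 94 | 97.9%
GPT-5.1 | 100 | 100 | 100 | 94 | 94 | 97.6%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 88 | 97.6%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude 3.5 Sonnet | 100 | 100 | 100 | 94 | 94 | 97.6%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 88 | 97.6%
Mistral NeMO | 100 | 100 | 99 | 94 | 94 | 97.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 99 | 94 | 94 | 97.3%
Claude Opus 4 | 100 | 100 | 99 | 93 | 93 | 97.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 97 | 88 | 96.9%
Claude Sonnet 4 | 100 | 100 | 100 | 94 | 90 | 96.8%
GPT-5 Nano | 100 | 100 | 98 | 94 | 92 | 96.8%
GPT-4.1 | 100 | 100 | 100 | 95 | 88 | 96.5%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 91 | 91 | 96.5%
Z.AI GLM 4.7 | 100 | 100 | 94 | 94 | 94 | 96.4%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 94 | 88 | 96.4%
Grok 4.1 Fast | 100 | 100 | 94 | 94 | 94 | 96.4%
Mistral Small Creative | 100 | 100 | 97 | 93 | 92 | 96.4%
Claude Sonnet 4.5 | 100 | 100 | 100 | 94 | 88 | 96.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 94 | 94 | 92 | 96.1%
Writer: Palmyra X5 | 100 | 100 | 100 | 94 | 85 | 95.9%
GPT-5 Mini | 100 | 100 | 98 | 94 | 88 | 95.9%
Mistral Large 3 | 100 | 100 | 95 | 94 | 90 | 95.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 94 | 84 | 95.7%
Hermes 3 405B | 100 | 100 | 96 | 94 | 88 | 95.5%
Gemini 2.5 Pro | 100 | 100 | 94 | 94 | 88 | 95.2%
GPT-5.2 | 100 | 100 | 94 | 94 | 88 | 95.2%
Z.AI GLM 4.6 | 100 | 94 | 94 | 94 | 94 | 95.2%
Mistral Small 3.2 24B | 100 | 94 | 94 | 94 | 94 | 95.2%
Arcee AI: Trinity Mini | 100 | 100 | 94 | 94 | 88 | 95.2%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 88 | 88 | 95.1%
Llama 3.1 Nemotron 70B | 100 | 98 | 95 | 92 | 90 | 94.9%
DeepSeek V3.1 | 100 | 94 | 94 | 94 | 92 | 94.7%
Rocinante 12B | 100 | 100 | 95 | 94 | 84 | 94.5%
Mistral Large 2 | 100 | 100 | 100 | 90 | 82 | 94.5%
WizardLM 2 8x22b | 100 | 100 | 96 | 89 | 88 | 94.4%
Ministral 3 8B | 100 | 100 | 94 | 94 | 84 | 94.4%
GPT-4.1 Mini | 100 | 100 | 94 | 87 | 85 | 93.1%
Mistral Medium 3.1 | 97 | 96 | 95 | 92 | 85 | 93.0%
Gemini 2.5 Flash Lite | 100 | 100 | 94 | 85 | 85 | 92.8%
Claude Opus 4.5 | 99 | 96 | 93 | 88 | 87 | 92.6%
Claude Sonnet 4.6 | 99 | 97 | 92 | 88 | 87 | 92.6%
Ministral 3 3B | 100 | 100 | 100 | 90 | 67 | 91.4%
Llama 3.1 70B | 100 | 93 | 90 | 90 | 83 | 91.0%
Gemini 2.5 Flash | 96 | 94 | 90 | 88 | 85 | 90.8%
Gemma 3 27B | 100 | 93 | 93 | 85 | 80 | 90.2%
Claude Haiku 4.5 | 96 | 91 | 90 | 90 | 80 | 89.4%
Gemma 3 12B | 100 | 94 | 92 | 81 | 79 | 89.0%
Ministral 8B | 100 | 99 | 87 | 86 | 73 | 88.9%
GPT-4.1 Nano | 99 | 87 | 86 | 86 | 83 | 88.3%
Claude 3.7 Sonnet | 100 | 98 | 84 | 81 | 79 | 88.2%
Z.AI GLM 4.5 | 94 | 93 | 91 | 88 | 67 | 86.7%
Llama 3.1 8B | 100 | 100 | 92 | 77 | 60 | 85.7%
Gemini 3.1 Pro (Preview) | 94 | 88 | 83 | 82 | 82 | 85.7%
Claude 3.5 Haiku | 100 | 97 | 89 | 85 | 40 | 82.3%
Gemma 3 4B | 77 | 73 | 71 | 66 | 58 | 69.1%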
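
Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 99.9%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 99 | 99.9%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 99 | 99.8%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 99 | 99.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 97 | 99.5%
Rocinante 12B | 100 | 100 | 100 | 100 | 97 | 99.4%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4.1 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Large 2 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 94 | 98.8%
Ministral 3 14B | 100 | 100 | 100 | 100 | 94 | 98.8%
Hermes 3 70B | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 94 | 98.8%
Ministral 3 8B | 100 | 100 | 100 | 100 | 94 | 98.8%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral NeMO | 100 | 100 | 100 | 100 | 94 | 98.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 94 | 98.8%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 93 | 98.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 99 | 94 | 98.5%
DeepSeek-V2 Chat | 100 | 100 | 100 | 96 | 94 | 98.0%
Mistral Large 3 | 100 | 100 | 100 | 96 | 94 | 97.9%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 90 | 97.9%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude Opus 4.6 | 100 | 100 | 100 | 94 | 94 | 97.6%
o4 Mini | 100 | 100 | 100 | 100 | 88 | 97.6%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 88 | 97.6%
Grok 4 Fast | 100 | 100 | 100 | 94 | 94 | 97.6%
DeepSeek V3.2 | 100 | 100 | 100 | 94 | 94 | 97.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 94 | 94 | 97.6%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 88 | 97.6%
o4 Mini High | 100 | 100 | 99 | 94 | 94 | 97.4%
Mistral Large | 100 | 100 | 100 | 94 | 91 | 97.0%
GPT-4.1 Nano | 100 | 100 | 100 | 97 | 88 | 96.9%
GPT-5.1 | 100 | 100 | 100 | 94 | 88 | 96.4%
Gemini 2.5 Pro | 100 | 100 | 100 | 94 | 88 | 96.4%
GPT-5.2 | 100 | 100 | 94 | 94 | 94 | 96.4%
ByteDance Seed 1.6 | 100 | 100 | 100 | 94 | 88 | 96.4%
Z.AI GLM 4.6 | 100 | 100 | 100 | 94 | 88 | 96.4%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 94 | 94 | 94 | 96.4%
GPT-5 Nano | 100 | 100 | 100 | 94 | 88 | 96.4%
GPT-4.1 Mini | 100 | 100 | 100 | 94 | 88 | 96.4%
Ministral 8B | 100 | 100 | 94 | 94 | 94 | 96.4%
Writer: Palmyra X5 | 100 | 100 | 94 | 94 | 93 | 96.2%
Hermes 3 405B | 100 | 100 | 94 | 94 | 93 | 96.2%
Claude 3.5 Haiku | 100 | 100 | 100 | 95 | 85 | 96.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 99 | 81 | 95.9%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 94 | 85 | 95.9%
Gemma 3 4B | 100 | 100 | 96 | 94 | 88 | 95.6%
MoonshotAI: Kimi K2.5 | 100 | 94 | 94 | 94 | 94 | 95.2%
GPT-5 | 100 | 100 | 94 | 94 | 88 | 95.2%
Gemini 3 Pro (Preview) | 100 | 100 | 94 | 94 | 88 | 95.2%
Minimax M2.5 | 100 | 100 | 94 | 94 | 88 | 95.2%
Gemini 3 Flash (Preview) | 100 | 100 | 94 | 94 | 88 | 95.2%
Z.AI GLM 4.7 Flash | 100 | 100 | 94 | 94 | 88 | 95.2%
Gemma 3 12B | 100 | 100 | 100 | 96 | 73 | 93.9%
Gemma 3 27B | 100 | 100 | 100 | 100 | 69 | 93.7%
DeepSeek V3.1 | 94 | 94 | 94 | 94 | 88 | 92.8%
Mistral Medium 3.1 | 100 | 100 | 94 | 88 | 81 | 92.6%
Mistral Small Creative | 100 | 94 | 94 | 92 | 82 | 92.4%
Llama 3.1 8B | 100 | 100 | 91 | 88 | 82 | 92.1%
Grok 4.1 Fast | 100 | 94 | 94 | 88 | 82 | 91.6%
Gemini 2.5 Flash | 100 | 96 | 94 | 84 | 82 | 91.1%
Gemini 3.1 Pro (Preview) | 94 | 88 | 88 | 70 | 70 | 82.0%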

genre

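Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 99 | 99.9%
Ministral 3B | 100 | 100 | 100 | 100 | 98 | 99.6%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 96 | 99.3%
Claude Opus 4.6 | 100 | 100 | 100 | 98 | 97 | 99.1%
Mistral Large 3 | 100 | 100 | 100 | 100 | 95 | 99.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 94 | 98.8%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 99 | 94 | 98.7%
Llama 3.1 70B | 100 | 100 | 100 | 99 | 93 | 98.5%
Mistral Large | 100 | 100 | 100 | 99 | 94 | 98.5%
Claude 3 Haiku | 100 | 100 | 100 | 98 | 94 | 98.4%
Ministral 3 3B | 100 | 100 | 100 | 97 | 94 | 98.3%
Hermes 3 70B | 100 | 100 | 100 | 97 | 94 | 98.2%
GPT-4.1 Mini | 100 | 100 | 98 | 97 | 94 | 97.8%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 94 | 94 | 97.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 94 | 94 | 97.6%
Mistral NeMO | 100 | 100 | 100 | 100 | 88 | 97.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 94 | 94 | 97.6%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 99 | 89 | 97.5%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 94 | 93 | 97.5%
Claude 3.5 Sonnet | 100 | 100 | 99 | 95 | 90 | 96.8%
Z.AI GLM 5 | 100 | 100 | 100 | 94 | 90 | 96.8%
Ministral 3 8B | 100 | 100 | 100 | 94 | 90 | 96.7%
Mistral Large 2 | 100 | 100 | 100 | 95 | 88 | 96.6%
Ministral 8B | 100 | 100 | 100 | 99 | 84 | 96.6%
Writer: Palmyra X5 | 100 | 98 | 98 | 94 | 92 | 96.4%
Gemma 3 4B | 100 | 100 | 98 | 94 | 90 | 96.4%
ByteDance Seed 1.6 | 100 | 100 | 94 | 94 | 94 | 96.4%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 94 | 88 | 96.4%
Qwen 2.5 72B | 100 | 100 | 100 | 94 | 88 | 96.4%
Arcee AI: Trinity Mini | 100 | 100 | 94 | 94 | 94 | 96.4%
Claude Sonnet 4 | 100 | 100 | 97 | 96 | 89 | 96.4%
Rocinante 12B | 100 | 99 | 98 | 94 | 90 | 96.3%
Gemma 3 27B | 100 | 99 | 98 | 94 | 91 | 96.3%
Claude Opus 4 | 100 | 99 | 96 | 94 | 90 | 95.9%
Z.AI GLM 4.6 | 100 | 100 | 100 | 94 | 86 | 95.9%
Z.AI GLM 4.7 | 100 | 98 | 94 | 94 | 93 | 95.9%
Gemma 3 12B | 100 | 100 | 95 | 93 | 88 | 95.3%
Qwen 3.5 397B A17B | 100 | 94 | 94 | 94 | 94 | 95.2%
GPT-5 Mini | 100 | 100 | 94 | 94 | 88 | 95.2%
o4 Mini High | 100 | 100 | 94 | 94 | 88 | 95.2%
o4 Mini | 100 | 100 | 94 | 94 | 88 | 95.2%
Grok 4 Fast | 100 | 100 | 94 | 94 | 88 | 95.2%
DeepSeek V3.2 | 100 | 100 | 94 | 94 | 88 | 95.2%
Claude Opus 4.5 | 97 | 97 | 95 | 94 | 93 | 94.9%
Mistral Small Creative | 100 | 100 | 94 | 93 | 84 | 94.3%
Claude Haiku 4.5 | 100 | 100 | 94 | 93 | 84 | 94.2%
DeepSeek V3 (2024-12-26) | 100 | 99 | 93 | 93 | 86 | 94.1%
Qwen 3.5 Plus (2026-02-15) | 100 | 94 | 94 | 94 | 88 | 94.0%
Ministral 3 14B | 98 | 98 | 95 | 95 | 83 | 93.8%
Claude 3.7 Sonnet | 100 | 99 | 96 | 86 | 86 | 93.3%
GPT-5.2 | 100 | 97 | 96 | 92 | 81 | 93.2%
DeepSeek-V2 Chat | 100 | 100 | 89 | 87 | 87 | 92.6%
GPT-5 | 100 | 94 | 94 | 94 | 76 | 91.6%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 82 | 76 | 91.6%
Mistral Medium 3.1 | 100 | 100 | 94 | 88 | 76 | 91.6%
GPT-4.1 | 94 | 94 | 93 | 93 | 82 | 91.2%
Z.AI GLM 4.5 | 100 | 92 | 89 | 88 | 86 | 91.0%
Claude Sonnet 4.5 | 100 | 93 | 93 | 85 | 81 | 90.6%
Gemini 3 Pro (Preview) | 100 | 94 | 88 | 88 | 82 | 90.4%
GPT-5 Nano | 100 | 94 | 88 | 88 | 82 | 90.4%
GPT-4o, May 13th (temp=0) | 94 | 94 | 88 | 88 | 88 | 90.4%
GPT-4o, May 13th (temp=1) | 100 | 100 | 98 | 78 | 71 | 89.3%
Gemini 2.5 Pro | 94 | 94 | 92 | 82 | 82 | 88.7%
Claude 3.5 Haiku | 99 | 91 | 88 | 83 | 77 | 87.6%
GPT-5.1 | 93 | 88 | 88 | 85 | 82 | 87.3%
Gemini 2.5 Flash | 88 | 88 | 87 | 87 | 85 | 86.8%
GPT-4.1 Nano | 92 | 92 | 89 | 80 | 80 | 86.6%
Minimax M2.5 | 96 | 94 | 87 | 83 | 71 | 86.1%
Gemini 2.5 Flash Lite | 88 | 88 | 87 | 86 | 78 | 85.3%
Claude Sonnet 4.6 | 92 | 87 | 86 | 80 | 78 | 84.7%
DeepSeek V3.1 | 94 | 88 | 88 | 82 | 70 | 84.4%
Llama 3.1 8B | 100 | 94 | 74 | 67 | 63 | 79.5%
Gemini 3.1 Pro (Preview) | 94 | 86 | 81 | 74 | 62 | 79.5%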
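
Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 99.9%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 99 | 99.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 99 | 99.8%
Llama 3.1 70B | 100 | 100 | 100 | 98 | 98 | 99.2%
Rocinante 12B | 100 | 100 | 100 | 100 | 94 | 98.9%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 94 | 98.8%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 94 | 98.8%
Hermes 3 70B | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Large | 100 | 100 | 100 | 98 | 95 | 98.5%
Ministral 3 3B | 100 | 100 | 100 | 99 | 92 | 98.3%
Ministral 3B | 100 | 99 | 98 | 97 | 95 | 97.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 98 | 97 | 94 | 97.8%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 88 | 97.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 88 | 97.6%
Mistral NeMO | 100 | 100 | 100 | 94 | 94 | 97.6%
Hermes 3 405B | 100 | 100 | 100 | 99 | 89 | 97.5%
Minimax M2.5 | 100 | 99 | 97 | 96 | 95 | 97.3%
Mistral Large 3 | 100 | 100 | 99 | 98 | 89 | 97.2%
GPT-4o Mini (temp=1) | 100 | 100 | 97 | 94 | 94 | 97.0%
Gemini 2.5 Flash Lite | 100 | 99 | 96 | 96 | 94 | 96.7%
GPT-4.1 | 100 | 99 | 97 | 94 | 94 | 96.7%
Ministral 3 14B | 100 | 100 | 98 | 94 | 92 | 96.7%
Grok 4.1 Fast | 100 | 100 | 100 | 94 | 88 | 96.4%
Qwen 3.5 397B A17B | 100 | 100 | 94 | 94 | 94 | 96.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 94 | 88 | 96.4%
Grok 4 | 100 | 100 | 97 | 92 | 91 | 96.1%
Gemini 3 Flash (Preview) | 100 | 100 | 98 | 94 | 88 | 96.0%
o4 Mini | 100 | 100 | 94 | 94 | 89 | 95.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 94 | 83 | 95.3%
Llama 3.1 Nemotron 70B | 98 | 98 | 97 | 93 | 90 | 95.3%
Mistral Small 3.2 24B | 100 | 94 | 94 | 94 | 94 | 95.2%
Arcee AI: Trinity Mini | 100 | 98 | 95 | 94 | 88 | 95.1%
Claude Haiku 4.5 | 100 | 99 | 94 | 91 | 90 | 94.7%
Z.AI GLM 5 | 100 | 100 | 97 | 97 | 79 | 94.7%
Arcee AI: Trinity Large (Preview) | 100 | 94 | 94 | 92 | 92 | 94.3%
Mistral Small Creative | 100 | 98 | 94 | 90 | 89 | 94.2%
GPT-5.2 | 100 | 96 | 94 | 92 | 88 | 94.1%
o4 Mini High | 100 | 100 | 94 | 94 | 82 | 94.0%
GPT-5 | 100 | 100 | 100 | 94 | 76 | 94.0%
GPT-5 Mini | 100 | 94 | 94 | 94 | 88 | 94.0%
Claude Opus 4.6 | 100 | 97 | 92 | 91 | 88 | 93.4%
Ministral 3 8B | 100 | 95 | 93 | 90 | 89 | 93.4%
Gemma 3 4B | 100 | 94 | 93 | 89 | 89 | 93.1%
Ministral 8B | 100 | 95 | 93 | 89 | 89 | 93.1%
DeepSeek V3 (2025-03-24) | 100 | 95 | 93 | 91 | 87 | 93.1%
Claude Opus 4 | 100 | 100 | 94 | 91 | 79 | 92.9%
Z.AI GLM 4.7 Flash | 100 | 100 | 94 | 88 | 82 | 92.8%
DeepSeek V3.2 | 100 | 100 | 94 | 88 | 82 | 92.8%
Z.AI GLM 4.7 | 94 | 94 | 94 | 94 | 88 | 92.8%
Mistral Medium 3.1 | 100 | 94 | 91 | 89 | 87 | 92.3%
Writer: Palmyra X5 | 100 | 96 | 90 | 89 | 86 | 92.3%
Claude Opus 4.5 | 97 | 94 | 94 | 88 | 87 | 92.0%
Z.AI GLM 4.6 | 100 | 95 | 94 | 88 | 82 | 91.8%
MoonshotAI: Kimi K2.5 | 100 | 94 | 88 | 88 | 88 | 91.6%
Gemini 3 Pro (Preview) | 100 | 94 | 94 | 88 | 82 | 91.6%
Claude Sonnet 4.5 | 100 | 99 | 88 | 86 | 85 | 91.6%
Mistral Large 2 | 95 | 94 | 90 | 87 | 86 | 90.4%
GPT-5 Nano | 100 | 94 | 88 | 85 | 82 | 89.7%
Qwen 3.5 Plus (2026-02-15) | 94 | 94 | 94 | 82 | 82 | 89.2%
Gemini 2.5 Pro | 100 | 94 | 88 | 82 | 82 | 89.2%
DeepSeek V3 (2024-12-26) | 95 | 94 | 93 | 86 | 73 | 88.1%
GPT-4.1 Mini | 92 | 90 | 90 | 88 | 79 | 87.9%
Z.AI GLM 4.5 | 96 | 94 | 93 | 81 | 75 | 87.8%
Claude Sonnet 4.6 | 94 | 92 | 88 | 84 | 79 | 87.6%
Claude 3.5 Sonnet | 95 | 89 | 88 | 84 | 78 | 87.2%
Claude Sonnet 4 | 93 | 90 | 89 | 85 | 77 | 86.8%
Claude 3.7 Sonnet | 94 | 94 | 86 | 82 | 77 | 86.6%
DeepSeek-V2 Chat | 95 | 89 | 89 | 85 | 70 | 85.5%
Gemma 3 12B | 97 | 88 | 87 | 83 | 72 | 85.3%
Gemma 3 27B | 100 | 89 | 86 | 84 | 65 | 84.8%
Gemini 3.1 Pro (Preview) | 100 | 94 | 81 | 76 | 72 | 84.5%
DeepSeek V3.1 | 96 | 88 | 86 | 82 | 70 | 84.4%
Gemini 2.5 Flash | 94 | 89 | 81 | 79 | 74 | 83.2%
GPT-5.1 | 91 | 85 | 82 | 78 | 69 | 81.2%
Claude 3.5 Haiku | 94 | 94 | 85 | 83 | 43 | 79.6%
GPT-4.1 Nano | 88 | 87 | 86 | 74 | 62 | 79.4%
Llama 3.1 8B | 100 | 78 | 75 | 60 | 59 | 74.3%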
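
Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 99 | 99.8%
Mistral Large 2 | 100 | 100 | 100 | 100 | 98 | 99.6%
Ministral 8B | 100 | 100 | 100 | 99 | 95 | 98.9%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude 3.5 Sonnet | 100 | 100 | 100 | 99 | 94 | 98.6%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 92 | 98.4%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 89 | 97.9%
Mistral Large 3 | 100 | 100 | 100 | 98 | 91 | 97.8%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 94 | 94 | 97.6%
o4 Mini High | 100 | 100 | 100 | 94 | 94 | 97.6%
o4 Mini | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude 3 Haiku | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-4.1 | 100 | 100 | 100 | 100 | 88 | 97.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 99 | 94 | 93 | 97.3%
Claude Opus 4.6 | 100 | 100 | 100 | 95 | 91 | 97.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 94 | 92 | 97.2%
Claude Sonnet 4.5 | 100 | 100 | 100 | 94 | 91 | 97.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 97 | 94 | 94 | 97.0%
GPT-5 | 100 | 100 | 100 | 97 | 88 | 96.9%
Ministral 3 14B | 100 | 100 | 98 | 95 | 91 | 96.8%
Mistral Large | 100 | 100 | 99 | 94 | 91 | 96.8%
Mistral Medium 3.1 | 100 | 100 | 96 | 95 | 92 | 96.5%
Grok 4.1 Fast | 100 | 100 | 100 | 94 | 88 | 96.4%
WizardLM 2 8x22b | 100 | 100 | 94 | 94 | 94 | 96.4%
Hermes 3 70B | 100 | 100 | 98 | 94 | 88 | 96.1%
Llama 3.1 Nemotron 70B | 99 | 99 | 94 | 94 | 94 | 96.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 79 | 95.9%
Z.AI GLM 5 | 100 | 100 | 98 | 94 | 88 | 95.9%
Mistral Small 3.2 24B | 100 | 100 | 94 | 94 | 91 | 95.8%
DeepSeek-V2 Chat | 100 | 100 | 98 | 90 | 90 | 95.6%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 89 | 88 | 95.4%
Minimax M2.5 | 100 | 99 | 94 | 94 | 90 | 95.4%
GPT-5 Mini | 100 | 94 | 94 | 94 | 94 | 95.2%
Gemini 3 Flash (Preview) | 100 | 94 | 94 | 94 | 94 | 95.2%
GPT-4o, Aug. 6th (temp=0) | 100 | 94 | 94 | 94 | 94 | 95.2%
GPT-4o Mini (temp=1) | 100 | 100 | 94 | 94 | 88 | 95.2%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 94 | 94 | 88 | 95.2%
Mistral NeMO | 100 | 94 | 94 | 94 | 94 | 95.2%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 96 | 94 | 85 | 94.9%
Ministral 3 3B | 100 | 99 | 98 | 93 | 85 | 94.8%
Claude Opus 4 | 100 | 100 | 97 | 89 | 88 | 94.8%
Gemini 3 Pro (Preview) | 100 | 94 | 94 | 94 | 88 | 94.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 94 | 94 | 94 | 88 | 94.0%
GPT-4o Mini (temp=0) | 94 | 94 | 94 | 94 | 94 | 94.0%
Claude Haiku 4.5 | 100 | 99 | 96 | 87 | 87 | 93.8%
Z.AI GLM 4.7 | 100 | 100 | 93 | 88 | 88 | 93.8%
Z.AI GLM 4.5 | 100 | 95 | 92 | 92 | 91 | 93.8%
GPT-4o, May 13th (temp=0) | 100 | 94 | 94 | 94 | 86 | 93.6%
GPT-5.2 | 99 | 98 | 96 | 94 | 80 | 93.5%
Claude Opus 4.5 | 100 | 94 | 94 | 91 | 89 | 93.5%
Claude 3.7 Sonnet | 99 | 94 | 94 | 93 | 85 | 93.2%
Gemini 2.5 Flash Lite | 100 | 100 | 95 | 93 | 78 | 93.1%
Gemma 3 27B | 100 | 97 | 94 | 93 | 81 | 92.9%
Z.AI GLM 4.7 Flash | 94 | 94 | 94 | 94 | 88 | 92.8%
Claude Sonnet 4.6 | 100 | 96 | 92 | 91 | 84 | 92.5%
Ministral 3 8B | 100 | 97 | 95 | 94 | 76 | 92.4%
Writer: Palmyra X5 | 100 | 96 | 93 | 91 | 81 | 92.3%
DeepSeek V3.1 | 100 | 100 | 94 | 85 | 82 | 92.3%
Ministral 3B | 100 | 92 | 90 | 88 | 87 | 91.5%
GPT-4.1 Mini | 94 | 94 | 93 | 90 | 86 | 91.5%
Gemma 3 12B | 98 | 94 | 91 | 88 | 87 | 91.5%
GPT-4o, May 13th (temp=1) | 100 | 94 | 94 | 92 | 77 | 91.4%
Z.AI GLM 4.6 | 94 | 94 | 92 | 88 | 88 | 91.3%
DeepSeek V3.2 | 100 | 94 | 88 | 88 | 86 | 91.2%
Mistral Small Creative | 100 | 94 | 88 | 87 | 85 | 91.0%
Gemini 2.5 Pro | 100 | 94 | 94 | 82 | 82 | 90.4%
GPT-5 Nano | 100 | 94 | 92 | 89 | 76 | 90.1%
GPT-5.1 | 97 | 93 | 91 | 87 | 82 | 90.0%
Claude 3.5 Haiku | 96 | 94 | 89 | 86 | 84 | 89.9%
Rocinante 12B | 100 | 100 | 88 | 87 | 75 | 89.8%
Hermes 3 405B | 100 | 100 | 94 | 89 | 66 | 89.8%
Gemma 3 4B | 99 | 96 | 92 | 82 | 79 | 89.5%
Gemini 2.5 Flash | 100 | 94 | 84 | 84 | 82 | 88.8%
Gemini 3.1 Pro (Preview) | 99 | 94 | 91 | 84 | 67 | 87.1%
Llama 3.1 8B | 100 | 94 | 82 | 77 | 54 | 81.4%
GPT-4.1 Nano | 97 | 75 | 75 | 67 | 66 | 75.9%
Llama 3.1 70B | 100 | 97 | 76 | 52 | 35 | 71.9%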
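
Model | #1 | #2 | #3 | #4 | #5 | Avg
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 99.9%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 99 | 99.8%
Ministral 3B | 100 | 100 | 100 | 100 | 98 | 99.6%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 96 | 99.2%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 94 | 98.8%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 94 | 98.8%
Hermes 3 405B | 100 | 100 | 100 | 100 | 94 | 98.8%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 94 | 98.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 94 | 98.8%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
Rocinante 12B | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 92 | 98.4%
GPT-5 | 100 | 100 | 100 | 98 | 94 | 98.3%
Mistral Large 2 | 100 | 100 | 100 | 98 | 94 | 98.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 91 | 98.3%
Z.AI GLM 4.5 | 100 | 100 | 99 | 98 | 94 | 98.1%
Writer: Palmyra X5 | 100 | 100 | 100 | 98 | 92 | 98.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 95 | 94 | 97.8%
Claude Sonnet 4.5 | 100 | 100 | 100 | 94 | 94 | 97.7%
Claude Opus 4.5 | 100 | 100 | 100 | 94 | 94 | 97.6%
o4 Mini | 100 | 100 | 100 | 94 | 94 | 97.6%
ByteDance Seed 1.6 | 100 | 100 | 100 | 94 | 94 | 97.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 88 | 97.6%
GPT-5 Nano | 100 | 100 | 100 | 94 | 94 | 97.6%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 94 | 94 | 97.6%
Hermes 3 70B | 100 | 100 | 100 | 94 | 94 | 97.6%
Mistral NeMO | 100 | 100 | 100 | 94 | 94 | 97.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 87 | 97.4%
Gemini 2.5 Flash | 100 | 100 | 99 | 94 | 94 | 97.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 94 | 93 | 97.3%
Ministral 3 3B | 100 | 100 | 100 | 94 | 91 | 97.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 93 | 92 | 96.8%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 94 | 90 | 96.8%
GPT-5 Mini | 100 | 100 | 94 | 94 | 94 | 96.4%
Gemini 2.5 Pro | 100 | 100 | 100 | 94 | 88 | 96.4%
Gemini 3 Pro (Preview) | 100 | 100 | 94 | 94 | 94 | 96.4%
GPT-4.1 | 100 | 100 | 94 | 94 | 94 | 96.4%
DeepSeek V3.2 | 100 | 100 | 100 | 94 | 88 | 96.4%
Mistral Small Creative | 100 | 100 | 94 | 94 | 94 | 96.4%
Claude Sonnet 4.6 | 100 | 99 | 99 | 95 | 88 | 96.3%
GPT-4o Mini (temp=1) | 100 | 99 | 94 | 94 | 94 | 96.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 90 | 88 | 95.5%
Z.AI GLM 4.7 Flash | 100 | 94 | 94 | 94 | 94 | 95.2%
GPT-5.2 | 100 | 100 | 94 | 94 | 88 | 95.1%
Ministral 3 14B | 98 | 98 | 96 | 94 | 89 | 95.0%
Claude Opus 4 | 100 | 100 | 97 | 95 | 81 | 94.7%
Z.AI GLM 4.7 | 100 | 100 | 94 | 88 | 88 | 94.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 93 | 91 | 85 | 93.9%
Claude 3.5 Sonnet | 100 | 100 | 100 | 86 | 84 | 93.9%
Claude Opus 4.6 | 99 | 94 | 94 | 94 | 88 | 93.8%
Ministral 8B | 100 | 96 | 93 | 89 | 88 | 93.2%
Claude Sonnet 4 | 100 | 94 | 94 | 94 | 82 | 92.9%
Claude Haiku 4.5 | 95 | 94 | 94 | 90 | 87 | 92.2%
Claude 3 Haiku | 100 | 100 | 97 | 94 | 69 | 92.1%
Gemini 3 Flash (Preview) | 100 | 94 | 88 | 88 | 88 | 91.6%
Llama 3.1 8B | 100 | 100 | 88 | 88 | 82 | 91.5%
GPT-5.1 | 94 | 94 | 92 | 88 | 88 | 91.1%
Ministral 3 8B | 100 | 94 | 93 | 89 | 79 | 90.9%
Gemini 3.1 Pro (Preview) | 98 | 94 | 91 | 88 | 80 | 90.1%
Gemma 3 12B | 100 | 97 | 94 | 79 | 78 | 89.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 87 | 84 | 76 | 89.3%
GPT-4.1 Mini | 97 | 90 | 88 | 87 | 80 | 88.5%
Gemma 3 4B | 95 | 94 | 89 | 82 | 71 | 86.1%
GPT-4.1 Nano | 94 | 85 | 84 | 81 | 76 | 84.0%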
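
Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 98 | 99.6%
o4 Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 94 | 98.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 99 | 94 | 98.7%
Mistral Large 2 | 100 | 100 | 100 | 97 | 96 | 98.5%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 92 | 98.3%
Rocinante 12B | 100 | 100 | 100 | 100 | 91 | 98.2%
Qwen 3.5 Plus (2026-02-15) | 100 | 99 | 99 | 98 | 94 | 98.1%
Hermes 3 70B | 100 | 100 | 100 | 100 | 90 | 98.0%
Ministral 3B | 100 | 100 | 100 | 97 | 93 | 98.0%
Z.AI GLM 5 | 100 | 100 | 97 | 97 | 94 | 97.7%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 94 | 94 | 97.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 94 | 94 | 97.6%
Llama 3.1 70B | 100 | 100 | 100 | 94 | 94 | 97.6%
Qwen 2.5 72B | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude Opus 4 | 100 | 100 | 100 | 96 | 92 | 97.5%
Mistral NeMO | 100 | 100 | 99 | 94 | 94 | 97.4%
GPT-5.2 | 100 | 99 | 99 | 95 | 91 | 96.9%
DeepSeek-V2 Chat | 100 | 100 | 96 | 94 | 94 | 96.7%
Z.AI GLM 4.7 | 100 | 100 | 94 | 94 | 94 | 96.4%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 94 | 94 | 94 | 96.4%
WizardLM 2 8x22b | 100 | 100 | 94 | 94 | 94 | 96.4%
Ministral 3 8B | 100 | 100 | 100 | 99 | 83 | 96.3%
o4 Mini High | 100 | 100 | 99 | 94 | 88 | 96.2%
GPT-5 Mini | 100 | 100 | 98 | 94 | 88 | 96.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 80 | 95.9%
GPT-5 | 100 | 100 | 97 | 94 | 88 | 95.8%
Claude Opus 4.6 | 100 | 100 | 99 | 96 | 84 | 95.7%
Mistral Large | 100 | 100 | 98 | 91 | 89 | 95.6%
Mistral Small Creative | 100 | 100 | 96 | 91 | 90 | 95.4%
Arcee AI: Trinity Mini | 100 | 100 | 94 | 92 | 88 | 94.7%
GPT-4o, May 13th (temp=1) | 100 | 95 | 94 | 93 | 91 | 94.6%
Llama 3.1 Nemotron 70B | 100 | 96 | 94 | 94 | 88 | 94.5%
Mistral Large 3 | 100 | 100 | 98 | 94 | 80 | 94.4%
Z.AI GLM 4.7 Flash | 100 | 100 | 94 | 88 | 88 | 94.0%
GPT-4o, May 13th (temp=0) | 94 | 94 | 94 | 94 | 94 | 94.0%
DeepSeek V3.2 | 100 | 94 | 94 | 93 | 88 | 93.8%
GPT-4.1 | 96 | 96 | 95 | 94 | 88 | 93.8%
Z.AI GLM 4.6 | 100 | 100 | 92 | 88 | 88 | 93.7%
Claude Haiku 4.5 | 100 | 100 | 93 | 89 | 85 | 93.4%
Gemini 3 Pro (Preview) | 100 | 94 | 94 | 88 | 88 | 92.8%
GPT-5 Nano | 98 | 95 | 94 | 89 | 88 | 92.7%
Claude Sonnet 4.6 | 100 | 96 | 91 | 91 | 86 | 92.7%
Mistral Small 3.2 24B | 100 | 99 | 94 | 93 | 77 | 92.7%
Claude 3.5 Haiku | 100 | 99 | 94 | 88 | 82 | 92.6%
Ministral 3 14B | 98 | 94 | 94 | 92 | 84 | 92.5%
Gemini 2.5 Pro | 100 | 94 | 94 | 91 | 83 | 92.5%
Gemini 3 Flash (Preview) | 100 | 94 | 94 | 88 | 86 | 92.4%
Claude Sonnet 4.5 | 100 | 96 | 93 | 88 | 84 | 92.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 94 | 91 | 75 | 92.0%
Minimax M2.5 | 99 | 96 | 92 | 87 | 85 | 92.0%
Claude Sonnet 4 | 100 | 97 | 90 | 88 | 85 | 91.9%
Claude 3.5 Sonnet | 100 | 100 | 91 | 89 | 76 | 91.2%
Ministral 3 3B | 99 | 94 | 90 | 90 | 83 | 91.1%
Gemma 3 12B | 100 | 98 | 88 | 84 | 83 | 90.6%
Claude Opus 4.5 | 100 | 100 | 94 | 85 | 73 | 90.3%
Arcee AI: Trinity Large (Preview) | 97 | 92 | 90 | 87 | 85 | 90.1%
Hermes 3 405B | 100 | 94 | 92 | 85 | 78 | 90.0%
Writer: Palmyra X5 | 95 | 95 | 93 | 91 | 75 | 89.8%
Gemini 2.5 Flash | 100 | 94 | 90 | 85 | 79 | 89.7%
GPT-5.1 | 96 | 96 | 91 | 89 | 75 | 89.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 87 | 81 | 79 | 89.3%
Z.AI GLM 4.5 | 93 | 92 | 90 | 89 | 78 | 88.7%
Claude 3 Haiku | 94 | 94 | 89 | 79 | 77 | 86.5%
GPT-4.1 Mini | 99 | 92 | 83 | 82 | 76 | 86.5%
DeepSeek V3.1 | 94 | 88 | 86 | 82 | 82 | 86.4%
Gemini 3.1 Pro (Preview) | 93 | 93 | 90 | 77 | 76 | 85.7%
Claude 3.7 Sonnet | 93 | 86 | 84 | 83 | 79 | 85.0%
Gemini 2.5 Flash Lite | 94 | 94 | 84 | 78 | 70 | 84.1%
Gemma 3 27B | 92 | 88 | 86 | 78 | 67 | 82.2%
Gemma 3 4B | 93 | 91 | 88 | 81 | 40 | 78.6%
GPT-4.1 Nano | 89 | 89 | 81 | 75 | 59 | 78.4%
Llama 3.1 8B | 93 | 72 | 55 | 52 | 50 | 64.7%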
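Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 99.9%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 97 | 99.3%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5.1 | 100 | 100 | 100 | 100 | 94 | 98.8%
Minimax M2.5 | 100 | 100 | 100 | 100 | 94 | 98.8%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 94 | 98.8%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 94 | 98.8%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 94 | 98.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 94 | 98.8%
Hermes 3 70B | 100 | 100 | 100 | 100 | 94 | 98.8%
Ministral 3B | 100 | 100 | 100 | 100 | 94 | 98.8%
Ministral 3 8B | 100 | 100 | 100 | 100 | 93 | 98.7%
Gemma 3 12B | 100 | 100 | 100 | 98 | 94 | 98.5%
Hermes 3 405B | 100 | 100 | 100 | 98 | 94 | 98.4%
Ministral 3 14B | 100 | 100 | 100 | 97 | 94 | 98.1%
Claude Haiku 4.5 | 100 | 100 | 100 | 99 | 91 | 98.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-5 Mini | 100 | 100 | 100 | 94 | 94 | 97.6%
o4 Mini High | 100 | 100 | 100 | 100 | 88 | 97.6%
o4 Mini | 100 | 100 | 100 | 94 | 94 | 97.6%
Z.AI GLM 5 | 100 | 100 | 100 | 94 | 94 | 97.6%
Grok 4 | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude Sonnet 4.5 | 100 | 100 | 100 | 94 | 94 | 97.6%
ByteDance Seed 1.6 | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 94 | 94 | 97.6%
Z.AI GLM 4.5 | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 94 | 94 | 97.6%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 88 | 97.6%
Mistral Large | 100 | 100 | 100 | 94 | 94 | 97.6%
Mistral Small Creative | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude 3 Haiku | 100 | 100 | 100 | 94 | 94 | 97.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 94 | 94 | 97.6%
Rocinante 12B | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 94 | 94 | 97.6%
Gemma 3 4B | 100 | 99 | 99 | 96 | 89 | 96.5%
Mistral NeMO | 100 | 100 | 99 | 95 | 88 | 96.4%
GPT-4.1 Nano | 100 | 100 | 96 | 94 | 92 | 96.4%
Claude Sonnet 4.6 | 100 | 100 | 96 | 94 | 92 | 96.4%
GPT-5 | 100 | 100 | 100 | 94 | 88 | 96.4%
Gemini 2.5 Pro | 100 | 100 | 94 | 94 | 94 | 96.4%
Grok 4.1 Fast | 100 | 100 | 94 | 94 | 94 | 96.4%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 94 | 88 | 96.4%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 94 | 88 | 96.4%
GPT-4.1 Mini | 100 | 100 | 94 | 94 | 94 | 96.4%
Mistral Medium 3.1 | 100 | 100 | 100 | 94 | 88 | 96.4%
DeepSeek V3.1 | 100 | 100 | 100 | 94 | 88 | 96.4%
Mistral Large 2 | 100 | 100 | 100 | 94 | 88 | 96.4%
Gemini 2.5 Flash | 100 | 100 | 94 | 94 | 94 | 96.4%
Mistral Small 3.2 24B | 100 | 100 | 94 | 94 | 94 | 96.4%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 79 | 95.8%
Claude 3.7 Sonnet | 100 | 100 | 100 | 94 | 84 | 95.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 94 | 94 | 88 | 95.2%
Z.AI GLM 4.6 | 100 | 94 | 94 | 94 | 94 | 95.2%
GPT-4.1 | 100 | 94 | 94 | 94 | 94 | 95.2%
Grok 4 Fast | 100 | 100 | 100 | 88 | 88 | 95.2%
Arcee AI: Trinity Mini | 100 | 100 | 94 | 94 | 88 | 95.2%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 88 | 88 | 95.1%
Gemma 3 27B | 100 | 96 | 94 | 94 | 88 | 94.4%
DeepSeek V3.2 | 100 | 100 | 100 | 94 | 76 | 94.0%
Z.AI GLM 4.7 | 100 | 94 | 94 | 94 | 88 | 94.0%
Gemini 3 Pro (Preview) | 100 | 94 | 94 | 94 | 88 | 94.0%
GPT-5.2 | 100 | 94 | 94 | 94 | 82 | 92.7%
Gemini 3.1 Pro (Preview) | 95 | 94 | 94 | 88 | 82 | 90.5%
GPT-5 Nano | 100 | 88 | 88 | 88 | 88 | 90.4%
Llama 3.1 8B | 100 | 79 | 76 | 73 | 62 | 78.1%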

Novelcrafter Default Prompt

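Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 99 | 99.8%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 96 | 99.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 96 | 99.1%
Ministral 3 8B | 100 | 100 | 100 | 100 | 95 | 99.1%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4.1 | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 94 | 98.8%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Large | 100 | 100 | 100 | 100 | 94 | 98.8%
Ministral 3 3B | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Small Creative | 100 | 100 | 100 | 100 | 92 | 98.5%
Claude Opus 4 | 100 | 100 | 100 | 97 | 95 | 98.5%
Claude Sonnet 4 | 100 | 100 | 100 | 98 | 94 | 98.4%
Mistral Large 3 | 100 | 100 | 100 | 98 | 94 | 98.4%
Writer: Palmyra X5 | 100 | 100 | 100 | 98 | 93 | 98.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 90 | 98.1%
GPT-5 | 100 | 100 | 100 | 94 | 94 | 97.6%
Z.AI GLM 4.7 | 100 | 100 | 100 | 94 | 94 | 97.6%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 94 | 94 | 97.6%
WizardLM 2 8x22b | 100 | 100 | 100 | 94 | 94 | 97.6%
Ministral 3B | 100 | 100 | 100 | 94 | 94 | 97.6%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 88 | 97.5%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 87 | 97.5%
Gemma 3 27B | 100 | 100 | 100 | 99 | 88 | 97.4%
GPT-5.1 | 100 | 100 | 99 | 94 | 94 | 97.3%
DeepSeek V3 (2025-03-24) | 100 | 100 | 98 | 94 | 94 | 97.2%
Claude Sonnet 4.5 | 100 | 100 | 99 | 94 | 93 | 97.1%
Mistral Large 2 | 100 | 100 | 100 | 97 | 88 | 97.0%
Gemma 3 4B | 100 | 100 | 96 | 94 | 94 | 96.9%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 93 | 90 | 96.7%
GPT-5 Mini | 100 | 100 | 94 | 94 | 94 | 96.4%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 94 | 88 | 96.4%
o4 Mini | 100 | 100 | 94 | 94 | 94 | 96.4%
GPT-5.2 | 100 | 100 | 94 | 94 | 94 | 96.4%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 94 | 88 | 96.4%
GPT-4o, May 13th (temp=1) | 100 | 100 | 94 | 94 | 94 | 96.4%
Hermes 3 405B | 100 | 100 | 100 | 94 | 88 | 96.4%
Qwen 2.5 72B | 100 | 100 | 100 | 94 | 88 | 96.4%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 94 | 94 | 94 | 96.4%
Gemma 3 12B | 100 | 100 | 94 | 94 | 94 | 96.2%
Minimax M2.5 | 100 | 100 | 99 | 91 | 91 | 96.2%
GPT-4.1 Mini | 100 | 100 | 95 | 94 | 87 | 95.3%
Llama 3.1 8B | 100 | 99 | 95 | 94 | 88 | 95.3%
Qwen 3.5 397B A17B | 100 | 100 | 94 | 94 | 88 | 95.2%
Z.AI GLM 4.6 | 100 | 100 | 94 | 94 | 88 | 95.2%
DeepSeek V3.2 | 100 | 94 | 94 | 94 | 94 | 95.2%
Z.AI GLM 4.5 | 100 | 100 | 94 | 93 | 88 | 94.9%
DeepSeek V3.1 | 100 | 98 | 94 | 94 | 88 | 94.9%
DeepSeek V3 (2024-12-26) | 100 | 99 | 94 | 92 | 89 | 94.8%
Claude Haiku 4.5 | 100 | 100 | 96 | 91 | 85 | 94.5%
DeepSeek-V2 Chat | 100 | 100 | 97 | 91 | 83 | 94.0%
Ministral 8B | 100 | 94 | 94 | 92 | 90 | 94.0%
GPT-5 Nano | 100 | 100 | 94 | 88 | 87 | 93.8%
Ministral 3 14B | 100 | 99 | 95 | 89 | 83 | 93.0%
Gemini 2.5 Pro | 100 | 94 | 94 | 94 | 82 | 92.8%
Gemini 3 Pro (Preview) | 94 | 94 | 94 | 94 | 88 | 92.8%
GPT-4o, May 13th (temp=0) | 100 | 94 | 94 | 88 | 88 | 92.8%
Claude Sonnet 4.6 | 100 | 100 | 94 | 85 | 84 | 92.5%
Claude Opus 4.5 | 100 | 100 | 94 | 88 | 81 | 92.5%
Claude 3.7 Sonnet | 99 | 96 | 92 | 89 | 85 | 92.3%
Hermes 3 70B | 94 | 94 | 93 | 89 | 88 | 91.7%
Gemini 2.5 Flash | 100 | 94 | 94 | 88 | 82 | 91.6%
Z.AI GLM 5 | 95 | 94 | 91 | 89 | 88 | 91.5%
GPT-4.1 Nano | 95 | 94 | 94 | 89 | 81 | 90.6%
Gemini 3 Flash (Preview) | 100 | 100 | 94 | 82 | 76 | 90.4%
Arcee AI: Trinity Mini | 100 | 94 | 88 | 88 | 82 | 90.4%
Claude Opus 4.6 | 94 | 93 | 92 | 88 | 84 | 90.3%
Gemini 3.1 Pro (Preview) | 94 | 82 | 76 | 76 | 76 | 80.8%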
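Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 99.9%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 99 | 99.9%
GPT-5 Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 94 | 98.8%
Hermes 3 70B | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral NeMO | 100 | 100 | 100 | 99 | 94 | 98.7%
Llama 3.1 70B | 100 | 100 | 100 | 99 | 94 | 98.6%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 92 | 98.5%
Mistral Medium 3.1 | 100 | 100 | 100 | 98 | 94 | 98.3%
Hermes 3 405B | 100 | 100 | 100 | 98 | 94 | 98.3%
GPT-4.1 | 100 | 100 | 99 | 98 | 93 | 98.1%
Z.AI GLM 4.5 | 100 | 100 | 98 | 98 | 94 | 98.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 89 | 97.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude Opus 4.6 | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-5.1 | 100 | 100 | 100 | 94 | 94 | 97.6%
o4 Mini High | 100 | 100 | 100 | 94 | 94 | 97.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 94 | 94 | 97.6%
ByteDance Seed 1.6 | 100 | 100 | 100 | 94 | 94 | 97.6%
Rocinante 12B | 100 | 100 | 100 | 98 | 88 | 97.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 95 | 92 | 97.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 86 | 97.2%
Ministral 3 14B | 100 | 100 | 99 | 94 | 92 | 96.9%
Grok 4 | 100 | 100 | 100 | 94 | 90 | 96.9%
Minimax M2.5 | 100 | 100 | 97 | 94 | 93 | 96.8%
Gemini 3 Flash (Preview) | 100 | 100 | 94 | 94 | 94 | 96.4%
DeepSeek V3 (2025-03-24) | 100 | 99 | 97 | 94 | 91 | 96.2%
Qwen 2.5 72B | 100 | 100 | 99 | 94 | 88 | 96.1%
Claude 3.5 Sonnet | 100 | 100 | 96 | 94 | 91 | 96.1%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 92 | 88 | 96.1%
Gemma 3 4B | 100 | 100 | 96 | 93 | 88 | 95.5%
Claude Sonnet 4.5 | 99 | 97 | 96 | 94 | 90 | 95.3%
Qwen 3.5 397B A17B | 100 | 94 | 94 | 94 | 94 | 95.2%
GPT-4o Mini (temp=1) | 100 | 100 | 95 | 94 | 86 | 95.0%
Ministral 3B | 100 | 100 | 95 | 92 | 88 | 95.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 94 | 94 | 87 | 94.9%
GPT-5 Nano | 100 | 100 | 100 | 94 | 81 | 94.9%
Claude Opus 4.5 | 100 | 98 | 94 | 94 | 88 | 94.8%
Ministral 3 3B | 100 | 97 | 97 | 91 | 88 | 94.8%
Arcee AI: Trinity Mini | 99 | 94 | 94 | 94 | 93 | 94.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 87 | 85 | 94.5%
DeepSeek-V2 Chat | 100 | 99 | 96 | 89 | 88 | 94.3%
Gemini 2.5 Pro | 100 | 94 | 94 | 94 | 88 | 94.0%
Z.AI GLM 4.7 | 100 | 94 | 94 | 94 | 88 | 94.0%
GPT-4.1 Mini | 100 | 94 | 93 | 92 | 91 | 94.0%
Claude Haiku 4.5 | 100 | 100 | 93 | 89 | 88 | 93.8%
Gemma 3 12B | 100 | 100 | 97 | 92 | 79 | 93.7%
Writer: Palmyra X5 | 100 | 100 | 92 | 90 | 85 | 93.5%
Mistral Large | 100 | 98 | 94 | 88 | 84 | 92.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 82 | 82 | 92.8%
Claude Sonnet 4.6 | 100 | 93 | 93 | 90 | 88 | 92.7%
DeepSeek V3.1 | 100 | 100 | 88 | 88 | 87 | 92.4%
Mistral Small Creative | 100 | 94 | 94 | 93 | 81 | 92.3%
DeepSeek V3 (2024-12-26) | 100 | 94 | 94 | 93 | 80 | 92.2%
Z.AI GLM 4.6 | 100 | 97 | 88 | 88 | 87 | 92.0%
Claude 3.5 Haiku | 100 | 100 | 94 | 87 | 78 | 91.7%
Claude 3.7 Sonnet | 98 | 95 | 91 | 86 | 86 | 91.1%
WizardLM 2 8x22b | 100 | 94 | 94 | 88 | 79 | 91.0%
Claude Sonnet 4 | 100 | 96 | 94 | 88 | 76 | 90.7%
DeepSeek V3.2 | 94 | 94 | 94 | 88 | 82 | 90.4%
Claude Opus 4 | 100 | 94 | 92 | 87 | 78 | 90.2%
GPT-4.1 Nano | 95 | 93 | 89 | 87 | 82 | 89.5%
Gemini 3.1 Pro (Preview) | 94 | 88 | 88 | 88 | 82 | 88.0%
Ministral 8B | 97 | 88 | 85 | 85 | 84 | 87.9%
Ministral 3 8B | 100 | 95 | 89 | 85 | 65 | 86.8%
Mistral Large 2 | 94 | 94 | 90 | 79 | 77 | 86.8%
Llama 3.1 8B | 94 | 94 | 85 | 82 | 76 | 86.1%
Gemini 2.5 Flash | 100 | 88 | 88 | 83 | 69 | 85.6%
Mistral Large 3 | 94 | 88 | 87 | 81 | 76 | 85.3%
Gemma 3 27B | 100 | 93 | 92 | 81 | 59 | 84.9%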
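Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 99 | 99.9%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 95 | 98.9%
GPT-5 Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5.2 | 100 | 100 | 100 | 100 | 94 | 98.8%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 94 | 98.8%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 94 | 98.8%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Small Creative | 100 | 100 | 100 | 100 | 94 | 98.8%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 Nano | 100 | 100 | 100 | 99 | 94 | 98.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 93 | 98.6%
Rocinante 12B | 100 | 100 | 100 | 99 | 94 | 98.6%
Mistral Large | 100 | 100 | 100 | 99 | 94 | 98.6%
Mistral Large 2 | 100 | 100 | 99 | 98 | 95 | 98.5%
GPT-4.1 | 100 | 100 | 100 | 98 | 94 | 98.4%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 92 | 98.4%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 97 | 94 | 98.3%
Hermes 3 70B | 100 | 100 | 100 | 96 | 95 | 98.1%
Gemma 3 27B | 100 | 100 | 99 | 95 | 94 | 97.7%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 94 | 94 | 97.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 88 | 97.6%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 88 | 97.6%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 94 | 94 | 97.6%
DeepSeek V3.2 | 100 | 100 | 100 | 94 | 94 | 97.6%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 94 | 94 | 97.6%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude Haiku 4.5 | 100 | 100 | 100 | 97 | 91 | 97.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 99 | 88 | 97.4%
Arcee AI: Trinity Mini | 100 | 100 | 98 | 96 | 91 | 97.1%
Mistral Large 3 | 100 | 100 | 100 | 94 | 91 | 97.1%
GPT-4o, May 13th (temp=1) | 100 | 100 | 97 | 94 | 94 | 97.0%
Gemma 3 12B | 100 | 100 | 99 | 94 | 91 | 97.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 96 | 88 | 96.9%
Ministral 3 8B | 100 | 100 | 98 | 97 | 89 | 96.9%
Claude 3 Haiku | 100 | 100 | 100 | 96 | 88 | 96.8%
Ministral 3 3B | 100 | 100 | 100 | 94 | 89 | 96.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 98 | 85 | 96.5%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 94 | 88 | 96.5%
Z.AI GLM 5 | 100 | 100 | 99 | 96 | 87 | 96.4%
Z.AI GLM 4.7 Flash | 100 | 100 | 94 | 94 | 94 | 96.4%
WizardLM 2 8x22b | 100 | 100 | 94 | 94 | 94 | 96.4%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 81 | 96.2%
Z.AI GLM 4.6 | 100 | 98 | 94 | 94 | 94 | 96.0%
Claude Opus 4 | 100 | 100 | 94 | 94 | 91 | 95.8%
Mistral NeMO | 100 | 97 | 94 | 94 | 94 | 95.8%
Z.AI GLM 4.5 | 100 | 100 | 100 | 90 | 88 | 95.6%
Mistral Medium 3.1 | 100 | 100 | 100 | 90 | 89 | 95.6%
Writer: Palmyra X5 | 100 | 97 | 95 | 94 | 90 | 95.2%
Qwen 3.5 Plus (2026-02-15) | 100 | 94 | 94 | 94 | 94 | 95.2%
Mistral Small 3.2 24B | 100 | 94 | 94 | 94 | 94 | 95.2%
Claude Opus 4.5 | 100 | 100 | 98 | 94 | 84 | 95.1%
GPT-4o Mini (temp=0) | 100 | 94 | 94 | 94 | 93 | 95.0%
Claude Sonnet 4 | 100 | 100 | 96 | 91 | 86 | 94.6%
Hermes 3 405B | 100 | 98 | 94 | 91 | 90 | 94.6%
Minimax M2.5 | 100 | 99 | 99 | 88 | 86 | 94.3%
Gemini 3.1 Pro (Preview) | 100 | 94 | 94 | 94 | 88 | 94.0%
Z.AI GLM 4.7 | 100 | 94 | 94 | 94 | 88 | 94.0%
Ministral 8B | 100 | 100 | 91 | 89 | 89 | 93.8%
Llama 3.1 70B | 100 | 94 | 94 | 92 | 88 | 93.6%
Ministral 3B | 100 | 100 | 100 | 87 | 80 | 93.5%
Gemini 2.5 Flash | 100 | 94 | 94 | 90 | 80 | 91.6%
Claude 3.7 Sonnet | 100 | 99 | 95 | 84 | 78 | 91.2%
Claude Sonnet 4.6 | 95 | 93 | 90 | 89 | 88 | 90.9%
Gemma 3 4B | 100 | 94 | 93 | 86 | 79 | 90.7%
Ministral 3 14B | 100 | 100 | 97 | 86 | 65 | 89.5%
GPT-4.1 Mini | 100 | 90 | 90 | 86 | 77 | 88.7%
Gemini 2.5 Flash Lite | 100 | 98 | 94 | 78 | 69 | 87.7%
GPT-4.1 Nano | 91 | 89 | 80 | 78 | 70 | 81.6%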
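Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 99 | 99.8%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 99 | 99.8%
Ministral 3 3B | 100 | 100 | 100 | 100 | 99 | 99.8%
Ministral 3 14B | 100 | 100 | 100 | 100 | 97 | 99.4%
Hermes 3 405B | 100 | 100 | 100 | 100 | 96 | 99.3%
Ministral 8B | 100 | 100 | 100 | 99 | 96 | 98.9%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 94 | 98.9%
GPT-5 Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
o4 Mini High | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 | 100 | 100 | 100 | 100 | 94 | 98.8%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude Opus 4 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4.1 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 Nano | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Large 3 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
o4 Mini | 100 | 100 | 100 | 99 | 94 | 98.7%
Qwen 2.5 72B | 100 | 100 | 100 | 99 | 94 | 98.6%
Mistral Large | 100 | 100 | 100 | 100 | 93 | 98.6%
Mistral NeMO | 100 | 100 | 100 | 100 | 93 | 98.6%
Ministral 3 8B | 100 | 100 | 100 | 97 | 94 | 98.2%
Claude Sonnet 4 | 100 | 100 | 98 | 98 | 94 | 98.1%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 90 | 98.1%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 90 | 98.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 89 | 97.9%
Claude Opus 4.5 | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-5.2 | 100 | 100 | 100 | 94 | 94 | 97.6%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 94 | 94 | 97.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 94 | 94 | 97.6%
DeepSeek V3.1 | 100 | 100 | 100 | 94 | 94 | 97.6%
Mistral Large 2 | 100 | 100 | 100 | 94 | 94 | 97.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude 3.7 Sonnet | 100 | 100 | 100 | 95 | 93 | 97.6%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 99 | 88 | 97.4%
Claude Haiku 4.5 | 100 | 100 | 100 | 94 | 93 | 97.4%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 87 | 97.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 98 | 94 | 94 | 97.2%
Hermes 3 70B | 100 | 100 | 100 | 94 | 91 | 97.1%
Llama 3.1 70B | 100 | 100 | 100 | 94 | 90 | 96.8%
Claude Sonnet 4.5 | 100 | 100 | 100 | 94 | 89 | 96.6%
Claude Opus 4.6 | 100 | 100 | 94 | 94 | 94 | 96.4%
GPT-5.1 | 100 | 100 | 100 | 94 | 88 | 96.4%
Z.AI GLM 4.7 | 100 | 100 | 100 | 94 | 88 | 96.4%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 94 | 88 | 96.4%
GPT-4.1 Mini | 100 | 100 | 95 | 94 | 93 | 96.4%
DeepSeek V3 (2024-12-26) | 100 | 99 | 94 | 94 | 94 | 96.1%
Gemini 2.5 Flash Lite | 100 | 100 | 97 | 94 | 89 | 95.9%
Mistral Small Creative | 100 | 100 | 100 | 93 | 86 | 95.8%
Claude 3 Haiku | 100 | 100 | 96 | 94 | 88 | 95.7%
DeepSeek V3 (2025-03-24) | 100 | 100 | 94 | 93 | 91 | 95.6%
Gemma 3 27B | 100 | 100 | 95 | 95 | 88 | 95.6%
Arcee AI: Trinity Mini | 100 | 98 | 94 | 94 | 90 | 95.3%
Gemini 3 Pro (Preview) | 100 | 100 | 94 | 94 | 88 | 95.2%
Grok 4.1 Fast | 100 | 100 | 94 | 94 | 88 | 95.2%
Mistral Medium 3.1 | 100 | 100 | 94 | 93 | 89 | 95.1%
DeepSeek-V2 Chat | 99 | 96 | 94 | 94 | 88 | 94.2%
Gemini 3.1 Pro (Preview) | 100 | 94 | 94 | 94 | 88 | 94.0%
DeepSeek V3.2 | 100 | 100 | 94 | 88 | 88 | 94.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 88 | 76 | 92.8%
Gemini 2.5 Flash | 100 | 94 | 91 | 91 | 88 | 92.7%
GPT-4.1 Nano | 100 | 97 | 90 | 88 | 86 | 92.0%
Gemma 3 4B | 97 | 95 | 94 | 87 | 82 | 91.2%
Gemma 3 12B | 100 | 100 | 94 | 79 | 76 | 89.8%
Mistral Small 3.2 24B | 100 | 100 | 88 | 88 | 59 | 87.0%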
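Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 99 | 99.9%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 99 | 99.8%
Mistral Large 2 | 100 | 100 | 100 | 100 | 99 | 99.8%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 99 | 99.7%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 98 | 99.5%
Claude Sonnet 4 | 100 | 100 | 100 | 99 | 98 | 99.5%
Claude Haiku 4.5 | 100 | 100 | 100 | 99 | 98 | 99.3%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 95 | 99.1%
Hermes 3 70B | 100 | 100 | 100 | 98 | 96 | 98.9%
o4 Mini | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 94 | 98.8%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5.2 | 100 | 100 | 100 | 99 | 95 | 98.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 93 | 98.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 92 | 98.5%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 92 | 98.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 99 | 92 | 98.1%
Minimax M2.5 | 100 | 100 | 98 | 97 | 95 | 98.0%
Claude Opus 4.6 | 100 | 100 | 100 | 95 | 94 | 97.9%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 88 | 97.7%
Mistral Small Creative | 100 | 100 | 100 | 97 | 91 | 97.7%
Mistral Large | 100 | 100 | 99 | 95 | 94 | 97.7%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 88 | 97.6%
Stealth: Aurora Alpha | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 94 | 93 | 97.4%
Ministral 3 14B | 100 | 100 | 99 | 94 | 89 | 96.5%
Z.AI GLM 4.7 | 100 | 100 | 100 | 94 | 88 | 96.4%
Mistral NeMO | 100 | 100 | 94 | 94 | 94 | 96.4%
Ministral 3 8B | 100 | 100 | 94 | 93 | 93 | 96.0%
Claude Opus 4 | 100 | 100 | 95 | 94 | 91 | 96.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 94 | 85 | 95.7%
Hermes 3 405B | 100 | 100 | 100 | 94 | 85 | 95.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 78 | 95.7%
Llama 3.1 70B | 100 | 100 | 94 | 94 | 90 | 95.6%
Gemini 2.5 Flash | 100 | 100 | 94 | 92 | 92 | 95.6%
Claude 3.5 Haiku | 100 | 100 | 96 | 96 | 85 | 95.5%
Z.AI GLM 4.7 Flash | 100 | 100 | 94 | 94 | 88 | 95.2%
Llama 3.1 Nemotron 70B | 100 | 94 | 94 | 94 | 93 | 95.1%
Mistral Small 3.2 24B | 95 | 95 | 94 | 94 | 94 | 94.5%
DeepSeek-V2 Chat | 100 | 100 | 94 | 93 | 85 | 94.4%
GPT-5 Nano | 100 | 100 | 94 | 92 | 86 | 94.3%
Mistral Medium 3.1 | 100 | 100 | 97 | 91 | 83 | 94.1%
Gemini 3.1 Pro (Preview) | 100 | 94 | 94 | 94 | 88 | 94.0%
GPT-4.1 Mini | 100 | 100 | 99 | 86 | 84 | 93.8%
Claude 3 Haiku | 100 | 93 | 93 | 92 | 91 | 93.7%
Claude Sonnet 4.6 | 100 | 100 | 96 | 93 | 79 | 93.6%
Ministral 3 3B | 100 | 96 | 95 | 94 | 83 | 93.5%
Claude Opus 4.5 | 100 | 97 | 94 | 90 | 86 | 93.5%
Z.AI GLM 4.6 | 100 | 100 | 94 | 91 | 82 | 93.4%
Ministral 8B | 100 | 94 | 94 | 93 | 85 | 93.2%
Ministral 3B | 100 | 100 | 100 | 88 | 77 | 93.1%
Claude 3.7 Sonnet | 100 | 95 | 93 | 91 | 87 | 93.0%
DeepSeek V3.1 | 98 | 94 | 94 | 92 | 87 | 93.0%
Gemini 2.5 Flash Lite | 100 | 94 | 94 | 88 | 88 | 92.7%
GPT-4.1 | 100 | 94 | 94 | 94 | 80 | 92.5%
Gemma 3 12B | 98 | 97 | 95 | 94 | 77 | 92.4%
Writer: Palmyra X5 | 100 | 100 | 89 | 88 | 79 | 91.2%
Z.AI GLM 4.5 | 94 | 94 | 88 | 88 | 86 | 90.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 94 | 84 | 72 | 90.0%
Gemma 3 27B | 98 | 94 | 94 | 90 | 73 | 89.8%
Rocinante 12B | 98 | 98 | 89 | 89 | 71 | 89.0%
Llama 3.1 8B | 100 | 100 | 93 | 86 | 40 | 83.8%
Gemma 3 4B | 100 | 99 | 76 | 72 | 68 | 83.1%
GPT-4.1 Nano | 97 | 79 | 78 | 77 | 75 | 81.2%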
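Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 99 | 99.8%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 98 | 99.6%
Hermes 3 70B | 100 | 100 | 100 | 100 | 97 | 99.5%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 94 | 98.8%
o4 Mini High | 100 | 100 | 100 | 100 | 94 | 98.8%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5.2 | 100 | 100 | 100 | 100 | 94 | 98.8%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude Opus 4 | 100 | 100 | 100 | 100 | 94 | 98.8%
Minimax M2.5 | 100 | 100 | 100 | 100 | 94 | 98.8%
Grok 4 | 100 | 100 | 100 | 100 | 94 | 98.8%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4.1 | 100 | 100 | 100 | 100 | 94 | 98.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 Nano | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Large 3 | 100 | 100 | 100 | 100 | 94 | 98.8%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 94 | 98.8%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 94 | 98.8%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Large 2 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Large | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemma 3 27B | 100 | 100 | 100 | 100 | 94 | 98.8%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral NeMO | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemma 3 12B | 100 | 100 | 100 | 100 | 94 | 98.7%
Hermes 3 405B | 100 | 100 | 100 | 100 | 93 | 98.5%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 92 | 98.5%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 90 | 97.9%
Claude Sonnet 4.6 | 100 | 100 | 100 | 98 | 91 | 97.9%
GPT-5 Mini | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-5.1 | 100 | 100 | 100 | 94 | 94 | 97.6%
Z.AI GLM 5 | 100 | 100 | 100 | 94 | 94 | 97.6%
Gemini 2.5 Pro | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude Sonnet 4 | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude Sonnet 4.5 | 100 | 100 | 100 | 94 | 94 | 97.6%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 94 | 94 | 97.6%
Claude 3.5 Sonnet | 100 | 100 | 100 | 94 | 94 | 97.6%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 94 | 94 | 97.6%
Z.AI GLM 4.5 | 100 | 100 | 100 | 94 | 94 | 97.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 94 | 94 | 97.6%
Writer: Palmyra X5 | 100 | 100 | 100 | 94 | 94 | 97.6%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 94 | 94 | 97.6%
Rocinante 12B | 100 | 100 | 100 | 100 | 88 | 97.6%
Ministral 3 14B | 100 | 100 | 100 | 100 | 86 | 97.2%
Claude 3.5 Haiku | 100 | 100 | 100 | 98 | 86 | 96.9%
Llama 3.1 8B | 100 | 100 | 96 | 94 | 93 | 96.5%
GPT-4.1 Nano | 100 | 100 | 99 | 97 | 86 | 96.5%
o4 Mini | 100 | 100 | 100 | 94 | 88 | 96.4%
Grok 4 Fast | 100 | 100 | 94 | 94 | 94 | 96.4%
Gemini 2.5 Flash | 100 | 100 | 94 | 94 | 94 | 96.4%
Mistral Small Creative | 100 | 100 | 94 | 94 | 94 | 96.4%
WizardLM 2 8x22b | 100 | 100 | 100 | 94 | 88 | 96.4%
Mistral Medium 3.1 | 100 | 100 | 94 | 94 | 92 | 96.0%
Claude Opus 4.5 | 100 | 100 | 98 | 94 | 88 | 96.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 88 | 88 | 95.2%
DeepSeek V3.2 | 100 | 94 | 94 | 94 | 94 | 95.2%
Llama 3.1 Nemotron 70B | 100 | 100 | 97 | 94 | 85 | 95.2%
GPT-5 | 94 | 94 | 94 | 94 | 94 | 94.0%
Grok 4.1 Fast | 100 | 100 | 94 | 88 | 88 | 94.0%
Gemini 3 Flash (Preview) | 100 | 94 | 94 | 88 | 88 | 92.8%
Gemini 3.1 Pro (Preview) | 100 | 94 | 94 | 88 | 82 | 91.6%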