Technical jargon density

Test: Bad Writing Habits

Avg. Score: 65.9%
Scenarios: 18

Overall Performance

Rank ▲  Model  Score  Avg. Cost  Avg. Time  Stability
1  o4 Mini High  92.7%  $0.025  47.2s  69%
2  o4 Mini  89.3%  $0.015  25.7s  63%
3  Qwen 2.5 72B  89.9%  $0.0010  36.7s  59%
4  GPT-4.1  86.5%  $0.018  44.7s  61%
5  DeepSeek V3 (2025-03-24)  86.4%  $0.0014  39.4s  49%
6  GPT-4o, May 13th (temp=0)  86.3%  $0.035  14.1s  54%
7  GPT-4o, May 13th (temp=1)  84.7%  $0.033  14.4s  54%
8  Hermes 3 405B  88.8%  $0.0032  53.2s  49%
9  Rocinante 12B  83.1%  $0.0014  38.4s  49%
10  Gemma 3 27B  82.9%  $0.0006  52.6s  49%
11  Claude 3 Haiku  79.8%  $0.0025  14.9s  43%
12  Gemma 3 12B  80.6%  $0.0004  41.3s  45%
13  Writer: Palmyra X5  80.3%  $0.011  22.0s  43%
14  DeepSeek-V2 Chat  80.2%  $0.0021  53.3s  45%
15  Gemini 2.5 Flash  76.7%  $0.0052  10.6s  38%
16  Mistral Large 3  75.0%  $0.0033  30.3s  35%
17  Mistral Large 2  76.0%  $0.013  29.4s  37%
18  Mistral NeMO  72.4%  $0.0005  10.1s  31%
19  Arcee AI: Trinity Mini  72.5%  $0.0003  9.2s  29%
20  Qwen 3.5 397B A17B  88.7%  $0.014  3.0m  55%
21  DeepSeek V3 (2024-12-26)  76.3%  $0.0021  54.6s  35%
22  GPT-4.1 Mini  71.3%  $0.0027  19.0s  31%
23  GPT-5.1  84.8%  $0.054  1.8m  55%
24  Gemini 3.1 Pro (Preview)  92.4%  $0.107  1.8m  67%
25  GPT-5  88.9%  $0.065  2.8m  69%
26  ByteDance Seed 1.6 Flash  69.9%  $0.0013  27.3s  30%
27  Gemini 2.5 Pro  76.3%  $0.036  36.2s  37%
28  Ministral 3 14B  68.7%  $0.0007  11.7s  26%
29  Mistral Large  71.8%  $0.014  30.9s  30%
30  DeepSeek V3.2  73.9%  $0.0014  1.9m  40%
31  Ministral 3 8B  66.4%  $0.0008  19.6s  27%
32  Mistral Medium 3.1  69.4%  $0.0048  36.5s  29%
33  Gemini 2.5 Flash Lite  63.2%  $0.0009  9.5s  27%
34  Mistral Small Creative  65.4%  $0.0007  9.1s  25%
35  Ministral 3 3B  66.4%  $0.0005  11.1s  21%
36  Z.AI GLM 4.6  67.9%  $0.0065  51.5s  30%
37  Arcee AI: Trinity Large (Preview)  67.7%  $0.0000  43.6s  25%
38  Llama 3.1 70B  67.2%  $0.0015  29.4s  21%
39  Grok 4.1 Fast  66.5%  $0.0018  37.8s  22%
40  Hermes 3 70B  70.1%  $0.0010  1.2m  24%
41  Gemma 3 4B  60.1%  $0.0002  20.0s  21%
42  Z.AI GLM 4.7 Flash  67.3%  $0.0017  1.2m  26%
43  Gemini 3 Flash (Preview)  57.5%  $0.0078  19.6s  24%
44  GPT-4o, Aug. 6th (temp=1)  63.5%  $0.018  24.4s  23%
45  Claude 3.7 Sonnet  69.2%  $0.042  46.7s  31%
46  GPT-5 Mini  63.4%  $0.0100  57.4s  27%
47  Gemini 3 Pro (Preview)  71.5%  $0.055  54.4s  35%
48  Z.AI GLM 4.5  64.1%  $0.0051  42.1s  21%
49  Cohere Command R+ (Aug. 2024)  67.9%  $0.020  52.5s  25%
50  Z.AI GLM 4.7  67.1%  $0.010  1.4m  28%
51  GPT-4.1 Nano  57.0%  $0.0007  13.3s  16%
52  Llama 3.1 8B  65.3%  $0.0003  1.3m  22%
53  DeepSeek V3.1  65.6%  $0.0020  1.8m  28%
54  GPT-5.2  72.1%  $0.056  1.5m  37%
55  Ministral 8B  56.1%  $0.0004  10.4s  14%
56  Llama 3.1 Nemotron 70B  58.5%  $0.0038  31.7s  18%
57  WizardLM 2 8x22b  69.4%  $0.0026  1.8m  25%
58  GPT-4o Mini (temp=0)  57.6%  $0.0012  34.8s  16%
59  Ministral 3B  55.1%  $0.0001  8.1s  11%
60  Claude 3.5 Haiku  55.2%  $0.0035  10.8s  11%
61  GPT-4o, Aug. 6th (temp=0)  55.0%  $0.023  22.7s  17%
62  Claude 3.5 Sonnet  64.5%  $0.048  35.5s  20%
63  Claude Sonnet 4.5  58.5%  $0.035  38.1s  20%
64  Qwen 3.5 Plus (2026-02-15)  47.3%  $0.0060  31.5s  16%
65  Claude Haiku 4.5  44.1%  $0.011  21.6s  16%
66  Grok 4 Fast  44.6%  $0.0017  24.1s  13%
67  GPT-4o Mini (temp=1)  46.5%  $0.0012  34.8s  11%
68  Z.AI GLM 5  50.8%  $0.0084  1.2m  14%
69  Claude Sonnet 4.6  44.3%  $0.031  39.3s  16%
70  Stealth: Aurora Alpha  32.7%  $0.0000  9.8s  4%
71  Minimax M2.5  40.4%  $0.0034  1.3m  10%
72  Claude Sonnet 4  43.1%  $0.032  43.7s  10%
73  Grok 4  51.0%  $0.048  1.7m  14%
74  Claude Opus 4.5  41.9%  $0.070  53.4s  17%
75  Claude Opus 4  71.3%  $0.209  1.4m  33%
76  Mistral Small 3.2 24B  69.0%  $0.0069  5.7m  20%
77  Claude Opus 4.6  32.3%  $0.078  1.2m  11%
78  ByteDance Seed 1.6  33.1%  $0.013  2.5m  4%
79  MoonshotAI: Kimi K2.5  37.5%  $0.019  3.2m  7%
80  GPT-5 Nano  4.6%  $0.0042  1.4m  0%

Individual Scenarios

Detailed Writing Rules

Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
Gemini 3.1 Pro (Preview)  100  100  100  100  100  100.0%
Hermes 3 405B  100  100  100  100  95  99.0%
Qwen 3.5 397B A17B  100  100  100  100  93  98.6%
GPT-5  98  96  94  92  92  94.5%
GPT-4o, May 13th (temp=1)  100  100  100  100  70  93.9%
Gemini 3 Pro (Preview)  100  100  79  74  71  84.9%
Qwen 2.5 72B  100  100  94  75  50  84.0%
DeepSeek V3.2  94  92  88  80  63  83.3%
GPT-4.1  98  94  83  71  68  83.0%
Writer: Palmyra X5  100  100  100  63  44  81.5%
Hermes 3 70B  100  100  100  61  43  80.8%
GPT-5.1  100  100  97  71  34  80.3%
DeepSeek V3 (2025-03-24)  100  100  100  98  0  79.6%
o4 Mini  100  100  100  57  39  79.2%
Mistral Large  100  89  83  59  59  78.0%
Gemini 2.5 Flash  100  82  71  71  63  77.7%
Mistral NeMO  100  90  77  66  54  77.2%
o4 Mini High  100  96  86  55  39  75.3%
WizardLM 2 8x22b  100  100  86  72  17  74.9%
DeepSeek-V2 Chat  100  100  75  54  31  71.9%
DeepSeek V3.1  94  78  78  58  38  69.1%
Z.AI GLM 4.5  100  100  80  62  0  68.3%
GPT-4o, May 13th (temp=0)  81  80  77  77  10  64.9%
Z.AI GLM 4.7  100  78  65  45  35  64.6%
Rocinante 12B  100  86  54  41  38  63.8%
GPT-4o, Aug. 6th (temp=0)  100  100  65  36  0  60.2%
GPT-5 Mini  83  79  60  60  18  60.1%
Llama 3.1 70B  100  100  97  0  0  59.4%
Ministral 3 14B  100  71  68  44  12  58.9%
Mistral Large 3  91  87  46  36  29  57.8%
Z.AI GLM 4.7 Flash  100  92  56  41  0  57.7%
Mistral Small 3.2 24B  100  73  65  46  0  56.8%
GPT-4.1 Mini  100  76  60  41  6  56.6%
Claude Sonnet 4.5  100  74  54  45  10  56.5%
Z.AI GLM 4.6  80  75  60  33  33  56.2%
Claude Opus 4  100  77  58  41  4  56.0%
Gemini 2.5 Pro  96  75  56  21  13  52.3%
ByteDance Seed 1.6 Flash  76  68  59  57  0  52.0%
Gemma 3 12B  100  79  44  36  0  51.9%
Claude 3 Haiku  100  69  50  39  0  51.5%
Mistral Medium 3.1  73  71  68  27  14  50.6%
Gemma 3 27B  94  63  41  29  24  50.3%
Mistral Small Creative  100  100  35  16  0  50.2%
Gemini 3 Flash (Preview)  88  58  46  37  16  49.0%
DeepSeek V3 (2024-12-26)  100  100  32  9  0  48.1%
GPT-5.2  78  57  54  27  17  46.6%
Cohere Command R+ (Aug. 2024)  83  58  44  41  0  45.3%
Arcee AI: Trinity Mini  94  73  51  5  0  44.5%
Llama 3.1 8B  100  75  41  0  0  43.1%
Qwen 3.5 Plus (2026-02-15)  94  61  52  7  0  42.9%
Claude Sonnet 4.6  93  51  36  27  0  41.4%
Ministral 3 8B  71  54  38  38  0  40.2%
Claude 3.5 Sonnet  100  71  17  0  0  37.6%
Mistral Large 2  92  33  27  19  13  36.6%
Arcee AI: Trinity Large (Preview)  70  55  33  16  0  34.9%
Claude Opus 4.5  68  66  20  9  0  32.5%
Claude 3.7 Sonnet  78  71  0  0  0  29.9%
Gemma 3 4B  68  57  18  3  0  29.3%
Z.AI GLM 5  63  41  17  16  0  27.4%
GPT-4o, Aug. 6th (temp=1)  73  29  20  13  0  27.0%
Claude Haiku 4.5  65  48  13  0  0  25.2%
MoonshotAI: Kimi K2.5  69  56  0  0  0  25.0%
Llama 3.1 Nemotron 70B  75  37  0  0  0  22.4%
GPT-4o Mini (temp=1)  82  13  10  0  0  20.9%
Claude 3.5 Haiku  63  41  0  0  0  20.9%
Claude Opus 4.6  65  36  0  0  0  20.2%
GPT-4o Mini (temp=0)  51  27  17  0  0  19.0%
Claude Sonnet 4  71  7  0  0  0  15.6%
ByteDance Seed 1.6  74  3  0  0  0  15.4%
Gemini 2.5 Flash Lite  45  30  0  0  0  15.1%
Grok 4.1 Fast  29  22  12  0  0  12.4%
Minimax M2.5  60  0  0  0  0  12.1%
Ministral 8B  27  20  9  4  0  12.0%
GPT-4.1 Nano  39  13  0  0  0  10.3%
Ministral 3 3B  24  8  0  0  0  6.4%
Grok 4  10  0  0  0  0  2.0%
Grok 4 Fast  5  0  0  0  0  1.0%
Stealth: Aurora Alpha  2  0  0  0  0  0.3%
GPT-5 Nano  0  0  0  0  0  0.0%
Ministral 3B  0  0  0  0  0  0.0%
Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
Gemini 3.1 Pro (Preview)  100  100  100  100  100  100.0%
Rocinante 12B  100  100  100  100  92  98.4%
o4 Mini High  100  100  100  95  92  97.4%
Ministral 3 3B  100  100  100  95  88  96.6%
GPT-5.1  100  100  100  91  82  94.6%
DeepSeek V3 (2025-03-24)  100  100  100  98  71  93.9%
GPT-4o, May 13th (temp=1)  100  100  96  93  76  93.0%
GPT-5  100  100  100  93  60  90.6%
Claude 3.5 Haiku  100  100  100  78  71  89.9%
o4 Mini  100  100  84  83  78  89.1%
DeepSeek V3.2  100  100  89  79  75  88.6%
Gemma 3 27B  100  100  100  92  51  88.6%
Hermes 3 405B  100  100  100  100  37  87.4%
Claude 3.5 Sonnet  100  100  92  82  59  86.5%
Qwen 3.5 397B A17B  100  100  100  77  54  86.1%
Z.AI GLM 4.7 Flash  100  100  100  90  29  83.8%
Cohere Command R+ (Aug. 2024)  100  100  79  73  65  83.6%
Writer: Palmyra X5  100  100  92  63  60  83.2%
WizardLM 2 8x22b  100  100  73  70  70  82.4%
Qwen 2.5 72B  100  100  78  76  57  82.2%
DeepSeek V3 (2024-12-26)  100  100  100  68  43  82.1%
Arcee AI: Trinity Large (Preview)  100  100  100  59  36  79.1%
Claude Opus 4  100  98  89  71  36  78.9%
Z.AI GLM 4.7  100  96  86  70  38  78.1%
GPT-4o, Aug. 6th (temp=1)  100  100  79  74  34  77.4%
Mistral Large  100  100  100  86  0  77.1%
GPT-4o, May 13th (temp=0)  100  98  96  75  15  76.8%
Claude 3.7 Sonnet  94  85  83  70  49  76.2%
Gemini 3 Pro (Preview)  100  100  70  62  49  76.2%
GPT-4.1 Mini  100  76  71  70  61  75.8%
GPT-4.1  100  98  76  58  41  74.7%
Gemini 2.5 Flash  88  88  85  60  52  74.6%
Z.AI GLM 4.5  100  98  86  54  32  73.8%
Arcee AI: Trinity Mini  100  90  86  81  5  72.2%
GPT-5.2  100  97  85  74  0  71.2%
Z.AI GLM 5  94  76  71  55  52  69.9%
Mistral Large 2  98  74  63  54  45  66.8%
Gemma 3 12B  96  82  62  46  41  65.4%
Gemini 2.5 Pro  100  100  73  49  5  65.4%
Ministral 3 8B  100  93  83  49  0  65.0%
DeepSeek-V2 Chat  100  98  92  24  11  65.0%
ByteDance Seed 1.6 Flash  100  83  65  62  0  62.2%
GPT-4o, Aug. 6th (temp=0)  100  92  73  39  0  60.7%
Qwen 3.5 Plus (2026-02-15)  85  83  63  42  24  59.6%
Z.AI GLM 4.6  100  86  50  48  13  59.4%
Mistral Medium 3.1  89  76  66  63  0  59.0%
Claude Sonnet 4  93  88  69  35  9  58.7%
DeepSeek V3.1  95  82  75  41  0  58.6%
MoonshotAI: Kimi K2.5  100  71  61  43  13  57.7%
Mistral NeMO  100  94  83  3  0  56.1%
Claude Sonnet 4.5  100  95  43  38  0  55.2%
Ministral 3B  98  63  59  41  11  54.4%
GPT-5 Mini  100  55  47  42  22  53.2%
Claude 3 Haiku  92  70  65  34  3  52.7%
Llama 3.1 Nemotron 70B  86  69  66  36  0  51.5%
ByteDance Seed 1.6  100  89  63  0  0  50.6%
Claude Sonnet 4.6  86  63  59  36  0  48.9%
Mistral Large 3  100  67  50  17  9  48.5%
Gemma 3 4B  100  100  22  5  0  45.4%
Gemini 3 Flash (Preview)  100  49  43  26  7  44.7%
Gemini 2.5 Flash Lite  100  84  34  3  0  44.3%
Grok 4  74  62  39  29  0  40.8%
Mistral Small Creative  85  54  34  20  8  40.1%
Hermes 3 70B  100  98  0  0  0  39.6%
GPT-4o Mini (temp=1)  100  65  31  0  0  39.1%
Claude Opus 4.6  97  38  30  27  0  38.4%
Llama 3.1 8B  100  41  34  9  0  36.7%
Claude Opus 4.5  66  56  41  11  0  34.7%
Ministral 3 14B  54  51  44  24  0  34.5%
Grok 4 Fast  75  41  15  4  0  27.0%
Llama 3.1 70B  54  43  38  0  0  27.0%
Grok 4.1 Fast  63  37  32  0  0  26.5%
GPT-4.1 Nano  41  33  16  15  0  21.0%
Minimax M2.5  48  24  15  14  0  20.1%
Mistral Small 3.2 24B  100  0  0  0  0  20.0%
Claude Haiku 4.5  54  35  7  0  0  19.0%
Ministral 8B  61  20  0  0  0  16.2%
Stealth: Aurora Alpha  54  0  0  0  0  10.7%
GPT-4o Mini (temp=0)  18  0  0  0  0  3.5%
GPT-5 Nano  0  0  0  0  0  0.0%
Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
Gemini 3.1 Pro (Preview)  100  100  100  100  100  100.0%
GPT-4.1  100  100  100  100  100  100.0%
DeepSeek V3 (2025-03-24)  100  100  100  100  100  100.0%
GPT-4o, May 13th (temp=0)  100  100  100  100  100  100.0%
Hermes 3 405B  100  100  100  100  100  100.0%
Mistral Small 3.2 24B  100  100  100  100  100  100.0%
Claude 3 Haiku  100  100  100  100  100  100.0%
Writer: Palmyra X5  100  100  100  100  98  99.6%
DeepSeek-V2 Chat  100  100  100  100  95  99.0%
GPT-4.1 Mini  100  100  100  100  90  98.0%
Grok 4.1 Fast  100  100  100  100  86  97.1%
Qwen 3.5 397B A17B  100  100  100  92  92  96.7%
Ministral 3 3B  100  100  100  100  83  96.7%
o4 Mini  100  100  100  98  83  96.3%
Grok 4  100  100  100  94  78  94.4%
WizardLM 2 8x22b  100  100  100  88  82  93.9%
Qwen 2.5 72B  100  100  100  100  66  93.1%
Mistral Large 2  100  100  100  100  60  92.1%
Gemma 3 27B  100  100  100  82  78  91.8%
Mistral Medium 3.1  100  100  100  98  60  91.7%
Ministral 3 14B  100  100  100  100  58  91.6%
Mistral Small Creative  100  100  93  88  71  90.5%
Z.AI GLM 4.5  100  100  100  86  63  89.8%
Arcee AI: Trinity Mini  100  100  100  95  54  89.8%
Gemini 2.5 Flash  100  100  100  88  60  89.7%
Z.AI GLM 4.6  100  100  100  94  54  89.5%
Z.AI GLM 4.7 Flash  100  100  100  100  48  89.5%
GPT-4o, May 13th (temp=1)  100  100  100  98  45  88.7%
DeepSeek V3.2  100  100  92  88  63  88.6%
Rocinante 12B  100  100  100  75  68  88.6%
Gemini 2.5 Pro  100  100  100  100  41  88.2%
Claude Opus 4  100  100  83  78  77  87.6%
Gemini 3 Flash (Preview)  100  99  84  82  63  85.8%
GPT-5  100  92  85  79  69  85.0%
Arcee AI: Trinity Large (Preview)  100  100  92  78  54  84.7%
Z.AI GLM 4.7  100  100  100  63  58  84.3%
ByteDance Seed 1.6 Flash  100  96  92  78  50  83.1%
Mistral NeMO  100  100  98  81  36  83.1%
Llama 3.1 70B  100  100  94  92  27  82.5%
Qwen 3.5 Plus (2026-02-15)  100  100  91  69  50  81.8%
Mistral Large 3  100  100  94  80  34  81.6%
Grok 4 Fast  100  100  86  69  51  81.2%
o4 Mini High  100  100  80  71  54  81.0%
GPT-4o Mini (temp=0)  100  100  88  71  45  81.0%
Ministral 3B  100  100  100  68  37  80.9%
Cohere Command R+ (Aug. 2024)  100  100  100  71  33  80.9%
GPT-5.1  100  100  85  78  41  80.7%
GPT-5.2  100  76  75  73  71  79.1%
Claude 3.7 Sonnet  100  94  80  71  51  79.1%
Mistral Large  100  100  100  61  33  78.8%
GPT-4o, Aug. 6th (temp=1)  98  98  88  57  44  77.2%
Ministral 3 8B  100  80  75  70  60  77.0%
DeepSeek V3.1  100  83  78  61  55  75.6%
Stealth: Aurora Alpha  89  74  72  71  68  74.8%
Minimax M2.5  100  100  63  60  48  74.3%
GPT-4o, Aug. 6th (temp=0)  100  100  71  57  43  74.2%
ByteDance Seed 1.6  100  100  94  50  24  73.4%
GPT-5 Mini  92  86  83  58  44  72.5%
GPT-4.1 Nano  100  100  81  74  8  72.4%
DeepSeek V3 (2024-12-26)  100  100  78  54  28  71.8%
Gemini 3 Pro (Preview)  100  93  88  74  0  70.9%
Gemma 3 12B  97  94  88  37  24  67.8%
Claude Sonnet 4  100  100  81  29  20  65.8%
Llama 3.1 8B  100  78  68  45  36  65.3%
Claude 3.5 Haiku  100  100  78  41  0  63.7%
Ministral 8B  100  99  91  17  4  62.0%
Claude Opus 4.5  100  60  60  54  36  61.9%
Gemini 2.5 Flash Lite  100  100  100  9  0  61.8%
Z.AI GLM 5  100  86  68  31  24  61.7%
Claude Sonnet 4.5  100  100  74  10  0  56.7%
Claude Haiku 4.5  81  58  56  46  43  56.7%
Hermes 3 70B  100  100  57  24  0  56.2%
Claude 3.5 Sonnet  59  54  54  48  41  50.9%
Llama 3.1 Nemotron 70B  100  78  30  29  5  48.4%
Claude Opus 4.6  95  80  56  0  0  46.2%
MoonshotAI: Kimi K2.5  100  60  45  0  0  41.2%
Claude Sonnet 4.6  83  35  28  9  0  30.9%
GPT-4o Mini (temp=1)  100  46  5  0  0  30.2%
Gemma 3 4B  54  35  33  24  4  29.8%
GPT-5 Nano  33  8  0  0  0  8.0%
Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
Gemini 3.1 Pro (Preview)  100  100  100  100  100  100.0%
Hermes 3 405B  100  100  100  100  100  100.0%
Gemma 3 27B  100  100  100  100  100  100.0%
Qwen 2.5 72B  100  100  100  100  100  100.0%
Rocinante 12B  100  100  100  100  100  100.0%
Llama 3.1 70B  100  100  100  100  98  99.6%
o4 Mini High  100  100  100  100  88  97.6%
o4 Mini  100  100  100  100  88  97.6%
Gemini 3 Pro (Preview)  100  100  100  100  82  96.3%
GPT-4.1  100  100  100  100  78  95.5%
GPT-5  100  100  100  89  83  94.6%
Mistral NeMO  100  100  100  95  76  94.3%
Claude 3 Haiku  100  100  100  100  71  94.3%
GPT-4o, Aug. 6th (temp=0)  100  100  100  88  81  93.7%
Llama 3.1 Nemotron 70B  100  100  100  98  63  92.3%
ByteDance Seed 1.6 Flash  100  100  100  100  59  91.8%
GPT-4o, May 13th (temp=1)  100  100  100  97  60  91.4%
Writer: Palmyra X5  100  100  100  90  63  90.7%
GPT-4.1 Mini  100  100  100  86  60  89.2%
Claude Opus 4  100  95  91  86  71  88.6%
Z.AI GLM 4.7  100  100  97  74  71  88.5%
GPT-5.1  100  100  88  82  70  88.1%
Gemma 3 12B  100  100  100  86  54  87.9%
GPT-4o, May 13th (temp=0)  100  100  100  81  57  87.6%
Grok 4.1 Fast  100  100  100  92  45  87.5%
GPT-4.1 Nano  100  100  90  80  66  87.2%
DeepSeek-V2 Chat  100  100  95  92  49  87.2%
GPT-4o, Aug. 6th (temp=1)  100  100  100  100  33  86.6%
Z.AI GLM 4.7 Flash  100  100  98  97  38  86.6%
DeepSeek V3 (2025-03-24)  100  100  95  78  48  84.2%
Gemini 2.5 Pro  100  100  100  79  38  83.4%
Hermes 3 70B  100  100  100  90  24  82.8%
Minimax M2.5  100  100  90  78  44  82.4%
Ministral 3 14B  100  88  86  71  63  81.7%
Claude Haiku 4.5  94  94  91  63  63  81.1%
Gemini 2.5 Flash Lite  100  85  83  71  65  80.9%
WizardLM 2 8x22b  100  100  70  66  65  80.2%
Claude 3.5 Haiku  100  100  100  100  0  80.0%
Mistral Small 3.2 24B  100  100  100  100  0  80.0%
Qwen 3.5 397B A17B  100  100  100  97  0  79.4%
DeepSeek V3.2  100  97  83  75  41  79.2%
GPT-4o Mini (temp=0)  100  81  76  73  60  78.0%
MoonshotAI: Kimi K2.5  100  78  78  71  61  77.7%
Mistral Large 3  92  90  85  76  41  76.7%
GPT-5.2  100  79  78  71  55  76.4%
Claude 3.7 Sonnet  96  94  88  86  15  75.8%
Claude Sonnet 4.5  100  100  60  57  57  74.8%
Arcee AI: Trinity Mini  100  100  95  71  0  73.3%
Mistral Large  100  95  86  83  0  72.8%
Gemini 2.5 Flash  100  100  95  63  0  71.7%
Arcee AI: Trinity Large (Preview)  100  100  81  75  0  71.1%
DeepSeek V3 (2024-12-26)  100  86  78  78  13  70.9%
Z.AI GLM 4.5  100  100  78  41  36  70.9%
GPT-5 Mini  100  94  61  54  42  70.1%
Claude 3.5 Sonnet  100  100  100  41  0  68.2%
Qwen 3.5 Plus (2026-02-15)  100  83  66  49  39  67.5%
Z.AI GLM 4.6  100  82  73  69  3  65.4%
Claude Sonnet 4  100  100  92  35  0  65.3%
Grok 4  100  80  78  54  13  64.9%
Mistral Small Creative  100  100  63  50  11  64.8%
Ministral 3 3B  100  100  71  41  0  62.4%
Cohere Command R+ (Aug. 2024)  100  83  45  41  41  62.1%
Llama 3.1 8B  94  82  78  32  24  61.8%
DeepSeek V3.1  100  83  66  24  24  59.5%
Claude Sonnet 4.6  100  100  76  20  0  59.1%
Ministral 3B  100  100  95  0  0  59.0%
Ministral 3 8B  88  78  75  54  0  58.8%
Claude Opus 4.5  100  83  51  45  4  56.7%
Mistral Large 2  100  78  65  28  9  55.9%
Z.AI GLM 5  75  71  63  44  24  55.6%
Stealth: Aurora Alpha  98  91  75  6  0  54.0%
GPT-4o Mini (temp=1)  100  88  60  15  0  52.7%
Gemma 3 4B  76  66  49  36  31  51.7%
Gemini 3 Flash (Preview)  94  79  55  13  0  48.3%
Claude Opus 4.6  100  54  27  18  0  39.6%
ByteDance Seed 1.6  100  70  0  0  0  33.9%
Grok 4 Fast  60  60  29  15  0  32.7%
Ministral 8B  75  44  24  13  0  31.2%
Mistral Medium 3.1  68  51  19  15  0  30.4%
GPT-5 Nano  6  0  0  0  0  1.2%
Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
Gemini 3.1 Pro (Preview)  100  100  100  100  100  100.0%
Qwen 3.5 397B A17B  100  100  100  100  100  100.0%
o4 Mini High  100  100  100  100  100  100.0%
Mistral Small Creative  100  100  100  100  100  100.0%
Rocinante 12B  100  100  100  100  100  100.0%
o4 Mini  100  100  100  100  98  99.6%
Z.AI GLM 4.7  100  100  100  100  92  98.4%
GPT-5.1  100  100  100  100  88  97.6%
Hermes 3 70B  100  100  100  100  86  97.1%
GPT-4o, May 13th (temp=1)  100  100  100  100  82  96.3%
Gemini 2.5 Flash  100  100  100  92  88  95.9%
Writer: Palmyra X5  100  100  100  100  78  95.6%
Mistral Large  100  100  98  89  88  95.1%
DeepSeek V3 (2024-12-26)  100  100  100  100  75  95.0%
Ministral 3 14B  100  100  100  100  75  95.0%
Mistral Large 3  100  100  100  92  82  94.8%
Claude 3.5 Sonnet  100  100  98  88  83  93.9%
Arcee AI: Trinity Mini  100  100  100  92  74  93.0%
Z.AI GLM 5  100  100  94  92  68  90.6%
GPT-5  100  100  100  81  71  90.5%
Claude 3 Haiku  100  100  86  83  82  90.2%
GPT-4o, May 13th (temp=0)  100  98  95  86  71  90.1%
GPT-4.1  100  100  97  97  54  89.4%
Claude Sonnet 4.5  100  100  98  75  74  89.4%
DeepSeek V3 (2025-03-24)  100  100  100  92  54  89.1%
Arcee AI: Trinity Large (Preview)  100  100  100  86  58  88.7%
Grok 4.1 Fast  100  100  100  78  63  88.3%
DeepSeek-V2 Chat  100  100  100  100  41  88.2%
WizardLM 2 8x22b  100  100  100  75  66  88.1%
DeepSeek V3.2  100  97  93  88  60  87.5%
Claude 3.7 Sonnet  100  100  100  80  56  87.2%
DeepSeek V3.1  100  100  100  96  37  86.7%
Gemini 2.5 Pro  100  100  100  100  33  86.5%
Mistral Medium 3.1  100  100  92  91  50  86.4%
Qwen 3.5 Plus (2026-02-15)  100  100  89  88  52  85.8%
ByteDance Seed 1.6 Flash  100  100  94  94  41  85.7%
Claude Opus 4  100  100  100  80  45  85.1%
Hermes 3 405B  100  100  100  100  24  84.8%
Gemini 3 Pro (Preview)  100  100  100  63  59  84.5%
Gemini 3 Flash (Preview)  100  100  99  75  48  84.3%
ByteDance Seed 1.6  100  100  100  66  54  83.8%
GPT-4.1 Mini  100  100  95  75  48  83.5%
GPT-4o, Aug. 6th (temp=1)  100  100  94  85  38  83.3%
Z.AI GLM 4.6  100  100  94  92  28  82.6%
Qwen 2.5 72B  100  100  100  100  13  82.6%
Grok 4  100  100  94  79  33  81.2%
Ministral 3 3B  100  100  100  63  38  80.3%
Cohere Command R+ (Aug. 2024)  100  100  100  100  0  80.0%
Llama 3.1 8B  100  100  78  71  45  79.0%
Claude Sonnet 4  100  100  92  61  41  78.8%
Stealth: Aurora Alpha  100  99  85  72  33  77.7%
Mistral Large 2  100  100  100  63  24  77.5%
GPT-5.2  100  92  85  77  31  77.0%
Gemini 2.5 Flash Lite  100  83  81  65  52  76.2%
Z.AI GLM 4.7 Flash  94  76  74  71  63  75.6%
Mistral Small 3.2 24B  100  100  89  83  0  74.5%
Gemma 3 27B  100  94  82  66  28  73.8%
Z.AI GLM 4.5  100  98  86  71  13  73.7%
Claude 3.5 Haiku  100  100  100  63  0  72.7%
Minimax M2.5  100  100  88  54  17  71.7%
Gemma 3 12B  100  100  88  48  20  71.0%
Mistral NeMO  100  88  63  60  41  70.5%
Grok 4 Fast  100  81  65  50  36  66.4%
Llama 3.1 70B  100  100  71  51  0  64.4%
MoonshotAI: Kimi K2.5  100  94  81  48  0  64.4%
GPT-4o Mini (temp=0)  100  71  59  54  0  56.8%
Llama 3.1 Nemotron 70B  100  81  69  33  0  56.5%
Ministral 3 8B  100  97  42  38  0  55.5%
Claude Opus 4.5  92  85  57  24  0  51.5%
GPT-5 Mini  80  63  55  51  7  51.2%
Ministral 8B  100  66  65  0  0  46.1%
GPT-4.1 Nano  95  61  41  33  0  46.0%
Gemma 3 4B  86  54  48  29  0  43.1%
Claude Sonnet 4.6  63  63  60  13  0  40.1%
GPT-4o, Aug. 6th (temp=0)  85  61  34  16  0  39.2%
Claude Opus 4.6  59  45  24  5  0  26.6%
Claude Haiku 4.5  45  41  24  20  0  26.1%
GPT-5 Nano  35  10  4  0  0  10.0%
Ministral 3B  12  0  0  0  0  2.3%
GPT-4o Mini (temp=1)  0  0  0  0  0  0.0%
Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
Gemini 3.1 Pro (Preview)  100  100  100  100  100  100.0%
o4 Mini High  100  100  100  100  100  100.0%
o4 Mini  100  100  100  100  100  100.0%
DeepSeek V3 (2025-03-24)  100  100  100  100  100  100.0%
GPT-4.1 Mini  100  100  100  100  100  100.0%
Hermes 3 405B  100  100  100  100  100  100.0%
Llama 3.1 70B  100  100  100  100  100  100.0%
Arcee AI: Trinity Mini  100  100  100  100  100  100.0%
Hermes 3 70B  100  100  100  100  98  99.6%
Gemini 2.5 Flash  100  100  100  100  97  99.4%
GPT-4o, May 13th (temp=0)  100  100  100  100  94  98.9%
GPT-4.1  100  100  99  98  94  98.4%
Qwen 2.5 72B  100  100  100  97  94  98.2%
Qwen 3.5 397B A17B  100  100  100  100  88  97.6%
WizardLM 2 8x22b  100  100  100  93  92  96.9%
Grok 4.1 Fast  100  100  100  92  92  96.7%
Arcee AI: Trinity Large (Preview)  100  100  100  100  80  96.0%
DeepSeek V3.1  100  100  100  100  80  95.9%
Cohere Command R+ (Aug. 2024)  100  100  100  88  86  94.9%
ByteDance Seed 1.6 Flash  100  100  99  92  83  94.9%
Gemma 3 12B  100  100  98  94  79  94.2%
GPT-5  100  100  100  88  81  93.9%
GPT-5.1  100  100  100  100  68  93.6%
Claude 3.5 Haiku  100  100  100  100  68  93.5%
GPT-4o, Aug. 6th (temp=1)  100  100  100  98  65  92.7%
GPT-4o Mini (temp=0)  100  100  92  86  85  92.4%
Writer: Palmyra X5  100  100  98  94  69  92.2%
DeepSeek V3 (2024-12-26)  100  100  100  83  76  91.9%
Z.AI GLM 4.5  100  99  96  91  70  91.2%
GPT-4o, May 13th (temp=1)  100  100  100  86  70  91.1%
GPT-5 Mini  100  100  97  84  71  90.5%
Claude 3 Haiku  100  100  100  83  68  90.4%
Mistral NeMO  100  100  94  91  59  88.7%
DeepSeek-V2 Chat  100  100  100  92  51  88.5%
Gemini 3 Pro (Preview)  100  99  94  90  57  88.0%
Mistral Large 3  100  92  83  79  78  86.4%
Z.AI GLM 4.6  100  95  79  78  74  85.3%
Z.AI GLM 4.7 Flash  100  100  89  81  54  84.7%
Claude Sonnet 4.5  100  100  86  71  63  84.3%
GPT-4.1 Nano  100  99  93  75  52  83.7%
Mistral Large  100  100  100  100  17  83.3%
Z.AI GLM 4.7  100  100  89  75  52  83.2%
Gemini 2.5 Pro  100  100  94  75  41  82.0%
Rocinante 12B  100  100  100  84  26  82.0%
DeepSeek V3.2  100  100  100  58  45  80.5%
Claude 3.5 Sonnet  100  100  100  100  0  80.0%
Ministral 3B  100  100  100  100  0  80.0%
GPT-4o Mini (temp=1)  100  100  100  52  48  79.9%
Claude 3.7 Sonnet  100  86  76  70  67  79.8%
GPT-5.2  100  92  92  88  26  79.7%
Llama 3.1 Nemotron 70B  100  100  88  81  24  78.5%
Gemma 3 4B  96  85  85  73  52  78.1%
Mistral Small 3.2 24B  100  100  100  46  30  75.4%
Mistral Large 2  100  92  91  79  0  72.4%
Gemini 2.5 Flash Lite  100  89  77  71  18  70.9%
Llama 3.1 8B  100  100  80  71  0  70.3%
Mistral Medium 3.1  100  74  68  56  49  69.2%
Claude Opus 4  100  77  60  58  49  68.8%
Claude Sonnet 4  100  86  75  71  0  66.4%
Ministral 3 3B  100  100  92  33  7  66.3%
MoonshotAI: Kimi K2.5  100  100  93  30  0  64.6%
Z.AI GLM 5  100  88  76  35  18  63.4%
GPT-4o, Aug. 6th (temp=0)  93  75  66  52  26  62.6%
Stealth: Aurora Alpha  95  76  50  41  39  60.3%
Gemma 3 27B  100  59  55  43  41  59.5%
Gemini 3 Flash (Preview)  88  83  52  48  24  58.9%
Ministral 3 8B  100  61  57  37  36  58.3%
ByteDance Seed 1.6  100  90  59  18  0  53.3%
Claude Sonnet 4.6  93  76  75  18  0  52.4%
Ministral 8B  100  82  41  31  0  50.8%
Qwen 3.5 Plus (2026-02-15)  78  54  50  47  0  45.7%
Grok 4 Fast  73  71  38  33  11  45.2%
Grok 4  70  49  45  27  17  41.7%
Ministral 3 14B  100  54  41  9  0  40.7%
Minimax M2.5  92  90  13  0  0  39.0%
Claude Opus 4.5  65  45  27  21  7  33.1%
Claude Opus 4.6  66  39  31  12  3  30.2%
Mistral Small Creative  45  44  29  13  0  26.3%
Claude Haiku 4.5  48  9  3  0  0  11.9%
GPT-5 Nano  42  0  0  0  0  8.5%

genre

Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
Hermes 3 405B  100  100  95  78  58  86.2%
Gemma 3 12B  100  91  90  86  55  84.4%
Gemini 3.1 Pro (Preview)  100  84  80  79  71  83.0%
o4 Mini  100  82  81  73  57  78.5%
GPT-4.1  90  89  79  70  58  77.2%
DeepSeek-V2 Chat  100  78  69  67  41  71.0%
Llama 3.1 8B  100  97  86  36  33  70.2%
Mistral Large 2  100  88  87  73  0  69.5%
GPT-5  92  81  70  62  30  67.0%
Mistral NeMO  100  92  60  36  22  61.9%
Rocinante 12B  100  90  87  32  0  61.6%
Gemini 2.5 Flash Lite  79  73  63  48  43  61.2%
Z.AI GLM 4.6  78  60  57  54  52  60.2%
o4 Mini High  86  74  69  33  28  58.0%
Qwen 2.5 72B  98  89  74  20  0  56.3%
Qwen 3.5 397B A17B  100  69  69  24  0  52.4%
GPT-5.2  66  50  49  48  44  51.3%
GPT-4o, Aug. 6th (temp=1)  81  73  58  36  7  50.8%
Mistral Small 3.2 24B  100  100  49  2  0  50.4%
Gemini 2.5 Pro  100  52  52  41  5  50.2%
GPT-5.1  91  63  48  35  6  48.7%
Writer: Palmyra X5  92  88  30  18  13  48.1%
DeepSeek V3.2  74  60  43  36  26  47.6%
Hermes 3 70B  100  100  38  0  0  47.6%
Ministral 3 14B  100  71  58  8  0  47.5%
GPT-4o, May 13th (temp=0)  79  62  56  18  13  45.6%
Claude 3 Haiku  88  41  41  37  20  45.3%
Claude 3.7 Sonnet  96  65  33  15  0  41.9%
Mistral Small Creative  82  80  38  6  0  41.1%
Ministral 8B  100  65  40  0  0  41.0%
Gemma 3 27B  75  73  32  15  0  38.9%
Gemini 2.5 Flash  69  56  48  14  2  38.0%
Z.AI GLM 4.5  100  33  26  20  11  38.0%
Ministral 3 8B  79  59  51  0  0  37.8%
Mistral Medium 3.1  71  57  50  11  0  37.7%
Mistral Large  75  49  46  16  0  37.2%
DeepSeek V3 (2025-03-24)  100  35  24  20  8  37.2%
Gemini 3 Pro (Preview)  59  49  41  31  0  36.0%
GPT-4o Mini (temp=1)  100  74  0  0  0  34.8%
GPT-4o, May 13th (temp=1)  78  44  26  21  0  33.9%
GPT-4.1 Mini  73  66  16  13  0  33.5%
DeepSeek V3.1  79  78  10  0  0  33.5%
Claude Sonnet 4.6  94  58  14  0  0  33.1%
GPT-5 Mini  53  43  40  29  0  32.9%
Claude Opus 4  47  43  32  17  15  30.6%
Mistral Large 3  83  44  24  0  0  30.3%
ByteDance Seed 1.6 Flash  50  45  39  14  0  29.8%
Gemma 3 4B  68  46  24  8  0  29.1%
Claude Haiku 4.5  67  58  11  0  0  27.2%
Ministral 3 3B  61  32  30  8  0  26.2%
Cohere Command R+ (Aug. 2024)  100  27  0  0  0  25.3%
DeepSeek V3 (2024-12-26)  68  33  24  0  0  25.0%
Llama 3.1 Nemotron 70B  75  45  5  0  0  25.0%
Arcee AI: Trinity Large (Preview)  55  54  7  5  0  24.2%
Z.AI GLM 4.7  48  42  30  0  0  24.0%
Gemini 3 Flash (Preview)  41  32  24  9  7  22.5%
Llama 3.1 70B  48  41  24  0  0  22.4%
Claude Sonnet 4.5  73  29  7  0  0  21.8%
Grok 4.1 Fast  61  42  0  0  0  20.7%
Claude Opus 4.5  100  0  0  0  0  20.0%
Arcee AI: Trinity Mini  50  33  8  0  0  18.1%
GPT-4o, Aug. 6th (temp=0)  73  16  0  0  0  17.8%
GPT-4.1 Nano  56  29  0  0  0  16.9%
Z.AI GLM 4.7 Flash  63  19  0  0  0  16.5%
Ministral 3B  48  30  0  0  0  15.5%
Claude Opus 4.6  43  10  0  0  0  10.5%
Z.AI GLM 5  38  12  0  0  0  10.0%
Claude 3.5 Sonnet  48  0  0  0  0  9.5%
GPT-4o Mini (temp=0)  29  12  0  0  0  8.2%
Qwen 3.5 Plus (2026-02-15)  39  0  0  0  0  7.8%
Grok 4 Fast  34  0  0  0  0  6.9%
MoonshotAI: Kimi K2.5  26  0  0  0  0  5.2%
ByteDance Seed 1.6  24  0  0  0  0  4.8%
Claude 3.5 Haiku  24  0  0  0  0  4.8%
WizardLM 2 8x22b  19  0  0  0  0  3.7%
Claude Sonnet 4  8  3  0  0  0  2.2%
Minimax M2.5  0  0  0  0  0  0.0%
Grok 4  0  0  0  0  0  0.0%
Stealth: Aurora Alpha  0  0  0  0  0  0.0%
GPT-5 Nano  0  0  0  0  0  0.0%
Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
Qwen 2.5 72B  100  100  100  100  93  98.5%
DeepSeek V3 (2025-03-24)  100  100  100  98  74  94.4%
Qwen 3.5 397B A17B  100  100  100  90  80  94.0%
Gemma 3 12B  100  100  100  100  62  92.3%
Claude 3 Haiku  100  100  93  90  75  91.5%
o4 Mini High  100  100  91  86  72  90.0%
Gemini 3.1 Pro (Preview)  100  100  100  82  56  87.5%
Claude 3.7 Sonnet  100  98  98  78  63  87.4%
GPT-5.1  100  99  86  83  67  86.9%
DeepSeek V3 (2024-12-26)  100  96  82  79  76  86.5%
GPT-5  100  100  75  74  74  84.5%
Mistral Medium 3.1  100  89  83  72  55  80.0%
DeepSeek V3.2  100  86  79  76  46  77.5%
DeepSeek-V2 Chat  100  100  83  66  36  76.9%
Gemma 3 27B  100  85  68  65  63  76.2%
Mistral Large 2  100  93  86  76  17  74.5%
GPT-4o, May 13th (temp=1)  100  91  79  61  39  74.0%
Mistral Small 3.2 24B  100  100  96  61  0  71.4%
GPT-4o, May 13th (temp=0)  92  85  78  55  48  71.4%
o4 Mini  100  100  55  51  50  71.1%
GPT-4.1  100  67  66  62  58  70.7%
Writer: Palmyra X5  100  82  80  73  9  68.7%
Claude 3.5 Sonnet  89  88  83  57  14  66.4%
Gemini 3 Pro (Preview)  89  74  67  55  47  66.4%
Hermes 3 405B  100  100  100  29  0  65.8%
Mistral NeMO  100  81  71  66  0  63.6%
Rocinante 12B  100  100  75  41  0  63.3%
Claude Sonnet 4.5  100  73  68  48  21  62.0%
Claude Opus 4.5  93  81  71  65  0  62.0%
Z.AI GLM 5  100  90  80  32  0  60.5%
Gemini 2.5 Flash  87  74  71  57  12  60.3%
Gemini 2.5 Flash Lite  100  59  58  41  41  59.7%
Mistral Large 3  95  71  65  51  11  58.8%
Llama 3.1 8B  100  100  41  39  13  58.5%
Llama 3.1 70B  100  83  49  29  29  58.1%
Grok 4.1 Fast  100  90  63  33  3  57.9%
GPT-5.2  83  72  58  43  25  56.4%
Z.AI GLM 4.7 Flash  65  59  56  54  48  56.2%
Gemini 2.5 Pro  75  61  59  56  19  54.0%
DeepSeek V3.1  100  81  81  6  0  53.8%
Hermes 3 70B  100  86  41  32  0  51.9%
Ministral 3B  100  100  32  27  0  51.7%
GPT-4.1 Mini  73  63  55  48  19  51.6%
Llama 3.1 Nemotron 70B  100  85  63  9  0  51.5%
Mistral Small Creative  100  66  55  34  0  51.1%
GPT-4o, Aug. 6th (temp=1)  85  60  56  45  0  49.2%
Cohere Command R+ (Aug. 2024)  88  70  62  19  5  49.1%
Mistral Large  70  69  38  36  31  48.7%
GPT-5 Mini  80  66  54  22  19  48.4%
Gemini 3 Flash (Preview)  65  63  45  39  29  48.4%
Z.AI GLM 4.5  83  75  56  13  5  46.6%
Claude Opus 4  97  68  29  24  15  46.4%
GPT-4.1 Nano  100  62  47  21  0  46.0%
Qwen 3.5 Plus (2026-02-15)  85  62  32  24  22  45.1%
Gemma 3 4B  83  62  52  22  0  43.7%
Ministral 3 8B  80  45  38  24  24  42.3%
Claude 3.5 Haiku  100  63  41  0  0  40.9%
Ministral 3 14B  70  56  41  36  0  40.6%
Claude Sonnet 4.6  76  43  41  37  0  39.5%
Ministral 8B  98  60  33  0  0  38.2%
WizardLM 2 8x22b  75  51  50  0  0  35.4%
Claude Sonnet 4  66  50  28  17  8  33.9%
Grok 4 Fast  71  38  27  27  0  32.4%
Claude Haiku 4.5  81  50  18  8  0  31.4%
Arcee AI: Trinity Large (Preview)  71  44  24  5  0  28.9%
Minimax M2.5  71  65  7  0  0  28.6%
Z.AI GLM 4.6  61  24  22  19  4  26.0%
ByteDance Seed 1.6 Flash  47  42  39  0  0  25.6%
Ministral 3 3B  59  41  15  4  0  23.8%
Z.AI GLM 4.7  52  39  14  2  0  21.4%
Arcee AI: Trinity Mini  36  16  13  8  0  14.5%
Grok 4  41  25  0  0  0  13.3%
Claude Opus 4.6  60  5  0  0  0  13.0%
MoonshotAI: Kimi K2.5  29  27  6  0  0  12.5%
ByteDance Seed 1.6  35  24  0  0  0  11.7%
GPT-4o Mini (temp=1)  52  0  0  0  0  10.4%
GPT-4o, Aug. 6th (temp=0)  36  0  0  0  0  7.1%
Stealth: Aurora Alpha  0  0  0  0  0  0.0%
GPT-5 Nano  0  0  0  0  0  0.0%
GPT-4o Mini (temp=0)  0  0  0  0  0  0.0%
Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
Qwen 2.5 72B  100  100  100  100  100  100.0%
Mistral Large  100  100  100  99  96  99.1%
o4 Mini High  100  100  100  100  93  98.5%
Grok 4  100  100  100  100  92  98.4%
Ministral 3 14B  100  100  100  99  85  96.7%
GPT-4.1  100  100  100  100  78  95.6%
Gemma 3 27B  100  100  100  97  80  95.4%
Gemma 3 12B  100  100  100  95  76  94.3%
Mistral Small 3.2 24B  100  100  100  100  41  88.2%
o4 Mini  100  97  93  83  60  86.6%
DeepSeek V3 (2024-12-26)  100  100  82  78  71  86.2%
Mistral Large 3  100  100  100  76  54  86.0%
GPT-4o, May 13th (temp=0)  100  100  100  66  63  86.0%
Mistral Small Creative  100  100  98  91  38  85.4%
GPT-4o, May 13th (temp=1)  100  100  100  89  35  84.8%
Arcee AI: Trinity Large (Preview)  100  100  95  66  62  84.6%
Gemini 2.5 Flash  94  92  90  82  65  84.5%
Ministral 3 3B  100  100  94  91  37  84.2%
Claude 3.5 Sonnet  100  100  86  71  63  84.1%
Gemini 2.5 Pro  100  100  100  82  35  83.3%
Hermes 3 405B  100  100  100  86  29  82.9%
GPT-5  100  97  86  68  59  82.1%
GPT-5.1  100  100  87  71  50  81.5%
Hermes 3 70B  100  100  90  82  36  81.5%
DeepSeek-V2 Chat  100  100  100  100  6  81.2%
Mistral Large 2  100  100  73  73  54  80.0%
Ministral 8B  100  100  91  71  35  79.3%
Claude Opus 4  100  93  92  59  45  77.8%
Mistral NeMO  100  97  78  60  54  77.7%
Gemini 3.1 Pro (Preview)  100  100  76  67  44  77.4%
Qwen 3.5 397B A17B  100  95  93  57  34  75.8%
Mistral Medium 3.1  100  89  81  57  46  74.7%
GPT-4o, Aug. 6th (temp=0)  100  97  94  63  12  73.2%
Arcee AI: Trinity Mini  98  74  71  67  54  72.8%
Gemini 2.5 Flash Lite  100  85  63  58  54  71.9%
Ministral 3B  100  100  60  58  41  71.8%
Ministral 3 8B  100  100  91  39  27  71.3%
Claude Sonnet 4.5  100  95  93  65  0  70.7%
Claude Haiku 4.5  100  100  89  63  0  70.6%
Writer: Palmyra X5  100  100  97  54  0  70.1%
ByteDance Seed 1.6  100  100  56  49  45  70.0%
Claude 3 Haiku  100  81  78  71  19  69.7%
Claude 3.5 Haiku  100  83  78  63  24  69.7%
DeepSeek V3 (2025-03-24)  100  100  100  37  0  67.4%
Z.AI GLM 4.6  100  100  54  49  34  67.3%
DeepSeek V3.2  75  73  73  58  57  67.2%
Gemma 3 4B  100  90  74  63  8  67.0%
Rocinante 12B  100  100  54  51  29  66.8%
WizardLM 2 8x22b  87  83  80  48  24  64.3%
Z.AI GLM 4.7  99  81  70  57  12  63.7%
GPT-4o Mini (temp=0)  100  78  71  68  0  63.4%
ByteDance Seed 1.6 Flash  100  96  51  48  21  63.2%
DeepSeek V3.1  100  100  66  26  17  61.9%
Grok 4.1 Fast  100  87  83  39  0  61.8%
Llama 3.1 8B  100  100  98  9  0  61.4%
GPT-5.2  96  64  54  48  44  61.0%
Grok 4 Fast  82  66  62  50  44  60.9%
GPT-4.1 Mini  100  56  56  41  29  56.4%
Gemini 3 Flash (Preview)  93  88  69  16  13  55.8%
Gemini 3 Pro (Preview)  100  66  39  39  33  55.4%
GPT-4o, Aug. 6th (temp=1)  86  74  73  32  0  53.0%
Cohere Command R+ (Aug. 2024)  100  56  41  38  30  53.0%
Z.AI GLM 4.7 Flash  100  62  41  37  13  50.6%
Z.AI GLM 4.5  100  100  49  0  0  49.8%
Claude 3.7 Sonnet  95  46  46  30  24  48.4%
GPT-4o Mini (temp=1)  80  50  49  31  29  47.6%
Llama 3.1 Nemotron 70B  96  48  38  38  0  43.9%
GPT-4.1 Nano  69  59  56  33  0  43.4%
Llama 3.1 70B  100  100  13  4  0  43.4%
GPT-5 Mini  54  51  50  36  12  40.4%
Qwen 3.5 Plus (2026-02-15)  78  55  48  10  0  38.2%
Minimax M2.5  95  51  29  15  0  38.1%
Claude Sonnet 4.6  88  62  20  0  0  34.0%
MoonshotAI: Kimi K2.5  48  48  41  3  0  27.7%
Claude Opus 4.6  65  28  22  17  0  26.3%
Stealth: Aurora Alpha  87  21  15  2  0  25.0%
Claude Sonnet 4  33  33  32  27  0  24.9%
Z.AI GLM 5  100  0  0  0  0  20.0%
Claude Opus 4.5  36  26  16  11  0  17.6%
GPT-5 Nano  0  0  0  0  0  0.0%
Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
o4 Mini High  100  100  100  100  100  100.0%
GPT-4o, May 13th (temp=0)  100  100  100  100  100  100.0%
Qwen 2.5 72B  100  100  100  100  100  100.0%
Rocinante 12B  100  100  100  100  100  100.0%
DeepSeek V3 (2025-03-24)  100  100  100  100  98  99.6%
Mistral NeMO  100  100  100  98  93  98.2%
DeepSeek V3 (2024-12-26)  100  100  100  100  90  98.0%
Arcee AI: Trinity Mini  100  100  100  100  83  96.7%
Llama 3.1 70B  100  100  100  97  81  95.5%
Hermes 3 405B  100  100  100  88  78  93.2%
Claude 3.7 Sonnet  100  98  97  86  82  92.6%
Gemini 3.1 Pro (Preview)  100  100  100  95  62  91.4%
Gemma 3 12B  100  100  100  95  57  90.5%
Qwen 3.5 397B A17B  100  100  100  93  58  90.2%
Gemini 2.5 Flash  100  100  95  83  70  89.6%
Gemma 3 27B  100  100  100  100  48  89.5%
Arcee AI: Trinity Large (Preview)  100  100  91  82  70  88.4%
Gemini 2.5 Pro  100  100  92  89  55  87.2%
Writer: Palmyra X5  100  97  86  82  69  86.6%
Hermes 3 70B  100  100  92  69  67  85.5%
Claude 3 Haiku  100  100  100  66  54  83.8%
o4 Mini  100  91  78  76  71  83.2%
Mistral Large 2  100  97  95  68  51  82.1%
GPT-5  100  100  70  68  68  81.2%
Ministral 3 8B  100  100  100  59  41  79.8%
WizardLM 2 8x22b  100  88  78  71  60  79.4%
Z.AI GLM 4.7  100  99  79  65  51  79.0%
Claude Opus 4  100  100  98  88  7  78.6%
Ministral 3 3B  100  90  76  71  54  78.2%
Gemini 3 Pro (Preview)  100  76  76  68  63  76.8%
Gemini 2.5 Flash Lite  89  83  78  75  57  76.4%
GPT-4.1 Mini  100  100  80  60  41  76.2%
Llama 3.1 Nemotron 70B  100  100  92  68  19  75.6%
Claude 3.5 Haiku  100  100  92  59  24  74.9%
Grok 4.1 Fast  100  92  85  58  37  74.3%
GPT-4o, May 13th (temp=1)  100  98  63  62  38  72.3%
GPT-4o, Aug. 6th (temp=0)  92  91  63  58  54  71.5%
GPT-4o Mini (temp=0)  92  92  71  68  33  71.1%
Z.AI GLM 4.7 Flash  100  88  79  68  17  70.5%
Llama 3.1 8B  100  90  81  75  0  69.2%
ByteDance Seed 1.6 Flash  90  86  62  57  39  67.0%
GPT-5.1  86  85  71  58  33  66.8%
GPT-4.1  100  94  59  57  24  66.7%
Gemma 3 4B  100  100  59  51  20  66.0%
Mistral Large 3  92  90  89  36  19  65.1%
GPT-5.2  85  71  68  63  37  64.6%
Ministral 3 14B  100  85  65  49  21  64.0%
Ministral 3B  100  86  60  37  31  62.9%
DeepSeek-V2 Chat  100  90  67  29  24  61.8%
GPT-4.1 Nano  100  90  56  35  27  61.5%
Claude 3.5 Sonnet  100  86  71  50  0  61.4%
Cohere Command R+ (Aug. 2024)  100  100  82  24  0  61.2%
Mistral Small Creative  100  94  55  28  24  60.2%
GPT-5 Mini  96  90  61  34  18  59.9%
DeepSeek V3.1  98  73  58  38  18  57.1%
Mistral Small 3.2 24B  100  100  62  18  0  56.0%
Claude Haiku 4.5  85  71  70  51  0  55.3%
Z.AI GLM 4.5  95  59  56  54  9  54.6%
Claude Sonnet 4.5  98  94  73  4  0  53.8%
Mistral Medium 3.1  73  68  52  48  0  48.2%
DeepSeek V3.2  78  43  41  41  38  48.2%
Z.AI GLM 4.6  78  59  48  36  10  46.0%
Grok 4  63  63  45  39  8  43.7%
GPT-4o, Aug. 6th (temp=1)  63  61  61  24  0  42.0%
Mistral Large  97  56  49  0  0  40.4%
Ministral 8B  48  48  48  31  24  39.6%
MoonshotAI: Kimi K2.5  98  46  43  0  0  37.5%
Z.AI GLM 5  75  45  45  21  0  37.3%
Claude Sonnet 4.6  70  56  43  0  0  33.8%
Gemini 3 Flash (Preview)  71  52  39  7  0  33.8%
Minimax M2.5  66  41  29  28  0  32.7%
Claude Sonnet 4  52  38  36  24  5  30.9%
Grok 4 Fast  58  54  29  0  0  28.2%
GPT-4o Mini (temp=1)  90  24  5  0  0  23.7%
ByteDance Seed 1.6  75  24  0  0  0  19.7%
Stealth: Aurora Alpha  66  14  10  3  0  18.6%
Claude Opus 4.5  47  33  3  0  0  16.6%
Claude Opus 4.6  52  14  3  0  0  13.8%
Qwen 3.5 Plus (2026-02-15)  9  4  3  0  0  3.2%
GPT-5 Nano  0  0  0  0  0  0.0%
Model  # 1  # 2  # 3  # 4  # 5  Avg ▼
DeepSeek V3 (2025-03-24)  100  100  100  100  100  100.0%
GPT-5.1  100  100  100  100  82  96.5%
Mistral Large  100  100  100  100  80  96.0%
Qwen 2.5 72B  100  100  100  100  76  95.3%
GPT-4o, May 13th (temp=0)  100  100  100  100  76  95.2%
Claude 3 Haiku  100  100  100  100  70  93.9%
Arcee AI: Trinity Mini  100  100  100  100  60  92.1%
o4 Mini High  100  100  100  90  67  91.4%
GPT-5  100  100  90  85  77  90.5%
Gemma 3 12B  100  95  91  86  80  90.4%
Gemini 2.5 Flash  100  100  100  79  67  89.2%
Mistral Large 3  100  100  100  83  59  88.4%
Ministral 3 8B  100  100  100  94  45  87.8%
GPT-4o, May 13th (temp=1)  100  100  100  88  49  87.4%
o4 Mini  100  100  92  77  68  87.2%
Mistral Large 2  100  97  88  80  71  87.2%
GPT-4.1  91  89  88  86  81  86.9%
Gemini 2.5 Pro  100  100  85  80  68  86.5%
ByteDance Seed 1.6 Flash  100  100  89  78  58  84.9%
Qwen 3.5 397B A17B  100  94  85  75  68  84.3%
Mistral NeMO  100  92  88  68  62  81.9%
Rocinante 12B  100  100  89  76  41  81.2%
Z.AI GLM 5  100  100  98  74  32  80.8%
Gemma 3 4B  100  90  82  68  60  79.9%
DeepSeek-V2 Chat  100  95  93  63  48  79.9%
Claude Opus 4  100  100  82  75  38  79.0%
Mistral Medium 3.1  93  83  77  73  65  78.1%
DeepSeek V3.1  100  100  88  65  36  77.9%
Hermes 3 405B  100  100  100  61  24  77.0%
Mistral Small 3.2 24B  100  100  100  78  0  75.6%
Claude 3.5 Sonnet  100  100  100  50  19  73.7%
Ministral 3 14B  100  75  74  69  44  72.4%
Claude 3.7 Sonnet  100  78  76  69  36  71.9%
Hermes 3 70B  100  90  78  45  44  71.4%
GPT-4o, Aug. 6th (temp=1)  100  100  56  54  35  68.8%
Z.AI GLM 4.7 Flash  98  82  82  43  36  68.1%
Z.AI GLM 4.7  100  89  69  42  34  66.8%
DeepSeek V3.2  100  81  69  58  26  66.7%
GPT-4o, Aug. 6th (temp=0)  83  76  63  56  54  66.5%
Gemini 3 Pro (Preview)  89  89  76  51  28  66.4%
Claude 3.5 Haiku  100  100  78  54  0  66.3%
Grok 4.1 Fast  100  85  63  45  37  66.1%
Mistral Small Creative  100  86  79  63  0  65.8%
GPT-4.1 Mini  81  75  71  57  41  65.0%
Ministral 3 3B  100  86  71  59  5  64.3%
Cohere Command R+ (Aug. 2024)  100  94  69  56  0  63.9%
Z.AI GLM 4.6  100  100  86  18  13  63.2%
Gemma 3 27B  100  100  68  41  7  63.1%
Claude Sonnet 4.6  100  81  68  45  20  62.7%
Arcee AI: Trinity Large (Preview)  100  100  76  37  0  62.6%
Writer: Palmyra X5  95  95  66  51  0  61.5%
Llama 3.1 70B  100  100  66  41  0  61.4%
Ministral 8B  98  85  82  28  13  61.1%
DeepSeek V3 (2024-12-26)  100  81  69  37  7  58.7%
GPT-4o Mini (temp=0)  100  73  59  41  9  56.3%
Llama 3.1 8B  100  100  54  28  0  56.3%
Ministral 3B  100  90  60  29  0  55.8%
GPT-5.2  90  87  39  34  27  55.3%
Gemini 3.1 Pro (Preview)  75  59  54  52  18  51.6%
Grok 4  97  68  63  29  0  51.4%
Claude Sonnet 4.5  80  69  57  28  13  49.3%
Claude Opus 4.5  68  56  46  44  20  46.8%
Claude Sonnet 4  95  86  41  9  0  46.3%
Gemini 3 Flash (Preview)  87  54  34  28  18  44.1%
Llama 3.1 Nemotron 70B  78  60  28  19  17  40.3%
Claude Opus 4.6  63  49  28  26  19  37.1%
Minimax M2.5  68  46  44  24  0  36.4%
Claude Haiku 4.5  63  54  27  24  0  33.6%
GPT-4o Mini (temp=1)  78  44  24  4  0  29.9%
ByteDance Seed 1.6  73  56  17  0  0  29.1%
GPT-5 Mini  78  27  19  18  0  28.3%
Grok 4 Fast  63  52  18  8  0  28.1%
Z.AI GLM 4.5  86  46  0  0  0  26.6%
Gemini 2.5 Flash Lite  62  52  13  0  0  25.3%
GPT-4.1 Nano  43  24  20  0  0  17.5%
MoonshotAI: Kimi K2.5  37  24  3  0  0  12.9%
Qwen 3.5 Plus (2026-02-15)  27  24  0  0  0  10.2%
Stealth: Aurora Alpha  50  0  0  0  0  10.0%
WizardLM 2 8x22b  5  4  0  0  0  1.7%
GPT-5 Nano  0  0  0  0  0  0.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
GPT-4o Mini (temp=0) 100 100 100 99 91 98.0%
DeepSeek-V2 Chat 100 100 100 100 82 96.5%
GPT-4o, May 13th (temp=1) 100 100 100 94 87 96.3%
Claude 3 Haiku 100 100 100 98 81 95.8%
Qwen 2.5 72B 100 100 100 95 83 95.7%
o4 Mini 100 100 100 90 87 95.3%
Gemma 3 12B 100 100 98 92 82 94.4%
Gemini 2.5 Pro 100 100 100 93 78 94.1%
GPT-4.1 Mini 100 100 100 92 77 93.7%
GPT-4.1 100 100 100 100 67 93.4%
o4 Mini High 100 100 100 100 65 93.0%
Hermes 3 405B 100 100 100 93 68 92.1%
Llama 3.1 Nemotron 70B 100 100 100 80 66 89.1%
Rocinante 12B 100 99 95 78 73 89.0%
Mistral Small 3.2 24B 100 100 100 88 51 87.7%
Gemma 3 27B 100 100 90 83 60 86.6%
Llama 3.1 70B 100 88 86 83 71 85.8%
DeepSeek V3.2 100 100 100 80 42 84.5%
GPT-5.1 100 91 89 68 61 81.8%
Claude 3.7 Sonnet 87 86 85 78 69 81.1%
DeepSeek V3 (2024-12-26) 100 100 100 100 0 80.0%
DeepSeek V3 (2025-03-24) 100 100 71 67 60 79.6%
Mistral NeMO 100 100 100 65 29 79.0%
Arcee AI: Trinity Mini 95 90 76 74 55 78.0%
Gemini 2.5 Flash 100 100 100 86 0 77.1%
GPT-4.1 Nano 100 100 88 50 47 76.9%
Writer: Palmyra X5 100 100 63 62 57 76.5%
Hermes 3 70B 100 100 98 81 0 75.8%
GPT-4o, May 13th (temp=0) 100 99 85 63 30 75.6%
Gemini 3.1 Pro (Preview) 100 84 82 79 29 74.9%
Mistral Large 2 100 100 87 85 3 74.9%
Mistral Large 100 96 94 82 0 74.4%
Ministral 3 14B 100 100 82 66 19 73.4%
GPT-4o, Aug. 6th (temp=1) 100 93 69 52 51 73.0%
Mistral Medium 3.1 100 92 74 50 48 72.9%
GPT-5.2 100 85 73 63 43 72.9%
Claude 3.5 Sonnet 100 100 79 44 41 72.8%
Z.AI GLM 4.7 87 86 75 73 38 71.7%
Ministral 3 8B 100 100 83 57 17 71.5%
GPT-5 86 73 67 67 62 71.0%
ByteDance Seed 1.6 Flash 100 100 78 72 0 70.1%
Mistral Large 3 100 100 71 48 30 69.8%
Gemini 2.5 Flash Lite 95 90 75 55 31 69.3%
Claude Opus 4 85 82 74 49 49 67.9%
Gemma 3 4B 100 100 94 44 0 67.6%
GPT-5 Mini 98 88 78 50 23 67.3%
Llama 3.1 8B 98 94 65 39 33 65.9%
Claude Sonnet 4.5 83 76 74 51 45 65.8%
DeepSeek V3.1 76 75 74 58 45 65.7%
Claude 3.5 Haiku 100 83 75 63 0 64.3%
Ministral 3 3B 100 91 69 60 0 63.9%
Ministral 8B 100 81 60 39 29 61.6%
Cohere Command R+ (Aug. 2024) 100 80 61 49 17 61.4%
Ministral 3B 100 88 66 48 0 60.2%
Mistral Small Creative 85 73 65 52 26 60.1%
WizardLM 2 8x22b 100 100 100 0 0 60.0%
Arcee AI: Trinity Large (Preview) 72 65 59 56 48 59.9%
Grok 4 100 63 51 37 31 56.7%
Qwen 3.5 397B A17B 92 80 59 35 8 54.8%
Z.AI GLM 4.6 90 81 49 35 12 53.2%
GPT-4o Mini (temp=1) 87 86 47 32 0 50.4%
Gemini 3 Pro (Preview) 82 71 65 28 2 49.6%
Gemini 3 Flash (Preview) 59 58 54 36 34 48.0%
Claude Opus 4.5 100 85 52 0 0 47.5%
Z.AI GLM 4.7 Flash 75 71 55 32 0 46.5%
Claude Haiku 4.5 100 70 32 30 0 46.4%
Z.AI GLM 4.5 89 54 49 41 0 46.3%
Claude Sonnet 4.6 93 55 33 4 0 37.1%
GPT-4o, Aug. 6th (temp=0) 79 57 47 0 0 36.6%
Z.AI GLM 5 100 58 8 0 0 33.2%
Qwen 3.5 Plus (2026-02-15) 69 66 16 3 0 30.7%
Grok 4.1 Fast 90 61 0 0 0 30.2%
Grok 4 Fast 56 45 43 0 0 28.8%
Claude Opus 4.6 63 35 8 0 0 21.1%
Stealth: Aurora Alpha 45 24 0 0 0 13.8%
Claude Sonnet 4 31 24 0 0 0 10.9%
ByteDance Seed 1.6 9 0 0 0 0 1.8%
MoonshotAI: Kimi K2.5 0 0 0 0 0 0.0%
Minimax M2.5 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0.0%

Novelcrafter Default Prompt

Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 76 95.2%
o4 Mini High 100 100 97 77 75 89.8%
GPT-4o, May 13th (temp=0) 100 100 100 93 49 88.4%
Gemma 3 27B 100 93 92 80 70 86.9%
GPT-5 99 86 84 83 79 86.3%
Rocinante 12B 100 100 98 88 42 85.6%
GPT-4.1 100 91 83 81 41 79.2%
Mistral Large 2 100 91 78 76 45 78.1%
GPT-4o, May 13th (temp=1) 94 92 85 83 35 77.8%
Hermes 3 405B 100 100 100 70 19 77.6%
DeepSeek-V2 Chat 100 98 73 67 47 77.0%
DeepSeek V3 (2025-03-24) 100 83 71 63 54 74.4%
GPT-5 Mini 95 94 94 82 0 73.0%
Writer: Palmyra X5 100 100 86 76 0 72.3%
Gemma 3 4B 100 100 63 52 38 70.7%
Qwen 2.5 72B 93 85 67 54 39 67.5%
Gemini 2.5 Flash Lite 87 75 71 62 34 65.8%
WizardLM 2 8x22b 82 79 73 57 31 64.5%
GPT-5.1 91 86 74 40 31 64.4%
Llama 3.1 8B 94 86 78 57 0 63.0%
DeepSeek V3.1 100 78 68 52 13 62.2%
Llama 3.1 70B 100 100 78 33 0 62.2%
Mistral Medium 3.1 100 98 60 51 0 62.0%
Z.AI GLM 4.6 100 66 65 54 12 59.2%
o4 Mini 95 88 68 24 11 57.2%
Mistral Large 3 100 100 60 24 0 56.8%
Gemma 3 12B 100 100 56 16 0 54.3%
Z.AI GLM 4.5 98 61 56 49 5 54.0%
Ministral 8B 83 66 58 54 0 52.2%
Gemini 3 Flash (Preview) 70 57 45 44 41 51.4%
ByteDance Seed 1.6 Flash 82 79 59 35 0 51.0%
GPT-5.2 98 68 49 21 13 49.8%
GPT-4o, Aug. 6th (temp=1) 100 73 46 29 0 49.7%
Claude Sonnet 4.6 82 78 59 13 0 46.3%
Mistral Large 82 63 50 29 7 46.3%
Z.AI GLM 4.7 100 77 27 26 0 45.9%
Arcee AI: Trinity Mini 86 60 56 13 9 44.7%
Mistral Small Creative 71 56 54 30 6 43.5%
Claude Opus 4 92 65 45 13 0 43.0%
Claude Opus 4.5 92 46 38 24 13 42.7%
Claude Haiku 4.5 100 57 54 0 0 42.1%
Ministral 3 14B 81 71 41 7 6 41.3%
Gemini 3 Pro (Preview) 84 80 36 0 0 40.0%
Cohere Command R+ (Aug. 2024) 100 44 24 19 13 40.0%
GPT-4o Mini (temp=0) 94 43 32 28 0 39.3%
Ministral 3 3B 100 75 20 0 0 39.0%
Ministral 3 8B 86 44 39 13 0 36.3%
GPT-4o, Aug. 6th (temp=0) 60 56 38 26 0 36.0%
DeepSeek V3 (2024-12-26) 63 39 33 30 14 36.0%
Z.AI GLM 4.7 Flash 90 50 36 0 0 35.2%
GPT-4.1 Nano 73 36 30 17 17 34.6%
Claude 3 Haiku 51 50 41 30 0 34.3%
Mistral Small 3.2 24B 86 77 6 0 0 33.7%
Hermes 3 70B 100 55 13 0 0 33.6%
Qwen 3.5 Plus (2026-02-15) 59 49 43 12 0 32.4%
DeepSeek V3.2 62 47 41 11 0 32.1%
Gemini 2.5 Flash 67 46 26 18 0 31.4%
GPT-4o Mini (temp=1) 70 51 33 0 0 30.8%
Gemini 2.5 Pro 75 29 29 14 0 29.5%
Mistral NeMO 94 33 13 0 0 27.9%
Claude 3.7 Sonnet 67 52 16 0 0 27.0%
Claude 3.5 Sonnet 100 19 11 0 0 25.8%
Ministral 3B 100 13 7 0 0 24.0%
Llama 3.1 Nemotron 70B 78 33 0 0 0 22.2%
Grok 4.1 Fast 61 32 16 0 0 21.8%
Grok 4 Fast 57 14 8 0 0 15.8%
Z.AI GLM 5 41 37 0 0 0 15.6%
Claude Opus 4.6 41 33 0 0 0 14.8%
Minimax M2.5 73 0 0 0 0 14.6%
Arcee AI: Trinity Large (Preview) 46 14 0 0 0 12.1%
Claude 3.5 Haiku 54 0 0 0 0 10.7%
Claude Sonnet 4 27 13 8 0 0 9.5%
GPT-4.1 Mini 36 0 0 0 0 7.1%
Claude Sonnet 4.5 35 0 0 0 0 6.9%
MoonshotAI: Kimi K2.5 17 13 0 0 0 6.0%
Grok 4 0 0 0 0 0 0.0%
ByteDance Seed 1.6 0 0 0 0 0 0.0%
Stealth: Aurora Alpha 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
o4 Mini High 100 100 100 100 100 100.0%
Qwen 2.5 72B 100 100 100 100 96 99.3%
Gemma 3 27B 100 100 100 100 93 98.6%
Gemini 3.1 Pro (Preview) 100 100 100 100 83 96.7%
GPT-5.2 100 100 100 91 91 96.5%
GPT-5 100 100 100 91 90 96.3%
Hermes 3 405B 100 100 100 100 63 92.7%
o4 Mini 100 100 100 100 57 91.4%
Writer: Palmyra X5 100 100 100 71 71 88.6%
GPT-5.1 100 100 100 71 68 87.9%
WizardLM 2 8x22b 100 100 87 81 69 87.4%
Grok 4.1 Fast 100 100 88 78 66 86.4%
Rocinante 12B 100 100 78 76 68 84.5%
Gemini 3 Pro (Preview) 97 92 80 73 69 82.2%
Claude 3 Haiku 100 100 100 67 43 81.9%
Mistral Medium 3.1 100 100 100 100 7 81.4%
Gemma 3 12B 100 100 83 67 51 80.3%
Mistral Large 3 100 100 100 97 5 80.3%
Ministral 8B 100 100 100 86 15 80.2%
GPT-4o, May 13th (temp=1) 100 93 80 75 49 79.5%
ByteDance Seed 1.6 Flash 100 100 91 65 41 79.4%
Z.AI GLM 4.5 100 100 90 81 26 79.3%
DeepSeek V3 (2025-03-24) 100 100 71 68 57 79.2%
DeepSeek V3.2 100 96 80 73 44 78.7%
GPT-4o, May 13th (temp=0) 100 98 97 71 26 78.4%
Gemini 2.5 Flash Lite 99 96 83 57 54 77.8%
Cohere Command R+ (Aug. 2024) 100 100 95 82 0 75.5%
Llama 3.1 8B 100 100 88 86 0 74.7%
Claude Opus 4 94 76 75 71 57 74.7%
GPT-4.1 100 100 88 55 26 73.9%
DeepSeek V3 (2024-12-26) 100 100 86 68 11 72.9%
Z.AI GLM 4.7 Flash 100 100 66 62 14 68.4%
Hermes 3 70B 100 100 82 56 0 67.6%
Gemini 2.5 Pro 80 79 74 62 34 66.0%
Mistral Large 2 100 100 78 41 7 65.1%
Arcee AI: Trinity Large (Preview) 100 82 78 44 16 64.0%
GPT-4.1 Nano 93 83 54 52 34 63.2%
Arcee AI: Trinity Mini 100 66 52 48 41 61.3%
Claude Sonnet 4.5 68 68 63 56 51 61.2%
GPT-5 Mini 99 69 57 41 37 60.5%
Gemma 3 4B 100 94 54 34 15 59.4%
Gemini 2.5 Flash 100 91 61 28 17 59.3%
Claude 3.7 Sonnet 100 100 62 22 12 59.1%
GPT-4o Mini (temp=1) 100 74 63 41 16 58.9%
Claude Haiku 4.5 75 63 58 55 35 57.5%
Claude Sonnet 4.6 80 80 51 49 28 57.4%
Ministral 3 3B 100 84 68 24 7 56.5%
Mistral Small Creative 81 59 58 54 31 56.4%
GPT-4o, Aug. 6th (temp=1) 100 68 39 36 35 55.8%
Ministral 3 8B 100 68 63 48 0 55.8%
Gemini 3 Flash (Preview) 100 84 60 31 0 55.0%
DeepSeek-V2 Chat 100 80 66 28 0 54.9%
Ministral 3 14B 76 68 57 54 20 54.8%
Mistral Large 71 70 54 50 28 54.4%
GPT-4.1 Mini 76 67 63 38 21 53.1%
Claude 3.5 Haiku 100 78 63 24 0 53.0%
Minimax M2.5 97 87 45 30 0 51.9%
Z.AI GLM 4.7 100 50 38 36 35 51.7%
Grok 4 Fast 89 79 70 19 0 51.3%
Grok 4 79 68 55 27 24 50.6%
Z.AI GLM 4.6 100 65 39 36 8 49.6%
Llama 3.1 Nemotron 70B 100 71 70 0 0 48.2%
Claude Opus 4.6 61 56 48 37 36 47.6%
Claude 3.5 Sonnet 94 45 37 31 20 45.5%
Z.AI GLM 5 77 63 45 22 9 43.2%
DeepSeek V3.1 87 81 24 24 0 43.1%
Llama 3.1 70B 100 76 38 0 0 42.7%
Mistral Small 3.2 24B 100 86 24 0 0 42.1%
Ministral 3B 81 63 41 17 0 40.4%
Qwen 3.5 Plus (2026-02-15) 59 46 41 39 0 37.1%
MoonshotAI: Kimi K2.5 100 43 32 0 0 35.0%
Claude Opus 4.5 60 58 56 0 0 34.8%
GPT-4o, Aug. 6th (temp=0) 47 46 29 28 9 31.7%
Claude Sonnet 4 50 34 26 24 18 30.5%
GPT-4o Mini (temp=0) 74 32 20 10 0 27.3%
ByteDance Seed 1.6 62 41 27 0 0 26.0%
Mistral NeMO 38 31 3 0 0 14.4%
Stealth: Aurora Alpha 34 0 0 0 0 6.9%
GPT-5 Nano 0 0 0 0 0 0.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
Gemma 3 27B 100 100 100 100 100 100.0%
Ministral 3 3B 100 100 100 100 100 100.0%
Ministral 8B 100 100 100 100 100 100.0%
Qwen 2.5 72B 100 100 100 100 95 99.0%
o4 Mini 100 100 100 100 92 98.4%
Grok 4.1 Fast 100 100 100 100 88 97.6%
Mistral Large 2 100 100 100 100 88 97.6%
o4 Mini High 100 100 100 98 88 97.2%
GPT-5 100 100 100 100 85 97.1%
GPT-4o, May 13th (temp=0) 100 100 100 100 85 96.9%
Mistral Small Creative 100 100 100 100 78 95.6%
GPT-4.1 Mini 100 100 100 90 88 95.6%
Mistral Large 3 100 100 100 100 75 95.0%
Mistral Small 3.2 24B 100 100 100 100 71 94.3%
Arcee AI: Trinity Large (Preview) 100 100 100 100 71 94.3%
Ministral 3B 100 100 100 100 70 93.9%
Qwen 3.5 Plus (2026-02-15) 100 100 100 100 68 93.4%
Grok 4 100 100 100 100 66 93.1%
Gemini 2.5 Flash 100 100 100 88 73 92.1%
Grok 4 Fast 100 100 100 82 76 91.6%
Z.AI GLM 4.6 100 100 100 86 71 91.4%
Gemini 2.5 Pro 100 100 100 86 70 91.3%
GPT-5 Mini 100 100 95 80 79 90.9%
Z.AI GLM 4.5 100 100 100 100 54 90.7%
Ministral 3 14B 100 100 100 95 56 90.3%
GPT-4.1 100 100 100 78 69 89.4%
DeepSeek-V2 Chat 100 100 100 82 63 89.1%
GPT-5.1 100 100 100 100 44 88.8%
Gemma 3 12B 100 100 92 92 59 88.5%
GPT-4o Mini (temp=0) 100 100 100 100 41 88.2%
Hermes 3 70B 100 100 100 71 69 88.1%
Rocinante 12B 100 100 100 69 63 86.5%
Mistral Medium 3.1 100 100 100 97 35 86.2%
GPT-5.2 100 100 100 92 38 85.9%
Claude 3 Haiku 100 100 92 82 56 85.8%
DeepSeek V3 (2024-12-26) 100 100 100 100 29 85.7%
DeepSeek V3.1 100 100 81 74 65 83.9%
GPT-4.1 Nano 100 100 90 66 63 83.8%
Gemma 3 4B 100 98 95 95 28 83.3%
Ministral 3 8B 100 100 86 82 48 83.0%
DeepSeek V3 (2025-03-24) 100 100 100 100 13 82.6%
DeepSeek V3.2 100 97 76 68 66 81.4%
Writer: Palmyra X5 100 100 100 83 20 80.6%
GPT-4o, May 13th (temp=1) 100 100 76 75 52 80.4%
Cohere Command R+ (Aug. 2024) 100 100 100 100 0 80.0%
WizardLM 2 8x22b 100 92 78 71 57 79.7%
Hermes 3 405B 100 100 100 97 0 79.4%
Claude Opus 4 100 95 83 74 41 78.6%
GPT-4o Mini (temp=1) 100 100 98 69 24 78.2%
Llama 3.1 8B 100 100 100 91 0 78.1%
Claude 3.7 Sonnet 100 100 83 57 41 76.3%
Gemini 3 Flash (Preview) 89 75 73 70 65 74.4%
ByteDance Seed 1.6 Flash 100 82 81 54 51 73.5%
Mistral NeMO 100 100 100 61 0 72.2%
Arcee AI: Trinity Mini 100 100 88 54 17 71.7%
Gemini 2.5 Flash Lite 100 100 69 69 0 67.6%
Z.AI GLM 4.7 Flash 100 100 75 57 0 66.4%
Z.AI GLM 4.7 100 100 99 21 12 66.3%
Claude 3.5 Sonnet 100 100 63 59 0 64.5%
Mistral Large 100 88 74 37 19 63.5%
Z.AI GLM 5 100 91 56 44 24 62.9%
GPT-4o, Aug. 6th (temp=1) 100 95 70 48 0 62.5%
Llama 3.1 70B 89 69 59 59 30 61.2%
Llama 3.1 Nemotron 70B 100 74 69 45 15 60.7%
Gemini 3 Pro (Preview) 87 65 62 51 36 60.2%
GPT-4o, Aug. 6th (temp=0) 100 97 54 37 13 60.1%
Stealth: Aurora Alpha 100 74 62 32 27 58.9%
Claude Sonnet 4.5 100 95 48 36 0 55.7%
Claude Opus 4.5 100 74 47 36 21 55.6%
Minimax M2.5 100 66 51 38 24 55.6%
MoonshotAI: Kimi K2.5 91 73 62 17 0 48.5%
Claude Sonnet 4.6 100 66 54 11 5 47.2%
Claude Sonnet 4 94 76 57 0 0 45.3%
GPT-5 Nano 84 69 55 0 0 41.5%
Claude Haiku 4.5 66 38 38 24 17 36.4%
Claude 3.5 Haiku 100 41 0 0 0 28.2%
Claude Opus 4.6 67 41 5 0 0 22.5%
ByteDance Seed 1.6 36 33 18 0 0 17.3%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100.0%
o4 Mini High 100 100 100 100 100 100.0%
GPT-5 100 100 100 100 100 100.0%
o4 Mini 100 100 100 100 100 100.0%
DeepSeek V3 (2024-12-26) 100 100 100 100 100 100.0%
Mistral Large 3 100 100 100 100 100 100.0%
Hermes 3 405B 100 100 100 100 100 100.0%
Mistral Large 100 100 100 100 100 100.0%
Gemma 3 27B 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 100 99.9%
GPT-5.1 100 100 100 100 98 99.6%
Arcee AI: Trinity Mini 100 100 100 100 95 99.0%
GPT-4o, Aug. 6th (temp=0) 100 100 100 100 94 98.8%
GPT-5.2 100 100 100 100 89 97.9%
Llama 3.1 70B 100 100 100 100 86 97.1%
DeepSeek V3 (2025-03-24) 100 100 100 100 83 96.7%
GPT-4.1 Nano 100 100 100 98 83 96.3%
Z.AI GLM 4.6 100 100 100 97 71 93.5%
GPT-4o, May 13th (temp=1) 100 100 100 86 73 91.9%
Gemini 2.5 Pro 100 100 91 89 79 91.8%
Qwen 3.5 397B A17B 100 100 89 88 82 91.8%
DeepSeek-V2 Chat 100 100 98 91 63 90.5%
Z.AI GLM 4.7 Flash 100 100 100 91 60 90.1%
Gemma 3 12B 100 100 100 100 48 89.5%
Cohere Command R+ (Aug. 2024) 100 97 89 86 75 89.3%
GPT-4o Mini (temp=1) 100 100 100 74 71 88.9%
GPT-5 Mini 100 100 90 82 72 88.9%
Arcee AI: Trinity Large (Preview) 100 100 95 81 66 88.5%
Mistral NeMO 100 100 92 83 66 88.3%
Gemini 2.5 Flash 100 100 96 80 65 88.3%
Ministral 3 8B 100 100 98 83 59 88.1%
Grok 4.1 Fast 100 100 100 76 63 87.9%
Claude 3 Haiku 100 100 100 100 37 87.4%
GPT-4.1 100 100 93 92 51 87.1%
Qwen 2.5 72B 100 100 100 71 63 87.0%
Hermes 3 70B 100 100 100 78 54 86.3%
GPT-4.1 Mini 100 100 88 82 54 84.6%
ByteDance Seed 1.6 Flash 100 100 97 86 39 84.3%
Llama 3.1 8B 100 100 100 75 45 84.1%
WizardLM 2 8x22b 100 100 95 68 56 84.0%
GPT-4o, Aug. 6th (temp=1) 100 100 97 93 24 82.7%
Claude Sonnet 4.5 100 100 80 68 60 81.6%
Gemma 3 4B 98 88 83 82 56 81.5%
Gemini 3 Pro (Preview) 100 95 93 65 51 80.8%
Ministral 3 3B 100 100 88 75 36 79.7%
Rocinante 12B 100 100 100 71 24 79.0%
Mistral Small Creative 100 99 83 82 29 78.4%
DeepSeek V3.2 100 100 88 56 44 77.7%
Mistral Small 3.2 24B 100 100 100 88 0 77.6%
GPT-4o Mini (temp=0) 100 100 81 66 37 76.8%
Llama 3.1 Nemotron 70B 100 100 100 59 19 75.5%
Claude 3.5 Sonnet 100 100 100 68 0 73.5%
Gemini 3 Flash (Preview) 88 73 71 65 60 71.6%
Mistral Large 2 95 83 78 59 41 71.2%
Grok 4 Fast 100 92 89 75 0 71.1%
Claude Haiku 4.5 100 100 80 61 13 70.8%
Z.AI GLM 4.7 100 80 78 63 30 70.3%
Stealth: Aurora Alpha 100 94 81 50 19 68.9%
Ministral 3B 100 88 83 48 24 68.5%
Grok 4 100 79 60 57 45 68.4%
Z.AI GLM 4.5 100 100 100 41 0 68.2%
Writer: Palmyra X5 100 100 75 41 19 66.9%
MoonshotAI: Kimi K2.5 100 100 78 56 0 66.8%
Mistral Medium 3.1 100 99 63 41 24 65.4%
Ministral 3 14B 100 95 57 50 17 63.8%
Claude 3.7 Sonnet 92 90 59 54 24 63.6%
Claude Sonnet 4 80 78 61 44 41 60.8%
Claude Opus 4.6 90 90 62 39 24 60.8%
Claude Opus 4 100 100 59 29 5 58.6%
Qwen 3.5 Plus (2026-02-15) 88 68 63 49 24 58.4%
Gemini 2.5 Flash Lite 100 60 46 44 41 58.2%
DeepSeek V3.1 98 71 68 54 0 58.2%
Ministral 8B 100 95 66 19 0 56.0%
Minimax M2.5 92 71 60 50 0 54.8%
ByteDance Seed 1.6 100 82 37 30 0 49.9%
Claude Sonnet 4.6 63 61 61 60 0 49.3%
Claude 3.5 Haiku 92 71 0 0 0 32.7%
Z.AI GLM 5 73 41 38 10 0 32.4%
Claude Opus 4.5 66 44 17 0 0 25.5%
GPT-5 Nano 0 0 0 0 0 0.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
o4 Mini High 100 100 100 100 100 100.0%
o4 Mini 100 100 100 100 100 100.0%
DeepSeek V3 (2025-03-24) 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=1) 100 100 100 100 100 100.0%
Claude 3 Haiku 100 100 100 100 100 100.0%
GPT-4.1 Nano 100 100 100 100 100 100.0%
GPT-5 100 100 100 100 96 99.2%
Ministral 3 14B 100 100 100 100 95 99.0%
Ministral 3B 100 100 100 100 95 99.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 88 97.5%
Mistral Medium 3.1 100 100 100 100 86 97.3%
Gemma 3 27B 100 100 100 100 86 97.1%
Ministral 3 8B 100 100 100 94 85 95.6%
GPT-5 Mini 100 100 100 100 78 95.5%
GPT-4.1 100 100 100 100 76 95.3%
Writer: Palmyra X5 100 100 100 100 75 95.0%
Ministral 8B 100 100 97 90 86 94.5%
Gemini 2.5 Pro 100 100 100 85 82 93.2%
Claude 3.5 Sonnet 100 100 100 100 63 92.7%
GPT-4o Mini (temp=0) 100 100 100 88 75 92.5%
Claude Opus 4 100 100 100 95 66 92.3%
ByteDance Seed 1.6 Flash 100 100 98 92 71 92.3%
Qwen 2.5 72B 100 100 100 86 75 92.1%
Mistral Small Creative 100 100 98 88 73 91.9%
DeepSeek V3 (2024-12-26) 100 100 100 95 57 90.5%
Mistral Large 2 100 100 98 90 60 89.7%
Grok 4.1 Fast 100 100 100 78 69 89.4%
Ministral 3 3B 100 100 100 78 68 89.1%
Cohere Command R+ (Aug. 2024) 100 100 100 92 54 89.1%
GPT-5.1 100 100 84 82 78 88.9%
GPT-5.2 100 100 100 78 61 87.8%
GPT-4.1 Mini 100 100 83 81 75 87.8%
Arcee AI: Trinity Mini 100 100 90 82 66 87.6%
Mistral NeMO 100 100 100 69 66 86.9%
Z.AI GLM 4.6 100 94 92 86 61 86.7%
Llama 3.1 Nemotron 70B 100 100 100 86 41 85.3%
DeepSeek V3.2 100 100 100 75 50 84.9%
Gemini 3 Pro (Preview) 100 100 96 67 56 83.7%
Gemini 2.5 Flash 100 100 100 80 36 83.3%
Grok 4 100 100 79 75 62 83.2%
Mistral Large 100 100 88 81 41 81.9%
Claude 3.7 Sonnet 100 100 94 80 35 81.6%
Claude Sonnet 4 100 100 83 75 48 81.2%
Arcee AI: Trinity Large (Preview) 100 92 79 67 63 80.3%
Z.AI GLM 4.5 100 100 85 59 57 80.2%
DeepSeek-V2 Chat 100 100 100 51 50 80.1%
Hermes 3 405B 100 100 100 100 0 80.0%
Grok 4 Fast 100 99 94 91 0 76.8%
Claude Sonnet 4.5 100 100 95 88 0 76.6%
WizardLM 2 8x22b 100 100 90 59 27 75.1%
Rocinante 12B 100 100 78 63 33 74.9%
GPT-4o Mini (temp=1) 100 88 69 63 50 74.0%
Hermes 3 70B 100 100 88 54 29 74.0%
Mistral Large 3 100 90 71 54 54 73.7%
Llama 3.1 8B 100 100 86 81 0 73.3%
Z.AI GLM 4.7 Flash 95 92 91 73 14 73.0%
Gemma 3 12B 100 78 78 54 48 71.4%
Claude Opus 4.6 100 83 81 60 29 70.6%
Mistral Small 3.2 24B 100 100 86 56 10 70.3%
Z.AI GLM 5 100 79 73 56 30 67.8%
Gemini 2.5 Flash Lite 100 83 74 46 26 65.8%
Z.AI GLM 4.7 90 85 79 54 19 65.3%
Gemini 3 Flash (Preview) 100 100 63 59 5 65.2%
Gemma 3 4B 100 90 90 45 0 65.1%
DeepSeek V3.1 100 91 50 41 33 62.8%
Stealth: Aurora Alpha 100 94 60 54 5 62.5%
MoonshotAI: Kimi K2.5 100 83 61 30 27 60.3%
Minimax M2.5 100 100 52 36 0 57.5%
GPT-4o, Aug. 6th (temp=1) 100 100 78 8 0 57.1%
Claude 3.5 Haiku 100 100 41 41 0 56.3%
Claude Opus 4.5 76 51 49 49 38 52.5%
Llama 3.1 70B 100 88 71 0 0 51.9%
Qwen 3.5 Plus (2026-02-15) 71 71 60 13 0 43.1%
GPT-4o, Aug. 6th (temp=0) 80 70 29 24 0 40.5%
ByteDance Seed 1.6 100 58 44 0 0 40.4%
Claude Haiku 4.5 68 41 33 31 24 39.3%
Claude Sonnet 4.6 36 29 24 19 0 21.3%
GPT-5 Nano 34 8 0 0 0 8.3%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
o4 Mini High 100 100 100 100 100 100.0%
GPT-4.1 100 100 100 100 100 100.0%
Mistral Large 3 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 100 100.0%
Hermes 3 405B 100 100 100 100 100 100.0%
GPT-5.1 100 100 100 100 94 98.9%
Mistral Large 2 100 100 100 100 86 97.1%
WizardLM 2 8x22b 100 100 100 100 84 96.9%
DeepSeek V3 (2025-03-24) 100 100 100 100 83 96.7%
Gemma 3 27B 100 100 100 98 85 96.6%
GPT-5 100 100 100 100 82 96.3%
Arcee AI: Trinity Mini 100 100 100 100 81 96.1%
o4 Mini 100 100 100 100 80 96.0%
Grok 4.1 Fast 100 100 100 95 81 95.2%
Llama 3.1 70B 100 100 100 90 83 94.7%
DeepSeek V3 (2024-12-26) 100 100 100 86 84 94.2%
Gemini 3 Flash (Preview) 100 100 90 89 87 93.1%
Z.AI GLM 4.7 Flash 100 100 100 84 74 91.6%
GPT-4o, May 13th (temp=1) 100 100 100 100 58 91.6%
Ministral 3 8B 100 100 97 86 75 91.5%
Gemma 3 4B 100 100 89 88 78 91.0%
Claude Opus 4 100 100 100 81 71 90.4%
Rocinante 12B 100 100 100 97 54 90.1%
Ministral 3 14B 100 100 100 83 66 89.9%
Gemini 2.5 Flash Lite 100 100 99 75 71 89.1%
Ministral 8B 100 100 98 87 58 88.6%
Gemini 2.5 Pro 100 100 89 86 67 88.3%
Z.AI GLM 4.6 100 100 88 82 71 88.3%
Writer: Palmyra X5 100 100 100 73 68 88.1%
Gemini 3 Pro (Preview) 100 100 92 78 71 88.1%
Mistral Small 3.2 24B 100 100 100 74 66 88.0%
GPT-5.2 100 100 93 87 58 87.8%
Claude 3.7 Sonnet 100 98 89 79 71 87.6%
Cohere Command R+ (Aug. 2024) 96 95 94 90 62 87.4%
GPT-4o Mini (temp=1) 100 100 81 78 78 87.3%
Claude 3 Haiku 100 93 92 82 70 87.2%
Qwen 2.5 72B 100 100 100 100 32 86.3%
Llama 3.1 Nemotron 70B 100 97 94 86 54 85.9%
GPT-4o Mini (temp=0) 100 100 92 89 45 85.3%
DeepSeek-V2 Chat 100 100 81 77 68 85.0%
Z.AI GLM 4.7 100 92 86 74 68 84.0%
Z.AI GLM 5 100 93 91 81 47 82.3%
Ministral 3 3B 100 100 97 74 41 82.3%
Gemma 3 12B 100 100 99 68 41 81.5%
Mistral NeMO 100 100 100 56 50 81.1%
Gemini 2.5 Flash 100 90 83 63 54 78.1%
Mistral Medium 3.1 100 100 87 51 50 77.4%
Arcee AI: Trinity Large (Preview) 100 100 94 50 41 76.8%
GPT-4.1 Mini 92 83 74 67 63 75.9%
Mistral Small Creative 100 92 85 58 44 75.7%
DeepSeek V3.1 89 88 83 63 54 75.5%
DeepSeek V3.2 95 79 75 65 60 75.0%
Claude 3.5 Sonnet 100 100 80 54 33 73.2%
Ministral 3B 100 95 83 63 17 71.8%
Claude 3.5 Haiku 100 100 78 63 13 70.9%
Qwen 3.5 Plus (2026-02-15) 100 77 73 66 24 67.8%
ByteDance Seed 1.6 Flash 100 88 77 62 9 67.1%
Mistral Large 100 95 92 41 5 66.5%
GPT-4.1 Nano 100 97 68 50 15 65.9%
Llama 3.1 8B 100 80 76 68 0 64.7%
Claude Sonnet 4.6 94 85 82 56 0 63.3%
Claude Haiku 4.5 100 94 63 55 0 62.5%
Claude Opus 4.5 100 62 59 45 41 61.3%
GPT-4o, Aug. 6th (temp=0) 100 100 49 33 12 58.6%
GPT-5 Mini 86 80 47 38 37 57.6%
Minimax M2.5 100 74 66 32 14 57.2%
Grok 4 Fast 81 70 48 44 40 56.5%
GPT-4o, Aug. 6th (temp=1) 96 86 60 30 0 54.3%
Z.AI GLM 4.5 66 63 63 54 18 52.8%
Claude Sonnet 4 100 79 50 19 0 49.5%
Stealth: Aurora Alpha 70 62 60 37 0 45.7%
Hermes 3 70B 100 66 26 24 0 43.2%
Claude Opus 4.6 80 49 40 29 10 41.7%
Grok 4 59 42 39 34 0 34.9%
MoonshotAI: Kimi K2.5 100 24 13 12 10 31.7%
Claude Sonnet 4.5 59 57 21 15 0 30.4%
ByteDance Seed 1.6 71 5 0 0 0 15.4%
GPT-5 Nano 19 9 0 0 0 5.6%