AI-ism adverb frequency

Test: Bad Writing Habits

Avg. Score: 86.6%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Grok 4.1 Fast | 97.8% | $0.0018 | 37.8s | 93%
2 | ByteDance Seed 1.6 Flash | 97.0% | $0.0013 | 27.3s | 91%
3 | Grok 4 Fast | 94.2% | $0.0017 | 24.1s | 86%
4 | o4 Mini | 95.4% | $0.015 | 25.7s | 86%
5 | Stealth: Aurora Alpha | 92.6% | $0.0000 | 9.8s | 84%
6 | o4 Mini High | 95.8% | $0.025 | 47.2s | 87%
7 | Gemini 3 Flash (Preview) | 92.3% | $0.0078 | 19.6s | 84%
8 | GPT-5 Mini | 94.0% | $0.0100 | 57.4s | 86%
9 | GPT-4.1 | 93.4% | $0.018 | 44.7s | 85%
10 | Mistral Medium 3.1 | 91.9% | $0.0048 | 36.5s | 81%
11 | Mistral Large 3 | 90.8% | $0.0033 | 30.3s | 80%
12 | ByteDance Seed 1.6 | 96.3% | $0.013 | 2.5m | 89%
13 | GPT-5 Nano | 90.9% | $0.0042 | 1.4m | 85%
14 | Qwen 3.5 Plus (2026-02-15) | 89.6% | $0.0060 | 31.5s | 78%
15 | Mistral Large | 90.6% | $0.014 | 30.9s | 78%
16 | Ministral 3 14B | 88.8% | $0.0007 | 11.7s | 75%
17 | Ministral 8B | 88.5% | $0.0004 | 10.4s | 75%
18 | Mistral Small Creative | 87.9% | $0.0007 | 9.1s | 75%
19 | Arcee AI: Trinity Mini | 88.3% | $0.0003 | 9.2s | 73%
20 | Qwen 2.5 72B | 88.7% | $0.0010 | 36.7s | 76%
21 | Z.AI GLM 5 | 90.4% | $0.0084 | 1.2m | 80%
22 | Mistral Large 2 | 89.4% | $0.013 | 29.4s | 76%
23 | Ministral 3 8B | 86.9% | $0.0008 | 19.6s | 75%
24 | Z.AI GLM 4.7 Flash | 89.3% | $0.0017 | 1.2m | 79%
25 | Ministral 3B | 87.9% | $0.0001 | 8.1s | 70%
26 | Grok 4 | 93.4% | $0.048 | 1.7m | 86%
27 | DeepSeek V3 (2025-03-24) | 88.8% | $0.0014 | 39.4s | 73%
28 | Ministral 3 3B | 87.5% | $0.0005 | 11.1s | 70%
29 | GPT-5.2 | 93.6% | $0.056 | 1.5m | 83%
30 | Qwen 3.5 397B A17B | 93.6% | $0.014 | 3.0m | 84%
31 | Llama 3.1 8B | 87.9% | $0.0003 | 1.3m | 75%
32 | Writer: Palmyra X5 | 86.4% | $0.011 | 22.0s | 71%
33 | Llama 3.1 70B | 86.7% | $0.0015 | 29.4s | 69%
34 | Gemini 3 Pro (Preview) | 89.5% | $0.055 | 54.4s | 79%
35 | Z.AI GLM 4.6 | 85.4% | $0.0065 | 51.5s | 74%
36 | Gemini 2.5 Pro | 87.8% | $0.036 | 36.2s | 74%
37 | Mistral NeMO | 83.8% | $0.0005 | 10.1s | 68%
38 | GPT-4o, Aug. 6th (temp=0) | 85.8% | $0.023 | 22.7s | 72%
39 | Claude Opus 4.6 | 92.2% | $0.078 | 1.2m | 82%
40 | Claude Sonnet 4.5 | 88.3% | $0.035 | 38.1s | 73%
41 | MoonshotAI: Kimi K2.5 | 93.5% | $0.019 | 3.2m | 83%
42 | Minimax M2.5 | 86.1% | $0.0034 | 1.3m | 75%
43 | Z.AI GLM 4.7 | 87.1% | $0.010 | 1.4m | 76%
44 | Rocinante 12B | 86.3% | $0.0014 | 38.4s | 68%
45 | GPT-5 | 95.4% | $0.065 | 2.8m | 87%
46 | Cohere Command R+ (Aug. 2024) | 86.5% | $0.020 | 52.5s | 73%
47 | WizardLM 2 8x22b | 88.0% | $0.0026 | 1.8m | 74%
48 | Claude Haiku 4.5 | 83.4% | $0.011 | 21.6s | 70%
49 | GPT-5.1 | 91.3% | $0.054 | 1.8m | 78%
50 | DeepSeek V3.2 | 85.9% | $0.0014 | 1.9m | 74%
51 | GPT-4o, Aug. 6th (temp=1) | 83.4% | $0.018 | 24.4s | 69%
52 | GPT-4o Mini (temp=0) | 82.7% | $0.0012 | 34.8s | 66%
53 | Llama 3.1 Nemotron 70B | 82.9% | $0.0038 | 31.7s | 66%
54 | Gemini 2.5 Flash Lite | 80.4% | $0.0009 | 9.5s | 66%
55 | Gemini 2.5 Flash | 80.8% | $0.0052 | 10.6s | 66%
56 | Gemma 3 27B | 82.6% | $0.0006 | 52.6s | 66%
57 | DeepSeek V3.1 | 84.8% | $0.0020 | 1.8m | 71%
58 | GPT-4o, May 13th (temp=0) | 83.6% | $0.035 | 14.1s | 66%
59 | DeepSeek V3 (2024-12-26) | 82.6% | $0.0021 | 54.6s | 66%
60 | Claude Sonnet 4 | 83.5% | $0.032 | 43.7s | 70%
61 | GPT-4o Mini (temp=1) | 80.9% | $0.0012 | 34.8s | 65%
62 | DeepSeek-V2 Chat | 82.2% | $0.0021 | 53.3s | 64%
63 | Claude Opus 4.5 | 86.0% | $0.070 | 53.4s | 73%
64 | Hermes 3 405B | 82.0% | $0.0032 | 53.2s | 63%
65 | Arcee AI: Trinity Large (Preview) | 80.8% | $0.0000 | 43.6s | 63%
66 | Claude 3.5 Sonnet | 85.1% | $0.048 | 35.5s | 66%
67 | GPT-4o, May 13th (temp=1) | 81.2% | $0.033 | 14.4s | 64%
68 | Hermes 3 70B | 82.2% | $0.0010 | 1.2m | 63%
69 | Gemma 3 12B | 79.0% | $0.0004 | 41.3s | 62%
70 | GPT-4.1 Mini | 78.4% | $0.0027 | 19.0s | 60%
71 | Z.AI GLM 4.5 | 78.3% | $0.0051 | 42.1s | 62%
72 | Claude Sonnet 4.6 | 80.3% | $0.031 | 39.3s | 61%
73 | Gemma 3 4B | 74.1% | $0.0002 | 20.0s | 57%
74 | Gemini 3.1 Pro (Preview) | 88.5% | $0.107 | 1.8m | 70%
75 | Claude 3 Haiku | 74.3% | $0.0025 | 14.9s | 54%
76 | Claude 3.7 Sonnet | 76.7% | $0.042 | 46.7s | 61%
77 | Claude 3.5 Haiku | 76.9% | $0.0035 | 10.8s | 44%
78 | GPT-4.1 Nano | 67.5% | $0.0007 | 13.3s | 50%
79 | Claude Opus 4 | 85.6% | $0.209 | 1.4m | 73%
80 | Mistral Small 3.2 24B | 79.3% | $0.0069 | 5.7m | 49%
Average score: 86.62%

Individual Scenarios

Detailed Writing Rules

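Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 93 | 98.5%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 97 | 94 | 98.1%
ByteDance Seed 1.6 | 100 | 100 | 96 | 96 | 95 | 97.5%
GPT-5.2 | 98 | 98 | 98 | 98 | 96 | 97.5%
o4 Mini High | 100 | 97 | 96 | 96 | 96 | 97.1%
Grok 4 | 100 | 100 | 96 | 96 | 93 | 96.8%
o4 Mini | 100 | 100 | 100 | 97 | 84 | 96.1%
GPT-5 Mini | 98 | 96 | 96 | 95 | 94 | 95.8%
Claude Opus 4.6 | 100 | 97 | 94 | 94 | 94 | 95.7%
Qwen 3.5 397B A17B | 100 | 100 | 93 | 93 | 92 | 95.5%
GPT-5 | 100 | 97 | 96 | 93 | 89 | 95.2%
Arcee AI: Trinity Mini | 100 | 100 | 93 | 91 | 89 | 94.5%
Gemini 3 Pro (Preview) | 100 | 97 | 93 | 92 | 90 | 94.5%
Mistral Large | 97 | 97 | 96 | 93 | 89 | 94.5%
Grok 4 Fast | 97 | 96 | 93 | 93 | 93 | 94.5%
Mistral Large 3 | 100 | 95 | 95 | 92 | 91 | 94.4%
GPT-5.1 | 98 | 94 | 94 | 94 | 92 | 94.4%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 89 | 82 | 94.2%
Mistral Large 2 | 100 | 95 | 95 | 91 | 86 | 93.5%
Ministral 3 8B | 100 | 95 | 95 | 92 | 85 | 93.2%
Qwen 3.5 Plus (2026-02-15) | 100 | 93 | 93 | 91 | 88 | 92.9%
GPT-4.1 | 100 | 96 | 92 | 89 | 84 | 92.4%
Claude Sonnet 4.5 | 100 | 100 | 93 | 86 | 83 | 92.2%
Gemini 3.1 Pro (Preview) | 100 | 96 | 90 | 87 | 85 | 91.7%
GPT-4o, May 13th (temp=0) | 100 | 98 | 95 | 92 | 74 | 91.7%
DeepSeek V3 (2025-03-24) | 100 | 100 | 89 | 87 | 81 | 91.3%
Gemini 2.5 Pro | 100 | 96 | 92 | 87 | 81 | 91.1%
DeepSeek-V2 Chat | 100 | 100 | 92 | 89 | 73 | 90.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 94 | 91 | 68 | 90.6%
Mistral Medium 3.1 | 93 | 92 | 90 | 90 | 88 | 90.5%
GPT-5 Nano | 93 | 92 | 92 | 89 | 87 | 90.3%
Gemini 3 Flash (Preview) | 97 | 94 | 89 | 88 | 83 | 90.1%
MoonshotAI: Kimi K2.5 | 100 | 92 | 88 | 86 | 85 | 90.1%
Z.AI GLM 5 | 100 | 93 | 92 | 84 | 80 | 89.9%
DeepSeek V3.2 | 93 | 93 | 91 | 91 | 79 | 89.4%
GPT-4o, Aug. 6th (temp=1) | 95 | 94 | 88 | 86 | 81 | 88.9%
Stealth: Aurora Alpha | 91 | 90 | 88 | 88 | 87 | 88.9%
Writer: Palmyra X5 | 95 | 90 | 88 | 87 | 84 | 88.8%
Mistral Small Creative | 100 | 95 | 92 | 83 | 74 | 88.7%
Claude Opus 4 | 92 | 90 | 90 | 86 | 86 | 88.7%
DeepSeek V3.1 | 93 | 92 | 90 | 87 | 80 | 88.5%
Llama 3.1 8B | 95 | 91 | 91 | 85 | 80 | 88.3%
Ministral 8B | 94 | 92 | 91 | 90 | 72 | 87.7%
Ministral 3 3B | 100 | 100 | 88 | 85 | 65 | 87.6%
Z.AI GLM 4.7 Flash | 93 | 88 | 87 | 85 | 83 | 87.4%
Claude Opus 4.5 | 94 | 92 | 86 | 84 | 73 | 86.0%
Claude 3 Haiku | 95 | 93 | 85 | 83 | 71 | 85.6%
GPT-4o, Aug. 6th (temp=0) | 89 | 86 | 85 | 85 | 83 | 85.5%
Z.AI GLM 4.7 | 96 | 91 | 83 | 79 | 73 | 84.5%
Claude 3.5 Sonnet | 100 | 100 | 86 | 69 | 67 | 84.5%
WizardLM 2 8x22b | 93 | 84 | 82 | 82 | 80 | 84.3%
Qwen 2.5 72B | 96 | 89 | 85 | 76 | 76 | 84.2%
Rocinante 12B | 100 | 89 | 84 | 84 | 64 | 84.0%
Ministral 3 14B | 100 | 92 | 78 | 75 | 73 | 83.7%
Claude Sonnet 4.6 | 92 | 91 | 90 | 76 | 68 | 83.5%
Ministral 3B | 100 | 100 | 92 | 64 | 61 | 83.4%
Arcee AI: Trinity Large (Preview) | 97 | 89 | 83 | 74 | 72 | 83.0%
Minimax M2.5 | 92 | 91 | 84 | 78 | 67 | 82.4%
Hermes 3 70B | 93 | 84 | 81 | 78 | 74 | 81.9%
Gemma 3 27B | 96 | 91 | 77 | 75 | 68 | 81.4%
Claude 3.5 Haiku | 100 | 100 | 85 | 72 | 46 | 80.7%
GPT-4.1 Nano | 90 | 87 | 81 | 77 | 68 | 80.7%
Claude Haiku 4.5 | 93 | 80 | 79 | 75 | 72 | 79.8%
GPT-4.1 Mini | 91 | 86 | 74 | 73 | 73 | 79.6%
Claude Sonnet 4 | 95 | 83 | 80 | 76 | 61 | 79.3%
Gemini 2.5 Flash | 83 | 83 | 82 | 75 | 68 | 78.3%
Z.AI GLM 4.6 | 90 | 83 | 76 | 74 | 69 | 78.2%
Gemma 3 12B | 91 | 78 | 75 | 72 | 70 | 77.2%
Gemini 2.5 Flash Lite | 81 | 79 | 76 | 75 | 73 | 76.8%
Llama 3.1 70B | 90 | 80 | 78 | 70 | 63 | 76.3%
Z.AI GLM 4.5 | 89 | 83 | 75 | 70 | 61 | 75.6%
Hermes 3 405B | 100 | 83 | 76 | 70 | 49 | 75.6%
GPT-4o Mini (temp=0) | 83 | 83 | 81 | 68 | 63 | 75.3%
Claude 3.7 Sonnet | 83 | 76 | 75 | 71 | 70 | 75.0%
GPT-4o Mini (temp=1) | 86 | 81 | 75 | 73 | 60 | 75.0%
Mistral NeMO | 97 | 82 | 74 | 62 | 55 | 74.2%
GPT-4o, May 13th (temp=1) | 79 | 76 | 74 | 72 | 69 | 74.1%
Llama 3.1 Nemotron 70B | 85 | 76 | 74 | 72 | 58 | 72.9%
Gemma 3 4B | 87 | 73 | 73 | 67 | 62 | 72.4%
Mistral Small 3.2 24B | 81 | 75 | 66 | 66 | 61 | 69.7%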
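
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 98 | 96 | 95 | 97.8%
Grok 4.1 Fast | 100 | 100 | 96 | 96 | 95 | 97.3%
ByteDance Seed 1.6 Flash | 100 | 100 | 97 | 96 | 93 | 97.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 85 | 97.0%
GPT-5 | 100 | 100 | 98 | 94 | 91 | 96.7%
Claude Opus 4.6 | 100 | 100 | 95 | 94 | 94 | 96.7%
ByteDance Seed 1.6 | 100 | 100 | 100 | 96 | 86 | 96.5%
Grok 4 | 100 | 96 | 96 | 95 | 92 | 96.1%
GPT-5.2 | 100 | 98 | 97 | 93 | 91 | 95.8%
Gemini 3.1 Pro (Preview) | 96 | 96 | 96 | 96 | 90 | 94.8%
o4 Mini | 100 | 96 | 96 | 96 | 85 | 94.7%
Mistral Small 3.2 24B | 100 | 100 | 100 | 92 | 80 | 94.4%
MoonshotAI: Kimi K2.5 | 100 | 95 | 94 | 90 | 90 | 93.9%
Qwen 3.5 Plus (2026-02-15) | 97 | 95 | 94 | 92 | 91 | 93.9%
GPT-5 Mini | 98 | 97 | 95 | 93 | 87 | 93.9%
GPT-5.1 | 98 | 98 | 96 | 94 | 83 | 93.8%
Mistral Medium 3.1 | 100 | 97 | 95 | 95 | 79 | 93.3%
Writer: Palmyra X5 | 100 | 95 | 95 | 91 | 85 | 93.3%
Ministral 3 8B | 100 | 95 | 94 | 90 | 87 | 93.2%
Ministral 3B | 100 | 100 | 100 | 87 | 79 | 93.1%
Z.AI GLM 4.7 Flash | 100 | 96 | 95 | 89 | 85 | 92.9%
Z.AI GLM 4.7 | 96 | 94 | 93 | 92 | 90 | 92.8%
Ministral 8B | 100 | 95 | 93 | 90 | 86 | 92.7%
Mistral Large 2 | 100 | 95 | 94 | 87 | 86 | 92.6%
Mistral Large 3 | 100 | 100 | 91 | 87 | 85 | 92.5%
Mistral Small Creative | 96 | 96 | 94 | 90 | 85 | 92.5%
GPT-4.1 | 96 | 95 | 91 | 91 | 87 | 92.2%
Stealth: Aurora Alpha | 93 | 92 | 91 | 91 | 91 | 91.7%
Arcee AI: Trinity Large (Preview) | 100 | 97 | 91 | 87 | 83 | 91.5%
Claude Sonnet 4.5 | 100 | 95 | 92 | 91 | 79 | 91.4%
Z.AI GLM 5 | 96 | 93 | 92 | 90 | 85 | 91.1%
GPT-5 Nano | 95 | 93 | 90 | 90 | 86 | 91.0%
Ministral 3 14B | 100 | 94 | 93 | 85 | 83 | 91.0%
Arcee AI: Trinity Mini | 100 | 92 | 92 | 91 | 80 | 90.9%
WizardLM 2 8x22b | 96 | 96 | 93 | 88 | 82 | 90.8%
GPT-4o, May 13th (temp=0) | 97 | 96 | 94 | 85 | 80 | 90.4%
Mistral Large | 96 | 94 | 91 | 90 | 80 | 90.2%
Claude Opus 4.5 | 92 | 92 | 91 | 91 | 82 | 89.7%
Claude 3.5 Sonnet | 100 | 94 | 88 | 87 | 79 | 89.5%
Claude Sonnet 4 | 100 | 90 | 89 | 88 | 79 | 89.2%
Grok 4 Fast | 100 | 96 | 92 | 81 | 76 | 88.9%
Cohere Command R+ (Aug. 2024) | 94 | 92 | 91 | 85 | 83 | 88.9%
Claude Opus 4 | 96 | 93 | 91 | 86 | 77 | 88.3%
Minimax M2.5 | 100 | 93 | 91 | 83 | 74 | 88.3%
Gemini 3 Pro (Preview) | 97 | 89 | 88 | 86 | 80 | 88.2%
Gemini 3 Flash (Preview) | 94 | 91 | 90 | 89 | 77 | 88.2%
GPT-4o, Aug. 6th (temp=0) | 100 | 90 | 85 | 85 | 81 | 88.1%
DeepSeek V3.1 | 100 | 96 | 93 | 84 | 64 | 87.5%
Qwen 2.5 72B | 100 | 89 | 89 | 79 | 77 | 86.9%
Llama 3.1 70B | 99 | 94 | 89 | 84 | 68 | 86.7%
DeepSeek V3.2 | 89 | 89 | 86 | 85 | 83 | 86.2%
GPT-4o Mini (temp=0) | 100 | 87 | 83 | 82 | 74 | 85.1%
Rocinante 12B | 100 | 92 | 88 | 79 | 65 | 84.7%
Claude Haiku 4.5 | 87 | 87 | 86 | 83 | 78 | 83.9%
Llama 3.1 8B | 92 | 90 | 86 | 84 | 66 | 83.5%
Hermes 3 405B | 100 | 91 | 88 | 80 | 59 | 83.5%
Mistral NeMO | 93 | 88 | 85 | 84 | 67 | 83.4%
Z.AI GLM 4.6 | 92 | 88 | 86 | 77 | 70 | 82.6%
GPT-4.1 Mini | 91 | 83 | 83 | 80 | 75 | 82.6%
Gemini 2.5 Flash | 94 | 91 | 86 | 86 | 55 | 82.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 89 | 67 | 55 | 82.1%
Gemini 2.5 Flash Lite | 92 | 85 | 84 | 78 | 71 | 82.1%
Gemma 3 27B | 95 | 82 | 80 | 77 | 74 | 81.7%
Gemma 3 12B | 92 | 91 | 85 | 74 | 66 | 81.5%
Claude Sonnet 4.6 | 95 | 86 | 76 | 75 | 75 | 81.4%
GPT-4o, May 13th (temp=1) | 86 | 83 | 83 | 82 | 73 | 81.2%
Gemma 3 4B | 93 | 81 | 78 | 77 | 76 | 81.0%
Gemini 2.5 Pro | 86 | 84 | 82 | 80 | 73 | 80.9%
GPT-4o, Aug. 6th (temp=1) | 96 | 90 | 79 | 78 | 60 | 80.8%
Ministral 3 3B | 100 | 91 | 75 | 73 | 65 | 80.8%
Hermes 3 70B | 100 | 97 | 83 | 73 | 49 | 80.4%
DeepSeek-V2 Chat | 91 | 87 | 79 | 73 | 72 | 80.4%
GPT-4o Mini (temp=1) | 79 | 78 | 76 | 76 | 76 | 77.3%
Z.AI GLM 4.5 | 87 | 87 | 69 | 67 | 65 | 75.1%
Claude 3.7 Sonnet | 86 | 72 | 71 | 70 | 67 | 73.1%
GPT-4.1 Nano | 83 | 75 | 68 | 67 | 65 | 71.8%
Claude 3.5 Haiku | 100 | 82 | 75 | 52 | 38 | 69.4%
Claude 3 Haiku | 90 | 84 | 76 | 59 | 34 | 68.9%
Llama 3.1 Nemotron 70B | 93 | 71 | 68 | 56 | 54 | 68.4%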
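
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 97 | 92 | 97.9%
GPT-5 | 98 | 98 | 98 | 97 | 96 | 97.4%
ByteDance Seed 1.6 Flash | 100 | 100 | 97 | 95 | 91 | 96.6%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 92 | 88 | 96.1%
GPT-5.2 | 98 | 96 | 96 | 95 | 94 | 95.7%
ByteDance Seed 1.6 | 100 | 100 | 95 | 94 | 88 | 95.6%
GPT-5.1 | 100 | 98 | 95 | 92 | 92 | 95.3%
Mistral Medium 3.1 | 100 | 100 | 97 | 91 | 88 | 95.1%
Qwen 3.5 397B A17B | 97 | 96 | 95 | 92 | 90 | 94.2%
Grok 4 Fast | 97 | 97 | 97 | 88 | 86 | 92.8%
Stealth: Aurora Alpha | 97 | 97 | 97 | 91 | 81 | 92.8%
Mistral Small Creative | 100 | 96 | 96 | 93 | 77 | 92.5%
GPT-4.1 Mini | 100 | 94 | 92 | 88 | 86 | 92.1%
DeepSeek V3 (2025-03-24) | 95 | 92 | 91 | 91 | 90 | 91.8%
Llama 3.1 70B | 100 | 100 | 93 | 84 | 81 | 91.5%
MoonshotAI: Kimi K2.5 | 96 | 96 | 95 | 86 | 85 | 91.4%
Arcee AI: Trinity Mini | 100 | 96 | 90 | 86 | 85 | 91.4%
GPT-5 Nano | 98 | 92 | 92 | 91 | 83 | 91.0%
GPT-5 Mini | 98 | 94 | 92 | 86 | 84 | 90.8%
Mistral Large 2 | 100 | 97 | 91 | 87 | 79 | 90.8%
Grok 4 | 94 | 93 | 89 | 89 | 88 | 90.6%
o4 Mini | 96 | 93 | 92 | 87 | 85 | 90.5%
Ministral 8B | 100 | 100 | 100 | 79 | 74 | 90.5%
Claude 3.5 Sonnet | 94 | 93 | 92 | 88 | 84 | 90.3%
Writer: Palmyra X5 | 96 | 93 | 93 | 89 | 80 | 90.2%
GPT-4.1 | 100 | 93 | 91 | 87 | 79 | 90.1%
Z.AI GLM 5 | 97 | 95 | 88 | 88 | 82 | 90.1%
Ministral 3 3B | 100 | 100 | 88 | 86 | 76 | 90.0%
o4 Mini High | 100 | 93 | 89 | 86 | 82 | 90.0%
Gemini 3 Flash (Preview) | 94 | 94 | 88 | 87 | 87 | 89.9%
Mistral Large | 96 | 95 | 90 | 88 | 81 | 89.8%
Llama 3.1 8B | 94 | 90 | 88 | 86 | 86 | 88.8%
Ministral 3B | 93 | 89 | 88 | 88 | 86 | 88.7%
DeepSeek V3.1 | 100 | 93 | 89 | 84 | 77 | 88.7%
Claude Sonnet 4.6 | 96 | 89 | 89 | 86 | 81 | 88.2%
Cohere Command R+ (Aug. 2024) | 95 | 94 | 91 | 86 | 75 | 88.2%
Qwen 2.5 72B | 96 | 89 | 87 | 86 | 81 | 88.1%
Claude Opus 4 | 96 | 92 | 84 | 84 | 77 | 86.6%
Claude Sonnet 4.5 | 90 | 89 | 87 | 86 | 80 | 86.4%
Z.AI GLM 4.7 Flash | 94 | 86 | 85 | 85 | 82 | 86.4%
Qwen 3.5 Plus (2026-02-15) | 93 | 91 | 87 | 85 | 76 | 86.3%
GPT-4o Mini (temp=1) | 93 | 87 | 87 | 82 | 82 | 86.2%
Ministral 3 14B | 94 | 90 | 87 | 82 | 79 | 86.2%
Rocinante 12B | 100 | 95 | 82 | 80 | 73 | 85.9%
Claude Sonnet 4 | 94 | 87 | 87 | 83 | 79 | 85.9%
Hermes 3 405B | 96 | 93 | 90 | 84 | 65 | 85.7%
Gemini 3 Pro (Preview) | 90 | 88 | 86 | 82 | 81 | 85.5%
DeepSeek V3.2 | 90 | 87 | 85 | 84 | 79 | 85.2%
Z.AI GLM 4.6 | 90 | 88 | 85 | 81 | 81 | 85.1%
Claude Opus 4.5 | 94 | 89 | 89 | 85 | 68 | 85.0%
Claude 3.5 Haiku | 100 | 100 | 85 | 72 | 68 | 85.0%
GPT-4o Mini (temp=0) | 91 | 89 | 88 | 86 | 72 | 85.0%
DeepSeek-V2 Chat | 100 | 95 | 83 | 77 | 67 | 84.5%
Gemini 2.5 Pro | 96 | 91 | 86 | 78 | 72 | 84.4%
WizardLM 2 8x22b | 93 | 89 | 84 | 82 | 73 | 84.4%
Claude 3 Haiku | 100 | 94 | 81 | 76 | 70 | 84.3%
Ministral 3 8B | 95 | 90 | 90 | 79 | 66 | 84.0%
Llama 3.1 Nemotron 70B | 95 | 89 | 83 | 75 | 75 | 83.3%
Z.AI GLM 4.7 | 90 | 90 | 89 | 74 | 72 | 83.0%
Mistral Large 3 | 86 | 85 | 85 | 84 | 75 | 82.8%
GPT-4o, May 13th (temp=1) | 95 | 80 | 80 | 79 | 78 | 82.5%
GPT-4o, Aug. 6th (temp=0) | 95 | 88 | 80 | 75 | 73 | 82.1%
Claude 3.7 Sonnet | 88 | 87 | 85 | 84 | 63 | 81.5%
DeepSeek V3 (2024-12-26) | 100 | 90 | 84 | 76 | 57 | 81.4%
Minimax M2.5 | 96 | 87 | 83 | 76 | 64 | 81.1%
Mistral NeMO | 94 | 84 | 82 | 73 | 70 | 80.8%
Hermes 3 70B | 91 | 89 | 76 | 74 | 67 | 79.4%
Gemini 2.5 Flash Lite | 89 | 84 | 82 | 77 | 60 | 78.8%
GPT-4o, May 13th (temp=0) | 94 | 83 | 83 | 69 | 63 | 78.3%
Claude Haiku 4.5 | 86 | 80 | 77 | 74 | 72 | 77.8%
GPT-4o, Aug. 6th (temp=1) | 85 | 85 | 84 | 66 | 64 | 76.9%
Gemma 3 27B | 84 | 81 | 70 | 69 | 69 | 74.3%
Mistral Small 3.2 24B | 100 | 100 | 80 | 75 | 14 | 73.7%
Gemini 2.5 Flash | 85 | 82 | 79 | 68 | 44 | 71.7%
Arcee AI: Trinity Large (Preview) | 92 | 88 | 74 | 53 | 51 | 71.4%
Gemma 3 12B | 79 | 74 | 74 | 65 | 64 | 71.1%
Gemma 3 4B | 74 | 71 | 67 | 62 | 59 | 66.4%
Z.AI GLM 4.5 | 79 | 76 | 64 | 63 | 51 | 66.3%
GPT-4.1 Nano | 73 | 68 | 66 | 65 | 44 | 63.2%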
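
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 95 | 99.0%
GPT-5 | 100 | 100 | 100 | 98 | 95 | 98.6%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 95 | 95 | 98.0%
Mistral Small 3.2 24B | 100 | 99 | 98 | 97 | 93 | 97.5%
Claude Opus 4.6 | 100 | 97 | 97 | 96 | 96 | 97.3%
ByteDance Seed 1.6 Flash | 100 | 100 | 96 | 95 | 91 | 96.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 97 | 94 | 93 | 96.1%
o4 Mini High | 97 | 97 | 96 | 95 | 94 | 95.7%
Grok 4 Fast | 100 | 97 | 96 | 96 | 89 | 95.7%
Gemini 3.1 Pro (Preview) | 100 | 97 | 95 | 93 | 93 | 95.6%
GPT-5.2 | 100 | 96 | 94 | 94 | 93 | 95.5%
DeepSeek-V2 Chat | 100 | 100 | 96 | 91 | 89 | 95.3%
GPT-4.1 | 96 | 96 | 96 | 96 | 93 | 95.3%
Gemini 3 Flash (Preview) | 97 | 96 | 95 | 94 | 94 | 95.2%
GPT-5 Mini | 98 | 96 | 94 | 94 | 93 | 95.1%
ByteDance Seed 1.6 | 100 | 100 | 95 | 88 | 88 | 94.3%
Stealth: Aurora Alpha | 100 | 96 | 95 | 90 | 89 | 94.2%
GPT-5.1 | 98 | 96 | 95 | 93 | 88 | 93.9%
MoonshotAI: Kimi K2.5 | 100 | 95 | 95 | 89 | 89 | 93.7%
Mistral Medium 3.1 | 100 | 95 | 94 | 91 | 87 | 93.6%
GPT-4o, Aug. 6th (temp=1) | 100 | 95 | 94 | 90 | 89 | 93.6%
o4 Mini | 100 | 97 | 93 | 90 | 87 | 93.5%
Writer: Palmyra X5 | 96 | 96 | 95 | 91 | 88 | 93.2%
Qwen 2.5 72B | 100 | 100 | 94 | 89 | 82 | 93.2%
Grok 4 | 100 | 97 | 95 | 90 | 83 | 93.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 95 | 70 | 92.9%
Ministral 8B | 95 | 95 | 92 | 92 | 91 | 92.8%
Ministral 3 14B | 100 | 100 | 91 | 87 | 84 | 92.4%
Z.AI GLM 5 | 97 | 93 | 92 | 90 | 88 | 92.1%
Rocinante 12B | 94 | 94 | 92 | 90 | 88 | 91.8%
Claude Sonnet 4.5 | 100 | 96 | 91 | 87 | 83 | 91.6%
Z.AI GLM 4.7 Flash | 100 | 92 | 91 | 89 | 86 | 91.5%
GPT-5 Nano | 95 | 93 | 91 | 90 | 87 | 91.4%
Ministral 3 8B | 100 | 95 | 92 | 86 | 84 | 91.3%
Ministral 3 3B | 100 | 90 | 89 | 87 | 86 | 90.6%
DeepSeek V3.1 | 100 | 96 | 86 | 85 | 85 | 90.3%
Gemini 3 Pro (Preview) | 100 | 92 | 91 | 90 | 78 | 90.1%
Llama 3.1 70B | 100 | 100 | 92 | 83 | 74 | 89.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 95 | 78 | 75 | 89.7%
Claude Sonnet 4.6 | 100 | 95 | 91 | 81 | 81 | 89.7%
Gemini 2.5 Pro | 96 | 95 | 90 | 87 | 80 | 89.6%
Claude 3.5 Sonnet | 93 | 92 | 92 | 87 | 82 | 89.4%
Z.AI GLM 4.7 | 100 | 94 | 91 | 82 | 78 | 89.2%
Arcee AI: Trinity Mini | 100 | 92 | 92 | 82 | 79 | 89.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 91 | 84 | 69 | 89.0%
Mistral NeMO | 92 | 90 | 88 | 85 | 85 | 87.9%
Ministral 3B | 91 | 91 | 90 | 88 | 79 | 87.8%
Mistral Small Creative | 95 | 94 | 91 | 86 | 73 | 87.7%
Hermes 3 70B | 96 | 94 | 87 | 83 | 78 | 87.6%
Mistral Large | 95 | 93 | 84 | 83 | 83 | 87.6%
Claude Opus 4 | 92 | 91 | 88 | 87 | 80 | 87.4%
WizardLM 2 8x22b | 96 | 93 | 88 | 81 | 77 | 86.9%
GPT-4o, May 13th (temp=1) | 91 | 91 | 89 | 87 | 76 | 86.7%
GPT-4o Mini (temp=0) | 100 | 91 | 84 | 79 | 79 | 86.5%
Claude 3 Haiku | 95 | 95 | 85 | 81 | 73 | 85.8%
Llama 3.1 Nemotron 70B | 93 | 92 | 85 | 84 | 71 | 85.2%
Claude Haiku 4.5 | 100 | 90 | 85 | 79 | 70 | 84.8%
DeepSeek V3.2 | 91 | 86 | 85 | 83 | 77 | 84.5%
Mistral Large 3 | 91 | 89 | 88 | 84 | 70 | 84.4%
Claude Opus 4.5 | 96 | 96 | 83 | 81 | 66 | 84.2%
Cohere Command R+ (Aug. 2024) | 95 | 89 | 86 | 78 | 73 | 84.1%
Gemini 2.5 Flash | 92 | 86 | 85 | 80 | 76 | 83.9%
Llama 3.1 8B | 94 | 93 | 90 | 75 | 67 | 83.6%
Gemini 2.5 Flash Lite | 100 | 92 | 78 | 74 | 71 | 82.9%
Z.AI GLM 4.6 | 95 | 89 | 83 | 78 | 65 | 82.0%
Mistral Large 2 | 91 | 86 | 84 | 82 | 64 | 81.4%
Claude Sonnet 4 | 88 | 83 | 82 | 77 | 75 | 81.1%
Arcee AI: Trinity Large (Preview) | 100 | 90 | 75 | 75 | 66 | 81.1%
Minimax M2.5 | 88 | 87 | 82 | 78 | 65 | 80.1%
Gemma 3 4B | 91 | 86 | 76 | 74 | 73 | 79.8%
DeepSeek V3 (2024-12-26) | 84 | 83 | 81 | 79 | 70 | 79.6%
Gemma 3 27B | 87 | 84 | 80 | 77 | 67 | 78.9%
Gemma 3 12B | 88 | 80 | 79 | 78 | 67 | 78.2%
GPT-4o Mini (temp=1) | 100 | 82 | 76 | 69 | 64 | 78.1%
Claude 3.7 Sonnet | 94 | 76 | 74 | 72 | 71 | 77.4%
GPT-4.1 Mini | 90 | 77 | 71 | 71 | 69 | 75.8%
Hermes 3 405B | 87 | 84 | 77 | 69 | 60 | 75.2%
Z.AI GLM 4.5 | 82 | 77 | 76 | 71 | 66 | 74.3%
Claude 3.5 Haiku | 82 | 79 | 57 | 56 | 47 | 64.2%
GPT-4.1 Nano | 75 | 70 | 67 | 55 | 52 | 64.0%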
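
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 97 | 99.4%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 97 | 96 | 98.4%
GPT-5.2 | 100 | 98 | 96 | 96 | 94 | 96.7%
Grok 4 Fast | 100 | 97 | 96 | 96 | 94 | 96.6%
ByteDance Seed 1.6 | 100 | 100 | 95 | 92 | 91 | 95.6%
GPT-5 | 100 | 100 | 96 | 94 | 87 | 95.5%
GPT-4.1 | 100 | 100 | 96 | 95 | 87 | 95.5%
Claude Opus 4.6 | 97 | 96 | 95 | 94 | 94 | 95.4%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 95 | 92 | 92 | 95.3%
o4 Mini High | 100 | 97 | 96 | 92 | 85 | 94.0%
Grok 4 | 97 | 94 | 93 | 92 | 90 | 93.2%
GPT-5.1 | 97 | 95 | 94 | 90 | 88 | 92.5%
Gemini 3 Flash (Preview) | 97 | 97 | 95 | 87 | 85 | 92.2%
Arcee AI: Trinity Mini | 95 | 94 | 94 | 89 | 89 | 92.1%
Qwen 2.5 72B | 100 | 100 | 87 | 86 | 86 | 91.8%
Qwen 3.5 397B A17B | 97 | 95 | 92 | 88 | 87 | 91.8%
Mistral Large | 100 | 91 | 90 | 90 | 89 | 91.7%
Stealth: Aurora Alpha | 100 | 97 | 89 | 88 | 83 | 91.1%
Ministral 3B | 100 | 100 | 90 | 84 | 79 | 90.8%
Z.AI GLM 4.7 Flash | 96 | 95 | 94 | 86 | 81 | 90.5%
MoonshotAI: Kimi K2.5 | 100 | 94 | 88 | 86 | 83 | 90.3%
GPT-5 Nano | 92 | 92 | 91 | 88 | 86 | 89.8%
DeepSeek V3 (2025-03-24) | 100 | 100 | 94 | 82 | 73 | 89.8%
Ministral 3 14B | 100 | 93 | 88 | 87 | 79 | 89.5%
GPT-5 Mini | 93 | 93 | 91 | 87 | 81 | 89.2%
DeepSeek V3.2 | 93 | 93 | 88 | 87 | 85 | 89.0%
Mistral Large 3 | 96 | 94 | 94 | 81 | 79 | 89.0%
Z.AI GLM 5 | 96 | 94 | 88 | 83 | 79 | 88.1%
Llama 3.1 8B | 96 | 95 | 93 | 88 | 68 | 88.1%
GPT-4o, Aug. 6th (temp=0) | 95 | 94 | 88 | 86 | 76 | 88.1%
Claude Opus 4 | 90 | 90 | 90 | 87 | 83 | 87.9%
Gemini 2.5 Pro | 92 | 88 | 87 | 87 | 85 | 87.9%
Z.AI GLM 4.7 | 92 | 92 | 90 | 84 | 79 | 87.6%
Ministral 8B | 100 | 96 | 95 | 74 | 73 | 87.6%
Hermes 3 405B | 100 | 100 | 89 | 81 | 68 | 87.5%
Gemini 3.1 Pro (Preview) | 100 | 93 | 90 | 82 | 69 | 86.9%
Llama 3.1 Nemotron 70B | 94 | 89 | 88 | 82 | 81 | 86.9%
Claude Sonnet 4.5 | 96 | 93 | 86 | 84 | 74 | 86.6%
GPT-4o, Aug. 6th (temp=1) | 91 | 90 | 89 | 86 | 77 | 86.5%
Claude 3.5 Sonnet | 93 | 88 | 86 | 85 | 79 | 86.0%
Mistral Medium 3.1 | 96 | 88 | 85 | 83 | 78 | 85.9%
Rocinante 12B | 91 | 89 | 86 | 81 | 80 | 85.6%
DeepSeek V3 (2024-12-26) | 100 | 95 | 90 | 80 | 63 | 85.5%
Mistral Small Creative | 96 | 91 | 89 | 76 | 75 | 85.3%
Writer: Palmyra X5 | 96 | 89 | 87 | 79 | 75 | 85.2%
Hermes 3 70B | 94 | 85 | 84 | 83 | 80 | 85.1%
GPT-4o, May 13th (temp=1) | 100 | 92 | 90 | 76 | 64 | 84.4%
Minimax M2.5 | 93 | 86 | 86 | 79 | 78 | 84.4%
Mistral Large 2 | 96 | 90 | 84 | 77 | 72 | 84.0%
Cohere Command R+ (Aug. 2024) | 100 | 88 | 78 | 78 | 76 | 84.0%
GPT-4o Mini (temp=0) | 88 | 88 | 87 | 78 | 77 | 83.8%
Gemini 3 Pro (Preview) | 94 | 88 | 86 | 76 | 75 | 83.7%
Ministral 3 3B | 90 | 89 | 89 | 76 | 74 | 83.7%
DeepSeek-V2 Chat | 100 | 90 | 79 | 75 | 73 | 83.5%
DeepSeek V3.1 | 88 | 87 | 86 | 78 | 77 | 83.2%
GPT-4.1 Mini | 100 | 87 | 86 | 85 | 57 | 83.0%
Ministral 3 8B | 96 | 92 | 84 | 73 | 70 | 82.9%
Claude Sonnet 4 | 89 | 88 | 87 | 76 | 72 | 82.4%
Claude Haiku 4.5 | 93 | 82 | 80 | 79 | 78 | 82.4%
Llama 3.1 70B | 100 | 82 | 80 | 75 | 74 | 82.2%
Mistral NeMO | 94 | 85 | 82 | 75 | 74 | 82.0%
WizardLM 2 8x22b | 92 | 85 | 80 | 77 | 69 | 80.7%
Claude 3.7 Sonnet | 90 | 86 | 80 | 76 | 69 | 80.1%
Claude Sonnet 4.6 | 87 | 85 | 80 | 77 | 72 | 80.0%
GPT-4o, May 13th (temp=0) | 89 | 81 | 80 | 75 | 73 | 79.8%
Claude Opus 4.5 | 86 | 85 | 77 | 74 | 72 | 78.8%
Z.AI GLM 4.6 | 90 | 78 | 75 | 74 | 70 | 77.4%
Gemini 2.5 Flash | 86 | 84 | 81 | 77 | 58 | 77.2%
GPT-4o Mini (temp=1) | 85 | 82 | 76 | 72 | 71 | 77.2%
Gemini 2.5 Flash Lite | 85 | 82 | 82 | 73 | 63 | 77.0%
Z.AI GLM 4.5 | 89 | 86 | 77 | 67 | 65 | 77.0%
Mistral Small 3.2 24B | 99 | 82 | 78 | 77 | 49 | 77.0%
Gemma 3 27B | 88 | 82 | 75 | 75 | 65 | 76.9%
Claude 3 Haiku | 82 | 78 | 73 | 71 | 66 | 73.9%
Gemma 3 12B | 81 | 77 | 74 | 67 | 65 | 72.9%
GPT-4.1 Nano | 79 | 72 | 69 | 62 | 54 | 67.4%
Arcee AI: Trinity Large (Preview) | 88 | 82 | 63 | 58 | 45 | 67.1%
Claude 3.5 Haiku | 78 | 75 | 69 | 54 | 0 | 55.2%
Gemma 3 4B | 70 | 66 | 46 | 42 | 40 | 52.6%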
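
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 97 | 99.4%
o4 Mini High | 100 | 100 | 100 | 100 | 97 | 99.4%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 95 | 98.9%
GPT-5 | 100 | 100 | 99 | 98 | 97 | 98.9%
Claude Opus 4.6 | 100 | 100 | 100 | 97 | 97 | 98.7%
GPT-5.1 | 100 | 98 | 98 | 98 | 98 | 98.6%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 93 | 98.5%
Grok 4 Fast | 100 | 100 | 97 | 97 | 97 | 98.2%
GPT-5.2 | 100 | 100 | 98 | 98 | 95 | 98.1%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 90 | 98.1%
GPT-4.1 | 100 | 100 | 100 | 96 | 93 | 97.8%
GPT-5 Mini | 98 | 98 | 98 | 96 | 94 | 96.8%
ByteDance Seed 1.6 Flash | 100 | 100 | 96 | 96 | 92 | 96.7%
Ministral 3B | 100 | 100 | 100 | 93 | 90 | 96.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 96 | 95 | 92 | 96.6%
Ministral 8B | 100 | 100 | 100 | 92 | 90 | 96.4%
Z.AI GLM 5 | 100 | 100 | 100 | 93 | 89 | 96.3%
Mistral Large 2 | 100 | 100 | 96 | 95 | 89 | 96.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 96 | 93 | 92 | 95.7%
Claude Sonnet 4 | 100 | 96 | 95 | 95 | 92 | 95.7%
Claude Opus 4.5 | 100 | 100 | 96 | 96 | 86 | 95.5%
GPT-5 Nano | 97 | 97 | 94 | 94 | 94 | 95.2%
Hermes 3 70B | 100 | 100 | 100 | 94 | 81 | 95.0%
Writer: Palmyra X5 | 100 | 96 | 96 | 94 | 89 | 95.0%
Gemini 3 Flash (Preview) | 100 | 97 | 97 | 92 | 90 | 95.0%
Claude Sonnet 4.5 | 100 | 96 | 96 | 95 | 87 | 95.0%
Grok 4 | 97 | 97 | 96 | 94 | 90 | 94.8%
Stealth: Aurora Alpha | 97 | 96 | 96 | 96 | 87 | 94.5%
WizardLM 2 8x22b | 100 | 96 | 96 | 91 | 90 | 94.4%
Ministral 3 14B | 100 | 96 | 95 | 93 | 88 | 94.3%
Mistral Medium 3.1 | 100 | 97 | 95 | 90 | 90 | 94.2%
Qwen 3.5 397B A17B | 100 | 93 | 93 | 93 | 92 | 94.2%
Z.AI GLM 4.7 Flash | 100 | 100 | 92 | 89 | 88 | 93.8%
Gemini 3 Pro (Preview) | 97 | 96 | 96 | 94 | 86 | 93.8%
Claude Sonnet 4.6 | 100 | 100 | 100 | 91 | 77 | 93.7%
Z.AI GLM 4.6 | 100 | 96 | 96 | 95 | 79 | 93.2%
DeepSeek-V2 Chat | 95 | 95 | 94 | 93 | 88 | 92.9%
Gemini 2.5 Pro | 100 | 95 | 94 | 91 | 84 | 92.8%
Rocinante 12B | 100 | 100 | 94 | 90 | 80 | 92.8%
Gemma 3 27B | 100 | 95 | 94 | 92 | 81 | 92.5%
Gemini 3.1 Pro (Preview) | 100 | 97 | 96 | 92 | 77 | 92.4%
Llama 3.1 Nemotron 70B | 100 | 100 | 94 | 84 | 83 | 92.3%
DeepSeek V3.2 | 96 | 93 | 91 | 91 | 88 | 91.9%
Minimax M2.5 | 96 | 92 | 91 | 90 | 90 | 91.9%
Claude 3.5 Sonnet | 100 | 100 | 94 | 86 | 78 | 91.5%
Ministral 3 8B | 97 | 96 | 92 | 86 | 86 | 91.4%
Llama 3.1 70B | 100 | 92 | 92 | 88 | 84 | 91.2%
GPT-4o Mini (temp=1) | 100 | 91 | 88 | 87 | 87 | 90.9%
Llama 3.1 8B | 100 | 95 | 92 | 90 | 77 | 90.8%
Mistral Small Creative | 95 | 95 | 90 | 88 | 82 | 90.2%
Claude Opus 4 | 96 | 94 | 90 | 87 | 82 | 89.9%
Cohere Command R+ (Aug. 2024) | 95 | 94 | 94 | 89 | 76 | 89.7%
Z.AI GLM 4.7 | 97 | 89 | 88 | 87 | 86 | 89.3%
GPT-4o, Aug. 6th (temp=0) | 95 | 90 | 87 | 86 | 86 | 88.7%
Claude Haiku 4.5 | 93 | 92 | 87 | 86 | 85 | 88.7%
Claude 3.5 Haiku | 100 | 100 | 86 | 86 | 70 | 88.4%
Mistral Large 3 | 91 | 90 | 88 | 87 | 84 | 88.0%
Mistral Large | 100 | 94 | 84 | 81 | 81 | 88.0%
GPT-4o Mini (temp=0) | 96 | 96 | 86 | 81 | 78 | 87.5%
Arcee AI: Trinity Large (Preview) | 94 | 92 | 86 | 82 | 80 | 86.8%
GPT-4.1 Mini | 100 | 90 | 89 | 85 | 70 | 86.7%
DeepSeek V3 (2024-12-26) | 92 | 88 | 87 | 87 | 79 | 86.5%
DeepSeek V3.1 | 93 | 89 | 86 | 83 | 81 | 86.5%
Gemini 2.5 Flash | 91 | 89 | 89 | 85 | 79 | 86.4%
Mistral Small 3.2 24B | 100 | 97 | 93 | 80 | 62 | 86.2%
Gemini 2.5 Flash Lite | 95 | 91 | 85 | 83 | 77 | 86.1%
GPT-4o, May 13th (temp=1) | 92 | 91 | 88 | 81 | 78 | 86.0%
Qwen 2.5 72B | 94 | 91 | 88 | 82 | 76 | 86.0%
Mistral NeMO | 100 | 93 | 86 | 82 | 62 | 84.4%
Gemma 3 4B | 92 | 92 | 89 | 78 | 69 | 83.9%
Z.AI GLM 4.5 | 96 | 89 | 88 | 78 | 61 | 82.6%
GPT-4o, Aug. 6th (temp=1) | 96 | 91 | 84 | 72 | 70 | 82.6%
Gemma 3 12B | 96 | 89 | 79 | 77 | 71 | 82.2%
Claude 3 Haiku | 90 | 88 | 87 | 73 | 69 | 81.5%
GPT-4o, May 13th (temp=0) | 100 | 88 | 84 | 77 | 59 | 81.3%
Hermes 3 405B | 87 | 85 | 82 | 79 | 69 | 80.2%
GPT-4.1 Nano | 90 | 87 | 76 | 70 | 69 | 78.6%
Claude 3.7 Sonnet | 87 | 82 | 73 | 69 | 59 | 74.2%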

genre

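Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 97 | 96 | 93 | 97.2%
GPT-5 Mini | 100 | 100 | 96 | 95 | 94 | 97.0%
ByteDance Seed 1.6 Flash | 100 | 98 | 95 | 94 | 91 | 95.4%
o4 Mini | 100 | 100 | 93 | 93 | 89 | 94.9%
Grok 4.1 Fast | 100 | 97 | 97 | 94 | 85 | 94.7%
Grok 4 Fast | 97 | 95 | 94 | 94 | 87 | 93.5%
Gemini 3 Flash (Preview) | 100 | 97 | 93 | 91 | 87 | 93.4%
Gemini 2.5 Pro | 100 | 94 | 93 | 89 | 89 | 93.1%
MoonshotAI: Kimi K2.5 | 97 | 97 | 95 | 92 | 78 | 91.8%
Mistral Large | 100 | 100 | 91 | 86 | 82 | 91.6%
GPT-5 | 99 | 94 | 92 | 90 | 82 | 91.3%
Stealth: Aurora Alpha | 96 | 95 | 89 | 89 | 87 | 91.1%
GPT-5 Nano | 95 | 93 | 91 | 90 | 86 | 91.0%
Mistral Large 3 | 96 | 96 | 91 | 86 | 86 | 90.8%
Mistral Small 3.2 24B | 100 | 100 | 99 | 79 | 74 | 90.4%
Z.AI GLM 5 | 94 | 94 | 89 | 88 | 86 | 90.2%
Mistral Large 2 | 100 | 96 | 91 | 88 | 75 | 89.9%
Gemma 3 27B | 95 | 91 | 90 | 87 | 85 | 89.7%
Qwen 3.5 397B A17B | 96 | 92 | 89 | 87 | 83 | 89.5%
Claude Opus 4.6 | 100 | 95 | 93 | 81 | 77 | 89.0%
Gemini 3 Pro (Preview) | 94 | 91 | 91 | 86 | 83 | 89.0%
Grok 4 | 92 | 91 | 89 | 88 | 80 | 88.1%
GPT-4.1 | 93 | 91 | 89 | 85 | 83 | 88.0%
Ministral 3 3B | 94 | 91 | 88 | 84 | 81 | 87.7%
Claude Sonnet 4.5 | 100 | 92 | 92 | 85 | 68 | 87.3%
Qwen 2.5 72B | 95 | 93 | 85 | 82 | 76 | 86.2%
Z.AI GLM 4.7 Flash | 89 | 88 | 87 | 86 | 81 | 86.1%
WizardLM 2 8x22b | 100 | 86 | 86 | 85 | 71 | 85.6%
Llama 3.1 Nemotron 70B | 92 | 88 | 88 | 85 | 72 | 84.8%
Ministral 3 14B | 96 | 95 | 89 | 82 | 61 | 84.8%
Ministral 8B | 92 | 87 | 87 | 84 | 73 | 84.7%
Ministral 3B | 100 | 93 | 92 | 86 | 52 | 84.5%
Qwen 3.5 Plus (2026-02-15) | 94 | 85 | 85 | 81 | 76 | 84.3%
GPT-5.2 | 89 | 88 | 82 | 82 | 80 | 84.1%
Minimax M2.5 | 89 | 86 | 86 | 83 | 75 | 83.8%
DeepSeek V3 (2025-03-24) | 92 | 90 | 88 | 78 | 71 | 83.8%
Claude Haiku 4.5 | 93 | 91 | 84 | 78 | 73 | 83.7%
Claude Opus 4.5 | 94 | 89 | 81 | 79 | 75 | 83.6%
Mistral Medium 3.1 | 94 | 88 | 84 | 78 | 74 | 83.5%
Ministral 3 8B | 92 | 87 | 83 | 80 | 75 | 83.3%
Rocinante 12B | 100 | 89 | 88 | 71 | 69 | 83.3%
GPT-5.1 | 88 | 86 | 83 | 80 | 78 | 83.0%
DeepSeek V3.2 | 93 | 84 | 81 | 80 | 75 | 82.9%
GPT-4o, Aug. 6th (temp=0) | 87 | 85 | 82 | 81 | 78 | 82.7%
GPT-4o, Aug. 6th (temp=1) | 92 | 91 | 90 | 77 | 62 | 82.2%
Gemini 2.5 Flash Lite | 90 | 87 | 85 | 82 | 68 | 82.1%
Llama 3.1 8B | 92 | 89 | 88 | 74 | 66 | 81.8%
Mistral Small Creative | 95 | 84 | 79 | 78 | 73 | 81.8%
GPT-4o, May 13th (temp=1) | 92 | 84 | 80 | 78 | 74 | 81.8%
Gemma 3 4B | 90 | 85 | 85 | 77 | 71 | 81.7%
Arcee AI: Trinity Mini | 95 | 84 | 83 | 73 | 73 | 81.6%
Cohere Command R+ (Aug. 2024) | 89 | 87 | 80 | 80 | 71 | 81.3%
Gemma 3 12B | 91 | 83 | 82 | 80 | 69 | 81.2%
Claude Sonnet 4.6 | 96 | 86 | 77 | 76 | 70 | 81.2%
Arcee AI: Trinity Large (Preview) | 89 | 85 | 80 | 76 | 74 | 80.9%
Z.AI GLM 4.6 | 88 | 87 | 84 | 74 | 71 | 80.8%
Claude 3.5 Haiku | 100 | 81 | 80 | 77 | 67 | 80.8%
GPT-4o, May 13th (temp=0) | 91 | 87 | 82 | 80 | 61 | 80.2%
Hermes 3 405B | 89 | 83 | 82 | 73 | 73 | 80.0%
Claude Opus 4 | 87 | 81 | 77 | 76 | 76 | 79.5%
Hermes 3 70B | 95 | 84 | 76 | 73 | 70 | 79.5%
Z.AI GLM 4.7 | 84 | 84 | 82 | 76 | 70 | 79.0%
Writer: Palmyra X5 | 83 | 83 | 80 | 78 | 69 | 78.7%
Mistral NeMO | 86 | 84 | 84 | 75 | 61 | 78.2%
DeepSeek V3.1 | 87 | 83 | 81 | 71 | 67 | 77.8%
Claude 3 Haiku | 85 | 81 | 76 | 76 | 67 | 76.9%
GPT-4.1 Mini | 92 | 73 | 73 | 72 | 72 | 76.5%
Claude 3.5 Sonnet | 89 | 87 | 78 | 69 | 58 | 76.4%
DeepSeek-V2 Chat | 88 | 84 | 81 | 70 | 58 | 76.3%
Z.AI GLM 4.5 | 86 | 84 | 77 | 77 | 57 | 76.2%
Claude Sonnet 4 | 86 | 86 | 72 | 68 | 67 | 75.9%
Gemini 3.1 Pro (Preview) | 86 | 80 | 72 | 71 | 69 | 75.6%
DeepSeek V3 (2024-12-26) | 78 | 77 | 73 | 72 | 72 | 74.7%
GPT-4o Mini (temp=1) | 81 | 79 | 77 | 75 | 59 | 74.1%
Gemini 2.5 Flash | 81 | 75 | 73 | 72 | 64 | 72.8%
Llama 3.1 70B | 86 | 78 | 76 | 72 | 45 | 71.4%
Claude 3.7 Sonnet | 81 | 72 | 68 | 67 | 65 | 70.6%
GPT-4o Mini (temp=0) | 77 | 75 | 74 | 55 | 50 | 66.2%
GPT-4.1 Nano | 65 | 60 | 60 | 48 | 41 | 55.0%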
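
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
ByteDance Seed 1.6 | 100 | 100 | 100 | 95 | 90 | 97.1%
o4 Mini High | 100 | 100 | 97 | 97 | 92 | 97.0%
Gemini 3 Flash (Preview) | 100 | 97 | 97 | 93 | 90 | 95.3%
ByteDance Seed 1.6 Flash | 97 | 94 | 94 | 93 | 93 | 94.3%
MoonshotAI: Kimi K2.5 | 100 | 96 | 95 | 89 | 89 | 93.7%
Qwen 3.5 397B A17B | 100 | 93 | 92 | 92 | 91 | 93.5%
o4 Mini | 100 | 94 | 93 | 93 | 87 | 93.4%
Gemini 3 Pro (Preview) | 95 | 95 | 92 | 91 | 90 | 92.6%
Arcee AI: Trinity Large (Preview) | 97 | 96 | 95 | 87 | 85 | 92.1%
Mistral NeMO | 100 | 100 | 91 | 86 | 83 | 91.8%
GPT-5 Mini | 97 | 93 | 92 | 91 | 85 | 91.6%
Stealth: Aurora Alpha | 98 | 95 | 89 | 88 | 87 | 91.3%
GPT-5 | 95 | 95 | 91 | 91 | 83 | 91.2%
Mistral Medium 3.1 | 97 | 93 | 92 | 88 | 83 | 90.7%
Grok 4.1 Fast | 97 | 92 | 90 | 89 | 86 | 90.5%
GPT-5 Nano | 94 | 92 | 90 | 89 | 85 | 89.9%
GPT-4.1 | 100 | 89 | 89 | 86 | 85 | 89.9%
Z.AI GLM 4.7 Flash | 92 | 92 | 92 | 86 | 85 | 89.4%
Minimax M2.5 | 93 | 93 | 90 | 89 | 81 | 89.2%
Qwen 2.5 72B | 100 | 96 | 88 | 86 | 76 | 89.0%
Z.AI GLM 5 | 97 | 97 | 91 | 82 | 76 | 88.5%
Grok 4 Fast | 94 | 91 | 88 | 85 | 84 | 88.5%
Z.AI GLM 4.7 | 94 | 90 | 87 | 86 | 85 | 88.4%
Ministral 8B | 94 | 93 | 92 | 86 | 76 | 88.3%
Grok 4 | 94 | 90 | 88 | 86 | 82 | 88.2%
Llama 3.1 8B | 93 | 92 | 88 | 87 | 78 | 87.9%
Mistral Small Creative | 96 | 92 | 89 | 84 | 78 | 87.7%
Z.AI GLM 4.6 | 97 | 92 | 86 | 85 | 78 | 87.5%
Arcee AI: Trinity Mini | 96 | 94 | 91 | 87 | 68 | 87.3%
Mistral Large 3 | 95 | 93 | 89 | 81 | 78 | 87.1%
Mistral Large | 94 | 93 | 87 | 82 | 79 | 87.0%
GPT-4o, May 13th (temp=1) | 93 | 92 | 89 | 84 | 77 | 87.0%
Rocinante 12B | 91 | 91 | 87 | 86 | 77 | 86.7%
Qwen 3.5 Plus (2026-02-15) | 91 | 89 | 88 | 82 | 82 | 86.6%
Gemini 3.1 Pro (Preview) | 91 | 88 | 86 | 86 | 81 | 86.5%
Hermes 3 70B | 100 | 93 | 85 | 82 | 73 | 86.4%
Mistral Small 3.2 24B | 100 | 91 | 87 | 78 | 75 | 86.3%
Claude Sonnet 4.5 | 100 | 92 | 88 | 78 | 71 | 85.6%
DeepSeek V3.1 | 90 | 89 | 85 | 84 | 79 | 85.5%
Claude 3.7 Sonnet | 97 | 94 | 84 | 78 | 74 | 85.4%
Claude Sonnet 4 | 93 | 91 | 86 | 78 | 78 | 85.1%
DeepSeek V3 (2025-03-24) | 93 | 92 | 80 | 80 | 80 | 85.1%
Ministral 3 3B | 95 | 90 | 81 | 80 | 79 | 84.8%
Ministral 3 8B | 93 | 92 | 85 | 82 | 71 | 84.5%
Claude Opus 4.5 | 94 | 87 | 84 | 83 | 75 | 84.5%
Ministral 3B | 88 | 86 | 85 | 83 | 80 | 84.3%
Claude Opus 4 | 94 | 86 | 82 | 82 | 77 | 84.1%
Mistral Large 2 | 95 | 94 | 82 | 76 | 73 | 84.1%
GPT-5.2 | 86 | 84 | 84 | 84 | 82 | 83.9%
GPT-4o Mini (temp=1) | 93 | 88 | 84 | 80 | 75 | 83.8%
GPT-4o, Aug. 6th (temp=1) | 92 | 89 | 87 | 86 | 65 | 83.8%
WizardLM 2 8x22b | 90 | 88 | 83 | 81 | 76 | 83.7%
Ministral 3 14B | 94 | 90 | 86 | 83 | 65 | 83.6%
Gemma 3 12B | 96 | 90 | 87 | 77 | 66 | 83.1%
DeepSeek V3.2 | 88 | 88 | 83 | 80 | 76 | 83.0%
GPT-4o, May 13th (temp=0) | 91 | 89 | 87 | 73 | 71 | 82.4%
Claude Haiku 4.5 | 90 | 83 | 81 | 81 | 77 | 82.3%
Z.AI GLM 4.5 | 88 | 87 | 85 | 82 | 69 | 82.2%
Llama 3.1 70B | 93 | 93 | 84 | 76 | 64 | 82.1%
Claude Opus 4.6 | 86 | 84 | 84 | 82 | 75 | 82.1%
Gemini 2.5 Pro | 93 | 90 | 80 | 75 | 72 | 81.8%
Writer: Palmyra X5 | 87 | 87 | 85 | 84 | 66 | 81.7%
Cohere Command R+ (Aug. 2024) | 97 | 90 | 89 | 67 | 65 | 81.5%
Gemini 2.5 Flash Lite | 89 | 87 | 79 | 78 | 68 | 80.5%
Claude 3 Haiku | 91 | 85 | 83 | 72 | 69 | 79.9%
Gemini 2.5 Flash | 83 | 81 | 80 | 79 | 77 | 79.8%
DeepSeek V3 (2024-12-26) | 93 | 84 | 81 | 70 | 64 | 78.6%
Gemma 3 4B | 86 | 82 | 78 | 73 | 70 | 77.6%
Claude 3.5 Haiku | 100 | 79 | 78 | 71 | 60 | 77.5%
GPT-4o, Aug. 6th (temp=0) | 87 | 83 | 76 | 74 | 65 | 76.9%
GPT-4o Mini (temp=0) | 81 | 77 | 76 | 75 | 73 | 76.3%
Hermes 3 405B | 87 | 87 | 84 | 76 | 48 | 76.3%
GPT-5.1 | 81 | 78 | 78 | 73 | 68 | 75.6%
Claude 3.5 Sonnet | 90 | 83 | 77 | 71 | 57 | 75.5%
DeepSeek-V2 Chat | 88 | 87 | 86 | 59 | 56 | 75.5%
Gemma 3 27B | 78 | 77 | 77 | 76 | 67 | 74.9%
Llama 3.1 Nemotron 70B | 100 | 83 | 74 | 61 | 55 | 74.7%
GPT-4.1 Mini | 87 | 77 | 72 | 67 | 66 | 73.6%
GPT-4.1 Nano | 86 | 81 | 69 | 54 | 52 | 68.5%
Claude Sonnet 4.6 | 75 | 73 | 69 | 58 | 41 | 63.4%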
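
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
ByteDance Seed 1.6 | 100 | 98 | 97 | 96 | 95 | 97.3%
ByteDance Seed 1.6 Flash | 100 | 98 | 95 | 95 | 95 | 96.5%
Grok 4.1 Fast | 97 | 94 | 94 | 94 | 92 | 94.2%
GPT-5 Mini | 100 | 98 | 94 | 89 | 88 | 93.9%
Stealth: Aurora Alpha | 98 | 94 | 93 | 92 | 90 | 93.5%
Grok 4 Fast | 100 | 95 | 92 | 90 | 89 | 93.4%
o4 Mini High | 100 | 97 | 93 | 90 | 86 | 93.1%
Grok 4 | 97 | 94 | 92 | 89 | 85 | 91.5%
Llama 3.1 70B | 93 | 91 | 91 | 91 | 88 | 90.9%
GPT-4.1 | 100 | 97 | 90 | 84 | 81 | 90.4%
DeepSeek V3 (2025-03-24) | 100 | 95 | 89 | 88 | 76 | 89.8%
Mistral Large 3 | 95 | 91 | 89 | 87 | 83 | 89.0%
Ministral 3 3B | 94 | 94 | 89 | 84 | 84 | 89.0%
Z.AI GLM 4.6 | 93 | 89 | 88 | 87 | 83 | 88.1%
Cohere Command R+ (Aug. 2024) | 97 | 96 | 87 | 82 | 78 | 88.0%
o4 Mini | 96 | 90 | 87 | 86 | 80 | 88.0%
WizardLM 2 8x22b | 94 | 87 | 86 | 85 | 85 | 87.5%
Mistral Medium 3.1 | 98 | 86 | 85 | 84 | 84 | 87.4%
Ministral 8B | 92 | 91 | 87 | 84 | 82 | 87.2%
GPT-5 Nano | 92 | 92 | 85 | 85 | 80 | 86.8%
Gemini 3 Pro (Preview) | 90 | 87 | 86 | 85 | 84 | 86.2%
Z.AI GLM 5 | 90 | 89 | 86 | 86 | 80 | 86.1%
Claude Opus 4.6 | 93 | 92 | 84 | 81 | 81 | 86.0%
MoonshotAI: Kimi K2.5 | 97 | 94 | 88 | 79 | 71 | 85.8%
Rocinante 12B | 97 | 96 | 90 | 74 | 71 | 85.6%
Llama 3.1 Nemotron 70B | 91 | 91 | 88 | 82 | 75 | 85.4%
Gemini 3 Flash (Preview) | 91 | 86 | 84 | 84 | 82 | 85.3%
Z.AI GLM 4.7 Flash | 91 | 89 | 86 | 79 | 77 | 84.6%
GPT-5 | 89 | 88 | 83 | 81 | 81 | 84.6%
Qwen 3.5 397B A17B | 89 | 87 | 84 | 84 | 79 | 84.6%
GPT-5.1 | 87 | 85 | 84 | 82 | 81 | 83.9%
Mistral NeMO | 92 | 88 | 87 | 76 | 73 | 83.4%
Claude Opus 4.5 | 85 | 85 | 84 | 82 | 80 | 83.2%
Llama 3.1 8B | 92 | 88 | 83 | 77 | 76 | 83.2%
Mistral Large 2 | 93 | 89 | 85 | 82 | 66 | 83.2%
Mistral Large | 89 | 89 | 85 | 77 | 75 | 83.1%
GPT-4o, Aug. 6th (temp=0) | 91 | 85 | 84 | 84 | 71 | 82.9%
Ministral 3 8B | 91 | 89 | 82 | 79 | 73 | 82.6%
Gemini 2.5 Pro | 94 | 85 | 81 | 78 | 71 | 81.8%
Claude Sonnet 4.5 | 88 | 86 | 82 | 77 | 75 | 81.7%
Z.AI GLM 4.7 | 92 | 84 | 81 | 76 | 75 | 81.7%
GPT-5.2 | 90 | 81 | 80 | 79 | 78 | 81.6%
DeepSeek V3 (2024-12-26) | 88 | 86 | 82 | 81 | 69 | 81.2%
Claude Sonnet 4 | 87 | 87 | 85 | 74 | 73 | 81.0%
DeepSeek V3.2 | 95 | 82 | 80 | 74 | 74 | 80.9%
Qwen 3.5 Plus (2026-02-15) | 84 | 84 | 80 | 79 | 78 | 80.9%
Claude Opus 4 | 96 | 89 | 84 | 74 | 61 | 80.8%
Qwen 2.5 72B | 85 | 84 | 82 | 81 | 70 | 80.4%
GPT-4o Mini (temp=0) | 91 | 87 | 86 | 67 | 66 | 79.2%
Gemini 2.5 Flash | 92 | 79 | 78 | 76 | 70 | 79.1%
Hermes 3 405B | 95 | 85 | 81 | 74 | 58 | 78.5%
Minimax M2.5 | 93 | 89 | 72 | 70 | 67 | 78.3%
Claude Haiku 4.5 | 82 | 81 | 80 | 78 | 71 | 78.3%
Ministral 3 14B | 92 | 78 | 77 | 73 | 72 | 78.3%
GPT-4.1 Mini | 86 | 86 | 80 | 77 | 62 | 78.1%
Mistral Small Creative | 86 | 79 | 76 | 76 | 74 | 78.1%
GPT-4o, May 13th (temp=0) | 90 | 86 | 84 | 73 | 58 | 78.0%
Mistral Small 3.2 24B | 91 | 83 | 74 | 70 | 70 | 77.8%
GPT-4o, May 13th (temp=1) | 87 | 86 | 81 | 80 | 53 | 77.2%
Z.AI GLM 4.5 | 85 | 84 | 83 | 71 | 62 | 76.9%
DeepSeek V3.1 | 87 | 80 | 73 | 72 | 70 | 76.2%
Ministral 3B | 94 | 75 | 72 | 71 | 66 | 75.6%
Writer: Palmyra X5 | 80 | 80 | 74 | 74 | 69 | 75.4%
GPT-4o, Aug. 6th (temp=1) | 87 | 83 | 72 | 71 | 62 | 75.3%
Claude 3.5 Sonnet | 100 | 85 | 80 | 69 | 42 | 75.1%
Hermes 3 70B | 82 | 80 | 74 | 71 | 67 | 74.6%
Gemma 3 27B | 88 | 78 | 75 | 71 | 59 | 74.3%
Claude 3.7 Sonnet | 78 | 76 | 75 | 74 | 69 | 74.3%
Arcee AI: Trinity Mini | 79 | 72 | 72 | 71 | 67 | 72.1%
Gemini 2.5 Flash Lite | 81 | 77 | 76 | 69 | 58 | 72.1%
Arcee AI: Trinity Large (Preview) | 79 | 74 | 72 | 67 | 65 | 71.5%
GPT-4o Mini (temp=1) | 82 | 76 | 71 | 65 | 59 | 70.7%
DeepSeek-V2 Chat | 80 | 75 | 70 | 66 | 61 | 70.3%
Claude Sonnet 4.6 | 85 | 82 | 74 | 55 | 54 | 69.8%
Gemma 3 12B | 77 | 75 | 74 | 67 | 54 | 69.3%
Claude 3.5 Haiku | 84 | 79 | 71 | 60 | 48 | 68.5%
Gemini 3.1 Pro (Preview) | 86 | 82 | 63 | 62 | 48 | 68.2%
Gemma 3 4B | 78 | 78 | 69 | 59 | 51 | 67.2%
Claude 3 Haiku | 78 | 73 | 71 | 52 | 51 | 64.8%
GPT-4.1 Nano | 70 | 60 | 54 | 46 | 33 | 52.6%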
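
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
ByteDance Seed 1.6 Flash | 100 | 100 | 97 | 96 | 95 | 97.6%
GPT-4.1 | 100 | 97 | 96 | 93 | 93 | 95.9%
ByteDance Seed 1.6 | 98 | 98 | 97 | 96 | 91 | 95.9%
Grok 4 | 97 | 97 | 97 | 94 | 88 | 94.8%
Stealth: Aurora Alpha | 100 | 95 | 94 | 93 | 91 | 94.5%
Grok 4.1 Fast | 97 | 97 | 93 | 93 | 90 | 94.2%
o4 Mini High | 100 | 97 | 96 | 91 | 86 | 94.0%
GPT-5 | 97 | 93 | 93 | 93 | 89 | 93.0%
Qwen 2.5 72B | 100 | 96 | 95 | 95 | 79 | 92.8%
Gemini 3 Flash (Preview) | 100 | 94 | 91 | 90 | 89 | 92.5%
o4 Mini | 100 | 97 | 96 | 90 | 78 | 92.4%
Z.AI GLM 4.7 Flash | 97 | 93 | 93 | 91 | 88 | 92.3%
WizardLM 2 8x22b | 100 | 95 | 94 | 87 | 85 | 92.1%
Llama 3.1 8B | 100 | 97 | 93 | 90 | 80 | 91.9%
Rocinante 12B | 94 | 93 | 91 | 91 | 90 | 91.8%
Grok 4 Fast | 94 | 92 | 92 | 92 | 88 | 91.6%
Mistral Large 3 | 100 | 96 | 91 | 87 | 82 | 91.2%
GPT-5 Mini | 94 | 93 | 92 | 89 | 88 | 91.1%
Cohere Command R+ (Aug. 2024) | 100 | 96 | 90 | 89 | 80 | 90.9%
GPT-4o, May 13th (temp=0) | 96 | 94 | 90 | 87 | 86 | 90.8%
MoonshotAI: Kimi K2.5 | 97 | 95 | 92 | 91 | 79 | 90.7%
GPT-5 Nano | 92 | 92 | 90 | 90 | 89 | 90.4%
Arcee AI: Trinity Mini | 100 | 94 | 93 | 89 | 75 | 90.2%
Ministral 3 8B | 100 | 94 | 87 | 85 | 83 | 89.9%
Claude Opus 4.6 | 93 | 93 | 91 | 87 | 86 | 89.9%
GPT-4o, Aug. 6th (temp=0) | 95 | 95 | 91 | 83 | 81 | 89.2%
Mistral Medium 3.1 | 97 | 92 | 92 | 85 | 81 | 89.1%
Minimax M2.5 | 94 | 94 | 92 | 86 | 79 | 88.9%
Llama 3.1 70B | 100 | 92 | 86 | 86 | 79 | 88.6%
Mistral NeMO | 100 | 92 | 89 | 82 | 79 | 88.4%
Gemini 2.5 Flash | 97 | 96 | 88 | 85 | 76 | 88.4%
Llama 3.1 Nemotron 70B | 100 | 100 | 86 | 80 | 76 | 88.4%
Mistral Large 2 | 100 | 96 | 91 | 78 | 77 | 88.3%
Arcee AI: Trinity Large (Preview) | 96 | 96 | 87 | 87 | 76 | 88.2%
Z.AI GLM 5 | 93 | 93 | 92 | 85 | 78 | 88.0%
Mistral Small 3.2 24B | 92 | 90 | 87 | 86 | 83 | 87.5%
GPT-4o Mini (temp=0) | 100 | 90 | 87 | 81 | 78 | 87.3%
Qwen 3.5 Plus (2026-02-15) | 94 | 87 | 86 | 85 | 84 | 87.3%
Claude Sonnet 4.5 | 91 | 89 | 87 | 85 | 84 | 87.1%
Ministral 8B | 100 | 95 | 84 | 83 | 72 | 86.9%
Claude Haiku 4.5 | 96 | 92 | 87 | 83 | 76 | 86.8%
Z.AI GLM 4.6 | 92 | 89 | 87 | 87 | 80 | 86.8%
Mistral Large | 91 | 90 | 87 | 85 | 79 | 86.6%
Qwen 3.5 397B A17B | 92 | 90 | 87 | 86 | 78 | 86.5%
Ministral 3 3B | 89 | 88 | 88 | 88 | 80 | 86.5%
Gemini 3 Pro (Preview) | 91 | 89 | 87 | 86 | 77 | 86.0%
Hermes 3 405B | 100 | 93 | 82 | 78 | 77 | 86.0%
Mistral Small Creative | 93 | 93 | 91 | 87 | 66 | 86.0%
GPT-5.2 | 91 | 87 | 86 | 84 | 80 | 85.8%
Hermes 3 70B | 100 | 93 | 88 | 75 | 71 | 85.5%
Claude Sonnet 4 | 92 | 90 | 88 | 81 | 76 | 85.3%
DeepSeek V3.1 | 96 | 91 | 89 | 85 | 65 | 85.0%
Gemma 3 27B | 92 | 87 | 83 | 81 | 81 | 84.8%
Ministral 3 14B | 96 | 87 | 84 | 79 | 78 | 84.6%
Gemini 2.5 Pro | 94 | 87 | 84 | 80 | 77 | 84.5%
Ministral 3B | 100 | 88 | 88 | 84 | 62 | 84.4%
GPT-4o, Aug. 6th (temp=1) | 95 | 91 | 86 | 78 | 69 | 83.9%
DeepSeek V3.2 | 90 | 88 | 87 | 79 | 76 | 83.7%
DeepSeek V3 (2025-03-24) | 91 | 91 | 83 | 77 | 77 | 83.7%
Claude 3.5 Haiku | 100 | 85 | 84 | 77 | 72 | 83.6%
Gemini 2.5 Flash Lite | 97 | 95 | 81 | 76 | 69 | 83.4%
GPT-5.1 | 87 | 83 | 82 | 81 | 81 | 83.0%
GPT-4o Mini (temp=1) | 88 | 86 | 85 | 82 | 74 | 82.9%
Claude 3.5 Sonnet | 100 | 94 | 83 | 74 | 62 | 82.6%
Claude Opus 4.5 | 86 | 86 | 81 | 80 | 80 | 82.6%
Claude 3.7 Sonnet | 94 | 90 | 83 | 79 | 65 | 82.3%
Z.AI GLM 4.7 | 85 | 85 | 84 | 77 | 76 | 81.5%
DeepSeek V3 (2024-12-26) | 91 | 82 | 80 | 78 | 75 | 81.0%
Claude Sonnet 4.6 | 91 | 90 | 77 | 74 | 68 | 80.0%
DeepSeek-V2 Chat | 92 | 87 | 76 | 75 | 67 | 79.3%
Gemma 3 12B | 96 | 82 | 76 | 74 | 62 | 77.9%
Claude Opus 4 | 88 | 83 | 75 | 70 | 68 | 76.9%
Writer: Palmyra X5 | 92 | 84 | 76 | 68 | 64 | 76.7%
GPT-4o, May 13th (temp=1) | 83 | 76 | 74 | 71 | 70 | 75.0%
Z.AI GLM 4.5 | 85 | 83 | 73 | 68 | 60 | 73.8%
Gemini 3.1 Pro (Preview) | 77 | 71 | 68 | 64 | 63 | 68.5%
GPT-4.1 Mini | 91 | 67 | 62 | 62 | 57 | 67.6%
GPT-4.1 Nano | 89 | 80 | 64 | 62 | 44 | 67.6%
Claude 3 Haiku | 76 | 65 | 65 | 64 | 59 | 65.9%
Gemma 3 4B | 77 | 73 | 50 | 49 | 49 | 59.7%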
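
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
ByteDance Seed 1.6 | 100 | 100 | 97 | 95 | 93 | 97.2%
Grok 4.1 Fast | 100 | 97 | 94 | 94 | 94 | 95.7%
Rocinante 12B | 100 | 100 | 96 | 95 | 81 | 94.4%
ByteDance Seed 1.6 Flash | 100 | 96 | 94 | 94 | 87 | 94.2%
Mistral Medium 3.1 | 95 | 94 | 94 | 94 | 90 | 93.5%
Llama 3.1 70B | 100 | 93 | 93 | 93 | 87 | 93.3%
o4 Mini | 100 | 96 | 91 | 90 | 88 | 92.9%
GPT-5 Mini | 96 | 95 | 94 | 90 | 87 | 92.3%
Grok 4 | 100 | 92 | 92 | 90 | 86 | 91.9%
GPT-5 Nano | 91 | 91 | 91 | 88 | 87 | 89.8%
Grok 4 Fast | 95 | 93 | 92 | 87 | 82 | 89.7%
Mistral Large 2 | 100 | 95 | 91 | 83 | 80 | 89.7%
Mistral Large 3 | 97 | 95 | 95 | 81 | 76 | 88.9%
Llama 3.1 Nemotron 70B | 94 | 94 | 89 | 83 | 83 | 88.6%
MoonshotAI: Kimi K2.5 | 94 | 93 | 92 | 83 | 80 | 88.5%
Z.AI GLM 5 | 94 | 94 | 90 | 85 | 79 | 88.4%
Ministral 8B | 97 | 97 | 88 | 80 | 79 | 88.2%
Mistral Large | 96 | 87 | 87 | 86 | 85 | 88.0%
Hermes 3 405B | 100 | 95 | 88 | 79 | 77 | 87.8%
GPT-4.1 | 93 | 89 | 87 | 86 | 83 | 87.6%
Qwen 3.5 397B A17B | 93 | 92 | 90 | 85 | 76 | 87.4%
DeepSeek V3 (2025-03-24) | 100 | 100 | 88 | 85 | 64 | 87.3%
Claude Opus 4.6 | 92 | 90 | 88 | 87 | 79 | 87.3%
Gemini 3 Flash (Preview) | 97 | 94 | 91 | 84 | 69 | 87.0%
o4 Mini High | 97 | 89 | 86 | 85 | 77 | 86.8%
Claude Opus 4.5 | 94 | 91 | 89 | 80 | 78 | 86.5%
Stealth: Aurora Alpha | 92 | 91 | 89 | 81 | 80 | 86.5%
GPT-5 | 93 | 87 | 85 | 84 | 83 | 86.4%
Llama 3.1 8B | 93 | 86 | 86 | 84 | 81 | 85.9%
Qwen 2.5 72B | 95 | 92 | 85 | 78 | 78 | 85.9%
Ministral 3 14B | 91 | 88 | 85 | 82 | 80 | 85.4%
Claude 3.5 Sonnet | 95 | 89 | 84 | 80 | 75 | 84.7%
Ministral 3 8B | 94 | 90 | 84 | 78 | 76 | 84.3%
DeepSeek V3.1 | 88 | 87 | 85 | 83 | 77 | 83.8%
Z.AI GLM 4.7 Flash | 87 | 86 | 84 | 83 | 79 | 83.7%
Mistral Small Creative | 93 | 91 | 86 | 84 | 62 | 83.3%
GPT-5.2 | 86 | 85 | 84 | 83 | 79 | 83.3%
WizardLM 2 8x22b | 91 | 91 | 82 | 76 | 75 | 82.8%
Gemini 3 Pro (Preview) | 90 | 86 | 85 | 79 | 74 | 82.6%
Cohere Command R+ (Aug. 2024) | 91 | 83 | 80 | 79 | 79 | 82.3%
Claude Sonnet 4 | 86 | 86 | 82 | 80 | 78 | 82.3%
Z.AI GLM 4.7 | 90 | 84 | 82 | 81 | 72 | 81.8%
Gemini 2.5 Pro | 85 | 83 | 83 | 78 | 77 | 81.2%
Claude Opus 4 | 89 | 86 | 85 | 76 | 67 | 80.5%
Ministral 3B | 88 | 85 | 82 | 75 | 71 | 80.5%
Mistral NeMO | 92 | 86 | 85 | 77 | 61 | 80.1%
Z.AI GLM 4.6 | 87 | 84 | 79 | 78 | 73 | 80.1%
Arcee AI: Trinity Mini | 100 | 83 | 76 | 72 | 68 | 79.8%
Minimax M2.5 | 84 | 80 | 80 | 78 | 76 | 79.8%
Ministral 3 3B | 94 | 88 | 73 | 71 | 71 | 79.5%
GPT-4o, May 13th (temp=1) | 85 | 79 | 78 | 78 | 77 | 79.4%
GPT-4o Mini (temp=0) | 81 | 81 | 81 | 77 | 76 | 79.2%
DeepSeek V3.2 | 87 | 83 | 76 | 74 | 73 | 78.5%
GPT-4o, May 13th (temp=0) | 86 | 82 | 80 | 76 | 68 | 78.4%
GPT-4o Mini (temp=1) | 87 | 87 | 76 | 71 | 70 | 78.2%
GPT-4o, Aug. 6th (temp=1) | 87 | 83 | 77 | 73 | 70 | 78.1%
Claude Haiku 4.5 | 87 | 84 | 82 | 78 | 59 | 78.0%
Qwen 3.5 Plus (2026-02-15) | 90 | 83 | 75 | 72 | 69 | 77.9%
GPT-4o, Aug. 6th (temp=0) | 86 | 82 | 75 | 72 | 68 | 76.5%
Gemma 3 27B | 86 | 82 | 77 | 67 | 67 | 75.8%
Claude 3.5 Haiku | 100 | 100 | 77 | 57 | 45 | 75.6%
DeepSeek-V2 Chat | 84 | 81 | 79 | 67 | 63 | 74.9%
Gemma 3 12B | 88 | 88 | 70 | 64 | 60 | 74.1%
DeepSeek V3 (2024-12-26) | 89 | 83 | 67 | 66 | 64 | 73.7%
GPT-5.1 | 83 | 76 | 71 | 71 | 67 | 73.6%
Hermes 3 70B | 89 | 78 | 69 | 66 | 62 | 72.7%
Writer: Palmyra X5 | 80 | 76 | 75 | 67 | 65 | 72.7%
Gemini 2.5 Flash Lite | 85 | 75 | 74 | 70 | 54 | 71.8%
Gemini 2.5 Flash | 84 | 74 | 71 | 70 | 59 | 71.6%
GPT-4.1 Mini | 87 | 86 | 65 | 64 | 55 | 71.5%
Arcee AI: Trinity Large (Preview) | 82 | 77 | 71 | 69 | 54 | 70.3%
Z.AI GLM 4.5 | 76 | 75 | 74 | 64 | 56 | 68.9%
Claude Sonnet 4.5 | 86 | 72 | 71 | 65 | 51 | 68.7%
Gemini 3.1 Pro (Preview) | 75 | 71 | 69 | 66 | 61 | 68.5%
Claude Sonnet 4.6 | 84 | 74 | 67 | 64 | 49 | 67.6%
Claude 3.7 Sonnet | 82 | 69 | 68 | 60 | 52 | 66.1%
Gemma 3 4B | 80 | 78 | 70 | 56 | 37 | 63.9%
GPT-4.1 Nano | 74 | 63 | 62 | 34 | 34 | 53.4%
Mistral Small 3.2 24B | 68 | 67 | 66 | 35 | 28 | 53.0%
Claude 3 Haiku | 63 | 60 | 54 | 40 | 37 | 50.8%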
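Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
o4 Mini | 100 | 100 | 100 | 97 | 96 | 98.6%
Grok 4.1 Fast | 100 | 100 | 97 | 97 | 97 | 98.2%
Grok 4 Fast | 100 | 100 | 98 | 95 | 94 | 97.3%
GPT-5 Mini | 100 | 100 | 98 | 95 | 94 | 97.3%
ByteDance Seed 1.6 Flash | 100 | 98 | 97 | 96 | 95 | 97.2%
GPT-4.1 | 100 | 100 | 97 | 97 | 90 | 96.6%
Gemini 2.5 Pro | 100 | 100 | 96 | 93 | 93 | 96.6%
Grok 4 | 100 | 100 | 95 | 94 | 93 | 96.4%
WizardLM 2 8x22b | 100 | 100 | 100 | 98 | 83 | 96.3%
o4 Mini High | 100 | 100 | 94 | 94 | 93 | 96.2%
MoonshotAI: Kimi K2.5 | 100 | 100 | 94 | 94 | 91 | 95.7%
ByteDance Seed 1.6 | 100 | 100 | 94 | 93 | 91 | 95.7%
Ministral 3B | 100 | 100 | 98 | 93 | 87 | 95.7%
Mistral Large 3 | 100 | 96 | 95 | 94 | 93 | 95.5%
Gemini 3 Pro (Preview) | 97 | 97 | 97 | 94 | 92 | 95.5%
Arcee AI: Trinity Mini | 100 | 96 | 94 | 94 | 93 | 95.4%
GPT-5 Nano | 98 | 97 | 94 | 94 | 93 | 95.2%
Z.AI GLM 4.7 Flash | 97 | 97 | 97 | 93 | 92 | 95.2%
GPT-5 | 100 | 98 | 96 | 91 | 90 | 95.0%
GPT-5.2 | 96 | 95 | 95 | 94 | 94 | 94.8%
DeepSeek V3 (2024-12-26) | 100 | 100 | 94 | 93 | 85 | 94.3%
Gemini 3 Flash (Preview) | 100 | 96 | 94 | 91 | 89 | 94.1%
Claude Opus 4.6 | 98 | 97 | 95 | 92 | 88 | 93.9%
Qwen 2.5 72B | 100 | 97 | 95 | 90 | 88 | 93.8%
DeepSeek V3.2 | 100 | 100 | 94 | 91 | 84 | 93.8%
Qwen 3.5 397B A17B | 100 | 93 | 92 | 91 | 91 | 93.6%
Rocinante 12B | 100 | 100 | 92 | 90 | 84 | 92.9%
Minimax M2.5 | 96 | 95 | 93 | 91 | 89 | 92.9%
GPT-4o, Aug. 6th (temp=0) | 95 | 95 | 95 | 93 | 86 | 92.7%
Stealth: Aurora Alpha | 95 | 94 | 91 | 91 | 90 | 92.0%
GPT-4o, May 13th (temp=1) | 100 | 96 | 93 | 92 | 79 | 91.9%
Gemma 3 12B | 96 | 96 | 92 | 92 | 83 | 91.7%
Z.AI GLM 4.6 | 96 | 93 | 92 | 90 | 87 | 91.6%
Z.AI GLM 5 | 97 | 97 | 93 | 87 | 84 | 91.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 93 | 90 | 89 | 86 | 91.5%
Llama 3.1 70B | 100 | 94 | 93 | 93 | 78 | 91.4%
DeepSeek V3.1 | 96 | 95 | 94 | 94 | 79 | 91.4%
Ministral 3 14B | 100 | 95 | 88 | 87 | 86 | 91.3%
Writer: Palmyra X5 | 100 | 92 | 91 | 86 | 86 | 91.1%
Claude 3 Haiku | 100 | 95 | 92 | 90 | 77 | 90.6%
Claude Opus 4.5 | 94 | 94 | 90 | 89 | 86 | 90.6%
Claude Sonnet 4.5 | 96 | 96 | 91 | 88 | 82 | 90.4%
Mistral Small Creative | 97 | 97 | 95 | 85 | 79 | 90.4%
Llama 3.1 8B | 100 | 91 | 91 | 87 | 82 | 90.1%
Cohere Command R+ (Aug. 2024) | 100 | 92 | 89 | 86 | 83 | 90.1%
Ministral 3 8B | 96 | 90 | 89 | 89 | 87 | 90.0%
Claude 3.5 Sonnet | 94 | 90 | 89 | 89 | 89 | 90.0%
GPT-5.1 | 94 | 94 | 91 | 87 | 84 | 89.9%
Gemini 2.5 Flash Lite | 96 | 96 | 91 | 91 | 75 | 89.7%
Mistral Medium 3.1 | 97 | 93 | 90 | 87 | 81 | 89.5%
GPT-4o Mini (temp=1) | 96 | 95 | 92 | 92 | 71 | 89.3%
Claude Opus 4 | 96 | 93 | 88 | 85 | 84 | 89.3%
Gemini 2.5 Flash | 95 | 92 | 91 | 85 | 83 | 89.2%
Ministral 3 3B | 100 | 100 | 90 | 84 | 69 | 88.6%
Gemma 3 27B | 96 | 93 | 88 | 85 | 82 | 88.6%
Mistral Large | 100 | 96 | 91 | 80 | 75 | 88.5%
GPT-4o Mini (temp=0) | 96 | 89 | 89 | 82 | 80 | 87.1%
Z.AI GLM 4.7 | 90 | 90 | 87 | 85 | 83 | 86.8%
Claude Sonnet 4 | 96 | 95 | 88 | 87 | 68 | 86.7%
Hermes 3 70B | 100 | 91 | 86 | 78 | 77 | 86.5%
DeepSeek V3 (2025-03-24) | 100 | 100 | 82 | 80 | 70 | 86.3%
Ministral 8B | 95 | 88 | 84 | 82 | 82 | 86.2%
Gemini 3.1 Pro (Preview) | 91 | 89 | 86 | 86 | 77 | 86.0%
Mistral Large 2 | 90 | 87 | 87 | 83 | 82 | 85.9%
GPT-4o, Aug. 6th (temp=1) | 95 | 90 | 81 | 81 | 81 | 85.8%
GPT-4o, May 13th (temp=0) | 94 | 89 | 85 | 81 | 80 | 85.8%
Mistral NeMO | 97 | 96 | 81 | 78 | 76 | 85.8%
Claude 3.5 Haiku | 100 | 90 | 89 | 89 | 60 | 85.7%
GPT-4.1 Mini | 95 | 91 | 84 | 79 | 77 | 85.3%
Claude 3.7 Sonnet | 91 | 90 | 88 | 83 | 75 | 85.2%
Claude Haiku 4.5 | 95 | 93 | 89 | 85 | 64 | 85.2%
Z.AI GLM 4.5 | 91 | 87 | 85 | 81 | 80 | 84.7%
DeepSeek-V2 Chat | 100 | 91 | 91 | 80 | 58 | 84.2%
Llama 3.1 Nemotron 70B | 92 | 87 | 86 | 80 | 72 | 83.6%
Mistral Small 3.2 24B | 100 | 97 | 90 | 79 | 50 | 83.2%
GPT-4.1 Nano | 90 | 90 | 85 | 77 | 70 | 82.6%
Hermes 3 405B | 95 | 93 | 84 | 76 | 64 | 82.3%
Arcee AI: Trinity Large (Preview) | 90 | 89 | 79 | 77 | 74 | 81.8%
Gemma 3 4B | 91 | 88 | 82 | 78 | 69 | 81.6%
Claude Sonnet 4.6 | 95 | 86 | 77 | 74 | 73 | 81.2%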

Novelcrafter Default Prompt

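Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 98 | 97 | 99.0%
GPT-5.1 | 100 | 100 | 100 | 98 | 96 | 98.9%
GPT-5.2 | 100 | 100 | 98 | 98 | 96 | 98.3%
o4 Mini | 100 | 100 | 100 | 96 | 92 | 97.7%
ByteDance Seed 1.6 Flash | 100 | 100 | 97 | 96 | 96 | 97.6%
GPT-5 Mini | 100 | 98 | 98 | 96 | 94 | 97.2%
o4 Mini High | 100 | 97 | 96 | 96 | 95 | 96.9%
Mistral Large | 100 | 100 | 97 | 95 | 91 | 96.6%
Mistral Large 2 | 100 | 96 | 96 | 95 | 95 | 96.5%
Gemini 3.1 Pro (Preview) | 100 | 100 | 97 | 97 | 88 | 96.5%
Stealth: Aurora Alpha | 100 | 98 | 95 | 95 | 93 | 96.1%
MoonshotAI: Kimi K2.5 | 97 | 97 | 96 | 95 | 93 | 95.3%
Grok 4 Fast | 100 | 97 | 97 | 93 | 89 | 95.2%
Ministral 3 3B | 100 | 100 | 100 | 94 | 82 | 95.0%
Mistral Large 3 | 100 | 100 | 93 | 92 | 89 | 94.8%
Mistral Medium 3.1 | 100 | 100 | 92 | 92 | 89 | 94.7%
Qwen 3.5 397B A17B | 100 | 97 | 96 | 96 | 83 | 94.3%
GPT-4.1 | 100 | 96 | 96 | 93 | 85 | 94.1%
Grok 4 | 98 | 95 | 93 | 93 | 90 | 93.6%
ByteDance Seed 1.6 | 100 | 97 | 96 | 95 | 80 | 93.4%
Claude Haiku 4.5 | 100 | 95 | 95 | 91 | 82 | 92.7%
Z.AI GLM 4.7 | 100 | 92 | 91 | 91 | 89 | 92.7%
Ministral 3B | 100 | 100 | 92 | 85 | 84 | 92.3%
Writer: Palmyra X5 | 100 | 92 | 91 | 90 | 88 | 92.1%
Mistral Small Creative | 97 | 94 | 94 | 89 | 82 | 91.2%
Gemini 3 Flash (Preview) | 96 | 93 | 92 | 87 | 86 | 90.9%
Gemini 3 Pro (Preview) | 94 | 93 | 90 | 89 | 87 | 90.5%
Qwen 3.5 Plus (2026-02-15) | 94 | 94 | 88 | 87 | 87 | 89.9%
Rocinante 12B | 100 | 92 | 91 | 87 | 79 | 89.7%
DeepSeek V3 (2025-03-24) | 100 | 94 | 90 | 82 | 81 | 89.6%
Llama 3.1 8B | 98 | 92 | 88 | 86 | 83 | 89.5%
Minimax M2.5 | 92 | 91 | 90 | 89 | 84 | 89.2%
Z.AI GLM 5 | 100 | 97 | 92 | 83 | 74 | 89.1%
Claude 3.5 Haiku | 100 | 100 | 100 | 73 | 72 | 88.9%
GPT-5 Nano | 91 | 89 | 88 | 88 | 86 | 88.6%
Ministral 8B | 100 | 88 | 87 | 85 | 83 | 88.5%
Ministral 3 14B | 92 | 89 | 89 | 87 | 84 | 88.4%
Gemma 3 27B | 95 | 95 | 91 | 86 | 74 | 88.3%
Claude Sonnet 4.5 | 92 | 90 | 89 | 85 | 85 | 88.2%
Claude Opus 4.5 | 93 | 89 | 86 | 86 | 85 | 88.0%
Arcee AI: Trinity Mini | 94 | 91 | 87 | 86 | 82 | 88.0%
Z.AI GLM 4.6 | 100 | 89 | 84 | 84 | 83 | 87.9%
Claude Opus 4.6 | 97 | 95 | 91 | 80 | 74 | 87.1%
Claude Opus 4 | 93 | 91 | 88 | 82 | 81 | 87.0%
Cohere Command R+ (Aug. 2024) | 94 | 92 | 87 | 85 | 76 | 87.0%
Qwen 2.5 72B | 89 | 88 | 88 | 86 | 82 | 86.8%
Gemini 2.5 Pro | 100 | 91 | 85 | 79 | 77 | 86.2%
Claude Sonnet 4 | 95 | 90 | 85 | 82 | 79 | 86.2%
WizardLM 2 8x22b | 93 | 89 | 83 | 83 | 80 | 85.7%
Llama 3.1 70B | 100 | 86 | 85 | 78 | 76 | 85.0%
GPT-4o, Aug. 6th (temp=0) | 99 | 87 | 86 | 84 | 68 | 84.8%
Z.AI GLM 4.7 Flash | 96 | 89 | 85 | 77 | 72 | 83.9%
GPT-4o Mini (temp=0) | 100 | 84 | 82 | 76 | 74 | 83.3%
Claude 3.5 Sonnet | 100 | 86 | 85 | 73 | 71 | 82.8%
Hermes 3 405B | 94 | 88 | 86 | 79 | 66 | 82.7%
DeepSeek V3.2 | 90 | 90 | 79 | 78 | 76 | 82.6%
GPT-4o, May 13th (temp=0) | 90 | 89 | 86 | 78 | 70 | 82.5%
Gemma 3 4B | 91 | 83 | 79 | 78 | 76 | 81.3%
Ministral 3 8B | 93 | 85 | 80 | 77 | 69 | 80.5%
DeepSeek V3 (2024-12-26) | 89 | 88 | 84 | 78 | 65 | 80.5%
GPT-4o, Aug. 6th (temp=1) | 89 | 89 | 88 | 83 | 52 | 80.2%
Z.AI GLM 4.5 | 91 | 85 | 76 | 74 | 68 | 78.7%
Mistral NeMO | 87 | 86 | 74 | 73 | 72 | 78.5%
DeepSeek V3.1 | 93 | 87 | 77 | 72 | 60 | 77.9%
Claude Sonnet 4.6 | 90 | 80 | 76 | 75 | 68 | 77.7%
Gemini 2.5 Flash Lite | 84 | 76 | 76 | 74 | 73 | 76.7%
GPT-4.1 Mini | 82 | 79 | 75 | 74 | 70 | 76.1%
DeepSeek-V2 Chat | 84 | 80 | 78 | 72 | 65 | 75.6%
Claude 3.7 Sonnet | 82 | 81 | 81 | 73 | 57 | 75.0%
GPT-4o Mini (temp=1) | 88 | 81 | 73 | 69 | 64 | 74.9%
Arcee AI: Trinity Large (Preview) | 80 | 76 | 76 | 74 | 67 | 74.6%
Claude 3 Haiku | 79 | 77 | 75 | 75 | 67 | 74.5%
Gemma 3 12B | 88 | 86 | 70 | 70 | 56 | 74.0%
Llama 3.1 Nemotron 70B | 84 | 77 | 75 | 69 | 60 | 73.0%
Gemini 2.5 Flash | 90 | 79 | 70 | 66 | 60 | 73.0%
GPT-4o, May 13th (temp=1) | 92 | 81 | 70 | 60 | 58 | 71.9%
Hermes 3 70B | 84 | 83 | 55 | 54 | 51 | 65.3%
GPT-4.1 Nano | 70 | 70 | 66 | 53 | 41 | 60.0%
Mistral Small 3.2 24B | 75 | 68 | 57 | 40 | 14 | 50.8%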
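Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 95 | 99.0%
GPT-5 | 100 | 100 | 100 | 100 | 94 | 98.9%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 93 | 98.6%
o4 Mini | 100 | 100 | 100 | 97 | 96 | 98.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 92 | 98.5%
GPT-5.2 | 100 | 100 | 100 | 98 | 95 | 98.5%
o4 Mini High | 100 | 100 | 100 | 96 | 93 | 97.9%
Gemini 3 Flash (Preview) | 100 | 98 | 97 | 97 | 91 | 96.6%
ByteDance Seed 1.6 | 100 | 97 | 97 | 96 | 92 | 96.5%
Grok 4 Fast | 100 | 100 | 97 | 95 | 88 | 96.1%
GPT-5.1 | 100 | 100 | 96 | 96 | 88 | 95.9%
Mistral Medium 3.1 | 100 | 100 | 95 | 94 | 89 | 95.8%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 75 | 94.9%
Gemini 3.1 Pro (Preview) | 100 | 96 | 96 | 92 | 89 | 94.9%
Mistral Large 3 | 100 | 100 | 94 | 90 | 89 | 94.5%
Z.AI GLM 5 | 100 | 96 | 93 | 91 | 89 | 94.0%
Grok 4 | 97 | 95 | 95 | 92 | 91 | 93.9%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 94 | 90 | 89 | 93.9%
Arcee AI: Trinity Mini | 100 | 94 | 94 | 91 | 90 | 93.9%
Z.AI GLM 4.7 Flash | 100 | 96 | 93 | 92 | 87 | 93.8%
Stealth: Aurora Alpha | 97 | 97 | 94 | 93 | 88 | 93.6%
Llama 3.1 8B | 100 | 99 | 93 | 90 | 85 | 93.3%
GPT-4.1 | 100 | 97 | 92 | 88 | 87 | 92.9%
Z.AI GLM 4.7 | 96 | 95 | 93 | 91 | 85 | 92.2%
Mistral Large | 100 | 100 | 100 | 86 | 73 | 91.8%
Mistral Large 2 | 100 | 100 | 92 | 92 | 74 | 91.6%
GPT-5 Mini | 96 | 95 | 93 | 92 | 81 | 91.4%
Claude Sonnet 4.5 | 100 | 96 | 96 | 93 | 72 | 91.3%
Ministral 3 14B | 94 | 94 | 93 | 91 | 85 | 91.2%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 95 | 90 | 71 | 91.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 89 | 83 | 82 | 91.0%
Gemini 3 Pro (Preview) | 94 | 92 | 91 | 89 | 88 | 90.8%
GPT-5 Nano | 95 | 92 | 90 | 89 | 86 | 90.4%
Qwen 2.5 72B | 95 | 95 | 90 | 86 | 83 | 89.8%
Z.AI GLM 4.5 | 95 | 93 | 92 | 89 | 78 | 89.2%
WizardLM 2 8x22b | 97 | 96 | 86 | 84 | 82 | 89.1%
Hermes 3 405B | 93 | 91 | 89 | 87 | 85 | 89.1%
Gemma 3 12B | 95 | 95 | 90 | 86 | 80 | 89.0%
Writer: Palmyra X5 | 100 | 96 | 92 | 80 | 77 | 88.9%
Claude Opus 4.6 | 93 | 92 | 91 | 88 | 80 | 88.8%
GPT-4o, May 13th (temp=0) | 92 | 90 | 88 | 88 | 85 | 88.5%
Gemini 2.5 Pro | 95 | 91 | 89 | 86 | 80 | 88.3%
Ministral 8B | 91 | 90 | 88 | 87 | 85 | 88.1%
Llama 3.1 70B | 100 | 92 | 88 | 81 | 79 | 88.0%
Minimax M2.5 | 95 | 92 | 87 | 85 | 80 | 87.8%
Claude 3.5 Sonnet | 100 | 92 | 85 | 85 | 76 | 87.5%
Mistral NeMO | 100 | 86 | 85 | 85 | 79 | 87.0%
Mistral Small Creative | 90 | 89 | 88 | 86 | 81 | 86.8%
Claude Opus 4 | 91 | 91 | 91 | 89 | 70 | 86.5%
DeepSeek V3.1 | 92 | 91 | 85 | 82 | 82 | 86.4%
GPT-4o, May 13th (temp=1) | 88 | 86 | 85 | 83 | 82 | 84.8%
Gemini 2.5 Flash Lite | 89 | 88 | 85 | 83 | 79 | 84.7%
GPT-4o, Aug. 6th (temp=1) | 91 | 91 | 85 | 82 | 74 | 84.7%
Llama 3.1 Nemotron 70B | 100 | 85 | 82 | 79 | 77 | 84.5%
DeepSeek-V2 Chat | 92 | 86 | 83 | 81 | 79 | 84.3%
DeepSeek V3.2 | 92 | 86 | 84 | 83 | 73 | 83.6%
GPT-4o Mini (temp=0) | 100 | 90 | 82 | 82 | 61 | 83.1%
Mistral Small 3.2 24B | 100 | 100 | 85 | 83 | 47 | 83.1%
Z.AI GLM 4.6 | 94 | 86 | 80 | 79 | 75 | 82.9%
Ministral 3 8B | 92 | 86 | 84 | 79 | 74 | 82.9%
Gemma 3 27B | 100 | 95 | 79 | 78 | 60 | 82.4%
Claude Haiku 4.5 | 91 | 89 | 88 | 79 | 64 | 82.4%
GPT-4o, Aug. 6th (temp=0) | 91 | 87 | 85 | 76 | 72 | 82.1%
Hermes 3 70B | 96 | 89 | 85 | 69 | 68 | 81.3%
DeepSeek V3 (2024-12-26) | 87 | 87 | 82 | 80 | 70 | 81.2%
Gemini 2.5 Flash | 91 | 87 | 82 | 73 | 72 | 81.1%
GPT-4o Mini (temp=1) | 87 | 83 | 81 | 79 | 76 | 81.0%
Ministral 3B | 86 | 83 | 82 | 76 | 76 | 80.7%
Claude Opus 4.5 | 88 | 88 | 79 | 75 | 70 | 80.1%
GPT-4.1 Mini | 90 | 81 | 81 | 74 | 74 | 79.9%
Claude Sonnet 4 | 91 | 83 | 80 | 73 | 72 | 79.8%
Rocinante 12B | 100 | 84 | 76 | 75 | 63 | 79.7%
Ministral 3 3B | 93 | 90 | 76 | 74 | 61 | 78.8%
Gemma 3 4B | 91 | 87 | 83 | 69 | 61 | 78.3%
GPT-4.1 Nano | 86 | 82 | 81 | 77 | 58 | 76.7%
Claude 3.7 Sonnet | 83 | 80 | 78 | 71 | 68 | 76.0%
Cohere Command R+ (Aug. 2024) | 89 | 74 | 74 | 72 | 66 | 74.8%
Claude Sonnet 4.6 | 85 | 79 | 74 | 72 | 60 | 73.8%
Claude 3 Haiku | 88 | 69 | 67 | 65 | 60 | 69.9%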
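Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 97 | 99.3%
GPT-5 | 100 | 100 | 100 | 98 | 98 | 99.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 96 | 99.3%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 97 | 95 | 98.5%
GPT-5.1 | 99 | 98 | 98 | 97 | 95 | 97.4%
o4 Mini High | 100 | 100 | 95 | 92 | 92 | 95.8%
o4 Mini | 100 | 96 | 96 | 94 | 91 | 95.4%
ByteDance Seed 1.6 Flash | 100 | 97 | 95 | 93 | 91 | 95.1%
Claude Opus 4.6 | 98 | 98 | 97 | 91 | 91 | 94.9%
GPT-4.1 | 96 | 96 | 96 | 93 | 93 | 94.9%
Qwen 2.5 72B | 100 | 96 | 95 | 94 | 89 | 94.7%
ByteDance Seed 1.6 | 98 | 94 | 94 | 94 | 90 | 94.1%
Mistral Large | 100 | 100 | 96 | 91 | 83 | 93.9%
Qwen 3.5 397B A17B | 96 | 96 | 96 | 92 | 87 | 93.6%
Mistral Medium 3.1 | 96 | 96 | 93 | 92 | 91 | 93.6%
Grok 4 | 97 | 95 | 94 | 93 | 89 | 93.5%
MoonshotAI: Kimi K2.5 | 97 | 96 | 92 | 92 | 90 | 93.4%
Grok 4 Fast | 98 | 97 | 95 | 93 | 82 | 92.9%
Stealth: Aurora Alpha | 98 | 93 | 93 | 92 | 89 | 92.8%
Ministral 3B | 95 | 95 | 95 | 91 | 87 | 92.6%
Llama 3.1 8B | 100 | 95 | 92 | 89 | 86 | 92.4%
Ministral 3 14B | 96 | 92 | 91 | 91 | 89 | 91.5%
Mistral Large 3 | 100 | 94 | 91 | 87 | 84 | 91.3%
Gemini 3 Flash (Preview) | 98 | 97 | 92 | 87 | 83 | 91.3%
GPT-5 Nano | 95 | 91 | 91 | 89 | 88 | 90.9%
GPT-5 Mini | 96 | 92 | 92 | 91 | 83 | 90.8%
Llama 3.1 70B | 95 | 95 | 94 | 93 | 75 | 90.4%
Hermes 3 70B | 100 | 93 | 89 | 87 | 81 | 90.1%
Z.AI GLM 4.7 | 94 | 92 | 89 | 87 | 84 | 89.2%
Mistral Small Creative | 94 | 92 | 86 | 86 | 79 | 87.8%
Mistral Large 2 | 94 | 92 | 91 | 85 | 76 | 87.7%
Qwen 3.5 Plus (2026-02-15) | 89 | 88 | 88 | 87 | 86 | 87.7%
DeepSeek V3 (2025-03-24) | 93 | 93 | 92 | 85 | 75 | 87.5%
Minimax M2.5 | 97 | 93 | 90 | 86 | 72 | 87.5%
Gemini 2.5 Pro | 96 | 93 | 90 | 86 | 73 | 87.5%
Z.AI GLM 4.7 Flash | 97 | 90 | 88 | 83 | 81 | 87.4%
Z.AI GLM 4.6 | 95 | 91 | 89 | 83 | 78 | 87.3%
Claude Sonnet 4.5 | 93 | 88 | 87 | 86 | 82 | 87.1%
Gemini 3 Pro (Preview) | 94 | 90 | 87 | 86 | 78 | 87.1%
Claude Opus 4 | 96 | 89 | 87 | 85 | 78 | 87.0%
DeepSeek V3.1 | 93 | 88 | 86 | 82 | 77 | 85.2%
GPT-4o, Aug. 6th (temp=1) | 91 | 91 | 87 | 85 | 72 | 85.2%
Ministral 3 3B | 100 | 94 | 84 | 78 | 70 | 85.0%
Cohere Command R+ (Aug. 2024) | 92 | 91 | 88 | 82 | 71 | 84.7%
GPT-4o, Aug. 6th (temp=0) | 90 | 89 | 84 | 82 | 77 | 84.5%
Writer: Palmyra X5 | 88 | 88 | 83 | 82 | 80 | 84.2%
Llama 3.1 Nemotron 70B | 95 | 85 | 83 | 81 | 77 | 84.1%
Rocinante 12B | 100 | 91 | 83 | 80 | 67 | 84.0%
WizardLM 2 8x22b | 92 | 84 | 83 | 82 | 77 | 83.4%
Z.AI GLM 5 | 92 | 86 | 85 | 82 | 72 | 83.4%
DeepSeek V3.2 | 89 | 86 | 84 | 81 | 77 | 83.4%
Claude Sonnet 4.6 | 95 | 87 | 81 | 77 | 76 | 83.3%
Z.AI GLM 4.5 | 95 | 95 | 81 | 78 | 68 | 83.2%
Gemma 3 27B | 100 | 91 | 83 | 76 | 66 | 83.1%
Claude Opus 4.5 | 90 | 88 | 86 | 83 | 66 | 82.5%
DeepSeek V3 (2024-12-26) | 85 | 85 | 84 | 82 | 75 | 82.2%
Ministral 3 8B | 91 | 89 | 79 | 77 | 71 | 81.4%
GPT-4o Mini (temp=1) | 92 | 86 | 83 | 79 | 64 | 81.0%
DeepSeek-V2 Chat | 96 | 93 | 92 | 71 | 53 | 80.8%
GPT-4o Mini (temp=0) | 96 | 96 | 76 | 67 | 67 | 80.3%
Mistral NeMO | 85 | 81 | 81 | 80 | 74 | 80.0%
Hermes 3 405B | 96 | 92 | 84 | 69 | 57 | 79.5%
Arcee AI: Trinity Large (Preview) | 89 | 80 | 78 | 75 | 72 | 78.7%
Ministral 8B | 84 | 82 | 77 | 75 | 70 | 77.7%
Claude Haiku 4.5 | 88 | 84 | 76 | 72 | 68 | 77.6%
Gemini 2.5 Flash | 87 | 78 | 76 | 75 | 72 | 77.6%
Gemini 2.5 Flash Lite | 84 | 82 | 81 | 66 | 66 | 75.9%
Arcee AI: Trinity Mini | 83 | 82 | 72 | 72 | 69 | 75.6%
Mistral Small 3.2 24B | 91 | 87 | 80 | 65 | 54 | 75.5%
Gemma 3 12B | 81 | 79 | 78 | 70 | 67 | 75.3%
GPT-4o, May 13th (temp=1) | 81 | 74 | 73 | 71 | 70 | 73.8%
Claude 3.7 Sonnet | 81 | 77 | 71 | 70 | 63 | 72.5%
Claude 3 Haiku | 88 | 75 | 74 | 73 | 49 | 71.8%
Claude 3.5 Sonnet | 86 | 84 | 68 | 61 | 59 | 71.7%
Gemma 3 4B | 92 | 83 | 71 | 57 | 55 | 71.6%
Claude Sonnet 4 | 86 | 82 | 73 | 60 | 51 | 70.4%
GPT-4o, May 13th (temp=0) | 82 | 80 | 77 | 66 | 45 | 70.0%
GPT-4.1 Mini | 76 | 70 | 63 | 60 | 54 | 64.5%
GPT-4.1 Nano | 74 | 67 | 60 | 58 | 44 | 60.6%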
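Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 98 | 98 | 98 | 99.0%
GPT-5 | 100 | 100 | 98 | 97 | 97 | 98.4%
Qwen 3.5 397B A17B | 100 | 100 | 97 | 96 | 96 | 97.8%
GPT-5.2 | 100 | 98 | 98 | 95 | 95 | 97.1%
o4 Mini High | 100 | 100 | 97 | 95 | 92 | 96.8%
ByteDance Seed 1.6 | 100 | 97 | 97 | 97 | 90 | 96.4%
Mistral Medium 3.1 | 100 | 100 | 96 | 94 | 90 | 95.9%
o4 Mini | 100 | 100 | 100 | 93 | 86 | 95.9%
Gemini 3.1 Pro (Preview) | 100 | 100 | 97 | 96 | 86 | 95.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 95 | 84 | 95.8%
Grok 4 Fast | 97 | 97 | 97 | 96 | 92 | 95.8%
Gemini 3 Flash (Preview) | 97 | 97 | 96 | 95 | 93 | 95.6%
GPT-4o, May 13th (temp=0) | 100 | 100 | 96 | 92 | 88 | 95.2%
GPT-4.1 | 97 | 97 | 96 | 93 | 92 | 94.9%
Grok 4 | 97 | 97 | 94 | 94 | 92 | 94.6%
GPT-5 Mini | 97 | 95 | 94 | 94 | 92 | 94.5%
Stealth: Aurora Alpha | 100 | 97 | 94 | 92 | 90 | 94.4%
Gemini 2.5 Pro | 100 | 100 | 96 | 90 | 84 | 94.1%
Z.AI GLM 5 | 100 | 97 | 93 | 93 | 85 | 93.6%
Ministral 3 14B | 100 | 100 | 93 | 90 | 83 | 93.3%
Qwen 3.5 Plus (2026-02-15) | 96 | 93 | 93 | 93 | 89 | 93.0%
Arcee AI: Trinity Mini | 100 | 93 | 92 | 91 | 86 | 92.2%
WizardLM 2 8x22b | 100 | 94 | 94 | 90 | 82 | 92.0%
Claude Sonnet 4.5 | 100 | 96 | 96 | 92 | 75 | 91.7%
Claude Opus 4.6 | 95 | 95 | 92 | 89 | 86 | 91.5%
Ministral 3 3B | 100 | 100 | 92 | 83 | 79 | 90.9%
Gemini 3 Pro (Preview) | 94 | 93 | 93 | 90 | 84 | 90.8%
Ministral 3B | 100 | 100 | 94 | 86 | 71 | 90.2%
Claude Sonnet 4.6 | 95 | 92 | 91 | 87 | 86 | 90.1%
Mistral Small 3.2 24B | 99 | 99 | 98 | 87 | 67 | 90.1%
Llama 3.1 70B | 100 | 98 | 94 | 86 | 72 | 90.0%
Ministral 3 8B | 92 | 92 | 92 | 90 | 83 | 90.0%
Mistral Large 3 | 100 | 91 | 88 | 88 | 82 | 89.7%
GPT-5 Nano | 92 | 90 | 90 | 89 | 88 | 89.7%
Ministral 8B | 100 | 94 | 93 | 85 | 75 | 89.2%
Mistral Small Creative | 95 | 94 | 86 | 86 | 85 | 89.2%
Mistral NeMO | 96 | 95 | 93 | 86 | 76 | 89.0%
Z.AI GLM 4.7 Flash | 92 | 91 | 89 | 86 | 82 | 88.0%
Mistral Large | 94 | 94 | 91 | 89 | 72 | 87.9%
GPT-4o Mini (temp=0) | 96 | 91 | 90 | 82 | 81 | 87.8%
Z.AI GLM 4.7 | 90 | 90 | 89 | 84 | 84 | 87.5%
Mistral Large 2 | 94 | 92 | 92 | 83 | 77 | 87.5%
Claude Opus 4.5 | 92 | 92 | 91 | 82 | 80 | 87.4%
GPT-4o, Aug. 6th (temp=0) | 94 | 90 | 86 | 84 | 82 | 87.3%
Gemini 2.5 Flash | 96 | 92 | 83 | 82 | 82 | 87.3%
Writer: Palmyra X5 | 92 | 90 | 86 | 86 | 82 | 87.2%
Z.AI GLM 4.6 | 96 | 90 | 85 | 84 | 80 | 87.0%
Claude Sonnet 4 | 92 | 91 | 91 | 85 | 75 | 86.7%
Hermes 3 405B | 100 | 93 | 89 | 81 | 70 | 86.5%
Gemma 3 27B | 95 | 94 | 91 | 77 | 71 | 85.6%
Llama 3.1 8B | 93 | 92 | 88 | 85 | 69 | 85.5%
DeepSeek V3.2 | 97 | 87 | 82 | 81 | 77 | 84.8%
Gemini 2.5 Flash Lite | 93 | 92 | 88 | 85 | 67 | 84.8%
GPT-4o Mini (temp=1) | 100 | 96 | 80 | 75 | 72 | 84.6%
Claude 3.5 Sonnet | 100 | 93 | 93 | 86 | 52 | 84.6%
DeepSeek V3.1 | 95 | 93 | 82 | 79 | 73 | 84.3%
Qwen 2.5 72B | 88 | 84 | 84 | 82 | 81 | 83.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 84 | 81 | 80 | 73 | 83.5%
Claude Opus 4 | 92 | 89 | 86 | 86 | 63 | 83.1%
Hermes 3 70B | 100 | 100 | 75 | 74 | 65 | 82.8%
Minimax M2.5 | 97 | 87 | 82 | 76 | 71 | 82.6%
Arcee AI: Trinity Large (Preview) | 95 | 84 | 83 | 79 | 69 | 82.0%
GPT-4.1 Mini | 89 | 88 | 87 | 75 | 70 | 81.8%
DeepSeek V3 (2025-03-24) | 91 | 90 | 84 | 72 | 70 | 81.3%
DeepSeek-V2 Chat | 94 | 91 | 80 | 76 | 65 | 81.2%
GPT-4o, May 13th (temp=1) | 93 | 87 | 79 | 74 | 70 | 80.6%
Llama 3.1 Nemotron 70B | 100 | 88 | 79 | 75 | 59 | 80.3%
Rocinante 12B | 96 | 86 | 84 | 75 | 56 | 79.1%
Z.AI GLM 4.5 | 91 | 90 | 82 | 70 | 63 | 79.1%
Claude 3 Haiku | 86 | 85 | 77 | 73 | 73 | 79.0%
DeepSeek V3 (2024-12-26) | 83 | 82 | 81 | 77 | 71 | 78.7%
Gemma 3 4B | 84 | 80 | 78 | 77 | 69 | 77.7%
Claude 3.5 Haiku | 100 | 86 | 84 | 66 | 51 | 77.4%
Claude 3.7 Sonnet | 92 | 81 | 81 | 69 | 62 | 77.1%
Gemma 3 12B | 86 | 77 | 74 | 72 | 70 | 76.0%
Claude Haiku 4.5 | 88 | 78 | 77 | 71 | 65 | 75.8%
GPT-4.1 Nano | 82 | 74 | 71 | 68 | 58 | 70.7%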
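Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 98 | 98 | 98 | 98.7%
Grok 4.1 Fast | 100 | 100 | 97 | 97 | 96 | 98.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 98 | 96 | 96 | 97.9%
o4 Mini High | 100 | 100 | 100 | 96 | 93 | 97.9%
GPT-5 | 100 | 98 | 98 | 97 | 96 | 97.8%
o4 Mini | 100 | 100 | 96 | 96 | 95 | 97.5%
Grok 4 Fast | 100 | 97 | 97 | 97 | 96 | 97.4%
Mistral Large | 100 | 100 | 100 | 94 | 89 | 96.7%
GPT-5.1 | 100 | 98 | 96 | 96 | 93 | 96.6%
Qwen 3.5 397B A17B | 100 | 98 | 96 | 94 | 93 | 96.2%
Gemini 3 Flash (Preview) | 97 | 96 | 95 | 95 | 94 | 95.4%
GPT-5 Mini | 100 | 96 | 96 | 94 | 90 | 95.1%
Ministral 3 14B | 100 | 100 | 100 | 100 | 74 | 94.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 94 | 92 | 87 | 94.7%
Mistral Large 3 | 96 | 95 | 94 | 94 | 92 | 94.1%
GPT-4.1 | 100 | 96 | 96 | 90 | 88 | 93.8%
ByteDance Seed 1.6 | 97 | 93 | 93 | 93 | 92 | 93.6%
Claude Opus 4.6 | 97 | 97 | 93 | 92 | 87 | 93.1%
Mistral Medium 3.1 | 96 | 96 | 92 | 90 | 90 | 92.7%
Qwen 2.5 72B | 100 | 95 | 91 | 89 | 88 | 92.6%
Mistral Large 2 | 100 | 100 | 94 | 87 | 80 | 92.2%
Z.AI GLM 5 | 97 | 93 | 91 | 91 | 86 | 91.6%
Grok 4 | 97 | 91 | 91 | 91 | 86 | 91.3%
Claude Sonnet 4.5 | 100 | 96 | 92 | 87 | 80 | 90.9%
Claude 3.5 Sonnet | 100 | 100 | 93 | 86 | 76 | 90.9%
Gemini 3 Pro (Preview) | 95 | 93 | 90 | 89 | 87 | 90.8%
GPT-5 Nano | 93 | 92 | 90 | 90 | 88 | 90.6%
WizardLM 2 8x22b | 100 | 92 | 90 | 86 | 84 | 90.3%
Minimax M2.5 | 93 | 92 | 92 | 92 | 80 | 89.7%
DeepSeek V3.2 | 96 | 94 | 93 | 80 | 80 | 88.9%
Stealth: Aurora Alpha | 94 | 90 | 87 | 87 | 85 | 88.7%
Writer: Palmyra X5 | 100 | 100 | 85 | 79 | 79 | 88.6%
DeepSeek V3 (2025-03-24) | 100 | 95 | 94 | 84 | 71 | 88.6%
GPT-4o, Aug. 6th (temp=0) | 95 | 90 | 86 | 84 | 83 | 87.9%
Mistral Small Creative | 92 | 92 | 90 | 83 | 80 | 87.4%
Mistral NeMO | 95 | 94 | 89 | 88 | 71 | 87.4%
Claude Haiku 4.5 | 91 | 90 | 89 | 84 | 82 | 87.3%
Qwen 3.5 Plus (2026-02-15) | 94 | 91 | 87 | 85 | 79 | 87.1%
Z.AI GLM 4.6 | 94 | 88 | 88 | 85 | 81 | 87.1%
Ministral 8B | 100 | 94 | 83 | 80 | 76 | 86.6%
Claude Opus 4.5 | 91 | 91 | 88 | 83 | 77 | 85.9%
Cohere Command R+ (Aug. 2024) | 94 | 91 | 84 | 84 | 76 | 85.9%
Llama 3.1 Nemotron 70B | 94 | 93 | 88 | 87 | 67 | 85.8%
Z.AI GLM 4.7 Flash | 92 | 89 | 86 | 83 | 78 | 85.6%
Ministral 3B | 94 | 94 | 93 | 77 | 70 | 85.6%
Z.AI GLM 4.7 | 94 | 87 | 84 | 82 | 80 | 85.3%
GPT-4o Mini (temp=0) | 95 | 90 | 85 | 79 | 77 | 85.1%
Ministral 3 8B | 94 | 89 | 84 | 83 | 75 | 85.0%
Gemma 3 27B | 100 | 86 | 82 | 81 | 71 | 84.1%
Hermes 3 70B | 94 | 91 | 84 | 82 | 67 | 83.6%
DeepSeek V3 (2024-12-26) | 94 | 89 | 85 | 82 | 67 | 83.3%
Gemini 2.5 Pro | 92 | 89 | 82 | 80 | 73 | 83.3%
Ministral 3 3B | 100 | 90 | 89 | 79 | 57 | 83.2%
Gemini 2.5 Flash | 95 | 87 | 86 | 73 | 72 | 82.5%
Llama 3.1 8B | 92 | 89 | 86 | 83 | 62 | 82.5%
Gemma 3 12B | 96 | 92 | 83 | 71 | 70 | 82.3%
Claude Sonnet 4 | 90 | 88 | 82 | 79 | 69 | 81.7%
Claude Opus 4 | 88 | 86 | 81 | 80 | 71 | 81.1%
Llama 3.1 70B | 100 | 92 | 84 | 80 | 49 | 80.9%
Z.AI GLM 4.5 | 92 | 87 | 82 | 78 | 64 | 80.5%
Arcee AI: Trinity Large (Preview) | 87 | 83 | 83 | 75 | 73 | 80.3%
DeepSeek V3.1 | 91 | 85 | 81 | 77 | 64 | 79.8%
Arcee AI: Trinity Mini | 84 | 82 | 81 | 80 | 69 | 79.2%
DeepSeek-V2 Chat | 85 | 82 | 78 | 75 | 69 | 77.7%
GPT-4o, May 13th (temp=0) | 91 | 88 | 70 | 67 | 66 | 76.3%
GPT-4o, Aug. 6th (temp=1) | 83 | 81 | 76 | 72 | 67 | 75.6%
GPT-4o Mini (temp=1) | 86 | 83 | 77 | 75 | 55 | 75.3%
Hermes 3 405B | 100 | 78 | 72 | 65 | 54 | 73.8%
Claude 3.7 Sonnet | 90 | 85 | 72 | 65 | 50 | 72.6%
Gemini 2.5 Flash Lite | 75 | 73 | 71 | 71 | 68 | 71.4%
GPT-4o, May 13th (temp=1) | 94 | 84 | 68 | 62 | 43 | 70.3%
Rocinante 12B | 91 | 86 | 78 | 76 | 19 | 70.0%
Gemma 3 4B | 85 | 77 | 70 | 59 | 57 | 69.6%
Claude Sonnet 4.6 | 80 | 70 | 69 | 64 | 58 | 68.3%
GPT-4.1 Mini | 74 | 74 | 73 | 59 | 55 | 67.0%
GPT-4.1 Nano | 80 | 72 | 64 | 60 | 52 | 65.6%
Mistral Small 3.2 24B | 94 | 68 | 50 | 46 | 30 | 57.6%
Claude 3 Haiku | 72 | 59 | 58 | 51 | 37 | 55.5%
Claude 3.5 Haiku | 100 | 59 | 44 | 41 | 13 | 51.3%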
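Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 98 | 99.6%
GPT-5.2 | 100 | 100 | 100 | 100 | 98 | 99.5%
Grok 4 | 100 | 100 | 100 | 100 | 97 | 99.4%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 97 | 99.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 96 | 99.2%
o4 Mini High | 100 | 100 | 100 | 100 | 96 | 99.2%
o4 Mini | 100 | 100 | 100 | 100 | 95 | 99.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 95 | 99.0%
Grok 4 Fast | 100 | 100 | 100 | 97 | 96 | 98.5%
Stealth: Aurora Alpha | 100 | 100 | 100 | 97 | 95 | 98.5%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 92 | 98.3%
GPT-5 Mini | 100 | 100 | 98 | 98 | 95 | 98.2%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 96 | 95 | 98.2%
GPT-5.1 | 100 | 98 | 98 | 98 | 97 | 98.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 95 | 94 | 98.0%
Mistral Large | 100 | 100 | 100 | 95 | 93 | 97.5%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 94 | 93 | 97.4%
Claude Sonnet 4.5 | 100 | 100 | 96 | 96 | 93 | 96.9%
Gemini 3.1 Pro (Preview) | 100 | 100 | 96 | 94 | 93 | 96.6%
Mistral Large 3 | 100 | 100 | 100 | 94 | 89 | 96.5%
Z.AI GLM 5 | 100 | 100 | 97 | 93 | 91 | 96.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 95 | 84 | 95.9%
Ministral 3B | 100 | 100 | 100 | 92 | 88 | 95.9%
Gemini 2.5 Pro | 100 | 100 | 96 | 96 | 87 | 95.8%
GPT-4o Mini (temp=1) | 100 | 96 | 95 | 95 | 91 | 95.5%
Mistral Small Creative | 100 | 100 | 96 | 95 | 86 | 95.3%
Claude Opus 4 | 100 | 96 | 95 | 94 | 91 | 95.2%
Z.AI GLM 4.7 | 100 | 97 | 96 | 96 | 87 | 95.1%
Z.AI GLM 4.7 Flash | 100 | 96 | 96 | 94 | 89 | 95.0%
Claude Opus 4.6 | 100 | 95 | 95 | 94 | 90 | 94.9%
Llama 3.1 8B | 100 | 100 | 94 | 91 | 89 | 94.9%
GPT-4o, Aug. 6th (temp=1) | 96 | 96 | 95 | 95 | 90 | 94.3%
WizardLM 2 8x22b | 100 | 96 | 96 | 92 | 86 | 94.1%
Claude Opus 4.5 | 97 | 97 | 95 | 91 | 89 | 94.0%
Ministral 3 14B | 100 | 94 | 93 | 93 | 90 | 94.0%
Mistral Small 3.2 24B | 100 | 99 | 91 | 91 | 89 | 93.9%
Ministral 3 8B | 100 | 100 | 91 | 90 | 88 | 93.9%
Gemini 3 Flash (Preview) | 96 | 96 | 94 | 94 | 88 | 93.7%
Ministral 8B | 100 | 100 | 100 | 87 | 82 | 93.6%
Mistral Large 2 | 100 | 94 | 93 | 92 | 88 | 93.6%
DeepSeek V3.2 | 100 | 97 | 95 | 90 | 85 | 93.4%
GPT-5 Nano | 96 | 95 | 94 | 91 | 90 | 93.3%
Claude Sonnet 4.6 | 100 | 95 | 91 | 90 | 90 | 93.3%
Claude Haiku 4.5 | 96 | 96 | 95 | 92 | 87 | 93.2%
Gemini 3 Pro (Preview) | 100 | 97 | 95 | 95 | 76 | 92.7%
Ministral 3 3B | 100 | 100 | 92 | 92 | 78 | 92.5%
Gemini 2.5 Flash | 100 | 94 | 92 | 89 | 86 | 92.4%
DeepSeek-V2 Chat | 100 | 96 | 90 | 90 | 85 | 92.3%
GPT-4o, May 13th (temp=1) | 96 | 95 | 93 | 91 | 87 | 92.3%
Z.AI GLM 4.6 | 100 | 92 | 90 | 90 | 88 | 92.1%
Rocinante 12B | 100 | 100 | 92 | 91 | 78 | 92.1%
GPT-4o, Aug. 6th (temp=0) | 100 | 95 | 90 | 88 | 87 | 92.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 90 | 89 | 84 | 92.0%
Writer: Palmyra X5 | 100 | 100 | 96 | 92 | 73 | 92.0%
Minimax M2.5 | 96 | 93 | 90 | 90 | 89 | 91.7%
Gemini 2.5 Flash Lite | 94 | 92 | 91 | 90 | 89 | 91.0%
Llama 3.1 Nemotron 70B | 100 | 94 | 92 | 86 | 80 | 90.6%
GPT-4o Mini (temp=0) | 100 | 96 | 91 | 90 | 76 | 90.5%
Llama 3.1 70B | 100 | 100 | 85 | 85 | 81 | 90.4%
Gemma 3 27B | 96 | 91 | 90 | 90 | 83 | 90.0%
Qwen 2.5 72B | 95 | 92 | 90 | 89 | 84 | 89.9%
Cohere Command R+ (Aug. 2024) | 95 | 95 | 94 | 84 | 81 | 89.8%
GPT-4.1 Mini | 100 | 95 | 87 | 85 | 82 | 89.7%
DeepSeek V3 (2024-12-26) | 92 | 90 | 90 | 89 | 78 | 88.0%
DeepSeek V3.1 | 95 | 93 | 86 | 86 | 80 | 87.9%
Claude Sonnet 4 | 100 | 95 | 86 | 79 | 78 | 87.6%
Gemma 3 4B | 92 | 89 | 88 | 88 | 78 | 87.0%
DeepSeek V3 (2025-03-24) | 100 | 94 | 94 | 82 | 62 | 86.5%
Hermes 3 405B | 92 | 91 | 91 | 82 | 72 | 85.6%
Mistral NeMO | 95 | 90 | 84 | 83 | 76 | 85.5%
Z.AI GLM 4.5 | 93 | 92 | 87 | 82 | 70 | 85.0%
GPT-4o, May 13th (temp=0) | 100 | 96 | 77 | 76 | 74 | 84.6%
Gemma 3 12B | 100 | 87 | 82 | 81 | 71 | 84.0%
Arcee AI: Trinity Large (Preview) | 94 | 84 | 81 | 77 | 77 | 82.8%
Hermes 3 70B | 93 | 82 | 80 | 79 | 78 | 82.3%
Claude 3.7 Sonnet | 97 | 87 | 81 | 73 | 70 | 81.6%
Claude 3 Haiku | 83 | 83 | 81 | 74 | 66 | 77.6%
GPT-4.1 Nano | 84 | 84 | 84 | 68 | 65 | 77.0%
Claude 3.5 Haiku | 82 | 77 | 61 | 54 | 12 | 57.1%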