AI-ism adverb frequency

Test: Bad Writing Habits

Avg. Score: 87.6%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Grok 4.1 Fast | 97.7% | $0.0018 | 37.8s | 93%
2 | ByteDance Seed 1.6 Flash | 97.0% | $0.0013 | 27.3s | 91%
3 | Grok 4 Fast | 94.2% | $0.0017 | 24.1s | 86%
4 | o4 Mini | 95.4% | $0.015 | 25.7s | 86%
5 | Stealth: Aurora Alpha | 92.6% | $0.0000 | 9.8s | 84%
6 | Inception Mercury 2 | 92.6% | $0.0032 | 7.0s | 84%
7 | ByteDance Seed 2.0 Lite | 98.0% | $0.012 | 2.2m | 95%
8 | o4 Mini High | 95.8% | $0.025 | 47.2s | 87%
9 | Gemini 3 Flash (Preview) | 92.3% | $0.0078 | 19.6s | 84%
10 | Gemini 3.1 Flash Lite (Preview) | 91.5% | $0.0030 | 8.4s | 81%
11 | GPT-5 Mini | 94.0% | $0.0100 | 57.4s | 86%
12 | Qwen 3.5 9B | 95.3% | $0.0011 | 1.4m | 85%
13 | Qwen 3.5 Flash | 93.2% | $0.0025 | 47.5s | 84%
14 | Gemini 3 Flash (Preview, Reasoning) | 91.9% | $0.012 | 30.1s | 84%
15 | GPT-4.1 | 93.4% | $0.018 | 44.7s | 85%
16 | Mistral Medium 3.1 | 91.9% | $0.0048 | 36.5s | 81%
17 | GPT-5.4 Nano (Reasoning, Low) | 90.1% | $0.0055 | 20.6s | 81%
18 | Qwen 3.5 35B | 93.6% | $0.018 | 1.0m | 84%
19 | GPT-5.4 Mini (Reasoning, Low) | 91.0% | $0.015 | 16.8s | 81%
20 | GPT-5.4 Nano | 89.6% | $0.0057 | 26.3s | 82%
21 | GPT-5.4 Mini (Reasoning) | 92.4% | $0.022 | 28.1s | 81%
22 | Mistral Large 3 | 90.8% | $0.0033 | 30.3s | 79%
23 | GPT-5.4 Mini | 90.9% | $0.015 | 16.8s | 80%
24 | GPT-5.4 Nano (Reasoning) | 89.7% | $0.0061 | 24.5s | 80%
25 | ByteDance Seed 1.6 | 96.3% | $0.013 | 2.5m | 89%
26 | Z.AI GLM 5 Turbo | 90.9% | $0.0081 | 33.2s | 79%
27 | Qwen 3.5 122B | 93.2% | $0.025 | 1.1m | 83%
28 | GPT-5 Nano | 90.8% | $0.0042 | 1.4m | 85%
29 | Qwen 3.5 Plus (2026-02-15) | 89.5% | $0.0060 | 31.5s | 78%
30 | Ministral 3 14B | 88.8% | $0.0007 | 11.7s | 75%
31 | Mistral Large | 90.6% | $0.014 | 30.9s | 78%
32 | Ministral 8B | 88.5% | $0.0004 | 10.4s | 75%
33 | Mistral Small Creative | 87.8% | $0.0007 | 9.1s | 75%
34 | Stealth: Healer Alpha | 87.8% | $0.0000 | 23.7s | 77%
35 | Nemotron 3 Nano | 90.0% | $0.0010 | 1.1m | 80%
36 | Qwen 3 32B | 90.8% | $0.0015 | 54.6s | 77%
37 | Arcee AI: Trinity Mini | 88.3% | $0.0003 | 9.2s | 73%
38 | Qwen 2.5 72B | 88.6% | $0.0010 | 36.7s | 76%
39 | Z.AI GLM 5 | 90.4% | $0.0084 | 1.2m | 80%
40 | Qwen 3.5 27B | 92.8% | $0.020 | 1.6m | 82%
41 | Mistral Large 2 | 89.3% | $0.013 | 29.4s | 76%
42 | Inception Mercury | 90.6% | $0.011 | 17.6s | 71%
43 | Ministral 3 8B | 86.9% | $0.0008 | 19.6s | 75%
44 | Stealth: Hunter Alpha | 88.7% | $0.0000 | 55.0s | 77%
45 | Z.AI GLM 4.7 Flash | 89.3% | $0.0017 | 1.2m | 78%
46 | Nemotron 3 Super | 90.3% | $0.0000 | 1.4m | 78%
47 | Ministral 3B | 87.9% | $0.0001 | 8.1s | 70%
48 | Mistral Small 4 (Reasoning) | 87.0% | $0.0022 | 30.2s | 75%
49 | DeepSeek V3 (2025-03-24) | 88.7% | $0.0014 | 39.4s | 73%
50 | Ministral 3 3B | 87.4% | $0.0005 | 11.1s | 70%
51 | Grok 4 | 93.4% | $0.048 | 1.7m | 86%
52 | Mistral Small 4 | 86.3% | $0.0014 | 18.2s | 72%
53 | Grok 4.20 (Beta, Reasoning) | 89.5% | $0.039 | 34.0s | 78%
54 | GPT-5.2 | 93.5% | $0.056 | 1.5m | 83%
55 | MiniMax M2.7 | 87.4% | $0.0040 | 1.1m | 76%
56 | Aion 2.0 | 88.6% | $0.0064 | 1.3m | 76%
57 | LFM2 24B | 87.1% | $0.0002 | 28.4s | 69%
58 | Qwen 3.5 397B A17B | 93.6% | $0.014 | 3.0m | 84%
59 | Llama 3.1 8B | 87.9% | $0.0003 | 1.3m | 75%
60 | Writer: Palmyra X5 | 86.3% | $0.011 | 22.0s | 71%
61 | Llama 3.1 70B | 86.6% | $0.0015 | 29.4s | 69%
62 | Qwen3 235B A22B Instruct 2507 | 86.6% | $0.0011 | 59.2s | 73%
63 | GPT-5.4 | 91.7% | $0.049 | 1.4m | 80%
64 | Gemini 3 Pro (Preview) | 89.4% | $0.055 | 54.4s | 79%
65 | GPT-5.4 (Reasoning, Low) | 92.2% | $0.055 | 1.4m | 79%
66 | Z.AI GLM 4.6 | 85.4% | $0.0065 | 51.5s | 73%
67 | Mistral NeMO | 83.7% | $0.0005 | 10.1s | 68%
68 | Gemini 2.5 Pro | 87.8% | $0.036 | 36.2s | 74%
69 | GPT-4o, Aug. 6th (temp=0) | 85.8% | $0.023 | 22.7s | 72%
70 | Claude Sonnet 4.5 | 88.3% | $0.035 | 38.1s | 73%
71 | Claude Opus 4.6 | 92.2% | $0.078 | 1.2m | 82%
72 | MiniMax M2.5 | 86.1% | $0.0034 | 1.3m | 74%
73 | Z.AI GLM 4.7 | 87.0% | $0.010 | 1.4m | 76%
74 | MoonshotAI: Kimi K2.5 | 93.5% | $0.019 | 3.2m | 83%
75 | Rocinante 12B | 86.2% | $0.0014 | 38.4s | 67%
76 | GPT-5 | 95.4% | $0.065 | 2.8m | 87%
77 | Cohere Command R+ (Aug. 2024) | 86.5% | $0.020 | 52.5s | 72%
78 | WizardLM 2 8x22b | 88.0% | $0.0026 | 1.8m | 74%
79 | Claude Haiku 4.5 | 83.3% | $0.011 | 21.6s | 69%
80 | Grok 4.20 (Beta) | 82.9% | $0.018 | 15.8s | 70%
81 | GPT-5.1 | 91.3% | $0.054 | 1.8m | 78%
82 | DeepSeek V3.2 | 85.8% | $0.0014 | 1.9m | 74%
83 | GPT-4o, Aug. 6th (temp=1) | 83.4% | $0.018 | 24.4s | 68%
84 | GPT-4o Mini (temp=0) | 82.7% | $0.0012 | 34.8s | 66%
85 | Llama 3.1 Nemotron 70B | 82.9% | $0.0038 | 31.7s | 65%
86 | Gemini 2.5 Flash Lite | 80.4% | $0.0009 | 9.5s | 65%
87 | Gemini 2.5 Flash | 80.7% | $0.0052 | 10.6s | 65%
88 | Claude Opus 4.6 (Reasoning) | 91.0% | $0.088 | 1.4m | 79%
89 | Gemma 3 27B | 82.6% | $0.0006 | 52.6s | 66%
90 | DeepSeek V3.1 | 84.7% | $0.0020 | 1.8m | 71%
91 | DeepSeek V3 (2024-12-26) | 82.6% | $0.0021 | 54.6s | 66%
92 | GPT-4o, May 13th (temp=0) | 83.5% | $0.035 | 14.1s | 66%
93 | Claude Sonnet 4 | 83.4% | $0.032 | 43.7s | 70%
94 | GPT-4o Mini (temp=1) | 80.9% | $0.0012 | 34.8s | 65%
95 | Gemini 2.5 Flash Lite (Reasoning) | 79.3% | $0.0028 | 30.8s | 67%
96 | ByteDance Seed 2.0 Mini | 93.1% | $0.0045 | 4.9m | 84%
97 | DeepSeek-V2 Chat | 82.2% | $0.0021 | 53.3s | 64%
98 | Claude Opus 4.5 | 86.0% | $0.070 | 53.4s | 73%
99 | Hermes 3 405B | 81.9% | $0.0032 | 53.2s | 63%
100 | Arcee AI: Trinity Large (Preview) | 80.8% | $0.0000 | 43.6s | 63%
101 | GPT-5.4 (Reasoning) | 93.2% | $0.089 | 2.6m | 81%
102 | Claude 3.5 Sonnet | 85.0% | $0.048 | 35.5s | 65%
103 | GPT-4o, May 13th (temp=1) | 81.1% | $0.033 | 14.4s | 64%
104 | Hermes 3 70B | 82.2% | $0.0010 | 1.2m | 63%
105 | Gemma 3 12B | 78.9% | $0.0004 | 41.3s | 62%
106 | GPT-4.1 Mini | 78.3% | $0.0027 | 19.0s | 60%
107 | Z.AI GLM 4.5 | 78.2% | $0.0051 | 42.1s | 62%
108 | Gemini 2.5 Flash (Reasoning) | 77.1% | $0.011 | 21.5s | 60%
109 | Claude Sonnet 4.6 (Reasoning) | 84.0% | $0.060 | 1.2m | 67%
110 | Claude Sonnet 4.6 | 80.3% | $0.031 | 39.3s | 61%
111 | Gemma 3 4B | 74.0% | $0.0002 | 20.0s | 57%
112 | Gemini 3.1 Pro (Preview) | 88.4% | $0.107 | 1.8m | 70%
113 | Claude 3 Haiku | 74.2% | $0.0025 | 14.9s | 54%
114 | Claude 3.7 Sonnet | 76.6% | $0.042 | 46.7s | 61%
115 | Claude 3.5 Haiku | 76.7% | $0.0035 | 10.8s | 44%
116 | GPT-4.1 Nano | 67.5% | $0.0007 | 13.3s | 49%
117 | Claude Opus 4 | 85.5% | $0.209 | 1.4m | 73%
118 | Mistral Small 3.2 24B | 79.3% | $0.0069 | 5.7m | 49%
Average score: 87.61%

Individual Scenarios

Detailed Writing Rules

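Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 97 | 99.4%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 92 | 98.5%
GPT-5.4 | 100 | 100 | 98 | 97 | 97 | 98.5%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 97 | 93 | 98.1%
GPT-5.4 (Reasoning) | 100 | 100 | 98 | 96 | 95 | 97.9%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 96 | 93 | 97.8%
ByteDance Seed 1.6 | 100 | 100 | 96 | 96 | 95 | 97.5%
GPT-5.2 | 98 | 98 | 98 | 98 | 96 | 97.4%
o4 Mini High | 100 | 97 | 96 | 96 | 96 | 97.1%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 96 | 96 | 93 | 97.1%
Grok 4 | 100 | 100 | 96 | 96 | 93 | 96.8%
GPT-5.4 Mini (Reasoning) | 97 | 97 | 97 | 97 | 95 | 96.8%
Gemini 3.1 Flash Lite (Preview) | 100 | 96 | 96 | 96 | 92 | 96.1%
o4 Mini | 100 | 100 | 100 | 97 | 84 | 96.1%
Qwen 3.5 27B | 100 | 100 | 100 | 90 | 90 | 96.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 98 | 91 | 91 | 95.9%
GPT-5 Mini | 98 | 96 | 96 | 95 | 94 | 95.8%
Claude Opus 4.6 | 100 | 97 | 94 | 94 | 94 | 95.6%
GPT-5.4 Mini (Reasoning, Low) | 98 | 97 | 96 | 95 | 92 | 95.5%
Qwen 3.5 397B A17B | 100 | 100 | 93 | 93 | 92 | 95.5%
Qwen 3.5 Flash | 100 | 100 | 100 | 94 | 83 | 95.3%
Gemini 3 Flash (Preview, Reasoning) | 100 | 97 | 96 | 93 | 90 | 95.3%
GPT-5 | 100 | 97 | 96 | 93 | 89 | 95.1%
Stealth: Hunter Alpha | 100 | 97 | 94 | 93 | 92 | 95.1%
Qwen 3.5 35B | 100 | 97 | 96 | 95 | 87 | 95.1%
GPT-5.4 Nano (Reasoning) | 100 | 98 | 93 | 92 | 90 | 94.6%
Arcee AI: Trinity Mini | 100 | 100 | 93 | 91 | 89 | 94.5%
Mistral Large | 97 | 97 | 96 | 93 | 89 | 94.5%
Gemini 3 Pro (Preview) | 100 | 97 | 93 | 92 | 90 | 94.5%
Grok 4 Fast | 97 | 96 | 93 | 93 | 93 | 94.4%
Mistral Large 3 | 100 | 95 | 95 | 92 | 91 | 94.4%
GPT-5.1 | 98 | 94 | 94 | 94 | 92 | 94.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 89 | 82 | 94.2%
Mistral Large 2 | 100 | 95 | 95 | 91 | 86 | 93.5%
GPT-5.4 Nano | 96 | 96 | 95 | 92 | 88 | 93.5%
Inception Mercury 2 | 100 | 96 | 94 | 89 | 88 | 93.4%
Qwen 3 32B | 99 | 95 | 94 | 90 | 89 | 93.4%
ByteDance Seed 2.0 Mini | 100 | 95 | 93 | 90 | 89 | 93.3%
Ministral 3 8B | 100 | 95 | 95 | 92 | 85 | 93.2%
GPT-5.4 Mini | 100 | 96 | 90 | 90 | 89 | 93.2%
Qwen 3.5 Plus (2026-02-15) | 100 | 93 | 93 | 91 | 88 | 92.9%
GPT-5.4 Nano (Reasoning, Low) | 94 | 94 | 94 | 91 | 88 | 92.4%
GPT-4.1 | 100 | 96 | 92 | 89 | 84 | 92.3%
Claude Sonnet 4.5 | 100 | 100 | 93 | 86 | 82 | 92.2%
Z.AI GLM 5 Turbo | 100 | 92 | 91 | 90 | 87 | 92.1%
GPT-4o, May 13th (temp=0) | 100 | 98 | 95 | 92 | 74 | 91.7%
Gemini 3.1 Pro (Preview) | 100 | 96 | 90 | 87 | 85 | 91.6%
DeepSeek V3 (2025-03-24) | 100 | 100 | 89 | 86 | 81 | 91.2%
Gemini 2.5 Pro | 100 | 96 | 92 | 87 | 81 | 91.0%
Qwen 3.5 9B | 95 | 95 | 93 | 88 | 83 | 91.0%
DeepSeek-V2 Chat | 100 | 100 | 92 | 89 | 72 | 90.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 94 | 91 | 68 | 90.5%
Mistral Medium 3.1 | 93 | 92 | 90 | 90 | 88 | 90.5%
GPT-5 Nano | 93 | 92 | 92 | 89 | 87 | 90.3%
Gemini 3 Flash (Preview) | 97 | 94 | 89 | 88 | 82 | 90.1%
MoonshotAI: Kimi K2.5 | 100 | 92 | 88 | 86 | 85 | 90.0%
Z.AI GLM 5 | 100 | 93 | 92 | 84 | 80 | 89.9%
Qwen3 235B A22B Instruct 2507 | 96 | 92 | 90 | 89 | 80 | 89.4%
DeepSeek V3.2 | 93 | 93 | 91 | 91 | 79 | 89.3%
Stealth: Aurora Alpha | 91 | 90 | 88 | 88 | 87 | 88.9%
GPT-4o, Aug. 6th (temp=1) | 95 | 94 | 88 | 86 | 81 | 88.9%
Writer: Palmyra X5 | 95 | 90 | 88 | 87 | 83 | 88.8%
Mistral Small Creative | 100 | 95 | 92 | 83 | 74 | 88.7%
Claude Opus 4 | 92 | 90 | 90 | 86 | 86 | 88.7%
Grok 4.20 (Beta, Reasoning) | 97 | 91 | 89 | 88 | 78 | 88.5%
DeepSeek V3.1 | 93 | 92 | 90 | 87 | 80 | 88.4%
Llama 3.1 8B | 95 | 91 | 91 | 85 | 80 | 88.3%
LFM2 24B | 100 | 94 | 91 | 87 | 69 | 88.3%
MiniMax M2.7 | 96 | 95 | 88 | 83 | 79 | 88.2%
Claude Sonnet 4.6 (Reasoning) | 96 | 93 | 86 | 84 | 81 | 88.0%
Ministral 8B | 94 | 92 | 91 | 90 | 72 | 87.7%
Ministral 3 3B | 100 | 100 | 88 | 85 | 65 | 87.6%
Z.AI GLM 4.7 Flash | 93 | 88 | 87 | 85 | 83 | 87.4%
Nemotron 3 Nano | 92 | 90 | 88 | 86 | 80 | 87.0%
Aion 2.0 | 97 | 92 | 88 | 85 | 70 | 86.4%
Claude Opus 4.5 | 94 | 92 | 86 | 84 | 73 | 85.9%
Stealth: Healer Alpha | 93 | 85 | 85 | 84 | 82 | 85.7%
Claude 3 Haiku | 95 | 93 | 85 | 83 | 71 | 85.5%
GPT-4o, Aug. 6th (temp=0) | 89 | 86 | 85 | 85 | 83 | 85.5%
Grok 4.20 (Beta) | 93 | 90 | 81 | 81 | 80 | 84.9%
Z.AI GLM 4.7 | 96 | 91 | 83 | 79 | 73 | 84.5%
Claude 3.5 Sonnet | 100 | 100 | 86 | 69 | 67 | 84.4%
WizardLM 2 8x22b | 93 | 84 | 82 | 82 | 80 | 84.1%
Qwen 2.5 72B | 96 | 89 | 85 | 76 | 76 | 84.1%
Rocinante 12B | 100 | 88 | 84 | 83 | 63 | 83.8%
Ministral 3 14B | 100 | 92 | 78 | 75 | 73 | 83.6%
Claude Sonnet 4.6 | 92 | 91 | 90 | 76 | 68 | 83.4%
Ministral 3B | 100 | 100 | 92 | 64 | 61 | 83.4%
Arcee AI: Trinity Large (Preview) | 97 | 89 | 83 | 74 | 72 | 83.0%
MiniMax M2.5 | 91 | 91 | 84 | 78 | 67 | 82.3%
Hermes 3 70B | 93 | 84 | 81 | 78 | 74 | 81.8%
Gemma 3 27B | 96 | 91 | 77 | 75 | 68 | 81.3%
Gemini 2.5 Flash Lite (Reasoning) | 90 | 85 | 82 | 78 | 69 | 80.8%
GPT-4.1 Nano | 90 | 87 | 81 | 77 | 68 | 80.6%
Mistral Small 4 (Reasoning) | 84 | 81 | 81 | 79 | 78 | 80.6%
Mistral Small 4 | 90 | 89 | 80 | 79 | 63 | 80.4%
Claude 3.5 Haiku | 100 | 100 | 85 | 72 | 45 | 80.4%
Claude Haiku 4.5 | 93 | 80 | 79 | 75 | 72 | 79.8%
GPT-4.1 Mini | 90 | 86 | 74 | 73 | 73 | 79.4%
Claude Sonnet 4 | 95 | 83 | 80 | 76 | 61 | 79.2%
Gemini 2.5 Flash | 83 | 83 | 82 | 75 | 68 | 78.2%
Z.AI GLM 4.6 | 90 | 82 | 76 | 74 | 69 | 78.1%
Gemma 3 12B | 91 | 78 | 75 | 72 | 70 | 77.2%
Inception Mercury | 100 | 99 | 73 | 63 | 50 | 76.9%
Gemini 2.5 Flash Lite | 81 | 78 | 76 | 75 | 73 | 76.6%
Llama 3.1 70B | 90 | 80 | 78 | 70 | 63 | 76.3%
Z.AI GLM 4.5 | 89 | 83 | 75 | 70 | 61 | 75.5%
Hermes 3 405B | 100 | 83 | 76 | 70 | 48 | 75.5%
GPT-4o Mini (temp=0) | 83 | 83 | 81 | 67 | 63 | 75.3%
GPT-4o Mini (temp=1) | 86 | 81 | 75 | 73 | 60 | 74.9%
Claude 3.7 Sonnet | 83 | 76 | 74 | 71 | 70 | 74.9%
Mistral NeMO | 97 | 82 | 74 | 62 | 55 | 74.1%
GPT-4o, May 13th (temp=1) | 79 | 76 | 74 | 72 | 69 | 74.1%
Gemini 2.5 Flash (Reasoning) | 82 | 80 | 74 | 66 | 65 | 73.1%
Llama 3.1 Nemotron 70B | 85 | 76 | 74 | 72 | 58 | 72.8%
Gemma 3 4B | 87 | 73 | 73 | 67 | 62 | 72.3%
Mistral Small 3.2 24B | 81 | 75 | 66 | 65 | 61 | 69.6%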
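Model | #1 | #2 | #3 | #4 | #5 | Avg
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 96 | 99.2%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 96 | 99.2%
Qwen 3.5 397B A17B | 100 | 100 | 98 | 96 | 95 | 97.8%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 88 | 97.5%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 95 | 93 | 97.5%
Grok 4.1 Fast | 100 | 100 | 96 | 95 | 95 | 97.3%
GPT-5.4 Mini | 100 | 100 | 98 | 96 | 93 | 97.2%
ByteDance Seed 1.6 Flash | 100 | 100 | 97 | 96 | 93 | 97.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 85 | 97.0%
GPT-5.4 | 100 | 98 | 97 | 95 | 94 | 96.9%
GPT-5 | 100 | 100 | 98 | 94 | 91 | 96.7%
Claude Opus 4.6 | 100 | 100 | 95 | 94 | 94 | 96.7%
ByteDance Seed 1.6 | 100 | 100 | 100 | 96 | 86 | 96.5%
LFM2 24B | 100 | 100 | 100 | 96 | 87 | 96.5%
Claude Opus 4.6 (Reasoning) | 100 | 97 | 97 | 94 | 93 | 96.3%
Qwen 3.5 27B | 100 | 97 | 96 | 95 | 94 | 96.2%
Grok 4 | 100 | 96 | 96 | 95 | 92 | 96.0%
GPT-5.2 | 100 | 98 | 97 | 93 | 91 | 95.8%
GPT-5.4 (Reasoning, Low) | 100 | 98 | 95 | 93 | 93 | 95.6%
Qwen 3.5 Flash | 97 | 97 | 96 | 96 | 92 | 95.6%
Gemini 3 Flash (Preview, Reasoning) | 97 | 97 | 96 | 94 | 93 | 95.4%
Gemini 3.1 Flash Lite (Preview) | 100 | 96 | 95 | 93 | 91 | 95.0%
GPT-5.4 Mini (Reasoning) | 97 | 97 | 95 | 94 | 92 | 94.9%
Gemini 3.1 Pro (Preview) | 96 | 96 | 96 | 96 | 90 | 94.8%
o4 Mini | 100 | 96 | 96 | 96 | 85 | 94.7%
Qwen 3 32B | 100 | 94 | 93 | 93 | 92 | 94.5%
Qwen 3.5 122B | 96 | 95 | 95 | 94 | 92 | 94.4%
Mistral Small 3.2 24B | 100 | 100 | 100 | 92 | 80 | 94.4%
Inception Mercury 2 | 100 | 95 | 94 | 92 | 89 | 94.2%
Qwen 3.5 Plus (2026-02-15) | 97 | 95 | 94 | 92 | 91 | 93.8%
MoonshotAI: Kimi K2.5 | 100 | 95 | 94 | 90 | 89 | 93.8%
GPT-5 Mini | 98 | 97 | 95 | 93 | 87 | 93.8%
GPT-5.1 | 98 | 98 | 96 | 94 | 83 | 93.8%
Mistral Medium 3.1 | 100 | 97 | 95 | 95 | 79 | 93.3%
Writer: Palmyra X5 | 100 | 95 | 95 | 91 | 85 | 93.2%
Ministral 3 8B | 100 | 95 | 94 | 90 | 87 | 93.2%
Nemotron 3 Super | 100 | 96 | 92 | 91 | 87 | 93.2%
Ministral 3B | 100 | 100 | 100 | 87 | 79 | 93.1%
Z.AI GLM 4.7 Flash | 100 | 96 | 95 | 89 | 85 | 92.9%
Z.AI GLM 4.7 | 96 | 94 | 93 | 92 | 90 | 92.8%
Z.AI GLM 5 Turbo | 100 | 100 | 95 | 85 | 84 | 92.7%
Ministral 8B | 100 | 95 | 93 | 90 | 86 | 92.7%
Mistral Large 2 | 100 | 95 | 94 | 87 | 86 | 92.6%
Mistral Large 3 | 100 | 100 | 90 | 87 | 85 | 92.5%
Stealth: Hunter Alpha | 94 | 93 | 93 | 93 | 89 | 92.5%
Mistral Small Creative | 96 | 96 | 94 | 90 | 85 | 92.5%
GPT-4.1 | 96 | 95 | 91 | 91 | 86 | 92.1%
GPT-5.4 Mini (Reasoning, Low) | 100 | 95 | 93 | 87 | 85 | 92.0%
Qwen3 235B A22B Instruct 2507 | 100 | 95 | 92 | 90 | 83 | 91.8%
Stealth: Aurora Alpha | 93 | 92 | 91 | 91 | 91 | 91.7%
Arcee AI: Trinity Large (Preview) | 100 | 97 | 91 | 87 | 83 | 91.5%
Claude Sonnet 4.5 | 100 | 95 | 92 | 90 | 78 | 91.3%
Claude Sonnet 4.6 (Reasoning) | 100 | 95 | 95 | 86 | 81 | 91.3%
ByteDance Seed 2.0 Mini | 94 | 92 | 92 | 90 | 87 | 91.2%
Z.AI GLM 5 | 96 | 93 | 92 | 90 | 85 | 91.1%
GPT-5.4 Nano (Reasoning) | 98 | 95 | 93 | 89 | 80 | 91.0%
GPT-5 Nano | 95 | 93 | 90 | 90 | 86 | 91.0%
Ministral 3 14B | 100 | 94 | 93 | 85 | 83 | 90.9%
Arcee AI: Trinity Mini | 100 | 92 | 92 | 91 | 80 | 90.9%
WizardLM 2 8x22b | 96 | 96 | 93 | 87 | 82 | 90.8%
GPT-4o, May 13th (temp=0) | 97 | 96 | 94 | 85 | 80 | 90.4%
Mistral Small 4 (Reasoning) | 94 | 93 | 92 | 88 | 84 | 90.3%
Mistral Large | 96 | 94 | 91 | 90 | 80 | 90.2%
GPT-5.4 Nano (Reasoning, Low) | 92 | 91 | 91 | 90 | 87 | 90.1%
GPT-5.4 Nano | 91 | 91 | 89 | 89 | 88 | 89.7%
Claude Opus 4.5 | 92 | 92 | 91 | 91 | 82 | 89.6%
Claude 3.5 Sonnet | 100 | 93 | 88 | 87 | 78 | 89.5%
Claude Sonnet 4 | 100 | 90 | 89 | 88 | 79 | 89.2%
Grok 4 Fast | 100 | 96 | 92 | 81 | 76 | 88.9%
Cohere Command R+ (Aug. 2024) | 94 | 92 | 91 | 85 | 82 | 88.9%
Aion 2.0 | 95 | 90 | 90 | 87 | 82 | 88.8%
Nemotron 3 Nano | 97 | 89 | 88 | 87 | 82 | 88.5%
MiniMax M2.5 | 100 | 93 | 91 | 83 | 74 | 88.2%
Claude Opus 4 | 96 | 93 | 91 | 86 | 76 | 88.2%
Gemini 3 Pro (Preview) | 97 | 89 | 88 | 86 | 80 | 88.1%
GPT-4o, Aug. 6th (temp=0) | 100 | 90 | 85 | 85 | 81 | 88.1%
Gemini 3 Flash (Preview) | 94 | 91 | 89 | 89 | 77 | 88.1%
Stealth: Healer Alpha | 93 | 91 | 89 | 83 | 83 | 87.9%
MiniMax M2.7 | 92 | 91 | 87 | 85 | 82 | 87.4%
DeepSeek V3.1 | 100 | 96 | 93 | 84 | 64 | 87.4%
Mistral Small 4 | 100 | 92 | 89 | 88 | 67 | 87.3%
Qwen 2.5 72B | 100 | 89 | 89 | 79 | 77 | 86.9%
Llama 3.1 70B | 99 | 94 | 89 | 84 | 68 | 86.7%
Grok 4.20 (Beta) | 94 | 88 | 85 | 83 | 82 | 86.2%
DeepSeek V3.2 | 89 | 89 | 86 | 85 | 82 | 86.1%
GPT-4o Mini (temp=0) | 100 | 87 | 83 | 82 | 74 | 85.1%
Grok 4.20 (Beta, Reasoning) | 94 | 92 | 89 | 87 | 62 | 84.7%
Rocinante 12B | 100 | 92 | 88 | 78 | 65 | 84.6%
Claude Haiku 4.5 | 87 | 86 | 86 | 83 | 78 | 83.9%
Gemini 2.5 Flash Lite (Reasoning) | 92 | 88 | 84 | 84 | 71 | 83.8%
Llama 3.1 8B | 92 | 90 | 86 | 84 | 65 | 83.5%
Hermes 3 405B | 100 | 90 | 87 | 80 | 59 | 83.4%
Mistral NeMO | 93 | 88 | 85 | 84 | 67 | 83.4%
Z.AI GLM 4.6 | 92 | 88 | 86 | 77 | 70 | 82.5%
GPT-4.1 Mini | 91 | 83 | 83 | 80 | 75 | 82.5%
Gemini 2.5 Flash | 94 | 91 | 86 | 86 | 55 | 82.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 89 | 67 | 55 | 82.1%
Gemini 2.5 Flash Lite | 92 | 85 | 84 | 78 | 71 | 82.0%
Gemma 3 27B | 95 | 82 | 80 | 77 | 74 | 81.6%
Gemini 2.5 Flash (Reasoning) | 95 | 85 | 82 | 77 | 69 | 81.5%
Gemma 3 12B | 92 | 91 | 85 | 74 | 66 | 81.5%
Claude Sonnet 4.6 | 95 | 85 | 76 | 75 | 75 | 81.3%
GPT-4o, May 13th (temp=1) | 86 | 83 | 83 | 82 | 72 | 81.1%
Gemma 3 4B | 93 | 81 | 78 | 76 | 76 | 80.9%
Gemini 2.5 Pro | 86 | 84 | 82 | 79 | 73 | 80.8%
Ministral 3 3B | 100 | 91 | 75 | 73 | 65 | 80.7%
GPT-4o, Aug. 6th (temp=1) | 96 | 90 | 79 | 78 | 60 | 80.7%
Hermes 3 70B | 100 | 97 | 83 | 73 | 49 | 80.4%
DeepSeek-V2 Chat | 91 | 87 | 79 | 73 | 72 | 80.3%
Inception Mercury | 100 | 86 | 81 | 73 | 56 | 79.2%
GPT-4o Mini (temp=1) | 79 | 78 | 76 | 76 | 76 | 77.3%
Z.AI GLM 4.5 | 87 | 87 | 69 | 67 | 65 | 75.0%
Claude 3.7 Sonnet | 86 | 72 | 71 | 70 | 67 | 73.0%
GPT-4.1 Nano | 83 | 75 | 68 | 67 | 65 | 71.7%
Claude 3.5 Haiku | 100 | 82 | 74 | 52 | 38 | 69.1%
Claude 3 Haiku | 90 | 84 | 76 | 59 | 34 | 68.9%
Llama 3.1 Nemotron 70B | 93 | 71 | 68 | 56 | 54 | 68.3%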
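Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 96 | 99.2%
Claude Opus 4.6 | 100 | 100 | 100 | 97 | 92 | 97.8%
Qwen 3.5 9B | 100 | 100 | 98 | 95 | 95 | 97.7%
GPT-5 | 98 | 98 | 98 | 97 | 96 | 97.4%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 96 | 91 | 97.4%
GPT-5.4 | 98 | 98 | 98 | 96 | 94 | 96.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 97 | 95 | 91 | 96.6%
GPT-5.4 (Reasoning, Low) | 100 | 98 | 98 | 96 | 91 | 96.4%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 92 | 88 | 96.1%
GPT-5.2 | 98 | 96 | 96 | 95 | 94 | 95.7%
GPT-5.4 (Reasoning) | 98 | 98 | 96 | 94 | 93 | 95.7%
Inception Mercury | 100 | 100 | 100 | 90 | 89 | 95.6%
Qwen 3.5 35B | 100 | 97 | 95 | 94 | 92 | 95.5%
ByteDance Seed 1.6 | 100 | 100 | 95 | 94 | 88 | 95.5%
GPT-5.1 | 100 | 98 | 95 | 92 | 92 | 95.3%
Mistral Medium 3.1 | 100 | 100 | 97 | 91 | 88 | 95.1%
GPT-5.4 Mini (Reasoning) | 98 | 97 | 94 | 93 | 91 | 94.6%
Qwen 3.5 397B A17B | 97 | 96 | 95 | 92 | 90 | 94.2%
Qwen 3.5 122B | 100 | 95 | 94 | 91 | 90 | 93.9%
Qwen 3.5 Flash | 97 | 96 | 93 | 93 | 90 | 93.8%
Qwen 3.5 27B | 97 | 96 | 95 | 91 | 90 | 93.8%
Qwen 3 32B | 100 | 100 | 97 | 86 | 86 | 93.7%
GPT-5.4 Mini | 96 | 96 | 93 | 93 | 90 | 93.5%
Gemini 3 Flash (Preview, Reasoning) | 100 | 94 | 93 | 90 | 88 | 92.9%
GPT-5.4 Mini (Reasoning, Low) | 96 | 95 | 93 | 92 | 88 | 92.9%
Stealth: Aurora Alpha | 97 | 97 | 97 | 91 | 81 | 92.8%
Grok 4 Fast | 97 | 97 | 97 | 88 | 86 | 92.7%
Inception Mercury 2 | 97 | 95 | 93 | 91 | 87 | 92.6%
Mistral Small Creative | 100 | 96 | 96 | 93 | 77 | 92.5%
GPT-5.4 Nano | 95 | 95 | 92 | 92 | 88 | 92.2%
GPT-4.1 Mini | 100 | 94 | 92 | 88 | 86 | 92.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 93 | 92 | 88 | 87 | 91.9%
DeepSeek V3 (2025-03-24) | 95 | 92 | 91 | 91 | 90 | 91.7%
Llama 3.1 70B | 100 | 100 | 93 | 84 | 80 | 91.4%
MoonshotAI: Kimi K2.5 | 96 | 96 | 95 | 85 | 85 | 91.4%
Arcee AI: Trinity Mini | 100 | 96 | 90 | 86 | 85 | 91.4%
ByteDance Seed 2.0 Mini | 94 | 94 | 92 | 89 | 87 | 91.3%
GPT-5.4 Nano (Reasoning) | 94 | 93 | 90 | 90 | 89 | 91.2%
GPT-5 Nano | 98 | 92 | 92 | 91 | 83 | 91.0%
GPT-5 Mini | 98 | 94 | 92 | 86 | 83 | 90.8%
Mistral Large 2 | 100 | 97 | 91 | 87 | 79 | 90.7%
Grok 4 | 94 | 93 | 89 | 89 | 88 | 90.5%
o4 Mini | 96 | 93 | 92 | 87 | 85 | 90.5%
Ministral 8B | 100 | 100 | 100 | 79 | 74 | 90.4%
Claude 3.5 Sonnet | 94 | 93 | 92 | 88 | 84 | 90.3%
Writer: Palmyra X5 | 96 | 93 | 93 | 89 | 80 | 90.1%
Z.AI GLM 5 | 97 | 95 | 88 | 88 | 82 | 90.0%
Ministral 3 3B | 100 | 100 | 88 | 86 | 76 | 90.0%
GPT-4.1 | 100 | 93 | 91 | 87 | 79 | 90.0%
o4 Mini High | 100 | 93 | 89 | 86 | 82 | 90.0%
Grok 4.20 (Beta, Reasoning) | 100 | 95 | 87 | 86 | 82 | 90.0%
Gemini 3 Flash (Preview) | 94 | 94 | 88 | 87 | 87 | 89.9%
Mistral Large | 96 | 95 | 90 | 88 | 81 | 89.8%
GPT-5.4 Nano (Reasoning, Low) | 96 | 90 | 89 | 88 | 82 | 89.2%
Nemotron 3 Nano | 98 | 97 | 84 | 84 | 83 | 89.0%
Nemotron 3 Super | 95 | 95 | 91 | 83 | 80 | 88.9%
Llama 3.1 8B | 93 | 90 | 88 | 86 | 86 | 88.8%
Ministral 3B | 93 | 89 | 88 | 88 | 86 | 88.7%
DeepSeek V3.1 | 100 | 93 | 89 | 84 | 77 | 88.6%
Claude Sonnet 4.6 | 96 | 89 | 89 | 86 | 81 | 88.2%
Cohere Command R+ (Aug. 2024) | 95 | 94 | 91 | 86 | 75 | 88.1%
Qwen 2.5 72B | 96 | 89 | 87 | 86 | 81 | 88.1%
Aion 2.0 | 92 | 89 | 88 | 87 | 83 | 87.9%
Mistral Small 4 (Reasoning) | 95 | 90 | 88 | 83 | 83 | 87.8%
Stealth: Healer Alpha | 92 | 89 | 87 | 87 | 83 | 87.6%
MiniMax M2.7 | 94 | 93 | 90 | 82 | 78 | 87.4%
Grok 4.20 (Beta) | 100 | 94 | 85 | 82 | 74 | 86.9%
Claude Opus 4 | 96 | 92 | 84 | 84 | 76 | 86.5%
Claude Sonnet 4.5 | 90 | 89 | 87 | 86 | 80 | 86.4%
Z.AI GLM 4.7 Flash | 94 | 86 | 85 | 85 | 82 | 86.3%
Qwen 3.5 Plus (2026-02-15) | 93 | 91 | 87 | 85 | 76 | 86.2%
Claude Sonnet 4.6 (Reasoning) | 96 | 96 | 86 | 81 | 72 | 86.2%
GPT-4o Mini (temp=1) | 93 | 87 | 87 | 82 | 82 | 86.1%
Ministral 3 14B | 94 | 90 | 87 | 82 | 78 | 86.1%
Claude Sonnet 4 | 94 | 87 | 87 | 83 | 79 | 85.9%
Rocinante 12B | 100 | 95 | 82 | 80 | 73 | 85.9%
Hermes 3 405B | 96 | 93 | 90 | 84 | 65 | 85.7%
Stealth: Hunter Alpha | 100 | 91 | 80 | 79 | 78 | 85.5%
Gemini 3 Pro (Preview) | 90 | 88 | 86 | 82 | 81 | 85.5%
DeepSeek V3.2 | 90 | 87 | 85 | 83 | 79 | 85.1%
Z.AI GLM 4.6 | 90 | 88 | 85 | 81 | 80 | 85.0%
Claude Opus 4.5 | 94 | 89 | 89 | 85 | 68 | 85.0%
GPT-4o Mini (temp=0) | 91 | 89 | 88 | 85 | 72 | 84.9%
Claude 3.5 Haiku | 100 | 100 | 85 | 72 | 67 | 84.8%
Qwen3 235B A22B Instruct 2507 | 100 | 90 | 78 | 78 | 76 | 84.5%
DeepSeek-V2 Chat | 100 | 95 | 83 | 77 | 67 | 84.5%
Gemini 2.5 Pro | 96 | 91 | 86 | 78 | 71 | 84.3%
WizardLM 2 8x22b | 93 | 89 | 84 | 82 | 73 | 84.3%
Claude 3 Haiku | 100 | 94 | 81 | 76 | 70 | 84.2%
Ministral 3 8B | 95 | 90 | 90 | 79 | 66 | 84.0%
Z.AI GLM 5 Turbo | 89 | 87 | 83 | 79 | 78 | 83.3%
Llama 3.1 Nemotron 70B | 95 | 89 | 82 | 75 | 75 | 83.2%
Z.AI GLM 4.7 | 90 | 90 | 88 | 74 | 72 | 82.9%
Mistral Large 3 | 86 | 85 | 85 | 84 | 75 | 82.8%
GPT-4o, May 13th (temp=1) | 95 | 80 | 80 | 79 | 78 | 82.4%
GPT-4o, Aug. 6th (temp=0) | 95 | 88 | 80 | 75 | 73 | 82.1%
Mistral Small 4 | 96 | 95 | 87 | 68 | 64 | 82.0%
Claude 3.7 Sonnet | 88 | 87 | 85 | 84 | 63 | 81.4%
DeepSeek V3 (2024-12-26) | 100 | 90 | 84 | 76 | 57 | 81.3%
MiniMax M2.5 | 96 | 87 | 82 | 76 | 64 | 81.0%
Mistral NeMO | 94 | 84 | 82 | 73 | 70 | 80.7%
LFM2 24B | 95 | 90 | 88 | 85 | 42 | 79.8%
Hermes 3 70B | 91 | 89 | 75 | 74 | 67 | 79.3%
Gemini 2.5 Flash Lite | 89 | 84 | 82 | 77 | 60 | 78.7%
GPT-4o, May 13th (temp=0) | 94 | 83 | 83 | 69 | 63 | 78.2%
Claude Haiku 4.5 | 86 | 80 | 77 | 74 | 72 | 77.7%
GPT-4o, Aug. 6th (temp=1) | 85 | 85 | 84 | 66 | 64 | 76.8%
Gemini 2.5 Flash Lite (Reasoning) | 86 | 81 | 79 | 70 | 60 | 75.0%
Gemma 3 27B | 84 | 81 | 70 | 69 | 69 | 74.3%
Mistral Small 3.2 24B | 100 | 100 | 80 | 75 | 14 | 73.7%
Gemini 2.5 Flash (Reasoning) | 83 | 80 | 73 | 65 | 59 | 72.1%
Gemini 2.5 Flash | 85 | 82 | 79 | 68 | 44 | 71.6%
Arcee AI: Trinity Large (Preview) | 92 | 88 | 74 | 53 | 50 | 71.3%
Gemma 3 12B | 79 | 74 | 74 | 65 | 64 | 71.0%
Gemma 3 4B | 73 | 70 | 67 | 62 | 58 | 66.3%
Z.AI GLM 4.5 | 79 | 76 | 64 | 63 | 51 | 66.2%
GPT-4.1 Nano | 73 | 68 | 66 | 65 | 44 | 63.1%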
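Model | #1 | #2 | #3 | #4 | #5 | Avg
Inception Mercury | 100 | 100 | 100 | 100 | 96 | 99.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 95 | 99.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 98 | 97 | 99.0%
GPT-5 | 100 | 100 | 100 | 98 | 95 | 98.6%
Qwen 3.5 122B | 100 | 100 | 100 | 97 | 96 | 98.6%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 93 | 98.6%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 96 | 96 | 98.3%
Qwen 3.5 35B | 100 | 100 | 98 | 97 | 97 | 98.3%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 95 | 95 | 98.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 97 | 97 | 95 | 98.0%
Mistral Small 3.2 24B | 100 | 99 | 98 | 97 | 93 | 97.5%
Claude Opus 4.6 | 100 | 97 | 97 | 96 | 96 | 97.3%
Inception Mercury 2 | 100 | 100 | 97 | 95 | 92 | 96.8%
GPT-5.4 | 98 | 98 | 97 | 96 | 95 | 96.7%
ByteDance Seed 1.6 Flash | 100 | 100 | 96 | 95 | 91 | 96.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 97 | 94 | 93 | 96.1%
o4 Mini High | 97 | 97 | 96 | 95 | 94 | 95.7%
Grok 4 Fast | 100 | 97 | 96 | 96 | 89 | 95.6%
Gemini 3.1 Pro (Preview) | 100 | 96 | 95 | 93 | 93 | 95.6%
GPT-5.2 | 100 | 96 | 94 | 94 | 93 | 95.4%
Qwen 3.5 Flash | 100 | 96 | 96 | 96 | 88 | 95.4%
DeepSeek-V2 Chat | 100 | 100 | 96 | 91 | 89 | 95.3%
GPT-4.1 | 96 | 96 | 96 | 96 | 93 | 95.3%
Gemini 3 Flash (Preview) | 97 | 96 | 95 | 94 | 94 | 95.2%
GPT-5 Mini | 98 | 96 | 94 | 94 | 93 | 95.1%
Claude Opus 4.6 (Reasoning) | 100 | 97 | 97 | 94 | 85 | 94.7%
Qwen 3.5 27B | 100 | 100 | 95 | 92 | 86 | 94.7%
Qwen 3.5 9B | 100 | 100 | 95 | 90 | 88 | 94.5%
ByteDance Seed 1.6 | 100 | 100 | 95 | 88 | 88 | 94.2%
Stealth: Aurora Alpha | 100 | 96 | 95 | 90 | 89 | 94.2%
GPT-5.1 | 98 | 96 | 95 | 93 | 88 | 93.9%
Nemotron 3 Nano | 96 | 95 | 94 | 92 | 92 | 93.7%
MoonshotAI: Kimi K2.5 | 100 | 95 | 95 | 89 | 89 | 93.6%
Mistral Medium 3.1 | 100 | 95 | 94 | 91 | 87 | 93.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 95 | 94 | 90 | 89 | 93.5%
o4 Mini | 100 | 97 | 93 | 90 | 87 | 93.5%
GPT-5.4 Mini (Reasoning, Low) | 98 | 95 | 93 | 93 | 89 | 93.4%
Aion 2.0 | 100 | 96 | 95 | 88 | 87 | 93.2%
Writer: Palmyra X5 | 96 | 96 | 95 | 91 | 88 | 93.2%
Qwen 2.5 72B | 100 | 100 | 94 | 89 | 82 | 93.1%
Grok 4 | 100 | 97 | 95 | 90 | 83 | 93.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 95 | 70 | 92.9%
Z.AI GLM 5 Turbo | 100 | 95 | 94 | 92 | 83 | 92.9%
GPT-5.4 Mini | 100 | 93 | 92 | 90 | 89 | 92.9%
Ministral 8B | 95 | 95 | 92 | 92 | 91 | 92.7%
Qwen 3 32B | 100 | 96 | 95 | 94 | 77 | 92.5%
Mistral Small 4 (Reasoning) | 100 | 96 | 94 | 90 | 82 | 92.3%
Ministral 3 14B | 100 | 100 | 91 | 87 | 84 | 92.3%
Z.AI GLM 5 | 97 | 93 | 92 | 90 | 88 | 92.1%
GPT-5.4 Nano (Reasoning) | 96 | 96 | 92 | 91 | 84 | 91.9%
ByteDance Seed 2.0 Mini | 100 | 100 | 95 | 87 | 77 | 91.8%
Gemini 3 Flash (Preview, Reasoning) | 96 | 95 | 93 | 89 | 86 | 91.7%
Rocinante 12B | 94 | 94 | 92 | 90 | 88 | 91.7%
Qwen3 235B A22B Instruct 2507 | 96 | 95 | 95 | 95 | 78 | 91.7%
Claude Sonnet 4.5 | 100 | 96 | 91 | 87 | 83 | 91.6%
Z.AI GLM 4.7 Flash | 100 | 92 | 91 | 89 | 86 | 91.5%
Gemini 3.1 Flash Lite (Preview) | 100 | 92 | 91 | 89 | 84 | 91.4%
GPT-5 Nano | 95 | 93 | 91 | 90 | 87 | 91.4%
Ministral 3 8B | 100 | 95 | 92 | 86 | 84 | 91.2%
LFM2 24B | 100 | 100 | 93 | 83 | 78 | 91.0%
Ministral 3 3B | 100 | 90 | 89 | 87 | 86 | 90.6%
Mistral Small 4 | 100 | 97 | 93 | 86 | 76 | 90.4%
DeepSeek V3.1 | 100 | 96 | 86 | 85 | 85 | 90.2%
Gemini 3 Pro (Preview) | 100 | 92 | 90 | 90 | 78 | 90.0%
Llama 3.1 70B | 100 | 100 | 92 | 83 | 74 | 89.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 95 | 78 | 75 | 89.7%
Claude Sonnet 4.6 | 100 | 95 | 91 | 81 | 81 | 89.6%
Gemini 2.5 Pro | 96 | 95 | 90 | 87 | 80 | 89.6%
Claude 3.5 Sonnet | 93 | 92 | 92 | 87 | 82 | 89.4%
Z.AI GLM 4.7 | 100 | 94 | 91 | 82 | 78 | 89.1%
Arcee AI: Trinity Mini | 100 | 92 | 92 | 82 | 79 | 89.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 91 | 84 | 69 | 88.9%
Claude Sonnet 4.6 (Reasoning) | 100 | 91 | 90 | 82 | 79 | 88.7%
GPT-5.4 Nano (Reasoning, Low) | 94 | 91 | 90 | 89 | 79 | 88.6%
GPT-5.4 Nano | 94 | 90 | 88 | 87 | 83 | 88.5%
Mistral NeMO | 92 | 90 | 88 | 85 | 85 | 87.9%
Ministral 3B | 91 | 91 | 90 | 88 | 79 | 87.8%
Mistral Small Creative | 95 | 94 | 91 | 86 | 73 | 87.6%
Hermes 3 70B | 96 | 94 | 87 | 83 | 78 | 87.6%
Mistral Large | 95 | 93 | 84 | 83 | 83 | 87.5%
Grok 4.20 (Beta, Reasoning) | 96 | 96 | 86 | 80 | 78 | 87.4%
Claude Opus 4 | 92 | 91 | 88 | 87 | 80 | 87.4%
WizardLM 2 8x22b | 95 | 93 | 88 | 81 | 77 | 86.9%
GPT-4o, May 13th (temp=1) | 91 | 91 | 89 | 87 | 76 | 86.7%
Stealth: Hunter Alpha | 93 | 91 | 85 | 83 | 81 | 86.7%
GPT-4o Mini (temp=0) | 100 | 91 | 84 | 79 | 79 | 86.5%
Nemotron 3 Super | 100 | 86 | 82 | 82 | 79 | 85.8%
Claude 3 Haiku | 95 | 95 | 85 | 81 | 73 | 85.7%
Llama 3.1 Nemotron 70B | 93 | 92 | 85 | 84 | 71 | 85.1%
Stealth: Healer Alpha | 91 | 91 | 89 | 84 | 71 | 84.9%
Claude Haiku 4.5 | 100 | 90 | 85 | 79 | 70 | 84.8%
DeepSeek V3.2 | 91 | 86 | 85 | 83 | 77 | 84.4%
Mistral Large 3 | 91 | 89 | 88 | 84 | 70 | 84.4%
Claude Opus 4.5 | 96 | 96 | 83 | 80 | 66 | 84.2%
Cohere Command R+ (Aug. 2024) | 95 | 88 | 86 | 78 | 73 | 84.1%
Gemini 2.5 Flash | 92 | 86 | 85 | 80 | 76 | 83.9%
MiniMax M2.7 | 92 | 92 | 80 | 80 | 74 | 83.7%
Llama 3.1 8B | 94 | 93 | 90 | 75 | 67 | 83.5%
Gemini 2.5 Flash Lite | 100 | 92 | 78 | 74 | 71 | 82.9%
Z.AI GLM 4.6 | 95 | 89 | 83 | 77 | 65 | 81.9%
Mistral Large 2 | 91 | 86 | 84 | 82 | 64 | 81.4%
Arcee AI: Trinity Large (Preview) | 100 | 90 | 75 | 75 | 66 | 81.1%
Claude Sonnet 4 | 88 | 83 | 82 | 77 | 75 | 81.1%
Grok 4.20 (Beta) | 90 | 82 | 81 | 77 | 74 | 80.7%
MiniMax M2.5 | 88 | 87 | 82 | 78 | 65 | 80.0%
Gemma 3 4B | 91 | 86 | 75 | 74 | 73 | 79.8%
DeepSeek V3 (2024-12-26) | 84 | 83 | 81 | 79 | 70 | 79.5%
Gemma 3 27B | 87 | 84 | 80 | 77 | 67 | 78.8%
Gemma 3 12B | 88 | 80 | 78 | 78 | 67 | 78.1%
GPT-4o Mini (temp=1) | 100 | 82 | 76 | 68 | 64 | 78.0%
Claude 3.7 Sonnet | 94 | 76 | 74 | 72 | 71 | 77.3%
Gemini 2.5 Flash (Reasoning) | 88 | 82 | 79 | 74 | 63 | 77.3%
GPT-4.1 Mini | 90 | 77 | 71 | 71 | 69 | 75.7%
Hermes 3 405B | 87 | 84 | 76 | 69 | 60 | 75.1%
Z.AI GLM 4.5 | 82 | 77 | 76 | 71 | 66 | 74.3%
Gemini 2.5 Flash Lite (Reasoning) | 84 | 80 | 72 | 70 | 62 | 73.5%
GPT-4.1 Nano | 75 | 70 | 67 | 55 | 52 | 63.9%
Claude 3.5 Haiku | 81 | 79 | 57 | 55 | 47 | 63.8%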
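Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 97 | 99.4%
Inception Mercury | 100 | 100 | 100 | 99 | 98 | 99.4%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 97 | 96 | 98.4%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 98 | 97 | 95 | 98.0%
GPT-5.4 (Reasoning, Low) | 98 | 98 | 98 | 96 | 95 | 96.9%
GPT-5.2 | 100 | 98 | 96 | 96 | 94 | 96.7%
Grok 4 Fast | 100 | 97 | 96 | 96 | 94 | 96.6%
GPT-5.4 (Reasoning) | 100 | 98 | 96 | 96 | 93 | 96.5%
Z.AI GLM 5 Turbo | 100 | 100 | 97 | 93 | 92 | 96.5%
Qwen 3.5 122B | 100 | 100 | 94 | 94 | 93 | 96.2%
ByteDance Seed 2.0 Lite | 100 | 96 | 96 | 96 | 92 | 95.9%
Qwen 3.5 9B | 100 | 100 | 96 | 94 | 89 | 95.7%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 92 | 86 | 95.7%
Claude Opus 4.6 (Reasoning) | 98 | 97 | 97 | 94 | 92 | 95.6%
ByteDance Seed 1.6 | 100 | 100 | 95 | 92 | 91 | 95.6%
GPT-5 | 100 | 100 | 96 | 94 | 87 | 95.5%
GPT-4.1 | 100 | 100 | 96 | 95 | 87 | 95.5%
GPT-5.4 | 100 | 96 | 95 | 93 | 93 | 95.5%
Claude Opus 4.6 | 97 | 96 | 95 | 94 | 94 | 95.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 95 | 92 | 92 | 95.3%
Qwen 3.5 27B | 100 | 97 | 96 | 95 | 87 | 94.9%
Qwen 3.5 Flash | 100 | 100 | 97 | 89 | 88 | 94.9%
GPT-5.4 Mini (Reasoning, Low) | 97 | 97 | 94 | 93 | 93 | 94.8%
Qwen 3.5 35B | 97 | 96 | 96 | 94 | 89 | 94.3%
o4 Mini High | 100 | 97 | 96 | 92 | 85 | 93.9%
Grok 4 | 97 | 94 | 93 | 92 | 90 | 93.1%
GPT-5.4 Mini | 100 | 96 | 93 | 88 | 88 | 93.1%
Inception Mercury 2 | 100 | 97 | 92 | 89 | 88 | 93.0%
Qwen3 235B A22B Instruct 2507 | 100 | 96 | 96 | 86 | 86 | 92.8%
GPT-5.4 Nano (Reasoning, Low) | 96 | 94 | 92 | 92 | 89 | 92.6%
GPT-5.1 | 97 | 95 | 94 | 90 | 88 | 92.5%
Gemini 3 Flash (Preview) | 97 | 97 | 95 | 87 | 85 | 92.2%
Arcee AI: Trinity Mini | 95 | 94 | 94 | 89 | 88 | 92.1%
Qwen 2.5 72B | 100 | 100 | 87 | 86 | 86 | 91.8%
Qwen 3.5 397B A17B | 97 | 95 | 92 | 88 | 87 | 91.8%
Mistral Large | 100 | 91 | 90 | 90 | 89 | 91.7%
Gemini 3 Flash (Preview, Reasoning) | 97 | 91 | 90 | 90 | 88 | 91.2%
Stealth: Aurora Alpha | 100 | 97 | 89 | 88 | 83 | 91.1%
Ministral 3B | 100 | 100 | 90 | 84 | 79 | 90.8%
Z.AI GLM 4.7 Flash | 96 | 95 | 94 | 86 | 81 | 90.4%
MoonshotAI: Kimi K2.5 | 100 | 94 | 88 | 86 | 83 | 90.2%
Gemini 3.1 Flash Lite (Preview) | 96 | 95 | 88 | 86 | 85 | 90.2%
Nemotron 3 Nano | 96 | 96 | 89 | 88 | 82 | 90.0%
GPT-5 Nano | 92 | 92 | 91 | 88 | 86 | 89.8%
DeepSeek V3 (2025-03-24) | 100 | 100 | 94 | 82 | 73 | 89.7%
Ministral 3 14B | 100 | 93 | 88 | 87 | 79 | 89.5%
GPT-5 Mini | 93 | 93 | 91 | 87 | 81 | 89.2%
Mistral Large 3 | 96 | 94 | 94 | 81 | 79 | 89.0%
DeepSeek V3.2 | 93 | 92 | 88 | 87 | 85 | 89.0%
Mistral Small 4 (Reasoning) | 91 | 90 | 90 | 88 | 85 | 88.8%
Stealth: Hunter Alpha | 97 | 92 | 90 | 86 | 77 | 88.6%
Aion 2.0 | 100 | 93 | 85 | 84 | 80 | 88.5%
Llama 3.1 8B | 96 | 95 | 93 | 88 | 68 | 88.0%
Z.AI GLM 5 | 96 | 94 | 88 | 83 | 79 | 88.0%
GPT-4o, Aug. 6th (temp=0) | 95 | 94 | 88 | 86 | 76 | 88.0%
Claude Opus 4 | 90 | 90 | 90 | 87 | 83 | 87.9%
Gemini 2.5 Pro | 92 | 88 | 87 | 87 | 85 | 87.9%
GPT-5.4 Nano (Reasoning) | 93 | 90 | 89 | 84 | 82 | 87.8%
GPT-5.4 Nano | 91 | 89 | 88 | 86 | 84 | 87.7%
Ministral 8B | 100 | 96 | 95 | 73 | 73 | 87.5%
Z.AI GLM 4.7 | 92 | 92 | 90 | 84 | 79 | 87.5%
Hermes 3 405B | 100 | 100 | 89 | 81 | 68 | 87.5%
Grok 4.20 (Beta, Reasoning) | 95 | 91 | 89 | 82 | 78 | 87.0%
Nemotron 3 Super | 92 | 90 | 86 | 84 | 82 | 86.9%
Llama 3.1 Nemotron 70B | 94 | 89 | 88 | 82 | 81 | 86.8%
Gemini 3.1 Pro (Preview) | 100 | 93 | 90 | 82 | 69 | 86.8%
Claude Sonnet 4.5 | 96 | 93 | 86 | 83 | 74 | 86.5%
GPT-4o, Aug. 6th (temp=1) | 91 | 90 | 89 | 86 | 77 | 86.5%
Mistral Small 4 | 100 | 100 | 92 | 74 | 65 | 86.1%
Claude 3.5 Sonnet | 93 | 88 | 86 | 85 | 79 | 85.9%
Mistral Medium 3.1 | 96 | 87 | 85 | 83 | 78 | 85.8%
Stealth: Healer Alpha | 93 | 89 | 86 | 82 | 78 | 85.6%
Rocinante 12B | 91 | 89 | 86 | 81 | 80 | 85.5%
DeepSeek V3 (2024-12-26) | 100 | 95 | 90 | 80 | 63 | 85.5%
Mistral Small Creative | 96 | 90 | 89 | 76 | 75 | 85.2%
Writer: Palmyra X5 | 96 | 89 | 87 | 79 | 75 | 85.1%
Hermes 3 70B | 94 | 85 | 83 | 82 | 80 | 85.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 93 | 84 | 80 | 65 | 84.5%
GPT-4o, May 13th (temp=1) | 100 | 92 | 90 | 76 | 64 | 84.4%
MiniMax M2.5 | 93 | 86 | 86 | 79 | 78 | 84.4%
Mistral Large 2 | 96 | 90 | 84 | 77 | 72 | 84.0%
Cohere Command R+ (Aug. 2024) | 100 | 88 | 78 | 77 | 76 | 83.9%
GPT-4o Mini (temp=0) | 88 | 88 | 87 | 78 | 77 | 83.8%
Grok 4.20 (Beta) | 87 | 86 | 82 | 82 | 81 | 83.7%
Ministral 3 3B | 90 | 89 | 89 | 76 | 74 | 83.7%
Gemini 3 Pro (Preview) | 94 | 88 | 85 | 76 | 75 | 83.6%
DeepSeek-V2 Chat | 100 | 90 | 79 | 75 | 73 | 83.4%
DeepSeek V3.1 | 88 | 87 | 85 | 78 | 77 | 83.1%
GPT-4.1 Mini | 100 | 87 | 86 | 85 | 57 | 82.9%
Ministral 3 8B | 96 | 92 | 84 | 73 | 70 | 82.8%
Qwen 3 32B | 94 | 93 | 90 | 69 | 66 | 82.5%
Claude Sonnet 4 | 89 | 87 | 87 | 76 | 72 | 82.3%
Claude Haiku 4.5 | 93 | 82 | 79 | 79 | 78 | 82.3%
Llama 3.1 70B | 100 | 82 | 80 | 75 | 74 | 82.2%
Mistral NeMO | 94 | 85 | 82 | 75 | 74 | 81.9%
WizardLM 2 8x22b | 92 | 85 | 80 | 77 | 69 | 80.6%
MiniMax M2.7 | 88 | 84 | 83 | 80 | 66 | 80.2%
Claude 3.7 Sonnet | 90 | 86 | 80 | 76 | 69 | 80.1%
Claude Sonnet 4.6 | 87 | 85 | 80 | 76 | 71 | 79.9%
GPT-4o, May 13th (temp=0) | 89 | 81 | 80 | 75 | 73 | 79.8%
Claude Opus 4.5 | 86 | 85 | 77 | 74 | 72 | 78.7%
Gemini 2.5 Flash Lite (Reasoning) | 89 | 80 | 75 | 72 | 71 | 77.4%
Z.AI GLM 4.6 | 89 | 78 | 75 | 74 | 70 | 77.3%
Gemini 2.5 Flash | 86 | 84 | 81 | 77 | 58 | 77.2%
GPT-4o Mini (temp=1) | 85 | 82 | 76 | 72 | 71 | 77.1%
Gemini 2.5 Flash Lite | 84 | 82 | 81 | 73 | 63 | 77.0%
Mistral Small 3.2 24B | 99 | 81 | 78 | 77 | 49 | 76.9%
Gemma 3 27B | 88 | 82 | 75 | 75 | 65 | 76.9%
Z.AI GLM 4.5 | 89 | 86 | 77 | 67 | 65 | 76.9%
LFM2 24B | 94 | 93 | 82 | 66 | 49 | 76.8%
Claude 3 Haiku | 82 | 78 | 73 | 71 | 66 | 73.9%
Gemma 3 12B | 81 | 77 | 74 | 67 | 65 | 72.8%
Gemini 2.5 Flash (Reasoning) | 77 | 77 | 75 | 65 | 63 | 71.2%
GPT-4.1 Nano | 79 | 72 | 69 | 62 | 54 | 67.3%
Arcee AI: Trinity Large (Preview) | 88 | 82 | 63 | 57 | 45 | 67.1%
Claude 3.5 Haiku | 78 | 74 | 69 | 53 | 0 | 55.0%
Gemma 3 4B | 70 | 66 | 45 | 42 | 40 | 52.5%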
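Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 99 | 99.8%
o4 Mini | 100 | 100 | 100 | 100 | 97 | 99.4%
o4 Mini High | 100 | 100 | 100 | 100 | 97 | 99.4%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 95 | 98.9%
GPT-5 | 100 | 100 | 99 | 98 | 97 | 98.9%
Claude Opus 4.6 | 100 | 100 | 100 | 97 | 97 | 98.7%
GPT-5.1 | 100 | 98 | 98 | 98 | 98 | 98.6%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 98 | 98 | 97 | 98.6%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 93 | 98.5%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 97 | 95 | 98.3%
Qwen 3.5 27B | 100 | 100 | 100 | 96 | 95 | 98.3%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 97 | 97 | 97 | 98.2%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 98 | 98 | 96 | 98.2%
Grok 4 Fast | 100 | 100 | 97 | 97 | 97 | 98.2%
GPT-5.2 | 100 | 100 | 98 | 98 | 95 | 98.1%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 90 | 98.1%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 96 | 93 | 97.8%
GPT-4.1 | 100 | 100 | 100 | 96 | 93 | 97.8%
Qwen 3 32B | 100 | 100 | 98 | 96 | 95 | 97.6%
GPT-5.4 Mini | 98 | 98 | 98 | 98 | 94 | 97.0%
GPT-5 Mini | 98 | 98 | 98 | 96 | 94 | 96.8%
ByteDance Seed 1.6 Flash | 100 | 100 | 96 | 96 | 92 | 96.7%
Ministral 3B | 100 | 100 | 100 | 93 | 90 | 96.7%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 83 | 96.7%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 92 | 91 | 96.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 96 | 95 | 92 | 96.5%
Ministral 8B | 100 | 100 | 100 | 92 | 90 | 96.4%
Z.AI GLM 5 | 100 | 100 | 100 | 93 | 89 | 96.3%
Aion 2.0 | 100 | 100 | 96 | 92 | 92 | 96.0%
Mistral Large 2 | 100 | 100 | 96 | 95 | 89 | 96.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 96 | 93 | 92 | 95.7%
Mistral Small 4 (Reasoning) | 100 | 97 | 96 | 95 | 90 | 95.7%
Claude Sonnet 4 | 100 | 96 | 95 | 95 | 92 | 95.7%
GPT-5.4 Nano (Reasoning) | 98 | 98 | 98 | 93 | 91 | 95.6%
GPT-5.4 Nano (Reasoning, Low) | 98 | 97 | 96 | 94 | 93 | 95.6%
Qwen 3.5 Flash | 100 | 97 | 95 | 95 | 91 | 95.6%
Claude Opus 4.5 | 100 | 100 | 96 | 96 | 86 | 95.5%
Stealth: Hunter Alpha | 100 | 100 | 97 | 92 | 88 | 95.4%
Qwen 3.5 122B | 100 | 97 | 95 | 93 | 91 | 95.2%
GPT-5 Nano | 97 | 97 | 94 | 94 | 94 | 95.2%
ByteDance Seed 2.0 Mini | 100 | 100 | 95 | 93 | 87 | 95.2%
Hermes 3 70B | 100 | 100 | 100 | 94 | 81 | 95.0%
Writer: Palmyra X5 | 100 | 96 | 95 | 94 | 89 | 95.0%
Gemini 3 Flash (Preview) | 100 | 97 | 97 | 92 | 90 | 95.0%
Claude Sonnet 4.5 | 100 | 96 | 96 | 95 | 87 | 95.0%
Grok 4 | 97 | 97 | 96 | 94 | 90 | 94.8%
Stealth: Aurora Alpha | 97 | 96 | 96 | 96 | 87 | 94.5%
WizardLM 2 8x22b | 100 | 96 | 96 | 91 | 90 | 94.4%
Ministral 3 14B | 100 | 96 | 95 | 93 | 88 | 94.3%
Qwen 3.5 35B | 100 | 100 | 93 | 92 | 86 | 94.3%
Qwen3 235B A22B Instruct 2507 | 100 | 96 | 96 | 92 | 88 | 94.2%
Mistral Medium 3.1 | 100 | 97 | 95 | 90 | 90 | 94.2%
Qwen 3.5 397B A17B | 100 | 93 | 93 | 92 | 92 | 94.2%
Z.AI GLM 4.7 Flash | 100 | 100 | 92 | 89 | 88 | 93.8%
Gemini 3 Pro (Preview) | 97 | 96 | 96 | 94 | 86 | 93.7%
Claude Sonnet 4.6 | 100 | 100 | 100 | 91 | 77 | 93.6%
Mistral Small 4 | 95 | 95 | 95 | 92 | 90 | 93.4%
Stealth: Healer Alpha | 100 | 96 | 94 | 94 | 82 | 93.2%
Z.AI GLM 4.6 | 100 | 96 | 96 | 95 | 79 | 93.2%
GPT-5.4 Nano | 98 | 95 | 93 | 91 | 89 | 93.0%
Gemini 3 Flash (Preview, Reasoning) | 97 | 97 | 94 | 93 | 84 | 92.9%
DeepSeek-V2 Chat | 95 | 94 | 94 | 93 | 87 | 92.9%
Gemini 2.5 Pro | 100 | 95 | 94 | 91 | 84 | 92.8%
Rocinante 12B | 100 | 100 | 94 | 90 | 80 | 92.7%
Gemma 3 27B | 100 | 95 | 94 | 92 | 81 | 92.5%
Gemini 3.1 Pro (Preview) | 100 | 97 | 96 | 92 | 76 | 92.4%
Llama 3.1 Nemotron 70B | 100 | 100 | 94 | 84 | 83 | 92.3%
MiniMax M2.7 | 97 | 96 | 94 | 92 | 82 | 92.1%
MiniMax M2.5 | 96 | 92 | 91 | 90 | 90 | 91.9%
DeepSeek V3.2 | 96 | 93 | 91 | 91 | 88 | 91.9%
Nemotron 3 Nano | 94 | 94 | 91 | 90 | 89 | 91.8%
Claude 3.5 Sonnet | 100 | 100 | 94 | 86 | 77 | 91.4%
Ministral 3 8B | 97 | 96 | 92 | 86 | 86 | 91.4%
Claude Sonnet 4.6 (Reasoning) | 100 | 96 | 91 | 85 | 85 | 91.4%
Gemini 3.1 Flash Lite (Preview) | 92 | 92 | 91 | 91 | 89 | 91.3%
Llama 3.1 70B | 100 | 92 | 92 | 88 | 84 | 91.1%
GPT-4o Mini (temp=1) | 100 | 91 | 88 | 87 | 87 | 90.9%
Llama 3.1 8B | 100 | 95 | 91 | 90 | 77 | 90.8%
Inception Mercury 2 | 96 | 94 | 94 | 85 | 85 | 90.8%
Grok 4.20 (Beta, Reasoning) | 94 | 93 | 91 | 88 | 86 | 90.3%
Mistral Small Creative | 95 | 95 | 90 | 88 | 82 | 90.2%
Nemotron 3 Super | 100 | 96 | 92 | 82 | 81 | 90.1%
Claude Opus 4 | 96 | 94 | 90 | 87 | 82 | 89.8%
Cohere Command R+ (Aug. 2024) | 95 | 94 | 94 | 89 | 76 | 89.7%
Z.AI GLM 4.7 | 97 | 89 | 88 | 87 | 85 | 89.2%
GPT-4o, Aug. 6th (temp=0) | 95 | 90 | 87 | 86 | 86 | 88.7%
Claude Haiku 4.5 | 92 | 92 | 87 | 86 | 85 | 88.6%
LFM2 24B | 95 | 94 | 93 | 84 | 77 | 88.5%
Claude 3.5 Haiku | 100 | 100 | 86 | 86 | 70 | 88.3%
Mistral Large 3 | 91 | 90 | 88 | 87 | 84 | 88.0%
Mistral Large | 100 | 94 | 84 | 81 | 81 | 88.0%
GPT-4o Mini (temp=0) | 96 | 96 | 86 | 81 | 78 | 87.4%
Arcee AI: Trinity Large (Preview) | 94 | 92 | 86 | 82 | 80 | 86.8%
GPT-4.1 Mini | 100 | 90 | 89 | 84 | 70 | 86.6%
DeepSeek V3 (2024-12-26) | 92 | 88 | 87 | 87 | 79 | 86.5%
DeepSeek V3.1 | 93 | 89 | 86 | 83 | 81 | 86.4%
Gemini 2.5 Flash | 91 | 89 | 88 | 84 | 79 | 86.3%
Mistral Small 3.2 24B | 100 | 97 | 93 | 80 | 62 | 86.2%
Gemini 2.5 Flash Lite | 95 | 91 | 85 | 83 | 77 | 86.0%
Qwen 2.5 72B | 94 | 91 | 88 | 82 | 76 | 86.0%
GPT-4o, May 13th (temp=1) | 92 | 91 | 88 | 81 | 78 | 86.0%
Gemini 2.5 Flash (Reasoning) | 90 | 90 | 84 | 84 | 80 | 85.6%
Gemini 2.5 Flash Lite (Reasoning) | 95 | 88 | 88 | 85 | 71 | 85.5%
Mistral NeMO | 100 | 93 | 86 | 82 | 62 | 84.4%
Gemma 3 4B | 92 | 92 | 89 | 78 | 69 | 83.8%
Z.AI GLM 4.5 | 96 | 89 | 88 | 78 | 61 | 82.6%
GPT-4o, Aug. 6th (temp=1) | 96 | 91 | 84 | 72 | 70 | 82.5%
Grok 4.20 (Beta) | 91 | 82 | 80 | 80 | 78 | 82.4%
Gemma 3 12B | 96 | 89 | 79 | 77 | 71 | 82.2%
Claude 3 Haiku | 90 | 88 | 87 | 73 | 69 | 81.4%
GPT-4o, May 13th (temp=0) | 100 | 88 | 84 | 77 | 59 | 81.3%
Hermes 3 405B | 87 | 85 | 82 | 79 | 69 | 80.1%
GPT-4.1 Nano | 90 | 87 | 76 | 70 | 69 | 78.6%
Claude 3.7 Sonnet | 87 | 82 | 73 | 69 | 59 | 74.1%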

genre

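Model | #1 | #2 | #3 | #4 | #5 | Avg
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 97 | 96 | 93 | 97.2%
GPT-5 Mini | 100 | 100 | 96 | 95 | 94 | 97.0%
ByteDance Seed 2.0 Lite | 100 | 97 | 97 | 94 | 93 | 96.2%
ByteDance Seed 1.6 Flash | 100 | 98 | 95 | 94 | 91 | 95.4%
o4 Mini | 100 | 100 | 93 | 92 | 89 | 94.9%
Grok 4.1 Fast | 100 | 97 | 97 | 94 | 85 | 94.6%
Qwen 3.5 Flash | 100 | 96 | 93 | 93 | 89 | 94.2%
Grok 4 Fast | 97 | 95 | 94 | 94 | 87 | 93.5%
Gemini 3 Flash (Preview) | 100 | 97 | 93 | 90 | 87 | 93.4%
Gemini 2.5 Pro | 100 | 94 | 93 | 89 | 89 | 93.0%
ByteDance Seed 2.0 Mini | 97 | 94 | 94 | 91 | 88 | 92.7%
Gemini 3.1 Flash Lite (Preview) | 96 | 92 | 92 | 92 | 89 | 92.2%
MoonshotAI: Kimi K2.5 | 97 | 97 | 95 | 92 | 78 | 91.7%
Mistral Large | 100 | 100 | 91 | 86 | 82 | 91.6%
GPT-5 | 98 | 94 | 92 | 89 | 82 | 91.3%
Stealth: Aurora Alpha | 96 | 95 | 89 | 89 | 87 | 91.1%
GPT-5 Nano | 95 | 93 | 91 | 90 | 86 | 91.0%
Inception Mercury 2 | 98 | 94 | 91 | 90 | 83 | 90.9%
Mistral Large 3 | 96 | 96 | 91 | 86 | 86 | 90.8%
Mistral Small 3.2 24B | 100 | 100 | 99 | 79 | 74 | 90.4%
Z.AI GLM 5 | 94 | 94 | 89 | 88 | 86 | 90.2%
Qwen 3.5 35B | 97 | 96 | 92 | 87 | 79 | 89.9%
Qwen 3.5 9B | 100 | 96 | 92 | 83 | 79 | 89.9%
Mistral Large 2 | 100 | 96 | 91 | 88 | 75 | 89.9%
Gemma 3 27B | 95 | 91 | 90 | 87 | 85 | 89.7%
Qwen 3.5 397B A17B | 96 | 92 | 89 | 86 | 83 | 89.4%
Stealth: Hunter Alpha | 97 | 94 | 89 | 84 | 81 | 89.0%
Gemini 3 Pro (Preview) | 94 | 91 | 91 | 86 | 83 | 88.9%
Claude Opus 4.6 | 100 | 94 | 93 | 81 | 77 | 88.9%
Qwen 3.5 27B | 96 | 93 | 87 | 85 | 82 | 88.7%
GPT-5.4 Nano (Reasoning) | 95 | 89 | 88 | 86 | 84 | 88.4%
GPT-5.4 Nano (Reasoning, Low) | 94 | 90 | 88 | 85 | 84 | 88.3%
Grok 4 | 92 | 91 | 89 | 88 | 80 | 88.0%
Qwen 3.5 122B | 96 | 92 | 91 | 84 | 78 | 88.0%
GPT-4.1 | 92 | 90 | 89 | 85 | 83 | 87.9%
Ministral 3 3B | 94 | 91 | 88 | 84 | 81 | 87.7%
Nemotron 3 Super | 92 | 91 | 90 | 86 | 79 | 87.6%
Claude Sonnet 4.5 | 100 | 92 | 91 | 85 | 68 | 87.2%
Stealth: Healer Alpha | 97 | 92 | 88 | 81 | 78 | 87.2%
GPT-5.4 Mini (Reasoning) | 90 | 89 | 89 | 85 | 82 | 87.0%
Claude Opus 4.6 (Reasoning) | 94 | 92 | 85 | 84 | 80 | 87.0%
Gemini 3 Flash (Preview, Reasoning) | 93 | 91 | 85 | 83 | 83 | 87.0%
GPT-5.4 Nano | 91 | 87 | 87 | 85 | 82 | 86.7%
Qwen 2.5 72B | 95 | 93 | 85 | 82 | 76 | 86.2%
Z.AI GLM 4.7 Flash | 89 | 88 | 87 | 86 | 81 | 86.1%
Grok 4.20 (Beta, Reasoning) | 89 | 88 | 86 | 84 | 83 | 86.1%
Qwen3 235B A22B Instruct 2507 | 88 | 88 | 86 | 86 | 82 | 86.0%
GPT-5.4 | 88 | 87 | 87 | 85 | 82 | 85.9%
LFM2 24B | 96 | 92 | 85 | 83 | 73 | 85.7%
GPT-5.4 (Reasoning) | 90 | 89 | 85 | 83 | 82 | 85.7%
WizardLM 2 8x22b | 100 | 86 | 86 | 85 | 71 | 85.5%
Nemotron 3 Nano | 93 | 85 | 85 | 82 | 82 | 85.5%
MiniMax M2.7 | 95 | 93 | 86 | 80 | 74 | 85.4%
Z.AI GLM 5 Turbo | 96 | 93 | 92 | 80 | 67 | 85.4%
Qwen 3 32B | 95 | 90 | 82 | 81 | 79 | 85.4%
Ministral 3 14B | 96 | 95 | 89 | 82 | 61 | 84.8%
Llama 3.1 Nemotron 70B | 92 | 88 | 88 | 84 | 72 | 84.8%
Ministral 8B | 92 | 87 | 87 | 84 | 73 | 84.7%
Ministral 3B | 100 | 93 | 92 | 86 | 52 | 84.5%
Qwen 3.5 Plus (2026-02-15) | 94 | 85 | 85 | 81 | 76 | 84.2%
GPT-5.4 Mini (Reasoning, Low) | 88 | 87 | 85 | 83 | 78 | 84.2%
GPT-5.2 | 89 | 88 | 82 | 82 | 80 | 84.0%
MiniMax M2.5 | 89 | 86 | 86 | 83 | 75 | 83.8%
Claude Haiku 4.5 | 93 | 91 | 84 | 78 | 73 | 83.7%
DeepSeek V3 (2025-03-24) | 92 | 90 | 87 | 78 | 70 | 83.6%
Claude Opus 4.5 | 94 | 89 | 81 | 79 | 75 | 83.5%
Mistral Medium 3.1 | 94 | 88 | 84 | 77 | 74 | 83.4%
Ministral 3 8B | 91 | 87 | 83 | 80 | 75 | 83.3%
Aion 2.0 | 92 | 85 | 83 | 83 | 73 | 83.2%
Rocinante 12B | 100 | 89 | 88 | 70 | 68 | 83.0%
GPT-5.1 | 88 | 86 | 82 | 80 | 78 | 82.9%
DeepSeek V3.2 | 93 | 84 | 81 | 80 | 75 | 82.8%
GPT-4o, Aug. 6th (temp=0) | 87 | 85 | 82 | 81 | 78 | 82.6%
Grok 4.20 (Beta) | 90 | 89 | 83 | 76 | 74 | 82.4%
GPT-4o, Aug. 6th (temp=1) | 91 | 91 | 90 | 77 | 62 | 82.2%
Gemini 2.5 Flash Lite | 90 | 86 | 85 | 82 | 68 | 82.1%
Llama 3.1 8B | 92 | 89 | 88 | 74 | 66 | 81.8%
Mistral Small Creative | 95 | 84 | 79 | 78 | 73 | 81.8%
GPT-4o, May 13th (temp=1) | 92 | 84 | 80 | 78 | 74 | 81.7%
Gemma 3 4B | 90 | 85 | 85 | 76 | 71 | 81.6%
Arcee AI: Trinity Mini | 95 | 84 | 83 | 73 | 73 | 81.5%
Cohere Command R+ (Aug. 2024) | 89 | 87 | 80 | 80 | 71 | 81.3%
Gemma 3 12B | 91 | 83 | 82 | 80 | 69 | 81.1%
Mistral Small 4 | 91 | 90 | 81 | 76 | 68 | 81.1%
Claude Sonnet 4.6 | 96 | 86 | 77 | 76 | 70 | 81.0%
Arcee AI: Trinity Large (Preview) | 89 | 85 | 80 | 76 | 74 | 80.9%
Z.AI GLM 4.6 | 88 | 87 | 84 | 74 | 70 | 80.7%
Claude 3.5 Haiku | 100 | 80 | 80 | 76 | 66 | 80.5%
GPT-4o, May 13th (temp=0) | 91 | 87 | 82 | 80 | 61 | 80.1%
Hermes 3 405B | 89 | 83 | 82 | 73 | 73 | 79.9%
Claude Opus 4 | 87 | 81 | 77 | 76 | 76 | 79.4%
Hermes 3 70B | 95 | 84 | 76 | 73 | 70 | 79.4%
Gemini 2.5 Flash Lite (Reasoning) | 83 | 82 | 79 | 78 | 73 | 78.9%
Z.AI GLM 4.7 | 84 | 84 | 82 | 75 | 69 | 78.9%
Mistral Small 4 (Reasoning) | 91 | 87 | 84 | 67 | 65 | 78.8%
GPT-5.4 Mini | 89 | 84 | 75 | 73 | 72 | 78.6%
Writer: Palmyra X5 | 83 | 83 | 80 | 78 | 69 | 78.6%
Mistral NeMO | 86 | 84 | 84 | 75 | 61 | 78.1%
DeepSeek V3.1 | 87 | 83 | 81 | 71 | 66 | 77.7%
Claude Sonnet 4.6 (Reasoning) | 87 | 78 | 76 | 72 | 72 | 77.1%
Claude 3 Haiku | 85 | 81 | 76 | 76 | 66 | 76.9%
GPT-4.1 Mini | 92 | 73 | 73 | 72 | 72 | 76.4%
DeepSeek-V2 Chat | 88 | 84 | 81 | 70 | 58 | 76.2%
Claude 3.5 Sonnet | 89 | 87 | 78 | 69 | 58 | 76.1%
Z.AI GLM 4.5 | 86 | 83 | 77 | 77 | 57 | 76.0%
Claude Sonnet 4 | 86 | 86 | 72 | 68 | 67 | 75.8%
GPT-5.4 (Reasoning, Low) | 83 | 82 | 75 | 70 | 68 | 75.4%
Gemini 3.1 Pro (Preview) | 86 | 80 | 72 | 70 | 69 | 75.4%
DeepSeek V3 (2024-12-26) | 78 | 77 | 73 | 72 | 72 | 74.6%
GPT-4o Mini (temp=1) | 81 | 79 | 77 | 75 | 59 | 74.1%
Inception Mercury | 99 | 70 | 69 | 69 | 57 | 72.9%
Gemini 2.5 Flash | 81 | 75 | 72 | 72 | 64 | 72.6%
Gemini 2.5 Flash (Reasoning) | 89 | 76 | 72 | 63 | 62 | 72.2%
Llama 3.1 70B | 86 | 78 | 76 | 72 | 45 | 71.3%
Claude 3.7 Sonnet | 81 | 72 | 68 | 67 | 64 | 70.4%
GPT-4o Mini (temp=0) | 77 | 75 | 74 | 55 | 50 | 66.1%
GPT-4.1 Nano | 65 | 60 | 60 | 48 | 41 | 54.8%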
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Qwen 3.5 9B 100 100 100 94 93 97.5%
ByteDance Seed 1.6 100 100 100 95 90 97.1%
o4 Mini High 100 100 97 97 92 97.0%
ByteDance Seed 2.0 Lite 100 97 97 97 94 96.8%
Gemini 3 Flash (Preview) 100 97 97 93 90 95.3%
Qwen 3 32B 100 95 95 94 89 94.7%
ByteDance Seed 1.6 Flash 97 94 94 93 93 94.2%
MoonshotAI: Kimi K2.5 100 96 95 89 88 93.6%
Qwen 3.5 397B A17B 100 93 92 92 91 93.4%
o4 Mini 100 94 93 93 87 93.4%
Gemini 3 Pro (Preview) 95 95 92 91 90 92.6%
ByteDance Seed 2.0 Mini 95 95 95 92 86 92.5%
Qwen 3.5 27B 96 93 93 90 89 92.1%
Arcee AI: Trinity Large (Preview) 97 96 95 87 85 92.1%
Mistral NeMO 100 100 91 86 83 91.8%
GPT-5 Mini 97 93 92 91 85 91.6%
Mistral Small 4 93 93 92 90 89 91.3%
Stealth: Aurora Alpha 98 95 89 88 87 91.3%
GPT-5 95 95 91 91 83 91.1%
LFM2 24B 100 95 89 86 86 91.1%
Mistral Medium 3.1 97 93 92 88 83 90.7%
Qwen 3.5 Flash 94 93 93 90 84 90.7%
Grok 4.1 Fast 96 92 90 89 86 90.4%
Gemini 3 Flash (Preview, Reasoning) 94 94 93 88 81 89.9%
GPT-5 Nano 94 92 90 89 85 89.9%
GPT-4.1 100 89 89 86 85 89.8%
Nemotron 3 Nano 98 92 90 86 85 89.8%
Qwen 3.5 35B 93 93 92 86 83 89.5%
Inception Mercury 100 100 99 97 52 89.5%
Gemini 3.1 Flash Lite (Preview) 96 96 85 85 84 89.4%
MiniMax M2.7 95 90 89 88 86 89.3%
Z.AI GLM 4.7 Flash 92 92 92 86 85 89.3%
MiniMax M2.5 93 93 90 89 81 89.2%
Qwen 2.5 72B 100 96 88 86 76 89.0%
Z.AI GLM 5 97 97 91 82 76 88.5%
Grok 4 Fast 94 91 88 85 84 88.4%
Qwen 3.5 122B 93 89 88 87 85 88.4%
Z.AI GLM 4.7 94 90 87 86 85 88.4%
Ministral 8B 94 93 92 86 76 88.3%
Grok 4 94 90 88 86 82 88.1%
Llama 3.1 8B 93 92 88 87 78 87.9%
Mistral Small Creative 96 92 89 83 78 87.7%
Z.AI GLM 4.6 97 92 86 85 78 87.4%
Arcee AI: Trinity Mini 96 94 91 87 68 87.3%
Mistral Large 3 95 93 89 81 78 87.0%
Mistral Large 94 93 87 82 79 87.0%
GPT-4o, May 13th (temp=1) 93 92 89 84 77 86.9%
Z.AI GLM 5 Turbo 94 93 89 85 74 86.9%
Rocinante 12B 91 91 87 86 77 86.6%
Qwen 3.5 Plus (2026-02-15) 91 89 88 82 82 86.5%
Gemini 3.1 Pro (Preview) 91 88 86 86 81 86.4%
Hermes 3 70B 100 93 85 82 73 86.4%
Mistral Small 3.2 24B 100 91 87 78 75 86.3%
Stealth: Healer Alpha 94 90 84 83 80 86.1%
Mistral Small 4 (Reasoning) 90 88 87 86 79 85.8%
Inception Mercury 2 92 90 83 83 81 85.8%
GPT-5.4 Nano 91 91 85 81 80 85.6%
Claude Sonnet 4.5 100 91 88 78 71 85.5%
DeepSeek V3.1 90 89 85 83 79 85.4%
Claude 3.7 Sonnet 97 94 84 78 74 85.4%
DeepSeek V3 (2025-03-24) 93 92 80 80 80 85.0%
Claude Sonnet 4 93 91 86 78 77 85.0%
GPT-5.4 Mini 92 90 83 81 79 84.9%
Ministral 3 3B 95 90 81 80 79 84.8%
Ministral 3 8B 93 92 85 82 71 84.5%
Claude Opus 4.5 94 87 84 83 75 84.4%
Ministral 3B 88 86 85 83 80 84.3%
Nemotron 3 Super 89 88 83 81 80 84.1%
Claude Opus 4 94 86 82 82 76 84.1%
Mistral Large 2 95 94 82 76 73 84.1%
GPT-5.2 86 84 84 84 82 83.9%
GPT-5.4 (Reasoning, Low) 89 88 88 78 76 83.8%
GPT-4o Mini (temp=1) 93 88 84 80 75 83.8%
Stealth: Hunter Alpha 93 93 85 77 71 83.8%
GPT-4o, Aug. 6th (temp=1) 92 89 87 86 65 83.7%
WizardLM 2 8x22b 90 88 83 81 76 83.6%
Ministral 3 14B 94 90 86 83 65 83.6%
Aion 2.0 91 83 82 80 79 83.2%
Gemma 3 12B 96 90 87 77 66 83.1%
DeepSeek V3.2 88 88 83 80 76 82.9%
GPT-5.4 Mini (Reasoning, Low) 93 93 81 75 71 82.6%
GPT-4o, May 13th (temp=0) 91 89 87 73 71 82.4%
Claude Haiku 4.5 90 83 81 81 77 82.2%
Z.AI GLM 4.5 88 87 85 82 68 82.1%
Llama 3.1 70B 93 93 84 76 64 82.1%
Claude Opus 4.6 86 84 84 81 75 82.0%
Qwen3 235B A22B Instruct 2507 92 88 81 74 74 81.9%
GPT-5.4 Nano (Reasoning, Low) 84 83 82 80 79 81.9%
Gemini 2.5 Pro 93 90 80 75 71 81.7%
Writer: Palmyra X5 87 87 85 84 66 81.7%
Cohere Command R+ (Aug. 2024) 97 90 89 67 65 81.5%
GPT-5.4 Mini (Reasoning) 86 86 84 76 76 81.4%
Grok 4.20 (Beta, Reasoning) 88 84 83 81 71 81.4%
GPT-5.4 Nano (Reasoning) 83 82 81 80 79 80.8%
Gemini 2.5 Flash (Reasoning) 92 83 82 80 67 80.7%
Gemini 2.5 Flash Lite 89 87 79 78 68 80.5%
Claude 3 Haiku 91 85 83 72 69 79.9%
Gemini 2.5 Flash 83 81 80 79 76 79.7%
Grok 4.20 (Beta) 85 82 82 80 64 78.7%
DeepSeek V3 (2024-12-26) 93 84 81 70 64 78.5%
GPT-5.4 (Reasoning) 86 81 81 74 69 78.5%
Gemma 3 4B 86 82 78 73 70 77.6%
Claude 3.5 Haiku 100 79 78 70 60 77.3%
Claude Opus 4.6 (Reasoning) 85 80 77 75 67 77.0%
GPT-4o, Aug. 6th (temp=0) 87 83 76 74 65 76.9%
Claude Sonnet 4.6 (Reasoning) 79 78 77 75 74 76.5%
GPT-5.4 82 81 77 71 71 76.5%
GPT-4o Mini (temp=0) 81 77 76 75 73 76.3%
Hermes 3 405B 87 87 84 76 48 76.2%
GPT-5.1 81 78 78 73 67 75.5%
DeepSeek-V2 Chat 88 87 86 59 56 75.4%
Claude 3.5 Sonnet 90 83 77 70 56 75.4%
Gemini 2.5 Flash Lite (Reasoning) 85 83 82 65 60 74.9%
Gemma 3 27B 77 77 77 76 67 74.9%
Llama 3.1 Nemotron 70B 100 83 74 61 55 74.6%
GPT-4.1 Mini 87 77 72 67 66 73.5%
GPT-4.1 Nano 86 81 69 54 52 68.5%
Claude Sonnet 4.6 75 73 69 58 41 63.3%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Inception Mercury 100 99 99 98 97 98.3%
ByteDance Seed 2.0 Lite 97 97 97 97 97 97.3%
ByteDance Seed 1.6 100 98 97 96 95 97.2%
ByteDance Seed 1.6 Flash 100 98 95 95 95 96.5%
Qwen 3.5 9B 100 100 100 92 90 96.4%
Grok 4.1 Fast 97 94 94 94 92 94.1%
GPT-5 Mini 100 98 94 89 88 93.9%
Stealth: Aurora Alpha 98 94 93 92 90 93.5%
Grok 4 Fast 100 95 92 90 89 93.4%
o4 Mini High 100 97 93 90 86 93.0%
Grok 4 97 94 91 89 85 91.5%
Llama 3.1 70B 93 91 91 91 88 90.9%
ByteDance Seed 2.0 Mini 97 93 92 87 85 90.8%
Qwen 3 32B 97 95 95 89 77 90.7%
GPT-4.1 100 97 90 84 80 90.3%
DeepSeek V3 (2025-03-24) 100 95 89 88 76 89.8%
Inception Mercury 2 94 90 89 87 87 89.4%
Nemotron 3 Super 96 90 87 87 86 89.2%
Ministral 3 3B 94 94 89 84 84 89.0%
Mistral Large 3 95 91 89 86 83 89.0%
Z.AI GLM 4.6 93 89 88 87 83 88.0%
Cohere Command R+ (Aug. 2024) 97 96 87 82 78 87.9%
o4 Mini 96 90 87 86 80 87.9%
Grok 4.20 (Beta, Reasoning) 92 92 88 85 80 87.6%
WizardLM 2 8x22b 94 87 86 85 85 87.4%
Mistral Medium 3.1 98 86 85 84 84 87.3%
Ministral 8B 92 91 87 84 82 87.1%
Qwen 3.5 Flash 89 88 87 86 86 87.1%
Stealth: Healer Alpha 91 91 89 88 76 86.9%
Gemini 3 Flash (Preview, Reasoning) 93 91 86 84 81 86.9%
Z.AI GLM 5 Turbo 92 88 86 85 83 86.8%
Gemini 3.1 Flash Lite (Preview) 92 92 92 82 76 86.8%
GPT-5 Nano 92 92 85 85 80 86.8%
Gemini 3 Pro (Preview) 90 86 85 85 84 86.1%
Z.AI GLM 5 90 89 86 85 79 86.1%
Claude Opus 4.6 93 92 83 81 80 85.9%
MoonshotAI: Kimi K2.5 97 94 88 79 71 85.7%
Nemotron 3 Nano 90 88 88 86 76 85.6%
MiniMax M2.7 93 92 91 82 70 85.6%
Rocinante 12B 97 96 90 74 71 85.5%
Llama 3.1 Nemotron 70B 91 91 88 82 75 85.4%
Gemini 3 Flash (Preview) 91 86 84 84 82 85.2%
Stealth: Hunter Alpha 89 87 87 82 78 84.8%
GPT-5.4 (Reasoning, Low) 89 85 84 83 82 84.7%
Qwen 3.5 35B 93 92 87 77 75 84.7%
Claude Opus 4.6 (Reasoning) 89 88 87 81 79 84.7%
Z.AI GLM 4.7 Flash 91 89 86 79 77 84.6%
GPT-5 89 88 83 81 81 84.5%
Qwen 3.5 397B A17B 89 87 84 84 79 84.5%
Mistral Small 4 (Reasoning) 92 87 84 80 77 84.2%
GPT-5.1 87 85 84 82 81 83.8%
Qwen 3.5 27B 92 88 82 78 78 83.8%
GPT-5.4 Nano 89 85 83 82 80 83.7%
Mistral NeMO 92 88 87 76 73 83.3%
Claude Opus 4.5 85 85 84 82 80 83.2%
Mistral Large 2 93 89 85 82 66 83.1%
Llama 3.1 8B 92 88 83 77 76 83.1%
Mistral Large 89 89 85 77 75 83.0%
Aion 2.0 89 86 83 83 75 83.0%
GPT-4o, Aug. 6th (temp=0) 91 85 84 84 70 82.8%
GPT-5.4 Mini 88 85 84 82 75 82.8%
Ministral 3 8B 91 89 82 79 73 82.6%
Grok 4.20 (Beta) 86 84 84 81 76 82.2%
GPT-5.4 Nano (Reasoning, Low) 89 86 84 83 69 82.1%
GPT-5.4 (Reasoning) 86 85 81 80 79 82.1%
GPT-5.4 Nano (Reasoning) 89 81 81 80 78 81.9%
Gemini 2.5 Pro 94 85 80 78 71 81.7%
Claude Sonnet 4.5 88 85 82 77 75 81.6%
Z.AI GLM 4.7 92 84 81 76 75 81.6%
GPT-5.2 90 81 80 79 78 81.5%
Mistral Small 4 92 83 82 79 71 81.5%
DeepSeek V3 (2024-12-26) 88 85 82 81 69 81.2%
Qwen 3.5 122B 96 88 81 72 69 81.2%
GPT-5.4 Mini (Reasoning) 85 82 81 80 77 81.0%
Claude Sonnet 4 87 87 85 74 73 80.9%
DeepSeek V3.2 95 82 80 74 73 80.8%
Claude Opus 4 96 89 84 74 60 80.8%
Qwen 3.5 Plus (2026-02-15) 84 84 79 79 77 80.8%
GPT-5.4 82 82 81 79 79 80.7%
Qwen 2.5 72B 85 84 82 81 69 80.3%
LFM2 24B 92 82 82 80 64 80.1%
GPT-5.4 Mini (Reasoning, Low) 85 82 80 78 75 80.0%
GPT-4o Mini (temp=0) 91 87 86 66 66 79.2%
Gemini 2.5 Flash 92 79 78 76 70 78.9%
Hermes 3 405B 95 85 81 74 57 78.4%
MiniMax M2.5 93 89 72 70 67 78.3%
Ministral 3 14B 92 78 77 73 72 78.2%
Claude Haiku 4.5 81 81 80 78 71 78.2%
Mistral Small Creative 86 79 76 76 74 78.0%
GPT-4.1 Mini 86 85 80 77 61 78.0%
Gemini 2.5 Flash Lite (Reasoning) 88 87 78 69 67 77.9%
GPT-4o, May 13th (temp=0) 90 86 84 73 58 77.9%
Mistral Small 3.2 24B 91 83 74 70 70 77.7%
GPT-4o, May 13th (temp=1) 87 86 81 79 52 77.1%
Z.AI GLM 4.5 85 84 83 71 62 76.7%
DeepSeek V3.1 86 80 73 71 70 76.1%
Ministral 3B 94 75 72 71 66 75.6%
Writer: Palmyra X5 80 80 74 74 68 75.3%
GPT-4o, Aug. 6th (temp=1) 87 83 72 71 62 75.2%
Claude 3.5 Sonnet 100 85 80 69 42 75.0%
Hermes 3 70B 82 80 73 70 67 74.5%
Gemma 3 27B 88 78 75 71 59 74.2%
Claude 3.7 Sonnet 78 76 75 73 69 74.2%
Claude Sonnet 4.6 (Reasoning) 80 77 77 75 56 73.1%
Qwen3 235B A22B Instruct 2507 83 79 74 74 53 72.7%
Gemini 2.5 Flash (Reasoning) 79 75 74 73 62 72.7%
Arcee AI: Trinity Mini 79 72 72 71 66 72.0%
Gemini 2.5 Flash Lite 81 77 76 69 58 72.0%
Arcee AI: Trinity Large (Preview) 79 74 72 67 65 71.4%
GPT-4o Mini (temp=1) 82 76 71 65 59 70.6%
DeepSeek-V2 Chat 80 75 70 66 61 70.1%
Claude Sonnet 4.6 85 82 73 54 54 69.7%
Gemma 3 12B 77 75 74 67 54 69.1%
Claude 3.5 Haiku 84 79 71 60 48 68.4%
Gemini 3.1 Pro (Preview) 85 82 63 62 47 67.9%
Gemma 3 4B 78 78 69 59 51 67.1%
Claude 3 Haiku 78 73 71 51 51 64.7%
GPT-4.1 Nano 70 60 54 45 33 52.4%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
ByteDance Seed 1.6 Flash 100 100 97 96 95 97.6%
ByteDance Seed 2.0 Lite 100 100 100 94 94 97.6%
GPT-4.1 100 97 96 93 93 95.9%
ByteDance Seed 1.6 98 98 97 96 91 95.8%
Qwen 3.5 9B 100 100 100 97 78 94.9%
Grok 4 97 97 97 94 88 94.8%
Stealth: Aurora Alpha 100 95 94 93 91 94.5%
Inception Mercury 2 100 96 95 94 87 94.3%
Grok 4.1 Fast 97 97 93 93 90 94.1%
o4 Mini High 100 97 96 91 86 94.0%
ByteDance Seed 2.0 Mini 97 97 94 93 88 94.0%
GPT-5 97 93 93 93 89 92.9%
Qwen 2.5 72B 100 96 95 95 78 92.7%
Gemini 3 Flash (Preview) 100 94 91 90 88 92.4%
o4 Mini 100 97 96 90 78 92.4%
Z.AI GLM 4.7 Flash 97 93 92 91 88 92.2%
Qwen 3.5 Flash 96 93 93 91 87 92.2%
WizardLM 2 8x22b 100 95 94 87 85 92.1%
Llama 3.1 8B 100 97 93 90 80 91.9%
Nemotron 3 Nano 96 93 93 93 85 91.8%
Rocinante 12B 94 93 91 90 90 91.7%
Grok 4 Fast 94 92 92 92 88 91.5%
Gemini 3 Flash (Preview, Reasoning) 97 94 90 88 87 91.3%
LFM2 24B 100 96 94 91 75 91.3%
Mistral Large 3 100 96 91 87 82 91.2%
GPT-5 Mini 94 93 92 89 88 91.1%
Gemini 3.1 Flash Lite (Preview) 92 92 92 91 88 90.9%
Cohere Command R+ (Aug. 2024) 100 96 90 89 80 90.9%
GPT-4o, May 13th (temp=0) 96 94 90 87 86 90.8%
MoonshotAI: Kimi K2.5 97 95 92 91 78 90.6%
Qwen 3.5 35B 93 91 90 90 89 90.4%
GPT-5 Nano 92 92 90 90 89 90.4%
Arcee AI: Trinity Mini 100 94 93 89 75 90.1%
Ministral 3 8B 100 94 87 85 83 89.9%
Claude Opus 4.6 93 93 91 87 86 89.8%
Aion 2.0 97 93 89 88 80 89.4%
MiniMax M2.7 96 94 92 86 77 89.2%
GPT-4o, Aug. 6th (temp=0) 95 95 91 83 81 89.2%
Mistral Medium 3.1 97 92 91 85 81 89.1%
MiniMax M2.5 94 94 92 86 79 88.9%
Qwen 3.5 122B 96 92 87 86 82 88.7%
Qwen 3 32B 98 88 88 87 82 88.6%
Llama 3.1 70B 100 92 86 86 79 88.5%
Mistral NeMO 100 92 89 82 79 88.4%
Gemini 2.5 Flash 97 96 88 85 76 88.4%
Llama 3.1 Nemotron 70B 100 100 85 80 76 88.3%
Mistral Large 2 100 96 91 78 77 88.3%
Arcee AI: Trinity Large (Preview) 96 96 87 86 75 88.2%
Z.AI GLM 5 93 93 92 85 77 87.9%
GPT-5.4 Mini (Reasoning, Low) 93 90 88 87 81 87.9%
Mistral Small 3.2 24B 92 90 87 86 83 87.5%
GPT-5.4 Mini (Reasoning) 92 90 89 85 82 87.4%
GPT-4o Mini (temp=0) 100 90 87 81 78 87.3%
Qwen 3.5 Plus (2026-02-15) 94 87 86 85 84 87.2%
Nemotron 3 Super 93 91 87 84 81 87.2%
GPT-5.4 Mini 93 89 89 87 78 87.2%
Claude Sonnet 4.5 91 89 87 85 83 87.0%
Ministral 8B 100 95 84 83 72 86.9%
Inception Mercury 97 95 86 81 74 86.8%
Claude Haiku 4.5 96 92 87 82 76 86.8%
Z.AI GLM 4.6 92 89 87 87 80 86.7%
Mistral Large 91 90 87 85 79 86.6%
Qwen 3.5 397B A17B 92 90 87 86 78 86.5%
Ministral 3 3B 88 88 88 88 79 86.4%
GPT-5.4 Nano 89 89 87 85 82 86.3%
GPT-5.4 (Reasoning) 96 90 84 81 81 86.1%
Gemini 3 Pro (Preview) 91 89 87 86 77 86.0%
GPT-5.4 Nano (Reasoning, Low) 92 87 86 85 80 86.0%
Hermes 3 405B 100 93 82 78 77 86.0%
Mistral Small Creative 93 93 91 87 66 85.9%
GPT-5.2 91 87 86 84 80 85.8%
Hermes 3 70B 100 93 88 75 71 85.5%
Qwen 3.5 27B 93 90 87 82 75 85.5%
GPT-5.4 (Reasoning, Low) 93 89 83 81 80 85.4%
Claude Sonnet 4 92 90 88 81 76 85.2%
GPT-5.4 88 87 85 85 81 85.1%
DeepSeek V3.1 96 91 89 85 64 85.0%
Gemma 3 27B 92 86 83 81 81 84.7%
Ministral 3 14B 96 87 83 79 78 84.5%
Gemini 2.5 Pro 94 87 84 80 77 84.4%
Ministral 3B 100 88 88 84 62 84.4%
Stealth: Healer Alpha 89 87 83 82 81 84.3%
Grok 4.20 (Beta, Reasoning) 87 86 85 84 78 83.9%
GPT-4o, Aug. 6th (temp=1) 95 91 86 78 69 83.8%
Claude Opus 4.6 (Reasoning) 90 87 87 84 71 83.7%
DeepSeek V3.2 90 88 87 79 75 83.6%
DeepSeek V3 (2025-03-24) 91 91 83 77 76 83.6%
Claude 3.5 Haiku 100 85 83 77 71 83.4%
Gemini 2.5 Flash Lite 97 95 80 76 69 83.4%
GPT-5.1 87 83 82 81 81 82.9%
GPT-4o Mini (temp=1) 88 86 85 82 74 82.8%
Claude 3.5 Sonnet 100 94 83 73 62 82.6%
Claude Opus 4.5 86 86 81 80 80 82.6%
Gemini 2.5 Flash Lite (Reasoning) 93 83 83 81 72 82.4%
Z.AI GLM 5 Turbo 92 89 80 78 72 82.4%
Claude 3.7 Sonnet 94 90 83 79 65 82.3%
Qwen3 235B A22B Instruct 2507 93 88 84 73 73 82.1%
Claude Sonnet 4.6 (Reasoning) 100 88 80 77 64 81.8%
Mistral Small 4 95 87 82 75 70 81.6%
Z.AI GLM 4.7 85 85 84 77 76 81.4%
Mistral Small 4 (Reasoning) 92 82 81 78 72 81.1%
GPT-5.4 Nano (Reasoning) 86 84 82 78 75 81.0%
DeepSeek V3 (2024-12-26) 91 82 80 78 75 81.0%
Claude Sonnet 4.6 91 90 77 74 68 79.9%
DeepSeek-V2 Chat 92 87 76 75 67 79.3%
Gemma 3 12B 96 82 76 74 62 77.9%
Stealth: Hunter Alpha 86 84 73 72 71 77.3%
Claude Opus 4 88 83 75 70 68 76.8%
Writer: Palmyra X5 92 84 76 68 63 76.6%
GPT-4o, May 13th (temp=1) 83 76 74 71 70 75.0%
Z.AI GLM 4.5 85 83 73 68 60 73.7%
Gemini 2.5 Flash (Reasoning) 80 76 67 67 64 70.9%
Grok 4.20 (Beta) 83 77 75 66 49 70.1%
Gemini 3.1 Pro (Preview) 76 71 68 64 63 68.4%
GPT-4.1 Mini 91 67 62 62 57 67.6%
GPT-4.1 Nano 89 80 64 62 44 67.6%
Claude 3 Haiku 76 65 65 64 59 65.8%
Gemma 3 4B 77 73 50 49 49 59.6%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
ByteDance Seed 2.0 Lite 100 100 100 97 97 98.7%
ByteDance Seed 1.6 100 100 97 95 93 97.2%
Grok 4.1 Fast 100 97 94 94 93 95.7%
Rocinante 12B 100 100 96 95 81 94.4%
ByteDance Seed 1.6 Flash 100 96 94 94 87 94.2%
ByteDance Seed 2.0 Mini 97 97 95 92 88 93.8%
Mistral Medium 3.1 95 94 94 94 90 93.5%
Llama 3.1 70B 100 93 93 93 87 93.3%
o4 Mini 100 96 91 90 88 92.9%
GPT-5 Mini 96 95 94 90 87 92.3%
Qwen 3.5 35B 100 92 91 90 87 92.0%
Grok 4 100 92 91 89 86 91.9%
Qwen 3.5 9B 96 94 93 89 77 89.8%
GPT-5 Nano 91 91 91 88 87 89.7%
Mistral Large 2 100 95 91 83 80 89.7%
Grok 4 Fast 95 93 91 87 82 89.7%
Mistral Large 3 97 95 95 81 76 88.9%
Qwen 3.5 122B 93 90 89 88 85 88.8%
Llama 3.1 Nemotron 70B 94 94 89 83 82 88.5%
MoonshotAI: Kimi K2.5 94 93 92 83 80 88.4%
Z.AI GLM 5 94 94 90 85 79 88.4%
Ministral 8B 97 97 88 80 79 88.1%
Mistral Large 96 87 87 85 85 88.0%
Hermes 3 405B 100 95 88 78 77 87.7%
Stealth: Hunter Alpha 95 90 86 86 82 87.7%
GPT-4.1 93 89 87 86 83 87.6%
Qwen 3 32B 95 95 85 84 78 87.5%
Inception Mercury 99 98 94 80 65 87.3%
Qwen 3.5 397B A17B 93 92 90 85 76 87.3%
Nemotron 3 Nano 94 88 86 86 83 87.3%
Inception Mercury 2 91 89 87 86 84 87.3%
DeepSeek V3 (2025-03-24) 100 100 88 85 64 87.3%
Claude Opus 4.6 92 90 88 87 79 87.2%
MiniMax M2.7 96 88 86 83 83 87.1%
Gemini 3 Flash (Preview) 97 94 91 84 69 86.9%
o4 Mini High 97 89 86 85 76 86.7%
Gemini 3 Flash (Preview, Reasoning) 91 90 87 85 80 86.6%
Stealth: Aurora Alpha 92 91 89 81 80 86.5%
Claude Opus 4.5 94 91 89 80 78 86.5%
GPT-5 93 87 85 84 83 86.4%
Qwen 3.5 Flash 96 94 89 76 74 86.0%
Llama 3.1 8B 93 86 86 84 81 85.9%
GPT-5.4 Nano 92 85 85 84 84 85.9%
Qwen 2.5 72B 95 92 85 78 78 85.8%
Z.AI GLM 5 Turbo 95 89 87 83 75 85.8%
Nemotron 3 Super 92 86 85 84 81 85.7%
Ministral 3 14B 91 88 85 82 80 85.3%
GPT-5.4 Nano (Reasoning) 88 88 86 84 80 85.2%
Claude 3.5 Sonnet 95 89 84 80 75 84.7%
Ministral 3 8B 94 90 84 78 76 84.3%
Grok 4.20 (Beta, Reasoning) 90 89 85 80 78 84.3%
GPT-5.4 Nano (Reasoning, Low) 88 86 85 85 78 84.2%
DeepSeek V3.1 88 87 85 83 77 83.8%
Z.AI GLM 4.7 Flash 86 86 84 83 78 83.6%
Mistral Small Creative 93 91 86 83 62 83.3%
GPT-5.2 86 85 84 83 79 83.3%
Gemini 3.1 Flash Lite (Preview) 96 92 78 76 72 83.0%
LFM2 24B 89 85 85 80 75 82.8%
WizardLM 2 8x22b 91 91 82 76 75 82.7%
Gemini 3 Pro (Preview) 90 86 85 79 73 82.5%
Cohere Command R+ (Aug. 2024) 91 82 80 79 79 82.3%
Claude Sonnet 4 86 85 82 80 78 82.2%
Mistral Small 4 93 85 79 76 76 81.8%
Grok 4.20 (Beta) 89 86 85 80 69 81.8%
Z.AI GLM 4.7 90 84 82 81 72 81.7%
GPT-5.4 Mini (Reasoning, Low) 89 82 81 79 77 81.6%
Claude Opus 4.6 (Reasoning) 90 85 82 76 73 81.4%
Qwen3 235B A22B Instruct 2507 89 88 87 74 67 81.1%
Qwen 3.5 27B 90 83 81 79 72 81.1%
Gemini 2.5 Pro 84 83 83 78 77 81.1%
Stealth: Healer Alpha 90 85 78 78 73 80.8%
Mistral Small 4 (Reasoning) 93 84 82 73 72 80.6%
Ministral 3B 88 85 82 75 71 80.5%
Claude Opus 4 89 86 84 76 67 80.4%
Mistral NeMO 92 86 85 77 61 80.0%
Z.AI GLM 4.6 87 84 79 77 73 80.0%
Aion 2.0 87 84 82 75 71 79.8%
MiniMax M2.5 84 80 80 78 76 79.7%
Arcee AI: Trinity Mini 100 83 75 72 68 79.7%
Ministral 3 3B 94 88 73 71 71 79.5%
GPT-4o, May 13th (temp=1) 85 79 78 78 77 79.3%
GPT-4o Mini (temp=0) 81 81 81 77 76 79.1%
GPT-5.4 (Reasoning) 89 84 77 72 72 78.6%
DeepSeek V3.2 86 83 76 74 73 78.4%
GPT-4o, May 13th (temp=0) 86 82 80 76 68 78.3%
GPT-4o Mini (temp=1) 87 87 76 71 70 78.2%
GPT-5.4 Mini 88 81 79 75 68 78.1%
GPT-4o, Aug. 6th (temp=1) 87 83 77 73 69 78.0%
Claude Haiku 4.5 87 84 82 78 58 77.9%
Qwen 3.5 Plus (2026-02-15) 90 83 75 72 69 77.8%
GPT-5.4 85 81 76 74 72 77.6%
GPT-4o, Aug. 6th (temp=0) 86 82 75 72 68 76.5%
GPT-5.4 Mini (Reasoning) 84 83 74 73 68 76.4%
GPT-5.4 (Reasoning, Low) 80 78 78 76 67 75.8%
Gemma 3 27B 86 81 77 67 67 75.7%
Claude 3.5 Haiku 100 100 76 56 44 75.3%
Claude Sonnet 4.6 (Reasoning) 82 82 72 71 68 75.1%
DeepSeek-V2 Chat 84 80 79 67 63 74.8%
Gemma 3 12B 88 88 70 64 60 74.0%
DeepSeek V3 (2024-12-26) 89 83 67 66 64 73.6%
GPT-5.1 83 76 71 70 67 73.5%
Hermes 3 70B 89 78 69 66 62 72.6%
Writer: Palmyra X5 80 76 75 67 65 72.5%
Gemini 2.5 Flash Lite (Reasoning) 78 78 72 69 62 71.8%
Gemini 2.5 Flash Lite 85 75 74 70 54 71.7%
Gemini 2.5 Flash 84 74 71 70 58 71.4%
GPT-4.1 Mini 87 86 65 64 55 71.4%
Arcee AI: Trinity Large (Preview) 82 77 71 69 53 70.3%
Z.AI GLM 4.5 76 75 74 64 56 68.9%
Claude Sonnet 4.5 86 72 71 64 51 68.6%
Gemini 3.1 Pro (Preview) 75 71 69 66 61 68.2%
Claude Sonnet 4.6 84 74 67 63 49 67.4%
Gemini 2.5 Flash (Reasoning) 75 70 66 62 61 66.8%
Claude 3.7 Sonnet 82 69 68 59 52 65.9%
Gemma 3 4B 80 77 70 55 36 63.8%
GPT-4.1 Nano 74 63 62 34 34 53.3%
Mistral Small 3.2 24B 68 67 66 35 28 52.9%
Claude 3 Haiku 63 60 54 40 37 50.7%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
ByteDance Seed 2.0 Lite 100 100 100 100 97 99.3%
o4 Mini 100 100 100 97 96 98.5%
Grok 4.1 Fast 100 100 97 97 96 98.1%
Qwen 3.5 9B 100 100 100 96 92 97.7%
Qwen 3 32B 100 98 97 97 95 97.5%
Grok 4 Fast 100 100 98 95 94 97.3%
GPT-5 Mini 100 100 98 95 94 97.3%
ByteDance Seed 1.6 Flash 100 98 97 96 95 97.2%
GPT-4.1 100 100 97 97 90 96.6%
Gemini 2.5 Pro 100 100 96 93 93 96.6%
Gemini 3.1 Flash Lite (Preview) 100 100 100 96 87 96.5%
Grok 4 100 100 95 94 93 96.4%
WizardLM 2 8x22b 100 100 100 98 83 96.3%
o4 Mini High 100 100 94 94 93 96.2%
Qwen 3.5 122B 100 96 96 96 92 96.0%
MoonshotAI: Kimi K2.5 100 100 94 93 91 95.7%
Ministral 3B 100 100 98 93 87 95.6%
GPT-5.4 (Reasoning) 100 96 95 95 93 95.6%
Qwen 3.5 35B 100 96 96 95 91 95.6%
ByteDance Seed 1.6 100 100 94 93 91 95.6%
Mistral Large 3 100 96 95 94 93 95.5%
Gemini 3 Pro (Preview) 97 97 97 94 92 95.4%
Arcee AI: Trinity Mini 100 96 94 94 93 95.4%
GPT-5 Nano 98 97 94 94 93 95.2%
Z.AI GLM 4.7 Flash 97 97 97 93 92 95.2%
GPT-5 100 98 96 91 90 94.9%
GPT-5.2 96 95 95 94 94 94.8%
GPT-5.4 Mini 98 96 94 93 92 94.6%
LFM2 24B 100 96 95 95 87 94.5%
Qwen 3.5 Flash 100 96 95 94 88 94.4%
Claude Opus 4.6 (Reasoning) 100 98 94 91 89 94.4%
Qwen 3.5 27B 100 100 96 92 84 94.4%
Inception Mercury 2 100 98 95 91 88 94.2%
DeepSeek V3 (2024-12-26) 100 100 94 93 85 94.2%
Gemini 3 Flash (Preview) 100 96 94 91 89 94.1%
GPT-5.4 Mini (Reasoning) 98 95 93 93 91 94.1%
Grok 4.20 (Beta, Reasoning) 100 94 93 92 92 94.0%
Z.AI GLM 5 Turbo 100 100 97 93 80 93.9%
Gemini 3 Flash (Preview, Reasoning) 100 97 93 90 89 93.9%
Claude Opus 4.6 97 97 95 92 88 93.9%
Qwen 2.5 72B 100 97 95 90 88 93.8%
DeepSeek V3.2 100 100 93 91 84 93.8%
ByteDance Seed 2.0 Mini 97 97 94 91 89 93.6%
Qwen 3.5 397B A17B 100 93 92 91 91 93.5%
Aion 2.0 97 96 95 92 85 93.1%
Inception Mercury 100 100 99 99 66 93.0%
Stealth: Hunter Alpha 97 96 94 89 87 92.9%
Stealth: Healer Alpha 100 97 91 90 87 92.9%
Rocinante 12B 100 100 92 90 83 92.8%
MiniMax M2.5 96 95 93 91 89 92.8%
GPT-4o, Aug. 6th (temp=0) 95 95 95 93 86 92.7%
Nemotron 3 Nano 95 94 93 92 89 92.4%
GPT-5.4 (Reasoning, Low) 94 94 93 92 87 92.1%
GPT-5.4 Mini (Reasoning, Low) 93 93 93 92 89 92.0%
Stealth: Aurora Alpha 95 94 91 91 90 92.0%
Mistral Small 4 96 94 92 90 88 91.9%
GPT-4o, May 13th (temp=1) 100 96 93 92 79 91.9%
Gemma 3 12B 96 95 92 91 83 91.6%
Z.AI GLM 5 97 97 93 87 84 91.5%
Z.AI GLM 4.6 96 93 92 90 87 91.5%
Qwen 3.5 Plus (2026-02-15) 100 93 90 89 86 91.5%
Llama 3.1 70B 100 94 93 93 78 91.4%
DeepSeek V3.1 95 95 94 94 79 91.3%
Ministral 3 14B 100 95 88 87 86 91.3%
Claude Sonnet 4.6 (Reasoning) 97 96 93 90 81 91.3%
GPT-5.4 Nano (Reasoning, Low) 95 93 92 91 85 91.2%
Writer: Palmyra X5 100 92 91 86 86 91.1%
Claude 3 Haiku 100 95 92 90 77 90.6%
GPT-5.4 95 94 93 91 80 90.6%
Claude Opus 4.5 94 94 90 89 86 90.6%
Qwen3 235B A22B Instruct 2507 96 92 91 90 84 90.5%
Claude Sonnet 4.5 96 96 91 87 82 90.4%
Mistral Small Creative 97 97 95 85 79 90.3%
Llama 3.1 8B 100 91 91 87 82 90.1%
Cohere Command R+ (Aug. 2024) 100 92 89 86 83 90.0%
Nemotron 3 Super 93 91 91 89 86 90.0%
Ministral 3 8B 96 90 89 89 87 90.0%
Claude 3.5 Sonnet 94 90 89 89 89 89.9%
MiniMax M2.7 96 91 91 87 85 89.9%
Mistral Small 4 (Reasoning) 97 91 91 85 84 89.9%
GPT-5.1 94 94 91 87 84 89.8%
Gemini 2.5 Flash Lite 96 95 91 91 75 89.7%
Mistral Medium 3.1 97 93 90 87 80 89.4%
GPT-4o Mini (temp=1) 96 95 92 92 71 89.3%
Claude Opus 4 96 93 88 85 84 89.3%
Gemini 2.5 Flash 95 92 91 85 82 89.1%
GPT-5.4 Nano (Reasoning) 95 95 88 85 83 88.8%
GPT-5.4 Nano 93 91 88 88 84 88.8%
Ministral 3 3B 100 100 90 84 69 88.6%
Gemma 3 27B 96 92 88 85 82 88.5%
Mistral Large 100 96 91 80 75 88.4%
GPT-4o Mini (temp=0) 96 89 88 82 80 87.1%
Z.AI GLM 4.7 90 90 87 85 83 86.8%
Claude Sonnet 4 96 95 88 87 68 86.7%
Hermes 3 70B 100 91 86 78 77 86.4%
DeepSeek V3 (2025-03-24) 100 100 82 79 70 86.2%
Ministral 8B 95 88 84 82 82 86.2%
Gemini 3.1 Pro (Preview) 91 89 86 86 77 85.9%
Gemini 2.5 Flash Lite (Reasoning) 95 90 84 81 80 85.9%
Mistral Large 2 90 87 87 83 82 85.8%
GPT-4o, Aug. 6th (temp=1) 95 90 81 81 80 85.8%
GPT-4o, May 13th (temp=0) 94 89 85 81 80 85.7%
Mistral NeMO 97 96 81 78 76 85.7%
Claude 3.5 Haiku 100 90 89 89 60 85.5%
GPT-4.1 Mini 95 91 84 79 77 85.2%
Claude 3.7 Sonnet 90 90 88 82 75 85.1%
Claude Haiku 4.5 95 93 89 85 64 85.1%
Grok 4.20 (Beta) 91 88 88 86 71 84.8%
Gemini 2.5 Flash (Reasoning) 91 88 86 81 78 84.7%
Z.AI GLM 4.5 91 87 85 81 80 84.6%
DeepSeek-V2 Chat 100 91 91 80 58 84.2%
Llama 3.1 Nemotron 70B 92 87 86 80 72 83.5%
Mistral Small 3.2 24B 100 97 90 79 50 83.2%
GPT-4.1 Nano 90 90 85 77 70 82.5%
Hermes 3 405B 95 93 84 76 63 82.2%
Arcee AI: Trinity Large (Preview) 90 89 79 77 74 81.8%
Gemma 3 4B 91 88 82 78 68 81.5%
Claude Sonnet 4.6 95 86 77 74 73 81.1%

Novelcrafter Default Prompt

Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Grok 4.1 Fast 100 100 100 100 100 100.0%
GPT-5 100 100 100 98 97 99.0%
GPT-5.1 100 100 100 98 96 98.9%
GPT-5.2 100 100 98 98 96 98.3%
o4 Mini 100 100 100 96 92 97.7%
ByteDance Seed 1.6 Flash 100 100 97 96 96 97.6%
GPT-5 Mini 100 98 98 96 94 97.2%
ByteDance Seed 2.0 Lite 100 100 97 97 91 97.1%
o4 Mini High 100 97 96 96 95 96.9%
GPT-5.4 (Reasoning) 100 100 96 95 93 96.7%
Mistral Large 100 100 96 95 91 96.6%
ByteDance Seed 2.0 Mini 100 100 97 93 93 96.6%
Mistral Large 2 100 96 96 95 95 96.5%
Qwen 3.5 9B 100 100 96 95 92 96.5%
Qwen 3.5 35B 100 100 98 94 90 96.5%
Gemini 3.1 Pro (Preview) 100 100 97 97 88 96.5%
Stealth: Aurora Alpha 100 98 95 95 93 96.1%
GPT-5.4 (Reasoning, Low) 100 100 98 92 90 96.0%
GPT-5.4 Nano 97 97 97 95 93 95.8%
Qwen 3.5 27B 97 96 96 96 95 95.8%
Nemotron 3 Super 100 100 97 93 89 95.6%
GPT-5.4 Mini (Reasoning) 100 98 95 93 92 95.5%
MoonshotAI: Kimi K2.5 97 97 96 95 93 95.3%
Grok 4 Fast 100 97 97 93 89 95.1%
Ministral 3 3B 100 100 100 94 82 95.0%
GPT-5.4 Mini (Reasoning, Low) 98 97 96 95 89 95.0%
Mistral Large 3 100 100 93 92 89 94.8%
Mistral Medium 3.1 100 100 92 92 89 94.7%
GPT-5.4 Nano (Reasoning, Low) 96 95 95 94 93 94.6%
Qwen 3.5 397B A17B 100 97 96 96 83 94.3%
Qwen 3.5 122B 98 96 96 92 89 94.2%
GPT-4.1 100 96 96 93 85 94.1%
Inception Mercury 100 100 100 100 71 94.0%
Inception Mercury 2 98 97 96 92 87 94.0%
Grok 4.20 (Beta, Reasoning) 97 96 93 93 91 93.9%
Nemotron 3 Nano 100 97 94 90 87 93.7%
Grok 4 98 95 93 93 90 93.6%
GPT-5.4 Mini 96 96 93 92 91 93.5%
ByteDance Seed 1.6 100 97 96 95 79 93.4%
GPT-5.4 Nano (Reasoning) 98 95 95 93 83 92.9%
Claude Haiku 4.5 100 95 95 91 82 92.7%
Z.AI GLM 4.7 100 92 91 91 89 92.7%
Gemini 3 Flash (Preview, Reasoning) 97 94 92 91 89 92.4%
Ministral 3B 100 100 92 85 84 92.3%
Writer: Palmyra X5 100 92 91 89 87 92.1%
GPT-5.4 98 98 95 87 82 91.9%
Qwen 3 32B 100 96 96 90 78 91.9%
Z.AI GLM 5 Turbo 96 96 95 86 85 91.6%
Mistral Small Creative 97 94 94 89 82 91.1%
Gemini 3 Flash (Preview) 96 93 92 87 86 90.9%
MiniMax M2.7 100 95 89 86 83 90.6%
Gemini 3 Pro (Preview) 94 93 90 89 87 90.5%
Gemini 3.1 Flash Lite (Preview) 96 96 95 83 82 90.4%
Claude Opus 4.6 (Reasoning) 94 94 92 87 84 90.1%
Mistral Small 4 (Reasoning) 100 91 90 85 85 90.1%
Qwen 3.5 Plus (2026-02-15) 94 94 88 87 87 89.8%
Qwen3 235B A22B Instruct 2507 100 93 92 82 82 89.8%
LFM2 24B 100 94 89 84 81 89.7%
Rocinante 12B 100 92 90 87 79 89.7%
DeepSeek V3 (2025-03-24) 100 94 90 82 81 89.6%
Llama 3.1 8B 98 92 88 86 83 89.5%
MiniMax M2.5 92 91 90 89 84 89.1%
Z.AI GLM 5 100 97 92 83 74 89.1%
Claude 3.5 Haiku 100 100 100 73 72 88.9%
GPT-5 Nano 91 89 88 88 86 88.5%
Ministral 8B 100 88 87 85 83 88.5%
Ministral 3 14B 92 89 89 87 84 88.4%
Gemma 3 27B 95 95 91 86 74 88.2%
Claude Sonnet 4.5 92 90 89 85 85 88.1%
Claude Opus 4.5 93 89 86 86 85 87.9%
Arcee AI: Trinity Mini 94 91 87 86 82 87.9%
Z.AI GLM 4.6 100 89 84 83 83 87.9%
Qwen 3.5 Flash 96 94 89 82 77 87.8%
Stealth: Hunter Alpha 92 92 86 84 82 87.2%
Claude Opus 4.6 97 95 91 80 73 87.1%
Claude Opus 4 93 91 88 82 81 86.9%
Cohere Command R+ (Aug. 2024) 94 92 87 85 76 86.9%
Qwen 2.5 72B 89 88 88 86 82 86.8%
Gemini 2.5 Pro 100 91 85 79 77 86.1%
Claude Sonnet 4 95 89 85 82 79 86.1%
WizardLM 2 8x22b 93 89 83 83 80 85.6%
Llama 3.1 70B 100 86 85 78 76 85.0%
Mistral Small 4 92 87 86 82 78 85.0%
GPT-4o, Aug. 6th (temp=0) 99 87 86 84 68 84.7%
Stealth: Healer Alpha 90 89 86 82 73 84.2%
Z.AI GLM 4.7 Flash 96 89 85 77 72 83.9%
Aion 2.0 92 87 87 78 75 83.8%
GPT-4o Mini (temp=0) 100 84 82 76 74 83.3%
Claude 3.5 Sonnet 100 86 85 73 70 82.7%
Hermes 3 405B 93 88 86 79 66 82.6%
DeepSeek V3.2 90 90 79 78 76 82.5%
GPT-4o, May 13th (temp=0) 90 89 86 78 69 82.5%
Gemini 2.5 Flash (Reasoning) 87 87 85 81 70 81.8%
Claude Sonnet 4.6 (Reasoning) 87 85 82 81 74 81.8%
Gemma 3 4B 91 83 79 78 75 81.2%
Ministral 3 8B 93 85 79 77 69 80.5%
DeepSeek V3 (2024-12-26) 89 87 84 77 65 80.4%
GPT-4o, Aug. 6th (temp=1) 89 89 88 83 52 80.1%
Grok 4.20 (Beta) 94 81 77 73 70 79.1%
Z.AI GLM 4.5 91 85 76 74 68 78.7%
Mistral NeMO 87 86 74 73 71 78.4%
DeepSeek V3.1 93 87 77 72 60 77.8%
Claude Sonnet 4.6 90 80 76 75 68 77.6%
Gemini 2.5 Flash Lite 84 76 76 74 73 76.7%
GPT-4.1 Mini 82 79 74 74 70 75.9%
Gemini 2.5 Flash Lite (Reasoning) 89 82 74 68 66 75.9%
DeepSeek-V2 Chat 84 80 77 72 65 75.5%
Claude 3.7 Sonnet 82 81 81 73 57 74.9%
GPT-4o Mini (temp=1) 88 81 73 69 64 74.9%
Arcee AI: Trinity Large (Preview) 80 76 76 73 67 74.5%
Claude 3 Haiku 79 76 75 75 67 74.5%
Gemma 3 12B 88 86 70 70 56 73.9%
Llama 3.1 Nemotron 70B 84 77 75 69 59 72.9%
Gemini 2.5 Flash 90 79 70 65 60 72.8%
GPT-4o, May 13th (temp=1) 92 81 69 59 58 71.8%
Hermes 3 70B 83 82 55 54 51 65.2%
GPT-4.1 Nano 70 70 65 53 41 59.8%
Mistral Small 3.2 24B 75 68 57 40 14 50.7%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Grok 4.1 Fast 100 100 100 100 100 100.0%
ByteDance Seed 2.0 Lite 100 100 100 100 97 99.3%
ByteDance Seed 1.6 Flash 100 100 100 100 95 99.0%
GPT-5 100 100 100 100 94 98.9%
Qwen 3.5 397B A17B 100 100 100 100 93 98.6%
o4 Mini 100 100 100 97 96 98.6%
MoonshotAI: Kimi K2.5 100 100 100 100 92 98.5%
GPT-5.2 100 100 100 98 95 98.5%
GPT-5.4 (Reasoning) 100 100 100 98 92 98.0%
o4 Mini High 100 100 100 96 93 97.8%
Qwen 3.5 Flash 100 100 100 97 92 97.8%
Gemini 3 Flash (Preview) 100 98 97 97 91 96.6%
ByteDance Seed 1.6 100 97 97 96 92 96.4%
Grok 4 Fast 100 100 97 95 88 96.1%
GPT-5.1 100 100 96 96 88 95.9%
Qwen 3.5 27B 100 100 96 93 91 95.9%
Qwen 3.5 35B 99 96 95 95 94 95.9%
Mistral Medium 3.1 100 100 95 94 89 95.8%
Nemotron 3 Super 100 100 100 91 85 95.2%
Mistral Small 4 100 100 95 92 89 95.1%
GPT-5.4 Mini 100 95 94 93 93 94.9%
GPT-5.4 Mini (Reasoning) 97 97 97 93 89 94.9%
Claude 3.5 Haiku 100 100 100 100 74 94.9%
Gemini 3.1 Pro (Preview) 100 96 96 92 89 94.9%
Qwen 3.5 122B 97 96 95 94 91 94.7%
GPT-5.4 (Reasoning, Low) 100 97 95 95 86 94.7%
GPT-5.4 Nano 97 95 95 95 92 94.7%
Mistral Large 3 100 100 94 90 89 94.5%
Z.AI GLM 5 100 96 93 91 89 93.9%
Grok 4 97 95 95 92 91 93.9%
Qwen 3.5 Plus (2026-02-15) 100 97 94 90 88 93.9%
Arcee AI: Trinity Mini 100 94 94 91 90 93.8%
Z.AI GLM 4.7 Flash 100 96 93 92 87 93.8%
GPT-5.4 100 98 93 90 89 93.8%
Z.AI GLM 5 Turbo 100 95 95 92 87 93.7%
Qwen 3.5 9B 100 99 95 93 81 93.7%
Stealth: Aurora Alpha 97 97 94 93 88 93.6%
ByteDance Seed 2.0 Mini 100 97 95 90 86 93.5%
GPT-5.4 Nano (Reasoning) 96 94 94 93 89 93.3%
Gemini 3 Flash (Preview, Reasoning) 97 97 94 91 88 93.3%
Llama 3.1 8B 100 99 93 90 85 93.3%
Qwen 3 32B 95 94 93 92 90 92.9%
GPT-4.1 100 97 92 88 87 92.9%
Z.AI GLM 4.7 96 95 93 91 85 92.2%
GPT-5.4 Nano (Reasoning, Low) 95 95 91 90 89 92.0%
Mistral Large 100 100 100 86 73 91.8%
Mistral Large 2 100 100 92 92 74 91.6%
Stealth: Hunter Alpha 97 95 95 92 79 91.5%
GPT-5 Mini 96 95 93 92 81 91.4%
Grok 4.20 (Beta, Reasoning) 96 94 94 93 80 91.4%
Inception Mercury 2 95 94 93 88 86 91.3%
Claude Sonnet 4.5 100 96 96 93 72 91.3%
Claude Opus 4.6 (Reasoning) 95 95 92 91 82 91.2%
Ministral 3 14B 94 94 93 91 84 91.2%
Gemini 3.1 Flash Lite (Preview) 96 96 92 88 83 91.0%
Arcee AI: Trinity Large (Preview) 100 100 95 90 71 91.0%
DeepSeek V3 (2025-03-24) 100 100 89 83 82 91.0%
Gemini 3 Pro (Preview) 94 92 91 89 88 90.8%
GPT-5.4 Mini (Reasoning, Low) 98 92 89 88 86 90.7%
GPT-5 Nano 95 92 90 89 86 90.4%
Qwen 2.5 72B 95 95 90 86 83 89.8%
Nemotron 3 Nano 100 94 89 88 75 89.3%
Aion 2.0 93 92 91 87 83 89.2%
LFM2 24B 100 100 93 78 75 89.2%
Z.AI GLM 4.5 95 93 92 88 78 89.2%
Hermes 3 405B 93 91 88 87 85 89.1%
WizardLM 2 8x22b 97 96 86 84 82 89.0%
Gemma 3 12B 95 95 90 86 80 89.0%
Writer: Palmyra X5 100 96 92 80 77 88.9%
Claude Opus 4.6 93 92 91 88 79 88.8%
GPT-4o, May 13th (temp=0) 92 90 88 88 85 88.5%
MiniMax M2.7 95 94 92 84 77 88.5%
Stealth: Healer Alpha 94 91 89 85 84 88.5%
Gemini 2.5 Pro 95 91 89 86 80 88.2%
Ministral 8B 91 90 87 87 85 88.1%
Llama 3.1 70B 100 92 88 81 79 88.0%
MiniMax M2.5 95 92 87 85 80 87.8%
Claude 3.5 Sonnet 100 92 85 85 76 87.5%
Mistral Small 4 (Reasoning) 93 91 89 85 80 87.5%
Mistral NeMO 100 86 85 85 79 86.9%
Mistral Small Creative 90 89 88 86 81 86.7%
Claude Opus 4 91 91 91 89 70 86.4%
DeepSeek V3.1 92 91 85 82 81 86.3%
GPT-4o, May 13th (temp=1) 88 86 85 83 82 84.8%
Gemini 2.5 Flash Lite 89 88 84 83 79 84.7%
GPT-4o, Aug. 6th (temp=1) 91 91 85 82 74 84.7%
Llama 3.1 Nemotron 70B 100 85 82 79 77 84.4%
DeepSeek-V2 Chat 92 86 83 81 79 84.3%
DeepSeek V3.2 92 86 84 83 73 83.6%
GPT-4o Mini (temp=0) 100 90 82 82 61 83.1%
Mistral Small 3.2 24B 100 100 85 83 47 83.1%
Ministral 3 8B 92 86 84 79 74 82.9%
Z.AI GLM 4.6 94 86 80 79 75 82.9%
Gemma 3 27B 100 94 79 78 60 82.3%
Claude Haiku 4.5 91 89 88 79 64 82.3%
GPT-4o, Aug. 6th (temp=0) 91 87 85 76 72 82.1%
Qwen3 235B A22B Instruct 2507 92 91 84 83 59 81.9%
Gemini 2.5 Flash Lite (Reasoning) 91 83 82 81 69 81.3%
Hermes 3 70B 96 89 85 68 68 81.3%
DeepSeek V3 (2024-12-26) 87 87 82 80 70 81.1%
Gemini 2.5 Flash 91 87 82 73 72 81.0%
GPT-4o Mini (temp=1) 87 83 81 79 75 81.0%
Grok 4.20 (Beta) 90 84 82 82 66 80.7%
Ministral 3B 86 83 82 76 76 80.7%
Claude Opus 4.5 88 88 79 75 70 80.0%
Gemini 2.5 Flash (Reasoning) 96 95 81 71 57 80.0%
GPT-4.1 Mini 90 81 80 74 74 79.9%
Claude Sonnet 4 91 83 80 73 72 79.7%
Rocinante 12B 100 84 76 75 63 79.5%
Ministral 3 3B 93 90 76 74 61 78.8%
Gemma 3 4B 91 87 83 69 61 78.3%
Inception Mercury 99 91 82 67 50 77.7%
GPT-4.1 Nano 86 82 81 77 58 76.6%
Claude 3.7 Sonnet 83 80 78 71 68 75.9%
Claude Sonnet 4.6 (Reasoning) 90 77 75 71 62 75.0%
Cohere Command R+ (Aug. 2024) 89 74 74 72 65 74.7%
Claude Sonnet 4.6 85 79 74 71 60 73.8%
Claude 3 Haiku 88 69 66 65 60 69.9%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Claude 3.5 Haiku 100 100 100 100 100 100.0%
GPT-5.2 100 100 100 100 97 99.3%
GPT-5 100 100 100 98 98 99.3%
Grok 4.1 Fast 100 100 100 100 96 99.3%
ByteDance Seed 2.0 Lite 100 100 100 97 97 98.8%
Gemini 3.1 Pro (Preview) 100 100 100 97 95 98.5%
GPT-5.1 99 98 98 97 95 97.4%
GPT-5.4 Mini (Reasoning) 100 96 96 96 95 96.5%
GPT-5.4 (Reasoning) 98 98 97 94 93 96.1%
o4 Mini High 100 100 95 92 91 95.8%
o4 Mini 100 96 96 94 91 95.3%
Qwen 3.5 27B 97 97 97 97 88 95.2%
ByteDance Seed 1.6 Flash 100 97 95 93 91 95.1%
Claude Opus 4.6 98 98 97 91 91 94.9%
GPT-4.1 96 96 96 93 93 94.9%
Qwen 2.5 72B 100 96 95 94 89 94.7%
Qwen 3.5 35B 97 97 93 93 91 94.2%
Claude Opus 4.6 (Reasoning) 100 97 93 92 88 94.2%
ByteDance Seed 1.6 98 94 94 94 90 94.0%
Grok 4.20 (Beta, Reasoning) 100 96 96 93 85 94.0%
Mistral Large 100 100 96 91 83 93.9%
Gemini 3.1 Flash Lite (Preview) 100 96 96 92 85 93.7%
Mistral Medium 3.1 96 96 93 92 91 93.6%
Qwen 3.5 397B A17B 96 96 96 92 87 93.6%
Grok 4 97 95 94 93 89 93.5%
GPT-5.4 (Reasoning, Low) 96 95 94 93 89 93.4%
MoonshotAI: Kimi K2.5 97 96 92 92 90 93.4%
Inception Mercury 2 100 94 92 92 89 93.4%
GPT-5.4 Nano (Reasoning, Low) 95 95 93 93 88 93.0%
Qwen 3.5 122B 100 96 92 89 88 92.9%
Stealth: Aurora Alpha 98 93 93 92 89 92.8%
Grok 4 Fast 98 97 95 93 82 92.8%
ByteDance Seed 2.0 Mini 100 97 97 89 80 92.6%
Ministral 3B 95 95 95 91 87 92.6%
Llama 3.1 8B 100 95 92 89 86 92.4%
GPT-5.4 97 95 93 91 86 92.3%
Z.AI GLM 5 Turbo 97 96 90 89 88 92.2%
Gemini 3 Flash (Preview, Reasoning) 97 97 95 86 83 91.7%
GPT-5.4 Mini (Reasoning, Low) 93 93 92 90 90 91.6%
Aion 2.0 97 94 91 90 86 91.6%
Ministral 3 14B 96 92 91 91 89 91.5%
Qwen 3.5 9B 94 94 94 88 87 91.4%
Mistral Large 3 100 94 91 87 84 91.3%
Gemini 3 Flash (Preview) 98 97 92 87 83 91.2%
Qwen 3.5 Flash 94 94 92 92 82 91.0%
GPT-5 Nano 95 91 91 89 88 90.9%
GPT-5 Mini 96 92 92 91 83 90.8%
GPT-5.4 Mini 94 92 92 89 87 90.6%
Llama 3.1 70B 95 95 94 93 75 90.4%
Hermes 3 70B 100 93 89 87 81 90.1%
Inception Mercury 100 100 93 91 66 90.0%
Nemotron 3 Super 97 96 96 93 67 89.8%
Nemotron 3 Nano 100 95 85 85 82 89.4%
Z.AI GLM 4.7 94 92 89 87 84 89.2%
Stealth: Hunter Alpha 97 96 93 87 73 89.0%
Grok 4.20 (Beta) 95 88 88 85 85 88.5%
GPT-5.4 Nano (Reasoning) 93 91 88 85 84 88.3%
GPT-5.4 Nano 93 89 88 88 83 88.1%
Mistral Small Creative 94 92 86 86 79 87.8%
Mistral Large 2 94 92 91 85 76 87.7%
Mistral Small 4 (Reasoning) 93 91 90 87 78 87.6%
Qwen 3.5 Plus (2026-02-15) 89 88 88 87 86 87.6%
MiniMax M2.5 97 93 90 86 72 87.5%
DeepSeek V3 (2025-03-24) 93 93 92 85 75 87.5%
Gemini 2.5 Pro 96 93 89 86 72 87.4%
Z.AI GLM 4.7 Flash 97 90 88 83 80 87.4%
Z.AI GLM 4.6 95 91 89 83 78 87.2%
Claude Sonnet 4.5 93 87 87 86 82 87.1%
Gemini 3 Pro (Preview) 94 90 87 86 78 87.0%
Claude Opus 4 96 89 87 85 78 87.0%
Qwen3 235B A22B Instruct 2507 96 87 84 82 80 85.9%
Mistral Small 4 96 91 87 79 74 85.5%
Qwen 3 32B 100 88 83 83 72 85.3%
Stealth: Healer Alpha 97 91 87 84 68 85.3%
LFM2 24B 94 91 87 86 69 85.3%
DeepSeek V3.1 93 88 86 82 77 85.2%
GPT-4o, Aug. 6th (temp=1) 91 91 86 85 72 85.1%
Claude Sonnet 4.6 (Reasoning) 96 86 85 83 75 85.1%
Ministral 3 3B 100 94 84 78 69 85.0%
Cohere Command R+ (Aug. 2024) 92 91 88 82 70 84.6%
GPT-4o, Aug. 6th (temp=0) 90 89 84 82 77 84.5%
Writer: Palmyra X5 88 88 83 82 80 84.2%
Llama 3.1 Nemotron 70B 95 84 82 81 77 84.1%
Rocinante 12B 100 91 83 80 66 83.9%
WizardLM 2 8x22b 92 84 83 82 76 83.3%
Z.AI GLM 5 92 86 85 82 72 83.3%
DeepSeek V3.2 89 86 84 81 77 83.3%
Claude Sonnet 4.6 95 87 81 77 76 83.2%
Z.AI GLM 4.5 95 95 81 78 67 83.1%
Gemma 3 27B 100 91 83 76 66 83.1%
MiniMax M2.7 87 85 84 81 78 83.0%
Claude Opus 4.5 90 88 86 83 66 82.5%
DeepSeek V3 (2024-12-26) 85 85 84 82 75 82.1%
Ministral 3 8B 91 89 79 77 71 81.3%
GPT-4o Mini (temp=1) 92 86 83 79 64 81.0%
DeepSeek-V2 Chat 96 93 92 70 53 80.8%
GPT-4o Mini (temp=0) 96 96 76 67 66 80.2%
Mistral NeMO 85 81 81 79 74 80.0%
Hermes 3 405B 96 92 84 69 56 79.4%
Arcee AI: Trinity Large (Preview) 89 80 78 75 72 78.6%
Ministral 8B 84 82 77 75 70 77.7%
Claude Haiku 4.5 88 84 76 72 68 77.6%
Gemini 2.5 Flash 87 77 76 74 72 77.5%
Gemini 2.5 Flash Lite 84 82 80 66 66 75.8%
Arcee AI: Trinity Mini 83 82 72 72 69 75.5%
Mistral Small 3.2 24B 91 87 80 65 54 75.5%
Gemma 3 12B 81 79 78 70 67 75.2%
GPT-4o, May 13th (temp=1) 80 74 73 71 70 73.7%
Claude 3.7 Sonnet 81 77 70 70 63 72.4%
Claude 3 Haiku 88 75 74 73 49 71.8%
Gemini 2.5 Flash Lite (Reasoning) 80 74 71 71 62 71.7%
Claude 3.5 Sonnet 86 84 67 61 59 71.6%
Gemma 3 4B 92 83 71 57 55 71.6%
Claude Sonnet 4 86 82 73 59 51 70.3%
GPT-4o, May 13th (temp=0) 82 80 77 66 45 69.9%
Gemini 2.5 Flash (Reasoning) 81 75 64 59 49 65.7%
GPT-4.1 Mini 76 70 63 60 54 64.4%
GPT-4.1 Nano 74 67 60 58 44 60.5%
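Model | #1 | #2 | #3 | #4 | #5 | Avg
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 98 | 98 | 98 | 99.0%
GPT-5 | 100 | 100 | 98 | 97 | 97 | 98.4%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 98 | 98 | 96 | 98.3%
Qwen 3.5 397B A17B | 100 | 100 | 97 | 96 | 96 | 97.8%
GPT-5.4 | 98 | 98 | 98 | 98 | 98 | 97.8%
GPT-5.4 (Reasoning) | 100 | 98 | 98 | 98 | 93 | 97.2%
GPT-5.2 | 100 | 98 | 98 | 95 | 95 | 97.1%
Qwen 3.5 Flash | 100 | 100 | 98 | 96 | 91 | 97.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 97 | 88 | 96.9%
Qwen 3.5 9B | 100 | 100 | 100 | 97 | 87 | 96.8%
o4 Mini High | 100 | 100 | 96 | 95 | 92 | 96.8%
ByteDance Seed 1.6 | 100 | 97 | 97 | 97 | 90 | 96.3%
Qwen 3.5 27B | 99 | 97 | 97 | 95 | 92 | 96.1%
Mistral Medium 3.1 | 100 | 100 | 95 | 94 | 90 | 95.9%
o4 Mini | 100 | 100 | 100 | 93 | 86 | 95.9%
Gemini 3.1 Pro (Preview) | 100 | 100 | 97 | 96 | 86 | 95.8%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 95 | 93 | 91 | 95.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 95 | 84 | 95.8%
Grok 4 Fast | 97 | 97 | 97 | 96 | 92 | 95.7%
GPT-5.4 Mini | 98 | 98 | 95 | 94 | 94 | 95.7%
Gemini 3 Flash (Preview) | 97 | 97 | 96 | 95 | 93 | 95.6%
Inception Mercury | 100 | 99 | 99 | 92 | 87 | 95.3%
GPT-4o, May 13th (temp=0) | 100 | 100 | 96 | 92 | 88 | 95.2%
Nemotron 3 Super | 100 | 100 | 94 | 94 | 88 | 95.2%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 97 | 93 | 86 | 95.1%
GPT-4.1 | 97 | 97 | 96 | 93 | 92 | 94.9%
Inception Mercury 2 | 100 | 97 | 96 | 92 | 88 | 94.8%
Gemini 3.1 Flash Lite (Preview) | 97 | 96 | 96 | 96 | 89 | 94.8%
Grok 4 | 97 | 97 | 94 | 94 | 92 | 94.6%
GPT-5 Mini | 97 | 95 | 94 | 94 | 92 | 94.5%
Stealth: Aurora Alpha | 100 | 97 | 94 | 92 | 90 | 94.4%
GPT-5.4 Mini (Reasoning) | 98 | 96 | 95 | 94 | 89 | 94.2%
Qwen 3.5 122B | 96 | 96 | 94 | 93 | 92 | 94.2%
Gemini 2.5 Pro | 100 | 100 | 96 | 90 | 84 | 94.0%
Qwen 3 32B | 100 | 95 | 95 | 94 | 85 | 93.9%
Z.AI GLM 5 | 100 | 97 | 93 | 93 | 85 | 93.5%
Z.AI GLM 5 Turbo | 100 | 96 | 95 | 92 | 84 | 93.4%
Nemotron 3 Nano | 97 | 96 | 95 | 92 | 87 | 93.3%
Ministral 3 14B | 100 | 100 | 93 | 90 | 83 | 93.3%
GPT-5.4 Nano (Reasoning) | 97 | 94 | 93 | 90 | 90 | 93.1%
Qwen 3.5 Plus (2026-02-15) | 96 | 93 | 93 | 93 | 89 | 92.9%
Aion 2.0 | 100 | 100 | 93 | 90 | 80 | 92.6%
Stealth: Healer Alpha | 96 | 95 | 93 | 91 | 87 | 92.4%
Gemini 3 Flash (Preview, Reasoning) | 100 | 97 | 94 | 88 | 83 | 92.3%
Qwen 3.5 35B | 100 | 96 | 91 | 90 | 84 | 92.3%
Arcee AI: Trinity Mini | 100 | 93 | 92 | 91 | 86 | 92.2%
WizardLM 2 8x22b | 100 | 94 | 94 | 90 | 82 | 92.0%
Claude Sonnet 4.5 | 100 | 96 | 96 | 92 | 75 | 91.7%
Claude Opus 4.6 | 95 | 95 | 92 | 89 | 86 | 91.4%
Ministral 3 3B | 100 | 100 | 92 | 83 | 79 | 90.9%
Claude Opus 4.6 (Reasoning) | 100 | 95 | 88 | 87 | 83 | 90.7%
Gemini 3 Pro (Preview) | 94 | 93 | 93 | 90 | 84 | 90.7%
MiniMax M2.7 | 97 | 92 | 91 | 90 | 84 | 90.6%
GPT-5.4 Nano (Reasoning, Low) | 91 | 91 | 90 | 90 | 90 | 90.4%
Ministral 3B | 100 | 100 | 94 | 86 | 71 | 90.2%
Claude Sonnet 4.6 | 95 | 92 | 90 | 87 | 86 | 90.1%
Mistral Small 3.2 24B | 99 | 99 | 98 | 87 | 67 | 90.1%
Llama 3.1 70B | 100 | 98 | 94 | 86 | 72 | 90.0%
Ministral 3 8B | 92 | 92 | 92 | 90 | 83 | 89.9%
GPT-5 Nano | 91 | 90 | 90 | 89 | 88 | 89.7%
Mistral Large 3 | 100 | 91 | 88 | 88 | 82 | 89.7%
GPT-5.4 Nano | 94 | 90 | 90 | 90 | 83 | 89.2%
ByteDance Seed 2.0 Mini | 100 | 100 | 87 | 83 | 75 | 89.2%
Mistral Small Creative | 95 | 94 | 86 | 86 | 85 | 89.2%
Ministral 8B | 100 | 94 | 93 | 85 | 75 | 89.2%
Mistral NeMO | 96 | 95 | 93 | 85 | 76 | 89.0%
Stealth: Hunter Alpha | 96 | 94 | 92 | 86 | 74 | 88.3%
Qwen3 235B A22B Instruct 2507 | 92 | 92 | 91 | 87 | 79 | 88.3%
LFM2 24B | 100 | 100 | 86 | 83 | 71 | 88.0%
Z.AI GLM 4.7 Flash | 92 | 91 | 89 | 86 | 82 | 88.0%
Mistral Large | 94 | 94 | 91 | 89 | 72 | 87.9%
GPT-4o Mini (temp=0) | 96 | 91 | 89 | 82 | 81 | 87.8%
Claude Sonnet 4.6 (Reasoning) | 95 | 92 | 91 | 87 | 74 | 87.8%
Z.AI GLM 4.7 | 90 | 90 | 89 | 84 | 84 | 87.5%
Mistral Large 2 | 94 | 92 | 92 | 83 | 77 | 87.4%
Claude Opus 4.5 | 92 | 92 | 91 | 82 | 80 | 87.3%
GPT-4o, Aug. 6th (temp=0) | 94 | 90 | 86 | 84 | 82 | 87.3%
Gemini 2.5 Flash | 96 | 92 | 83 | 82 | 82 | 87.2%
Writer: Palmyra X5 | 92 | 90 | 86 | 86 | 82 | 87.2%
Z.AI GLM 4.6 | 96 | 90 | 85 | 84 | 80 | 87.0%
Claude Sonnet 4 | 92 | 91 | 91 | 85 | 75 | 86.6%
Mistral Small 4 (Reasoning) | 94 | 90 | 89 | 85 | 75 | 86.6%
Hermes 3 405B | 100 | 93 | 89 | 81 | 70 | 86.5%
Gemma 3 27B | 95 | 94 | 91 | 77 | 70 | 85.5%
Gemini 2.5 Flash Lite (Reasoning) | 89 | 86 | 86 | 84 | 83 | 85.5%
Llama 3.1 8B | 93 | 92 | 88 | 85 | 69 | 85.4%
Gemini 2.5 Flash Lite | 93 | 91 | 88 | 85 | 67 | 84.8%
DeepSeek V3.2 | 97 | 87 | 82 | 81 | 77 | 84.8%
GPT-4o Mini (temp=1) | 100 | 96 | 80 | 75 | 72 | 84.6%
Claude 3.5 Sonnet | 100 | 93 | 93 | 85 | 51 | 84.5%
DeepSeek V3.1 | 95 | 93 | 82 | 79 | 73 | 84.3%
Qwen 2.5 72B | 88 | 84 | 84 | 82 | 81 | 83.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 84 | 81 | 80 | 72 | 83.4%
Claude Opus 4 | 92 | 89 | 86 | 86 | 63 | 83.1%
Hermes 3 70B | 100 | 100 | 75 | 74 | 65 | 82.8%
MiniMax M2.5 | 97 | 87 | 82 | 76 | 71 | 82.6%
Mistral Small 4 | 89 | 89 | 86 | 82 | 66 | 82.5%
Grok 4.20 (Beta) | 88 | 85 | 84 | 80 | 74 | 82.4%
Arcee AI: Trinity Large (Preview) | 95 | 84 | 83 | 79 | 69 | 82.0%
GPT-4.1 Mini | 89 | 88 | 87 | 75 | 70 | 81.7%
Gemini 2.5 Flash (Reasoning) | 97 | 88 | 85 | 72 | 65 | 81.5%
DeepSeek V3 (2025-03-24) | 91 | 90 | 84 | 72 | 70 | 81.2%
DeepSeek-V2 Chat | 94 | 91 | 80 | 76 | 65 | 81.1%
GPT-4o, May 13th (temp=1) | 93 | 87 | 79 | 74 | 70 | 80.5%
Llama 3.1 Nemotron 70B | 100 | 88 | 79 | 75 | 58 | 80.2%
Rocinante 12B | 96 | 86 | 84 | 75 | 55 | 79.1%
Z.AI GLM 4.5 | 91 | 90 | 82 | 70 | 63 | 79.0%
Claude 3 Haiku | 86 | 85 | 77 | 73 | 73 | 78.9%
DeepSeek V3 (2024-12-26) | 83 | 82 | 81 | 77 | 71 | 78.7%
Gemma 3 4B | 84 | 79 | 78 | 77 | 69 | 77.7%
Claude 3.5 Haiku | 100 | 86 | 84 | 65 | 51 | 77.2%
Claude 3.7 Sonnet | 92 | 81 | 81 | 69 | 62 | 77.0%
Gemma 3 12B | 86 | 77 | 74 | 72 | 69 | 75.9%
Claude Haiku 4.5 | 88 | 78 | 77 | 71 | 65 | 75.7%
GPT-4.1 Nano | 82 | 74 | 71 | 68 | 58 | 70.6%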
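Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 98 | 98 | 98 | 98.7%
Grok 4.1 Fast | 100 | 100 | 97 | 97 | 96 | 98.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 98 | 96 | 96 | 97.9%
o4 Mini High | 100 | 100 | 100 | 96 | 93 | 97.9%
GPT-5 | 100 | 98 | 98 | 97 | 96 | 97.8%
ByteDance Seed 2.0 Lite | 100 | 100 | 97 | 97 | 95 | 97.7%
Qwen 3.5 9B | 100 | 100 | 100 | 94 | 94 | 97.7%
o4 Mini | 100 | 100 | 96 | 96 | 95 | 97.5%
Grok 4 Fast | 100 | 97 | 97 | 97 | 96 | 97.4%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 96 | 95 | 95 | 97.1%
Mistral Large | 100 | 100 | 100 | 94 | 89 | 96.7%
GPT-5.1 | 100 | 98 | 96 | 96 | 93 | 96.6%
GPT-5.4 (Reasoning) | 100 | 98 | 96 | 94 | 94 | 96.3%
Qwen 3.5 397B A17B | 100 | 98 | 96 | 94 | 93 | 96.2%
Qwen 3.5 Flash | 97 | 97 | 96 | 96 | 94 | 96.0%
Inception Mercury | 100 | 100 | 99 | 99 | 80 | 95.7%
GPT-5.4 | 98 | 97 | 96 | 94 | 93 | 95.4%
Gemini 3 Flash (Preview) | 97 | 96 | 95 | 95 | 94 | 95.3%
Qwen 3.5 122B | 100 | 97 | 95 | 93 | 91 | 95.2%
GPT-5 Mini | 100 | 96 | 96 | 94 | 90 | 95.1%
Ministral 3 14B | 100 | 100 | 100 | 100 | 74 | 94.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 94 | 92 | 87 | 94.6%
Grok 4.20 (Beta, Reasoning) | 97 | 96 | 95 | 93 | 92 | 94.5%
GPT-5.4 Nano (Reasoning, Low) | 97 | 96 | 94 | 93 | 90 | 94.1%
Qwen 3.5 27B | 100 | 96 | 96 | 91 | 88 | 94.1%
Mistral Large 3 | 95 | 95 | 94 | 94 | 92 | 94.1%
GPT-5.4 Mini (Reasoning) | 98 | 95 | 93 | 93 | 91 | 94.0%
GPT-4.1 | 100 | 96 | 96 | 90 | 88 | 93.8%
ByteDance Seed 1.6 | 97 | 93 | 93 | 93 | 92 | 93.6%
Gemini 3 Flash (Preview, Reasoning) | 95 | 94 | 94 | 93 | 91 | 93.5%
Claude Opus 4.6 | 97 | 97 | 93 | 92 | 87 | 93.1%
Mistral Medium 3.1 | 96 | 96 | 92 | 90 | 90 | 92.7%
Z.AI GLM 5 Turbo | 100 | 97 | 92 | 90 | 84 | 92.7%
Qwen 2.5 72B | 100 | 95 | 91 | 89 | 88 | 92.6%
GPT-5.4 Nano (Reasoning) | 95 | 94 | 93 | 92 | 88 | 92.4%
Inception Mercury 2 | 95 | 94 | 93 | 90 | 89 | 92.4%
GPT-5.4 Mini (Reasoning, Low) | 96 | 94 | 94 | 91 | 87 | 92.2%
Mistral Large 2 | 100 | 100 | 94 | 87 | 80 | 92.2%
Gemini 3.1 Flash Lite (Preview) | 100 | 92 | 92 | 87 | 87 | 91.7%
Z.AI GLM 5 | 97 | 93 | 91 | 91 | 86 | 91.6%
GPT-5.4 Mini | 97 | 94 | 91 | 89 | 86 | 91.6%
Qwen 3.5 35B | 100 | 97 | 90 | 89 | 83 | 91.5%
Grok 4 | 97 | 91 | 91 | 91 | 86 | 91.3%
Claude Sonnet 4.5 | 100 | 96 | 92 | 87 | 80 | 90.9%
ByteDance Seed 2.0 Mini | 97 | 91 | 91 | 90 | 86 | 90.9%
GPT-5.4 Nano | 95 | 91 | 90 | 89 | 89 | 90.9%
Claude 3.5 Sonnet | 100 | 100 | 92 | 86 | 76 | 90.8%
Gemini 3 Pro (Preview) | 95 | 93 | 90 | 89 | 87 | 90.8%
GPT-5 Nano | 93 | 92 | 90 | 90 | 88 | 90.6%
Claude Opus 4.6 (Reasoning) | 100 | 97 | 92 | 83 | 80 | 90.4%
WizardLM 2 8x22b | 100 | 91 | 90 | 86 | 84 | 90.3%
Stealth: Healer Alpha | 98 | 92 | 91 | 90 | 80 | 90.3%
Grok 4.20 (Beta) | 98 | 92 | 91 | 86 | 83 | 89.8%
MiniMax M2.5 | 93 | 92 | 92 | 92 | 80 | 89.7%
DeepSeek V3.2 | 96 | 94 | 93 | 80 | 80 | 88.9%
Stealth: Aurora Alpha | 94 | 90 | 87 | 87 | 85 | 88.7%
DeepSeek V3 (2025-03-24) | 100 | 95 | 94 | 84 | 71 | 88.6%
Writer: Palmyra X5 | 100 | 100 | 85 | 79 | 79 | 88.6%
Mistral Small 4 (Reasoning) | 94 | 93 | 92 | 87 | 76 | 88.5%
Nemotron 3 Nano | 100 | 94 | 92 | 86 | 67 | 88.0%
GPT-4o, Aug. 6th (temp=0) | 95 | 90 | 86 | 84 | 83 | 87.8%
Mistral Small Creative | 92 | 91 | 90 | 83 | 80 | 87.3%
Mistral NeMO | 95 | 94 | 89 | 88 | 71 | 87.3%
Claude Haiku 4.5 | 91 | 90 | 89 | 84 | 82 | 87.3%
Qwen 3.5 Plus (2026-02-15) | 94 | 91 | 87 | 85 | 79 | 87.1%
Z.AI GLM 4.6 | 94 | 88 | 88 | 85 | 81 | 87.1%
Nemotron 3 Super | 100 | 94 | 85 | 79 | 77 | 87.1%
Ministral 8B | 100 | 94 | 83 | 80 | 76 | 86.6%
Claude Opus 4.5 | 91 | 91 | 88 | 83 | 77 | 85.9%
Cohere Command R+ (Aug. 2024) | 94 | 91 | 84 | 84 | 76 | 85.8%
Llama 3.1 Nemotron 70B | 94 | 93 | 88 | 87 | 66 | 85.7%
Aion 2.0 | 90 | 88 | 87 | 86 | 77 | 85.6%
Ministral 3B | 94 | 94 | 93 | 77 | 70 | 85.6%
Z.AI GLM 4.7 Flash | 92 | 89 | 86 | 83 | 78 | 85.5%
Z.AI GLM 4.7 | 94 | 87 | 84 | 82 | 80 | 85.3%
MiniMax M2.7 | 94 | 90 | 83 | 83 | 75 | 85.2%
GPT-4o Mini (temp=0) | 95 | 90 | 85 | 79 | 77 | 85.1%
Qwen3 235B A22B Instruct 2507 | 89 | 86 | 85 | 84 | 81 | 85.0%
Stealth: Hunter Alpha | 89 | 89 | 89 | 82 | 77 | 85.0%
Ministral 3 8B | 94 | 89 | 84 | 83 | 75 | 85.0%
Mistral Small 4 | 96 | 89 | 86 | 80 | 70 | 84.4%
Gemma 3 27B | 100 | 86 | 82 | 80 | 71 | 84.0%
Claude Sonnet 4.6 (Reasoning) | 91 | 91 | 81 | 80 | 76 | 84.0%
Hermes 3 70B | 94 | 90 | 84 | 82 | 66 | 83.6%
DeepSeek V3 (2024-12-26) | 94 | 89 | 85 | 82 | 66 | 83.3%
Gemini 2.5 Pro | 92 | 89 | 82 | 80 | 73 | 83.2%
Ministral 3 3B | 100 | 90 | 89 | 79 | 57 | 83.1%
Gemini 2.5 Flash | 95 | 87 | 86 | 72 | 71 | 82.4%
Llama 3.1 8B | 92 | 89 | 86 | 83 | 62 | 82.4%
Gemma 3 12B | 96 | 92 | 83 | 71 | 69 | 82.2%
Claude Sonnet 4 | 90 | 88 | 82 | 79 | 69 | 81.7%
Claude Opus 4 | 88 | 86 | 81 | 79 | 71 | 81.1%
Llama 3.1 70B | 100 | 92 | 84 | 80 | 49 | 80.9%
Z.AI GLM 4.5 | 92 | 87 | 82 | 77 | 64 | 80.4%
Arcee AI: Trinity Large (Preview) | 87 | 83 | 83 | 75 | 73 | 80.3%
DeepSeek V3.1 | 91 | 85 | 81 | 77 | 64 | 79.6%
Gemini 2.5 Flash Lite (Reasoning) | 90 | 90 | 81 | 68 | 66 | 79.1%
Arcee AI: Trinity Mini | 84 | 82 | 81 | 80 | 69 | 79.1%
LFM2 24B | 92 | 82 | 77 | 76 | 67 | 78.8%
DeepSeek-V2 Chat | 85 | 82 | 78 | 75 | 69 | 77.7%
Qwen 3 32B | 100 | 90 | 90 | 55 | 49 | 76.7%
GPT-4o, May 13th (temp=0) | 91 | 88 | 70 | 67 | 66 | 76.2%
GPT-4o, Aug. 6th (temp=1) | 82 | 80 | 76 | 72 | 67 | 75.5%
GPT-4o Mini (temp=1) | 86 | 83 | 77 | 75 | 55 | 75.3%
Gemini 2.5 Flash (Reasoning) | 78 | 78 | 72 | 72 | 69 | 74.0%
Hermes 3 405B | 100 | 78 | 72 | 65 | 54 | 73.7%
Claude 3.7 Sonnet | 90 | 85 | 72 | 65 | 50 | 72.5%
Gemini 2.5 Flash Lite | 75 | 73 | 71 | 71 | 67 | 71.3%
GPT-4o, May 13th (temp=1) | 94 | 84 | 68 | 62 | 43 | 70.1%
Rocinante 12B | 91 | 86 | 78 | 76 | 19 | 70.0%
Gemma 3 4B | 85 | 77 | 70 | 59 | 57 | 69.5%
Claude Sonnet 4.6 | 80 | 70 | 69 | 64 | 58 | 68.2%
GPT-4.1 Mini | 74 | 74 | 73 | 59 | 54 | 66.9%
GPT-4.1 Nano | 80 | 72 | 64 | 60 | 52 | 65.5%
Mistral Small 3.2 24B | 94 | 68 | 50 | 46 | 30 | 57.5%
Claude 3 Haiku | 72 | 59 | 58 | 51 | 37 | 55.5%
Claude 3.5 Haiku | 100 | 58 | 44 | 40 | 12 | 50.8%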
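Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 99 | 99.8%
Inception Mercury | 100 | 100 | 100 | 100 | 99 | 99.8%
GPT-5 | 100 | 100 | 100 | 100 | 98 | 99.6%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 98 | 99.5%
GPT-5.2 | 100 | 100 | 100 | 100 | 98 | 99.5%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 98 | 99.5%
Grok 4 | 100 | 100 | 100 | 100 | 97 | 99.4%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 97 | 99.4%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 97 | 99.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 96 | 99.2%
o4 Mini High | 100 | 100 | 100 | 100 | 96 | 99.2%
Aion 2.0 | 100 | 100 | 100 | 100 | 96 | 99.1%
GPT-5.4 | 100 | 100 | 100 | 98 | 97 | 99.1%
o4 Mini | 100 | 100 | 100 | 100 | 95 | 99.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 95 | 99.0%
Grok 4 Fast | 100 | 100 | 100 | 97 | 96 | 98.5%
Inception Mercury 2 | 100 | 100 | 100 | 97 | 95 | 98.5%
Stealth: Aurora Alpha | 100 | 100 | 100 | 97 | 95 | 98.5%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 91 | 98.3%
GPT-5 Mini | 100 | 100 | 98 | 98 | 95 | 98.2%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 96 | 95 | 98.2%
GPT-5.1 | 100 | 98 | 98 | 98 | 97 | 98.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 95 | 94 | 98.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 96 | 93 | 97.7%
GPT-5.4 Mini (Reasoning, Low) | 100 | 98 | 98 | 98 | 95 | 97.7%
Mistral Large | 100 | 100 | 100 | 95 | 93 | 97.5%
Qwen 3.5 122B | 100 | 100 | 96 | 96 | 95 | 97.5%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 94 | 93 | 97.4%
GPT-5.4 Mini | 100 | 98 | 97 | 96 | 95 | 97.1%
Claude Sonnet 4.5 | 100 | 100 | 96 | 96 | 93 | 96.9%
Z.AI GLM 5 Turbo | 100 | 97 | 96 | 96 | 95 | 96.8%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 95 | 95 | 93 | 96.7%
Gemini 3.1 Pro (Preview) | 100 | 100 | 96 | 94 | 93 | 96.6%
Stealth: Hunter Alpha | 100 | 97 | 97 | 96 | 92 | 96.5%
Mistral Large 3 | 100 | 100 | 100 | 94 | 88 | 96.5%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 95 | 94 | 93 | 96.5%
Stealth: Healer Alpha | 100 | 100 | 97 | 97 | 88 | 96.4%
Z.AI GLM 5 | 100 | 100 | 97 | 93 | 91 | 96.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 95 | 84 | 95.9%
Ministral 3B | 100 | 100 | 100 | 92 | 88 | 95.9%
GPT-5.4 Nano (Reasoning) | 100 | 97 | 97 | 94 | 92 | 95.9%
Gemini 3 Flash (Preview, Reasoning) | 97 | 97 | 95 | 95 | 95 | 95.8%
Gemini 2.5 Pro | 100 | 100 | 96 | 96 | 87 | 95.7%
GPT-5.4 Nano (Reasoning, Low) | 99 | 98 | 95 | 94 | 92 | 95.5%
Qwen 3.5 35B | 100 | 100 | 96 | 95 | 86 | 95.4%
GPT-4o Mini (temp=1) | 100 | 96 | 95 | 95 | 91 | 95.4%
Mistral Small Creative | 100 | 100 | 96 | 95 | 86 | 95.2%
Claude Opus 4 | 100 | 96 | 95 | 94 | 91 | 95.2%
Z.AI GLM 4.7 | 100 | 97 | 96 | 96 | 87 | 95.1%
Z.AI GLM 4.7 Flash | 100 | 96 | 96 | 94 | 89 | 95.0%
Claude Opus 4.6 | 100 | 95 | 95 | 94 | 89 | 94.9%
Llama 3.1 8B | 100 | 100 | 94 | 91 | 89 | 94.8%
Qwen 3 32B | 100 | 100 | 95 | 92 | 87 | 94.8%
Claude Opus 4.6 (Reasoning) | 97 | 97 | 94 | 94 | 90 | 94.6%
Nemotron 3 Super | 100 | 95 | 95 | 93 | 89 | 94.5%
GPT-4o, Aug. 6th (temp=1) | 96 | 96 | 95 | 95 | 90 | 94.3%
WizardLM 2 8x22b | 100 | 96 | 96 | 92 | 86 | 94.1%
Claude Opus 4.5 | 97 | 97 | 95 | 91 | 89 | 94.0%
Ministral 3 14B | 100 | 94 | 93 | 93 | 90 | 94.0%
Mistral Small 3.2 24B | 100 | 99 | 91 | 91 | 89 | 93.9%
Ministral 3 8B | 100 | 100 | 91 | 90 | 88 | 93.9%
Qwen 3.5 27B | 100 | 100 | 100 | 96 | 73 | 93.7%
Gemini 3 Flash (Preview) | 96 | 96 | 94 | 94 | 88 | 93.7%
Ministral 8B | 100 | 100 | 100 | 87 | 81 | 93.6%
Mistral Large 2 | 100 | 94 | 93 | 92 | 88 | 93.6%
Nemotron 3 Nano | 98 | 96 | 93 | 92 | 89 | 93.5%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 87 | 80 | 93.4%
DeepSeek V3.2 | 100 | 97 | 95 | 90 | 85 | 93.3%
GPT-5 Nano | 96 | 95 | 94 | 91 | 90 | 93.3%
Claude Sonnet 4.6 | 100 | 95 | 91 | 90 | 90 | 93.3%
Qwen 3.5 Flash | 100 | 96 | 95 | 93 | 82 | 93.2%
Claude Haiku 4.5 | 96 | 96 | 95 | 92 | 87 | 93.1%
Gemini 3 Pro (Preview) | 100 | 97 | 95 | 94 | 76 | 92.7%
Ministral 3 3B | 100 | 100 | 92 | 92 | 78 | 92.5%
Mistral Small 4 | 100 | 95 | 90 | 89 | 88 | 92.5%
DeepSeek-V2 Chat | 100 | 96 | 90 | 90 | 85 | 92.3%
Gemini 2.5 Flash | 100 | 94 | 92 | 89 | 86 | 92.3%
GPT-4o, May 13th (temp=1) | 96 | 95 | 92 | 91 | 87 | 92.3%
Z.AI GLM 4.6 | 100 | 92 | 90 | 90 | 88 | 92.1%
GPT-5.4 Nano | 94 | 92 | 92 | 92 | 89 | 92.1%
Rocinante 12B | 100 | 100 | 92 | 90 | 78 | 92.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 95 | 90 | 88 | 87 | 92.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 90 | 89 | 84 | 92.0%
Writer: Palmyra X5 | 100 | 100 | 95 | 92 | 73 | 91.9%
MiniMax M2.5 | 96 | 93 | 90 | 90 | 89 | 91.7%
Gemini 2.5 Flash Lite | 94 | 92 | 91 | 90 | 89 | 91.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 92 | 90 | 86 | 86 | 90.8%
LFM2 24B | 100 | 100 | 94 | 84 | 76 | 90.8%
Llama 3.1 Nemotron 70B | 100 | 94 | 92 | 86 | 80 | 90.5%
GPT-4o Mini (temp=0) | 100 | 96 | 91 | 90 | 76 | 90.4%
Llama 3.1 70B | 100 | 100 | 85 | 85 | 81 | 90.4%
MiniMax M2.7 | 96 | 95 | 89 | 88 | 83 | 90.2%
Gemma 3 27B | 96 | 91 | 90 | 90 | 83 | 89.9%
Qwen 2.5 72B | 95 | 92 | 90 | 89 | 84 | 89.9%
Cohere Command R+ (Aug. 2024) | 95 | 95 | 94 | 84 | 81 | 89.8%
GPT-4.1 Mini | 100 | 95 | 86 | 85 | 82 | 89.7%
Mistral Small 4 (Reasoning) | 100 | 96 | 86 | 85 | 79 | 89.2%
Qwen3 235B A22B Instruct 2507 | 96 | 93 | 92 | 84 | 78 | 88.6%
DeepSeek V3 (2024-12-26) | 92 | 90 | 90 | 89 | 78 | 87.9%
DeepSeek V3.1 | 95 | 93 | 86 | 86 | 79 | 87.8%
Claude Sonnet 4 | 100 | 95 | 86 | 79 | 78 | 87.6%
Gemma 3 4B | 92 | 89 | 88 | 88 | 77 | 86.9%
Grok 4.20 (Beta) | 91 | 89 | 88 | 85 | 82 | 86.8%
DeepSeek V3 (2025-03-24) | 100 | 94 | 94 | 82 | 62 | 86.5%
Hermes 3 405B | 92 | 91 | 91 | 82 | 71 | 85.5%
Gemini 2.5 Flash Lite (Reasoning) | 90 | 90 | 89 | 81 | 78 | 85.5%
Mistral NeMO | 95 | 89 | 83 | 83 | 76 | 85.4%
Z.AI GLM 4.5 | 93 | 92 | 87 | 82 | 70 | 84.9%
GPT-4o, May 13th (temp=0) | 100 | 96 | 77 | 76 | 74 | 84.5%
Gemma 3 12B | 100 | 87 | 82 | 81 | 71 | 84.0%
Arcee AI: Trinity Large (Preview) | 94 | 84 | 81 | 77 | 77 | 82.8%
Hermes 3 70B | 93 | 82 | 80 | 79 | 77 | 82.2%
Claude 3.7 Sonnet | 97 | 87 | 81 | 73 | 70 | 81.5%
Claude 3 Haiku | 83 | 83 | 81 | 74 | 66 | 77.6%
GPT-4.1 Nano | 84 | 84 | 84 | 68 | 65 | 76.9%
Claude 3.5 Haiku | 82 | 77 | 61 | 54 | 12 | 57.0%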