AI-ism adverb frequency

Test: Bad Writing Habits

Avg. Score: 88.3%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Grok 4.1 Fast | 97.7% | $0.0018 | 37.8s | 93%
2 | ByteDance Seed 1.6 Flash | 97.0% | $0.0013 | 27.3s | 91%
3 | Grok 4 Fast | 94.2% | $0.0017 | 24.1s | 86%
4 | ByteDance Seed 2.0 Lite | 98.0% | $0.012 | 2.2m | 95%
5 | o4 Mini | 95.4% | $0.015 | 25.7s | 86%
6 | Stealth: Aurora Alpha | 92.6% | $0.0000 | 9.8s | 84%
7 | Inception Mercury 2 | 92.6% | $0.0032 | 7.0s | 84%
8 | DeepSeek V4 Flash | 93.5% | $0.0006 | 31.6s | 84%
9 | o4 Mini High | 95.8% | $0.025 | 47.2s | 87%
10 | Qwen 3.5 9B | 95.3% | $0.0011 | 1.4m | 85%
11 | GPT-5 Mini | 94.0% | $0.0100 | 57.4s | 86%
12 | Gemini 3 Flash (Preview) | 92.3% | $0.0078 | 19.6s | 84%
13 | Qwen 3.5 Flash | 93.2% | $0.0025 | 47.5s | 84%
14 | Gemini 3.1 Flash Lite (Preview) | 91.5% | $0.0030 | 8.4s | 81%
15 | DeepSeek V4 Flash (Reasoning) | 92.6% | $0.0007 | 31.1s | 82%
16 | Gemini 3.1 Flash Lite | 91.3% | $0.0030 | 12.1s | 81%
17 | GPT-4.1 | 93.4% | $0.018 | 44.7s | 85%
18 | Gemini 3 Flash (Preview, Reasoning) | 91.9% | $0.012 | 30.1s | 84%
19 | Gemini 3.1 Flash Lite (Reasoning) | 91.7% | $0.0030 | 11.9s | 80%
20 | Qwen 3.6 Flash | 93.0% | $0.010 | 41.4s | 82%
21 | Grok 4.3 | 92.0% | $0.0069 | 30.5s | 81%
22 | Mistral Medium 3.1 | 91.9% | $0.0048 | 36.5s | 81%
23 | Qwen 3.5 35B | 93.6% | $0.018 | 1.0m | 84%
24 | ByteDance Seed 1.6 | 96.3% | $0.013 | 2.5m | 89%
25 | GPT-5.4 Nano (Reasoning, Low) | 90.1% | $0.0055 | 20.6s | 81%
26 | GPT-5.4 Mini (Reasoning, Low) | 91.0% | $0.015 | 16.8s | 81%
27 | GPT-5.4 Nano | 89.6% | $0.0057 | 26.3s | 82%
28 | GPT-5.4 Mini (Reasoning) | 92.4% | $0.022 | 28.1s | 81%
29 | Mistral Large 3 | 90.8% | $0.0033 | 30.3s | 79%
30 | Qwen 3.6 35B | 92.3% | $0.0083 | 1.0m | 82%
31 | GPT-5.4 Mini | 90.9% | $0.015 | 16.8s | 80%
32 | GPT-5 Nano | 90.8% | $0.0042 | 1.4m | 85%
33 | GPT-5.4 Nano (Reasoning) | 89.7% | $0.0061 | 24.5s | 80%
34 | Qwen 3.5 122B | 93.2% | $0.025 | 1.1m | 83%
35 | GPT-OSS 120B | 92.9% | $0.0015 | 1.8m | 83%
36 | Qwen 3.5 Plus (2026-04-20) | 93.9% | $0.017 | 1.8m | 85%
37 | Z.AI GLM 5 Turbo | 90.9% | $0.0081 | 33.2s | 79%
38 | Gemma 4 26B | 90.3% | $0.0009 | 55.1s | 79%
39 | Qwen 3.5 Plus (2026-02-15) | 89.5% | $0.0060 | 31.5s | 78%
40 | Mistral Large | 90.6% | $0.014 | 30.9s | 78%
41 | Nemotron 3 Nano | 90.0% | $0.0010 | 1.1m | 80%
42 | Ministral 3 14B | 88.8% | $0.0007 | 11.7s | 75%
43 | Gemma 4 31B | 91.8% | $0.0010 | 1.6m | 81%
44 | Ministral 8B | 88.5% | $0.0004 | 10.4s | 75%
45 | Qwen 3 32B | 90.8% | $0.0015 | 54.6s | 77%
46 | Z.AI GLM 5.1 | 92.1% | $0.014 | 1.5m | 82%
47 | Mistral Small Creative | 87.8% | $0.0007 | 9.1s | 75%
48 | Grok 4.3 (Reasoning) | 94.3% | $0.021 | 2.3m | 85%
49 | Qwen 3.5 27B | 92.8% | $0.020 | 1.6m | 82%
50 | Stealth: Healer Alpha | 87.8% | $0.0000 | 23.7s | 77%
51 | Z.AI GLM 5 | 90.4% | $0.0084 | 1.2m | 80%
52 | Qwen 2.5 72B | 88.6% | $0.0010 | 36.7s | 76%
53 | Xiaomi MIMO v2.5 Pro | 89.8% | $0.0085 | 53.5s | 78%
54 | Arcee AI: Trinity Mini | 88.3% | $0.0003 | 9.2s | 73%
55 | Grok 4.20 (Reasoning) | 91.1% | $0.018 | 1.5m | 82%
56 | Xiaomi MIMO v2.5 | 87.8% | $0.0054 | 31.8s | 77%
57 | Nemotron 3 Super | 90.3% | $0.0000 | 1.4m | 78%
58 | Z.AI GLM 4.7 Flash | 89.3% | $0.0017 | 1.2m | 78%
59 | Mistral Large 2 | 89.3% | $0.013 | 29.4s | 76%
60 | Stealth: Hunter Alpha | 88.7% | $0.0000 | 55.0s | 77%
61 | Grok 4 | 93.4% | $0.048 | 1.7m | 86%
62 | Inception Mercury | 90.6% | $0.011 | 17.6s | 71%
63 | Ministral 3 8B | 86.9% | $0.0008 | 19.6s | 75%
64 | Mistral Small 4 (Reasoning) | 87.0% | $0.0022 | 30.2s | 75%
65 | Gemma 4 31B (Reasoning) | 91.2% | $0.0014 | 2.2m | 80%
66 | DeepSeek V4 Pro | 88.9% | $0.0048 | 1.3m | 78%
67 | DeepSeek V3 (2025-03-24) | 88.7% | $0.0014 | 39.4s | 73%
68 | Ministral 3B | 87.9% | $0.0001 | 8.1s | 70%
69 | GPT-5.2 | 93.5% | $0.056 | 1.5m | 83%
70 | Qwen 3.5 397B A17B | 93.6% | $0.014 | 3.0m | 84%
71 | Ministral 3 3B | 87.4% | $0.0005 | 11.1s | 70%
72 | Grok 4.20 (Beta, Reasoning) | 89.5% | $0.039 | 34.0s | 78%
73 | Grok 4.20 | 86.9% | $0.0093 | 45.7s | 77%
74 | Mistral Small 4 | 86.3% | $0.0014 | 18.2s | 72%
75 | Gemma 4 26B (Reasoning) | 90.1% | $0.0013 | 2.0m | 79%
76 | MiniMax M2.7 | 87.4% | $0.0040 | 1.1m | 76%
77 | Aion 2.0 | 88.6% | $0.0064 | 1.3m | 76%
78 | Llama 3.1 8B | 87.9% | $0.0003 | 1.3m | 75%
79 | Gemini 3.5 Flash (Reasoning, Minimal) | 86.1% | $0.018 | 12.0s | 73%
80 | MoonshotAI: Kimi K2.5 | 93.5% | $0.019 | 3.2m | 83%
81 | GPT-5.4 | 91.7% | $0.049 | 1.4m | 80%
82 | LFM2 24B | 87.1% | $0.0002 | 28.4s | 69%
83 | Writer: Palmyra X5 | 86.3% | $0.011 | 22.0s | 71%
84 | GPT-5.4 (Reasoning, Low) | 92.2% | $0.055 | 1.4m | 79%
85 | Qwen3 235B A22B Instruct 2507 | 86.6% | $0.0011 | 59.2s | 73%
86 | GPT-5 | 95.4% | $0.065 | 2.8m | 87%
87 | Gemini 3 Pro (Preview) | 89.4% | $0.055 | 54.4s | 79%
88 | Llama 3.1 70B | 86.6% | $0.0015 | 29.4s | 69%
89 | Claude Opus 4.6 | 92.2% | $0.078 | 1.2m | 82%
90 | Z.AI GLM 4.7 | 87.0% | $0.010 | 1.4m | 76%
91 | DeepSeek V4 Pro (Reasoning) | 91.7% | $0.015 | 3.1m | 82%
92 | Z.AI GLM 4.6 | 85.4% | $0.0065 | 51.5s | 73%
93 | MiniMax M2.5 | 86.1% | $0.0034 | 1.3m | 74%
94 | Gemini 2.5 Pro | 87.8% | $0.036 | 36.2s | 74%
95 | GPT-4o, Aug. 6th (temp=0) | 85.8% | $0.023 | 22.7s | 72%
96 | Claude Sonnet 4.5 | 88.3% | $0.035 | 38.1s | 73%
97 | Mistral NeMO | 83.7% | $0.0005 | 10.1s | 68%
98 | WizardLM 2 8x22b | 88.0% | $0.0026 | 1.8m | 74%
99 | Claude Opus 4.8 (Reasoning, Low) | 91.1% | $0.071 | 41.9s | 76%
100 | Rocinante 12B | 86.2% | $0.0014 | 38.4s | 67%
101 | Cohere Command R+ (Aug. 2024) | 86.5% | $0.020 | 52.5s | 72%
102 | Qwen3.6 Max Preview | 95.0% | $0.050 | 3.5m | 85%
103 | GPT-5.1 | 91.3% | $0.054 | 1.8m | 78%
104 | Skyfall 36B V2 | 83.9% | $0.0019 | 23.1s | 67%
105 | Claude Haiku 4.5 | 83.3% | $0.011 | 21.6s | 69%
106 | DeepSeek V3.2 | 85.8% | $0.0014 | 1.9m | 74%
107 | Grok 4.20 (Beta) | 82.9% | $0.018 | 15.8s | 70%
108 | ByteDance Seed 2.0 Mini | 93.1% | $0.0045 | 4.9m | 84%
109 | Qwen 3.6 27B | 91.3% | $0.025 | 2.3m | 72%
110 | GPT-4o, Aug. 6th (temp=1) | 83.4% | $0.018 | 24.4s | 68%
111 | Claude Opus 4.8 (Reasoning) | 89.1% | $0.071 | 41.7s | 73%
112 | Claude Opus 4.6 (Reasoning) | 91.0% | $0.088 | 1.4m | 79%
113 | GPT-5.5 (Reasoning, Low) | 94.9% | $0.139 | 1.8m | 87%
114 | Claude Opus 4.7 (Reasoning) | 88.2% | $0.076 | 32.0s | 74%
115 | GPT-4o Mini (temp=0) | 82.7% | $0.0012 | 34.8s | 66%
116 | GPT-5.5 | 94.7% | $0.139 | 1.7m | 86%
117 | Llama 3.1 Nemotron 70B | 82.9% | $0.0038 | 31.7s | 65%
118 | Gemini 2.5 Flash Lite | 80.4% | $0.0009 | 9.5s | 65%
119 | DeepSeek V3.1 | 84.7% | $0.0020 | 1.8m | 71%
120 | Gemini 3.5 Flash (Reasoning) | 87.0% | $0.071 | 37.6s | 74%
121 | Gemini 2.5 Flash | 80.7% | $0.0052 | 10.6s | 65%
122 | Qwen3.7 Max | 91.8% | $0.068 | 2.3m | 78%
123 | Z.AI GLM 4.5 Air | 83.6% | $0.0029 | 58.2s | 66%
124 | Gemma 3 27B | 82.6% | $0.0006 | 52.6s | 66%
125 | GPT-5.5 (Reasoning) | 94.8% | $0.142 | 1.8m | 85%
126 | DeepSeek V3 (2024-12-26) | 82.6% | $0.0021 | 54.6s | 66%
127 | Claude Sonnet 4 | 83.4% | $0.032 | 43.7s | 70%
128 | GPT-5.4 (Reasoning) | 93.2% | $0.089 | 2.6m | 81%
129 | GPT-4o, May 13th (temp=0) | 83.5% | $0.035 | 14.1s | 66%
130 | GPT-4o Mini (temp=1) | 80.9% | $0.0012 | 34.8s | 65%
131 | Gemini 2.5 Flash Lite (Reasoning) | 79.3% | $0.0028 | 30.8s | 67%
132 | MiniMax M3 | 87.3% | $0.0060 | 3.1m | 73%
133 | Claude Opus 4.5 | 86.0% | $0.070 | 53.4s | 73%
134 | Claude Opus 4.7 | 86.4% | $0.069 | 30.4s | 70%
135 | DeepSeek-V2 Chat | 82.2% | $0.0021 | 53.3s | 64%
136 | Hermes 3 405B | 81.9% | $0.0032 | 53.2s | 63%
137 | Arcee AI: Trinity Large (Preview) | 80.8% | $0.0000 | 43.6s | 63%
138 | Claude 3.5 Sonnet | 85.0% | $0.048 | 35.5s | 65%
139 | Hermes 3 70B | 82.2% | $0.0010 | 1.2m | 63%
140 | GPT-4o, May 13th (temp=1) | 81.1% | $0.033 | 14.4s | 64%
141 | Gemma 3 12B | 78.9% | $0.0004 | 41.3s | 62%
142 | GPT-4.1 Mini | 78.3% | $0.0027 | 19.0s | 60%
143 | Z.AI GLM 4.5 | 78.2% | $0.0051 | 42.1s | 62%
144 | Claude Sonnet 4.6 (Reasoning) | 84.0% | $0.060 | 1.2m | 67%
145 | MoonshotAI: Kimi K2.6 | 95.7% | $0.058 | 6.5m | 87%
146 | Gemini 2.5 Flash (Reasoning) | 77.1% | $0.011 | 21.5s | 60%
147 | Claude Sonnet 4.6 | 80.3% | $0.031 | 39.3s | 61%
148 | Gemini 3.1 Pro (Preview) | 88.4% | $0.107 | 1.8m | 70%
149 | Gemma 3 4B | 74.0% | $0.0002 | 20.0s | 57%
150 | Claude 3 Haiku | 74.2% | $0.0025 | 14.9s | 54%
151 | Claude 3.7 Sonnet | 76.6% | $0.042 | 46.7s | 61%
152 | Cydonia 24B V4.1 | 72.2% | $0.0014 | 44.8s | 50%
153 | GPT-4.1 Nano | 67.5% | $0.0007 | 13.3s | 49%
154 | Claude Opus 4 | 85.5% | $0.209 | 1.4m | 73%
155 | Mistral Small 3.2 24B | 79.3% | $0.0069 | 5.7m | 49%
Avg. Score: 88.34%

Individual Scenarios

Detailed Writing Rules

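Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 97 | 99.4%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 96 | 99.2%
GPT-5.5 (Reasoning) | 100 | 100 | 98 | 98 | 98 | 98.9%
Qwen3.6 Max Preview | 100 | 100 | 100 | 97 | 96 | 98.5%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 92 | 98.5%
GPT-5.4 | 100 | 100 | 98 | 97 | 97 | 98.5%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 97 | 93 | 98.1%
GPT-5.4 (Reasoning) | 100 | 100 | 98 | 96 | 95 | 97.9%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 96 | 93 | 97.8%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 98 | 96 | 94 | 97.6%
ByteDance Seed 1.6 | 100 | 100 | 96 | 96 | 95 | 97.5%
GPT-5.2 | 98 | 98 | 98 | 98 | 96 | 97.4%
o4 Mini High | 100 | 97 | 96 | 96 | 96 | 97.1%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 96 | 96 | 93 | 97.1%
Grok 4 | 100 | 100 | 96 | 96 | 93 | 96.8%
GPT-5.4 Mini (Reasoning) | 97 | 97 | 97 | 97 | 95 | 96.8%
GPT-5.5 | 100 | 100 | 97 | 95 | 89 | 96.3%
Qwen3.7 Max | 100 | 100 | 100 | 95 | 86 | 96.2%
DeepSeek V4 Flash (Reasoning) | 100 | 96 | 95 | 95 | 95 | 96.2%
Gemini 3.1 Flash Lite (Preview) | 100 | 96 | 96 | 96 | 92 | 96.1%
o4 Mini | 100 | 100 | 100 | 97 | 84 | 96.1%
Qwen 3.5 27B | 100 | 100 | 100 | 90 | 90 | 96.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 98 | 91 | 91 | 95.9%
GPT-5 Mini | 98 | 96 | 96 | 95 | 94 | 95.8%
Claude Opus 4.6 | 100 | 97 | 94 | 94 | 94 | 95.6%
GPT-5.4 Mini (Reasoning, Low) | 98 | 97 | 96 | 95 | 92 | 95.5%
Qwen 3.5 397B A17B | 100 | 100 | 93 | 93 | 92 | 95.5%
Qwen 3.5 Flash | 100 | 100 | 100 | 94 | 83 | 95.3%
Gemini 3 Flash (Preview, Reasoning) | 100 | 97 | 96 | 93 | 90 | 95.3%
GPT-5 | 100 | 97 | 96 | 93 | 89 | 95.1%
Stealth: Hunter Alpha | 100 | 97 | 94 | 93 | 92 | 95.1%
Qwen 3.5 35B | 100 | 97 | 96 | 95 | 87 | 95.1%
GPT-OSS 120B | 97 | 96 | 96 | 94 | 92 | 95.0%
Grok 4.3 (Reasoning) | 100 | 100 | 95 | 90 | 90 | 95.0%
Qwen 3.6 27B | 100 | 97 | 94 | 92 | 92 | 95.0%
GPT-5.4 Nano (Reasoning) | 100 | 98 | 93 | 92 | 90 | 94.6%
Claude Opus 4.8 (Reasoning, Low) | 100 | 100 | 95 | 91 | 87 | 94.6%
Arcee AI: Trinity Mini | 100 | 100 | 93 | 91 | 89 | 94.5%
Mistral Large | 97 | 97 | 96 | 93 | 89 | 94.5%
Gemini 3 Pro (Preview) | 100 | 97 | 93 | 92 | 90 | 94.5%
Grok 4 Fast | 97 | 96 | 93 | 93 | 93 | 94.4%
Mistral Large 3 | 100 | 95 | 95 | 92 | 91 | 94.4%
GPT-5.1 | 98 | 94 | 94 | 94 | 92 | 94.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 89 | 82 | 94.2%
Qwen 3.5 Plus (2026-04-20) | 100 | 96 | 95 | 94 | 85 | 94.1%
DeepSeek V4 Flash | 100 | 100 | 96 | 91 | 84 | 94.0%
Z.AI GLM 5.1 | 100 | 97 | 95 | 91 | 87 | 93.9%
DeepSeek V4 Pro (Reasoning) | 100 | 95 | 95 | 90 | 88 | 93.7%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 95 | 87 | 85 | 93.5%
Mistral Large 2 | 100 | 95 | 95 | 91 | 86 | 93.5%
GPT-5.4 Nano | 96 | 96 | 95 | 92 | 88 | 93.5%
Gemma 4 31B | 100 | 96 | 92 | 90 | 90 | 93.4%
Inception Mercury 2 | 100 | 96 | 94 | 89 | 88 | 93.4%
Qwen 3 32B | 99 | 95 | 94 | 90 | 89 | 93.4%
ByteDance Seed 2.0 Mini | 100 | 95 | 93 | 90 | 89 | 93.3%
Ministral 3 8B | 100 | 95 | 95 | 92 | 85 | 93.2%
GPT-5.4 Mini | 100 | 96 | 90 | 90 | 89 | 93.2%
Gemma 4 31B (Reasoning) | 100 | 96 | 95 | 90 | 85 | 93.1%
Qwen 3.5 Plus (2026-02-15) | 100 | 93 | 93 | 91 | 88 | 92.9%
Xiaomi MIMO v2.5 Pro | 100 | 97 | 97 | 87 | 83 | 92.8%
Gemini 3.1 Flash Lite | 97 | 96 | 95 | 92 | 83 | 92.8%
Gemini 3.1 Flash Lite (Reasoning) | 96 | 96 | 95 | 88 | 88 | 92.7%
GPT-5.4 Nano (Reasoning, Low) | 94 | 94 | 94 | 91 | 88 | 92.4%
GPT-4.1 | 100 | 96 | 92 | 89 | 84 | 92.3%
Gemma 4 26B (Reasoning) | 100 | 95 | 91 | 90 | 86 | 92.3%
Claude Sonnet 4.5 | 100 | 100 | 93 | 86 | 82 | 92.2%
Z.AI GLM 5 Turbo | 100 | 92 | 91 | 90 | 87 | 92.1%
GPT-4o, May 13th (temp=0) | 100 | 98 | 95 | 92 | 74 | 91.7%
Gemini 3.1 Pro (Preview) | 100 | 96 | 90 | 87 | 85 | 91.6%
Gemma 4 26B | 100 | 96 | 93 | 88 | 81 | 91.5%
DeepSeek V3 (2025-03-24) | 100 | 100 | 89 | 86 | 81 | 91.2%
Gemini 2.5 Pro | 100 | 96 | 92 | 87 | 81 | 91.0%
Qwen 3.5 9B | 95 | 95 | 93 | 88 | 83 | 91.0%
MoonshotAI: Kimi K2.6 | 96 | 92 | 91 | 90 | 85 | 91.0%
DeepSeek-V2 Chat | 100 | 100 | 92 | 89 | 72 | 90.6%
Qwen 3.6 35B | 95 | 94 | 92 | 92 | 80 | 90.6%
Grok 4.20 (Reasoning) | 93 | 91 | 90 | 90 | 89 | 90.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 94 | 91 | 68 | 90.5%
Mistral Medium 3.1 | 93 | 92 | 90 | 90 | 88 | 90.5%
GPT-5 Nano | 93 | 92 | 92 | 89 | 87 | 90.3%
Gemini 3 Flash (Preview) | 97 | 94 | 89 | 88 | 82 | 90.1%
MoonshotAI: Kimi K2.5 | 100 | 92 | 88 | 86 | 85 | 90.0%
Z.AI GLM 5 | 100 | 93 | 92 | 84 | 80 | 89.9%
Gemini 3.5 Flash (Reasoning) | 95 | 93 | 89 | 89 | 82 | 89.5%
Qwen3 235B A22B Instruct 2507 | 96 | 92 | 90 | 89 | 80 | 89.4%
DeepSeek V3.2 | 93 | 93 | 91 | 91 | 79 | 89.3%
Stealth: Aurora Alpha | 91 | 90 | 88 | 88 | 87 | 88.9%
GPT-4o, Aug. 6th (temp=1) | 95 | 94 | 88 | 86 | 81 | 88.9%
Writer: Palmyra X5 | 95 | 90 | 88 | 87 | 83 | 88.8%
Grok 4.3 | 97 | 95 | 91 | 81 | 80 | 88.7%
Mistral Small Creative | 100 | 95 | 92 | 83 | 74 | 88.7%
Claude Opus 4 | 92 | 90 | 90 | 86 | 86 | 88.7%
Grok 4.20 (Beta, Reasoning) | 97 | 91 | 89 | 88 | 78 | 88.5%
DeepSeek V3.1 | 93 | 92 | 90 | 87 | 80 | 88.4%
Skyfall 36B V2 | 100 | 95 | 93 | 90 | 63 | 88.4%
Llama 3.1 8B | 95 | 91 | 91 | 85 | 80 | 88.3%
LFM2 24B | 100 | 94 | 91 | 87 | 69 | 88.3%
MiniMax M2.7 | 96 | 95 | 88 | 83 | 79 | 88.2%
Claude Sonnet 4.6 (Reasoning) | 96 | 93 | 86 | 84 | 81 | 88.0%
MiniMax M3 | 93 | 91 | 91 | 83 | 81 | 87.9%
Ministral 8B | 94 | 92 | 91 | 90 | 72 | 87.7%
Ministral 3 3B | 100 | 100 | 88 | 85 | 65 | 87.6%
Claude Opus 4.7 | 95 | 95 | 90 | 81 | 76 | 87.5%
Z.AI GLM 4.7 Flash | 93 | 88 | 87 | 85 | 83 | 87.4%
Nemotron 3 Nano | 92 | 90 | 88 | 86 | 80 | 87.0%
Claude Opus 4.7 (Reasoning) | 95 | 95 | 90 | 90 | 63 | 86.9%
Grok 4.20 | 93 | 88 | 87 | 85 | 80 | 86.8%
Aion 2.0 | 97 | 92 | 88 | 85 | 70 | 86.4%
Claude Opus 4.5 | 94 | 92 | 86 | 84 | 73 | 85.9%
Stealth: Healer Alpha | 93 | 85 | 85 | 84 | 82 | 85.7%
Claude Opus 4.8 (Reasoning) | 95 | 90 | 86 | 80 | 77 | 85.7%
Claude 3 Haiku | 95 | 93 | 85 | 83 | 71 | 85.5%
GPT-4o, Aug. 6th (temp=0) | 89 | 86 | 85 | 85 | 83 | 85.5%
DeepSeek V4 Pro | 100 | 85 | 82 | 81 | 78 | 85.3%
Grok 4.20 (Beta) | 93 | 90 | 81 | 81 | 80 | 84.9%
Z.AI GLM 4.7 | 96 | 91 | 83 | 79 | 73 | 84.5%
Claude 3.5 Sonnet | 100 | 100 | 86 | 69 | 67 | 84.4%
WizardLM 2 8x22b | 93 | 84 | 82 | 82 | 80 | 84.1%
Qwen 2.5 72B | 96 | 89 | 85 | 76 | 76 | 84.1%
Rocinante 12B | 100 | 88 | 84 | 83 | 63 | 83.8%
Z.AI GLM 4.5 Air | 94 | 93 | 82 | 82 | 66 | 83.6%
Ministral 3 14B | 100 | 92 | 78 | 75 | 73 | 83.6%
Claude Sonnet 4.6 | 92 | 91 | 90 | 76 | 68 | 83.4%
Xiaomi MIMO v2.5 | 90 | 87 | 84 | 83 | 73 | 83.4%
Ministral 3B | 100 | 100 | 92 | 64 | 61 | 83.4%
Arcee AI: Trinity Large (Preview) | 97 | 89 | 83 | 74 | 72 | 83.0%
MiniMax M2.5 | 91 | 91 | 84 | 78 | 67 | 82.3%
Hermes 3 70B | 93 | 84 | 81 | 78 | 74 | 81.8%
Gemma 3 27B | 96 | 91 | 77 | 75 | 68 | 81.3%
Cydonia 24B V4.1 | 100 | 77 | 77 | 76 | 74 | 80.9%
Gemini 2.5 Flash Lite (Reasoning) | 90 | 85 | 82 | 78 | 69 | 80.8%
GPT-4.1 Nano | 90 | 87 | 81 | 77 | 68 | 80.6%
Mistral Small 4 (Reasoning) | 84 | 81 | 81 | 79 | 78 | 80.6%
Mistral Small 4 | 90 | 89 | 80 | 79 | 63 | 80.4%
Claude Haiku 4.5 | 93 | 80 | 79 | 75 | 72 | 79.8%
GPT-4.1 Mini | 90 | 86 | 74 | 73 | 73 | 79.4%
Claude Sonnet 4 | 95 | 83 | 80 | 76 | 61 | 79.2%
Gemini 2.5 Flash | 83 | 83 | 82 | 75 | 68 | 78.2%
Z.AI GLM 4.6 | 90 | 82 | 76 | 74 | 69 | 78.1%
Gemma 3 12B | 91 | 78 | 75 | 72 | 70 | 77.2%
Inception Mercury | 100 | 99 | 73 | 63 | 50 | 76.9%
Gemini 2.5 Flash Lite | 81 | 78 | 76 | 75 | 73 | 76.6%
Llama 3.1 70B | 90 | 80 | 78 | 70 | 63 | 76.3%
Z.AI GLM 4.5 | 89 | 83 | 75 | 70 | 61 | 75.5%
Hermes 3 405B | 100 | 83 | 76 | 70 | 48 | 75.5%
GPT-4o Mini (temp=0) | 83 | 83 | 81 | 67 | 63 | 75.3%
GPT-4o Mini (temp=1) | 86 | 81 | 75 | 73 | 60 | 74.9%
Claude 3.7 Sonnet | 83 | 76 | 74 | 71 | 70 | 74.9%
Mistral NeMO | 97 | 82 | 74 | 62 | 55 | 74.1%
GPT-4o, May 13th (temp=1) | 79 | 76 | 74 | 72 | 69 | 74.1%
Gemini 2.5 Flash (Reasoning) | 82 | 80 | 74 | 66 | 65 | 73.1%
Llama 3.1 Nemotron 70B | 85 | 76 | 74 | 72 | 58 | 72.8%
Gemma 3 4B | 87 | 73 | 73 | 67 | 62 | 72.3%
Mistral Small 3.2 24B | 81 | 75 | 66 | 65 | 61 | 69.6%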
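Model | #1 | #2 | #3 | #4 | #5 | Avg
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 96 | 99.2%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 96 | 99.2%
GPT-5.5 (Reasoning) | 100 | 100 | 98 | 98 | 98 | 98.8%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 93 | 98.6%
Qwen 3.5 397B A17B | 100 | 100 | 98 | 96 | 95 | 97.8%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 88 | 97.5%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 95 | 93 | 97.5%
GPT-5.5 | 98 | 98 | 98 | 96 | 96 | 97.4%
Grok 4.1 Fast | 100 | 100 | 96 | 95 | 95 | 97.3%
GPT-5.4 Mini | 100 | 100 | 98 | 96 | 93 | 97.2%
ByteDance Seed 1.6 Flash | 100 | 100 | 97 | 96 | 93 | 97.2%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 96 | 89 | 97.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 85 | 97.0%
GPT-5.4 | 100 | 98 | 97 | 95 | 94 | 96.9%
GPT-5 | 100 | 100 | 98 | 94 | 91 | 96.7%
Claude Opus 4.6 | 100 | 100 | 95 | 94 | 94 | 96.7%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 96 | 94 | 92 | 96.5%
ByteDance Seed 1.6 | 100 | 100 | 100 | 96 | 86 | 96.5%
LFM2 24B | 100 | 100 | 100 | 96 | 87 | 96.5%
Claude Opus 4.6 (Reasoning) | 100 | 97 | 97 | 94 | 93 | 96.3%
Qwen 3.5 27B | 100 | 97 | 96 | 95 | 94 | 96.2%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 91 | 90 | 96.2%
Grok 4 | 100 | 96 | 96 | 95 | 92 | 96.0%
GPT-5.2 | 100 | 98 | 97 | 93 | 91 | 95.8%
GPT-5.4 (Reasoning, Low) | 100 | 98 | 95 | 93 | 93 | 95.6%
Qwen 3.5 Flash | 97 | 97 | 96 | 96 | 92 | 95.6%
Gemini 3 Flash (Preview, Reasoning) | 97 | 97 | 96 | 94 | 93 | 95.4%
Gemma 4 31B (Reasoning) | 100 | 96 | 95 | 95 | 91 | 95.3%
Gemma 4 31B | 100 | 96 | 95 | 95 | 89 | 95.1%
DeepSeek V4 Flash | 100 | 96 | 95 | 95 | 89 | 95.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 96 | 95 | 93 | 91 | 95.0%
GPT-5.4 Mini (Reasoning) | 97 | 97 | 95 | 94 | 92 | 94.9%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 91 | 83 | 94.8%
Gemini 3.1 Pro (Preview) | 96 | 96 | 96 | 96 | 90 | 94.8%
Qwen3.7 Max | 100 | 100 | 95 | 91 | 87 | 94.8%
o4 Mini | 100 | 96 | 96 | 96 | 85 | 94.7%
Qwen 3 32B | 100 | 94 | 93 | 93 | 92 | 94.5%
Qwen 3.5 122B | 96 | 95 | 95 | 94 | 92 | 94.4%
Qwen 3.6 Flash | 100 | 96 | 95 | 93 | 87 | 94.4%
Mistral Small 3.2 24B | 100 | 100 | 100 | 92 | 80 | 94.4%
Inception Mercury 2 | 100 | 95 | 94 | 92 | 89 | 94.2%
Qwen 3.5 Plus (2026-02-15) | 97 | 95 | 94 | 92 | 91 | 93.8%
MoonshotAI: Kimi K2.5 | 100 | 95 | 94 | 90 | 89 | 93.8%
GPT-5 Mini | 98 | 97 | 95 | 93 | 87 | 93.8%
GPT-5.1 | 98 | 98 | 96 | 94 | 83 | 93.8%
Qwen 3.6 27B | 100 | 95 | 95 | 93 | 85 | 93.5%
Mistral Medium 3.1 | 100 | 97 | 95 | 95 | 79 | 93.3%
Writer: Palmyra X5 | 100 | 95 | 95 | 91 | 85 | 93.2%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 95 | 88 | 82 | 93.2%
Ministral 3 8B | 100 | 95 | 94 | 90 | 87 | 93.2%
Nemotron 3 Super | 100 | 96 | 92 | 91 | 87 | 93.2%
Ministral 3B | 100 | 100 | 100 | 87 | 79 | 93.1%
DeepSeek V4 Flash (Reasoning) | 100 | 96 | 94 | 92 | 84 | 93.1%
Z.AI GLM 4.7 Flash | 100 | 96 | 95 | 89 | 85 | 92.9%
Z.AI GLM 4.7 | 96 | 94 | 93 | 92 | 90 | 92.8%
DeepSeek V4 Pro | 100 | 96 | 95 | 92 | 80 | 92.8%
Z.AI GLM 5 Turbo | 100 | 100 | 95 | 85 | 84 | 92.7%
Ministral 8B | 100 | 95 | 93 | 90 | 86 | 92.7%
Mistral Large 2 | 100 | 95 | 94 | 87 | 86 | 92.6%
Mistral Large 3 | 100 | 100 | 90 | 87 | 85 | 92.5%
Stealth: Hunter Alpha | 94 | 93 | 93 | 93 | 89 | 92.5%
Mistral Small Creative | 96 | 96 | 94 | 90 | 85 | 92.5%
Gemini 3.1 Flash Lite (Reasoning) | 96 | 96 | 92 | 90 | 87 | 92.2%
Grok 4.20 (Reasoning) | 96 | 94 | 92 | 91 | 88 | 92.1%
GPT-4.1 | 96 | 95 | 91 | 91 | 86 | 92.1%
Gemini 3.5 Flash (Reasoning, Minimal) | 96 | 95 | 91 | 90 | 89 | 92.1%
GPT-5.4 Mini (Reasoning, Low) | 100 | 95 | 93 | 87 | 85 | 92.0%
Gemma 4 26B | 97 | 96 | 92 | 89 | 85 | 91.8%
Qwen3 235B A22B Instruct 2507 | 100 | 95 | 92 | 90 | 83 | 91.8%
Qwen 3.6 35B | 100 | 97 | 91 | 90 | 81 | 91.7%
Stealth: Aurora Alpha | 93 | 92 | 91 | 91 | 91 | 91.7%
Gemini 3.5 Flash (Reasoning) | 100 | 95 | 93 | 88 | 83 | 91.7%
Arcee AI: Trinity Large (Preview) | 100 | 97 | 91 | 87 | 83 | 91.5%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 96 | 86 | 76 | 91.4%
Claude Sonnet 4.5 | 100 | 95 | 92 | 90 | 78 | 91.3%
Claude Sonnet 4.6 (Reasoning) | 100 | 95 | 95 | 86 | 81 | 91.3%
ByteDance Seed 2.0 Mini | 94 | 92 | 92 | 90 | 87 | 91.2%
Z.AI GLM 5 | 96 | 93 | 92 | 90 | 85 | 91.1%
Z.AI GLM 5.1 | 100 | 95 | 88 | 88 | 84 | 91.1%
GPT-OSS 120B | 95 | 95 | 91 | 87 | 86 | 91.0%
GPT-5.4 Nano (Reasoning) | 98 | 95 | 93 | 89 | 80 | 91.0%
GPT-5 Nano | 95 | 93 | 90 | 90 | 86 | 91.0%
Ministral 3 14B | 100 | 94 | 93 | 85 | 83 | 90.9%
Arcee AI: Trinity Mini | 100 | 92 | 92 | 91 | 80 | 90.9%
Xiaomi MIMO v2.5 Pro | 97 | 94 | 93 | 91 | 79 | 90.8%
WizardLM 2 8x22b | 96 | 96 | 93 | 87 | 82 | 90.8%
DeepSeek V4 Pro (Reasoning) | 100 | 95 | 91 | 87 | 80 | 90.7%
GPT-4o, May 13th (temp=0) | 97 | 96 | 94 | 85 | 80 | 90.4%
Mistral Small 4 (Reasoning) | 94 | 93 | 92 | 88 | 84 | 90.3%
Mistral Large | 96 | 94 | 91 | 90 | 80 | 90.2%
GPT-5.4 Nano (Reasoning, Low) | 92 | 91 | 91 | 90 | 87 | 90.1%
Grok 4.20 | 95 | 92 | 89 | 88 | 86 | 90.0%
Claude Opus 4.8 (Reasoning) | 100 | 96 | 91 | 85 | 77 | 89.8%
GPT-5.4 Nano | 91 | 91 | 89 | 89 | 88 | 89.7%
Claude Opus 4.5 | 92 | 92 | 91 | 91 | 82 | 89.6%
Gemma 4 26B (Reasoning) | 95 | 95 | 92 | 86 | 79 | 89.6%
Claude 3.5 Sonnet | 100 | 93 | 88 | 87 | 78 | 89.5%
Claude Opus 4.8 (Reasoning, Low) | 95 | 90 | 90 | 86 | 85 | 89.4%
Claude Sonnet 4 | 100 | 90 | 89 | 88 | 79 | 89.2%
Grok 4 Fast | 100 | 96 | 92 | 81 | 76 | 88.9%
Cohere Command R+ (Aug. 2024) | 94 | 92 | 91 | 85 | 82 | 88.9%
Aion 2.0 | 95 | 90 | 90 | 87 | 82 | 88.8%
Nemotron 3 Nano | 97 | 89 | 88 | 87 | 82 | 88.5%
MiniMax M2.5 | 100 | 93 | 91 | 83 | 74 | 88.2%
Claude Opus 4 | 96 | 93 | 91 | 86 | 76 | 88.2%
Gemini 3 Pro (Preview) | 97 | 89 | 88 | 86 | 80 | 88.1%
Skyfall 36B V2 | 100 | 91 | 88 | 81 | 80 | 88.1%
GPT-4o, Aug. 6th (temp=0) | 100 | 90 | 85 | 85 | 81 | 88.1%
Gemini 3 Flash (Preview) | 94 | 91 | 89 | 89 | 77 | 88.1%
Gemini 3.1 Flash Lite | 93 | 93 | 91 | 87 | 76 | 88.0%
Stealth: Healer Alpha | 93 | 91 | 89 | 83 | 83 | 87.9%
MiniMax M2.7 | 92 | 91 | 87 | 85 | 82 | 87.4%
DeepSeek V3.1 | 100 | 96 | 93 | 84 | 64 | 87.4%
Mistral Small 4 | 100 | 92 | 89 | 88 | 67 | 87.3%
Qwen 2.5 72B | 100 | 89 | 89 | 79 | 77 | 86.9%
Llama 3.1 70B | 99 | 94 | 89 | 84 | 68 | 86.7%
Grok 4.20 (Beta) | 94 | 88 | 85 | 83 | 82 | 86.2%
DeepSeek V3.2 | 89 | 89 | 86 | 85 | 82 | 86.1%
Grok 4.3 | 92 | 86 | 85 | 85 | 80 | 85.7%
Claude Opus 4.7 | 100 | 95 | 81 | 76 | 76 | 85.5%
GPT-4o Mini (temp=0) | 100 | 87 | 83 | 82 | 74 | 85.1%
Xiaomi MIMO v2.5 | 94 | 92 | 82 | 80 | 77 | 85.0%
Grok 4.20 (Beta, Reasoning) | 94 | 92 | 89 | 87 | 62 | 84.7%
Rocinante 12B | 100 | 92 | 88 | 78 | 65 | 84.6%
Claude Haiku 4.5 | 87 | 86 | 86 | 83 | 78 | 83.9%
Gemini 2.5 Flash Lite (Reasoning) | 92 | 88 | 84 | 84 | 71 | 83.8%
Llama 3.1 8B | 92 | 90 | 86 | 84 | 65 | 83.5%
Hermes 3 405B | 100 | 90 | 87 | 80 | 59 | 83.4%
Mistral NeMO | 93 | 88 | 85 | 84 | 67 | 83.4%
Z.AI GLM 4.6 | 92 | 88 | 86 | 77 | 70 | 82.5%
GPT-4.1 Mini | 91 | 83 | 83 | 80 | 75 | 82.5%
Gemini 2.5 Flash | 94 | 91 | 86 | 86 | 55 | 82.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 89 | 67 | 55 | 82.1%
MiniMax M3 | 100 | 88 | 81 | 78 | 63 | 82.1%
Gemini 2.5 Flash Lite | 92 | 85 | 84 | 78 | 71 | 82.0%
Gemma 3 27B | 95 | 82 | 80 | 77 | 74 | 81.6%
Gemini 2.5 Flash (Reasoning) | 95 | 85 | 82 | 77 | 69 | 81.5%
Gemma 3 12B | 92 | 91 | 85 | 74 | 66 | 81.5%
Claude Sonnet 4.6 | 95 | 85 | 76 | 75 | 75 | 81.3%
GPT-4o, May 13th (temp=1) | 86 | 83 | 83 | 82 | 72 | 81.1%
Gemma 3 4B | 93 | 81 | 78 | 76 | 76 | 80.9%
Gemini 2.5 Pro | 86 | 84 | 82 | 79 | 73 | 80.8%
Ministral 3 3B | 100 | 91 | 75 | 73 | 65 | 80.7%
GPT-4o, Aug. 6th (temp=1) | 96 | 90 | 79 | 78 | 60 | 80.7%
Hermes 3 70B | 100 | 97 | 83 | 73 | 49 | 80.4%
DeepSeek-V2 Chat | 91 | 87 | 79 | 73 | 72 | 80.3%
Inception Mercury | 100 | 86 | 81 | 73 | 56 | 79.2%
GPT-4o Mini (temp=1) | 79 | 78 | 76 | 76 | 76 | 77.3%
Z.AI GLM 4.5 | 87 | 87 | 69 | 67 | 65 | 75.0%
Claude 3.7 Sonnet | 86 | 72 | 71 | 70 | 67 | 73.0%
GPT-4.1 Nano | 83 | 75 | 68 | 67 | 65 | 71.7%
Cydonia 24B V4.1 | 89 | 73 | 68 | 65 | 59 | 70.6%
Claude 3 Haiku | 90 | 84 | 76 | 59 | 34 | 68.9%
Llama 3.1 Nemotron 70B | 93 | 71 | 68 | 56 | 54 | 68.3%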
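Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 96 | 99.2%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 97 | 96 | 98.7%
Claude Opus 4.6 | 100 | 100 | 100 | 97 | 92 | 97.8%
GPT-5.5 (Reasoning) | 100 | 98 | 98 | 97 | 96 | 97.7%
Qwen 3.5 9B | 100 | 100 | 98 | 95 | 95 | 97.7%
GPT-5 | 98 | 98 | 98 | 97 | 96 | 97.4%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 96 | 91 | 97.4%
GPT-OSS 120B | 100 | 100 | 97 | 94 | 94 | 97.0%
GPT-5.5 | 100 | 98 | 96 | 96 | 94 | 96.9%
GPT-5.4 | 98 | 98 | 98 | 96 | 94 | 96.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 97 | 95 | 91 | 96.6%
GPT-5.4 (Reasoning, Low) | 100 | 98 | 98 | 96 | 91 | 96.4%
Qwen3.6 Max Preview | 100 | 96 | 96 | 95 | 95 | 96.4%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 92 | 88 | 96.1%
GPT-5.2 | 98 | 96 | 96 | 95 | 94 | 95.7%
GPT-5.4 (Reasoning) | 98 | 98 | 96 | 94 | 93 | 95.7%
Inception Mercury | 100 | 100 | 100 | 90 | 89 | 95.6%
Qwen 3.5 35B | 100 | 97 | 95 | 94 | 92 | 95.5%
ByteDance Seed 1.6 | 100 | 100 | 95 | 94 | 88 | 95.5%
Qwen 3.5 Plus (2026-04-20) | 100 | 97 | 96 | 93 | 91 | 95.4%
Grok 4.3 (Reasoning) | 100 | 96 | 96 | 93 | 91 | 95.3%
GPT-5.1 | 100 | 98 | 95 | 92 | 92 | 95.3%
Grok 4.3 | 100 | 100 | 95 | 93 | 87 | 95.2%
GPT-5.5 (Reasoning, Low) | 97 | 95 | 95 | 94 | 94 | 95.1%
Mistral Medium 3.1 | 100 | 100 | 97 | 91 | 88 | 95.1%
Qwen 3.6 27B | 98 | 97 | 97 | 97 | 84 | 94.8%
GPT-5.4 Mini (Reasoning) | 98 | 97 | 94 | 93 | 91 | 94.6%
Qwen3.7 Max | 100 | 100 | 94 | 91 | 87 | 94.3%
Qwen 3.5 397B A17B | 97 | 96 | 95 | 92 | 90 | 94.2%
Qwen 3.5 122B | 100 | 95 | 94 | 91 | 90 | 93.9%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 96 | 89 | 83 | 93.8%
Qwen 3.5 Flash | 97 | 96 | 93 | 93 | 90 | 93.8%
Qwen 3.5 27B | 97 | 96 | 95 | 91 | 90 | 93.8%
Qwen 3.6 Flash | 100 | 100 | 92 | 91 | 86 | 93.8%
Qwen 3 32B | 100 | 100 | 97 | 86 | 86 | 93.7%
GPT-5.4 Mini | 96 | 96 | 93 | 93 | 90 | 93.5%
DeepSeek V4 Pro | 100 | 93 | 92 | 91 | 90 | 93.2%
Qwen 3.6 35B | 99 | 96 | 92 | 91 | 88 | 93.1%
Gemini 3 Flash (Preview, Reasoning) | 100 | 94 | 93 | 90 | 88 | 92.9%
DeepSeek V4 Flash | 100 | 95 | 91 | 91 | 88 | 92.9%
GPT-5.4 Mini (Reasoning, Low) | 96 | 95 | 93 | 92 | 88 | 92.9%
Stealth: Aurora Alpha | 97 | 97 | 97 | 91 | 81 | 92.8%
Grok 4 Fast | 97 | 97 | 97 | 88 | 86 | 92.7%
Inception Mercury 2 | 97 | 95 | 93 | 91 | 87 | 92.6%
Mistral Small Creative | 100 | 96 | 96 | 93 | 77 | 92.5%
GPT-5.4 Nano | 95 | 95 | 92 | 92 | 88 | 92.2%
GPT-4.1 Mini | 100 | 94 | 92 | 88 | 86 | 92.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 93 | 92 | 88 | 87 | 91.9%
DeepSeek V3 (2025-03-24) | 95 | 92 | 91 | 91 | 90 | 91.7%
Llama 3.1 70B | 100 | 100 | 93 | 84 | 80 | 91.4%
MoonshotAI: Kimi K2.5 | 96 | 96 | 95 | 85 | 85 | 91.4%
Arcee AI: Trinity Mini | 100 | 96 | 90 | 86 | 85 | 91.4%
ByteDance Seed 2.0 Mini | 94 | 94 | 92 | 89 | 87 | 91.3%
GPT-5.4 Nano (Reasoning) | 94 | 93 | 90 | 90 | 89 | 91.2%
Claude Opus 4.7 (Reasoning) | 100 | 96 | 95 | 84 | 81 | 91.2%
Claude Opus 4.7 | 96 | 95 | 91 | 88 | 86 | 91.2%
Gemma 4 31B | 100 | 100 | 92 | 86 | 77 | 91.2%
DeepSeek V4 Pro (Reasoning) | 95 | 93 | 90 | 89 | 88 | 91.1%
GPT-5 Nano | 98 | 92 | 92 | 91 | 83 | 91.0%
Gemma 4 31B (Reasoning) | 96 | 96 | 95 | 86 | 82 | 90.9%
GPT-5 Mini | 98 | 94 | 92 | 86 | 83 | 90.8%
Mistral Large 2 | 100 | 97 | 91 | 87 | 79 | 90.7%
Claude Opus 4.8 (Reasoning) | 100 | 95 | 95 | 86 | 77 | 90.7%
Grok 4 | 94 | 93 | 89 | 89 | 88 | 90.5%
o4 Mini | 96 | 93 | 92 | 87 | 85 | 90.5%
Ministral 8B | 100 | 100 | 100 | 79 | 74 | 90.4%
MiniMax M3 | 100 | 92 | 90 | 88 | 82 | 90.4%
Xiaomi MIMO v2.5 | 94 | 93 | 93 | 91 | 81 | 90.4%
Gemma 4 26B (Reasoning) | 100 | 96 | 91 | 84 | 80 | 90.4%
Claude 3.5 Sonnet | 94 | 93 | 92 | 88 | 84 | 90.3%
Writer: Palmyra X5 | 96 | 93 | 93 | 89 | 80 | 90.1%
Z.AI GLM 5 | 97 | 95 | 88 | 88 | 82 | 90.0%
Ministral 3 3B | 100 | 100 | 88 | 86 | 76 | 90.0%
GPT-4.1 | 100 | 93 | 91 | 87 | 79 | 90.0%
o4 Mini High | 100 | 93 | 89 | 86 | 82 | 90.0%
Grok 4.20 (Beta, Reasoning) | 100 | 95 | 87 | 86 | 82 | 90.0%
Gemini 3 Flash (Preview) | 94 | 94 | 88 | 87 | 87 | 89.9%
Mistral Large | 96 | 95 | 90 | 88 | 81 | 89.8%
Claude Opus 4.8 (Reasoning, Low) | 96 | 95 | 90 | 87 | 80 | 89.8%
Xiaomi MIMO v2.5 Pro | 92 | 90 | 90 | 88 | 87 | 89.4%
Gemini 3.1 Flash Lite (Reasoning) | 96 | 96 | 92 | 83 | 78 | 89.2%
Grok 4.20 | 97 | 92 | 90 | 89 | 77 | 89.2%
GPT-5.4 Nano (Reasoning, Low) | 96 | 90 | 89 | 88 | 82 | 89.2%
Nemotron 3 Nano | 98 | 97 | 84 | 84 | 83 | 89.0%
Nemotron 3 Super | 95 | 95 | 91 | 83 | 80 | 88.9%
Gemini 3.1 Flash Lite | 96 | 93 | 92 | 85 | 79 | 88.9%
Grok 4.20 (Reasoning) | 93 | 91 | 88 | 87 | 85 | 88.8%
Llama 3.1 8B | 93 | 90 | 88 | 86 | 86 | 88.8%
Ministral 3B | 93 | 89 | 88 | 88 | 86 | 88.7%
DeepSeek V3.1 | 100 | 93 | 89 | 84 | 77 | 88.6%
Gemma 4 26B | 96 | 95 | 84 | 83 | 83 | 88.4%
Claude Sonnet 4.6 | 96 | 89 | 89 | 86 | 81 | 88.2%
Cohere Command R+ (Aug. 2024) | 95 | 94 | 91 | 86 | 75 | 88.1%
Qwen 2.5 72B | 96 | 89 | 87 | 86 | 81 | 88.1%
Z.AI GLM 5.1 | 92 | 90 | 88 | 87 | 83 | 88.1%
Aion 2.0 | 92 | 89 | 88 | 87 | 83 | 87.9%
Mistral Small 4 (Reasoning) | 95 | 90 | 88 | 83 | 83 | 87.8%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 86 | 76 | 76 | 87.7%
Stealth: Healer Alpha | 92 | 89 | 87 | 87 | 83 | 87.6%
MiniMax M2.7 | 94 | 93 | 90 | 82 | 78 | 87.4%
Grok 4.20 (Beta) | 100 | 94 | 85 | 82 | 74 | 86.9%
Claude Opus 4 | 96 | 92 | 84 | 84 | 76 | 86.5%
Claude Sonnet 4.5 | 90 | 89 | 87 | 86 | 80 | 86.4%
Z.AI GLM 4.7 Flash | 94 | 86 | 85 | 85 | 82 | 86.3%
Qwen 3.5 Plus (2026-02-15) | 93 | 91 | 87 | 85 | 76 | 86.2%
Claude Sonnet 4.6 (Reasoning) | 96 | 96 | 86 | 81 | 72 | 86.2%
GPT-4o Mini (temp=1) | 93 | 87 | 87 | 82 | 82 | 86.1%
Ministral 3 14B | 94 | 90 | 87 | 82 | 78 | 86.1%
Claude Sonnet 4 | 94 | 87 | 87 | 83 | 79 | 85.9%
Rocinante 12B | 100 | 95 | 82 | 80 | 73 | 85.9%
Hermes 3 405B | 96 | 93 | 90 | 84 | 65 | 85.7%
Stealth: Hunter Alpha | 100 | 91 | 80 | 79 | 78 | 85.5%
Gemini 3 Pro (Preview) | 90 | 88 | 86 | 82 | 81 | 85.5%
Skyfall 36B V2 | 100 | 89 | 88 | 84 | 64 | 85.2%
DeepSeek V3.2 | 90 | 87 | 85 | 83 | 79 | 85.1%
Z.AI GLM 4.6 | 90 | 88 | 85 | 81 | 80 | 85.0%
Claude Opus 4.5 | 94 | 89 | 89 | 85 | 68 | 85.0%
GPT-4o Mini (temp=0) | 91 | 89 | 88 | 85 | 72 | 84.9%
Qwen3 235B A22B Instruct 2507 | 100 | 90 | 78 | 78 | 76 | 84.5%
DeepSeek-V2 Chat | 100 | 95 | 83 | 77 | 67 | 84.5%
Gemini 2.5 Pro | 96 | 91 | 86 | 78 | 71 | 84.3%
WizardLM 2 8x22b | 93 | 89 | 84 | 82 | 73 | 84.3%
Claude 3 Haiku | 100 | 94 | 81 | 76 | 70 | 84.2%
Ministral 3 8B | 95 | 90 | 90 | 79 | 66 | 84.0%
Z.AI GLM 5 Turbo | 89 | 87 | 83 | 79 | 78 | 83.3%
Llama 3.1 Nemotron 70B | 95 | 89 | 82 | 75 | 75 | 83.2%
Z.AI GLM 4.7 | 90 | 90 | 88 | 74 | 72 | 82.9%
Mistral Large 3 | 86 | 85 | 85 | 84 | 75 | 82.8%
Z.AI GLM 4.5 Air | 100 | 88 | 82 | 80 | 63 | 82.5%
GPT-4o, May 13th (temp=1) | 95 | 80 | 80 | 79 | 78 | 82.4%
GPT-4o, Aug. 6th (temp=0) | 95 | 88 | 80 | 75 | 73 | 82.1%
Mistral Small 4 | 96 | 95 | 87 | 68 | 64 | 82.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 92 | 81 | 81 | 78 | 77 | 81.6%
Claude 3.7 Sonnet | 88 | 87 | 85 | 84 | 63 | 81.4%
DeepSeek V3 (2024-12-26) | 100 | 90 | 84 | 76 | 57 | 81.3%
MiniMax M2.5 | 96 | 87 | 82 | 76 | 64 | 81.0%
Mistral NeMO | 94 | 84 | 82 | 73 | 70 | 80.7%
LFM2 24B | 95 | 90 | 88 | 85 | 42 | 79.8%
Hermes 3 70B | 91 | 89 | 75 | 74 | 67 | 79.3%
Gemini 2.5 Flash Lite | 89 | 84 | 82 | 77 | 60 | 78.7%
GPT-4o, May 13th (temp=0) | 94 | 83 | 83 | 69 | 63 | 78.2%
Claude Haiku 4.5 | 86 | 80 | 77 | 74 | 72 | 77.7%
GPT-4o, Aug. 6th (temp=1) | 85 | 85 | 84 | 66 | 64 | 76.8%
Gemini 2.5 Flash Lite (Reasoning) | 86 | 81 | 79 | 70 | 60 | 75.0%
Gemma 3 27B | 84 | 81 | 70 | 69 | 69 | 74.3%
Mistral Small 3.2 24B | 100 | 100 | 80 | 75 | 14 | 73.7%
Gemini 2.5 Flash (Reasoning) | 83 | 80 | 73 | 65 | 59 | 72.1%
Gemini 2.5 Flash | 85 | 82 | 79 | 68 | 44 | 71.6%
Arcee AI: Trinity Large (Preview) | 92 | 88 | 74 | 53 | 50 | 71.3%
Gemma 3 12B | 79 | 74 | 74 | 65 | 64 | 71.0%
Cydonia 24B V4.1 | 81 | 74 | 71 | 61 | 59 | 69.4%
Gemma 3 4B | 73 | 70 | 67 | 62 | 58 | 66.3%
Z.AI GLM 4.5 | 79 | 76 | 64 | 63 | 51 | 66.2%
GPT-4.1 Nano | 73 | 68 | 66 | 65 | 44 | 63.1%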
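Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 96 | 99.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 95 | 99.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 98 | 97 | 99.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 94 | 98.9%
GPT-5 | 100 | 100 | 100 | 98 | 95 | 98.6%
Qwen 3.5 122B | 100 | 100 | 100 | 97 | 96 | 98.6%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 93 | 98.6%
GPT-5.5 (Reasoning, Low) | 100 | 98 | 98 | 98 | 98 | 98.5%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 96 | 96 | 98.3%
Qwen 3.5 35B | 100 | 100 | 98 | 97 | 97 | 98.3%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 95 | 95 | 98.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 97 | 97 | 95 | 98.0%
GPT-5.5 (Reasoning) | 100 | 98 | 98 | 98 | 95 | 97.8%
GPT-OSS 120B | 100 | 100 | 100 | 95 | 93 | 97.6%
GPT-5.5 | 100 | 100 | 97 | 96 | 96 | 97.6%
Mistral Small 3.2 24B | 100 | 99 | 98 | 97 | 93 | 97.5%
Claude Opus 4.6 | 100 | 97 | 97 | 96 | 96 | 97.3%
Qwen 3.6 Flash | 100 | 100 | 97 | 97 | 92 | 97.2%
Inception Mercury 2 | 100 | 100 | 97 | 95 | 92 | 96.8%
Qwen3.6 Max Preview | 100 | 100 | 96 | 95 | 92 | 96.7%
GPT-5.4 | 98 | 98 | 97 | 96 | 95 | 96.7%
ByteDance Seed 1.6 Flash | 100 | 100 | 96 | 95 | 91 | 96.5%
Claude Opus 4.8 (Reasoning, Low) | 100 | 100 | 95 | 95 | 91 | 96.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 97 | 94 | 93 | 96.1%
Qwen 3.6 35B | 100 | 100 | 100 | 98 | 82 | 96.0%
o4 Mini High | 97 | 97 | 96 | 95 | 94 | 95.7%
Grok 4 Fast | 100 | 97 | 96 | 96 | 89 | 95.6%
Gemini 3.1 Pro (Preview) | 100 | 96 | 95 | 93 | 93 | 95.6%
Claude Opus 4.8 (Reasoning) | 100 | 100 | 96 | 95 | 86 | 95.5%
GPT-5.2 | 100 | 96 | 94 | 94 | 93 | 95.4%
Qwen 3.5 Flash | 100 | 96 | 96 | 96 | 88 | 95.4%
DeepSeek-V2 Chat | 100 | 100 | 96 | 91 | 89 | 95.3%
GPT-4.1 | 96 | 96 | 96 | 96 | 93 | 95.3%
Gemini 3 Flash (Preview) | 97 | 96 | 95 | 94 | 94 | 95.2%
Z.AI GLM 4.5 Air | 100 | 100 | 96 | 94 | 85 | 95.1%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 95 | 90 | 90 | 95.1%
GPT-5 Mini | 98 | 96 | 94 | 94 | 93 | 95.1%
Claude Opus 4.6 (Reasoning) | 100 | 97 | 97 | 94 | 85 | 94.7%
Qwen 3.5 27B | 100 | 100 | 95 | 92 | 86 | 94.7%
Qwen 3.5 9B | 100 | 100 | 95 | 90 | 88 | 94.5%
ByteDance Seed 1.6 | 100 | 100 | 95 | 88 | 88 | 94.2%
Qwen 3.5 Plus (2026-04-20) | 100 | 97 | 96 | 89 | 89 | 94.2%
Stealth: Aurora Alpha | 100 | 96 | 95 | 90 | 89 | 94.2%
GPT-5.1 | 98 | 96 | 95 | 93 | 88 | 93.9%
Nemotron 3 Nano | 96 | 95 | 94 | 92 | 92 | 93.7%
MoonshotAI: Kimi K2.5 | 100 | 95 | 95 | 89 | 89 | 93.6%
Mistral Medium 3.1 | 100 | 95 | 94 | 91 | 87 | 93.5%
MoonshotAI: Kimi K2.6 | 100 | 100 | 94 | 92 | 82 | 93.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 95 | 94 | 90 | 89 | 93.5%
o4 Mini | 100 | 97 | 93 | 90 | 87 | 93.5%
GPT-5.4 Mini (Reasoning, Low) | 98 | 95 | 93 | 93 | 89 | 93.4%
Z.AI GLM 5.1 | 96 | 95 | 95 | 94 | 85 | 93.2%
Aion 2.0 | 100 | 96 | 95 | 88 | 87 | 93.2%
Xiaomi MIMO v2.5 | 97 | 95 | 94 | 94 | 86 | 93.2%
Writer: Palmyra X5 | 96 | 96 | 95 | 91 | 88 | 93.2%
Qwen 2.5 72B | 100 | 100 | 94 | 89 | 82 | 93.1%
Claude Opus 4.7 | 100 | 95 | 91 | 91 | 89 | 93.1%
Grok 4 | 100 | 97 | 95 | 90 | 83 | 93.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 95 | 70 | 92.9%
Z.AI GLM 5 Turbo | 100 | 95 | 94 | 92 | 83 | 92.9%
GPT-5.4 Mini | 100 | 93 | 92 | 90 | 89 | 92.9%
Ministral 8B | 95 | 95 | 92 | 92 | 91 | 92.7%
Qwen 3 32B | 100 | 96 | 95 | 94 | 77 | 92.5%
DeepSeek V4 Pro (Reasoning) | 95 | 95 | 92 | 90 | 89 | 92.4%
Gemma 4 31B (Reasoning) | 100 | 96 | 95 | 90 | 81 | 92.4%
Mistral Small 4 (Reasoning) | 100 | 96 | 94 | 90 | 82 | 92.3%
Ministral 3 14B | 100 | 100 | 91 | 87 | 84 | 92.3%
MiniMax M3 | 100 | 97 | 94 | 91 | 80 | 92.3%
Z.AI GLM 5 | 97 | 93 | 92 | 90 | 88 | 92.1%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 91 | 90 | 79 | 92.0%
GPT-5.4 Nano (Reasoning) | 96 | 96 | 92 | 91 | 84 | 91.9%
ByteDance Seed 2.0 Mini | 100 | 100 | 95 | 87 | 77 | 91.8%
Gemini 3 Flash (Preview, Reasoning) | 96 | 95 | 93 | 89 | 86 | 91.7%
Rocinante 12B | 94 | 94 | 92 | 90 | 88 | 91.7%
Qwen3 235B A22B Instruct 2507 | 96 | 95 | 95 | 95 | 78 | 91.7%
Claude Sonnet 4.5 | 100 | 96 | 91 | 87 | 83 | 91.6%
Z.AI GLM 4.7 Flash | 100 | 92 | 91 | 89 | 86 | 91.5%
Gemini 3.1 Flash Lite (Preview) | 100 | 92 | 91 | 89 | 84 | 91.4%
GPT-5 Nano | 95 | 93 | 91 | 90 | 87 | 91.4%
Gemini 3.5 Flash (Reasoning) | 100 | 92 | 90 | 89 | 86 | 91.3%
Ministral 3 8B | 100 | 95 | 92 | 86 | 84 | 91.2%
Xiaomi MIMO v2.5 Pro | 97 | 94 | 91 | 90 | 84 | 91.2%
Gemma 4 26B | 100 | 93 | 92 | 86 | 84 | 91.1%
LFM2 24B | 100 | 100 | 93 | 83 | 78 | 91.0%
DeepSeek V4 Flash | 100 | 95 | 93 | 84 | 82 | 90.9%
Grok 4.3 | 100 | 96 | 92 | 87 | 79 | 90.9%
Ministral 3 3B | 100 | 90 | 89 | 87 | 86 | 90.6%
Mistral Small 4 | 100 | 97 | 93 | 86 | 76 | 90.4%
DeepSeek V3.1 | 100 | 96 | 86 | 85 | 85 | 90.2%
DeepSeek V4 Pro | 100 | 100 | 92 | 80 | 78 | 90.0%
Gemini 3 Pro (Preview) | 100 | 92 | 90 | 90 | 78 | 90.0%
Grok 4.20 (Reasoning) | 97 | 90 | 89 | 88 | 86 | 89.9%
Llama 3.1 70B | 100 | 100 | 92 | 83 | 74 | 89.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 95 | 78 | 75 | 89.7%
Claude Sonnet 4.6 | 100 | 95 | 91 | 81 | 81 | 89.6%
Gemini 2.5 Pro | 96 | 95 | 90 | 87 | 80 | 89.6%
Claude 3.5 Sonnet | 93 | 92 | 92 | 87 | 82 | 89.4%
Qwen 3.6 27B | 97 | 96 | 92 | 86 | 76 | 89.3%
Gemini 3.1 Flash Lite | 100 | 90 | 89 | 87 | 81 | 89.3%
Skyfall 36B V2 | 100 | 100 | 86 | 86 | 74 | 89.2%
Z.AI GLM 4.7 | 100 | 94 | 91 | 82 | 78 | 89.1%
Arcee AI: Trinity Mini | 100 | 92 | 92 | 82 | 79 | 89.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 91 | 84 | 69 | 88.9%
Claude Sonnet 4.6 (Reasoning) | 100 | 91 | 90 | 82 | 79 | 88.7%
GPT-5.4 Nano (Reasoning, Low) | 94 | 91 | 90 | 89 | 79 | 88.6%
GPT-5.4 Nano | 94 | 90 | 88 | 87 | 83 | 88.5%
Mistral NeMO | 92 | 90 | 88 | 85 | 85 | 87.9%
Ministral 3B | 91 | 91 | 90 | 88 | 79 | 87.8%
DeepSeek V4 Flash (Reasoning) | 100 | 95 | 89 | 84 | 71 | 87.8%
Grok 4.20 | 90 | 89 | 89 | 86 | 85 | 87.8%
Mistral Small Creative | 95 | 94 | 91 | 86 | 73 | 87.6%
Gemma 4 31B | 95 | 93 | 88 | 86 | 75 | 87.6%
Hermes 3 70B | 96 | 94 | 87 | 83 | 78 | 87.6%
Mistral Large | 95 | 93 | 84 | 83 | 83 | 87.5%
Grok 4.20 (Beta, Reasoning) | 96 | 96 | 86 | 80 | 78 | 87.4%
Claude Opus 4 | 92 | 91 | 88 | 87 | 80 | 87.4%
WizardLM 2 8x22b | 95 | 93 | 88 | 81 | 77 | 86.9%
GPT-4o, May 13th (temp=1) | 91 | 91 | 89 | 87 | 76 | 86.7%
Stealth: Hunter Alpha | 93 | 91 | 85 | 83 | 81 | 86.7%
GPT-4o Mini (temp=0) | 100 | 91 | 84 | 79 | 79 | 86.5%
Nemotron 3 Super | 100 | 86 | 82 | 82 | 79 | 85.8%
Claude 3 Haiku | 95 | 95 | 85 | 81 | 73 | 85.7%
Gemma 4 26B (Reasoning) | 91 | 86 | 85 | 83 | 81 | 85.3%
Llama 3.1 Nemotron 70B | 93 | 92 | 85 | 84 | 71 | 85.1%
Stealth: Healer Alpha | 91 | 91 | 89 | 84 | 71 | 84.9%
Claude Haiku 4.5 | 100 | 90 | 85 | 79 | 70 | 84.8%
DeepSeek V3.2 | 91 | 86 | 85 | 83 | 77 | 84.4%
Mistral Large 3 | 91 | 89 | 88 | 84 | 70 | 84.4%
Claude Opus 4.5 | 96 | 96 | 83 | 80 | 66 | 84.2%
Cohere Command R+ (Aug. 2024) | 95 | 88 | 86 | 78 | 73 | 84.1%
Gemini 2.5 Flash | 92 | 86 | 85 | 80 | 76 | 83.9%
MiniMax M2.7 | 92 | 92 | 80 | 80 | 74 | 83.7%
Gemini 3.5 Flash (Reasoning, Minimal) | 96 | 85 | 84 | 77 | 76 | 83.6%
Llama 3.1 8B | 94 | 93 | 90 | 75 | 67 | 83.5%
Gemini 2.5 Flash Lite | 100 | 92 | 78 | 74 | 71 | 82.9%
Z.AI GLM 4.6 | 95 | 89 | 83 | 77 | 65 | 81.9%
Mistral Large 2 | 91 | 86 | 84 | 82 | 64 | 81.4%
Arcee AI: Trinity Large (Preview) | 100 | 90 | 75 | 75 | 66 | 81.1%
Claude Sonnet 4 | 88 | 83 | 82 | 77 | 75 | 81.1%
Grok 4.20 (Beta) | 90 | 82 | 81 | 77 | 74 | 80.7%
MiniMax M2.5 | 88 | 87 | 82 | 78 | 65 | 80.0%
Gemma 3 4B | 91 | 86 | 75 | 74 | 73 | 79.8%
Cydonia 24B V4.1 | 95 | 83 | 78 | 72 | 71 | 79.6%
DeepSeek V3 (2024-12-26) | 84 | 83 | 81 | 79 | 70 | 79.5%
Gemma 3 27B | 87 | 84 | 80 | 77 | 67 | 78.8%
Gemma 3 12B | 88 | 80 | 78 | 78 | 67 | 78.1%
GPT-4o Mini (temp=1) | 100 | 82 | 76 | 68 | 64 | 78.0%
Claude 3.7 Sonnet | 94 | 76 | 74 | 72 | 71 | 77.3%
Gemini 2.5 Flash (Reasoning) | 88 | 82 | 79 | 74 | 63 | 77.3%
GPT-4.1 Mini | 90 | 77 | 71 | 71 | 69 | 75.7%
Hermes 3 405B | 87 | 84 | 76 | 69 | 60 | 75.1%
Z.AI GLM 4.5 | 82 | 77 | 76 | 71 | 66 | 74.3%
Gemini 2.5 Flash Lite (Reasoning) | 84 | 80 | 72 | 70 | 62 | 73.5%
GPT-4.1 Nano | 75 | 70 | 67 | 55 | 52 | 63.9%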
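Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 97 | 99.4%
Inception Mercury | 100 | 100 | 100 | 99 | 98 | 99.4%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 96 | 99.2%
GPT-5.5 (Reasoning) | 100 | 100 | 98 | 98 | 98 | 98.8%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 97 | 96 | 98.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 97 | 96 | 98.4%
Qwen3.6 Max Preview | 100 | 100 | 97 | 97 | 96 | 98.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 98 | 97 | 95 | 98.0%
Grok 4.3 | 100 | 100 | 100 | 97 | 92 | 97.8%
GPT-5.5 | 100 | 98 | 98 | 98 | 95 | 97.8%
Grok 4.3 (Reasoning) | 100 | 100 | 97 | 97 | 91 | 97.1%
GPT-5.4 (Reasoning, Low) | 98 | 98 | 98 | 96 | 95 | 96.9%
MoonshotAI: Kimi K2.6 | 100 | 100 | 97 | 96 | 92 | 96.9%
GPT-5.2 | 100 | 98 | 96 | 96 | 94 | 96.7%
Qwen 3.6 27B | 100 | 98 | 98 | 95 | 93 | 96.6%
Grok 4 Fast | 100 | 97 | 96 | 96 | 94 | 96.6%
GPT-5.4 (Reasoning) | 100 | 98 | 96 | 96 | 93 | 96.5%
Z.AI GLM 5 Turbo | 100 | 100 | 97 | 93 | 92 | 96.5%
Qwen 3.6 35B | 100 | 100 | 96 | 96 | 89 | 96.3%
Qwen 3.5 122B | 100 | 100 | 94 | 94 | 93 | 96.2%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 98 | 95 | 88 | 96.2%
ByteDance Seed 2.0 Lite | 100 | 96 | 96 | 96 | 92 | 95.9%
Qwen 3.5 9B | 100 | 100 | 96 | 94 | 89 | 95.7%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 92 | 86 | 95.7%
Claude Opus 4.6 (Reasoning) | 98 | 97 | 97 | 94 | 92 | 95.6%
ByteDance Seed 1.6 | 100 | 100 | 95 | 92 | 91 | 95.6%
GPT-5 | 100 | 100 | 96 | 94 | 87 | 95.5%
GPT-4.1 | 100 | 100 | 96 | 95 | 87 | 95.5%
DeepSeek V4 Flash | 100 | 100 | 95 | 94 | 89 | 95.5%
GPT-5.4 | 100 | 96 | 95 | 93 | 93 | 95.5%
Claude Opus 4.6 | 97 | 96 | 95 | 94 | 94 | 95.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 97 | 95 | 92 | 92 | 95.3%
Qwen 3.5 27B | 100 | 97 | 96 | 95 | 87 | 94.9%
Qwen 3.5 Flash | 100 | 100 | 97 | 89 | 88 | 94.9%
GPT-5.4 Mini (Reasoning, Low) | 97 | 97 | 94 | 93 | 93 | 94.8%
Claude Opus 4.8 (Reasoning) | 100 | 100 | 95 | 91 | 86 | 94.4%
Qwen 3.5 35B | 97 | 96 | 96 | 94 | 89 | 94.3%
DeepSeek V4 Flash (Reasoning) | 100 | 94 | 94 | 93 | 90 | 94.0%
o4 Mini High | 100 | 97 | 96 | 92 | 85 | 93.9%
Z.AI GLM 5.1 | 96 | 96 | 95 | 94 | 88 | 93.8%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 95 | 86 | 85 | 93.3%
Grok 4 | 97 | 94 | 93 | 92 | 90 | 93.1%
GPT-5.4 Mini | 100 | 96 | 93 | 88 | 88 | 93.1%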
Inception Mercury 210097