AI-ism word frequency

Test: Bad Writing Habits

Avg. Score: 39.0%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | ByteDance Seed 1.6 Flash | 69.0% | $0.0013 | 27.3s | 51%
2 | GPT-5.4 Mini (Reasoning) | 71.2% | $0.022 | 28.1s | 55%
3 | GPT-5.4 Mini | 68.1% | $0.015 | 16.8s | 53%
4 | ByteDance Seed 2.0 Lite | 76.3% | $0.012 | 2.2m | 63%
5 | GPT-5.4 Mini (Reasoning, Low) | 68.0% | $0.015 | 16.8s | 51%
6 | Claude Opus 4.7 | 79.2% | $0.069 | 30.4s | 60%
7 | Claude Opus 4.7 (Reasoning) | 78.1% | $0.076 | 32.0s | 58%
8 | GPT-5 Mini | 66.0% | $0.0100 | 57.4s | 49%
9 | Claude Sonnet 4.6 | 70.0% | $0.031 | 39.3s | 48%
10 | GPT-5.4 | 73.9% | $0.049 | 1.4m | 57%
11 | GPT-5.4 (Reasoning, Low) | 73.0% | $0.055 | 1.4m | 58%
12 | GPT-5.4 Nano (Reasoning, Low) | 57.9% | $0.0055 | 20.6s | 42%
13 | Claude Opus 4.8 (Reasoning) | 71.6% | $0.071 | 41.7s | 53%
14 | GPT-5.4 Nano | 55.5% | $0.0057 | 26.3s | 42%
15 | GPT-5.4 Nano (Reasoning) | 56.3% | $0.0061 | 24.5s | 40%
16 | MiniMax M3 | 71.2% | $0.0060 | 3.1m | 54%
17 | MiniMax M2.5 | 60.5% | $0.0034 | 1.3m | 44%
18 | Claude Opus 4.8 (Reasoning, Low) | 71.5% | $0.071 | 41.9s | 49%
19 | Z.AI GLM 5 Turbo | 60.3% | $0.0081 | 33.2s | 37%
20 | Claude Sonnet 4.6 (Reasoning) | 71.3% | $0.060 | 1.2m | 50%
21 | Claude Haiku 4.5 | 57.4% | $0.011 | 21.6s | 38%
22 | MiniMax M2.7 | 58.4% | $0.0040 | 1.1m | 40%
23 | DeepSeek V4 Flash | 54.9% | $0.0006 | 31.6s | 35%
24 | GPT-5 | 74.2% | $0.065 | 2.8m | 61%
25 | DeepSeek V4 Pro | 57.6% | $0.0048 | 1.3m | 40%
26 | Grok 4.20 (Beta) | 51.0% | $0.018 | 15.8s | 38%
27 | DeepSeek V4 Flash (Reasoning) | 52.8% | $0.0007 | 31.1s | 33%
28 | Z.AI GLM 5 | 54.8% | $0.0084 | 1.2m | 39%
29 | Grok 4.20 | 50.1% | $0.0093 | 45.7s | 37%
30 | GPT-5.2 | 61.5% | $0.056 | 1.5m | 48%
31 | GPT-5.1 | 63.4% | $0.054 | 1.8m | 48%
32 | GPT-5.4 (Reasoning) | 73.5% | $0.089 | 2.6m | 58%
33 | Grok 4.3 | 48.7% | $0.0069 | 30.5s | 33%
34 | Qwen 3.5 Flash | 47.5% | $0.0025 | 47.5s | 34%
35 | GPT-5.5 | 74.4% | $0.139 | 1.7m | 62%
36 | Claude Opus 4.6 | 63.6% | $0.078 | 1.2m | 47%
37 | Qwen 3.5 9B | 55.4% | $0.0011 | 1.4m | 32%
38 | Claude Sonnet 4.5 | 54.6% | $0.035 | 38.1s | 35%
39 | Writer: Palmyra X5 | 46.6% | $0.011 | 22.0s | 32%
40 | Grok 4.20 (Beta, Reasoning) | 53.1% | $0.039 | 34.0s | 37%
41 | GPT-5.5 (Reasoning) | 74.8% | $0.142 | 1.8m | 62%
42 | GPT-5.5 (Reasoning, Low) | 74.4% | $0.139 | 1.8m | 60%
43 | Gemini 3.5 Flash (Reasoning, Minimal) | 44.5% | $0.018 | 12.0s | 31%
44 | GPT-5 Nano | 46.6% | $0.0042 | 1.4m | 36%
45 | Z.AI GLM 4.7 Flash | 45.8% | $0.0017 | 1.2m | 33%
46 | Gemini 3 Flash (Preview) | 41.1% | $0.0078 | 19.6s | 30%
47 | Claude Opus 4.5 | 58.0% | $0.070 | 53.4s | 41%
48 | Z.AI GLM 5.1 | 52.7% | $0.014 | 1.5m | 34%
49 | ByteDance Seed 2.0 Mini | 68.6% | $0.0045 | 4.9m | 51%
50 | Grok 4.20 (Reasoning) | 49.4% | $0.018 | 1.5m | 36%
51 | Z.AI GLM 4.7 | 47.8% | $0.010 | 1.4m | 34%
52 | Skyfall 36B V2 | 45.3% | $0.0019 | 23.1s | 23%
53 | Claude Opus 4.6 (Reasoning) | 62.7% | $0.088 | 1.4m | 44%
54 | Z.AI GLM 4.5 | 43.0% | $0.0051 | 42.1s | 28%
55 | Mistral Small 4 (Reasoning) | 41.5% | $0.0022 | 30.2s | 27%
56 | Qwen3 235B A22B Instruct 2507 | 43.8% | $0.0011 | 59.2s | 28%
57 | Xiaomi MIMO v2.5 | 44.0% | $0.0054 | 31.8s | 25%
58 | Claude 3.7 Sonnet | 49.1% | $0.042 | 46.7s | 33%
59 | Xiaomi MIMO v2.5 Pro | 45.5% | $0.0085 | 53.5s | 26%
60 | Mistral Medium 3.1 | 39.0% | $0.0048 | 36.5s | 27%
61 | Mistral Small Creative | 35.9% | $0.0007 | 9.1s | 24%
62 | Mistral Small 4 | 37.0% | $0.0014 | 18.2s | 25%
63 | ByteDance Seed 1.6 | 54.6% | $0.013 | 2.5m | 34%
64 | Qwen 3.5 122B | 46.4% | $0.025 | 1.1m | 31%
65 | Qwen 3.5 Plus (2026-02-15) | 38.7% | $0.0060 | 31.5s | 26%
66 | Qwen 3.5 35B | 47.0% | $0.018 | 1.0m | 27%
67 | Qwen 3 32B | 40.4% | $0.0015 | 54.6s | 25%
68 | Stealth: Healer Alpha | 37.4% | $0.0000 | 23.7s | 22%
69 | Cydonia 24B V4.1 | 38.8% | $0.0014 | 44.8s | 25%
70 | Rocinante 12B | 43.1% | $0.0014 | 38.4s | 20%
71 | Stealth: Hunter Alpha | 39.5% | $0.0000 | 55.0s | 24%
72 | Gemini 3 Flash (Preview, Reasoning) | 37.3% | $0.012 | 30.1s | 26%
73 | DeepSeek V4 Pro (Reasoning) | 54.2% | $0.015 | 3.1m | 37%
74 | Aion 2.0 | 40.3% | $0.0064 | 1.3m | 28%
75 | Grok 4.3 (Reasoning) | 50.3% | $0.021 | 2.3m | 33%
76 | Ministral 3 14B | 33.3% | $0.0007 | 11.7s | 20%
77 | Llama 3.1 8B | 38.7% | $0.0003 | 1.3m | 26%
78 | Z.AI GLM 4.5 Air | 38.0% | $0.0029 | 58.2s | 23%
79 | GPT-4.1 | 38.0% | $0.018 | 44.7s | 25%
80 | DeepSeek V3 (2025-03-24) | 35.9% | $0.0014 | 39.4s | 20%
81 | Mistral Large 2 | 36.8% | $0.013 | 29.4s | 21%
82 | Qwen 3.6 Flash | 36.3% | $0.010 | 41.4s | 22%
83 | Mistral Large 3 | 34.9% | $0.0033 | 30.3s | 20%
84 | Qwen 3.5 27B | 44.5% | $0.020 | 1.6m | 28%
85 | Gemini 3 Pro (Preview) | 45.0% | $0.055 | 54.4s | 31%
86 | Mistral Large | 36.5% | $0.014 | 30.9s | 21%
87 | Arcee AI: Trinity Large (Preview) | 35.7% | $0.0000 | 43.6s | 19%
88 | DeepSeek V3.2 | 39.2% | $0.0014 | 1.9m | 26%
89 | Claude Sonnet 4 | 39.2% | $0.032 | 43.7s | 24%
90 | Ministral 8B | 27.1% | $0.0004 | 10.4s | 15%
91 | Qwen 3.5 397B A17B | 46.8% | $0.014 | 3.0m | 31%
92 | Qwen 3.6 35B | 33.5% | $0.0083 | 1.0m | 21%
93 | Gemini 3.5 Flash (Reasoning) | 42.3% | $0.071 | 37.6s | 29%
94 | Z.AI GLM 4.6 | 31.3% | $0.0065 | 51.5s | 18%
95 | Ministral 3 8B | 28.3% | $0.0008 | 19.6s | 14%
96 | Gemini 3.1 Flash Lite (Preview) | 23.8% | $0.0030 | 8.4s | 15%
97 | MoonshotAI: Kimi K2.5 | 46.4% | $0.019 | 3.2m | 31%
98 | Qwen 3.5 Plus (2026-04-20) | 35.5% | $0.017 | 1.8m | 24%
99 | Hermes 3 70B | 29.9% | $0.0010 | 1.2m | 18%
100 | Hermes 3 405B | 31.9% | $0.0032 | 53.2s | 14%
101 | Gemini 3.1 Flash Lite | 23.3% | $0.0030 | 12.1s | 13%
102 | Gemini 3.1 Flash Lite (Reasoning) | 22.5% | $0.0030 | 11.9s | 13%
103 | Claude 3.5 Sonnet | 33.9% | $0.048 | 35.5s | 19%
104 | WizardLM 2 8x22b | 33.8% | $0.0026 | 1.8m | 17%
105 | DeepSeek V3.1 | 30.3% | $0.0020 | 1.8m | 19%
106 | Gemma 3 27B | 21.0% | $0.0006 | 52.6s | 16%
107 | DeepSeek V3 (2024-12-26) | 23.8% | $0.0021 | 54.6s | 13%
108 | Gemini 2.5 Pro | 26.5% | $0.036 | 36.2s | 16%
109 | DeepSeek-V2 Chat | 21.9% | $0.0021 | 53.3s | 12%
110 | Gemma 4 31B | 24.8% | $0.0010 | 1.6m | 16%
111 | Ministral 3B | 17.3% | $0.0001 | 8.1s | 7%
112 | Qwen 3.6 27B | 33.6% | $0.025 | 2.3m | 22%
113 | Grok 4 Fast | 15.1% | $0.0017 | 24.1s | 11%
114 | Gemma 4 26B | 19.6% | $0.0009 | 55.1s | 12%
115 | o4 Mini | 17.6% | $0.015 | 25.7s | 12%
116 | LFM2 24B | 18.4% | $0.0002 | 28.4s | 7%
117 | Grok 4.1 Fast | 16.9% | $0.0018 | 37.8s | 10%
118 | Ministral 3 3B | 13.5% | $0.0005 | 11.1s | 5%
119 | Qwen3.7 Max | 38.8% | $0.068 | 2.3m | 25%
120 | Gemma 4 31B (Reasoning) | 23.8% | $0.0014 | 2.2m | 14%
121 | Arcee AI: Trinity Mini | 12.9% | $0.0003 | 9.2s | 0%
122 | Gemini 2.5 Flash | 12.6% | $0.0052 | 10.6s | 2%
123 | Nemotron 3 Super | 16.4% | $0.0000 | 1.4m | 10%
124 | Gemma 3 12B | 13.0% | $0.0004 | 41.3s | 5%
125 | Mistral NeMO | 10.6% | $0.0005 | 10.1s | 1%
126 | o4 Mini High | 16.1% | $0.025 | 47.2s | 9%
127 | Qwen3.6 Max Preview | 39.6% | $0.050 | 3.5m | 25%
128 | GPT-4.1 Mini | 11.7% | $0.0027 | 19.0s | 0%
129 | Llama 3.1 70B | 13.3% | $0.0015 | 29.4s | 0%
130 | Cohere Command R+ (Aug. 2024) | 19.9% | $0.020 | 52.5s | 4%
131 | Claude 3 Haiku | 9.6% | $0.0025 | 14.9s | 0%
132 | Gemini 2.5 Flash Lite | 7.9% | $0.0009 | 9.5s | 0%
133 | GPT-4.1 Nano | 8.0% | $0.0007 | 13.3s | 0%
134 | Inception Mercury | 12.7% | $0.011 | 17.6s | 0%
135 | Claude Opus 4 | 54.6% | $0.209 | 1.4m | 37%
136 | Gemma 3 4B | 8.2% | $0.0002 | 20.0s | 0%
137 | Gemini 2.5 Flash (Reasoning) | 11.1% | $0.011 | 21.5s | 0%
138 | Gemma 4 26B (Reasoning) | 15.3% | $0.0013 | 2.0m | 10%
139 | Stealth: Aurora Alpha | 4.5% | $0.0000 | 9.8s | 0%
140 | Inception Mercury 2 | 5.1% | $0.0032 | 7.0s | 0%
141 | Gemini 2.5 Flash Lite (Reasoning) | 8.9% | $0.0028 | 30.8s | 0%
142 | Gemini 3.1 Pro (Preview) | 34.5% | $0.107 | 1.8m | 23%
143 | Qwen 2.5 72B | 7.9% | $0.0010 | 36.7s | 0%
144 | Grok 4 | 19.2% | $0.048 | 1.7m | 13%
145 | Llama 3.1 Nemotron 70B | 4.9% | $0.0038 | 31.7s | 0%
146 | GPT-4o Mini (temp=1) | 2.3% | $0.0012 | 34.8s | 0%
147 | MoonshotAI: Kimi K2.6 | 52.1% | $0.058 | 6.5m | 35%
148 | GPT-4o Mini (temp=0) | 0.8% | $0.0012 | 34.8s | 0%
149 | GPT-4o, May 13th (temp=0) | 8.9% | $0.035 | 14.1s | 0%
150 | Nemotron 3 Nano | 6.2% | $0.0010 | 1.1m | 0%
151 | GPT-OSS 120B | 9.9% | $0.0015 | 1.8m | 4%
152 | GPT-4o, May 13th (temp=1) | 7.4% | $0.033 | 14.4s | 0%
153 | GPT-4o, Aug. 6th (temp=1) | 2.8% | $0.018 | 24.4s | 0%
154 | GPT-4o, Aug. 6th (temp=0) | 3.1% | $0.023 | 22.7s | 0%
155 | Mistral Small 3.2 24B | 11.9% | $0.0069 | 5.7m | 0%
Avg. Score: 39.03%

Individual Scenarios

Detailed Writing Rules

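Model | #1 | #2 | #3 | #4 | #5 | Avg
ByteDance Seed 2.0 Lite | 89 | 79 | 77 | 75 | 70 | 78.0%
Claude Opus 4.7 | 81 | 81 | 67 | 66 | 62 | 71.3%
GPT-5.5 | 78 | 77 | 74 | 69 | 58 | 71.1%
GPT-5.4 | 77 | 72 | 69 | 66 | 61 | 69.0%
Claude Opus 4.7 (Reasoning) | 82 | 77 | 70 | 63 | 48 | 68.1%
ByteDance Seed 1.6 Flash | 82 | 77 | 69 | 55 | 48 | 66.1%
Claude Opus 4.8 (Reasoning) | 77 | 75 | 63 | 54 | 45 | 62.8%
GPT-5.4 Mini | 71 | 66 | 65 | 58 | 53 | 62.7%
GPT-5 | 75 | 63 | 63 | 56 | 56 | 62.4%
GPT-5.4 Mini (Reasoning) | 72 | 65 | 62 | 60 | 51 | 62.0%
GPT-5.4 (Reasoning) | 71 | 64 | 59 | 58 | 49 | 60.3%
Claude Sonnet 4.6 (Reasoning) | 82 | 78 | 62 | 46 | 32 | 60.1%
GPT-5.5 (Reasoning, Low) | 74 | 68 | 65 | 49 | 38 | 58.9%
GPT-5.4 Mini (Reasoning, Low) | 66 | 62 | 61 | 55 | 50 | 58.9%
GPT-5.4 (Reasoning, Low) | 71 | 61 | 58 | 52 | 52 | 58.8%
GPT-5 Mini | 76 | 60 | 59 | 51 | 48 | 58.7%
GPT-5.5 (Reasoning) | 60 | 59 | 58 | 57 | 56 | 58.1%
GPT-5.2 | 67 | 66 | 59 | 51 | 44 | 57.4%
ByteDance Seed 2.0 Mini | 78 | 60 | 59 | 42 | 35 | 54.8%
Claude Opus 4.8 (Reasoning, Low) | 71 | 68 | 56 | 40 | 35 | 54.0%
Claude Sonnet 4.6 | 65 | 65 | 50 | 50 | 28 | 51.7%
GPT-5.1 | 68 | 60 | 43 | 41 | 37 | 49.8%
Claude Opus 4.6 | 64 | 49 | 46 | 45 | 44 | 49.5%
Grok 4.20 (Beta, Reasoning) | 61 | 52 | 49 | 44 | 36 | 48.5%
MiniMax M3 | 56 | 52 | 52 | 43 | 37 | 48.2%
Qwen 3.5 35B | 73 | 64 | 36 | 32 | 29 | 47.0%
Claude Opus 4.5 | 64 | 54 | 53 | 39 | 15 | 45.1%
Claude Opus 4 | 61 | 51 | 47 | 35 | 32 | 45.0%
Claude Opus 4.6 (Reasoning) | 63 | 57 | 53 | 27 | 21 | 44.4%
Qwen 3.5 9B | 66 | 54 | 50 | 49 | 2 | 44.1%
ByteDance Seed 1.6 | 63 | 50 | 37 | 36 | 30 | 43.0%
MiniMax M2.5 | 67 | 62 | 34 | 27 | 25 | 42.8%
Grok 4.20 (Reasoning) | 62 | 55 | 41 | 34 | 19 | 41.9%
GPT-5.4 Nano (Reasoning) | 54 | 43 | 42 | 38 | 31 | 41.6%
Qwen 3.5 Flash | 57 | 51 | 41 | 39 | 16 | 41.0%
Claude 3.7 Sonnet | 53 | 39 | 39 | 38 | 34 | 40.7%
Z.AI GLM 4.7 | 53 | 46 | 40 | 33 | 30 | 40.2%
GPT-5.4 Nano (Reasoning, Low) | 49 | 44 | 44 | 35 | 29 | 40.2%
Z.AI GLM 4.7 Flash | 59 | 54 | 46 | 41 | 0 | 40.0%
Writer: Palmyra X5 | 63 | 58 | 54 | 21 | 0 | 39.2%
Z.AI GLM 4.5 | 54 | 54 | 48 | 20 | 19 | 39.1%
Qwen 3.5 397B A17B | 56 | 51 | 43 | 25 | 20 | 38.9%
Grok 4.20 (Beta) | 59 | 38 | 37 | 34 | 26 | 38.9%
Grok 4.20 | 51 | 45 | 43 | 41 | 13 | 38.8%
Z.AI GLM 5.1 | 56 | 54 | 43 | 28 | 11 | 38.3%
Qwen3 235B A22B Instruct 2507 | 50 | 43 | 30 | 29 | 28 | 36.1%
Qwen 3.5 Plus (2026-02-15) | 69 | 46 | 32 | 24 | 10 | 36.0%
GPT-5 Nano | 41 | 41 | 36 | 34 | 27 | 35.7%
DeepSeek V4 Pro (Reasoning) | 46 | 45 | 35 | 35 | 15 | 35.3%
Z.AI GLM 5 Turbo | 57 | 50 | 31 | 30 | 8 | 35.3%
Gemini 3.5 Flash (Reasoning, Minimal) | 56 | 51 | 46 | 11 | 10 | 34.8%
Grok 4.3 (Reasoning) | 51 | 48 | 41 | 23 | 10 | 34.7%
Qwen 3.5 Plus (2026-04-20) | 64 | 45 | 39 | 13 | 12 | 34.6%
Cydonia 24B V4.1 | 61 | 42 | 32 | 18 | 17 | 34.0%
Claude Haiku 4.5 | 45 | 35 | 32 | 32 | 24 | 33.6%
MiniMax M2.7 | 46 | 39 | 34 | 34 | 15 | 33.5%
DeepSeek V3 (2025-03-24) | 54 | 43 | 37 | 32 | 0 | 33.3%
Rocinante 12B | 58 | 57 | 39 | 11 | 0 | 33.0%
Qwen 3.5 27B | 45 | 44 | 30 | 24 | 22 | 32.7%
GPT-5.4 Nano | 43 | 36 | 32 | 32 | 21 | 32.6%
Z.AI GLM 4.5 Air | 60 | 41 | 40 | 9 | 6 | 31.5%
Qwen 3.5 122B | 48 | 43 | 27 | 20 | 15 | 30.6%
Grok 4.3 | 44 | 41 | 37 | 29 | 0 | 30.2%
Z.AI GLM 5 | 43 | 35 | 27 | 25 | 18 | 29.4%
Xiaomi MIMO v2.5 Pro | 48 | 34 | 32 | 21 | 11 | 29.4%
Gemini 3 Flash (Preview) | 45 | 39 | 28 | 20 | 15 | 29.2%
MoonshotAI: Kimi K2.5 | 62 | 24 | 23 | 21 | 13 | 28.7%
Hermes 3 70B | 58 | 33 | 28 | 25 | 0 | 28.7%
Claude Sonnet 4.5 | 40 | 35 | 30 | 24 | 13 | 28.1%
DeepSeek V4 Pro | 44 | 37 | 28 | 21 | 10 | 28.1%
Qwen 3 32B | 48 | 46 | 37 | 7 | 0 | 27.7%
DeepSeek V4 Flash | 46 | 34 | 29 | 22 | 7 | 27.6%
Gemini 3 Pro (Preview) | 52 | 31 | 31 | 13 | 11 | 27.4%
Llama 3.1 8B | 54 | 43 | 38 | 0 | 0 | 27.0%
Mistral Small 4 (Reasoning) | 52 | 37 | 27 | 14 | 2 | 26.4%
Gemini 3.5 Flash (Reasoning) | 42 | 38 | 25 | 22 | 2 | 25.7%
Qwen 3.6 Flash | 50 | 28 | 26 | 22 | 0 | 25.1%
Aion 2.0 | 44 | 34 | 17 | 16 | 13 | 24.8%
Mistral Medium 3.1 | 35 | 32 | 26 | 18 | 12 | 24.7%
WizardLM 2 8x22b | 53 | 38 | 31 | 0 | 0 | 24.4%
Gemini 3.1 Pro (Preview) | 43 | 40 | 26 | 10 | 0 | 23.8%
Mistral Large 3 | 34 | 33 | 18 | 16 | 13 | 22.9%
Ministral 3 14B | 49 | 43 | 21 | 1 | 0 | 22.7%
Qwen3.7 Max | 33 | 31 | 30 | 17 | 0 | 22.1%
DeepSeek V3.2 | 47 | 24 | 21 | 18 | 0 | 22.0%
DeepSeek V4 Flash (Reasoning) | 33 | 29 | 20 | 16 | 8 | 21.5%
Mistral Large | 41 | 23 | 21 | 14 | 8 | 21.5%
Mistral Small 4 | 44 | 40 | 20 | 0 | 0 | 20.9%
Hermes 3 405B | 39 | 28 | 24 | 14 | 0 | 20.9%
DeepSeek V3 (2024-12-26) | 41 | 39 | 25 | 0 | 0 | 20.9%
Xiaomi MIMO v2.5 | 25 | 22 | 20 | 19 | 15 | 20.1%
MoonshotAI: Kimi K2.6 | 46 | 34 | 18 | 0 | 0 | 19.6%
GPT-4.1 | 30 | 20 | 19 | 14 | 13 | 19.5%
Mistral Large 2 | 41 | 22 | 19 | 15 | 0 | 19.5%
Qwen3.6 Max Preview | 42 | 30 | 22 | 3 | 0 | 19.2%
Stealth: Healer Alpha | 37 | 29 | 14 | 13 | 2 | 19.0%
Inception Mercury | 94 | 0 | 0 | 0 | 0 | 18.9%
Llama 3.1 70B | 88 | 0 | 0 | 0 | 0 | 17.6%
Gemini 2.5 Pro | 33 | 28 | 14 | 13 | 0 | 17.5%
Ministral 3B | 67 | 20 | 0 | 0 | 0 | 17.4%
Claude 3.5 Sonnet | 27 | 24 | 16 | 16 | 1 | 16.7%
Skyfall 36B V2 | 33 | 19 | 17 | 14 | 0 | 16.6%
Gemini 3.1 Flash Lite | 38 | 37 | 4 | 0 | 0 | 15.8%
Mistral Small Creative | 24 | 23 | 14 | 14 | 0 | 15.0%
Qwen 3.6 27B | 31 | 21 | 18 | 5 | 0 | 14.9%
Stealth: Hunter Alpha | 44 | 15 | 13 | 0 | 0 | 14.4%
Arcee AI: Trinity Large (Preview) | 40 | 31 | 2 | 0 | 0 | 14.4%
Ministral 3 8B | 28 | 26 | 15 | 0 | 0 | 13.9%
Gemma 4 26B | 32 | 10 | 10 | 1 | 0 | 10.5%
Z.AI GLM 4.6 | 19 | 14 | 12 | 5 | 0 | 10.1%
Cohere Command R+ (Aug. 2024) | 50 | 0 | 0 | 0 | 0 | 10.1%
Gemini 3.1 Flash Lite (Preview) | 23 | 20 | 8 | 0 | 0 | 10.0%
Gemini 3.1 Flash Lite (Reasoning) | 21 | 14 | 8 | 4 | 3 | 9.9%
Claude Sonnet 4 | 37 | 5 | 3 | 0 | 0 | 9.0%
DeepSeek-V2 Chat | 30 | 14 | 0 | 0 | 0 | 8.7%
Grok 4.1 Fast | 17 | 12 | 9 | 0 | 0 | 7.7%
Arcee AI: Trinity Mini | 37 | 0 | 0 | 0 | 0 | 7.3%
DeepSeek V3.1 | 18 | 12 | 7 | 0 | 0 | 7.3%
Gemini 3 Flash (Preview, Reasoning) | 13 | 12 | 9 | 2 | 0 | 7.2%
Ministral 8B | 29 | 5 | 0 | 0 | 0 | 6.8%
Gemma 4 31B | 14 | 8 | 6 | 2 | 0 | 5.9%
Mistral NeMO | 11 | 10 | 0 | 0 | 0 | 4.3%
Gemma 3 27B | 13 | 7 | 0 | 0 | 0 | 4.1%
Qwen 3.6 35B | 10 | 8 | 0 | 0 | 0 | 3.8%
LFM2 24B | 17 | 0 | 0 | 0 | 0 | 3.4%
Nemotron 3 Super | 15 | 0 | 0 | 0 | 0 | 3.1%
Ministral 3 3B | 12 | 0 | 0 | 0 | 0 | 2.4%
Grok 4 Fast | 10 | 1 | 0 | 0 | 0 | 2.2%
Gemini 2.5 Flash | 5 | 0 | 0 | 0 | 0 | 1.1%
GPT-4o, May 13th (temp=1) | 4 | 0 | 0 | 0 | 0 | 0.8%
Gemma 3 4B | 3 | 0 | 0 | 0 | 0 | 0.7%
Mistral Small 3.2 24B | 3 | 0 | 0 | 0 | 0 | 0.6%
Nemotron 3 Nano | 2 | 0 | 0 | 0 | 0 | 0.4%
Gemma 4 31B (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 4 26B (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-OSS 120B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%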
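
Model | #1 | #2 | #3 | #4 | #5 | Avg
ByteDance Seed 2.0 Mini | 87 | 84 | 80 | 71 | 61 | 76.5%
GPT-5.5 (Reasoning) | 75 | 71 | 71 | 70 | 70 | 71.6%
GPT-5 | 74 | 73 | 70 | 69 | 65 | 70.1%
GPT-5.4 | 76 | 76 | 75 | 66 | 58 | 70.0%
GPT-5.5 | 75 | 73 | 68 | 68 | 59 | 68.9%
Claude Opus 4.7 (Reasoning) | 77 | 71 | 70 | 67 | 55 | 68.1%
GPT-5.5 (Reasoning, Low) | 77 | 71 | 64 | 62 | 60 | 66.5%
GPT-5.4 (Reasoning, Low) | 79 | 76 | 66 | 57 | 43 | 64.3%
ByteDance Seed 2.0 Lite | 79 | 71 | 58 | 57 | 44 | 61.8%
GPT-5.4 (Reasoning) | 71 | 70 | 58 | 55 | 53 | 61.4%
GPT-5.4 Mini | 78 | 66 | 61 | 49 | 47 | 60.2%
ByteDance Seed 1.6 Flash | 75 | 63 | 58 | 54 | 48 | 59.5%
MiniMax M3 | 75 | 60 | 60 | 52 | 49 | 59.3%
Claude Opus 4.8 (Reasoning) | 82 | 74 | 58 | 45 | 37 | 59.2%
GPT-5.4 Mini (Reasoning, Low) | 68 | 61 | 59 | 53 | 51 | 58.5%
Claude Opus 4.7 | 76 | 66 | 51 | 47 | 43 | 56.7%
Rocinante 12B | 100 | 75 | 53 | 36 | 14 | 55.4%
Claude Opus 4.6 | 70 | 66 | 49 | 46 | 41 | 54.6%
Claude Opus 4.8 (Reasoning, Low) | 63 | 57 | 55 | 50 | 43 | 53.6%
GPT-5.1 | 73 | 59 | 50 | 49 | 36 | 53.6%
GPT-5.4 Mini (Reasoning) | 77 | 52 | 50 | 49 | 38 | 53.2%
Qwen 3.5 27B | 67 | 54 | 52 | 40 | 40 | 50.8%
Grok 4.20 (Beta) | 62 | 50 | 49 | 48 | 36 | 49.0%
GPT-5.2 | 65 | 51 | 50 | 50 | 19 | 47.0%
GPT-5 Mini | 62 | 60 | 49 | 35 | 25 | 46.4%
MiniMax M2.7 | 68 | 49 | 49 | 38 | 28 | 46.4%
Qwen 3.5 9B | 82 | 49 | 43 | 43 | 14 | 46.3%
MiniMax M2.5 | 66 | 57 | 54 | 27 | 27 | 46.3%
GPT-5.4 Nano (Reasoning) | 57 | 52 | 47 | 37 | 36 | 45.7%
Claude Opus 4 | 64 | 60 | 37 | 36 | 29 | 45.4%
DeepSeek V4 Pro | 56 | 50 | 42 | 40 | 39 | 45.3%
Qwen 3.5 397B A17B | 58 | 53 | 44 | 40 | 31 | 45.3%
Claude Sonnet 4.5 | 74 | 52 | 47 | 34 | 18 | 45.2%
Claude Opus 4.6 (Reasoning) | 61 | 52 | 45 | 40 | 27 | 45.0%
Claude Haiku 4.5 | 69 | 49 | 41 | 35 | 31 | 44.9%
DeepSeek V4 Flash | 69 | 56 | 43 | 29 | 27 | 44.9%
Grok 4.3 | 55 | 49 | 45 | 45 | 24 | 43.7%
Qwen 3 32B | 62 | 51 | 45 | 35 | 23 | 43.2%
Grok 4.20 | 53 | 51 | 37 | 37 | 32 | 42.2%
Qwen 3.5 Flash | 62 | 49 | 43 | 37 | 15 | 41.3%
Z.AI GLM 5 Turbo | 73 | 59 | 51 | 12 | 10 | 41.2%
Qwen 3.5 122B | 55 | 52 | 42 | 34 | 23 | 41.2%
ByteDance Seed 1.6 | 46 | 45 | 40 | 35 | 35 | 40.2%
GPT-5.4 Nano (Reasoning, Low) | 66 | 54 | 30 | 27 | 25 | 40.2%
Z.AI GLM 4.7 | 55 | 52 | 44 | 35 | 13 | 40.0%
Grok 4.20 (Reasoning) | 44 | 42 | 42 | 38 | 33 | 39.7%
Arcee AI: Trinity Large (Preview) | 72 | 46 | 32 | 23 | 22 | 39.0%
Claude Opus 4.5 | 48 | 44 | 40 | 31 | 31 | 38.8%
GPT-5 Nano | 49 | 40 | 38 | 35 | 30 | 38.4%
Z.AI GLM 4.7 Flash | 51 | 51 | 40 | 32 | 18 | 38.3%
Claude Sonnet 4.6 | 60 | 37 | 36 | 32 | 26 | 38.1%
DeepSeek V4 Pro (Reasoning) | 56 | 41 | 37 | 33 | 22 | 37.9%
Claude Sonnet 4.6 (Reasoning) | 55 | 47 | 34 | 33 | 20 | 37.9%
Z.AI GLM 5 | 63 | 40 | 38 | 25 | 21 | 37.3%
Z.AI GLM 5.1 | 64 | 35 | 33 | 30 | 23 | 36.8%
MoonshotAI: Kimi K2.5 | 61 | 38 | 31 | 29 | 23 | 36.2%
Qwen 3.5 35B | 44 | 42 | 40 | 32 | 23 | 36.2%
Gemini 3.5 Flash (Reasoning) | 44 | 36 | 34 | 33 | 31 | 35.6%
MoonshotAI: Kimi K2.6 | 43 | 39 | 36 | 33 | 27 | 35.5%
Gemini 3 Flash (Preview, Reasoning) | 40 | 38 | 36 | 35 | 25 | 34.8%
Skyfall 36B V2 | 71 | 40 | 33 | 28 | 0 | 34.4%
GPT-5.4 Nano | 46 | 44 | 32 | 25 | 22 | 33.7%
Qwen 3.5 Plus (2026-02-15) | 52 | 37 | 32 | 31 | 15 | 33.6%
Gemini 3 Pro (Preview) | 48 | 44 | 32 | 29 | 14 | 33.3%
Hermes 3 405B | 60 | 54 | 39 | 13 | 0 | 33.2%
DeepSeek V4 Flash (Reasoning) | 64 | 52 | 24 | 22 | 4 | 33.0%
DeepSeek V3 (2025-03-24) | 66 | 53 | 39 | 7 | 0 | 32.9%
Gemini 3 Flash (Preview) | 51 | 38 | 29 | 28 | 20 | 32.9%
Qwen3 235B A22B Instruct 2507 | 51 | 39 | 30 | 28 | 11 | 31.8%
Qwen3.7 Max | 41 | 31 | 31 | 27 | 24 | 30.9%
Grok 4.3 (Reasoning) | 60 | 33 | 31 | 29 | 0 | 30.6%
Qwen 3.6 Flash | 68 | 46 | 23 | 12 | 0 | 29.9%
Claude 3.7 Sonnet | 41 | 39 | 35 | 18 | 14 | 29.6%
Xiaomi MIMO v2.5 Pro | 43 | 37 | 25 | 23 | 15 | 28.4%
Qwen3.6 Max Preview | 41 | 35 | 34 | 22 | 9 | 28.2%
Grok 4.20 (Beta, Reasoning) | 47 | 37 | 18 | 17 | 15 | 27.0%
WizardLM 2 8x22b | 45 | 41 | 27 | 15 | 0 | 25.8%
Writer: Palmyra X5 | 38 | 30 | 25 | 19 | 14 | 25.2%
Aion 2.0 | 34 | 34 | 29 | 20 | 9 | 25.1%
DeepSeek V3 (2024-12-26) | 67 | 52 | 0 | 0 | 0 | 23.8%
Mistral Small Creative | 50 | 29 | 25 | 9 | 4 | 23.5%
Gemini 3.1 Flash Lite (Reasoning) | 68 | 19 | 13 | 12 | 4 | 23.3%
Gemini 3.5 Flash (Reasoning, Minimal) | 34 | 32 | 23 | 20 | 4 | 22.9%
DeepSeek V3.2 | 36 | 34 | 27 | 16 | 0 | 22.5%
Mistral Medium 3.1 | 32 | 30 | 23 | 16 | 9 | 22.1%
GPT-4.1 | 33 | 26 | 19 | 18 | 14 | 22.0%
Claude Sonnet 4 | 46 | 37 | 26 | 0 | 0 | 21.9%
Gemini 3.1 Pro (Preview) | 46 | 40 | 19 | 3 | 0 | 21.7%
Stealth: Healer Alpha | 48 | 26 | 13 | 11 | 5 | 20.6%
Cydonia 24B V4.1 | 32 | 27 | 20 | 18 | 0 | 19.4%
Claude 3.5 Sonnet | 40 | 35 | 21 | 0 | 0 | 19.3%
Llama 3.1 8B | 51 | 21 | 17 | 0 | 0 | 18.1%
Gemini 3.1 Flash Lite | 29 | 27 | 17 | 6 | 5 | 16.8%
Hermes 3 70B | 61 | 19 | 0 | 0 | 0 | 16.2%
Z.AI GLM 4.5 Air | 39 | 26 | 14 | 0 | 0 | 15.8%
Mistral Large 2 | 49 | 19 | 9 | 0 | 0 | 15.4%
DeepSeek-V2 Chat | 75 | 0 | 0 | 0 | 0 | 15.0%
Qwen 3.5 Plus (2026-04-20) | 29 | 28 | 10 | 7 | 0 | 14.6%
Mistral Large | 42 | 17 | 8 | 5 | 0 | 14.6%
Xiaomi MIMO v2.5 | 39 | 15 | 12 | 4 | 1 | 14.3%
Mistral Small 4 | 28 | 22 | 18 | 0 | 0 | 13.5%
Z.AI GLM 4.5 | 34 | 16 | 15 | 1 | 0 | 13.3%
Llama 3.1 70B | 62 | 0 | 0 | 0 | 0 | 12.4%
Qwen 3.6 27B | 32 | 19 | 10 | 0 | 0 | 12.3%
Gemma 4 31B | 23 | 15 | 12 | 10 | 0 | 12.0%
Mistral Small 4 (Reasoning) | 33 | 18 | 6 | 4 | 0 | 11.9%
Stealth: Hunter Alpha | 34 | 12 | 9 | 1 | 0 | 11.2%
Ministral 3 8B | 27 | 19 | 9 | 0 | 0 | 10.8%
Gemini 2.5 Pro | 27 | 26 | 0 | 0 | 0 | 10.6%
Qwen 3.6 35B | 29 | 18 | 6 | 0 | 0 | 10.5%
Grok 4.1 Fast | 27 | 13 | 10 | 0 | 0 | 10.0%
Gemini 3.1 Flash Lite (Preview) | 24 | 15 | 8 | 0 | 0 | 9.3%
Gemma 3 27B | 28 | 12 | 0 | 0 | 0 | 8.1%
Gemma 3 12B | 25 | 0 | 0 | 0 | 0 | 5.1%
Mistral Small 3.2 24B | 17 | 9 | 0 | 0 | 0 | 5.1%
Ministral 3 3B | 18 | 6 | 0 | 0 | 0 | 4.7%
DeepSeek V3.1 | 17 | 6 | 0 | 0 | 0 | 4.6%
Ministral 3 14B | 13 | 10 | 0 | 0 | 0 | 4.5%
Z.AI GLM 4.6 | 22 | 0 | 0 | 0 | 0 | 4.5%
GPT-OSS 120B | 16 | 0 | 0 | 0 | 0 | 3.1%
Stealth: Aurora Alpha | 9 | 0 | 0 | 0 | 0 | 1.8%
Gemma 4 26B | 6 | 2 | 0 | 0 | 0 | 1.7%
Nemotron 3 Nano | 8 | 0 | 0 | 0 | 0 | 1.5%
Mistral Large 3 | 5 | 1 | 0 | 0 | 0 | 1.2%
Cohere Command R+ (Aug. 2024) | 5 | 0 | 0 | 0 | 0 | 1.0%
Gemini 2.5 Flash | 5 | 0 | 0 | 0 | 0 | 1.0%
Gemma 4 31B (Reasoning) | 2 | 2 | 0 | 0 | 0 | 0.8%
LFM2 24B | 3 | 0 | 0 | 0 | 0 | 0.7%
Gemma 3 4B | 2 | 0 | 0 | 0 | 0 | 0.4%
Gemini 2.5 Flash (Reasoning) | 1 | 0 | 0 | 0 | 0 | 0.1%
Gemma 4 26B (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Nemotron 3 Super | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%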
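
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.7 | 100 | 96 | 95 | 91 | 85 | 93.4%
Claude Opus 4.7 (Reasoning) | 92 | 92 | 91 | 85 | 81 | 88.3%
GPT-5 | 92 | 88 | 87 | 84 | 79 | 86.2%
ByteDance Seed 2.0 Lite | 92 | 90 | 89 | 85 | 65 | 84.2%
ByteDance Seed 1.6 Flash | 94 | 91 | 89 | 76 | 72 | 84.2%
GPT-5.4 | 91 | 86 | 85 | 81 | 74 | 83.6%
Claude Opus 4.8 (Reasoning, Low) | 96 | 95 | 81 | 75 | 68 | 83.1%
GPT-5.5 (Reasoning, Low) | 93 | 83 | 82 | 81 | 74 | 82.4%
Claude Opus 4.8 (Reasoning) | 91 | 86 | 84 | 77 | 72 | 81.8%
Claude Sonnet 4.6 (Reasoning) | 97 | 91 | 79 | 72 | 67 | 81.1%
GPT-5.5 (Reasoning) | 83 | 82 | 82 | 81 | 78 | 81.0%
MiniMax M3 | 93 | 89 | 78 | 74 | 69 | 80.7%
GPT-5.4 (Reasoning) | 83 | 82 | 82 | 80 | 74 | 80.4%
GPT-5.4 (Reasoning, Low) | 88 | 84 | 78 | 77 | 69 | 79.3%
GPT-5.4 Mini (Reasoning) | 82 | 82 | 79 | 77 | 72 | 78.3%
GPT-5.5 | 81 | 81 | 80 | 77 | 73 | 78.2%
Qwen 3.5 27B | 83 | 79 | 78 | 77 | 70 | 77.4%
GPT-5.1 | 82 | 82 | 82 | 72 | 69 | 77.3%
Claude Haiku 4.5 | 87 | 80 | 77 | 74 | 66 | 76.9%
GPT-5.4 Mini (Reasoning, Low) | 86 | 85 | 77 | 71 | 65 | 76.8%
GPT-5.4 Mini | 89 | 83 | 77 | 69 | 63 | 76.3%
Grok 4.3 (Reasoning) | 84 | 82 | 82 | 69 | 58 | 74.9%
MiniMax M2.7 | 93 | 79 | 78 | 63 | 61 | 74.9%
GPT-5.2 | 82 | 78 | 78 | 69 | 66 | 74.5%
Claude Opus 4.6 (Reasoning) | 81 | 80 | 75 | 73 | 63 | 74.1%
GPT-5 Mini | 88 | 76 | 71 | 68 | 60 | 72.9%
Claude Sonnet 4.6 | 85 | 77 | 73 | 71 | 57 | 72.4%
Claude Opus 4.5 | 85 | 77 | 67 | 66 | 62 | 71.6%
Claude Sonnet 4.5 | 89 | 79 | 69 | 59 | 56 | 70.5%
MoonshotAI: Kimi K2.6 | 79 | 72 | 71 | 64 | 63 | 69.8%
Z.AI GLM 5 Turbo | 78 | 75 | 75 | 64 | 57 | 69.7%
Claude Opus 4.6 | 78 | 77 | 72 | 61 | 61 | 69.7%
Hermes 3 405B | 93 | 77 | 68 | 54 | 52 | 68.7%
Xiaomi MIMO v2.5 Pro | 76 | 72 | 69 | 67 | 59 | 68.6%
Claude Opus 4 | 81 | 79 | 68 | 63 | 51 | 68.5%
Z.AI GLM 4.7 | 81 | 73 | 73 | 64 | 51 | 68.4%
ByteDance Seed 2.0 Mini | 74 | 69 | 65 | 65 | 64 | 67.4%
GPT-5.4 Nano (Reasoning, Low) | 78 | 73 | 69 | 58 | 58 | 67.3%
GPT-5.4 Nano | 71 | 68 | 67 | 64 | 62 | 66.5%
Qwen 3.5 35B | 84 | 80 | 71 | 65 | 32 | 66.4%
Qwen 3.5 397B A17B | 80 | 71 | 69 | 58 | 53 | 66.4%
Z.AI GLM 5 | 72 | 70 | 67 | 63 | 58 | 66.1%
Grok 4.20 (Beta, Reasoning) | 86 | 64 | 63 | 61 | 55 | 65.9%
Gemini 3 Pro (Preview) | 75 | 69 | 67 | 59 | 56 | 65.3%
GPT-5.4 Nano (Reasoning) | 73 | 67 | 63 | 62 | 60 | 64.9%
Qwen 3.5 122B | 84 | 80 | 72 | 44 | 44 | 64.8%
DeepSeek V4 Pro | 80 | 68 | 67 | 56 | 54 | 64.7%
DeepSeek V4 Flash (Reasoning) | 78 | 77 | 66 | 58 | 40 | 63.9%
Grok 4.3 | 72 | 65 | 62 | 60 | 57 | 63.3%
Z.AI GLM 5.1 | 71 | 68 | 61 | 58 | 57 | 63.0%
MiniMax M2.5 | 80 | 69 | 69 | 67 | 28 | 62.6%
Qwen 3.5 Flash | 75 | 63 | 62 | 58 | 54 | 62.4%
Z.AI GLM 4.7 Flash | 79 | 70 | 65 | 54 | 40 | 61.6%
Grok 4.20 (Reasoning) | 72 | 67 | 61 | 58 | 45 | 60.9%
DeepSeek V4 Pro (Reasoning) | 77 | 74 | 59 | 56 | 38 | 60.7%
Writer: Palmyra X5 | 74 | 72 | 61 | 58 | 39 | 60.7%
ByteDance Seed 1.6 | 77 | 69 | 66 | 47 | 44 | 60.7%
Qwen 3.5 9B | 76 | 73 | 59 | 54 | 40 | 60.4%
Stealth: Healer Alpha | 78 | 78 | 49 | 44 | 43 | 58.3%
Z.AI GLM 4.5 Air | 97 | 64 | 52 | 42 | 35 | 57.7%
DeepSeek V3.2 | 67 | 63 | 62 | 56 | 40 | 57.6%
MoonshotAI: Kimi K2.5 | 68 | 60 | 58 | 56 | 45 | 57.2%
Claude 3.7 Sonnet | 76 | 74 | 52 | 41 | 40 | 56.6%
Gemma 4 31B (Reasoning) | 67 | 67 | 52 | 52 | 45 | 56.5%
Gemini 3 Flash (Preview) | 67 | 63 | 57 | 57 | 36 | 56.0%
Skyfall 36B V2 | 75 | 65 | 61 | 42 | 37 | 55.9%
Qwen3 235B A22B Instruct 2507 | 74 | 61 | 53 | 46 | 44 | 55.7%
Z.AI GLM 4.5 | 70 | 55 | 55 | 51 | 46 | 55.6%
Qwen3.7 Max | 69 | 68 | 60 | 46 | 33 | 55.3%
Aion 2.0 | 73 | 63 | 62 | 38 | 38 | 54.8%
Gemini 3.5 Flash (Reasoning, Minimal) | 66 | 61 | 59 | 50 | 36 | 54.4%
Gemini 3.5 Flash (Reasoning) | 79 | 57 | 50 | 47 | 38 | 54.2%
Mistral Small 4 (Reasoning) | 67 | 62 | 61 | 42 | 38 | 54.0%
DeepSeek V4 Flash | 85 | 84 | 62 | 40 | 0 | 54.0%
Qwen 3.5 Plus (2026-02-15) | 69 | 57 | 54 | 51 | 37 | 53.7%
Mistral Small 4 | 61 | 58 | 56 | 50 | 43 | 53.5%
GPT-4.1 | 79 | 65 | 53 | 39 | 30 | 53.2%
Claude Sonnet 4 | 67 | 65 | 53 | 51 | 31 | 53.2%
Grok 4.20 (Beta) | 67 | 58 | 52 | 51 | 38 | 53.2%
WizardLM 2 8x22b | 66 | 57 | 53 | 47 | 42 | 53.1%
DeepSeek V3 (2025-03-24) | 71 | 66 | 55 | 54 | 19 | 53.0%
Mistral Large 3 | 66 | 58 | 58 | 41 | 38 | 52.0%
Mistral Large 2 | 69 | 68 | 53 | 43 | 20 | 50.7%
Claude 3.5 Sonnet | 70 | 64 | 60 | 47 | 12 | 50.5%
Xiaomi MIMO v2.5 | 81 | 53 | 49 | 48 | 21 | 50.4%
Qwen 3.6 35B | 60 | 58 | 56 | 40 | 36 | 50.0%
Gemini 3 Flash (Preview, Reasoning) | 54 | 50 | 50 | 48 | 45 | 49.7%
Cydonia 24B V4.1 | 57 | 57 | 50 | 47 | 35 | 49.4%
Ministral 8B | 77 | 51 | 50 | 36 | 33 | 49.3%
Stealth: Hunter Alpha | 55 | 52 | 48 | 48 | 43 | 49.1%
Qwen3.6 Max Preview | 69 | 61 | 49 | 34 | 30 | 48.7%
Qwen 3.6 27B | 67 | 48 | 45 | 41 | 40 | 48.3%
Qwen 3.5 Plus (2026-04-20) | 62 | 47 | 45 | 45 | 43 | 48.2%
Qwen 3 32B | 71 | 63 | 63 | 22 | 20 | 47.5%
Gemini 3.1 Pro (Preview) | 60 | 53 | 44 | 41 | 35 | 46.6%
Grok 4.20 | 58 | 53 | 49 | 46 | 26 | 46.4%
Rocinante 12B | 100 | 56 | 39 | 34 | 2 | 46.3%
GPT-5 Nano | 50 | 48 | 45 | 45 | 44 | 46.3%
Mistral Medium 3.1 | 69 | 59 | 43 | 31 | 30 | 46.1%
Ministral 3 8B | 59 | 58 | 54 | 54 | 0 | 45.1%
DeepSeek V3.1 | 56 | 52 | 48 | 35 | 27 | 43.7%
Llama 3.1 8B | 56 | 56 | 50 | 30 | 23 | 43.1%
Mistral Large | 58 | 43 | 42 | 37 | 34 | 42.8%
Gemini 3.1 Flash Lite | 54 | 48 | 44 | 36 | 32 | 42.7%
Qwen 3.6 Flash | 71 | 46 | 40 | 34 | 22 | 42.7%
Gemini 2.5 Pro | 64 | 48 | 45 | 32 | 17 | 41.3%
DeepSeek V3 (2024-12-26) | 78 | 60 | 36 | 20 | 10 | 40.9%
Hermes 3 70B | 85 | 44 | 37 | 36 | 0 | 40.5%
Gemma 3 27B | 45 | 39 | 38 | 37 | 36 | 39.0%
Gemma 4 31B | 54 | 45 | 37 | 32 | 26 | 38.9%
Gemma 4 26B | 57 | 46 | 40 | 26 | 23 | 38.4%
Ministral 3 14B | 59 | 38 | 34 | 34 | 25 | 37.9%
Arcee AI: Trinity Large (Preview) | 67 | 34 | 31 | 30 | 26 | 37.7%
Grok 4 | 54 | 42 | 38 | 28 | 21 | 36.8%
Z.AI GLM 4.6 | 72 | 30 | 30 | 30 | 18 | 35.8%
o4 Mini | 45 | 34 | 33 | 33 | 22 | 33.4%
Gemini 3.1 Flash Lite (Reasoning) | 66 | 46 | 26 | 17 | 11 | 33.2%
Mistral Small Creative | 50 | 50 | 40 | 21 | 1 | 32.2%
Inception Mercury | 83 | 48 | 28 | 0 | 0 | 31.9%
Grok 4.1 Fast | 54 | 41 | 39 | 20 | 0 | 31.0%
Gemini 3.1 Flash Lite (Preview) | 53 | 39 | 29 | 26 | 10 | 31.0%
Nemotron 3 Super | 48 | 45 | 21 | 21 | 0 | 27.3%
Grok 4 Fast | 33 | 32 | 24 | 21 | 14 | 24.8%
Arcee AI: Trinity Mini | 59 | 45 | 20 | 0 | 0 | 24.8%
DeepSeek-V2 Chat | 34 | 32 | 25 | 20 | 0 | 22.4%
Gemma 4 26B (Reasoning) | 44 | 36 | 22 | 5 | 4 | 22.3%
Cohere Command R+ (Aug. 2024) | 59 | 44 | 0 | 0 | 0 | 20.7%
Gemma 3 12B | 37 | 22 | 21 | 18 | 3 | 20.3%
Gemini 2.5 Flash (Reasoning) | 47 | 38 | 9 | 0 | 0 | 18.9%
Llama 3.1 70B | 71 | 20 | 0 | 0 | 0 | 18.3%
GPT-4.1 Mini | 52 | 21 | 16 | 1 | 0 | 18.1%
Gemini 2.5 Flash Lite | 29 | 26 | 18 | 12 | 0 | 16.9%
Ministral 3B | 52 | 31 | 0 | 0 | 0 | 16.6%
Gemini 2.5 Flash | 44 | 32 | 5 | 0 | 0 | 16.4%
Mistral NeMO | 55 | 11 | 9 | 3 | 0 | 15.7%
LFM2 24B | 48 | 17 | 12 | 0 | 0 | 15.6%
GPT-4o, May 13th (temp=1) | 32 | 17 | 15 | 8 | 0 | 14.5%
GPT-OSS 120B | 31 | 28 | 7 | 5 | 1 | 14.4%
Nemotron 3 Nano | 32 | 32 | 4 | 0 | 0 | 13.5%
Gemini 2.5 Flash Lite (Reasoning) | 39 | 15 | 12 | 0 | 0 | 13.2%
Mistral Small 3.2 24B | 62 | 0 | 0 | 0 | 0 | 12.5%
Claude 3 Haiku | 52 | 5 | 0 | 0 | 0 | 11.4%
o4 Mini High | 38 | 10 | 8 | 0 | 0 | 11.1%
Ministral 3 3B | 38 | 9 | 0 | 0 | 0 | 9.5%
Gemma 3 4B | 38 | 0 | 0 | 0 | 0 | 7.6%
Qwen 2.5 72B | 14 | 0 | 0 | 0 | 0 | 2.9%
Inception Mercury 2 | 13 | 0 | 0 | 0 | 0 | 2.7%
GPT-4o, Aug. 6th (temp=0) | 11 | 0 | 0 | 0 | 0 | 2.2%
GPT-4o, Aug. 6th (temp=1) | 7 | 4 | 0 | 0 | 0 | 2.2%
GPT-4.1 Nano | 8 | 1 | 0 | 0 | 0 | 2.0%
GPT-4o Mini (temp=1) | 7 | 2 | 0 | 0 | 0 | 1.8%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%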
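
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.7 (Reasoning) | 95 | 95 | 92 | 87 | 80 | 89.8%
Claude Opus 4.7 | 95 | 92 | 90 | 86 | 72 | 87.2%
GPT-5.4 Mini (Reasoning) | 88 | 87 | 86 | 86 | 80 | 85.4%
Claude Sonnet 4.6 | 95 | 90 | 86 | 86 | 67 | 84.9%
GPT-5.4 (Reasoning) | 97 | 88 | 79 | 79 | 76 | 83.8%
GPT-5.5 (Reasoning, Low) | 87 | 83 | 81 | 79 | 79 | 81.8%
GPT-5.4 | 93 | 88 | 81 | 73 | 73 | 81.8%
GPT-5.4 (Reasoning, Low) | 82 | 81 | 81 | 80 | 78 | 80.5%
GPT-5.5 (Reasoning) | 85 | 84 | 81 | 77 | 76 | 80.4%
GPT-5.5 | 93 | 84 | 76 | 70 | 68 | 78.4%
ByteDance Seed 2.0 Lite | 84 | 84 | 75 | 75 | 73 | 78.2%
Claude Opus 4.8 (Reasoning, Low) | 86 | 81 | 77 | 75 | 72 | 78.2%
GPT-5 | 83 | 82 | 76 | 72 | 72 | 77.0%
Claude Sonnet 4.6 (Reasoning) | 96 | 79 | 79 | 74 | 56 | 76.8%
GPT-5.4 Mini (Reasoning, Low) | 83 | 83 | 75 | 75 | 68 | 76.8%
GPT-5.4 Mini | 84 | 81 | 79 | 78 | 60 | 76.3%
DeepSeek V4 Flash (Reasoning) | 84 | 82 | 72 | 71 | 68 | 75.6%
GPT-5.4 Nano (Reasoning) | 85 | 76 | 76 | 69 | 66 | 74.5%
Claude Opus 4.6 (Reasoning) | 83 | 77 | 76 | 68 | 66 | 74.1%
ByteDance Seed 1.6 Flash | 92 | 86 | 79 | 52 | 50 | 71.6%
Claude Haiku 4.5 | 80 | 77 | 73 | 69 | 55 | 70.8%
GPT-5.2 | 78 | 74 | 71 | 65 | 64 | 70.5%
Claude Opus 4.6 | 82 | 79 | 73 | 62 | 56 | 70.4%
GPT-5.4 Nano (Reasoning, Low) | 76 | 72 | 71 | 68 | 65 | 70.2%
Claude Opus 4.8 (Reasoning) | 86 | 82 | 65 | 63 | 54 | 70.0%
Qwen 3.5 9B | 85 | 77 | 71 | 60 | 57 | 70.0%
GPT-5 Mini | 72 | 71 | 70 | 70 | 64 | 69.4%
DeepSeek V4 Pro (Reasoning) | 80 | 79 | 66 | 63 | 54 | 68.3%
GPT-5.1 | 75 | 74 | 67 | 63 | 62 | 68.2%
GPT-5.4 Nano | 78 | 76 | 70 | 60 | 54 | 67.6%
Claude Opus 4 | 72 | 71 | 71 | 64 | 58 | 67.2%
DeepSeek V4 Pro | 71 | 70 | 69 | 63 | 62 | 67.1%
MiniMax M2.5 | 87 | 76 | 68 | 53 | 52 | 67.1%
Z.AI GLM 5 | 71 | 70 | 67 | 65 | 59 | 66.4%
Skyfall 36B V2 | 89 | 84 | 72 | 48 | 35 | 65.7%
Claude Sonnet 4.5 | 78 | 71 | 70 | 65 | 44 | 65.4%
Claude 3.7 Sonnet | 81 | 79 | 65 | 52 | 50 | 65.2%
Qwen 3.5 35B | 82 | 65 | 64 | 61 | 53 | 64.9%
Claude Opus 4.5 | 85 | 84 | 59 | 50 | 44 | 64.5%
Mistral Large 2 | 69 | 64 | 64 | 61 | 57 | 62.9%
Z.AI GLM 5.1 | 78 | 77 | 73 | 46 | 38 | 62.6%
Z.AI GLM 5 Turbo | 81 | 81 | 71 | 69 | 7 | 61.8%
Grok 4.20 (Beta) | 72 | 64 | 61 | 59 | 51 | 61.2%
ByteDance Seed 1.6 | 76 | 71 | 67 | 47 | 42 | 60.8%
WizardLM 2 8x22b | 73 | 65 | 56 | 55 | 50 | 59.8%
Grok 4.20 | 73 | 66 | 54 | 53 | 52 | 59.7%
ByteDance Seed 2.0 Mini | 75 | 65 | 62 | 54 | 40 | 59.1%
MiniMax M2.7 | 85 | 66 | 65 | 44 | 35 | 59.1%
Hermes 3 405B | 92 | 60 | 60 | 54 | 29 | 58.9%
Writer: Palmyra X5 | 72 | 63 | 63 | 54 | 42 | 58.8%
Claude Sonnet 4 | 72 | 70 | 58 | 47 | 47 | 58.4%
MiniMax M3 | 76 | 65 | 64 | 50 | 37 | 58.3%
Stealth: Hunter Alpha | 80 | 75 | 74 | 33 | 29 | 58.2%
Mistral Large | 71 | 68 | 58 | 56 | 35 | 57.5%
DeepSeek V4 Flash | 76 | 66 | 59 | 53 | 34 | 57.5%
Rocinante 12B | 73 | 65 | 57 | 52 | 37 | 56.6%
Mistral Small 4 (Reasoning) | 69 | 62 | 56 | 51 | 46 | 56.5%
Cydonia 24B V4.1 | 76 | 67 | 57 | 53 | 25 | 55.6%
Qwen 3.5 Flash | 66 | 58 | 53 | 52 | 44 | 54.5%
Z.AI GLM 4.5 Air | 83 | 65 | 59 | 47 | 17 | 54.5%
Xiaomi MIMO v2.5 Pro | 69 | 56 | 55 | 50 | 40 | 54.0%
Stealth: Healer Alpha | 64 | 63 | 54 | 45 | 44 | 54.0%
Z.AI GLM 4.7 | 64 | 57 | 54 | 47 | 47 | 53.7%
Qwen 3.5 122B | 66 | 55 | 55 | 49 | 44 | 53.7%
Gemini 3.5 Flash (Reasoning, Minimal) | 66 | 63 | 49 | 48 | 41 | 53.5%
Qwen3 235B A22B Instruct 2507 | 79 | 63 | 61 | 43 | 20 | 53.3%
Grok 4.20 (Reasoning) | 69 | 60 | 53 | 50 | 34 | 53.2%
Xiaomi MIMO v2.5 | 69 | 65 | 54 | 45 | 30 | 52.5%
Arcee AI: Trinity Large (Preview) | 87 | 54 | 47 | 41 | 32 | 52.3%
Z.AI GLM 4.5 | 82 | 53 | 49 | 49 | 29 | 52.3%
Grok 4.20 (Beta, Reasoning) | 72 | 53 | 50 | 47 | 39 | 52.1%
MoonshotAI: Kimi K2.5 | 60 | 59 | 50 | 43 | 37 | 49.8%
Gemini 3 Pro (Preview) | 63 | 53 | 46 | 44 | 40 | 49.4%
Grok 4.3 (Reasoning) | 70 | 59 | 49 | 39 | 28 | 48.9%
Cohere Command R+ (Aug. 2024) | 62 | 58 | 54 | 40 | 29 | 48.6%
Z.AI GLM 4.6 | 75 | 50 | 49 | 40 | 29 | 48.4%
Gemini 3.5 Flash (Reasoning) | 62 | 57 | 45 | 40 | 37 | 48.1%
Grok 4.3 | 65 | 64 | 50 | 31 | 30 | 48.0%
Qwen 3.5 Plus (2026-04-20) | 68 | 56 | 51 | 43 | 20 | 47.7%
GPT-5 Nano | 53 | 52 | 44 | 44 | 43 | 47.2%
Mistral Medium 3.1 | 57 | 51 | 50 | 44 | 32 | 47.1%
Z.AI GLM 4.7 Flash | 55 | 51 | 48 | 43 | 39 | 46.9%
MoonshotAI: Kimi K2.6 | 62 | 55 | 44 | 38 | 34 | 46.6%
Mistral Large 3 | 64 | 64 | 49 | 29 | 26 | 46.4%
DeepSeek V3.1 | 56 | 51 | 47 | 44 | 33 | 46.2%
Gemini 3 Flash (Preview) | 65 | 49 | 44 | 42 | 29 | 45.7%
Mistral Small Creative | 52 | 44 | 44 | 41 | 39 | 44.2%
Qwen 3.5 27B | 78 | 49 | 35 | 30 | 28 | 43.7%
Gemma 3 27B | 60 | 46 | 46 | 35 | 28 | 43.0%
Ministral 3 14B | 83 | 65 | 27 | 27 | 12 | 42.6%
Qwen 3.5 397B A17B | 60 | 45 | 42 | 37 | 27 | 42.4%
Gemini 3 Flash (Preview, Reasoning) | 53 | 51 | 43 | 42 | 23 | 42.3%
Mistral Small 3.2 24B | 97 | 77 | 36 | 0 | 0 | 41.9%
Mistral Small 4 | 60 | 39 | 38 | 37 | 26 | 40.2%
DeepSeek V3.2 | 67 | 50 | 41 | 23 | 18 | 39.8%
Qwen 3 32B | 45 | 42 | 37 | 36 | 34 | 38.9%
Aion 2.0 | 54 | 53 | 41 | 24 | 20 | 38.6%
Llama 3.1 8B | 65 | 55 | 39 | 32 | 0 | 38.1%
Qwen3.7 Max | 54 | 51 | 40 | 30 | 14 | 37.7%
GPT-4.1 | 52 | 40 | 38 | 37 | 21 | 37.7%
Gemini 3.1 Pro (Preview) | 56 | 38 | 36 | 26 | 22 | 35.8%
Qwen 3.5 Plus (2026-02-15) | 46 | 39 | 33 | 29 | 26 | 34.5%
Gemini 2.5 Pro | 44 | 38 | 36 | 35 | 17 | 34.2%
Gemini 3.1 Flash Lite (Preview) | 46 | 39 | 39 | 34 | 10 | 33.5%
Qwen 3.6 Flash | 46 | 42 | 37 | 21 | 21 | 33.2%
Gemma 3 12B | 50 | 43 | 29 | 23 | 21 | 33.1%
Hermes 3 70B | 89 | 40 | 34 | 0 | 0 | 32.4%
Gemini 2.5 Flash | 45 | 30 | 30 | 29 | 28 | 32.3%
Qwen 3.6 35B | 66 | 42 | 33 | 11 | 0 | 30.4%
Grok 4 | 52 | 31 | 31 | 17 | 17 | 29.6%
Ministral 3B | 79 | 37 | 18 | 11 | 0 | 29.1%
Gemini 3.1 Flash Lite | 62 | 53 | 19 | 9 | 0 | 28.6%
GPT-4o, May 13th (temp=1) | 49 | 45 | 24 | 16 | 8 | 28.3%
Qwen 3.6 27B | 43 | 42 | 38 | 14 | 0 | 27.3%
DeepSeek-V2 Chat | 47 | 30 | 27 | 18 | 13 | 27.2%
Ministral 8B | 42 | 41 | 30 | 16 | 4 | 26.6%
Gemma 4 31B (Reasoning) | 38 | 33 | 30 | 30 | 0 | 26.1%
Ministral 3 8B | 69 | 33 | 28 | 0 | 0 | 25.8%
Gemma 4 26B (Reasoning) | 46 | 26 | 25 | 25 | 7 | 25.6%
Gemma 4 31B | 45 | 32 | 26 | 22 | 2 | 25.4%
Claude 3 Haiku | 55 | 31 | 27 | 6 | 1 | 24.0%
Qwen3.6 Max Preview | 34 | 29 | 26 | 20 | 11 | 24.0%
o4 Mini High | 36 | 25 | 23 | 17 | 17 | 23.5%
DeepSeek V3 (2024-12-26) | 30 | 29 | 26 | 17 | 15 | 23.5%
o4 Mini | 40 | 33 | 20 | 19 | 0 | 22.6%
Inception Mercury | 73 | 38 | 0 | 0 | 0 | 22.2%
Grok 4.1 Fast | 34 | 26 | 22 | 22 | 0 | 21.0%
DeepSeek V3 (2025-03-24) | 49 | 28 | 14 | 13 | 0 | 20.9%
Gemini 3.1 Flash Lite (Reasoning) | 41 | 24 | 23 | 9 | 0 | 19.6%
Ministral 3 3B | 33 | 24 | 23 | 18 | 0 | 19.6%
Gemma 4 26B | 38 | 27 | 24 | 5 | 4 | 19.3%
Nemotron 3 Super | 55 | 21 | 17 | 4 | 0 | 19.3%
Arcee AI: Trinity Mini | 47 | 26 | 23 | 0 | 0 | 19.3%
Gemini 2.5 Flash Lite | 51 | 37 | 0 | 0 | 0 | 17.6%
Gemini 2.5 Flash (Reasoning) | 36 | 23 | 14 | 8 | 1 | 16.3%
GPT-4.1 Mini | 42 | 26 | 12 | 0 | 0 | 16.0%
Gemini 2.5 Flash Lite (Reasoning) | 49 | 12 | 10 | 0 | 0 | 14.1%
Grok 4 Fast | 23 | 23 | 18 | 7 | 0 | 14.1%
GPT-4.1 Nano | 37 | 17 | 15 | 0 | 0 | 14.0%
Mistral NeMO | 33 | 19 | 17 | 0 | 0 | 13.8%
Claude 3.5 Sonnet | 36 | 17 | 14 | 1 | 0 | 13.5%
GPT-4o, May 13th (temp=0) | 26 | 24 | 7 | 5 | 0 | 12.5%
Gemma 3 4B | 21 | 15 | 10 | 10 | 2 | 11.6%
Llama 3.1 70B | 32 | 15 | 0 | 0 | 0 | 9.3%
Nemotron 3 Nano | 17 | 16 | 7 | 5 | 0 | 9.1%
GPT-4o Mini (temp=0) | 26 | 19 | 0 | 0 | 0 | 8.9%
Inception Mercury 2 | 35 | 2 | 2 | 0 | 0 | 7.8%
Qwen 2.5 72B | 30 | 2 | 0 | 0 | 0 | 6.4%
LFM2 24B | 23 | 4 | 1 | 0 | 0 | 5.6%
GPT-OSS 120B | 17 | 9 | 0 | 0 | 0 | 5.2%
GPT-4o, Aug. 6th (temp=1) | 17 | 4 | 3 | 0 | 0 | 4.9%
GPT-4o Mini (temp=1) | 15 | 0 | 0 | 0 | 0 | 3.0%
Stealth: Aurora Alpha | 7 | 5 | 2 | 0 | 0 | 2.9%
GPT-4o, Aug. 6th (temp=0) | 10 | 2 | 0 | 0 | 0 | 2.5%
Llama 3.1 Nemotron 70B | 10 | 2 | 0 | 0 | 0 | 2.3%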