Cliché density

Test: Bad Writing Habits

Avg. Score: 92.0%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Gemini 3.1 Flash Lite (Preview) | 98.9% | $0.0030 | 8.4s | 84%
2 | Gemini 3 Flash (Preview) | 98.9% | $0.0078 | 19.6s | 88%
3 | Claude 3.5 Haiku | 98.1% | $0.0035 | 10.8s | 85%
4 | Z.AI GLM 5 Turbo | 99.3% | $0.0081 | 33.2s | 90%
5 | ByteDance Seed 1.6 Flash | 98.5% | $0.0013 | 27.3s | 86%
6 | Gemini 3 Flash (Preview, Reasoning) | 99.3% | $0.012 | 30.1s | 90%
7 | Claude Sonnet 4.5 | 100.0% | $0.035 | 38.1s | 100%
8 | Ministral 3 14B | 97.4% | $0.0007 | 11.7s | 82%
9 | Claude Haiku 4.5 | 98.9% | $0.011 | 21.6s | 84%
10 | Mistral Small 4 | 97.0% | $0.0014 | 18.2s | 81%
11 | Mistral Large 2 | 98.5% | $0.013 | 29.4s | 86%
12 | DeepSeek V3 (2025-03-24) | 98.5% | $0.0014 | 39.4s | 83%
13 | GPT-5 Mini | 99.3% | $0.0100 | 57.4s | 90%
14 | Writer: Palmyra X5 | 97.8% | $0.011 | 22.0s | 83%
15 | Claude Sonnet 4.6 | 99.6% | $0.031 | 39.3s | 93%
16 | Qwen 3 32B | 98.5% | $0.0015 | 54.6s | 86%
17 | GPT-4.1 Mini | 96.3% | $0.0027 | 19.0s | 79%
18 | GPT-4o, Aug. 6th (temp=1) | 97.8% | $0.018 | 24.4s | 83%
19 | Arcee AI: Trinity Mini | 95.6% | $0.0003 | 9.2s | 75%
20 | LFM2 24B | 96.3% | $0.0002 | 28.4s | 79%
21 | Grok 4.20 (Beta) | 97.0% | $0.018 | 15.8s | 81%
22 | Stealth: Healer Alpha | 96.3% | $0.0000 | 23.7s | 77%
23 | Claude Sonnet 4 | 99.3% | $0.032 | 43.7s | 90%
24 | Qwen3 235B A22B Instruct 2507 | 97.8% | $0.0011 | 59.2s | 83%
25 | Claude 3.5 Sonnet | 99.6% | $0.048 | 35.5s | 93%
26 | Mistral Large 3 | 96.7% | $0.0033 | 30.3s | 78%
27 | MiniMax M2.5 | 98.5% | $0.0034 | 1.3m | 86%
28 | GPT-5.4 Mini | 96.3% | $0.015 | 16.8s | 79%
29 | MiniMax M2.7 | 98.1% | $0.0040 | 1.1m | 85%
30 | Z.AI GLM 5 | 98.5% | $0.0084 | 1.2m | 86%
31 | GPT-4.1 | 97.8% | $0.018 | 44.7s | 83%
32 | Grok 4.1 Fast | 95.9% | $0.0018 | 37.8s | 76%
33 | GPT-5.4 Nano (Reasoning) | 94.8% | $0.0061 | 24.5s | 76%
34 | Mistral Small 4 (Reasoning) | 96.3% | $0.0022 | 30.2s | 73%
35 | GPT-5.4 Mini (Reasoning) | 96.7% | $0.022 | 28.1s | 80%
36 | Qwen 3.5 9B | 97.8% | $0.0011 | 1.4m | 83%
37 | Qwen 3.5 Flash | 96.3% | $0.0025 | 47.5s | 77%
38 | Grok 4.20 (Beta, Reasoning) | 98.1% | $0.039 | 34.0s | 85%
39 | Grok 4 Fast | 94.8% | $0.0017 | 24.1s | 72%
40 | Mistral Medium 3.1 | 97.0% | $0.0048 | 36.5s | 73%
41 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.060 | 1.2m | 100%
42 | Mistral Large | 95.9% | $0.014 | 30.9s | 76%
43 | Gemini 2.5 Flash Lite (Reasoning) | 94.4% | $0.0028 | 30.8s | 73%
44 | Qwen 3.5 Plus (2026-02-15) | 94.1% | $0.0060 | 31.5s | 73%
45 | GPT-5.4 Mini (Reasoning, Low) | 94.4% | $0.015 | 16.8s | 71%
46 | Aion 2.0 | 96.7% | $0.0064 | 1.3m | 80%
47 | Z.AI GLM 4.5 | 94.1% | $0.0051 | 42.1s | 75%
48 | Rocinante 12B | 95.6% | $0.0014 | 38.4s | 68%
49 | Z.AI GLM 4.7 Flash | 96.7% | $0.0017 | 1.2m | 76%
50 | Gemma 3 4B | 93.3% | $0.0002 | 20.0s | 65%
51 | o4 Mini | 94.1% | $0.015 | 25.7s | 69%
52 | Qwen 3.5 35B | 96.7% | $0.018 | 1.0m | 76%
53 | Mistral Small Creative | 92.2% | $0.0007 | 9.1s | 61%
54 | Z.AI GLM 4.6 | 94.8% | $0.0065 | 51.5s | 70%
55 | Z.AI GLM 4.7 | 96.7% | $0.010 | 1.4m | 78%
56 | DeepSeek V3.1 | 96.7% | $0.0020 | 1.8m | 80%
57 | Gemini 2.5 Pro | 96.3% | $0.036 | 36.2s | 75%
58 | Gemma 3 27B | 93.7% | $0.0006 | 52.6s | 69%
59 | DeepSeek V3.2 | 96.7% | $0.0014 | 1.9m | 80%
60 | ByteDance Seed 2.0 Lite | 98.5% | $0.012 | 2.2m | 86%
61 | Stealth: Hunter Alpha | 94.4% | $0.0000 | 55.0s | 65%
62 | GPT-5 Nano | 95.6% | $0.0042 | 1.4m | 73%
63 | Ministral 8B | 91.1% | $0.0004 | 10.4s | 59%
64 | GPT-5.4 | 98.5% | $0.049 | 1.4m | 86%
65 | o4 Mini High | 95.2% | $0.025 | 47.2s | 71%
66 | GPT-5.4 Nano (Reasoning, Low) | 90.4% | $0.0055 | 20.6s | 64%
67 | Gemini 2.5 Flash Lite | 91.1% | $0.0009 | 9.5s | 57%
68 | Ministral 3B | 90.7% | $0.0001 | 8.1s | 57%
69 | Gemini 3 Pro (Preview) | 97.0% | $0.055 | 54.4s | 81%
70 | Qwen 3.5 122B | 95.2% | $0.025 | 1.1m | 75%
71 | Claude Opus 4.6 | 99.3% | $0.078 | 1.2m | 90%
72 | Hermes 3 405B | 92.6% | $0.0032 | 53.2s | 64%
73 | GPT-5.4 (Reasoning, Low) | 98.1% | $0.055 | 1.4m | 85%
74 | Claude Opus 4.5 | 97.8% | $0.070 | 53.4s | 83%
75 | Claude Opus 4.6 (Reasoning) | 99.6% | $0.088 | 1.4m | 93%
76 | Gemini 2.5 Flash (Reasoning) | 90.0% | $0.011 | 21.5s | 60%
77 | Gemma 3 12B | 90.7% | $0.0004 | 41.3s | 58%
78 | Nemotron 3 Super | 92.2% | $0.0000 | 1.4m | 67%
79 | GPT-4o Mini (temp=1) | 88.9% | $0.0012 | 34.8s | 59%
80 | MoonshotAI: Kimi K2.5 | 99.3% | $0.019 | 3.2m | 90%
81 | GPT-5.4 Nano | 87.0% | $0.0057 | 26.3s | 61%
82 | Ministral 3 8B | 90.7% | $0.0008 | 19.6s | 49%
83 | Grok 4 | 96.3% | $0.048 | 1.7m | 79%
84 | GPT-4.1 Nano | 86.3% | $0.0007 | 13.3s | 51%
85 | WizardLM 2 8x22b | 93.3% | $0.0026 | 1.8m | 65%
86 | GPT-5.1 | 96.7% | $0.054 | 1.8m | 80%
87 | ByteDance Seed 1.6 | 95.9% | $0.013 | 2.5m | 74%
88 | Qwen 3.5 27B | 93.0% | $0.020 | 1.6m | 68%
89 | Claude 3.7 Sonnet | 91.5% | $0.042 | 46.7s | 63%
90 | GPT-5 | 99.3% | $0.065 | 2.8m | 90%
91 | Gemini 3.1 Pro (Preview) | 99.3% | $0.107 | 1.8m | 90%
92 | Gemini 2.5 Flash | 84.4% | $0.0052 | 10.6s | 46%
93 | GPT-4o, May 13th (temp=1) | 87.0% | $0.033 | 14.4s | 54%
94 | Arcee AI: Trinity Large (Preview) | 86.7% | $0.0000 | 43.6s | 48%
95 | Ministral 3 3B | 85.6% | $0.0005 | 11.1s | 40%
96 | DeepSeek-V2 Chat | 86.3% | $0.0021 | 53.3s | 48%
97 | Llama 3.1 Nemotron 70B | 84.4% | $0.0038 | 31.7s | 46%
98 | Cohere Command R+ (Aug. 2024) | 86.7% | $0.020 | 52.5s | 55%
99 | Hermes 3 70B | 87.0% | $0.0010 | 1.2m | 49%
100 | DeepSeek V3 (2024-12-26) | 86.3% | $0.0021 | 54.6s | 46%
101 | GPT-5.4 (Reasoning) | 98.9% | $0.089 | 2.6m | 88%
102 | Llama 3.1 8B | 85.9% | $0.0003 | 1.3m | 48%
103 | Qwen 3.5 397B A17B | 93.0% | $0.014 | 3.0m | 65%
104 | Mistral NeMO | 78.1% | $0.0005 | 10.1s | 36%
105 | Nemotron 3 Nano | 80.4% | $0.0010 | 1.1m | 44%
106 | GPT-5.2 | 88.5% | $0.056 | 1.5m | 59%
107 | ByteDance Seed 2.0 Mini | 96.3% | $0.0045 | 4.9m | 71%
108 | Claude Opus 4 | 99.3% | $0.209 | 1.4m | 90%
109 | Llama 3.1 70B | 70.4% | $0.0015 | 29.4s | 24%
110 | Stealth: Aurora Alpha | 66.3% | $0.0000 | 9.8s | 23%
111 | Inception Mercury 2 | 64.1% | $0.0032 | 7.0s | 24%
112 | Claude 3 Haiku | 64.4% | $0.0025 | 14.9s | 21%
113 | GPT-4o, Aug. 6th (temp=0) | 68.1% | $0.023 | 22.7s | 24%
114 | GPT-4o Mini (temp=0) | 63.7% | $0.0012 | 34.8s | 18%
115 | Qwen 2.5 72B | 61.1% | $0.0010 | 36.7s | 16%
116 | Inception Mercury | 63.0% | $0.011 | 17.6s | 11%
117 | GPT-4o, May 13th (temp=0) | 53.0% | $0.035 | 14.1s | 18%
118 | Mistral Small 3.2 24B | 52.4% | $0.0068 | 5.6m | 9%
Average score: 92.02%

Individual Scenarios

Detailed Writing Rules

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Aion 2.0 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
LFM2 24B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 9B | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Large 3 | 100 | 100 | 100 | 67 | 67 | 86.7%
Grok 4.20 (Beta) | 100 | 100 | 100 | 67 | 67 | 86.7%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 33 | 86.7%
Hermes 3 405B | 100 | 100 | 100 | 100 | 33 | 86.7%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-5.4 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 33 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 33 | 86.7%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 67 | 0 | 73.3%
Claude 3 Haiku | 100 | 67 | 67 | 67 | 67 | 73.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 67 | 67 | 0 | 66.7%
Inception Mercury 2 | 100 | 67 | 67 | 33 | 33 | 60.0%
GPT-4o, May 13th (temp=0) | 100 | 67 | 67 | 33 | 33 | 60.0%
Mistral NeMO | 100 | 67 | 67 | 67 | 0 | 60.0%
Inception Mercury | 100 | 67 | 33 | 0 | 0 | 40.0%
Stealth: Aurora Alpha | 67 | 33 | 33 | 33 | 0 | 33.3%
Mistral Small 3.2 24B | 33 | 0 | 0 | 0 | 0 | 6.7%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3 32B | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 8B | 100 | 100 | 100 | 67 | 67 | 86.7%
Rocinante 12B | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 67 | 33 | 80.0%
Inception Mercury 2 | 100 | 100 | 100 | 67 | 33 | 80.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 67 | 33 | 80.0%
Inception Mercury | 100 | 100 | 67 | 67 | 33 | 73.3%
Mistral NeMO | 100 | 100 | 100 | 33 | 33 | 73.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 33 | 0 | 66.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 33 | 0 | 66.7%
GPT-4.1 Nano | 100 | 100 | 67 | 67 | 0 | 66.7%
Claude 3 Haiku | 100 | 100 | 67 | 67 | 0 | 66.7%
Qwen 2.5 72B | 100 | 67 | 67 | 33 | 33 | 60.0%
Mistral Small 3.2 24B | 100 | 100 | 33 | 0 | 0 | 46.7%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Aion 2.0 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3 32B | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
LFM2 24B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 67 | 67 | 86.7%
Qwen 3.5 Flash | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.5 | 100 | 100 | 100 | 67 | 67 | 86.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 27B | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4.1 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 33 | 86.7%
Ministral 8B | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Arcee AI: Trinity Mini | 100 | 100 | 67 | 67 | 67 | 80.0%
Z.AI GLM 4.6 | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 67 | 33 | 80.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 67 | 67 | 67 | 67 | 73.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 33 | 33 | 73.3%
Ministral 3B | 100 | 100 | 100 | 33 | 33 | 73.3%
Qwen 3.5 397B A17B | 100 | 100 | 67 | 67 | 0 | 66.7%
o4 Mini High | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 33 | 0 | 66.7%
Nemotron 3 Nano | 100 | 100 | 67 | 67 | 0 | 66.7%
Gemma 3 4B | 100 | 100 | 67 | 67 | 0 | 66.7%
Stealth: Aurora Alpha | 67 | 67 | 67 | 67 | 33 | 60.0%
Mistral Small 3.2 24B | 100 | 100 | 67 | 0 | 0 | 53.3%
Qwen 2.5 72B | 100 | 67 | 67 | 33 | 0 | 53.3%
Ministral 3 3B | 100 | 100 | 33 | 33 | 0 | 53.3%
Llama 3.1 70B | 100 | 33 | 33 | 33 | 0 | 40.0%
Inception Mercury 2 | 67 | 33 | 33 | 33 | 0 | 33.3%
Inception Mercury | 33 | 33 | 0 | 0 | 0 | 13.3%
GPT-4o Mini (temp=0) | 33 | 33 | 0 | 0 | 0 | 13.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 2 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
LFM2 24B | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 67 | 67 | 86.7%
Nemotron 3 Super | 100 | 100 | 100 | 67 | 67 | 86.7%
Grok 4.20 (Beta) | 100 | 100 | 100 | 67 | 67 | 86.7%
Stealth: Aurora Alpha | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5.4 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 67 | 67 | 86.7%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 33 | 86.7%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 33 | 86.7%
Grok 4.1 Fast | 100 | 100 | 67 | 67 | 67 | 80.0%
o4 Mini High | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 33 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 67 | 67 | 67 | 80.0%
Grok 4 Fast | 100 | 100 | 67 | 67 | 33 | 73.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 67 | 67 | 67 | 67 | 73.3%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 67 | 67 | 33 | 73.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 67 | 67 | 33 | 73.3%
Inception Mercury 2 | 100 | 100 | 67 | 67 | 0 | 66.7%
Llama 3.1 70B | 100 | 67 | 67 | 67 | 33 | 66.7%
Llama 3.1 8B | 100 | 100 | 100 | 33 | 0 | 66.7%
GPT-4o Mini (temp=0) | 67 | 67 | 33 | 33 | 0 | 40.0%
Qwen 2.5 72B | 100 | 33 | 33 | 33 | 0 | 40.0%
Claude 3 Haiku | 100 | 67 | 33 | 0 | 0 | 40.0%
GPT-4o, May 13th (temp=0) | 67 | 67 | 33 | 0 | 0 | 33.3%
Inception Mercury | 67 | 67 | 0 | 0 | 0 | 26.7%
Mistral Small 3.2 24B | 100 | 33 | 0 | 0 | 0 | 26.7%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
Aion 2.0 | 100 | 100 | 100 | 100 | 67 | 93.3%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 2 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 27B | 100 | 100 | 100 | 67 | 67 | 86.7%
MiniMax M2.7 | 100 | 100 | 100 | 67 | 67 | 86.7%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen 3.5 Flash | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.5 | 100 | 100 | 100 | 67 | 67 | 86.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Large | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5.4 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemma 3 4B | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral NeMO | 100 | 100 | 100 | 100 | 33 | 86.7%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 33 | 86.7%
LFM2 24B | 100 | 100 | 100 | 67 | 67 | 86.7%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 33 | 86.7%
o4 Mini | 100 | 100 | 100 | 67 | 33 | 80.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 67 | 33 | 80.0%
Mistral Large 3 | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 67 | 33 | 80.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 67 | 33 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 67 | 33 | 73.3%
GPT-5.2 | 100 | 100 | 67 | 67 | 33 | 73.3%
DeepSeek-V2 Chat | 100 | 100 | 67 | 67 | 33 | 73.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 33 | 33 | 73.3%
Inception Mercury | 100 | 100 | 100 | 67 | 0 | 73.3%
Qwen 2.5 72B | 100 | 100 | 100 | 67 | 0 | 73.3%
Mistral Small Creative | 100 | 100 | 100 | 67 | 0 | 73.3%
Ministral 3 3B | 100 | 100 | 100 | 67 | 0 | 73.3%
Inception Mercury 2 | 100 | 100 | 67 | 33 | 33 | 66.7%
Gemma 3 12B | 100 | 100 | 67 | 33 | 33 | 66.7%
Gemma 3 27B | 100 | 100 | 67 | 33 | 33 | 66.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4.1 Nano | 100 | 100 | 100 | 33 | 0 | 66.7%
Ministral 8B | 100 | 67 | 67 | 67 | 33 | 66.7%
Llama 3.1 70B | 100 | 67 | 67 | 33 | 33 | 60.0%
Llama 3.1 Nemotron 70B | 100 | 67 | 67 | 33 | 33 | 60.0%
Gemini 2.5 Flash (Reasoning) | 100 | 67 | 67 | 33 | 33 | 60.0%
Ministral 3 8B | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude 3 Haiku | 100 | 67 | 67 | 67 | 0 | 60.0%
Stealth: Aurora Alpha | 100 | 67 | 67 | 33 | 0 | 53.3%
Nemotron 3 Nano | 100 | 67 | 67 | 33 | 0 | 53.3%
Gemini 2.5 Flash Lite | 100 | 67 | 33 | 33 | 0 | 46.7%
GPT-4o, May 13th (temp=0) | 67 | 33 | 0 | 0 | 0 | 20.0%
Mistral Small 3.2 24B | 67 | 0 | 0 | 0 | 0 | 13.3%
GPT-4o Mini (temp=0) | 67 | 0 | 0 | 0 | 0 | 13.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3 32B | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemma 3 12B | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 27B | 100 | 100 | 100 | 100 | 33 | 86.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 8B | 100 | 100 | 100 | 100 | 33 | 86.7%
Ministral 3 8B | 100 | 100 | 100 | 100 | 33 | 86.7%
Ministral 3B | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 Nano | 100 | 100 | 67 | 67 | 67 | 80.0%
Inception Mercury | 100 | 100 | 67 | 67 | 33 | 73.3%
Nemotron 3 Nano | 100 | 100 | 100 | 33 | 0 | 66.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 67 | 67 | 0 | 66.7%
Mistral Small 3.2 24B | 100 | 100 | 67 | 33 | 0 | 60.0%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 33 | 0 | 60.0%
Claude 3 Haiku | 100 | 67 | 67 | 67 | 0 | 60.0%
Llama 3.1 8B | 100 | 67 | 67 | 67 | 0 | 60.0%
Stealth: Aurora Alpha | 100 | 67 | 67 | 33 | 0 | 53.3%
Inception Mercury 2 | 100 | 67 | 67 | 0 | 0 | 46.7%
Llama 3.1 70B | 100 | 67 | 67 | 0 | 0 | 46.7%
GPT-4o, May 13th (temp=0) | 67 | 67 | 67 | 0 | 0 | 40.0%
Qwen 2.5 72B | 100 | 33 | 33 | 0 | 0 | 33.3%
Mistral NeMO | 100 | 33 | 0 | 0 | 0 | 26.7%
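In the per-scenario tables, every per-run score (#1 to #5) is 0, 33, 67 or 100, and each Avg value matches the mean of the five runs only if 33 and 67 are read as 33 1/3 and 66 2/3. For example, a row of 100, 100, 100, 100, 67 is listed as 93.3%, whereas averaging the displayed integers would give 93.4%. The sketch below assumes, from that pattern alone, that each run is scored in thirds and that Avg is the unweighted mean of the five runs; neither rule is documented on the page, and `run_score` and `scenario_avg` are hypothetical helpers, not part of the benchmark.

```python
from fractions import Fraction


def run_score(displayed: int) -> Fraction:
    """Recover the assumed exact score (a multiple of 100/3) from a displayed value."""
    thirds = round(displayed * 3 / 100)  # 0 -> 0, 33 -> 1, 67 -> 2, 100 -> 3
    return Fraction(100 * thirds, 3)


def scenario_avg(displayed_runs: list[int]) -> float:
    """Unweighted mean of the per-run scores, rounded to one decimal like the Avg column."""
    exact = [run_score(d) for d in displayed_runs]
    mean = sum(exact, Fraction(0)) / len(exact)
    return round(float(mean), 1)


if __name__ == "__main__":
    runs = [100, 100, 100, 100, 67]
    print(scenario_avg(runs))       # 93.3, matching the tables
    print(round(sum(runs) / 5, 1))  # 93.4, from averaging the rounded integers instead
```

The same reading reproduces the low end of the tables as well: a single 33 followed by four zeros gives 6.7% rather than 6.6%.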

Genre

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Aion 2.0 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 3 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3 32B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small Creative | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 33 | 86.7%
Ministral 8B | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 3B | 100 | 100 | 100 | 67 | 67 | 86.7%
LFM2 24B | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 67 | 33 | 80.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 67 | 33 | 80.0%
Qwen 2.5 72B | 100 | 100 | 100 | 67 | 33 | 80.0%
Mistral NeMO | 100 | 100 | 67 | 67 | 67 | 80.0%
Inception Mercury 2 | 100 | 100 | 100 | 67 | 0 | 73.3%
DeepSeek V3 (2024-12-26) | 100 | 67 | 67 | 67 | 67 | 73.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 67 | 0 | 73.3%
Nemotron 3 Nano | 100 | 100 | 67 | 67 | 33 | 73.3%
Ministral 3 8B | 100 | 100 | 100 | 67 | 0 | 73.3%
Ministral 3 3B | 100 | 100 | 100 | 67 | 0 | 73.3%
Stealth: Aurora Alpha | 100 | 67 | 67 | 67 | 33 | 66.7%
Inception Mercury | 100 | 100 | 100 | 33 | 0 | 66.7%
Claude 3 Haiku | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o, May 13th (temp=0) | 100 | 100 | 67 | 33 | 0 | 60.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 67 | 93.3%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
LFM2 24B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.1 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Small 3.2 24B | 100 | 100 | 100 | 67 | 67 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 8B | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Small Creative | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
Nemotron 3 Nano | 100 | 100 | 67 | 67 | 67 | 80.0%
Inception Mercury | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 2.5 72B | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 67 | 33 | 73.3%
Ministral 3 3B | 100 | 100 | 100 | 67 | 0 | 73.3%
Mistral NeMO | 100 | 100 | 100 | 33 | 33 | 73.3%
Llama 3.1 70B | 100 | 67 | 67 | 67 | 33 | 66.7%
GPT-4.1 Nano | 100 | 67 | 67 | 33 | 33 | 60.0%
Stealth: Aurora Alpha | 100 | 67 | 67 | 33 | 0 | 53.3%
Claude 3 Haiku | 100 | 67 | 67 | 33 | 0 | 53.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small Creative | 100 | 100 | 100 | 100 | 67 | 93.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 67 | 67 | 86.7%
Aion 2.0 | 100 | 100 | 100 | 67 | 67 | 86.7%
Qwen 3.5 35B | 100 | 100 | 100 | 67 | 67 | 86.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 67 | 67 | 86.7%
Nemotron 3 Super | 100 | 100 | 100 | 67 | 67 | 86.7%
Stealth: Aurora Alpha | 100 | 100 | 100 | 67 | 67 | 86.7%
DeepSeek V3.1 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 33 | 86.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 67 | 67 | 86.7%
WizardLM 2 8x22b | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemma 3 4B | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 3 3B | 100 | 100 | 100 | 100 | 33 | 86.7%
Rocinante 12B | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 33 | 86.7%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen 3.5 27B | 100 | 100 | 67 | 67 | 67 | 80.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 67 | 33 | 80.0%
Inception Mercury 2 | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 33 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 67 | 67 | 67 | 80.0%
Nemotron 3 Nano | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3 Haiku | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 67 | 67 | 33 | 73.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-5.4 Nano | 100 | 67 | 67 | 67 | 67 | 73.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 33 | 33 | 73.3%
Llama 3.1 8B | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-5.2 | 67 | 67 | 67 | 67 | 67 | 66.7%
Stealth: Hunter Alpha | 100 | 100 | 67 | 67 | 0 | 66.7%
Gemma 3 12B | 100 | 67 | 67 | 67 | 33 | 66.7%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 67 | 0 | 66.7%
Hermes 3 70B | 100 | 100 | 67 | 67 | 0 | 66.7%
Inception Mercury | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemini 2.5 Flash | 67 | 67 | 33 | 33 | 0 | 40.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral NeMO | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 100 | 33 | 86.7%
Z.AI GLM 4.5 | 100 | 100 | 100 | 67 | 67 | 86.7%
Grok 4.20 (Beta) | 100 | 100 | 100 | 67 | 67 | 86.7%
Claude 3.5 Haiku | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 405B | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5 Nano | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-5.4 Mini | 100 | 100 | 100 | 67 | 67 | 86.7%
DeepSeek V3.1 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemini 2.5 Flash | 100 | 100 | 100 | 67 | 67 | 86.7%
Writer: Palmyra X5 | 100 | 100 | 100 | 67 | 67 | 86.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Small Creative | 100 | 100 | 100 | 67 | 67 | 86.7%
Nemotron 3 Nano | 100 | 100 | 67 | 67 | 67 | 80.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 67 | 67 | 67 | 80.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 67 | 33 | 80.0%
Nemotron 3 Super | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-5.4 Nano | 100 | 100 | 67 | 67 | 67 | 80.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 67 | 67 | 67 | 80.0%
Ministral 3 3B | 100 | 100 | 100 | 67 | 33 | 80.0%
Llama 3.1 70B | 100 | 100 | 67 | 67 | 33 | 73.3%
GPT-4o Mini (temp=0) | 100 | 67 | 67 | 67 | 67 | 73.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 67 | 0 | 73.3%
WizardLM 2 8x22b | 100 | 100 | 67 | 67 | 33 | 73.3%
Ministral 8B | 100 | 100 | 100 | 33 | 33 | 73.3%
Llama 3.1 8B | 100 | 100 | 67 | 67 | 0 | 66.7%
Inception Mercury | 100 | 100 | 67 | 33 | 0 | 60.0%
Inception Mercury 2 | 67 | 67 | 67 | 67 | 0 | 53.3%
Stealth: Aurora Alpha | 100 | 67 | 67 | 33 | 0 | 53.3%
Qwen 2.5 72B | 100 | 67 | 67 | 33 | 0 | 53.3%
Claude 3 Haiku | 100 | 67 | 33 | 33 | 0 | 46.7%
Mistral Small 3.2 24B | 100 | 33 | 33 | 0 | 0 | 33.3%
GPT-4o, May 13th (temp=0) | 67 | 33 | 0 | 0 | 0 | 20.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 3 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 2 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 4B | 100 | 100 | 100 | 100 | 67 | 93.3%
Rocinante 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.1 | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 67 | 67 | 86.7%
Claude 3.7 Sonnet | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Large | 100 | 100 | 100 | 100 | 33 | 86.7%
Mistral Small 3.2 24B | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 67 | 67 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 3 3B | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 8B | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 3B | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-5.2 | 100 | 100 | 100 | 100 | 33 | 86.7%
Z.AI GLM 5 | 100 | 100 | 67 | 67 | 67 | 80.0%
Claude Opus 4.5 | 100 | 100 | 67 | 67 | 67 | 80.0%
Z.AI GLM 4.7 | 100 | 100 | 67 | 67 | 67 | 80.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 67 | 33 | 80.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 67 | 33 | 80.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 67 | 67 | 67 | 80.0%
Inception Mercury | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 27B | 100 | 100 | 67 | 67 | 67 | 80.0%
Mistral Small Creative | 100 | 100 | 67 | 67 | 67 | 80.0%
Claude 3 Haiku | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 0 | 73.3%
Gemma 3 12B | 100 | 100 | 100 | 67 | 0 | 73.3%
Llama 3.1 70B | 100 | 100 | 67 | 67 | 33 | 73.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 67 | 0 | 73.3%
Ministral 3 8B | 100 | 100 | 67 | 67 | 33 | 73.3%
WizardLM 2 8x22b | 100 | 100 | 67 | 67 | 33 | 73.3%
Stealth: Aurora Alpha | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o Mini (temp=1) | 100 | 100 | 67 | 33 | 33 | 66.7%
Ministral 8B | 100 | 100 | 67 | 67 | 0 | 66.7%
GPT-4o, May 13th (temp=0) | 100 | 67 | 67 | 33 | 33 | 60.0%
DeepSeek-V2 Chat | 100 | 67 | 67 | 33 | 0 | 53.3%
Gemini 2.5 Flash | 67 | 67 | 67 | 67 | 0 | 53.3%
Mistral NeMO | 100 | 67 | 33 | 33 | 33 | 53.3%
Inception Mercury 2 | 67 | 67 | 33 | 33 | 33 | 46.7%
Qwen 2.5 72B | 67 | 67 | 33 | 0 | 0 | 33.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 3 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small Creative | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
LFM2 24B | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 33 | 86.7%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 33 | 86.7%
Grok 4 Fast | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 405B | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 12B | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4.1 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
Ministral 3B | 100 | 100 | 67 | 67 | 67 | 80.0%
Nemotron 3 Nano | 100 | 100 | 67 | 67 | 67 | 80.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 67 | 33 | 80.0%
Inception Mercury 2 | 100 | 67 | 67 | 67 | 67 | 73.3%
Stealth: Aurora Alpha | 100 | 100 | 67 | 67 | 33 | 73.3%
Llama 3.1 70B | 100 | 100 | 100 | 33 | 33 | 73.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 67 | 0 | 73.3%
Ministral 3 3B | 100 | 100 | 100 | 67 | 0 | 73.3%
Mistral NeMO | 100 | 100 | 100 | 33 | 0 | 66.7%
Inception Mercury | 100 | 100 | 67 | 33 | 0 | 60.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 67 | 67 | 0 | 0 | 46.7%
Mistral Small 3.2 24B | 100 | 67 | 67 | 0 | 0 | 46.7%
GPT-4o, May 13th (temp=0) | 100 | 33 | 33 | 0 | 0 | 33.3%
Qwen 2.5 72B | 100 | 33 | 33 | 0 | 0 | 33.3%
Claude 3 Haiku | 67 | 67 | 33 | 0 | 0 | 33.3%

Novelcrafter Default Prompt

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 33 | 86.7%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 33 | 86.7%
Hermes 3 405B | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5.4 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 4B | 100 | 100 | 100 | 67 | 67 | 86.7%
Inception Mercury 2 | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-5.2 | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 67 | 33 | 80.0%
Llama 3.1 70B | 100 | 100 | 100 | 67 | 33 | 80.0%
Qwen 2.5 72B | 100 | 100 | 67 | 67 | 67 | 80.0%
Hermes 3 70B | 100 | 100 | 67 | 67 | 67 | 80.0%
Claude 3 Haiku | 100 | 100 | 100 | 67 | 33 | 80.0%
Stealth: Aurora Alpha | 100 | 67 | 67 | 67 | 67 | 73.3%
Qwen 3.5 27B | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o, May 13th (temp=0) | 100 | 67 | 67 | 67 | 33 | 66.7%
Inception Mercury | 100 | 100 | 67 | 67 | 0 | 66.7%
Mistral NeMO | 100 | 100 | 67 | 67 | 0 | 66.7%
Nemotron 3 Nano | 100 | 100 | 67 | 33 | 0 | 60.0%
Mistral Small 3.2 24B | 100 | 100 | 33 | 0 | 0 | 46.7%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral NeMO | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 122B | 100 | 100 | 100 | 67 | 67 | 86.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen 3.5 397B A17B | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-5.2 | 100 | 100 | 100 | 67 | 33 | 80.0%
Inception Mercury | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 Nano | 100 | 100 | 100 | 67 | 33 | 80.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 0 | 80.0%
Nemotron 3 Nano | 100 | 100 | 100 | 67 | 0 | 73.3%
Qwen 2.5 72B | 100 | 100 | 100 | 67 | 0 | 73.3%
Claude 3 Haiku | 100 | 100 | 67 | 67 | 33 | 73.3%
Stealth: Aurora Alpha | 100 | 100 | 67 | 67 | 0 | 66.7%
Mistral Small 3.2 24B | 100 | 100 | 67 | 0 |  | 66.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 67 | 33 | 33 | 33 | 53.3%
GPT-4o, May 13th (temp=0) | 100 | 67 | 33 | 0 | 0 | 40.0%
Inception Mercury 2 | 67 | 67 | 33 | 0 | 0 | 33.3%
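
The Avg column looks like a plain mean of the five runs, but it appears to be computed from unrounded per-run fractions rather than the printed integers: a row of 100, 100, 100, 100, 67 is listed as 93.3%, whereas averaging the displayed numbers gives 93.4%. The Python sketch below reproduces the listed values under an assumption the source does not state, namely that each run is scored as a fraction of three checks (0/3, 1/3, 2/3 or 3/3, displayed as 0, 33, 67 or 100). `scenario_average` and `naive_average` are hypothetical helpers for illustration only.

```python
from fractions import Fraction

# Assumption (not stated in the source): each run is scored as k/3 checks
# passed, and the page displays that fraction rounded to a whole percent.
DISPLAY_TO_FRACTION = {
    0: Fraction(0, 3),
    33: Fraction(1, 3),
    67: Fraction(2, 3),
    100: Fraction(3, 3),
}


def scenario_average(displayed_scores):
    """Mean of the unrounded run fractions, formatted like the Avg column."""
    fractions = [DISPLAY_TO_FRACTION[s] for s in displayed_scores]
    mean = sum(fractions) / len(fractions)  # exact rational mean
    return f"{float(mean) * 100:.1f}%"


def naive_average(displayed_scores):
    """Mean of the rounded integers as printed, for comparison."""
    return f"{sum(displayed_scores) / len(displayed_scores):.1f}%"


if __name__ == "__main__":
    row = [100, 100, 100, 100, 67]
    print(scenario_average(row))  # 93.3%
    print(naive_average(row))     # 93.4%
```

Because the helper divides by the number of scores it is given, `scenario_average([100, 100, 67, 0])` returns `'66.7%'`, which matches the Mistral Small 3.2 24B row in the table above; that row lists only four run scores, so its average appears to be taken over the runs that were actually recorded.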

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Inception Mercury | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 4B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
Rocinante 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
Aion 2.0 | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.7 | 100 | 100 | 100 | 67 | 67 | 86.7%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 33 | 86.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 33 | 86.7%
Inception Mercury 2 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5 Nano | 100 | 100 | 100 | 100 | 33 | 86.7%
DeepSeek V3.2 | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral NeMO | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 8B | 100 | 100 | 100 | 100 | 33 | 86.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, May 13th (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 67 | 67 | 67 | 80.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.4 Nano | 100 | 100 | 67 | 67 | 67 | 80.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 67 | 33 | 73.3%
Stealth: Aurora Alpha | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-4o Mini (temp=1) | 100 | 100 | 67 | 67 | 33 | 73.3%
Nemotron 3 Nano | 100 | 100 | 67 | 67 | 33 | 73.3%
Qwen 2.5 72B | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-5.2 | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 67 | 33 | 33 | 66.7%
Gemini 2.5 Flash | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemini 2.5 Flash (Reasoning) | 100 | 67 | 33 | 33 | 33 | 53.3%
Llama 3.1 70B | 100 | 67 | 67 | 33 | 0 | 53.3%
Mistral Small 3.2 24B | 100 | 100 | 0 | 0 | 0 | 40.0%
Claude 3 Haiku | 100 | 67 | 0 | 0 | 0 | 33.3%
GPT-4o Mini (temp=0) | 67 | 0 | 0 | 0 | 0 | 13.3%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 3 | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral NeMO | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini | 100 | 100 | 100 | 100 | 33 | 86.7%
Grok 4 | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.5 | 100 | 100 | 100 | 67 | 67 | 86.7%
Grok 4 Fast | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 67 | 67 | 86.7%
Claude 3.5 Haiku | 100 | 100 | 100 | 67 | 67 | 86.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 67 | 67 | 86.7%
Grok 4.1 Fast | 100 | 100 | 100 | 67 | 33 | 80.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 67 | 33 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-5.4 Mini | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-4o Mini (temp=1) | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-5.4 Nano | 100 | 100 | 100 | 67 | 33 | 80.0%
Qwen 2.5 72B | 100 | 100 | 67 | 67 | 33 | 73.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 67 | 67 | 33 | 73.3%
Mistral Small 3.2 24B | 100 | 100 | 67 | 67 | 33 | 73.3%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 67 | 33 | 73.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 33 | 33 | 73.3%
Claude 3 Haiku | 100 | 67 | 67 | 67 | 67 | 73.3%
Inception Mercury 2 | 100 | 67 | 67 | 67 | 33 | 66.7%
Llama 3.1 70B | 100 | 100 | 67 | 33 | 33 | 66.7%
Llama 3.1 8B | 100 | 67 | 67 | 33 | 33 | 60.0%
DeepSeek-V2 Chat | 100 | 67 | 67 | 67 | 0 | 60.0%
Inception Mercury | 100 | 100 | 33 | 0 | 0 | 46.7%
GPT-4o, May 13th (temp=0) | 100 | 67 | 33 | 0 | 0 | 40.0%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.5 | 100 | 100 | 100 | 67 | 67 | 86.7%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 33 | 86.7%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 33 | 86.7%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 67 | 67 | 86.7%
Rocinante 12B | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 33 | 86.7%
Hermes 3 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 67 | 67 | 67 | 80.0%
Mistral NeMO | 100 | 100 | 67 | 67 | 67 | 80.0%
Qwen 3.5 397B A17B | 100 | 100 | 67 | 67 | 67 | 80.0%
o4 Mini | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 67 | 33 | 80.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small Creative | 100 | 100 | 67 | 67 | 67 | 80.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 67 | 33 | 80.0%
Gemma 3 4B | 100 | 100 | 100 | 67 | 33 | 80.0%
Qwen 3.5 122B | 100 | 100 | 67 | 67 | 33 | 73.3%
Z.AI GLM 4.7 | 100 | 100 | 67 | 67 | 33 | 73.3%
Claude 3 Haiku | 100 | 100 | 100 | 67 | 0 | 73.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 67 | 67 | 33 | 73.3%
Z.AI GLM 4.6 | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o, May 13th (temp=0) | 100 | 100 | 67 | 33 | 33 | 66.7%
Nemotron 3 Super | 100 | 67 | 67 | 67 | 33 | 66.7%
Inception Mercury 2 | 100 | 100 | 67 | 33 | 33 | 66.7%
Mistral Small 3.2 24B | 100 | 100 | 100 | 33 | 0 | 66.7%
Gemma 3 12B | 100 | 67 | 67 | 67 | 33 | 66.7%
GPT-5.4 Nano | 100 | 100 | 67 | 33 | 33 | 66.7%
Llama 3.1 70B | 100 | 100 | 67 | 33 | 0 | 60.0%
GPT-4o Mini (temp=0) | 100 | 67 | 67 | 33 | 33 | 60.0%
Gemini 2.5 Flash | 67 | 67 | 67 | 67 | 0 | 53.3%
Ministral 3B | 100 | 100 | 67 | 0 | 0 | 53.3%
Qwen 2.5 72B | 100 | 67 | 33 | 33 | 0 | 46.7%
Gemini 2.5 Flash Lite | 67 | 67 | 33 | 33 | 0 | 40.0%
GPT-4o, Aug. 6th (temp=0) | 67 | 67 | 33 | 0 | 0 | 33.3%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Aion 2.0 | 100 | 100 | 100 | 100 | 67 | 93.3%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 67 | 93.3%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 67 | 93.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small Creative | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
LFM2 24B | 100 | 100 | 100 | 100 | 67 | 93.3%
Rocinante 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 27B | 100 | 100 | 100 | 67 | 67 | 86.7%
Grok 4 | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 405B | 100 | 100 | 100 | 100 | 33 | 86.7%
DeepSeek V3.2 | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemini 2.5 Flash | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemma 3 4B | 100 | 100 | 100 | 100 | 33 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 33 | 86.7%
o4 Mini | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-4.1 Mini | 100 | 100 | 67 | 67 | 67 | 80.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-5.4 Nano | 100 | 100 | 100 | 33 | 33 | 73.3%
Mistral NeMO | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-4o, May 13th (temp=1) | 100 | 67 | 67 | 67 | 33 | 66.7%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 33 | 0 | 66.7%
Claude 3 Haiku | 100 | 67 | 67 | 67 | 33 | 66.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 0 | 0 | 60.0%
Hermes 3 70B | 100 | 100 | 33 | 33 | 0 | 53.3%
Inception Mercury 2 | 100 | 67 | 33 | 33 | 0 | 46.7%
GPT-4o, Aug. 6th (temp=0) | 67 | 67 | 33 | 33 | 33 | 46.7%
Llama 3.1 70B | 100 | 67 | 33 | 33 | 0 | 46.7%
GPT-4o Mini (temp=0) | 100 | 67 | 33 | 33 | 0 | 46.7%
Llama 3.1 Nemotron 70B | 67 | 67 | 33 | 33 | 33 | 46.7%
Inception Mercury | 100 | 67 | 33 | 0 | 0 | 40.0%
Mistral Small 3.2 24B | 100 | 67 | 33 | 0 | 0 | 40.0%
Stealth: Aurora Alpha | 100 | 33 | 33 | 0 | 0 | 33.3%
Qwen 2.5 72B | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, May 13th (temp=0) | 33 | 33 | 0 | 0 | 0 | 13.3%