Contains a count of nouns

Test: Novel outline

Avg. Score: 78.4%
Scenarios: 5

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
--- | --- | --- | --- | --- | ---
1 | GPT-5.4 Mini (Reasoning, Low) | 100.0% | $0.0015 | 2.6s | 100%
2 | Gemini 2.5 Flash (Reasoning) | 100.0% | $0.0015 | 2.8s | 100%
3 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.0025 | 2.6s | 100%
4 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.0024 | 3.6s | 100%
5 | o4 Mini | 100.0% | $0.0023 | 5.1s | 100%
6 | Grok 4.20 (Beta, Reasoning) | 100.0% | $0.0045 | 2.8s | 100%
7 | DeepSeek V4 Flash (Reasoning) | 100.0% | $0.0001 | 8.4s | 100%
8 | GPT-5.4 (Reasoning) | 100.0% | $0.0028 | 6.0s | 100%
9 | o4 Mini High | 100.0% | $0.0028 | 6.5s | 100%
10 | Grok 4.20 (Reasoning) | 100.0% | $0.0030 | 6.6s | 100%
11 | Qwen 3.6 35B | 100.0% | $0.0017 | 8.8s | 100%
12 | GPT-5.5 (Reasoning, Low) | 100.0% | $0.0093 | 2.7s | 100%
13 | GPT-5 Nano | 100.0% | $0.0004 | 13.0s | 100%
14 | GPT-5.5 (Reasoning) | 100.0% | $0.0099 | 3.5s | 100%
15 | Gemini 3.5 Flash (Reasoning) | 100.0% | $0.0097 | 3.9s | 100%
16 | Gemini 2.5 Flash Lite (Reasoning) | 98.0% | $0.0003 | 2.0s | 72%
17 | Qwen 3.5 Flash | 100.0% | $0.0008 | 14.3s | 100%
18 | Qwen 3.5 35B | 100.0% | $0.0039 | 11.7s | 100%
19 | Gemini 2.5 Pro | 100.0% | $0.0089 | 6.4s | 100%
20 | Aion 2.0 | 100.0% | $0.0020 | 13.7s | 100%
21 | Qwen 3.5 122B | 100.0% | $0.0050 | 10.8s | 100%
22 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0020 | 13.9s | 100%
23 | Claude Opus 4.6 | 100.0% | $0.012 | 4.1s | 100%
24 | Gemini 3 Pro (Preview) | 100.0% | $0.010 | 6.5s | 100%
25 | GPT-5.1 | 98.0% | $0.0017 | 4.1s | 72%
26 | GPT-5 Mini | 98.0% | $0.0010 | 6.9s | 72%
27 | Gemini 3.1 Pro (Preview) | 100.0% | $0.011 | 9.2s | 100%
28 | MoonshotAI: Kimi K2.6 | 100.0% | $0.0029 | 18.7s | 100%
29 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.015 | 6.0s | 100%
30 | GPT-5.4 Mini (Reasoning) | 96.0% | $0.0016 | 3.1s | 61%
31 | Claude Opus 4.7 (Reasoning) | 100.0% | $0.019 | 3.1s | 100%
32 | Qwen 3.5 Plus (2026-04-20) | 100.0% | $0.0034 | 19.9s | 100%
33 | Grok 4 | 100.0% | $0.011 | 12.0s | 100%
34 | Z.AI GLM 5.1 | 100.0% | $0.0045 | 20.6s | 100%
35 | Qwen 3.5 27B | 100.0% | $0.0036 | 21.8s | 100%
36 | Qwen 3.6 Flash | 96.0% | $0.0021 | 5.9s | 61%
37 | Grok 4.1 Fast | 94.0% | $0.0004 | 3.3s | 53%
38 | Inception Mercury 2 | 92.0% | $0.0003 | 513ms | 46%
39 | Claude Sonnet 4.6 (Reasoning) | 98.0% | $0.0094 | 5.7s | 72%
40 | MiniMax M2.5 | 96.0% | $0.0007 | 9.1s | 61%
41 | Qwen3.7 Max | 100.0% | $0.012 | 15.8s | 100%
42 | GPT-5.2 | 94.0% | $0.0025 | 2.9s | 53%
43 | GPT-OSS 120B | 96.0% | $0.0003 | 10.1s | 61%
44 | Gemma 4 26B (Reasoning) | 100.0% | $0.0003 | 28.2s | 100%
45 | Xiaomi MIMO v2.5 Pro | 94.0% | $0.0014 | 6.8s | 53%
46 | Gemma 4 31B (Reasoning) | 100.0% | $0.0004 | 30.7s | 100%
47 | Stealth: Aurora Alpha | 94.0% | - | 1.4s | 53%
48 | Llama 3.1 Nemotron 70B | 94.0% | $0.0006 | 9.2s | 53%
49 | Z.AI GLM 5 Turbo | 92.0% | $0.0019 | 4.6s | 46%
50 | DeepSeek V4 Pro (Reasoning) | 98.0% | $0.0015 | 20.5s | 72%
51 | Claude Sonnet 4 | 94.0% | $0.0075 | 4.2s | 53%
52 | Gemini 3.1 Flash Lite (Reasoning) | 88.0% | $0.0007 | 1.2s | 35%
53 | Gemini 3.1 Flash Lite | 88.0% | $0.0006 | 1.4s | 35%
54 | Qwen3.6 Max Preview | 100.0% | $0.0090 | 26.1s | 100%
55 | Qwen 3.6 27B | 96.0% | $0.0044 | 16.7s | 61%
56 | GPT-5.4 Nano (Reasoning, Low) | 86.0% | $0.0004 | 2.9s | 31%
57 | Qwen 3.5 397B A17B | 100.0% | $0.0057 | 34.3s | 100%
58 | Gemini 3.1 Flash Lite (Preview) | 84.0% | $0.0006 | 1.1s | 27%
59 | Nemotron 3 Super | 90.0% | $0.0000 | 11.3s | 40%
60 | GPT-4.1 | 86.0% | $0.0022 | 3.5s | 31%
61 | DeepSeek V3.2 | 90.0% | $0.0004 | 11.9s | 40%
62 | Z.AI GLM 4.7 Flash | 92.0% | $0.0004 | 15.7s | 46%
63 | Qwen3 235B A22B Instruct 2507 | 90.0% | $0.0003 | 12.5s | 40%
64 | MiniMax M2.7 | 86.0% | $0.0007 | 7.2s | 31%
65 | Inception Mercury | 80.0% | $0.0000 | 565ms | 20%
66 | GPT-5.4 Nano (Reasoning) | 82.0% | $0.0005 | 2.8s | 23%
67 | Mistral Small 4 (Reasoning) | 82.0% | $0.0004 | 3.4s | 23%
68 | Claude Opus 4.5 | 90.0% | $0.013 | 3.8s | 40%
69 | Claude Sonnet 4.5 | 86.0% | $0.0074 | 3.4s | 31%
70 | Z.AI GLM 5 | 90.0% | $0.0024 | 15.8s | 40%
71 | Stealth: Hunter Alpha | 86.0% | $0.0000 | 12.8s | 31%
72 | ByteDance Seed 1.6 Flash | 80.0% | $0.0004 | 5.2s | 20%
73 | Z.AI GLM 4.6 | 96.0% | $0.0019 | 31.1s | 61%
74 | Qwen 3 32B | 86.0% | $0.0005 | 15.3s | 31%
75 | Z.AI GLM 4.7 | 96.0% | $0.0017 | 32.6s | 61%
76 | Ministral 3 14B | 76.0% | $0.0003 | 3.0s | 15%
77 | Mistral Large 2 | 82.0% | $0.0044 | 6.2s | 23%
78 | Gemma 4 31B | 78.0% | $0.0002 | 6.6s | 17%
79 | Mistral NeMO | 74.0% | $0.0002 | 2.4s | 12%
80 | Claude Haiku 4.5 | 76.0% | $0.0026 | 2.4s | 15%
81 | DeepSeek V3 (2025-03-24) | 82.0% | $0.0006 | 12.1s | 23%
82 | Gemini 3.5 Flash (Reasoning, Minimal) | 74.0% | $0.0035 | 1.4s | 12%
83 | Ministral 3 8B | 70.0% | $0.0002 | 1.7s | 8%
84 | Grok 4.20 | 72.0% | $0.0018 | 2.2s | 10%
85 | Grok 4.3 (Reasoning) | 82.0% | $0.0030 | 13.1s | 23%
86 | Qwen 3.5 9B | 96.0% | $0.0006 | 39.5s | 61%
87 | ByteDance Seed 1.6 | 80.0% | $0.0015 | 13.5s | 20%
88 | Claude Opus 4.7 | 84.0% | $0.017 | 2.3s | 27%
89 | Stealth: Healer Alpha | 72.0% | $0.0000 | 6.5s | 10%
90 | Qwen 3.5 Plus (2026-02-15) | 78.0% | $0.0016 | 11.7s | 17%
91 | Mistral Medium 3.1 | 68.0% | $0.0009 | 2.9s | 7%
92 | GPT-4.1 Mini | 66.0% | $0.0002 | 1.8s | 5%
93 | Claude Sonnet 4.6 | 72.0% | $0.0064 | 2.1s | 10%
94 | DeepSeek V3 (2024-12-26) | 70.0% | $0.0007 | 6.4s | 8%
95 | Claude 3.7 Sonnet | 74.0% | $0.0068 | 4.4s | 12%
96 | Xiaomi MIMO v2.5 | 68.0% | $0.0009 | 4.5s | 7%
97 | DeepSeek V4 Flash | 64.0% | $0.0001 | 2.6s | 4%
98 | Grok 4.20 (Beta) | 66.0% | $0.0014 | 3.3s | 5%
99 | Arcee AI: Trinity Mini | 64.0% | $0.0001 | 3.4s | 4%
100 | Arcee AI: Trinity Large (Preview) | 64.0% | $0.0000 | 3.8s | 4%
101 | Gemma 3 12B | 64.0% | $0.0001 | 3.8s | 4%
102 | Mistral Small Creative | 62.0% | $0.0002 | 2.0s | 3%
103 | Z.AI GLM 4.5 Air | 74.0% | $0.0006 | 13.6s | 12%
104 | Ministral 3 3B | 60.0% | $0.0002 | 1.2s | 2%
105 | Llama 3.1 70B | 66.0% | $0.0011 | 5.8s | 5%
106 | Z.AI GLM 4.5 | 66.0% | $0.0007 | 7.2s | 5%
107 | ByteDance Seed 2.0 Lite | 80.0% | $0.0021 | 21.0s | 20%
108 | Llama 3.1 8B | 58.0% | $0.0002 | 1.8s | 1%
109 | GPT-5.4 Nano | 58.0% | $0.0004 | 2.3s | 1%
110 | DeepSeek-V2 Chat | 64.0% | $0.0003 | 8.0s | 4%
111 | Nemotron 3 Nano | 66.0% | $0.0002 | 10.1s | 5%
112 | Gemini 3 Flash (Preview) | 56.0% | $0.0012 | 1.6s | 1%
113 | Mistral Small 4 | 54.0% | $0.0002 | 1.5s | 0%
114 | Claude 3 Haiku | 54.0% | $0.0005 | 1.3s | 0%
115 | Writer: Palmyra X5 | 66.0% | $0.0025 | 9.1s | 5%
116 | GPT-5.4 | 56.0% | $0.0022 | 1.6s | 1%
117 | Ministral 8B | 52.0% | $0.0002 | 1.4s | 0%
118 | Gemma 4 26B | 60.0% | $0.0002 | 7.5s | 2%
119 | DeepSeek V3.1 | 64.0% | $0.0004 | 10.9s | 4%
120 | Mistral Large 3 | 54.0% | $0.0010 | 2.6s | 0%
121 | Claude 3.5 Sonnet | 64.0% | $0.0067 | 4.6s | 4%
122 | Ministral 3B | 50.0% | $0.0001 | 1.2s | 0%
123 | Qwen 2.5 72B | 58.0% | $0.0006 | 6.5s | 1%
124 | Gemini 2.5 Flash Lite | 48.0% | $0.0001 | 482ms | 0%
125 | Gemini 2.5 Flash | 48.0% | $0.0002 | 535ms | 0%
126 | DeepSeek V4 Pro | 60.0% | $0.0010 | 9.4s | 2%
127 | GPT-4o Mini (temp=1) | 46.0% | $0.0002 | 919ms | 0%
128 | Hermes 3 70B | 50.0% | $0.0006 | 3.6s | 0%
129 | GPT-4o, May 13th (temp=1) | 62.0% | $0.0095 | 3.4s | 3%
130 | Mistral Large | 54.0% | $0.0040 | 3.3s | 0%
131 | Gemma 3 4B | 42.0% | $0.0001 | 1.1s | 0%
132 | WizardLM 2 8x22b | 52.0% | $0.0015 | 6.3s | 0%
133 | Gemma 3 27B | 46.0% | $0.0002 | 4.4s | 0%
134 | GPT-4o Mini (temp=0) | 40.0% | $0.0002 | 811ms | 0%
135 | GPT-5 | 58.0% | $0.0042 | 9.0s | 1%
136 | GPT-4.1 Nano | 40.0% | $0.0001 | 1.3s | 0%
137 | GPT-4o, Aug. 6th (temp=1) | 46.0% | $0.0038 | 1.3s | 0%
138 | GPT-5.4 Mini | 40.0% | $0.0011 | 651ms | 0%
139 | Mistral Small 3.2 24B | 42.0% | $0.0002 | 3.0s | 0%
140 | GPT-5.5 | 50.0% | $0.0073 | 1.2s | 0%
141 | Grok 4 Fast | 40.0% | $0.0003 | 2.1s | 0%
142 | GPT-4o, Aug. 6th (temp=0) | 44.0% | $0.0037 | 1.2s | 0%
143 | Grok 4.3 | 40.0% | $0.0014 | 1.7s | 0%
144 | GPT-4o, May 13th (temp=0) | 56.0% | $0.0096 | 4.1s | 1%
145 | LFM2 24B | 42.0% | $0.0001 | 4.9s | 0%
146 | Rocinante 12B | 52.0% | $0.0004 | 11.8s | 0%
147 | Cohere Command R+ (Aug. 2024) | 44.0% | $0.0049 | 2.9s | 0%
148 | Hermes 3 405B | 44.0% | $0.0000 | 8.1s | 0%
149 | Claude Opus 4 | 84.0% | $0.037 | 9.8s | 27%
150 | ByteDance Seed 2.0 Mini | 76.0% | $0.0008 | 39.1s | 15%
Avg. Score: 78.37%

Individual Scenarios

outline-count

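Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%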
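
Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%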
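
Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Claude Opus 4 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Mistral Small 4 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-5.4 Nano | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Claude Opus 4.7 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Z.AI GLM 4.5 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Gemma 4 26B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V4 Flash | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 14B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Claude 3 Haiku | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
WizardLM 2 8x22b | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
GPT-5.5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Large 3 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-5.4 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-4.1 Mini | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
DeepSeek V4 Pro | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
DeepSeek V3.1 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Large | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Writer: Palmyra X5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-4o Mini (temp=1) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemma 3 12B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Medium 3.1 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Small Creative | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Hermes 3 70B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 3 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Cohere Command R+ (Aug. 2024) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Llama 3.1 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
LFM2 24B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.4 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%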

pov-count

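Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%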
Qwen 3 32B1001001001001001001001000080.0%
Writer: Palmyra X51001001001001001001001000080.0%
Inception Mercury1001001001001001001001000080.0%
ByteDance Seed 1.6 Flash1001001001001001001001000080.0%
Claude Opus 4.510010010010010010010000070.0%
MiniMax M2.710010010010010010010000070.0%
Stealth: Hunter Alpha10010010010010010010000070.0%
Z.AI GLM 4.510010010010010010010000070.0%
Stealth: Healer Alpha10010010010010010010000070.0%
DeepSeek-V2 Chat10010010010010010010000070.0%
GPT-5.410010010010010010010000070.0%
Z.AI GLM 4.5 Air10010010010010010010000070.0%
Mistral Large 210010010010010010010000070.0%
Mistral Small 4 (Reasoning)10010010010010010010000070.0%
DeepSeek V4 Flash10010010010010010010000070.0%
GPT-5.4 Nano (Reasoning)10010010010010010010000070.0%
Llama 3.1 70B10010010010010010010000070.0%
Arcee AI: Trinity Large (Preview)10010010010010010010000070.0%
Grok 4.3 (Reasoning)100100100100100100000060.0%
Gemma 4 31B100100100100100100000060.0%
Mistral Large 3100100100100100100000060.0%
Nemotron 3 Super100100100100100100000060.0%
DeepSeek V3 (2024-12-26)100100100100100100000060.0%
Grok 4.20100100100100100100000060.0%
Mistral Large100100100100100100000060.0%
Nemotron 3 Nano100100100100100100000060.0%
Ministral 3 14B100100100100100100000060.0%
Ministral 3 8B100100100100100100000060.0%
Gemma 4 26B1001001001001000000050.0%
Gemini 3 Flash (Preview)1001001001001000000050.0%
DeepSeek V3 (2025-03-24)1001001001001000000050.0%
GPT-5.510010010010000000040.0%
Gemini 2.5 Flash Lite10010010010000000040.0%
Mistral Small 410010010010000000040.0%
Qwen 2.5 72B10010010010000000040.0%
Claude 3 Haiku10010010010000000040.0%
Ministral 3 3B10010010010000000040.0%
Mistral NeMO10010010010000000040.0%
Ministral 8B10010010010000000040.0%
Rocinante 12B10010010010000000040.0%
Claude Sonnet 4.6100100100000000030.0%
Grok 4.20 (Beta)100100100000000030.0%
GPT-4.1 Mini100100100000000030.0%
DeepSeek V4 Pro100100100000000030.0%
GPT-4o, Aug. 6th (temp=1)100100100000000030.0%
GPT-5.4 Nano100100100000000030.0%
Llama 3.1 8B100100100000000030.0%
Ministral 3B100100100000000030.0%
GPT-51001000000000020.0%
GPT-4o, Aug. 6th (temp=0)1001000000000020.0%
Hermes 3 70B1001000000000020.0%
WizardLM 2 8x22b1001000000000020.0%
Hermes 3 405B10000000000010.0%
Grok 4.310000000000010.0%
Arcee AI: Trinity Mini10000000000010.0%
Gemma 3 4B10000000000010.0%
Grok 4 Fast00000000000.0%
GPT-4o, May 13th (temp=0)00000000000.0%
GPT-5.4 Mini00000000000.0%
Gemini 2.5 Flash00000000000.0%
GPT-4o Mini (temp=1)00000000000.0%
Mistral Small 3.2 24B00000000000.0%
GPT-4o Mini (temp=0)00000000000.0%
Gemma 3 27B00000000000.0%
GPT-4.1 Nano00000000000.0%
Cohere Command R+ (Aug. 2024)00000000000.0%
LFM2 24B00000000000.0%
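
Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg ▼
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Ministral 3B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemma 4 26B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Inception Mercury | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Gemma 3 27B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Nemotron 3 Nano | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-5.4 Nano | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-5 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek-V2 Chat | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Grok 4.20 (Beta) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3.1 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Grok 4.20 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o Mini (temp=1) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 70B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
WizardLM 2 8x22b | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Rocinante 12B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Grok 4 Fast | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-4o, May 13th (temp=1) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Small 3.2 24B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemma 3 12B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Small Creative | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Claude 3 Haiku | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Cohere Command R+ (Aug. 2024) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 3 3B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-5.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Xiaomi MIMO v2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.4 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%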