Contains a count of nouns

Test: Language Comprehension

Avg. Score: 88.0%
Scenarios: 2

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Inception Mercury | 100.0% | $0.0000 | 481ms | 100%
2 | Ministral 3 3B | 100.0% | $0.0000 | 855ms | 100%
3 | Mistral NeMO | 100.0% | $0.0000 | 870ms | 100%
4 | GPT-5.4 Nano | 100.0% | $0.0001 | 874ms | 100%
5 | Ministral 3 8B | 100.0% | $0.0000 | 1.1s | 100%
6 | Mistral Small Creative | 100.0% | $0.0001 | 1.1s | 100%
7 | Mistral Small 4 | 100.0% | $0.0001 | 1.1s | 100%
8 | Stealth: Aurora Alpha | 100.0% | | 1.2s | 100%
9 | Ministral 3 14B | 100.0% | $0.0000 | 1.4s | 100%
10 | GPT-4.1 Nano | 100.0% | $0.0000 | 1.9s | 100%
11 | GPT-4.1 Mini | 100.0% | $0.0001 | 1.8s | 100%
12 | Mistral Small 3.2 24B | 100.0% | $0.0000 | 2.5s | 100%
13 | Mistral Medium 3.1 | 100.0% | $0.0002 | 2.0s | 100%
14 | Arcee AI: Trinity Large (Preview) | 100.0% | $0.0000 | 3.3s | 100%
15 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.0005 | 1.5s | 100%
16 | Hermes 3 70B | 100.0% | $0.0001 | 3.5s | 100%
17 | GPT-4o, Aug. 6th (temp=1) | 100.0% | $0.0005 | 1.3s | 100%
18 | Stealth: Healer Alpha | 100.0% | $0.0000 | 3.9s | 100%
19 | Grok 4 Fast | 100.0% | $0.0002 | 2.9s | 100%
20 | Qwen 2.5 72B | 100.0% | $0.0001 | 3.7s | 100%
21 | Claude Haiku 4.5 | 100.0% | $0.0005 | 1.8s | 100%
22 | Mistral Large 3 | 100.0% | $0.0003 | 3.1s | 100%
23 | LFM2 24B | 100.0% | $0.0000 | 4.4s | 100%
24 | Gemma 3 27B | 100.0% | $0.0000 | 4.8s | 100%
25 | GPT-5.4 Mini (Reasoning) | 100.0% | $0.0006 | 2.7s | 100%
26 | Mistral Large 2 | 100.0% | $0.0006 | 2.6s | 100%
27 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0003 | 4.2s | 100%
28 | Grok 4.1 Fast | 100.0% | $0.0002 | 4.4s | 100%
29 | DeepSeek-V2 Chat | 100.0% | $0.0000 | 5.8s | 100%
30 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0002 | 5.1s | 100%
31 | DeepSeek V3 (2025-03-24) | 100.0% | $0.0002 | 5.3s | 100%
32 | Mistral Large | 100.0% | $0.0008 | 2.5s | 100%
33 | Rocinante 12B | 100.0% | $0.0001 | 6.0s | 100%
34 | GPT-4o, May 13th (temp=0) | 100.0% | $0.0011 | 1.9s | 100%
35 | GPT-4o, May 13th (temp=1) | 100.0% | $0.0013 | 2.0s | 100%
36 | WizardLM 2 8x22b | 100.0% | $0.0003 | 7.1s | 100%
37 | Claude Sonnet 4.6 | 100.0% | $0.0016 | 2.8s | 100%
38 | Qwen 3 32B | 100.0% | $0.0002 | 10.8s | 100%
39 | Claude 3.7 Sonnet | 100.0% | $0.0018 | 3.2s | 100%
40 | Stealth: Hunter Alpha | 100.0% | $0.0000 | 12.2s | 100%
41 | Hermes 3 405B | 100.0% | $0.0000 | 12.2s | 100%
42 | Qwen 3.6 Flash | 100.0% | $0.0015 | 6.1s | 100%
43 | GPT-OSS 120B | 100.0% | $0.0001 | 12.5s | 100%
44 | MiniMax M2.5 | 100.0% | $0.0007 | 10.3s | 100%
45 | Claude Opus 4.5 | 100.0% | $0.0026 | 3.6s | 100%
46 | GPT-5 Mini | 100.0% | $0.0013 | 10.3s | 100%
47 | Claude Opus 4.6 | 100.0% | $0.0027 | 3.8s | 100%
48 | MiniMax M2.7 | 100.0% | $0.0008 | 13.3s | 100%
49 | GPT-5.5 (Reasoning, Low) | 100.0% | $0.0028 | 3.9s | 100%
50 | Z.AI GLM 4.5 | 100.0% | $0.0011 | 12.9s | 100%
51 | Aion 2.0 | 100.0% | $0.0010 | 13.9s | 100%
52 | GPT-5.5 (Reasoning) | 100.0% | $0.0034 | 4.6s | 100%
53 | DeepSeek V4 Flash (Reasoning) | 100.0% | $0.0001 | 20.7s | 100%
54 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0020 | 12.7s | 100%
55 | Qwen 3.5 Plus (2026-04-20) | 100.0% | $0.0020 | 13.8s | 100%
56 | ByteDance Seed 1.6 | 100.0% | $0.0014 | 17.1s | 100%
57 | Z.AI GLM 5 Turbo | 100.0% | $0.0033 | 10.0s | 100%
58 | Nemotron 3 Super | 100.0% | $0.0000 | 26.0s | 100%
59 | Grok 4.20 (Beta, Reasoning) | 100.0% | $0.0051 | 3.6s | 100%
60 | Claude Opus 4.7 (Reasoning) | 100.0% | $0.0054 | 3.4s | 100%
61 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.0050 | 6.8s | 100%
62 | ByteDance Seed 2.0 Lite | 100.0% | $0.0019 | 23.3s | 100%
63 | Qwen 3.5 35B | 100.0% | $0.0040 | 13.7s | 100%
64 | Qwen 3.5 122B | 100.0% | $0.0046 | 11.2s | 100%
65 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.0060 | 5.7s | 100%
66 | Qwen 3.5 27B | 100.0% | $0.0037 | 17.8s | 100%
67 | Qwen 3.5 Flash | 100.0% | $0.0009 | 38.0s | 100%
68 | Gemini 3.1 Flash Lite (Reasoning) | 90.0% | $0.0001 | 904ms | 40%
69 | Gemini 3.1 Flash Lite (Preview) | 90.0% | $0.0001 | 975ms | 40%
70 | Inception Mercury 2 | 90.0% | $0.0002 | 663ms | 40%
71 | GPT-5.4 Mini | 90.0% | $0.0002 | 746ms | 40%
72 | Gemma 3 12B | 90.0% | $0.0000 | 2.8s | 40%
73 | Grok 4.20 | 90.0% | $0.0004 | 1.8s | 40%
74 | Grok 4.20 (Beta) | 90.0% | $0.0006 | 803ms | 40%
75 | Gemini 3 Flash (Preview) | 90.0% | $0.0005 | 1.7s | 40%
76 | GPT-4.1 | 90.0% | $0.0006 | 1.9s | 40%
77 | DeepSeek V4 Pro | 90.0% | $0.0002 | 4.4s | 40%
78 | Qwen3 235B A22B Instruct 2507 | 90.0% | $0.0001 | 5.4s | 40%
79 | Z.AI GLM 4.6 | 100.0% | $0.0023 | 41.1s | 100%
80 | Llama 3.1 70B | 90.0% | $0.0001 | 8.3s | 40%
81 | DeepSeek V3.1 | 90.0% | $0.0001 | 8.5s | 40%
82 | Xiaomi MIMO v2.5 Pro | 90.0% | $0.0008 | 5.5s | 40%
83 | Gemini 3.5 Flash (Reasoning, Minimal) | 90.0% | $0.0017 | 1.7s | 40%
84 | Claude Opus 4 | 100.0% | $0.0095 | 9.3s | 100%
85 | GPT-5.1 | 90.0% | $0.0012 | 4.3s | 40%
86 | Gemini 3 Flash (Preview, Reasoning) | 90.0% | $0.0015 | 3.5s | 40%
87 | Claude Sonnet 4.5 | 90.0% | $0.0015 | 3.2s | 40%
88 | Writer: Palmyra X5 | 90.0% | $0.0008 | 8.3s | 40%
89 | GPT-5.4 (Reasoning) | 90.0% | $0.0019 | 4.0s | 40%
90 | MoonshotAI: Kimi K2.6 | 100.0% | $0.0037 | 40.1s | 100%
91 | Claude 3.5 Sonnet | 90.0% | $0.0025 | 5.4s | 40%
92 | Qwen3.6 Max Preview | 100.0% | $0.0070 | 28.7s | 100%
93 | Grok 4 | 100.0% | $0.0095 | 16.8s | 100%
94 | Llama 3.1 8B | 80.0% | $0.0000 | 1.2s | 20%
95 | Gemini 3.1 Flash Lite | 80.0% | $0.0001 | 832ms | 20%
96 | GPT-5.4 Nano (Reasoning, Low) | 80.0% | $0.0001 | 1.6s | 20%
97 | Z.AI GLM 4.7 Flash | 90.0% | $0.0004 | 19.9s | 40%
98 | Z.AI GLM 4.5 Air | 90.0% | $0.0006 | 19.3s | 40%
99 | Arcee AI: Trinity Mini | 80.0% | $0.0001 | 3.0s | 20%
100 | Grok 4.3 | 80.0% | $0.0003 | 2.3s | 20%
101 | Qwen3.7 Max | 100.0% | $0.0097 | 22.2s | 100%
102 | Gemma 4 26B (Reasoning) | 90.0% | $0.0002 | 23.7s | 40%
103 | DeepSeek V4 Flash | 90.0% | $0.0001 | 24.4s | 40%
104 | Gemma 4 26B | 80.0% | $0.0000 | 5.0s | 20%
105 | Grok 4.20 (Reasoning) | 90.0% | $0.0028 | 12.7s | 40%
106 | DeepSeek V4 Pro (Reasoning) | 100.0% | $0.0025 | 1.0m | 100%
107 | GPT-5.4 Nano (Reasoning) | 80.0% | $0.0002 | 7.4s | 20%
108 | GPT-5.2 | 80.0% | $0.0011 | 3.3s | 20%
109 | Llama 3.1 Nemotron 70B | 80.0% | $0.0001 | 8.3s | 20%
110 | Claude Sonnet 4 | 80.0% | $0.0015 | 2.7s | 20%
111 | Grok 4.3 (Reasoning) | 90.0% | $0.0025 | 18.2s | 40%
112 | GPT-5.5 | 80.0% | $0.0017 | 2.3s | 20%
113 | Gemini 2.5 Flash Lite | 70.0% | $0.0000 | 568ms | 8%
114 | Qwen 3.6 35B | 80.0% | $0.0014 | 8.5s | 20%
115 | Gemini 2.5 Flash | 70.0% | $0.0001 | 646ms | 8%
116 | Gemini 3.5 Flash (Reasoning) | 90.0% | $0.0067 | 4.0s | 40%
117 | Nemotron 3 Nano | 80.0% | $0.0002 | 16.7s | 20%
118 | Mistral Small 4 (Reasoning) | 70.0% | $0.0002 | 2.7s | 8%
119 | GPT-5.4 Mini (Reasoning, Low) | 70.0% | $0.0003 | 2.8s | 8%
120 | Z.AI GLM 5 | 90.0% | $0.0032 | 26.3s | 40%
121 | Xiaomi MIMO v2.5 | 70.0% | $0.0007 | 4.6s | 8%
122 | GPT-5.4 (Reasoning, Low) | 70.0% | $0.0012 | 3.0s | 8%
123 | Qwen 3.6 27B | 80.0% | $0.0026 | 13.7s | 20%
124 | Ministral 8B | 60.0% | $0.0000 | 733ms | 2%
125 | Gemma 3 4B | 60.0% | $0.0000 | 1.1s | 2%
126 | GPT-4o Mini (temp=1) | 60.0% | $0.0000 | 1.0s | 2%
127 | Gemini 2.5 Flash (Reasoning) | 70.0% | $0.0018 | 4.1s | 8%
128 | Claude 3 Haiku | 60.0% | $0.0001 | 1.3s | 2%
129 | GPT-5 | 80.0% | $0.0039 | 9.3s | 20%
130 | Qwen 3.5 397B A17B | 100.0% | $0.0071 | 59.1s | 100%
131 | DeepSeek V3.2 | 60.0% | $0.0001 | 5.5s | 2%
132 | GPT-4o Mini (temp=0) | 50.0% | $0.0000 | 1.2s | 0%
133 | GPT-5.4 | 50.0% | $0.0008 | 1.1s | 0%
134 | Gemini 2.5 Flash Lite (Reasoning) | 50.0% | $0.0003 | 3.6s | 0%
135 | Cohere Command R+ (Aug. 2024) | 50.0% | $0.0008 | 1.7s | 0%
136 | Gemini 2.5 Pro | 80.0% | $0.0068 | 6.9s | 20%
137 | Z.AI GLM 5.1 | 80.0% | $0.0038 | 21.7s | 20%
138 | o4 Mini | 50.0% | $0.0012 | 4.3s | 0%
139 | Claude Opus 4.7 | 70.0% | $0.0054 | 4.2s | 8%
140 | ByteDance Seed 1.6 Flash | 40.0% | $0.0002 | 5.1s | 0%
141 | o4 Mini High | 50.0% | $0.0019 | 5.5s | 0%
142 | Gemini 3 Pro (Preview) | 90.0% | $0.012 | 9.7s | 40%
143 | Gemma 4 31B (Reasoning) | 50.0% | $0.0001 | 17.6s | 0%
144 | Ministral 3B | 20.0% | $0.0000 | 591ms | 0%
145 | ByteDance Seed 2.0 Mini | 80.0% | $0.0008 | 52.8s | 20%
146 | Gemini 3.1 Pro (Preview) | 90.0% | $0.013 | 14.4s | 40%
147 | GPT-5 Nano | 50.0% | $0.0007 | 22.4s | 0%
148 | Z.AI GLM 4.7 | 60.0% | $0.0023 | 29.0s | 2%
149 | Qwen 3.5 9B | 80.0% | $0.0007 | 1.1m | 20%
150 | Gemma 4 31B | 10.0% | $0.0000 | 6.6s | 0%

Individual Scenarios

Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 0 | 80.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-5 | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-5.2 | 100 | 100 | 100 | 0 | 0 | 60.0%
Qwen 3.6 27B | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-5.5 | 100 | 100 | 100 | 0 | 0 | 60.0%
Qwen 3.6 35B | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude Sonnet 4 | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 0 | 0 | 60.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemma 4 26B | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 0 | 0 | 60.0%
Nemotron 3 Nano | 100 | 100 | 100 | 0 | 0 | 60.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 0 | 0 | 0 | 40.0%
Claude Opus 4.7 | 100 | 100 | 0 | 0 | 0 | 40.0%
Z.AI GLM 4.7 | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 0 | 0 | 0 | 40.0%
Xiaomi MIMO v2.5 | 100 | 100 | 0 | 0 | 0 | 40.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Flash Lite | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Flash | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemma 4 31B (Reasoning) | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3.2 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o Mini (temp=1) | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Gemma 3 4B | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 4 31B | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.4 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Cohere Command R+ (Aug. 2024) | 0 | 0 | 0 | 0 | 0 | 0.0%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 0 | 80.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 0 | 0 | 60.0%
Ministral 8B | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemma 4 31B | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
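
The Score column under Overall Performance appears consistent with a plain mean of the two per-scenario averages listed under Individual Scenarios, where each scenario average is itself the mean of its five numbered columns. The page does not state this formula, so the sketch below is an assumption checked only against the published figures; the function names `scenario_average` and `overall_score` are hypothetical.

```python
def scenario_average(runs):
    """Mean of the per-run results (columns #1 to #5) for one scenario."""
    return sum(runs) / len(runs)


def overall_score(scenarios):
    """Assumed formula: mean of the per-scenario averages."""
    averages = [scenario_average(runs) for runs in scenarios]
    return sum(averages) / len(averages)


# Gemini 3.1 Pro (Preview), values taken from the two scenario tables.
gemini_31_pro = [
    [100, 100, 100, 100, 0],    # first table, listed average 80.0%
    [100, 100, 100, 100, 100],  # second table, listed average 100.0%
]

print(scenario_average(gemini_31_pro[0]))  # 80.0
print(overall_score(gemini_31_pro))        # 90.0
```

The result matches the 90.0% listed for that model in the overall table. The Stability, Avg. Cost, and Avg. Time columns are not derived here, since the page gives no indication of how they are computed.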