Contains a count of nouns

Test: Language Comprehension

Avg. Score: 86.9%
Scenarios: 2

Overall Performance

| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
|---|---|---|---|---|---|
| 1 | Ministral 3 3B | 100.0% | $0.0000 | 855ms | 100% |
| 2 | Mistral NeMO | 100.0% | $0.0000 | 870ms | 100% |
| 3 | Ministral 3 8B | 100.0% | $0.0000 | 1.1s | 100% |
| 4 | Mistral Small Creative | 100.0% | $0.0001 | 1.1s | 100% |
| 5 | Stealth: Aurora Alpha | 100.0% | | 1.2s | 100% |
| 6 | Ministral 3 14B | 100.0% | $0.0000 | 1.4s | 100% |
| 7 | GPT-4.1 Nano | 100.0% | $0.0000 | 1.9s | 100% |
| 8 | GPT-4.1 Mini | 100.0% | $0.0001 | 1.8s | 100% |
| 9 | Mistral Small 3.2 24B | 100.0% | $0.0000 | 2.5s | 100% |
| 10 | Mistral Medium 3.1 | 100.0% | $0.0002 | 2.0s | 100% |
| 11 | Arcee AI: Trinity Large (Preview) | 100.0% | $0.0000 | 3.3s | 100% |
| 12 | Claude 3.5 Haiku | 100.0% | $0.0005 | 1.5s | 100% |
| 13 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.0005 | 1.5s | 100% |
| 14 | GPT-4o, Aug. 6th (temp=1) | 100.0% | $0.0005 | 1.3s | 100% |
| 15 | Hermes 3 70B | 100.0% | $0.0001 | 3.5s | 100% |
| 16 | Grok 4 Fast | 100.0% | $0.0002 | 2.9s | 100% |
| 17 | Claude Haiku 4.5 | 100.0% | $0.0005 | 1.8s | 100% |
| 18 | Qwen 2.5 72B | 100.0% | $0.0001 | 3.7s | 100% |
| 19 | Mistral Large 3 | 100.0% | $0.0003 | 3.1s | 100% |
| 20 | Gemma 3 27B | 100.0% | $0.0000 | 4.8s | 100% |
| 21 | Mistral Large 2 | 100.0% | $0.0006 | 2.6s | 100% |
| 22 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0003 | 4.2s | 100% |
| 23 | Grok 4.1 Fast | 100.0% | $0.0002 | 4.4s | 100% |
| 24 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0002 | 5.1s | 100% |
| 25 | DeepSeek-V2 Chat | 100.0% | $0.0000 | 5.8s | 100% |
| 26 | Mistral Large | 100.0% | $0.0008 | 2.5s | 100% |
| 27 | DeepSeek V3 (2025-03-24) | 100.0% | $0.0002 | 5.3s | 100% |
| 28 | Rocinante 12B | 100.0% | $0.0001 | 6.0s | 100% |
| 29 | GPT-4o, May 13th (temp=0) | 100.0% | $0.0011 | 1.9s | 100% |
| 30 | GPT-4o, May 13th (temp=1) | 100.0% | $0.0013 | 2.0s | 100% |
| 31 | WizardLM 2 8x22b | 100.0% | $0.0003 | 7.1s | 100% |
| 32 | Claude Sonnet 4.6 | 100.0% | $0.0016 | 2.8s | 100% |
| 33 | Claude 3.7 Sonnet | 100.0% | $0.0018 | 3.2s | 100% |
| 34 | Hermes 3 405B | 100.0% | $0.0000 | 12.2s | 100% |
| 35 | Minimax M2.5 | 100.0% | $0.0007 | 10.3s | 100% |
| 36 | Claude Opus 4.5 | 100.0% | $0.0026 | 3.6s | 100% |
| 37 | Claude Opus 4.6 | 100.0% | $0.0027 | 3.8s | 100% |
| 38 | Z.AI GLM 4.5 | 100.0% | $0.0011 | 12.9s | 100% |
| 39 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0020 | 12.7s | 100% |
| 40 | ByteDance Seed 1.6 | 100.0% | $0.0014 | 17.1s | 100% |
| 41 | Gemma 3 12B | 90.0% | $0.0000 | 2.8s | 40% |
| 42 | GPT-4.1 | 90.0% | $0.0006 | 1.9s | 40% |
| 43 | Llama 3.1 70B | 90.0% | $0.0001 | 8.3s | 40% |
| 44 | Claude Opus 4 | 100.0% | $0.0095 | 9.3s | 100% |
| 45 | DeepSeek V3.1 | 90.0% | $0.0001 | 8.5s | 40% |
| 46 | Z.AI GLM 4.6 | 100.0% | $0.0023 | 41.1s | 100% |
| 47 | Claude Sonnet 4.5 | 90.0% | $0.0015 | 3.2s | 40% |
| 48 | Writer: Palmyra X5 | 90.0% | $0.0008 | 8.3s | 40% |
| 49 | Claude 3.5 Sonnet | 90.0% | $0.0025 | 5.4s | 40% |
| 50 | GPT-5 Mini | 90.0% | $0.0013 | 10.3s | 40% |
| 51 | Grok 4 | 100.0% | $0.0095 | 16.8s | 100% |
| 52 | Llama 3.1 8B | 80.0% | $0.0000 | 1.2s | 20% |
| 53 | Z.AI GLM 4.7 Flash | 90.0% | $0.0004 | 19.9s | 40% |
| 54 | Arcee AI: Trinity Mini | 80.0% | $0.0001 | 3.0s | 20% |
| 55 | Llama 3.1 Nemotron 70B | 80.0% | $0.0001 | 8.3s | 20% |
| 56 | Claude Sonnet 4 | 80.0% | $0.0015 | 2.7s | 20% |
| 57 | Gemini 2.5 Flash Lite | 70.0% | $0.0000 | 568ms | 8% |
| 58 | Gemini 2.5 Flash | 70.0% | $0.0001 | 646ms | 8% |
| 59 | Gemini 3 Flash (Preview) | 70.0% | $0.0005 | 1.7s | 8% |
| 60 | Z.AI GLM 5 | 90.0% | $0.0032 | 26.3s | 40% |
| 61 | Ministral 8B | 60.0% | $0.0000 | 733ms | 2% |
| 62 | Gemma 3 4B | 60.0% | $0.0000 | 1.1s | 2% |
| 63 | GPT-4o Mini (temp=1) | 60.0% | $0.0000 | 1.0s | 2% |
| 64 | Claude 3 Haiku | 60.0% | $0.0001 | 1.3s | 2% |
| 65 | Qwen 3.5 397B A17B | 100.0% | $0.0071 | 59.1s | 100% |
| 66 | DeepSeek V3.2 | 60.0% | $0.0001 | 5.5s | 2% |
| 67 | GPT-5.2 | 60.0% | $0.0011 | 3.3s | 2% |
| 68 | GPT-5.1 | 60.0% | $0.0012 | 4.3s | 2% |
| 69 | GPT-4o Mini (temp=0) | 50.0% | $0.0000 | 1.2s | 0% |
| 70 | Gemini 2.5 Pro | 80.0% | $0.0068 | 6.9s | 20% |
| 71 | Cohere Command R+ (Aug. 2024) | 50.0% | $0.0008 | 1.7s | 0% |
| 72 | GPT-5 | 70.0% | $0.0039 | 9.3s | 8% |
| 73 | o4 Mini | 50.0% | $0.0012 | 4.3s | 0% |
| 74 | Gemini 3 Pro (Preview) | 90.0% | $0.012 | 9.7s | 40% |
| 75 | ByteDance Seed 1.6 Flash | 40.0% | $0.0002 | 5.1s | 0% |
| 76 | o4 Mini High | 50.0% | $0.0019 | 5.5s | 0% |
| 77 | Gemini 3.1 Pro (Preview) | 90.0% | $0.013 | 14.4s | 40% |
| 78 | Ministral 3B | 20.0% | $0.0000 | 591ms | 0% |
| 79 | GPT-5 Nano | 50.0% | $0.0007 | 22.4s | 0% |
| 80 | Z.AI GLM 4.7 | 60.0% | $0.0023 | 29.0s | 2% |

Average score: 86.87%

Individual Scenarios

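| Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg |
|---|---|---|---|---|---|---|
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 80.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| GPT-5 | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 0 | 0 | 60.0% |
| GPT-5.1 | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Z.AI GLM 4.7 | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Gemini 2.5 Flash | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 0 | 0 | 0 | 40.0% |
| GPT-5.2 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| DeepSeek V3.2 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-4o Mini (temp=1) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Claude 3 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Gemma 3 4B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0.0% |
| o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0% |
| ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Cohere Command R+ (Aug. 2024) | 0 | 0 | 0 | 0 | 0 | 0.0% |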
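
| Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Ministral 8B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0% |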