Matches Regex

Test: Language Comprehension

Avg. Score: 78.5%
Scenarios: 2

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Ministral 3 3B | 100.0% | $0.0000 | 811ms | 100%
2 | Gemini 3.1 Flash Lite (Preview) | 100.0% | $0.0002 | 975ms | 100%
3 | Gemini 3.1 Flash Lite (Reasoning) | 100.0% | $0.0002 | 2.3s | 100%
4 | Mistral Large 3 | 100.0% | $0.0001 | 4.2s | 100%
5 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0002 | 4.5s | 100%
6 | Gemini 3.5 Flash (Reasoning, Minimal) | 100.0% | $0.0014 | 1.3s | 100%
7 | GPT-5.4 Mini (Reasoning, Low) | 100.0% | $0.0009 | 3.6s | 100%
8 | DeepSeek V3.1 | 100.0% | $0.0002 | 5.6s | 100%
9 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0003 | 5.4s | 100%
10 | GPT-5.5 | 100.0% | $0.0019 | 1.7s | 100%
11 | DeepSeek-V2 Chat | 100.0% | $0.0001 | 7.5s | 100%
12 | Mistral Large 2 | 100.0% | $0.0012 | 4.9s | 100%
13 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.0017 | 4.0s | 100%
14 | GPT-4o, May 13th (temp=0) | 100.0% | $0.0022 | 2.9s | 100%
15 | Gemma 4 26B | 100.0% | $0.0000 | 9.2s | 100%
16 | Claude Sonnet 4 | 100.0% | $0.0025 | 3.4s | 100%
17 | DeepSeek V3 (2025-03-24) | 100.0% | $0.0001 | 9.8s | 100%
18 | Claude Sonnet 4.6 | 100.0% | $0.0026 | 3.3s | 100%
19 | Gemma 4 26B (Reasoning) | 100.0% | $0.0002 | 9.7s | 100%
20 | Claude Sonnet 4.5 | 100.0% | $0.0028 | 3.7s | 100%
21 | Hermes 3 405B | 100.0% | $0.0000 | 11.9s | 100%
22 | GPT-5.4 Mini (Reasoning) | 100.0% | $0.0019 | 7.0s | 100%
23 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.0030 | 4.2s | 100%
24 | DeepSeek V3.2 | 100.0% | $0.0002 | 12.8s | 100%
25 | Claude Opus 4.5 | 100.0% | $0.0043 | 3.7s | 100%
26 | Z.AI GLM 5 Turbo | 100.0% | $0.0026 | 8.9s | 100%
27 | Qwen 3.6 35B | 100.0% | $0.0021 | 11.1s | 100%
28 | Claude Opus 4.6 | 100.0% | $0.0047 | 4.9s | 100%
29 | ByteDance Seed 1.6 | 100.0% | $0.0012 | 14.4s | 100%
30 | Claude Opus 4.7 (Reasoning) | 100.0% | $0.0057 | 2.9s | 100%
31 | GPT-5.5 (Reasoning, Low) | 100.0% | $0.0049 | 5.4s | 100%
32 | Gemma 4 31B (Reasoning) | 100.0% | $0.0002 | 19.2s | 100%
33 | Claude Opus 4.7 | 100.0% | $0.0063 | 3.4s | 100%
34 | Z.AI GLM 4.5 Air | 100.0% | $0.0009 | 17.6s | 100%
35 | Aion 2.0 | 100.0% | $0.0012 | 17.8s | 100%
36 | GPT-5.4 (Reasoning) | 100.0% | $0.0053 | 9.0s | 100%
37 | Z.AI GLM 4.7 Flash | 100.0% | $0.0005 | 23.4s | 100%
38 | Gemini 3.5 Flash (Reasoning) | 100.0% | $0.0079 | 4.3s | 100%
39 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.0073 | 9.8s | 100%
40 | Gemini 2.5 Pro | 100.0% | $0.0084 | 8.1s | 100%
41 | Qwen 3.5 122B | 100.0% | $0.0060 | 16.0s | 100%
42 | GPT-5.5 (Reasoning) | 100.0% | $0.010 | 7.9s | 100%
43 | Claude Opus 4 | 100.0% | $0.0100 | 11.3s | 100%
44 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.011 | 8.8s | 100%
45 | Z.AI GLM 4.6 | 100.0% | $0.0021 | 32.1s | 100%
46 | Qwen 3.5 35B | 100.0% | $0.0065 | 21.4s | 100%
47 | ByteDance Seed 2.0 Lite | 100.0% | $0.0028 | 31.7s | 100%
48 | Gemini 3 Pro (Preview) | 100.0% | $0.012 | 8.8s | 100%
49 | Qwen 3.5 27B | 100.0% | $0.0059 | 25.2s | 100%
50 | Qwen 3.5 Plus (2026-04-20) | 100.0% | $0.0041 | 30.3s | 100%
51 | Gemini 3.1 Pro (Preview) | 100.0% | $0.011 | 12.6s | 100%
52 | Qwen 3.6 27B | 100.0% | $0.0052 | 32.6s | 100%
53 | MiniMax M2.7 | 100.0% | $0.0027 | 41.7s | 100%
54 | Qwen 3.5 397B A17B | 100.0% | $0.0048 | 37.0s | 100%
55 | Grok 4.20 (Beta, Reasoning) | 100.0% | $0.016 | 10.5s | 100%
56 | Grok 4.20 (Reasoning) | 100.0% | $0.0082 | 33.9s | 100%
57 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0058 | 43.3s | 100%
58 | Z.AI GLM 4.7 | 100.0% | $0.0024 | 54.2s | 100%
59 | Qwen 3.5 9B | 100.0% | $0.0006 | 1.0m | 100%
60 | Z.AI GLM 5 | 100.0% | $0.0040 | 52.7s | 100%
61 | Grok 4.3 (Reasoning) | 100.0% | $0.0081 | 51.0s | 100%
62 | Qwen3.6 Max Preview | 100.0% | $0.013 | 50.7s | 100%
63 | Qwen3.7 Max | 100.0% | $0.019 | 46.9s | 100%
64 | DeepSeek V4 Flash (Reasoning) | 100.0% | $0.0003 | 2.0m | 100%
65 | ByteDance Seed 2.0 Mini | 100.0% | $0.0020 | 2.2m | 100%
66 | MoonshotAI: Kimi K2.6 | 100.0% | $0.013 | 1.8m | 100%
67 | Mistral NeMO | 90.0% | $0.0000 | 380ms | 40%
68 | Gemini 2.5 Flash Lite | 90.0% | $0.0001 | 728ms | 40%
69 | Gemini 3.1 Flash Lite | 90.0% | $0.0002 | 977ms | 40%
70 | Gemini 3 Flash (Preview) | 90.0% | $0.0003 | 1.2s | 40%
71 | Gemma 4 31B | 90.0% | $0.0000 | 3.1s | 40%
72 | Z.AI GLM 4.5 | 90.0% | $0.0001 | 3.1s | 40%
73 | Claude Haiku 4.5 | 90.0% | $0.0009 | 2.1s | 40%
74 | WizardLM 2 8x22b | 90.0% | $0.0002 | 5.0s | 40%
75 | GPT-4.1 | 90.0% | $0.0013 | 2.7s | 40%
76 | GPT-4o, May 13th (temp=1) | 90.0% | $0.0020 | 2.8s | 40%
77 | Stealth: Hunter Alpha | 90.0% | $0.0000 | 10.9s | 40%
78 | GPT-5.2 | 90.0% | $0.0026 | 5.0s | 40%
79 | GPT-5 Mini | 90.0% | $0.0015 | 12.5s | 40%
80 | GPT-OSS 120B | 90.0% | $0.0003 | 16.3s | 40%
81 | DeepSeek V4 Flash | 90.0% | $0.0000 | 18.9s | 40%
82 | GPT-5.1 | 90.0% | $0.0045 | 9.8s | 40%
83 | Qwen 3.5 Flash | 90.0% | $0.0014 | 25.5s | 40%
84 | MiniMax M2.5 | 90.0% | $0.0019 | 32.8s | 40%
85 | Z.AI GLM 5.1 | 90.0% | $0.0041 | 42.3s | 40%
86 | GPT-5 | 90.0% | $0.016 | 28.4s | 40%
87 | Grok 4.20 (Beta) | 80.0% | $0.0004 | 649ms | 20%
88 | Gemini 2.5 Flash | 80.0% | $0.0003 | 987ms | 20%
89 | Gemma 3 4B | 80.0% | $0.0000 | 2.2s | 20%
90 | GPT-5.4 | 80.0% | $0.0018 | 2.1s | 20%
91 | Mistral Large | 80.0% | $0.0012 | 3.7s | 20%
92 | Claude 3.7 Sonnet | 80.0% | $0.0024 | 3.4s | 20%
93 | Stealth: Healer Alpha | 80.0% | $0.0000 | 13.0s | 20%
94 | Qwen 3 32B | 80.0% | $0.0003 | 13.0s | 20%
95 | Xiaomi MIMO v2.5 Pro | 80.0% | $0.0024 | 13.4s | 20%
96 | Gemini 2.5 Flash Lite (Reasoning) | 80.0% | $0.0009 | 20.0s | 20%
97 | Grok 4.1 Fast | 80.0% | $0.0013 | 23.5s | 20%
98 | Gemini 2.5 Flash (Reasoning) | 80.0% | $0.0062 | 11.7s | 20%
99 | Nemotron 3 Nano | 80.0% | $0.0006 | 40.6s | 20%
100 | GPT-5.4 Mini | 70.0% | $0.0002 | 749ms | 8%
101 | Grok 4.3 | 70.0% | $0.0003 | 936ms | 8%
102 | Claude 3 Haiku | 70.0% | $0.0001 | 1.3s | 8%
103 | Grok 4 Fast | 70.0% | $0.0004 | 5.7s | 8%
104 | Cohere Command R+ (Aug. 2024) | 70.0% | $0.0028 | 5.5s | 8%
105 | o4 Mini | 70.0% | $0.0026 | 7.4s | 8%
106 | Qwen 3.6 Flash | 70.0% | $0.0024 | 8.1s | 8%
107 | o4 Mini High | 70.0% | $0.0046 | 13.2s | 8%
108 | Arcee AI: Trinity Large (Preview) | 60.0% | $0.0000 | 1.7s | 2%
109 | Inception Mercury 2 | 60.0% | $0.0004 | 879ms | 2%
110 | GPT-4.1 Mini | 60.0% | $0.0004 | 2.8s | 2%
111 | Llama 3.1 70B | 60.0% | $0.0002 | 3.4s | 2%
112 | GPT-5.4 Nano (Reasoning, Low) | 60.0% | $0.0003 | 3.3s | 2%
113 | GPT-5.4 Nano (Reasoning) | 60.0% | $0.0004 | 5.3s | 2%
114 | Xiaomi MIMO v2.5 | 60.0% | $0.0010 | 5.2s | 2%
115 | Claude 3.5 Sonnet | 60.0% | $0.0019 | 3.6s | 2%
116 | Qwen3 235B A22B Instruct 2507 | 60.0% | $0.0002 | 14.2s | 2%
117 | GPT-5 Nano | 60.0% | $0.0006 | 17.5s | 2%
118 | Ministral 8B | 50.0% | $0.0000 | 347ms | 0%
119 | GPT-4o Mini (temp=1) | 50.0% | $0.0000 | 862ms | 0%
120 | GPT-4o Mini (temp=0) | 50.0% | $0.0000 | 877ms | 0%
121 | Mistral Small 3.2 24B | 50.0% | $0.0000 | 2.4s | 0%
122 | Hermes 3 70B | 50.0% | $0.0001 | 3.2s | 0%
123 | Gemma 3 12B | 50.0% | $0.0000 | 4.9s | 0%
124 | Rocinante 12B | 50.0% | $0.0001 | 5.5s | 0%
125 | Writer: Palmyra X5 | 50.0% | $0.0017 | 7.8s | 0%
126 | DeepSeek V4 Pro | 50.0% | $0.0003 | 13.0s | 0%
127 | GPT-5.4 Nano | 40.0% | $0.0001 | 1.1s | 0%
128 | Grok 4.20 | 40.0% | $0.0002 | 867ms | 0%
129 | Qwen 2.5 72B | 40.0% | $0.0001 | 3.9s | 0%
130 | ByteDance Seed 1.6 Flash | 40.0% | $0.0002 | 5.1s | 0%
131 | Stealth: Aurora Alpha | 70.0% |  | 1.4s | 8%
132 | Llama 3.1 Nemotron 70B | 40.0% | $0.0002 | 13.5s | 0%
133 | Ministral 3B | 30.0% | $0.0000 | 357ms | 0%
134 | Inception Mercury | 30.0% | $0.0000 | 642ms | 0%
135 | Llama 3.1 8B | 30.0% | $0.0000 | 1.4s | 0%
136 | GPT-4.1 Nano | 30.0% | $0.0000 | 1.6s | 0%
137 | GPT-4o, Aug. 6th (temp=1) | 30.0% | $0.0007 | 1.3s | 0%
138 | LFM2 24B | 30.0% | $0.0000 | 4.0s | 0%
139 | Arcee AI: Trinity Mini | 40.0% | $0.0001 | 32.5s | 0%
140 | Mistral Small 4 (Reasoning) | 30.0% | $0.0006 | 6.3s | 0%
141 | Gemma 3 27B | 30.0% | $0.0001 | 8.2s | 0%
142 | Nemotron 3 Super | 30.0% | $0.0000 | 12.9s | 0%
143 | Mistral Small 4 | 10.0% | $0.0001 | 1.2s | 0%
144 | Ministral 3 8B | 0.0% | $0.0000 | 711ms | 0%
145 | Ministral 3 14B | 0.0% | $0.0000 | 912ms | 0%
146 | Mistral Small Creative | 0.0% | $0.0001 | 1.1s | 0%
147 | GPT-4o, Aug. 6th (temp=0) | 0.0% | $0.0007 | 1.9s | 0%
148 | Mistral Medium 3.1 | 0.0% | $0.0004 | 3.6s | 0%
149 | DeepSeek V4 Pro (Reasoning) | 70.0% | $0.0097 | 4.8m | 8%
150 | Grok 4 | 70.0% | $0.110 | 3.2m | 8%
Average score: 78.53%

Individual Scenarios

Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 0 | 80.0%
o4 Mini High | 100 | 100 | 100 | 100 | 0 | 80.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 0 | 80.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 0 | 80.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 0 | 80.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4 | 100 | 100 | 100 | 0 | 0 | 60.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 0 | 0 | 60.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-5.4 | 100 | 100 | 100 | 0 | 0 | 60.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 0 | 0 | 60.0%
Inception Mercury 2 | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-5.4 Mini | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Large | 100 | 100 | 100 | 0 | 0 | 60.0%
Nemotron 3 Nano | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemma 3 4B | 100 | 100 | 100 | 0 | 0 | 60.0%
Llama 3.1 8B | 100 | 100 | 100 | 0 | 0 | 60.0%
Qwen 3.6 Flash | 100 | 100 | 0 | 0 | 0 | 40.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 0 | 0 | 0 | 40.0%
o4 Mini | 100 | 100 | 0 | 0 | 0 | 40.0%
Nemotron 3 Super | 100 | 100 | 0 | 0 | 0 | 40.0%
Grok 4.20 | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-5.4 Nano | 100 | 100 | 0 | 0 | 0 | 40.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 0 | 0 | 0 | 40.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4.1 Nano | 100 | 100 | 0 | 0 | 0 | 40.0%
Claude 3 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0%
Arcee AI: Trinity Mini | 100 | 100 | 0 | 0 | 0 | 40.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 0 | 0 | 0 | 40.0%
Ministral 8B | 100 | 100 | 0 | 0 | 0 | 40.0%
Ministral 3B | 100 | 100 | 0 | 0 | 0 | 40.0%
Rocinante 12B | 100 | 100 | 0 | 0 | 0 | 40.0%
Claude 3.5 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 0 | 0 | 0 | 0 | 20.0%
Inception Mercury | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
LFM2 24B | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V4 Pro | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4 | 100 | 100 | 100 | 100 | 0 | 80.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 0 | 80.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
o4 Mini High | 100 | 100 | 100 | 0 | 0 | 60.0%
Grok 4.1 Fast | 100 | 100 | 100 | 0 | 0 | 60.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 0 | 0 | 60.0%
Inception Mercury 2 | 100 | 100 | 100 | 0 | 0 | 60.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 0 | 0 | 60.0%
Grok 4.3 | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemma 3 27B | 100 | 100 | 100 | 0 | 0 | 60.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
Ministral 8B | 100 | 100 | 100 | 0 | 0 | 60.0%
Rocinante 12B | 100 | 100 | 100 | 0 | 0 | 60.0%
Grok 4 Fast | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4.1 Mini | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-5 Nano | 100 | 100 | 0 | 0 | 0 | 40.0%
Grok 4.20 | 100 | 100 | 0 | 0 | 0 | 40.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 0 | 0 | 0 | 40.0%
Inception Mercury | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-5.4 Nano | 100 | 100 | 0 | 0 | 0 | 40.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 0 | 0 | 0 | 40.0%
Arcee AI: Trinity Mini | 100 | 100 | 0 | 0 | 0 | 40.0%
LFM2 24B | 100 | 100 | 0 | 0 | 0 | 40.0%
Nemotron 3 Super | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-5.4 Nano (Reasoning) | 100 | 0 | 0 | 0 | 0 | 20.0%
Writer: Palmyra X5 | 100 | 0 | 0 | 0 | 0 | 20.0%
Mistral Small 4 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4.1 Nano | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0.0%