Contains a list of texts

Test: Data extraction

Avg. Score: 98.3%
Scenarios: 2

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Gemini 2.5 Flash Lite | 100.0% | $0.0000 | 383ms | 100%
2 | Mistral Small Creative | 100.0% | $0.0000 | 428ms | 100%
3 | Ministral 3 8B | 100.0% | $0.0000 | 452ms | 100%
4 | Gemma 3 4B | 100.0% | $0.0000 | 537ms | 100%
5 | Ministral 3 14B | 100.0% | $0.0000 | 609ms | 100%
6 | Ministral 3 3B | 100.0% | $0.0000 | 694ms | 100%
7 | Mistral Small 4 | 100.0% | $0.0000 | 601ms | 100%
8 | Gemini 3.1 Flash Lite | 100.0% | $0.0001 | 705ms | 100%
9 | Gemini 3.1 Flash Lite (Preview) | 100.0% | $0.0001 | 732ms | 100%
10 | Gemma 3 12B | 100.0% | $0.0000 | 1.0s | 100%
11 | Mistral Large 3 | 100.0% | $0.0001 | 723ms | 100%
12 | Mistral Medium 3.1 | 100.0% | $0.0001 | 761ms | 100%
13 | GPT-5.4 Nano | 100.0% | $0.0000 | 1.0s | 100%
14 | Inception Mercury 2 | 100.0% | $0.0002 | 422ms | 100%
15 | Gemma 3 27B | 100.0% | $0.0000 | 1.3s | 100%
16 | Mistral Small 3.2 24B | 100.0% | $0.0000 | 1.3s | 100%
17 | Llama 3.1 70B | 100.0% | $0.0001 | 721ms | 100%
18 | Gemini 3.1 Flash Lite (Reasoning) | 100.0% | $0.0001 | 1.1s | 100%
19 | Qwen 2.5 72B | 100.0% | $0.0000 | 1.2s | 100%
20 | Gemma 4 26B | 100.0% | $0.0000 | 1.4s | 100%
21 | Gemini 3 Flash (Preview) | 100.0% | $0.0001 | 877ms | 100%
22 | Grok 4.20 | 100.0% | $0.0002 | 604ms | 100%
23 | GPT-5.4 Mini | 100.0% | $0.0002 | 698ms | 100%
24 | GPT-4.1 Mini | 100.0% | $0.0001 | 1.4s | 100%
25 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0001 | 1.5s | 100%
26 | Llama 3.1 Nemotron 70B | 100.0% | $0.0000 | 1.7s | 100%
27 | Qwen3 235B A22B Instruct 2507 | 100.0% | $0.0000 | 1.9s | 100%
28 | LFM2 24B | 100.0% | $0.0000 | 2.0s | 100%
29 | Hermes 3 70B | 100.0% | $0.0001 | 1.8s | 100%
30 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0001 | 1.9s | 100%
31 | GPT-5.4 Nano (Reasoning, Low) | 100.0% | $0.0001 | 2.0s | 100%
32 | Stealth: Aurora Alpha | 100.0% |  | 2.0s | 100%
33 | DeepSeek V3.1 | 100.0% | $0.0000 | 2.3s | 100%
34 | DeepSeek V4 Flash | 100.0% | $0.0000 | 2.5s | 100%
35 | DeepSeek-V2 Chat | 100.0% | $0.0000 | 2.5s | 100%
36 | Z.AI GLM 4.5 | 100.0% | $0.0001 | 2.4s | 100%
37 | Gemini 2.5 Flash Lite (Reasoning) | 100.0% | $0.0001 | 2.0s | 100%
38 | Claude Haiku 4.5 | 100.0% | $0.0002 | 1.5s | 100%
39 | Grok 4 Fast | 100.0% | $0.0002 | 2.0s | 100%
40 | Gemini 3.5 Flash (Reasoning, Minimal) | 100.0% | $0.0003 | 980ms | 100%
41 | DeepSeek V4 Flash (Reasoning) | 100.0% | $0.0000 | 3.0s | 100%
42 | Mistral NeMO | 100.0% | $0.0000 | 3.1s | 100%
43 | Mistral Large 2 | 100.0% | $0.0004 | 997ms | 100%
44 | Arcee AI: Trinity Mini | 100.0% | $0.0001 | 3.0s | 100%
45 | GPT-4.1 | 100.0% | $0.0003 | 1.4s | 100%
46 | DeepSeek V3 (2025-03-24) | 100.0% | $0.0001 | 3.1s | 100%
47 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.0004 | 1.1s | 100%
48 | GPT-4o, Aug. 6th (temp=1) | 100.0% | $0.0004 | 1.1s | 100%
49 | Mistral Small 4 (Reasoning) | 100.0% | $0.0002 | 2.5s | 100%
50 | Grok 4.1 Fast | 100.0% | $0.0002 | 2.8s | 100%
51 | Nemotron 3 Super | 100.0% | $0.0000 | 4.3s | 100%
52 | GPT-5.4 | 100.0% | $0.0006 | 876ms | 100%
53 | ByteDance Seed 1.6 Flash | 100.0% | $0.0002 | 3.7s | 100%
54 | GPT-5.4 Mini (Reasoning, Low) | 100.0% | $0.0003 | 3.0s | 100%
55 | Claude 3 Haiku | 100.0% | $0.0001 | 5.1s | 100%
56 | Gemma 4 31B | 100.0% | $0.0000 | 5.5s | 100%
57 | WizardLM 2 8x22b | 100.0% | $0.0001 | 5.4s | 100%
58 | Claude Sonnet 4 | 100.0% | $0.0007 | 1.8s | 100%
59 | Claude Sonnet 4.5 | 100.0% | $0.0007 | 2.1s | 100%
60 | Claude 3.7 Sonnet | 100.0% | $0.0008 | 2.0s | 100%
61 | DeepSeek V3.2 | 100.0% | $0.0001 | 6.5s | 100%
62 | GPT-5.4 Mini (Reasoning) | 100.0% | $0.0006 | 3.3s | 100%
63 | GPT-OSS 120B | 100.0% | $0.0001 | 6.7s | 100%
64 | GPT-5.5 | 100.0% | $0.0012 | 1.1s | 100%
65 | GPT-5 Mini | 100.0% | $0.0005 | 5.3s | 100%
66 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.0010 | 2.3s | 100%
67 | Stealth: Hunter Alpha | 100.0% | $0.0000 | 9.0s | 100%
68 | ByteDance Seed 2.0 Lite | 100.0% | $0.0005 | 6.0s | 100%
69 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.0010 | 2.9s | 100%
70 | Nemotron 3 Nano | 100.0% | $0.0001 | 8.8s | 100%
71 | Claude Sonnet 4.6 | 100.0% | $0.0013 | 1.6s | 100%
72 | Qwen 3 32B | 100.0% | $0.0002 | 8.5s | 100%
73 | Writer: Palmyra X5 | 100.0% | $0.0003 | 7.9s | 100%
74 | Claude Opus 4.6 | 100.0% | $0.0012 | 2.4s | 100%
75 | GPT-5.2 | 100.0% | $0.0011 | 3.1s | 100%
76 | GPT-4o, May 13th (temp=1) | 100.0% | $0.0007 | 5.7s | 100%
77 | GPT-4o, May 13th (temp=0) | 100.0% | $0.0007 | 5.7s | 100%
78 | ByteDance Seed 2.0 Mini | 100.0% | $0.0001 | 9.5s | 100%
79 | GPT-5.1 | 100.0% | $0.0011 | 3.9s | 100%
80 | o4 Mini | 100.0% | $0.0012 | 3.4s | 100%
81 | DeepSeek V4 Pro | 100.0% | $0.0001 | 10.1s | 100%
82 | Claude Opus 4.7 | 100.0% | $0.0016 | 1.1s | 100%
83 | Claude Opus 4.5 | 100.0% | $0.0015 | 2.3s | 100%
84 | o4 Mini High | 100.0% | $0.0013 | 3.7s | 100%
85 | Claude Opus 4.7 (Reasoning) | 100.0% | $0.0016 | 1.4s | 100%
86 | Claude 3.5 Sonnet | 100.0% | $0.0011 | 5.1s | 100%
87 | ByteDance Seed 1.6 | 100.0% | $0.0006 | 8.0s | 100%
88 | Xiaomi MIMO v2.5 | 100.0% | $0.0010 | 5.4s | 100%
89 | GPT-5 Nano | 100.0% | $0.0003 | 10.6s | 100%
90 | GPT-4o Mini (temp=0) | 100.0% | $0.0000 | 12.9s | 100%
91 | Qwen 3.5 9B | 100.0% | $0.0001 | 12.7s | 100%
92 | Qwen 3.5 Flash | 100.0% | $0.0004 | 11.2s | 100%
93 | MiniMax M2.7 | 100.0% | $0.0008 | 9.2s | 100%
94 | GPT-5.4 (Reasoning) | 100.0% | $0.0016 | 3.8s | 100%
95 | MiniMax M2.5 | 100.0% | $0.0008 | 9.4s | 100%
96 | Gemma 4 26B (Reasoning) | 100.0% | $0.0001 | 13.9s | 100%
97 | Mistral Large | 100.0% | $0.0015 | 5.7s | 100%
98 | Qwen 3.6 Flash | 100.0% | $0.0016 | 5.2s | 100%
99 | Grok 4.20 (Reasoning) | 100.0% | $0.0016 | 6.2s | 100%
100 | Xiaomi MIMO v2.5 Pro | 100.0% | $0.0014 | 7.6s | 100%
101 | Aion 2.0 | 100.0% | $0.0008 | 11.7s | 100%
102 | GPT-5.5 (Reasoning, Low) | 100.0% | $0.0023 | 3.1s | 100%
103 | Gemma 4 31B (Reasoning) | 100.0% | $0.0001 | 17.8s | 100%
104 | Qwen 3.6 35B | 100.0% | $0.0016 | 8.6s | 100%
105 | Qwen 3.5 35B | 100.0% | $0.0020 | 6.6s | 100%
106 | Qwen 3.5 27B | 100.0% | $0.0017 | 8.7s | 100%
107 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0014 | 11.4s | 100%
108 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.0029 | 3.2s | 100%
109 | Qwen 3.5 122B | 100.0% | $0.0026 | 6.3s | 100%
110 | MoonshotAI: Kimi K2.6 | 100.0% | $0.0015 | 13.0s | 100%
111 | GPT-5.5 (Reasoning) | 100.0% | $0.0030 | 4.2s | 100%
112 | Grok 4.20 (Beta, Reasoning) | 100.0% | $0.0033 | 2.1s | 100%
113 | Z.AI GLM 5.1 | 100.0% | $0.0020 | 12.1s | 100%
114 | Z.AI GLM 4.7 | 100.0% | $0.0011 | 19.2s | 100%
115 | Grok 4.3 (Reasoning) | 100.0% | $0.0019 | 16.0s | 100%
116 | Llama 3.1 8B | 95.0% | $0.0000 | 508ms | 56%
117 | Inception Mercury | 95.0% | $0.0000 | 522ms | 56%
118 | Z.AI GLM 5 | 100.0% | $0.0018 | 17.3s | 100%
119 | Grok 4.20 (Beta) | 95.0% | $0.0002 | 472ms | 56%
120 | Grok 4.3 | 95.0% | $0.0002 | 758ms | 56%
121 | GPT-5.4 Nano (Reasoning) | 95.0% | $0.0001 | 2.5s | 56%
122 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.0043 | 3.3s | 100%
123 | Rocinante 12B | 95.0% | $0.0000 | 3.8s | 56%
124 | Stealth: Healer Alpha | 95.0% | $0.0000 | 5.4s | 56%
125 | Gemini 3.5 Flash (Reasoning) | 100.0% | $0.0048 | 2.7s | 100%
126 | Z.AI GLM 4.6 | 100.0% | $0.0016 | 23.6s | 100%
127 | Qwen 3.5 397B A17B | 100.0% | $0.0025 | 19.5s | 100%
128 | Qwen 3.5 Plus (2026-04-20) | 100.0% | $0.0028 | 18.1s | 100%
129 | Z.AI GLM 4.7 Flash | 95.0% | $0.0002 | 10.5s | 56%
130 | GPT-4o Mini (temp=1) | 100.0% | $0.0000 | 40.9s | 100%
131 | Gemini 3.1 Pro (Preview) | 100.0% | $0.0057 | 8.2s | 100%
132 | Claude Opus 4 | 100.0% | $0.0063 | 7.5s | 100%
133 | Hermes 3 405B | 90.0% | $0.0000 | 7.6s | 40%
134 | Grok 4 | 100.0% | $0.0063 | 9.2s | 100%
135 | Gemini 3 Pro (Preview) | 100.0% | $0.0068 | 6.5s | 100%
136 | Gemini 2.5 Flash (Reasoning) | 90.0% | $0.0010 | 2.6s | 40%
137 | DeepSeek V4 Pro (Reasoning) | 100.0% | $0.0031 | 32.0s | 100%
138 | Ministral 3B | 85.0% | $0.0000 | 362ms | 29%
139 | Ministral 8B | 85.0% | $0.0000 | 462ms | 29%
140 | Qwen3.7 Max | 100.0% | $0.0063 | 12.8s | 100%
141 | Arcee AI: Trinity Large (Preview) | 85.0% | $0.0000 | 1.3s | 29%
142 | Qwen3.6 Max Preview | 100.0% | $0.0054 | 18.9s | 100%
143 | Z.AI GLM 5 Turbo | 90.0% | $0.0014 | 5.1s | 40%
144 | Qwen 3.6 27B | 95.0% | $0.0030 | 14.3s | 56%
145 | GPT-5 | 95.0% | $0.0042 | 9.5s | 56%
146 | Z.AI GLM 4.5 Air | 100.0% | $0.0026 | 51.6s | 100%
147 | Cohere Command R+ (Aug. 2024) | 70.0% | $0.0005 | 930ms | 8%
148 | Gemini 2.5 Pro | 95.0% | $0.0081 | 7.1s | 56%
149 | Gemini 2.5 Flash | 50.0% | $0.0001 | 510ms | 0%
150 | GPT-4.1 Nano | 50.0% | $0.0000 | 1.4s | 0%
Average score across all models: 98.27%

Individual Scenarios

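Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%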
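
Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%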
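
The page does not state how the Overall Performance score is derived. The figures are consistent with it being the mean of a model's two scenario averages, and with the 98.27% figure being the mean score across all 150 ranked models. The Python sketch below checks that arithmetic under those assumptions. The function names are illustrative, and the values are copied or tallied from the tables above.

```python
from statistics import mean

# Per-scenario averages (percent) copied from the two scenario tables.
# Assumption: the first table is one scenario and the second is the other.
scenario_avgs = {
    "Ministral 8B": (80.0, 90.0),
    "GPT-5": (100.0, 90.0),
    "Cohere Command R+ (Aug. 2024)": (100.0, 40.0),
    "Gemini 2.5 Flash": (100.0, 0.0),
}


def overall_score(avgs):
    """Mean of the per-scenario averages (assumed scoring rule)."""
    return mean(avgs)


overall = {model: overall_score(a) for model, a in scenario_avgs.items()}

# Overall score -> number of models with that score, tallied from the
# Overall Performance table (150 models in total).
score_counts = {100.0: 130, 95.0: 11, 90.0: 3, 85.0: 3, 70.0: 1, 50.0: 2}


def leaderboard_average(counts):
    """Weighted mean of the overall scores across all ranked models."""
    total_models = sum(counts.values())
    return sum(score * n for score, n in counts.items()) / total_models


if __name__ == "__main__":
    for model, score in overall.items():
        print(f"{model}: {score:.1f}%")
    print(f"Leaderboard average: {leaderboard_average(score_counts):.2f}%")
```

Under these assumptions the sketch reproduces the listed overall scores for the four sample models (85.0%, 95.0%, 70.0%, 50.0%) and the 98.27% average. How the Stability column is computed is not recoverable from the page and is not modelled here.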