Matches Regex

Test: Voice/dialogue sheets

Avg. Score: 60.5%
Scenarios: 5

Overall Performance

| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Sonnet 4.6 | 100.0% | $0.0033 | 1.9s | 100% |
| 2 | Claude Sonnet 4 | 100.0% | $0.0033 | 2.8s | 100% |
| 3 | Claude 3.7 Sonnet | 100.0% | $0.0034 | 3.5s | 100% |
| 4 | Claude Opus 4.6 | 100.0% | $0.0055 | 3.9s | 100% |
| 5 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.0060 | 3.1s | 100% |
| 6 | GPT-4.1 | 96.0% | $0.0017 | 2.6s | 61% |
| 7 | GPT-4o, May 13th (temp=0) | 96.0% | $0.0037 | 5.2s | 61% |
| 8 | GPT-4o, Aug. 6th (temp=0) | 92.0% | $0.0022 | 2.4s | 46% |
| 9 | Gemini 2.5 Flash | 88.0% | $0.0005 | 933ms | 35% |
| 10 | ByteDance Seed 1.6 | 94.0% | $0.0013 | 14.2s | 53% |
| 11 | Qwen 3.5 Plus (2026-02-15) | 86.0% | $0.0005 | 6.6s | 31% |
| 12 | Grok 4 Fast | 84.0% | $0.0003 | 3.6s | 27% |
| 13 | Grok 4.1 Fast | 84.0% | $0.0004 | 4.4s | 27% |
| 14 | GPT-4o, May 13th (temp=1) | 90.0% | $0.0037 | 5.2s | 40% |
| 15 | DeepSeek V3 (2025-03-24) | 84.0% | $0.0003 | 6.1s | 27% |
| 16 | Claude Haiku 4.5 | 82.0% | $0.0011 | 1.6s | 23% |
| 17 | Gemini 3 Flash (Preview) | 80.0% | $0.0006 | 1.8s | 20% |
| 18 | Claude Sonnet 4.6 (Reasoning) | 90.0% | $0.0053 | 3.3s | 40% |
| 19 | Mistral Large 3 | 80.0% | $0.0004 | 4.1s | 20% |
| 20 | GPT-4o, Aug. 6th (temp=1) | 82.0% | $0.0022 | 2.3s | 23% |
| 21 | DeepSeek-V2 Chat | 80.0% | $0.0001 | 8.2s | 20% |
| 22 | Claude Sonnet 4.5 | 84.0% | $0.0033 | 2.8s | 27% |
| 23 | GPT-4.1 Mini | 76.0% | $0.0003 | 2.7s | 15% |
| 24 | Stealth: Healer Alpha | 76.0% | $0.0000 | 5.0s | 15% |
| 25 | Z.AI GLM 4.5 | 76.0% | $0.0004 | 5.3s | 15% |
| 26 | Mistral Small 3.2 24B | 72.0% | $0.0001 | 2.1s | 10% |
| 27 | DeepSeek V3 (2024-12-26) | 74.0% | $0.0003 | 4.6s | 12% |
| 28 | Llama 3.1 70B | 72.0% | $0.0005 | 1.5s | 10% |
| 29 | Writer: Palmyra X5 | 78.0% | $0.0010 | 8.1s | 17% |
| 30 | DeepSeek V3.1 | 76.0% | $0.0002 | 8.4s | 15% |
| 31 | Gemini 3.1 Flash Lite (Preview) | 70.0% | $0.0003 | 979ms | 8% |
| 32 | Qwen3 235B A22B Instruct 2507 | 72.0% | $0.0001 | 4.9s | 10% |
| 33 | Claude 3.5 Sonnet | 80.0% | $0.0034 | 4.5s | 20% |
| 34 | Gemini 2.5 Flash Lite (Reasoning) | 70.0% | $0.0003 | 2.7s | 8% |
| 35 | Gemini 2.5 Flash Lite | 66.0% | $0.0001 | 610ms | 5% |
| 36 | Z.AI GLM 5 Turbo | 78.0% | $0.0030 | 7.7s | 17% |
| 37 | Hermes 3 70B | 68.0% | $0.0002 | 6.0s | 7% |
| 38 | ByteDance Seed 1.6 Flash | 66.0% | $0.0002 | 4.5s | 5% |
| 39 | GPT-5.4 (Reasoning) | 80.0% | $0.0049 | 5.4s | 20% |
| 40 | Gemini 3 Flash (Preview, Reasoning) | 72.0% | $0.0024 | 4.3s | 10% |
| 41 | Mistral Large 2 | 70.0% | $0.0018 | 4.7s | 8% |
| 42 | Cohere Command R+ (Aug. 2024) | 70.0% | $0.0023 | 3.2s | 8% |
| 43 | Mistral Small 4 | 60.0% | $0.0001 | 1.3s | 2% |
| 44 | GPT-4o Mini (temp=0) | 60.0% | $0.0001 | 3.7s | 2% |
| 45 | Grok 4.20 (Beta) | 60.0% | $0.0009 | 735ms | 2% |
| 46 | Claude 3.5 Haiku | 62.0% | $0.0009 | 4.6s | 3% |
| 47 | Qwen 3.5 122B | 88.0% | $0.0073 | 17.5s | 35% |
| 48 | Llama 3.1 8B | 54.0% | $0.0001 | 919ms | 0% |
| 49 | GPT-5 Mini | 72.0% | $0.0018 | 13.0s | 10% |
| 50 | GPT-5.4 Mini (Reasoning) | 64.0% | $0.0018 | 4.1s | 4% |
| 51 | Stealth: Hunter Alpha | 70.0% | $0.0000 | 18.7s | 8% |
| 52 | ByteDance Seed 2.0 Lite | 76.0% | $0.0018 | 20.6s | 15% |
| 53 | Qwen 3.5 35B | 82.0% | $0.0051 | 17.7s | 23% |
| 54 | Gemma 3 12B | 52.0% | $0.0000 | 4.1s | 0% |
| 55 | MiniMax M2.7 | 56.0% | $0.0006 | 6.8s | 1% |
| 56 | Z.AI GLM 4.7 Flash | 70.0% | $0.0005 | 21.5s | 8% |
| 57 | Arcee AI: Trinity Large (Preview) | 48.0% | $0.0000 | 4.8s | 0% |
| 58 | Claude Opus 4.5 | 70.0% | $0.0055 | 2.9s | 8% |
| 59 | Hermes 3 405B | 58.0% | $0.0000 | 13.3s | 1% |
| 60 | Mistral Small 4 (Reasoning) | 48.0% | $0.0004 | 4.2s | 0% |
| 61 | DeepSeek V3.2 | 50.0% | $0.0002 | 6.7s | 0% |
| 62 | GPT-4o Mini (temp=1) | 60.0% | $0.0001 | 15.0s | 2% |
| 63 | GPT-5.4 Mini (Reasoning, Low) | 48.0% | $0.0010 | 2.2s | 0% |
| 64 | Ministral 3 8B | 40.0% | $0.0001 | 1.2s | 0% |
| 65 | Qwen 3.5 Flash | 70.0% | $0.0012 | 22.0s | 8% |
| 66 | Rocinante 12B | 52.0% | $0.0002 | 9.1s | 0% |
| 67 | GPT-4.1 Nano | 40.0% | $0.0001 | 2.0s | 0% |
| 68 | GPT-5.4 Nano (Reasoning, Low) | 40.0% | $0.0002 | 1.4s | 0% |
| 69 | GPT-5.2 | 54.0% | $0.0025 | 2.1s | 0% |
| 70 | Mistral Small Creative | 38.0% | $0.0001 | 1.2s | 0% |
| 71 | Grok 4.20 (Beta, Reasoning) | 78.0% | $0.0075 | 9.9s | 17% |
| 72 | GPT-5.4 (Reasoning, Low) | 58.0% | $0.0034 | 3.2s | 1% |
| 73 | GPT-5.1 | 64.0% | $0.0041 | 6.9s | 4% |
| 74 | GPT-5.4 Mini | 40.0% | $0.0009 | 1.1s | 0% |
| 75 | MiniMax M2.5 | 48.0% | $0.0006 | 7.7s | 0% |
| 76 | Claude 3 Haiku | 40.0% | $0.0003 | 3.7s | 0% |
| 77 | Inception Mercury | 34.0% | $0.0001 | 961ms | 0% |
| 78 | Aion 2.0 | 66.0% | $0.0015 | 20.9s | 5% |
| 79 | WizardLM 2 8x22b | 42.0% | $0.0004 | 7.6s | 0% |
| 80 | o4 Mini | 60.0% | $0.0036 | 8.6s | 2% |
| 81 | Nemotron 3 Super | 42.0% | $0.0000 | 9.8s | 0% |
| 82 | Gemma 3 4B | 30.0% | $0.0000 | 1.9s | 0% |
| 83 | GPT-5.4 Nano | 30.0% | $0.0002 | 1.4s | 0% |
| 84 | Gemini 2.5 Flash (Reasoning) | 38.0% | $0.0014 | 2.7s | 0% |
| 85 | Inception Mercury 2 | 26.0% | $0.0004 | 823ms | 0% |
| 86 | Gemma 3 27B | 30.0% | $0.0001 | 5.0s | 0% |
| 87 | GPT-5.4 Nano (Reasoning) | 24.0% | $0.0003 | 1.5s | 0% |
| 88 | Llama 3.1 Nemotron 70B | 30.0% | $0.0002 | 6.9s | 0% |
| 89 | Ministral 3 14B | 20.0% | $0.0001 | 1.5s | 0% |
| 90 | Mistral NeMO | 22.0% | $0.0001 | 3.2s | 0% |
| 91 | Mistral Medium 3.1 | 22.0% | $0.0004 | 3.2s | 0% |
| 92 | Qwen 3.5 27B | 66.0% | $0.0044 | 20.4s | 5% |
| 93 | GPT-5.4 | 34.0% | $0.0030 | 2.0s | 0% |
| 94 | Qwen 2.5 72B | 20.0% | $0.0002 | 4.9s | 0% |
| 95 | Qwen 3 32B | 50.0% | $0.0006 | 24.6s | 0% |
| 96 | Ministral 3B | 10.0% | $0.0000 | 1.1s | 0% |
| 97 | GPT-5 Nano | 44.0% | $0.0007 | 21.1s | 0% |
| 98 | ByteDance Seed 2.0 Mini | 64.0% | $0.0007 | 37.7s | 4% |
| 99 | Claude Opus 4 | 86.0% | $0.017 | 7.0s | 31% |
| 100 | Ministral 8B | 8.0% | $0.0001 | 1.5s | 0% |
| 101 | Nemotron 3 Nano | 22.0% | $0.0002 | 13.0s | 0% |
| 102 | Ministral 3 3B | 0.0% | $0.0001 | 835ms | 0% |
| 103 | LFM2 24B | 0.0% | $0.0000 | 2.9s | 0% |
| 104 | Z.AI GLM 5 | 56.0% | $0.0035 | 26.7s | 1% |
| 105 | Arcee AI: Trinity Mini | 8.0% | $0.0001 | 8.1s | 0% |
| 106 | Gemini 3.1 Pro (Preview) | 74.0% | $0.013 | 12.9s | 12% |
| 107 | Qwen 3.5 9B | 76.0% | $0.0007 | 1.1m | 15% |
| 108 | MoonshotAI: Kimi K2.5 | 50.0% | $0.0040 | 30.8s | 0% |
| 109 | Z.AI GLM 4.6 | 54.0% | $0.0025 | 39.8s | 0% |
| 110 | Grok 4 | 68.0% | $0.012 | 17.7s | 7% |
| 111 | Stealth: Aurora Alpha | 34.0% | | 1.7s | 0% |
| 112 | Mistral Large | 26.0% | $0.0070 | 6.2s | 0% |
| 113 | GPT-5 | 58.0% | $0.011 | 19.2s | 1% |
| 114 | Gemini 3 Pro (Preview) | 68.0% | $0.016 | 10.7s | 7% |
| 115 | Gemini 2.5 Pro | 40.0% | $0.011 | 8.4s | 0% |
| 116 | Qwen 3.5 397B A17B | 82.0% | $0.0086 | 1.1m | 23% |
| 117 | Z.AI GLM 4.7 | 60.0% | $0.0027 | 1.1m | 2% |
| 118 | o4 Mini High | 58.0% | $0.0052 | 58.3s | 1% |

Average score: 60.47%

Individual Scenarios

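Scenario 1

| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| MiniMax M2.5 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| GPT-5.1 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Qwen 3.5 Flash | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Qwen3 235B A22B Instruct 2507 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| MoonshotAI: Kimi K2.5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| GPT-5.2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| MiniMax M2.7 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Z.AI GLM 4.5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Stealth: Aurora Alpha | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Gemini 2.5 Flash Lite | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Inception Mercury | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| o4 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Mini (Reasoning, Low) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek-V2 Chat | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Nemotron 3 Super | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.20 (Beta) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 3 32B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Nano (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Nano (Reasoning, Low) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Nemotron 3 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Hermes 3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |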
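
Scenario 2

| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Z.AI GLM 5 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| o4 Mini High | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Z.AI GLM 4.7 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Gemini 2.5 Pro | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| o4 Mini | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Gemma 3 27B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Grok 4 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Gemini 2.5 Flash (Reasoning) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Claude Haiku 4.5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Nemotron 3 Super | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Inception Mercury | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| GPT-5.4 Nano | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| WizardLM 2 8x22b | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Ministral 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| GPT-5.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 (Reasoning, Low) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| MoonshotAI: Kimi K2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Nano (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Nano (Reasoning, Low) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Nemotron 3 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |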
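
Scenario 3

| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Mistral Large | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Inception Mercury | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| MiniMax M2.7 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Mistral Medium 3.1 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Gemini 2.5 Pro | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Inception Mercury 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Nemotron 3 Nano | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| GPT-5.4 Nano | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Gemma 3 4B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Gemini 2.5 Flash (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |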
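
Scenario 4

| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Gemma 3 27B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Ministral 8B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Gemini 2.5 Flash (Reasoning) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Inception Mercury 2 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| Inception Mercury | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0% |
| MiniMax M2.5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Nemotron 3 Nano | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Ministral 3B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |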
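
Scenario 5

| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0% |
| Claude Opus 4 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| GPT-5.4 Nano | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Gemma 3 12B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Mistral Small 4 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| GPT-5.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Grok 4.20 (Beta) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek V3.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-5.4 Nano (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |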