Matches Regex

Test: Voice/dialogue sheets

Avg. Score: 61.5%
Scenarios: 5

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Claude Sonnet 4.6 | 100.0% | $0.0033 | 1.9s | 100%
2 | Claude Sonnet 4 | 100.0% | $0.0033 | 2.8s | 100%
3 | Claude 3.7 Sonnet | 100.0% | $0.0034 | 3.5s | 100%
4 | Claude Opus 4.6 | 100.0% | $0.0055 | 3.9s | 100%
5 | GPT-4.1 | 96.0% | $0.0017 | 2.6s | 61%
6 | GPT-4o, May 13th (temp=0) | 96.0% | $0.0037 | 5.2s | 61%
7 | GPT-4o, Aug. 6th (temp=0) | 92.0% | $0.0022 | 2.4s | 46%
8 | Gemini 2.5 Flash | 88.0% | $0.0005 | 933ms | 35%
9 | ByteDance Seed 1.6 | 94.0% | $0.0013 | 14.2s | 53%
10 | Qwen 3.5 Plus (2026-02-15) | 86.0% | $0.0005 | 6.6s | 31%
11 | Grok 4 Fast | 84.0% | $0.0003 | 3.6s | 27%
12 | Grok 4.1 Fast | 84.0% | $0.0004 | 4.4s | 27%
13 | GPT-4o, May 13th (temp=1) | 90.0% | $0.0037 | 5.2s | 40%
14 | DeepSeek V3 (2025-03-24) | 84.0% | $0.0003 | 6.1s | 27%
15 | Claude Haiku 4.5 | 82.0% | $0.0011 | 1.6s | 23%
16 | Gemini 3 Flash (Preview) | 80.0% | $0.0006 | 1.8s | 20%
17 | Mistral Large 3 | 80.0% | $0.0004 | 4.1s | 20%
18 | GPT-4o, Aug. 6th (temp=1) | 82.0% | $0.0022 | 2.3s | 23%
19 | DeepSeek-V2 Chat | 80.0% | $0.0001 | 8.2s | 20%
20 | Claude Sonnet 4.5 | 84.0% | $0.0033 | 2.8s | 27%
21 | GPT-4.1 Mini | 76.0% | $0.0003 | 2.7s | 15%
22 | Z.AI GLM 4.5 | 76.0% | $0.0004 | 5.3s | 15%
23 | Mistral Small 3.2 24B | 72.0% | $0.0001 | 2.1s | 10%
24 | DeepSeek V3 (2024-12-26) | 74.0% | $0.0003 | 4.6s | 12%
25 | Llama 3.1 70B | 72.0% | $0.0005 | 1.5s | 10%
26 | Writer: Palmyra X5 | 78.0% | $0.0010 | 8.1s | 17%
27 | DeepSeek V3.1 | 76.0% | $0.0002 | 8.4s | 15%
28 | Claude 3.5 Sonnet | 80.0% | $0.0034 | 4.5s | 20%
29 | Gemini 2.5 Flash Lite | 66.0% | $0.0001 | 610ms | 5%
30 | Hermes 3 70B | 68.0% | $0.0002 | 6.0s | 7%
31 | ByteDance Seed 1.6 Flash | 66.0% | $0.0002 | 4.5s | 5%
32 | Mistral Large 2 | 70.0% | $0.0018 | 4.7s | 8%
33 | Cohere Command R+ (Aug. 2024) | 70.0% | $0.0023 | 3.2s | 8%
34 | GPT-4o Mini (temp=0) | 60.0% | $0.0001 | 3.7s | 2%
35 | Claude 3.5 Haiku | 62.0% | $0.0009 | 4.6s | 3%
36 | Llama 3.1 8B | 54.0% | $0.0001 | 919ms | 0%
37 | GPT-5 Mini | 72.0% | $0.0018 | 13.0s | 10%
38 | Gemma 3 12B | 52.0% | $0.0000 | 4.1s | 0%
39 | Z.AI GLM 4.7 Flash | 70.0% | $0.0005 | 21.5s | 8%
40 | Arcee AI: Trinity Large (Preview) | 48.0% | $0.0000 | 4.8s | 0%
41 | Claude Opus 4.5 | 70.0% | $0.0055 | 2.9s | 8%
42 | Hermes 3 405B | 58.0% | $0.0000 | 13.3s | 1%
43 | DeepSeek V3.2 | 50.0% | $0.0002 | 6.7s | 0%
44 | GPT-4o Mini (temp=1) | 60.0% | $0.0001 | 15.0s | 2%
45 | Ministral 3 8B | 40.0% | $0.0001 | 1.2s | 0%
46 | Rocinante 12B | 52.0% | $0.0002 | 9.1s | 0%
47 | GPT-4.1 Nano | 40.0% | $0.0001 | 2.0s | 0%
48 | GPT-5.2 | 54.0% | $0.0025 | 2.1s | 0%
49 | Mistral Small Creative | 38.0% | $0.0001 | 1.2s | 0%
50 | GPT-5.1 | 64.0% | $0.0041 | 6.9s | 4%
51 | Minimax M2.5 | 48.0% | $0.0006 | 7.7s | 0%
52 | Claude 3 Haiku | 40.0% | $0.0003 | 3.7s | 0%
53 | WizardLM 2 8x22b | 42.0% | $0.0004 | 7.6s | 0%
54 | o4 Mini | 60.0% | $0.0036 | 8.6s | 2%
55 | Gemma 3 4B | 30.0% | $0.0000 | 1.9s | 0%
56 | Gemma 3 27B | 30.0% | $0.0001 | 5.0s | 0%
57 | Llama 3.1 Nemotron 70B | 30.0% | $0.0002 | 6.9s | 0%
58 | Ministral 3 14B | 20.0% | $0.0001 | 1.5s | 0%
59 | Mistral NeMO | 22.0% | $0.0001 | 3.2s | 0%
60 | Mistral Medium 3.1 | 22.0% | $0.0004 | 3.2s | 0%
61 | Qwen 2.5 72B | 20.0% | $0.0002 | 4.9s | 0%
62 | Ministral 3B | 10.0% | $0.0000 | 1.1s | 0%
63 | GPT-5 Nano | 44.0% | $0.0007 | 21.1s | 0%
64 | Claude Opus 4 | 86.0% | $0.017 | 7.0s | 31%
65 | Ministral 8B | 8.0% | $0.0001 | 1.5s | 0%
66 | Ministral 3 3B | 0.0% | $0.0001 | 835ms | 0%
67 | Z.AI GLM 5 | 56.0% | $0.0035 | 26.7s | 1%
68 | Arcee AI: Trinity Mini | 8.0% | $0.0001 | 8.1s | 0%
69 | Gemini 3.1 Pro (Preview) | 74.0% | $0.013 | 12.9s | 12%
70 | MoonshotAI: Kimi K2.5 | 50.0% | $0.0040 | 30.8s | 0%
71 | Z.AI GLM 4.6 | 54.0% | $0.0025 | 39.8s | 0%
72 | Grok 4 | 68.0% | $0.012 | 17.7s | 7%
73 | Stealth: Aurora Alpha | 34.0% |  | 1.7s | 0%
74 | Mistral Large | 26.0% | $0.0070 | 6.2s | 0%
75 | GPT-5 | 58.0% | $0.011 | 19.2s | 1%
76 | Gemini 3 Pro (Preview) | 68.0% | $0.016 | 10.7s | 7%
77 | Gemini 2.5 Pro | 40.0% | $0.011 | 8.4s | 0%
78 | Qwen 3.5 397B A17B | 82.0% | $0.0086 | 1.1m | 23%
79 | Z.AI GLM 4.7 | 60.0% | $0.0027 | 1.1m | 2%
80 | o4 Mini High | 58.0% | $0.0052 | 58.3s | 1%
Average score: 61.48%

Individual Scenarios

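Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Minimax M2.5 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-5.1 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
MoonshotAI: Kimi K2.5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-5.2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Stealth: Aurora Alpha | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Z.AI GLM 4.5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemini 2.5 Flash Lite | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini High | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek-V2 Chat | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%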
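Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
o4 Mini High | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
o4 Mini | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Z.AI GLM 5 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Gemini 2.5 Pro | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Z.AI GLM 4.7 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Gemma 3 27B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Grok 4 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Claude Haiku 4.5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
WizardLM 2 8x22b | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-5.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
MoonshotAI: Kimi K2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%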
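Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Z.AI GLM 5 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Mistral Large | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Mistral Medium 3.1 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Gemini 2.5 Pro | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemma 3 4B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%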
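Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Gemma 3 27B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Ministral 8B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Minimax M2.5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%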
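Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Claude Opus 4 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemma 3 12B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
DeepSeek V3.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%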