Gibberish response detection

Test: Bad Writing Habits

Avg. Score: 98.6%
Scenarios: 18

Overall Performance

| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
|---|---|---|---|---|---|
| 1 | Ministral 3B | 100.0% | $0.0001 | 8.1s | 100% |
| 2 | Arcee AI: Trinity Mini | 100.0% | $0.0003 | 9.2s | 100% |
| 3 | Mistral Small Creative | 100.0% | $0.0007 | 9.1s | 100% |
| 4 | Ministral 8B | 100.0% | $0.0004 | 10.4s | 100% |
| 5 | Ministral 3 3B | 100.0% | $0.0005 | 11.1s | 100% |
| 6 | Ministral 3 14B | 100.0% | $0.0007 | 11.7s | 100% |
| 7 | Mistral NeMO | 100.0% | $0.0005 | 10.1s | 99% |
| 8 | Gemini 2.5 Flash Lite | 100.0% | $0.0009 | 9.5s | 99% |
| 9 | Stealth: Aurora Alpha | 99.9% | $0.0000 | 9.8s | 99% |
| 10 | Claude 3.5 Haiku | 100.0% | $0.0035 | 10.8s | 100% |
| 11 | GPT-4.1 Nano | 99.9% | $0.0007 | 13.3s | 99% |
| 12 | Claude 3 Haiku | 100.0% | $0.0025 | 14.9s | 100% |
| 13 | Gemma 3 4B | 100.0% | $0.0002 | 20.0s | 100% |
| 14 | Ministral 3 8B | 100.0% | $0.0008 | 19.6s | 100% |
| 15 | GPT-4.1 Mini | 100.0% | $0.0027 | 19.0s | 100% |
| 16 | Grok 4 Fast | 100.0% | $0.0017 | 24.1s | 100% |
| 17 | Gemini 3 Flash (Preview) | 100.0% | $0.0078 | 19.6s | 99% |
| 18 | Mistral Large 3 | 100.0% | $0.0033 | 30.3s | 100% |
| 19 | GPT-4o Mini (temp=0) | 100.0% | $0.0012 | 34.8s | 100% |
| 20 | GPT-4o Mini (temp=1) | 100.0% | $0.0012 | 34.8s | 100% |
| 21 | Claude Haiku 4.5 | 100.0% | $0.011 | 21.6s | 100% |
| 22 | ByteDance Seed 1.6 Flash | 99.6% | $0.0013 | 27.3s | 97% |
| 23 | Writer: Palmyra X5 | 100.0% | $0.011 | 22.0s | 100% |
| 24 | Grok 4.1 Fast | 100.0% | $0.0018 | 37.8s | 100% |
| 25 | Gemma 3 12B | 100.0% | $0.0004 | 41.3s | 100% |
| 26 | Arcee AI: Trinity Large (Preview) | 100.0% | $0.0000 | 43.6s | 100% |
| 27 | Mistral Medium 3.1 | 100.0% | $0.0048 | 36.5s | 100% |
| 28 | o4 Mini | 100.0% | $0.015 | 25.7s | 100% |
| 29 | Mistral Large 2 | 100.0% | $0.013 | 29.4s | 100% |
| 30 | Qwen 3.5 Plus (2026-02-15) | 99.6% | $0.0060 | 31.5s | 98% |
| 31 | Llama 3.1 Nemotron 70B | 99.7% | $0.0038 | 31.7s | 97% |
| 32 | Mistral Large | 100.0% | $0.014 | 30.9s | 100% |
| 33 | Gemma 3 27B | 100.0% | $0.0006 | 52.6s | 100% |
| 34 | DeepSeek-V2 Chat | 100.0% | $0.0021 | 53.3s | 100% |
| 35 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0021 | 54.6s | 100% |
| 36 | Hermes 3 405B | 100.0% | $0.0032 | 53.2s | 100% |
| 37 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.023 | 22.7s | 100% |
| 38 | Gemini 2.5 Flash | 99.1% | $0.0052 | 10.6s | 89% |
| 39 | GPT-4o, Aug. 6th (temp=1) | 99.8% | $0.018 | 24.4s | 97% |
| 40 | GPT-4o, May 13th (temp=1) | 100.0% | $0.033 | 14.4s | 100% |
| 41 | GPT-4o, May 13th (temp=0) | 100.0% | $0.035 | 14.1s | 100% |
| 42 | GPT-5 Mini | 100.0% | $0.0100 | 57.4s | 100% |
| 43 | Z.AI GLM 4.7 Flash | 100.0% | $0.0017 | 1.2m | 100% |
| 44 | GPT-4.1 | 100.0% | $0.018 | 44.7s | 100% |
| 45 | Cohere Command R+ (Aug. 2024) | 100.0% | $0.020 | 52.5s | 100% |
| 46 | Z.AI GLM 5 | 100.0% | $0.0084 | 1.2m | 100% |
| 47 | o4 Mini High | 100.0% | $0.025 | 47.2s | 100% |
| 48 | Claude Sonnet 4.6 | 100.0% | $0.031 | 39.3s | 100% |
| 49 | Minimax M2.5 | 99.5% | $0.0034 | 1.3m | 98% |
| 50 | GPT-5 Nano | 100.0% | $0.0042 | 1.4m | 100% |
| 51 | Claude Sonnet 4.5 | 100.0% | $0.035 | 38.1s | 100% |
| 52 | Gemini 2.5 Pro | 100.0% | $0.036 | 36.2s | 100% |
| 53 | Claude Sonnet 4 | 100.0% | $0.032 | 43.7s | 100% |
| 54 | Z.AI GLM 4.7 | 100.0% | $0.010 | 1.4m | 99% |
| 55 | Claude 3.5 Sonnet | 100.0% | $0.048 | 35.5s | 100% |
| 56 | Claude 3.7 Sonnet | 100.0% | $0.042 | 46.7s | 100% |
| 57 | DeepSeek V3.2 | 100.0% | $0.0014 | 1.9m | 100% |
| 58 | DeepSeek V3.1 | 99.9% | $0.0020 | 1.8m | 98% |
| 59 | Qwen 2.5 72B | 98.3% | $0.0010 | 36.7s | 78% |
| 60 | Z.AI GLM 4.5 | 98.8% | $0.0051 | 42.1s | 79% |
| 61 | Z.AI GLM 4.6 | 98.9% | $0.0065 | 51.5s | 79% |
| 62 | Gemini 3 Pro (Preview) | 100.0% | $0.055 | 54.4s | 100% |
| 63 | Claude Opus 4.5 | 100.0% | $0.070 | 53.4s | 100% |
| 64 | ByteDance Seed 1.6 | 100.0% | $0.013 | 2.5m | 100% |
| 65 | Hermes 3 70B | 97.4% | $0.0010 | 1.2m | 75% |
| 66 | Grok 4 | 100.0% | $0.048 | 1.7m | 100% |
| 67 | GPT-5.2 | 100.0% | $0.056 | 1.5m | 100% |
| 68 | DeepSeek V3 (2025-03-24) | 94.6% | $0.0014 | 39.4s | 65% |
| 69 | GPT-5.1 | 100.0% | $0.054 | 1.8m | 100% |
| 70 | Claude Opus 4.6 | 100.0% | $0.078 | 1.2m | 100% |
| 71 | Qwen 3.5 397B A17B | 99.9% | $0.014 | 3.0m | 99% |
| 72 | MoonshotAI: Kimi K2.5 | 100.0% | $0.019 | 3.2m | 100% |
| 73 | Rocinante 12B | 89.9% | $0.0014 | 38.4s | 51% |
| 74 | GPT-5 | 100.0% | $0.065 | 2.8m | 100% |
| 75 | WizardLM 2 8x22b | 95.3% | $0.0026 | 1.8m | 59% |
| 76 | Gemini 3.1 Pro (Preview) | 100.0% | $0.107 | 1.8m | 100% |
| 77 | Llama 3.1 70B | 87.2% | $0.0015 | 29.4s | 35% |
| 78 | Claude Opus 4 | 100.0% | $0.209 | 1.4m | 100% |
| 79 | Mistral Small 3.2 24B | 98.7% | $0.0069 | 5.7m | 79% |
| 80 | Llama 3.1 8B | 34.4% | $0.0003 | 1.3m | 0% |

Average score: 98.63%

Individual Scenarios

Detailed Writing Rules

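| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 98 | 99.6% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 96 | 99.2% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 96 | 99.1% |
| Rocinante 12B | 100 | 100 | 100 | 97 | 96 | 98.8% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 28 | 85.6% |
| Gemini 2.5 Flash | 100 | 100 | 90 | 68 | 61 | 83.6% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 8B | 100 | 100 | 0 | 0 | 0 | 40.0% |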
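
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Rocinante 12B | 100 | 100 | 100 | 100 | 99 | 99.7% |
| Minimax M2.5 | 100 | 100 | 100 | 99 | 98 | 99.3% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 95 | 99.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 94 | 98.8% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 90 | 97.9% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 85 | 97.1% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 38 | 87.6% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 8B | 100 | 100 | 39 | 0 | 0 | 47.8% |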
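
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 94 | 98.7% |
| Hermes 3 70B | 100 | 100 | 100 | 95 | 89 | 96.8% |
| Rocinante 12B | 100 | 100 | 100 | 92 | 54 | 89.2% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 8B | 100 | 15 | 0 | 0 | 0 | 23.1% |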
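
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 95 | 99.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 98 | 98 | 98 | 98.9% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 64 | 92.9% |
| Rocinante 12B | 97 | 92 | 88 | 19 | 0 | 59.2% |
| Llama 3.1 8B | 100 | 100 | 47 | 0 | 0 | 49.5% |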
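
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 99 | 99.9% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 98 | 99.5% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 93 | 98.6% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Rocinante 12B | 100 | 100 | 100 | 85 | 7 | 78.4% |
| Llama 3.1 8B | 100 | 0 | 0 | 0 | 0 | 20.0% |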
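
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 98 | 99.7% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 84 | 96.8% |
| Rocinante 12B | 100 | 98 | 98 | 95 | 93 | 96.8% |
| Hermes 3 70B | 100 | 100 | 100 | 90 | 88 | 95.6% |
| Llama 3.1 8B | 100 | 0 | 0 | 0 | 0 | 20.0% |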

genre

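| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 97 | 99.4% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 96 | 99.2% |
| Minimax M2.5 | 100 | 100 | 98 | 98 | 98 | 98.7% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 90 | 98.1% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 99 | 99 | 91 | 97.8% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 63 | 92.6% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 54 | 90.8% |
| Rocinante 12B | 100 | 100 | 91 | 85 | 73 | 90.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 33 | 86.7% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 8B | 100 | 100 | 0 | 0 | 0 | 40.0% |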
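
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 99.9% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 99 | 99.7% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 99 | 99.7% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 98 | 99.6% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 92 | 98.4% |
| Qwen 2.5 72B | 100 | 100 | 100 | 94 | 0 | 78.8% |
| Llama 3.1 8B | 100 | 82 | 0 | 0 | 0 | 36.3% |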
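
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 99 | 99.9% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 99 | 99.7% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 96 | 94 | 98.2% |
| Rocinante 12B | 100 | 100 | 100 | 96 | 93 | 97.7% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 95 | 61 | 91.1% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Llama 3.1 70B | 100 | 100 | 62 | 0 | 0 | 52.5% |
| Llama 3.1 8B | 100 | 100 | 30 | 0 | 0 | 46.0% |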
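
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 96 | 99.3% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 98 | 97 | 99.1% |
| Minimax M2.5 | 100 | 100 | 100 | 98 | 97 | 99.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 92 | 98.3% |
| Rocinante 12B | 100 | 100 | 100 | 91 | 44 | 87.1% |
| Llama 3.1 8B | 100 | 100 | 3 | 0 | 0 | 40.6% |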
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 99 | 99.8%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 99 | 99.8%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 98 | 99.5%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 98 | 99.5%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 97 | 99.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 96 | 99.3%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 34 | 86.9%
Rocinante 12B | 100 | 100 | 100 | 82 | 27 | 81.7%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 98 | 99.7%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 95 | 99.1%
Minimax M2.5 | 100 | 100 | 98 | 97 | 96 | 98.3%
Rocinante 12B | 100 | 100 | 100 | 42 | 0 | 68.4%
WizardLM 2 8x22b | 100 | 73 | 0 | 0 | 0 | 34.7%
Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0.0%

Novelcrafter Default Prompt

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 99 | 99 | 99.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 98 | 99.5%
Minimax M2.5 | 100 | 100 | 100 | 100 | 95 | 98.9%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 84 | 96.9%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 56 | 91.2%
Rocinante 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 8B | 100 | 13 | 0 | 0 | 0 | 22.5%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 99 | 99 | 99.4%
Minimax M2.5 | 100 | 100 | 100 | 100 | 97 | 99.4%
Hermes 3 70B | 100 | 100 | 100 | 96 | 91 | 97.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 85 | 97.1%
Rocinante 12B | 100 | 100 | 100 | 100 | 61 | 92.2%
Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 99 | 99.8%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 99 | 99.8%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 96 | 99.1%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 8B | 100 | 100 | 83 | 0 | 0 | 56.5%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 99.9%
Rocinante 12B | 100 | 100 | 100 | 100 | 96 | 99.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 96 | 99.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 8B | 100 | 100 | 100 | 17 | 8 | 65.0%
Llama 3.1 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 98 | 99.6%
Mistral NeMO | 100 | 100 | 100 | 100 | 97 | 99.5%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 97 | 99.5%
Minimax M2.5 | 100 | 100 | 100 | 100 | 97 | 99.4%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 5 | 81.0%
Llama 3.1 8B | 100 | 100 | 100 | 62 | 0 | 72.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 99 | 99.8%
Minimax M2.5 | 100 | 100 | 100 | 98 | 97 | 99.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 94 | 98.7%
Llama 3.1 8B | 100 | 100 | 0 | 0 | 0 | 40.0%