Markdown formatting overuse

Test: Bad Writing Habits

Avg. Score
98.7%
Scenarios
18

Overall Performance

Rank ▲ Model Score Avg. Cost Avg. Time Stability
1Stealth: Aurora Alpha100.0%$0.00009.8s100%
2Mistral NeMO100.0%$0.000510.1s100%
3Gemini 2.5 Flash Lite100.0%$0.00099.5s100%
4GPT-4.1 Nano100.0%$0.000713.3s100%
5Claude 3.5 Haiku100.0%$0.003510.8s100%
6Claude 3 Haiku100.0%$0.002514.9s100%
7Gemini 2.5 Flash100.0%$0.005210.6s100%
8Gemma 3 4B100.0%$0.000220.0s100%
9GPT-4.1 Mini100.0%$0.002719.0s100%
10Grok 4 Fast100.0%$0.001724.1s100%
11Gemini 3 Flash (Preview)100.0%$0.007819.6s100%
12GPT-4o Mini (temp=0)100.0%$0.001234.8s100%
13GPT-4o Mini (temp=1)100.0%$0.001234.8s100%
14Qwen 2.5 72B100.0%$0.001036.7s100%
15Claude Haiku 4.5100.0%$0.01121.6s100%
16Grok 4.1 Fast100.0%$0.001837.8s100%
17DeepSeek V3 (2025-03-24)100.0%$0.001439.4s100%
18Arcee AI: Trinity Large (Preview)100.0%$0.000043.6s100%
19o4 Mini100.0%$0.01525.7s100%
20GPT-4o, Aug. 6th (temp=1)100.0%$0.01824.4s100%
21Gemma 3 27B100.0%$0.000652.6s100%
22DeepSeek-V2 Chat100.0%$0.002153.3s100%
23DeepSeek V3 (2024-12-26)100.0%$0.002154.6s100%
24Hermes 3 405B100.0%$0.003253.2s100%
25GPT-4o, Aug. 6th (temp=0)100.0%$0.02322.7s100%
26Gemma 3 12B99.8%$0.000441.3s97%
27Z.AI GLM 4.6100.0%$0.006551.5s100%
28Z.AI GLM 4.599.9%$0.005142.1s97%
29GPT-4o, May 13th (temp=1)100.0%$0.03314.4s100%
30GPT-4o, May 13th (temp=0)100.0%$0.03514.1s100%
31GPT-4.1100.0%$0.01844.7s100%
32GPT-5 Mini100.0%$0.010057.4s100%
33Z.AI GLM 4.7 Flash100.0%$0.00171.2m100%
34Mistral Large99.8%$0.01430.9s96%
35o4 Mini High100.0%$0.02547.2s100%
36Claude Sonnet 4.6100.0%$0.03139.3s100%
37GPT-5 Nano100.0%$0.00421.4m100%
38Cohere Command R+ (Aug. 2024)99.9%$0.02052.5s99%
39Gemini 2.5 Pro100.0%$0.03636.2s100%
40Claude Sonnet 4100.0%$0.03243.7s100%
41Minimax M2.599.8%$0.00341.3m97%
42Z.AI GLM 4.7100.0%$0.0101.4m100%
43Qwen 3.5 Plus (2026-02-15)99.5%$0.006031.5s90%
44Claude 3.5 Sonnet100.0%$0.04835.5s100%
45DeepSeek V3.1100.0%$0.00201.8m100%
46Claude 3.7 Sonnet100.0%$0.04246.7s100%
47Arcee AI: Trinity Mini98.8%$0.00039.2s84%
48DeepSeek V3.299.9%$0.00141.9m97%
49Z.AI GLM 599.6%$0.00841.2m92%
50Gemini 3 Pro (Preview)100.0%$0.05554.4s100%
51Llama 3.1 70B98.7%$0.001529.4s79%
52Claude Opus 4.5100.0%$0.07053.4s100%
53Llama 3.1 Nemotron 70B98.9%$0.003831.7s79%
54ByteDance Seed 1.6100.0%$0.0132.5m100%
55Grok 4100.0%$0.0481.7m100%
56GPT-5.2100.0%$0.0561.5m100%
57Writer: Palmyra X598.5%$0.01122.0s78%
58GPT-5.1100.0%$0.0541.8m100%
59Claude Opus 4.6100.0%$0.0781.2m100%
60Qwen 3.5 397B A17B100.0%$0.0143.0m100%
61Claude Sonnet 4.598.9%$0.03538.1s79%
62MoonshotAI: Kimi K2.5100.0%$0.0193.2m100%
63WizardLM 2 8x22b98.9%$0.00261.8m79%
64Hermes 3 70B97.8%$0.00101.2m71%
65GPT-5100.0%$0.0652.8m100%
66Gemini 3.1 Pro (Preview)100.0%$0.1071.8m100%
67Mistral Large 296.2%$0.01329.4s66%
68Mistral Large 396.0%$0.003330.3s62%
69Ministral 3 14B94.4%$0.000711.7s60%
70Mistral Small 3.2 24B100.0%$0.00695.7m100%
71Llama 3.1 8B95.8%$0.00031.3m62%
72Mistral Small Creative93.3%$0.00079.1s59%
73ByteDance Seed 1.6 Flash91.7%$0.001327.3s60%
74Claude Opus 499.7%$0.2091.4m94%
75Rocinante 12B92.0%$0.001438.4s47%
76Ministral 3 8B91.0%$0.000819.6s48%
77Ministral 3 3B90.5%$0.000511.1s44%
78Ministral 8B89.8%$0.000410.4s45%
79Mistral Medium 3.190.6%$0.004836.5s46%
80Ministral 3B85.8%$0.00018.1s32%
98.69%

Individual Scenarios

Detailed Writing Rules

Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
ByteDance Seed 1.6 Flash1001001001009398.6%
Ministral 3 3B1001001001007895.6%
Mistral Small Creative100100100927393.0%
Arcee AI: Trinity Mini1001001001004088.1%
Llama 3.1 8B100100100100581.1%
Mistral Medium 3.1100100100100080.0%
Hermes 3 70B100100100100080.0%
Ministral 3B100100100100080.0%
Rocinante 12B100100100100080.0%
Ministral 8B10010010075075.1%
Ministral 3 8B10010010046069.1%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 3B100100100100100100.0%
Rocinante 12B100100100100100100.0%
Ministral 8B1001001001009198.2%
Minimax M2.51001001001008597.1%
Claude Opus 41001001001007394.5%
Z.AI GLM 51001001001006292.4%
Qwen 3.5 Plus (2026-02-15)1001001001005190.3%
Arcee AI: Trinity Mini1001001001005090.0%
Mistral Large 31001001001003987.8%
Ministral 3 14B100100100765385.9%
Ministral 3 8B10010091904585.2%
Mistral Large 21001001001002384.6%
Claude Sonnet 4.5100100100100080.0%
WizardLM 2 8x22b100100100100080.0%
Mistral Small Creative100100100652578.0%
Mistral Medium 3.110010010033066.6%
ByteDance Seed 1.6 Flash83735742050.9%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Llama 3.1 70B1001001001008697.3%
Ministral 8B1001001001008697.2%
ByteDance Seed 1.6 Flash100100100664482.0%
Ministral 3 8B100100100100080.0%
Ministral 3 3B100100100100080.0%
Ministral 3B100100100100080.0%
Rocinante 12B100100100100080.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 8B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Mistral Small Creative1001001001004088.0%
ByteDance Seed 1.6 Flash1001001001003286.4%
Rocinante 12B100100100100981.8%
Ministral 3 3B100100100100080.0%
Ministral 3B100100100100080.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 8B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Ministral 3B100100100100100100.0%
Llama 3.1 70B100100100100080.0%
Llama 3.1 8B100100100100080.0%
Rocinante 12B10010010095079.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Mistral Medium 3.11001001001008096.1%
Llama 3.1 Nemotron 70B100100100100080.0%
Mistral Small Creative100100100100080.0%
Ministral 3 14B100100100100080.0%
Llama 3.1 8B100100100100080.0%
Ministral 8B100100100100080.0%
Rocinante 12B100100100100080.0%
Ministral 3 3B10010010099079.8%
Ministral 3B1001003610049.1%

genre

Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Rocinante 12B100100100100100100.0%
Z.AI GLM 4.61001001001009899.7%
ByteDance Seed 1.6 Flash1001001001009298.4%
Mistral Large 2100100100100080.0%
Hermes 3 70B100100100100080.0%
Ministral 3 3B100100100100080.0%
Ministral 3B100100100100080.0%
Mistral Small Creative10010010066073.3%
Ministral 3 14B10010090502472.9%
Mistral Medium 3.11001009036065.3%
Ministral 3 8B1001001000060.0%
Ministral 8B100100770055.5%
Mistral Large 310010000040.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 8B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Ministral 3B100100100100100100.0%
Rocinante 12B100100100100100100.0%
Writer: Palmyra X51001001001009799.4%
Ministral 3 14B1001001001009599.0%
Mistral Large 21001001001008897.5%
Z.AI GLM 4.51001001001008897.5%
DeepSeek V3.21001001001008797.4%
Llama 3.1 8B1001001001008496.8%
Ministral 3 8B100100100877392.0%
Mistral Small Creative1001001001003687.1%
ByteDance Seed 1.6 Flash10010091786085.7%
Mistral Medium 3.110010099694983.4%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 8B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Ministral 3B100100100100100100.0%
Rocinante 12B100100100100100100.0%
ByteDance Seed 1.6 Flash1001001001008997.8%
Mistral Medium 3.1100100100100080.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 8B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Ministral 3B100100100100100100.0%
Rocinante 12B100100100100100100.0%
ByteDance Seed 1.6 Flash1001001001006793.3%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 8B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Rocinante 12B100100100100100100.0%
Ministral 3B100100100100080.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 8B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Ministral 3B100100100100100100.0%
ByteDance Seed 1.6 Flash1001001001007494.8%
Llama 3.1 8B1001001001003386.7%
Mistral Small Creative1001001001002885.6%
Ministral 3 14B100100100100080.0%
Ministral 3 3B100100100100080.0%
Rocinante 12B100100100100080.0%

Novelcrafter Default Prompt

Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Rocinante 12B100100100100100100.0%
Mistral Large 31001001001009899.6%
Mistral Large 21001001001008797.4%
Writer: Palmyra X5100100100100480.9%
Ministral 3B10010010076075.3%
Ministral 3 8B10010010056071.3%
Mistral Medium 3.11001001000060.0%
Ministral 8B10050180033.4%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Ministral 3B100100100100100100.0%
Cohere Command R+ (Aug. 2024)1001001001009498.8%
Gemma 3 12B1001001001008396.7%
Mistral Large1001001001008296.4%
Rocinante 12B1001001001007695.2%
Mistral Small Creative1001001001006893.6%
Ministral 8B100100100927693.5%
Writer: Palmyra X51001001001006292.3%
Mistral Large 210010010053571.6%
ByteDance Seed 1.6 Flash1001008165069.2%
Ministral 3 3B10010010013062.6%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 8B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Rocinante 12B100100100100100100.0%
ByteDance Seed 1.6 Flash1001001001007094.0%
Ministral 3B100100100100080.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 8B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Ministral 3B100100100100100100.0%
Ministral 3 3B100100100100080.0%
Rocinante 12B100100100100080.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
Ministral 8B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Rocinante 12B100100100100100100.0%
ByteDance Seed 1.6 Flash1001001001009899.7%
Ministral 3B100100100100080.0%
Model # 1 # 2 # 3 # 4 # 5 Avg ▼
Gemini 3.1 Pro (Preview)100100100100100100.0%
Qwen 3.5 397B A17B100100100100100100.0%
GPT-5 Mini100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
GPT-5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Gemini 2.5 Pro100100100100100100.0%
GPT-5.2100100100100100100.0%
Z.AI GLM 4.7100100100100100100.0%
Gemini 3 Pro (Preview)100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Z.AI GLM 4.6100100100100100100.0%
GPT-4.1100100100100100100.0%
Gemini 3 Flash (Preview)100100100100100100.0%
Z.AI GLM 4.7 Flash100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
DeepSeek V3.2100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
DeepSeek V3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Gemini 2.5 Flash100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Gemma 3 27B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Mistral Small 3.2 24B100100100100100100.0%
Arcee AI: Trinity Large (Preview)100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Arcee AI: Trinity Mini100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
Mistral NeMO100100100100100100.0%
GPT-4.1 Nano100100100100100100.0%
Gemma 3 4B100100100100100100.0%
WizardLM 2 8x22b100100100100100100.0%
Rocinante 12B100100100100100100.0%
Ministral 3 3B1001001001005891.5%
Ministral 8B1001001001001883.5%
Ministral 3 14B100100100941481.5%
Ministral 3 8B100100100100080.0%
Ministral 3B1001001000060.0%