Cliché density

Test: Bad Writing Habits

Avg. Score: 91.0%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Gemini 3 Flash (Preview) | 98.9% | $0.0078 | 19.6s | 88%
2 | Claude 3.5 Haiku | 98.1% | $0.0035 | 10.8s | 85%
3 | ByteDance Seed 1.6 Flash | 98.5% | $0.0013 | 27.3s | 86%
4 | Claude Sonnet 4.5 | 100.0% | $0.035 | 38.1s | 100%
5 | Ministral 3 14B | 97.4% | $0.0007 | 11.7s | 82%
6 | Claude Haiku 4.5 | 98.9% | $0.011 | 21.6s | 84%
7 | Mistral Large 2 | 98.5% | $0.013 | 29.4s | 86%
8 | DeepSeek V3 (2025-03-24) | 98.5% | $0.0014 | 39.4s | 83%
9 | GPT-5 Mini | 99.3% | $0.0100 | 57.4s | 90%
10 | Writer: Palmyra X5 | 97.8% | $0.011 | 22.0s | 83%
11 | Claude Sonnet 4.6 | 99.6% | $0.031 | 39.3s | 93%
12 | GPT-4.1 Mini | 96.3% | $0.0027 | 19.0s | 79%
13 | GPT-4o, Aug. 6th (temp=1) | 97.8% | $0.018 | 24.4s | 83%
14 | Arcee AI: Trinity Mini | 95.6% | $0.0003 | 9.2s | 75%
15 | Claude Sonnet 4 | 99.3% | $0.032 | 43.7s | 90%
16 | Claude 3.5 Sonnet | 99.6% | $0.048 | 35.5s | 93%
17 | Mistral Large 3 | 96.7% | $0.0033 | 30.3s | 78%
18 | Minimax M2.5 | 98.5% | $0.0034 | 1.3m | 86%
19 | Z.AI GLM 5 | 98.5% | $0.0084 | 1.2m | 86%
20 | GPT-4.1 | 97.8% | $0.018 | 44.7s | 83%
21 | Grok 4.1 Fast | 95.9% | $0.0018 | 37.8s | 76%
22 | Mistral Medium 3.1 | 97.0% | $0.0048 | 36.5s | 73%
23 | Grok 4 Fast | 94.8% | $0.0017 | 24.1s | 72%
24 | Mistral Large | 95.9% | $0.014 | 30.9s | 76%
25 | Qwen 3.5 Plus (2026-02-15) | 94.1% | $0.0060 | 31.5s | 73%
26 | Z.AI GLM 4.5 | 94.1% | $0.0051 | 42.1s | 75%
27 | Z.AI GLM 4.7 Flash | 96.7% | $0.0017 | 1.2m | 76%
28 | Rocinante 12B | 95.6% | $0.0014 | 38.4s | 68%
29 | Gemma 3 4B | 93.3% | $0.0002 | 20.0s | 65%
30 | o4 Mini | 94.1% | $0.015 | 25.7s | 69%
31 | Mistral Small Creative | 92.2% | $0.0007 | 9.1s | 61%
32 | Z.AI GLM 4.6 | 94.8% | $0.0065 | 51.5s | 70%
33 | Z.AI GLM 4.7 | 96.7% | $0.010 | 1.4m | 78%
34 | DeepSeek V3.1 | 96.7% | $0.0020 | 1.8m | 80%
35 | DeepSeek V3.2 | 96.7% | $0.0014 | 1.9m | 80%
36 | Gemma 3 27B | 93.7% | $0.0006 | 52.6s | 69%
37 | Gemini 2.5 Pro | 96.3% | $0.036 | 36.2s | 75%
38 | GPT-5 Nano | 95.6% | $0.0042 | 1.4m | 73%
39 | Ministral 8B | 91.1% | $0.0004 | 10.4s | 59%
40 | o4 Mini High | 95.2% | $0.025 | 47.2s | 71%
41 | Gemini 2.5 Flash Lite | 91.1% | $0.0009 | 9.5s | 57%
42 | Gemini 3 Pro (Preview) | 97.0% | $0.055 | 54.4s | 81%
43 | Ministral 3B | 90.7% | $0.0001 | 8.1s | 57%
44 | Claude Opus 4.6 | 99.3% | $0.078 | 1.2m | 90%
45 | Hermes 3 405B | 92.6% | $0.0032 | 53.2s | 64%
46 | Claude Opus 4.5 | 97.8% | $0.070 | 53.4s | 83%
47 | Gemma 3 12B | 90.7% | $0.0004 | 41.3s | 58%
48 | MoonshotAI: Kimi K2.5 | 99.3% | $0.019 | 3.2m | 90%
49 | GPT-4o Mini (temp=1) | 88.9% | $0.0012 | 34.8s | 59%
50 | Ministral 3 8B | 90.7% | $0.0008 | 19.6s | 49%
51 | Grok 4 | 96.3% | $0.048 | 1.7m | 79%
52 | WizardLM 2 8x22b | 93.3% | $0.0026 | 1.8m | 65%
53 | GPT-4.1 Nano | 86.3% | $0.0007 | 13.3s | 51%
54 | GPT-5.1 | 96.7% | $0.054 | 1.8m | 80%
55 | ByteDance Seed 1.6 | 95.9% | $0.013 | 2.5m | 74%
56 | Claude 3.7 Sonnet | 91.5% | $0.042 | 46.7s | 63%
57 | GPT-5 | 99.3% | $0.065 | 2.8m | 90%
58 | Gemini 3.1 Pro (Preview) | 99.3% | $0.107 | 1.8m | 90%
59 | Gemini 2.5 Flash | 84.4% | $0.0052 | 10.6s | 46%
60 | GPT-4o, May 13th (temp=1) | 87.0% | $0.033 | 14.4s | 54%
61 | Arcee AI: Trinity Large (Preview) | 86.7% | $0.0000 | 43.6s | 48%
62 | Ministral 3 3B | 85.6% | $0.0005 | 11.1s | 40%
63 | DeepSeek-V2 Chat | 86.3% | $0.0021 | 53.3s | 48%
64 | Llama 3.1 Nemotron 70B | 84.4% | $0.0038 | 31.7s | 46%
65 | Cohere Command R+ (Aug. 2024) | 86.7% | $0.020 | 52.5s | 55%
66 | Hermes 3 70B | 87.0% | $0.0010 | 1.2m | 49%
67 | DeepSeek V3 (2024-12-26) | 86.3% | $0.0021 | 54.6s | 46%
68 | Llama 3.1 8B | 85.9% | $0.0003 | 1.3m | 48%
69 | Qwen 3.5 397B A17B | 93.0% | $0.014 | 3.0m | 65%
70 | Mistral NeMO | 78.1% | $0.0005 | 10.1s | 36%
71 | GPT-5.2 | 88.5% | $0.056 | 1.5m | 59%
72 | Claude Opus 4 | 99.3% | $0.209 | 1.4m | 90%
73 | Llama 3.1 70B | 70.4% | $0.0015 | 29.4s | 24%
74 | Stealth: Aurora Alpha | 66.3% | $0.0000 | 9.8s | 23%
75 | Claude 3 Haiku | 64.4% | $0.0025 | 14.9s | 21%
76 | GPT-4o, Aug. 6th (temp=0) | 68.1% | $0.023 | 22.7s | 24%
77 | GPT-4o Mini (temp=0) | 63.7% | $0.0012 | 34.8s | 18%
78 | Qwen 2.5 72B | 61.1% | $0.0010 | 36.7s | 16%
79 | GPT-4o, May 13th (temp=0) | 53.0% | $0.035 | 14.1s | 18%
80 | Mistral Small 3.2 24B | 53.0% | $0.0069 | 5.7m | 9%
91.02%

Individual Scenarios

Detailed Writing Rules

Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 33 | 86.7%
Mistral Large 3 | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 405B | 100 | 100 | 100 | 100 | 33 | 86.7%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 33 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 67 | 0 | 73.3%
Claude 3 Haiku | 100 | 67 | 67 | 67 | 67 | 73.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 67 | 67 | 0 | 66.7%
GPT-4o, May 13th (temp=0) | 100 | 67 | 67 | 33 | 33 | 60.0%
Mistral NeMO | 100 | 67 | 67 | 67 | 0 | 60.0%
Stealth: Aurora Alpha | 67 | 33 | 33 | 33 | 0 | 33.3%
Mistral Small 3.2 24B | 33 | 0 | 0 | 0 | 0 | 6.7%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 8B | 100 | 100 | 100 | 67 | 67 | 86.7%
Rocinante 12B | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 33 | 86.7%
Stealth: Aurora Alpha | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 67 | 33 | 80.0%
Mistral NeMO | 100 | 100 | 100 | 33 | 33 | 73.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 33 | 0 | 66.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 33 | 0 | 66.7%
Claude 3 Haiku | 100 | 100 | 67 | 67 | 0 | 66.7%
GPT-4.1 Nano | 100 | 100 | 67 | 67 | 0 | 66.7%
Qwen 2.5 72B | 100 | 67 | 67 | 33 | 33 | 60.0%
Mistral Small 3.2 24B | 100 | 100 | 33 | 0 | 0 | 46.7%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 67 | 67 | 86.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.5 | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 27B | 100 | 100 | 100 | 67 | 67 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4.1 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 8B | 100 | 100 | 100 | 67 | 67 | 86.7%
Arcee AI: Trinity Mini | 100 | 100 | 67 | 67 | 67 | 80.0%
Z.AI GLM 4.6 | 100 | 100 | 67 | 67 | 67 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 67 | 33 | 80.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 33 | 33 | 73.3%
Ministral 3B | 100 | 100 | 100 | 33 | 33 | 73.3%
Qwen 3.5 397B A17B | 100 | 100 | 67 | 67 | 0 | 66.7%
o4 Mini High | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 33 | 0 | 66.7%
Gemma 3 4B | 100 | 100 | 67 | 67 | 0 | 66.7%
Stealth: Aurora Alpha | 67 | 67 | 67 | 67 | 33 | 60.0%
Qwen 2.5 72B | 100 | 67 | 67 | 33 | 0 | 53.3%
Mistral Small 3.2 24B | 100 | 100 | 67 | 0 | 0 | 53.3%
Ministral 3 3B | 100 | 100 | 33 | 33 | 0 | 53.3%
Llama 3.1 70B | 100 | 33 | 33 | 33 | 0 | 40.0%
GPT-4o Mini (temp=0) | 33 | 33 | 0 | 0 | 0 | 13.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Minimax M2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 2 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 67 | 67 | 86.7%
Stealth: Aurora Alpha | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 33 | 86.7%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 67 | 67 | 86.7%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 33 | 86.7%
o4 Mini High | 100 | 100 | 100 | 67 | 33 | 80.0%
Grok 4.1 Fast | 100 | 100 | 67 | 67 | 67 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 33 | 80.0%
Grok 4 Fast | 100 | 100 | 67 | 67 | 33 | 73.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 67 | 67 | 67 | 67 | 73.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 67 | 67 | 33 | 73.3%
Llama 3.1 70B | 100 | 67 | 67 | 67 | 33 | 66.7%
Llama 3.1 8B | 100 | 100 | 100 | 33 | 0 | 66.7%
Qwen 2.5 72B | 100 | 33 | 33 | 33 | 0 | 40.0%
GPT-4o Mini (temp=0) | 67 | 67 | 33 | 33 | 0 | 40.0%
Claude 3 Haiku | 100 | 67 | 33 | 0 | 0 | 40.0%
GPT-4o, May 13th (temp=0) | 67 | 67 | 33 | 0 | 0 | 33.3%
Mistral Small 3.2 24B | 100 | 33 | 0 | 0 | 0 | 26.7%
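Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Minimax M2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.5 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Large | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 4B | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral NeMO | 100 | 100 | 100 | 100 | 33 | 86.7%
o4 Mini | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 67 | 33 | 80.0%
Mistral Large 3 | 100 | 100 | 100 | 67 | 33 | 80.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 67 | 33 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-5.2 | 100 | 100 | 67 | 67 | 33 | 73.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 33 | 33 | 73.3%
DeepSeek-V2 Chat | 100 | 100 | 67 | 67 | 33 | 73.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 67 | 33 | 73.3%
Mistral Small Creative | 100 | 100 | 100 | 67 | 0 | 73.3%
Qwen 2.5 72B | 100 | 100 | 100 | 67 | 0 | 73.3%
Ministral 3 3B | 100 | 100 | 100 | 67 | 0 | 73.3%
Gemma 3 12B | 100 | 100 | 67 | 33 | 33 | 66.7%
Gemma 3 27B | 100 | 100 | 67 | 33 | 33 | 66.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4.1 Nano | 100 | 100 | 100 | 33 | 0 | 66.7%
Ministral 8B | 100 | 67 | 67 | 67 | 33 | 66.7%
Llama 3.1 70B | 100 | 67 | 67 | 33 | 33 | 60.0%
Llama 3.1 Nemotron 70B | 100 | 67 | 67 | 33 | 33 | 60.0%
Claude 3 Haiku | 100 | 67 | 67 | 67 | 0 | 60.0%
Ministral 3 8B | 100 | 100 | 100 | 0 | 0 | 60.0%
Stealth: Aurora Alpha | 100 | 67 | 67 | 33 | 0 | 53.3%
Gemini 2.5 Flash Lite | 100 | 67 | 33 | 33 | 0 | 46.7%
GPT-4o, May 13th (temp=0) | 67 | 33 | 0 | 0 | 0 | 20.0%
GPT-4o Mini (temp=0) | 67 | 0 | 0 | 0 | 0 | 13.3%
Mistral Small 3.2 24B | 67 | 0 | 0 | 0 | 0 | 13.3%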
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemma 3 27B | 100 | 100 | 100 | 100 | 33 | 86.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 3 8B | 100 | 100 | 100 | 100 | 33 | 86.7%
Ministral 3B | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 12B | 100 | 100 | 100 | 100 | 33 | 86.7%
Ministral 8B | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4.1 Nano | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 0 | 80.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 67 | 67 | 0 | 66.7%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 33 | 0 | 60.0%
Mistral Small 3.2 24B | 100 | 100 | 67 | 33 | 0 | 60.0%
Claude 3 Haiku | 100 | 67 | 67 | 67 | 0 | 60.0%
Llama 3.1 8B | 100 | 67 | 67 | 67 | 0 | 60.0%
Stealth: Aurora Alpha | 100 | 67 | 67 | 33 | 0 | 53.3%
Llama 3.1 70B | 100 | 67 | 67 | 0 | 0 | 46.7%
GPT-4o, May 13th (temp=0) | 67 | 67 | 67 | 0 | 0 | 40.0%
Qwen 2.5 72B | 100 | 33 | 33 | 0 | 0 | 33.3%
Mistral NeMO | 100 | 33 | 0 | 0 | 0 | 26.7%

genre

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 3 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small Creative | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 33 | 86.7%
Ministral 8B | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 3B | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
Qwen 2.5 72B | 100 | 100 | 100 | 67 | 33 | 80.0%
Mistral NeMO | 100 | 100 | 67 | 67 | 67 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 67 | 67 | 67 | 67 | 73.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 67 | 0 | 73.3%
Ministral 3 8B | 100 | 100 | 100 | 67 | 0 | 73.3%
Ministral 3 3B | 100 | 100 | 100 | 67 | 0 | 73.3%
Stealth: Aurora Alpha | 100 | 67 | 67 | 67 | 33 | 66.7%
Claude 3 Haiku | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o, May 13th (temp=0) | 100 | 100 | 67 | 33 | 0 | 60.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.1 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Small 3.2 24B | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 8B | 100 | 100 | 100 | 67 | 67 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Small Creative | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
Qwen 2.5 72B | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 67 | 33 | 73.3%
Ministral 3 3B | 100 | 100 | 100 | 67 | 0 | 73.3%
Mistral NeMO | 100 | 100 | 100 | 33 | 33 | 73.3%
Llama 3.1 70B | 100 | 67 | 67 | 67 | 33 | 66.7%
GPT-4.1 Nano | 100 | 67 | 67 | 33 | 33 | 60.0%
Stealth: Aurora Alpha | 100 | 67 | 67 | 33 | 0 | 53.3%
Claude 3 Haiku | 100 | 67 | 67 | 33 | 0 | 53.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small Creative | 100 | 100 | 100 | 100 | 67 | 93.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 33 | 86.7%
Stealth: Aurora Alpha | 100 | 100 | 100 | 67 | 67 | 86.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 33 | 86.7%
DeepSeek V3.1 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 33 | 86.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 3 3B | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 4B | 100 | 100 | 100 | 67 | 67 | 86.7%
WizardLM 2 8x22b | 100 | 100 | 100 | 67 | 67 | 86.7%
Rocinante 12B | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 33 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 67 | 33 | 80.0%
Claude 3 Haiku | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 67 | 67 | 33 | 73.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 67 | 0 | 73.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 33 | 33 | 73.3%
Llama 3.1 8B | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-5.2 | 67 | 67 | 67 | 67 | 67 | 66.7%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 67 | 0 | 66.7%
Gemma 3 12B | 100 | 67 | 67 | 67 | 33 | 66.7%
Hermes 3 70B | 100 | 100 | 67 | 67 | 0 | 66.7%
Mistral Small 3.2 24B | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemini 2.5 Flash | 67 | 67 | 33 | 33 | 0 | 40.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral NeMO | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-5 Nano | 100 | 100 | 100 | 100 | 33 | 86.7%
Z.AI GLM 4.5 | 100 | 100 | 100 | 67 | 67 | 86.7%
Claude 3.5 Haiku | 100 | 100 | 100 | 67 | 67 | 86.7%
DeepSeek V3.1 | 100 | 100 | 100 | 67 | 67 | 86.7%
Writer: Palmyra X5 | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemini 2.5 Flash | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 405B | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Small Creative | 100 | 100 | 100 | 67 | 67 | 86.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemini 2.5 Pro | 100 | 100 | 100 | 67 | 33 | 80.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
Ministral 3 3B | 100 | 100 | 100 | 67 | 33 | 80.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-4o Mini (temp=0) | 100 | 67 | 67 | 67 | 67 | 73.3%
Llama 3.1 70B | 100 | 100 | 67 | 67 | 33 | 73.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 67 | 0 | 73.3%
Ministral 8B | 100 | 100 | 100 | 33 | 33 | 73.3%
WizardLM 2 8x22b | 100 | 100 | 67 | 67 | 33 | 73.3%
Llama 3.1 8B | 100 | 100 | 67 | 67 | 0 | 66.7%
Stealth: Aurora Alpha | 100 | 67 | 67 | 33 | 0 | 53.3%
Qwen 2.5 72B | 100 | 67 | 67 | 33 | 0 | 53.3%
Claude 3 Haiku | 100 | 67 | 33 | 33 | 0 | 46.7%
Mistral Small 3.2 24B | 100 | 33 | 33 | 0 | 0 | 33.3%
GPT-4o, May 13th (temp=0) | 67 | 33 | 0 | 0 | 0 | 20.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Minimax M2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 3 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 2 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 4B | 100 | 100 | 100 | 100 | 67 | 93.3%
Rocinante 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.1 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5.2 | 100 | 100 | 100 | 100 | 33 | 86.7%
Claude 3.7 Sonnet | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 67 | 67 | 86.7%
Mistral Small 3.2 24B | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 3 3B | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 8B | 100 | 100 | 100 | 67 | 67 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 3B | 100 | 100 | 100 | 100 | 33 | 86.7%
Mistral Large | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 27B | 100 | 100 | 67 | 67 | 67 | 80.0%
Mistral Small Creative | 100 | 100 | 67 | 67 | 67 | 80.0%
Claude Opus 4.5 | 100 | 100 | 67 | 67 | 67 | 80.0%
Z.AI GLM 5 | 100 | 100 | 67 | 67 | 67 | 80.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 67 | 33 | 80.0%
Z.AI GLM 4.7 | 100 | 100 | 67 | 67 | 67 | 80.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 67 | 67 | 67 | 80.0%
Claude 3 Haiku | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 0 | 73.3%
Gemma 3 12B | 100 | 100 | 100 | 67 | 0 | 73.3%
Llama 3.1 70B | 100 | 100 | 67 | 67 | 33 | 73.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 67 | 0 | 73.3%
Ministral 3 8B | 100 | 100 | 67 | 67 | 33 | 73.3%
WizardLM 2 8x22b | 100 | 100 | 67 | 67 | 33 | 73.3%
Stealth: Aurora Alpha | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o Mini (temp=1) | 100 | 100 | 67 | 33 | 33 | 66.7%
Ministral 8B | 100 | 100 | 67 | 67 | 0 | 66.7%
GPT-4o, May 13th (temp=0) | 100 | 67 | 67 | 33 | 33 | 60.0%
DeepSeek-V2 Chat | 100 | 67 | 67 | 33 | 0 | 53.3%
Gemini 2.5 Flash | 67 | 67 | 67 | 67 | 0 | 53.3%
Mistral NeMO | 100 | 67 | 33 | 33 | 33 | 53.3%
Qwen 2.5 72B | 67 | 67 | 33 | 0 | 0 | 33.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 3 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small Creative | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 33 | 86.7%
Grok 4 Fast | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemma 3 12B | 100 | 100 | 100 | 100 | 33 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4.1 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 405B | 100 | 100 | 100 | 100 | 33 | 86.7%
Ministral 3B | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 67 | 33 | 80.0%
Stealth: Aurora Alpha | 100 | 100 | 67 | 67 | 33 | 73.3%
Llama 3.1 70B | 100 | 100 | 100 | 33 | 33 | 73.3%
Ministral 3 3B | 100 | 100 | 100 | 67 | 0 | 73.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 67 | 0 | 73.3%
Mistral NeMO | 100 | 100 | 100 | 33 | 0 | 66.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 67 | 67 | 0 | 0 | 46.7%
Mistral Small 3.2 24B | 100 | 67 | 67 | 0 | 0 | 46.7%
GPT-4o, May 13th (temp=0) | 100 | 33 | 33 | 0 | 0 | 33.3%
Qwen 2.5 72B | 100 | 33 | 33 | 0 | 0 | 33.3%
Claude 3 Haiku | 67 | 67 | 33 | 0 | 0 | 33.3%

Novelcrafter Default Prompt

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 33 | 86.7%
Hermes 3 405B | 100 | 100 | 100 | 67 | 67 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 4B | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 70B | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-5.2 | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 67 | 33 | 80.0%
Llama 3.1 70B | 100 | 100 | 100 | 67 | 33 | 80.0%
Qwen 2.5 72B | 100 | 100 | 67 | 67 | 67 | 80.0%
Claude 3 Haiku | 100 | 100 | 100 | 67 | 33 | 80.0%
Stealth: Aurora Alpha | 100 | 67 | 67 | 67 | 67 | 73.3%
GPT-4o, May 13th (temp=0) | 100 | 67 | 67 | 67 | 33 | 66.7%
Mistral NeMO | 100 | 100 | 67 | 67 | 0 | 66.7%
Mistral Small 3.2 24B | 100 | 100 | 33 | 0 | 0 | 46.7%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral NeMO | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen 3.5 397B A17B | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-5.2 | 100 | 100 | 100 | 67 | 33 | 80.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 Nano | 100 | 100 | 100 | 67 | 33 | 80.0%
Qwen 2.5 72B | 100 | 100 | 100 | 67 | 0 | 73.3%
Mistral Small 3.2 24B | 100 | 100 | 100 | 67 | 0 | 73.3%
Claude 3 Haiku | 100 | 100 | 67 | 67 | 33 | 73.3%
Stealth: Aurora Alpha | 100 | 100 | 67 | 67 | 0 | 66.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 67 | 33 | 33 | 33 | 53.3%
GPT-4o, May 13th (temp=0) | 100 | 67 | 33 | 0 | 0 | 40.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 4B | 100 | 100 | 100 | 100 | 67 | 93.3%
Rocinante 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.7 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5 Nano | 100 | 100 | 100 | 100 | 33 | 86.7%
DeepSeek V3.2 | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 67 | 67 | 86.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 33 | 86.7%
Mistral NeMO | 100 | 100 | 100 | 67 | 67 | 86.7%
Ministral 8B | 100 | 100 | 100 | 100 | 33 | 86.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 33 | 86.7%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 67 | 67 | 67 | 80.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 0 | 80.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 67 | 33 | 73.3%
GPT-4o Mini (temp=1) | 100 | 100 | 67 | 67 | 33 | 73.3%
Qwen 2.5 72B | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-5.2 | 100 | 100 | 67 | 33 | 33 | 66.7%
Gemini 2.5 Flash | 100 | 100 | 100 | 0 | 0 | 60.0%
Llama 3.1 70B | 100 | 67 | 67 | 33 | 0 | 53.3%
Mistral Small 3.2 24B | 100 | 100 | 0 | 0 | 0 | 40.0%
Claude 3 Haiku | 100 | 67 | 0 | 0 | 0 | 33.3%
GPT-4o Mini (temp=0) | 67 | 0 | 0 | 0 | 0 | 13.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 3 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 67 | 93.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral NeMO | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
o4 Mini | 100 | 100 | 100 | 100 | 33 | 86.7%
Grok 4 | 100 | 100 | 100 | 67 | 67 | 86.7%
Grok 4 Fast | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.5 | 100 | 100 | 100 | 67 | 67 | 86.7%
Claude 3.5 Haiku | 100 | 100 | 100 | 67 | 67 | 86.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 67 | 67 | 86.7%
Grok 4.1 Fast | 100 | 100 | 100 | 67 | 33 | 80.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 67 | 33 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 67 | 33 | 80.0%
GPT-4o Mini (temp=1) | 100 | 100 | 67 | 67 | 67 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 67 | 67 | 33 | 73.3%
GPT-4o Mini (temp=0) | 100 | 100 | 67 | 67 | 33 | 73.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 33 | 33 | 73.3%
Qwen 2.5 72B | 100 | 100 | 67 | 67 | 33 | 73.3%
Mistral Small 3.2 24B | 100 | 100 | 67 | 67 | 33 | 73.3%
Claude 3 Haiku | 100 | 67 | 67 | 67 | 67 | 73.3%
Llama 3.1 70B | 100 | 100 | 67 | 33 | 33 | 66.7%
Llama 3.1 8B | 100 | 67 | 67 | 33 | 33 | 60.0%
DeepSeek-V2 Chat | 100 | 67 | 67 | 67 | 0 | 60.0%
GPT-4o, May 13th (temp=0) | 100 | 67 | 33 | 0 | 0 | 40.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Opus 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 67 | 67 | 86.7%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5 Nano | 100 | 100 | 100 | 67 | 67 | 86.7%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 33 | 86.7%
Z.AI GLM 4.5 | 100 | 100 | 100 | 67 | 67 | 86.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 67 | 67 | 86.7%
Hermes 3 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Rocinante 12B | 100 | 100 | 100 | 100 | 33 | 86.7%
Qwen 3.5 397B A17B | 100 | 100 | 67 | 67 | 67 | 80.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 67 | 67 | 67 | 80.0%
o4 Mini | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 67 | 33 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 67 | 33 | 80.0%
Mistral Small Creative | 100 | 100 | 67 | 67 | 67 | 80.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 67 | 33 | 80.0%
Mistral NeMO | 100 | 100 | 67 | 67 | 67 | 80.0%
Gemma 3 4B | 100 | 100 | 100 | 67 | 33 | 80.0%
Z.AI GLM 4.7 | 100 | 100 | 67 | 67 | 33 | 73.3%
Claude 3 Haiku | 100 | 100 | 100 | 67 | 0 | 73.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 67 | 67 | 33 | 73.3%
Z.AI GLM 4.6 | 100 | 100 | 67 | 33 | 33 | 66.7%
GPT-4o, May 13th (temp=0) | 100 | 100 | 67 | 33 | 33 | 66.7%
Gemma 3 12B | 100 | 67 | 67 | 67 | 33 | 66.7%
Mistral Small 3.2 24B | 100 | 100 | 100 | 33 | 0 | 66.7%
GPT-4o Mini (temp=0) | 100 | 67 | 67 | 33 | 33 | 60.0%
Llama 3.1 70B | 100 | 100 | 67 | 33 | 0 | 60.0%
Gemini 2.5 Flash | 67 | 67 | 67 | 67 | 0 | 53.3%
Ministral 3B | 100 | 100 | 67 | 0 | 0 | 53.3%
Qwen 2.5 72B | 100 | 67 | 33 | 33 | 0 | 46.7%
Gemini 2.5 Flash Lite | 67 | 67 | 33 | 33 | 0 | 40.0%
GPT-4o, Aug. 6th (temp=0) | 67 | 67 | 33 | 0 | 0 | 33.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 67 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 67 | 93.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Small Creative | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 8B | 100 | 100 | 100 | 100 | 67 | 93.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 67 | 93.3%
Ministral 3B | 100 | 100 | 100 | 100 | 67 | 93.3%
Rocinante 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
Grok 4 | 100 | 100 | 100 | 67 | 67 | 86.7%
DeepSeek V3.2 | 100 | 100 | 100 | 67 | 67 | 86.7%
Gemini 2.5 Flash | 100 | 100 | 100 | 67 | 67 | 86.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemma 3 4B | 100 | 100 | 100 | 100 | 33 | 86.7%
Hermes 3 405B | 100 | 100 | 100 | 100 | 33 | 86.7%
o4 Mini | 100 | 100 | 67 | 67 | 67 | 80.0%
GPT-4.1 Mini | 100 | 100 | 67 | 67 | 67 | 80.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 67 | 0 | 73.3%
Mistral NeMO | 100 | 100 | 100 | 67 | 0 | 73.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 33 | 0 | 66.7%
GPT-4o, May 13th (temp=1) | 100 | 67 | 67 | 67 | 33 | 66.7%
Claude 3 Haiku | 100 | 67 | 67 | 67 | 33 | 66.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 0 | 0 | 60.0%
Hermes 3 70B | 100 | 100 | 33 | 33 | 0 | 53.3%
Llama 3.1 70B | 100 | 67 | 33 | 33 | 0 | 46.7%
GPT-4o, Aug. 6th (temp=0) | 67 | 67 | 33 | 33 | 33 | 46.7%
GPT-4o Mini (temp=0) | 100 | 67 | 33 | 33 | 0 | 46.7%
Llama 3.1 Nemotron 70B | 67 | 67 | 33 | 33 | 33 | 46.7%
Mistral Small 3.2 24B | 100 | 67 | 33 | 0 | 0 | 40.0%
Stealth: Aurora Alpha | 100 | 33 | 33 | 0 | 0 | 33.3%
Qwen 2.5 72B | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, May 13th (temp=0) | 33 | 33 | 0 | 0 | 0 | 13.3%