Paragraph length variance

Test: Bad Writing Habits

Avg. Score: 92.3%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Mistral Small Creative | 100.0% | $0.0007 | 9.1s | 100%
2 | Ministral 3 14B | 100.0% | $0.0007 | 11.7s | 100%
3 | Ministral 3 8B | 100.0% | $0.0008 | 19.6s | 100%
4 | Ministral 8B | 99.9% | $0.0004 | 10.4s | 98%
5 | ByteDance Seed 1.6 Flash | 100.0% | $0.0013 | 27.3s | 100%
6 | Mistral Large 3 | 100.0% | $0.0033 | 30.3s | 100%
7 | Ministral 3B | 99.5% | $0.0001 | 8.1s | 94%
8 | Writer: Palmyra X5 | 100.0% | $0.011 | 22.0s | 100%
9 | Mistral Medium 3.1 | 100.0% | $0.0048 | 36.5s | 100%
10 | Mistral Large 2 | 99.9% | $0.013 | 29.4s | 99%
11 | Gemma 3 27B | 99.9% | $0.0006 | 52.6s | 98%
12 | Gemini 3 Flash (Preview) | 99.3% | $0.0078 | 19.6s | 94%
13 | Minimax M2.5 | 100.0% | $0.0034 | 1.3m | 100%
14 | Mistral Large | 99.6% | $0.014 | 30.9s | 93%
15 | Gemma 3 12B | 98.9% | $0.0004 | 41.3s | 92%
16 | Ministral 3 3B | 98.6% | $0.0005 | 11.1s | 85%
17 | Claude Sonnet 4.6 | 100.0% | $0.031 | 39.3s | 100%
18 | Qwen 3.5 Plus (2026-02-15) | 98.0% | $0.0060 | 31.5s | 89%
19 | Claude Haiku 4.5 | 99.0% | $0.011 | 21.6s | 87%
20 | GPT-4.1 Mini | 97.6% | $0.0027 | 19.0s | 85%
21 | DeepSeek V3 (2025-03-24) | 98.9% | $0.0014 | 39.4s | 87%
22 | Z.AI GLM 4.7 | 99.8% | $0.010 | 1.4m | 97%
23 | Claude Sonnet 4.5 | 99.4% | $0.035 | 38.1s | 93%
24 | Gemma 3 4B | 97.2% | $0.0002 | 20.0s | 79%
25 | Z.AI GLM 4.7 Flash | 99.0% | $0.0017 | 1.2m | 88%
26 | DeepSeek V3.2 | 99.7% | $0.0014 | 1.9m | 96%
27 | GPT-4.1 | 98.3% | $0.018 | 44.7s | 87%
28 | Z.AI GLM 5 | 99.0% | $0.0084 | 1.2m | 87%
29 | Claude 3.7 Sonnet | 99.0% | $0.042 | 46.7s | 93%
30 | DeepSeek V3 (2024-12-26) | 97.9% | $0.0021 | 54.6s | 82%
31 | Grok 4.1 Fast | 96.7% | $0.0018 | 37.8s | 79%
32 | Claude 3.5 Haiku | 94.5% | $0.0035 | 10.8s | 77%
33 | Claude Opus 4.5 | 100.0% | $0.070 | 53.4s | 100%
34 | DeepSeek-V2 Chat | 97.5% | $0.0021 | 53.3s | 79%
35 | Gemini 3 Pro (Preview) | 99.7% | $0.055 | 54.4s | 94%
36 | DeepSeek V3.1 | 98.6% | $0.0020 | 1.8m | 88%
37 | GPT-5.2 | 100.0% | $0.056 | 1.5m | 100%
38 | GPT-5.1 | 100.0% | $0.054 | 1.8m | 100%
39 | Claude Opus 4.6 | 100.0% | $0.078 | 1.2m | 100%
40 | Z.AI GLM 4.5 | 93.8% | $0.0051 | 42.1s | 75%
41 | Z.AI GLM 4.6 | 95.9% | $0.0065 | 51.5s | 74%
42 | GPT-5 Mini | 95.2% | $0.0100 | 57.4s | 77%
43 | GPT-4o Mini (temp=0) | 94.5% | $0.0012 | 34.8s | 69%
44 | Gemini 2.5 Pro | 96.4% | $0.036 | 36.2s | 78%
45 | Claude Sonnet 4 | 95.3% | $0.032 | 43.7s | 77%
46 | GPT-4o Mini (temp=1) | 91.1% | $0.0012 | 34.8s | 71%
47 | MoonshotAI: Kimi K2.5 | 99.7% | $0.019 | 3.2m | 96%
48 | Qwen 2.5 72B | 91.0% | $0.0010 | 36.7s | 67%
49 | Qwen 3.5 397B A17B | 98.6% | $0.014 | 3.0m | 90%
50 | GPT-5 | 100.0% | $0.065 | 2.8m | 100%
51 | Arcee AI: Trinity Large (Preview) | 88.4% | $0.0000 | 43.6s | 67%
52 | GPT-4o, May 13th (temp=1) | 89.7% | $0.033 | 14.4s | 69%
53 | Llama 3.1 8B | 91.7% | $0.0003 | 1.3m | 60%
54 | Gemini 3.1 Pro (Preview) | 99.0% | $0.107 | 1.8m | 90%
55 | Gemini 2.5 Flash | 86.7% | $0.0052 | 10.6s | 54%
56 | o4 Mini | 88.6% | $0.015 | 25.7s | 55%
57 | GPT-4o, May 13th (temp=0) | 89.6% | $0.035 | 14.1s | 56%
58 | GPT-4.1 Nano | 83.7% | $0.0007 | 13.3s | 54%
59 | Rocinante 12B | 83.5% | $0.0014 | 38.4s | 52%
60 | Claude 3.5 Sonnet | 86.7% | $0.048 | 35.5s | 62%
61 | Gemini 2.5 Flash Lite | 82.0% | $0.0009 | 9.5s | 46%
62 | Grok 4 Fast | 82.5% | $0.0017 | 24.1s | 48%
63 | WizardLM 2 8x22b | 88.5% | $0.0026 | 1.8m | 53%
64 | GPT-4o, Aug. 6th (temp=0) | 82.3% | $0.023 | 22.7s | 51%
65 | o4 Mini High | 85.9% | $0.025 | 47.2s | 47%
66 | Llama 3.1 70B | 79.4% | $0.0015 | 29.4s | 45%
67 | GPT-4o, Aug. 6th (temp=1) | 79.5% | $0.018 | 24.4s | 50%
68 | Cohere Command R+ (Aug. 2024) | 82.1% | $0.020 | 52.5s | 50%
69 | Mistral NeMO | 77.6% | $0.0005 | 10.1s | 41%
70 | Grok 4 | 86.7% | $0.048 | 1.7m | 59%
71 | Hermes 3 405B | 78.0% | $0.0032 | 53.2s | 48%
72 | Hermes 3 70B | 79.8% | $0.0010 | 1.2m | 47%
73 | ByteDance Seed 1.6 | 86.1% | $0.013 | 2.5m | 56%
74 | Claude Opus 4 | 98.5% | $0.209 | 1.4m | 89%
75 | Llama 3.1 Nemotron 70B | 76.1% | $0.0038 | 31.7s | 41%
76 | Claude 3 Haiku | 72.6% | $0.0025 | 14.9s | 38%
77 | Stealth: Aurora Alpha | 69.2% | $0.0000 | 9.8s | 38%
78 | GPT-5 Nano | 65.5% | $0.0042 | 1.4m | 43%
79 | Arcee AI: Trinity Mini | 64.3% | $0.0003 | 9.2s | 20%
80 | Mistral Small 3.2 24B | 61.8% | $0.0069 | 5.7m | 20%
Average score: 92.29%

Individual Scenarios

Detailed Writing Rules

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 99 | 99.8%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 98 | 99.6%
Gemma 3 27B | 100 | 100 | 100 | 100 | 97 | 99.5%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 96 | 99.2%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 91 | 98.2%
Claude Opus 4 | 100 | 100 | 100 | 100 | 91 | 98.2%
o4 Mini | 100 | 100 | 100 | 100 | 91 | 98.1%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 98 | 91 | 97.9%
Gemini 2.5 Flash | 100 | 100 | 100 | 97 | 92 | 97.8%
Qwen 2.5 72B | 100 | 100 | 100 | 93 | 91 | 96.9%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 84 | 96.8%
o4 Mini High | 100 | 100 | 100 | 100 | 83 | 96.6%
Gemma 3 12B | 100 | 100 | 100 | 99 | 75 | 94.8%
Claude 3.5 Haiku | 100 | 100 | 98 | 88 | 85 | 94.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 98 | 89 | 83 | 93.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 90 | 79 | 93.8%
Z.AI GLM 4.5 | 100 | 100 | 100 | 86 | 82 | 93.6%
Claude 3.5 Sonnet | 100 | 100 | 91 | 87 | 83 | 92.2%
Z.AI GLM 4.6 | 100 | 100 | 100 | 85 | 71 | 91.2%
GPT-4o Mini (temp=1) | 100 | 100 | 90 | 87 | 78 | 91.0%
Grok 4 | 100 | 100 | 100 | 79 | 70 | 89.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 86 | 81 | 62 | 85.8%
Hermes 3 405B | 100 | 100 | 83 | 83 | 59 | 84.9%
Rocinante 12B | 100 | 100 | 100 | 63 | 60 | 84.5%
Hermes 3 70B | 100 | 99 | 95 | 70 | 55 | 83.9%
Ministral 3 3B | 100 | 100 | 100 | 75 | 36 | 82.3%
Mistral NeMO | 100 | 96 | 93 | 76 | 47 | 82.3%
ByteDance Seed 1.6 | 100 | 100 | 75 | 73 | 61 | 81.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 81 | 79 | 75 | 73 | 81.7%
Llama 3.1 70B | 100 | 100 | 83 | 69 | 56 | 81.6%
Mistral Small 3.2 24B | 100 | 100 | 90 | 67 | 50 | 81.3%
GPT-4.1 Nano | 98 | 97 | 83 | 63 | 62 | 80.4%
Claude 3 Haiku | 100 | 100 | 82 | 67 | 52 | 80.1%
GPT-5 Nano | 97 | 85 | 76 | 69 | 65 | 78.2%
Llama 3.1 Nemotron 70B | 88 | 86 | 83 | 74 | 55 | 77.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 91 | 78 | 13 | 76.5%
Llama 3.1 8B | 100 | 100 | 100 | 53 | 28 | 76.3%
Stealth: Aurora Alpha | 97 | 69 | 48 | 42 | 41 | 59.2%
Arcee AI: Trinity Mini | 80 | 47 | 32 | 15 | 0 | 34.9%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 99 | 99.8%
Ministral 3 3B | 100 | 100 | 100 | 100 | 96 | 99.2%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 88 | 97.6%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 87 | 97.5%
Llama 3.1 8B | 100 | 100 | 100 | 95 | 91 | 97.1%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 75 | 95.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 74 | 94.7%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 53 | 90.6%
GPT-4o Mini (temp=0) | 100 | 100 | 99 | 95 | 56 | 90.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 46 | 89.2%
Z.AI GLM 4.5 | 100 | 98 | 93 | 87 | 65 | 88.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 92 | 49 | 88.2%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 82 | 49 | 86.2%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 62 | 59 | 84.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 20 | 83.9%
Rocinante 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 4B | 100 | 100 | 100 | 49 | 31 | 75.9%
Hermes 3 405B | 99 | 92 | 64 | 61 | 54 | 74.0%
o4 Mini High | 100 | 100 | 74 | 47 | 44 | 72.9%
GPT-4o, May 13th (temp=1) | 100 | 83 | 65 | 59 | 50 | 71.4%
Stealth: Aurora Alpha | 87 | 72 | 71 | 65 | 52 | 69.4%
GPT-4.1 Nano | 100 | 74 | 74 | 63 | 34 | 69.1%
o4 Mini | 87 | 75 | 74 | 62 | 45 | 68.5%
Grok 4.1 Fast | 100 | 75 | 66 | 52 | 46 | 68.0%
Cohere Command R+ (Aug. 2024) | 100 | 81 | 74 | 60 | 24 | 67.8%
Gemini 2.5 Flash Lite | 96 | 92 | 88 | 57 | 4 | 67.3%
Mistral Small 3.2 24B | 100 | 100 | 68 | 50 | 0 | 63.8%
Mistral NeMO | 91 | 87 | 86 | 49 | 6 | 63.6%
Qwen 2.5 72B | 100 | 82 | 62 | 40 | 34 | 63.5%
GPT-4o, Aug. 6th (temp=1) | 81 | 66 | 57 | 56 | 41 | 60.0%
Hermes 3 70B | 100 | 86 | 69 | 31 | 0 | 57.0%
GPT-4o, May 13th (temp=0) | 76 | 72 | 72 | 62 | 0 | 56.5%
Llama 3.1 70B | 100 | 94 | 50 | 17 | 14 | 54.9%
GPT-5 Nano | 70 | 59 | 47 | 47 | 42 | 53.0%
Grok 4 | 77 | 64 | 39 | 29 | 18 | 45.4%
Grok 4 Fast | 63 | 48 | 43 | 43 | 26 | 44.5%
Claude 3 Haiku | 100 | 81 | 18 | 12 | 10 | 44.3%
GPT-4o, Aug. 6th (temp=0) | 61 | 53 | 44 | 34 | 21 | 42.7%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 97 | 99.5%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 96 | 99.2%
GPT-4.1 Nano | 100 | 100 | 100 | 98 | 92 | 98.2%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 97 | 93 | 98.1%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 89 | 97.8%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 96 | 91 | 97.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 95 | 87 | 96.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 73 | 94.7%
Mistral NeMO | 100 | 100 | 100 | 96 | 71 | 93.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 66 | 93.2%
Gemma 3 4B | 100 | 100 | 100 | 100 | 65 | 93.1%
Hermes 3 405B | 100 | 100 | 100 | 100 | 65 | 93.1%
Llama 3.1 70B | 100 | 100 | 100 | 86 | 79 | 93.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 84 | 79 | 92.6%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 89 | 70 | 91.8%
Claude 3 Haiku | 100 | 100 | 95 | 90 | 73 | 91.5%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 57 | 91.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 96 | 95 | 83 | 80 | 90.9%
Rocinante 12B | 100 | 100 | 100 | 78 | 73 | 90.2%
Hermes 3 70B | 100 | 100 | 88 | 87 | 74 | 89.7%
ByteDance Seed 1.6 | 100 | 98 | 94 | 94 | 50 | 87.3%
Arcee AI: Trinity Mini | 100 | 100 | 99 | 93 | 39 | 86.1%
Grok 4 | 100 | 100 | 99 | 75 | 57 | 86.1%
Grok 4 Fast | 100 | 100 | 83 | 76 | 67 | 85.2%
Stealth: Aurora Alpha | 100 | 90 | 84 | 62 | 52 | 77.7%
GPT-5 Nano | 85 | 81 | 71 | 71 | 54 | 72.5%
Mistral Small 3.2 24B | 85 | 84 | 60 | 53 | 7 | 57.8%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 96 | 99.3%
Gemma 3 12B | 100 | 100 | 100 | 100 | 96 | 99.2%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 93 | 98.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 91 | 98.3%
GPT-4.1 Nano | 100 | 100 | 100 | 96 | 93 | 97.9%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 89 | 97.7%
Grok 4 Fast | 100 | 100 | 100 | 95 | 93 | 97.5%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 97 | 89 | 97.2%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 85 | 97.1%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 85 | 97.1%
Grok 4 | 100 | 100 | 100 | 100 | 84 | 96.8%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 95 | 88 | 96.6%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 81 | 96.2%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 78 | 95.6%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 90 | 79 | 93.8%
Hermes 3 70B | 100 | 100 | 100 | 89 | 75 | 92.8%
Llama 3.1 8B | 100 | 100 | 100 | 87 | 76 | 92.5%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 61 | 92.3%
Hermes 3 405B | 100 | 100 | 100 | 86 | 74 | 92.0%
Llama 3.1 70B | 100 | 100 | 96 | 88 | 75 | 91.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 80 | 72 | 90.4%
ByteDance Seed 1.6 | 100 | 100 | 100 | 96 | 54 | 90.0%
Gemini 2.5 Flash | 100 | 100 | 99 | 73 | 70 | 88.5%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 97 | 40 | 87.5%
Llama 3.1 Nemotron 70B | 100 | 100 | 97 | 83 | 54 | 86.8%
Mistral Small 3.2 24B | 100 | 95 | 87 | 85 | 41 | 81.5%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 6 | 81.2%
GPT-4o, Aug. 6th (temp=1) | 99 | 85 | 77 | 74 | 62 | 79.3%
GPT-5 Nano | 87 | 83 | 81 | 73 | 71 | 79.1%
Stealth: Aurora Alpha | 100 | 70 | 64 | 46 | 26 | 61.2%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 98 | 99.6%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 97 | 99.4%
Gemma 3 12B | 100 | 100 | 100 | 100 | 96 | 99.2%
Gemma 3 4B | 100 | 100 | 100 | 100 | 88 | 97.6%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 88 | 97.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 99 | 99 | 88 | 97.1%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 97 | 86 | 96.6%
o4 Mini High | 100 | 100 | 100 | 100 | 81 | 96.2%
GPT-4o Mini (temp=1) | 100 | 100 | 98 | 92 | 86 | 95.2%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 70 | 94.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 99 | 70 | 93.8%
Mistral NeMO | 100 | 100 | 100 | 93 | 76 | 93.7%
Grok 4 | 100 | 100 | 100 | 97 | 68 | 93.1%
Llama 3.1 70B | 100 | 100 | 100 | 83 | 57 | 88.1%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 94 | 85 | 60 | 88.0%
Gemini 2.5 Flash Lite | 100 | 100 | 86 | 82 | 71 | 87.7%
Hermes 3 405B | 100 | 100 | 86 | 85 | 60 | 86.3%
ByteDance Seed 1.6 | 100 | 100 | 94 | 76 | 59 | 85.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 84 | 71 | 65 | 84.0%
Hermes 3 70B | 100 | 100 | 80 | 73 | 64 | 83.3%
Claude 3 Haiku | 100 | 99 | 94 | 80 | 39 | 82.3%
GPT-4o Mini (temp=0) | 94 | 88 | 85 | 71 | 70 | 81.7%
Grok 4 Fast | 100 | 100 | 81 | 71 | 57 | 81.6%
GPT-4.1 Nano | 93 | 83 | 78 | 73 | 61 | 77.6%
Arcee AI: Trinity Mini | 100 | 100 | 95 | 57 | 36 | 77.5%
Rocinante 12B | 100 | 100 | 100 | 63 | 17 | 76.0%
Mistral Small 3.2 24B | 100 | 100 | 85 | 79 | 0 | 72.9%
GPT-5 Nano | 95 | 84 | 69 | 67 | 49 | 72.9%
Stealth: Aurora Alpha | 83 | 77 | 73 | 61 | 60 | 70.7%
Llama 3.1 Nemotron 70B | 100 | 73 | 66 | 60 | 22 | 64.0%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 98 | 99.7%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 97 | 99.5%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 87 | 97.5%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 86 | 97.3%
o4 Mini | 100 | 100 | 100 | 100 | 86 | 97.3%
Qwen 2.5 72B | 100 | 100 | 100 | 99 | 87 | 97.3%
Ministral 3B | 100 | 100 | 100 | 100 | 85 | 97.1%
Grok 4 | 100 | 100 | 100 | 99 | 86 | 97.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 85 | 96.9%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 81 | 96.2%
GPT-4o, May 13th (temp=0) | 100 | 100 | 96 | 91 | 90 | 95.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 99 | 73 | 94.4%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 96 | 74 | 94.1%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 65 | 93.1%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 66 | 93.1%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 64 | 92.8%
Claude 3.5 Sonnet | 100 | 100 | 98 | 93 | 71 | 92.4%
GPT-4.1 Nano | 100 | 100 | 100 | 88 | 69 | 91.5%
Stealth: Aurora Alpha | 100 | 100 | 95 | 83 | 77 | 91.2%
Hermes 3 405B | 100 | 100 | 100 | 92 | 63 | 91.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 79 | 71 | 90.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 45 | 88.9%
GPT-4o, May 13th (temp=1) | 100 | 99 | 90 | 77 | 70 | 87.2%
Hermes 3 70B | 100 | 100 | 100 | 77 | 58 | 87.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 31 | 86.2%
Z.AI GLM 4.6 | 100 | 100 | 100 | 92 | 34 | 85.1%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 84 | 71 | 62 | 83.4%
Claude 3 Haiku | 100 | 100 | 82 | 67 | 66 | 82.9%
ByteDance Seed 1.6 | 100 | 100 | 75 | 68 | 58 | 80.3%
Gemini 2.5 Flash | 100 | 99 | 87 | 64 | 41 | 77.9%
Rocinante 12B | 100 | 82 | 78 | 66 | 61 | 77.5%
GPT-4o, Aug. 6th (temp=0) | 100 | 89 | 66 | 64 | 54 | 74.7%
Llama 3.1 70B | 100 | 100 | 85 | 61 | 13 | 71.9%
GPT-5 Nano | 71 | 70 | 67 | 65 | 60 | 66.5%
Mistral NeMO | 87 | 61 | 33 | 27 | 25 | 46.3%
Gemini 2.5 Flash Lite | 81 | 68 | 56 | 11 | 0 | 43.2%
Mistral Small 3.2 24B | 73 | 67 | 26 | 0 | 0 | 33.2%

genre

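Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 99 | 99.6%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 95 | 99.1%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 95 | 99.0%
Ministral 3B | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 90 | 98.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 88 | 97.5%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 97 | 90 | 97.4%
GPT-4.1 | 100 | 100 | 99 | 97 | 90 | 97.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 95 | 90 | 97.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 96 | 86 | 96.4%
Gemma 3 12B | 100 | 100 | 100 | 100 | 82 | 96.4%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 82 | 96.4%
Ministral 3 3B | 100 | 100 | 100 | 100 | 80 | 95.9%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 78 | 95.6%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 76 | 95.2%
Claude 3.7 Sonnet | 100 | 100 | 100 | 95 | 80 | 95.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 97 | 77 | 94.7%
Z.AI GLM 4.5 | 100 | 100 | 98 | 96 | 70 | 92.7%
Gemma 3 4B | 100 | 100 | 99 | 97 | 64 | 92.2%
Llama 3.1 8B | 100 | 100 | 100 | 87 | 68 | 90.9%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 96 | 89 | 70 | 90.9%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 82 | 72 | 90.8%
Z.AI GLM 4.6 | 100 | 100 | 89 | 85 | 76 | 90.0%
Gemini 3 Flash (Preview) | 100 | 91 | 88 | 85 | 83 | 89.5%
ByteDance Seed 1.6 | 100 | 100 | 93 | 78 | 75 | 89.2%
DeepSeek-V2 Chat | 100 | 100 | 100 | 83 | 62 | 89.0%
GPT-4.1 Mini | 100 | 100 | 95 | 80 | 62 | 87.3%
Gemini 2.5 Flash Lite | 100 | 100 | 92 | 76 | 68 | 87.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 94 | 79 | 64 | 87.3%
Arcee AI: Trinity Large (Preview) | 98 | 98 | 92 | 87 | 58 | 86.6%
GPT-4o Mini (temp=1) | 100 | 100 | 94 | 80 | 46 | 83.9%
o4 Mini | 87 | 84 | 84 | 82 | 80 | 83.4%
Qwen 2.5 72B | 100 | 100 | 86 | 66 | 57 | 81.7%
Rocinante 12B | 100 | 100 | 100 | 53 | 50 | 80.6%
GPT-5 Mini | 100 | 92 | 81 | 69 | 56 | 79.6%
Hermes 3 70B | 100 | 92 | 81 | 74 | 46 | 78.5%
Claude 3 Haiku | 100 | 86 | 82 | 74 | 49 | 78.3%
GPT-4o, Aug. 6th (temp=0) | 89 | 84 | 73 | 69 | 67 | 76.7%
Claude 3.5 Sonnet | 100 | 86 | 77 | 69 | 39 | 74.3%
Claude Sonnet 4 | 96 | 87 | 69 | 63 | 55 | 74.1%
Grok 4 | 100 | 90 | 73 | 70 | 35 | 73.6%
Grok 4 Fast | 100 | 87 | 86 | 73 | 19 | 73.0%
Mistral NeMO | 94 | 84 | 65 | 55 | 44 | 68.4%
Llama 3.1 70B | 98 | 69 | 58 | 58 | 43 | 65.3%
Claude 3.5 Haiku | 84 | 73 | 64 | 51 | 51 | 64.6%
Hermes 3 405B | 100 | 77 | 60 | 55 | 30 | 64.3%
GPT-5 Nano | 74 | 68 | 63 | 60 | 56 | 64.2%
Cohere Command R+ (Aug. 2024) | 92 | 91 | 54 | 38 | 34 | 61.9%
GPT-4.1 Nano | 87 | 76 | 66 | 56 | 22 | 61.3%
Arcee AI: Trinity Mini | 100 | 93 | 66 | 26 | 22 | 61.1%
o4 Mini High | 100 | 85 | 71 | 0 | 0 | 51.1%
GPT-4o, Aug. 6th (temp=1) | 61 | 51 | 49 | 46 | 41 | 49.6%
Stealth: Aurora Alpha | 52 | 51 | 48 | 42 | 42 | 46.8%
Llama 3.1 Nemotron 70B | 85 | 47 | 28 | 25 | 16 | 40.1%
WizardLM 2 8x22b | 53 | 41 | 39 | 27 | 14 | 34.8%
Mistral Small 3.2 24B | 43 | 5 | 5 | 0 | 0 | 10.5%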
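
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 95 | 99.1%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 90 | 98.1%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 87 | 97.4%
Claude 3.7 Sonnet | 100 | 100 | 100 | 93 | 91 | 96.9%
Gemma 3 4B | 100 | 100 | 100 | 100 | 82 | 96.5%
DeepSeek-V2 Chat | 100 | 100 | 100 | 98 | 74 | 94.3%
Z.AI GLM 4.5 | 100 | 100 | 94 | 91 | 85 | 93.9%
Claude Sonnet 4 | 100 | 100 | 100 | 92 | 71 | 92.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 87 | 87 | 82 | 91.2%
Gemini 2.5 Flash | 100 | 98 | 96 | 86 | 74 | 91.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 47 | 89.5%
GPT-4.1 | 100 | 100 | 100 | 75 | 70 | 89.0%
Arcee AI: Trinity Large (Preview) | 97 | 92 | 91 | 83 | 80 | 88.5%
Grok 4.1 Fast | 100 | 100 | 96 | 78 | 58 | 86.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 98 | 60 | 58 | 83.3%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 2 | 80.3%
Qwen 2.5 72B | 100 | 100 | 83 | 57 | 56 | 79.2%
GPT-5 Mini | 100 | 94 | 72 | 61 | 55 | 76.5%
Cohere Command R+ (Aug. 2024) | 100 | 85 | 82 | 67 | 38 | 74.4%
GPT-4o, May 13th (temp=1) | 100 | 88 | 80 | 64 | 27 | 71.7%
GPT-4o Mini (temp=1) | 81 | 72 | 70 | 70 | 46 | 68.0%
Rocinante 12B | 100 | 68 | 65 | 54 | 49 | 67.2%
Hermes 3 70B | 100 | 100 | 65 | 57 | 1 | 64.6%
Gemini 2.5 Flash Lite | 91 | 57 | 56 | 51 | 46 | 60.2%
Llama 3.1 Nemotron 70B | 94 | 88 | 52 | 34 | 31 | 59.8%
Hermes 3 405B | 85 | 58 | 56 | 48 | 37 | 56.8%
GPT-4.1 Nano | 100 | 65 | 64 | 33 | 22 | 56.8%
Mistral Small 3.2 24B | 100 | 89 | 40 | 29 | 24 | 56.4%
Claude 3.5 Sonnet | 89 | 81 | 49 | 30 | 27 | 55.1%
GPT-4o, Aug. 6th (temp=1) | 89 | 65 | 49 | 46 | 8 | 51.5%
Grok 4 | 100 | 57 | 36 | 35 | 27 | 51.1%
GPT-4o Mini (temp=0) | 100 | 86 | 35 | 25 | 0 | 49.3%
GPT-4o, May 13th (temp=0) | 100 | 68 | 47 | 18 | 5 | 47.6%
Mistral NeMO | 100 | 53 | 33 | 31 | 21 | 47.4%
GPT-4o, Aug. 6th (temp=0) | 78 | 58 | 50 | 21 | 14 | 44.3%
o4 Mini | 94 | 90 | 18 | 15 | 0 | 43.5%
WizardLM 2 8x22b | 73 | 50 | 38 | 31 | 23 | 43.0%
Stealth: Aurora Alpha | 67 | 56 | 44 | 29 | 18 | 42.9%
ByteDance Seed 1.6 | 100 | 70 | 14 | 9 | 6 | 39.9%
o4 Mini High | 81 | 41 | 34 | 24 | 14 | 38.8%
Llama 3.1 70B | 67 | 50 | 25 | 20 | 17 | 35.9%
Claude 3 Haiku | 67 | 44 | 35 | 20 | 0 | 33.1%
GPT-5 Nano | 42 | 29 | 28 | 26 | 20 | 29.1%
Grok 4 Fast | 56 | 47 | 31 | 6 | 0 | 27.9%
Arcee AI: Trinity Mini | 23 | 5 | 3 | 0 | 0 | 6.1%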

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 98 | 99.6%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 97 | 99.4%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 95 | 98.9%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 91 | 98.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 97 | 91 | 97.6%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 87 | 97.5%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 86 | 97.3%
GPT-5 Mini | 100 | 100 | 100 | 100 | 86 | 97.2%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 99 | 86 | 97.1%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 98 | 80 | 95.6%
Grok 4 Fast | 100 | 100 | 100 | 100 | 73 | 94.5%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 68 | 93.7%
Grok 4 | 100 | 100 | 100 | 84 | 77 | 92.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 84 | 77 | 92.2%
Llama 3.1 Nemotron 70B | 100 | 100 | 88 | 87 | 81 | 91.1%
Mistral NeMO | 100 | 100 | 100 | 96 | 55 | 90.3%
Hermes 3 70B | 100 | 100 | 83 | 82 | 81 | 89.1%
GPT-4.1 Nano | 100 | 100 | 89 | 85 | 47 | 84.3%
Stealth: Aurora Alpha | 100 | 94 | 87 | 84 | 52 | 83.5%
Hermes 3 405B | 100 | 100 | 91 | 70 | 34 | 78.9%
Rocinante 12B | 100 | 100 | 71 | 59 | 51 | 76.2%
Claude 3 Haiku | 100 | 89 | 84 | 64 | 36 | 74.8%
Cohere Command R+ (Aug. 2024) | 89 | 78 | 77 | 63 | 59 | 73.2%
Arcee AI: Trinity Mini | 100 | 73 | 61 | 50 | 48 | 66.2%
GPT-5 Nano | 80 | 75 | 67 | 63 | 39 | 64.6%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 98 | 99.5%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 96 | 99.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 95 | 99.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 95 | 99.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude Opus 4 | 100 | 100 | 100 | 100 | 89 | 97.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 88 | 97.7%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 87 | 97.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 95 | 92 | 97.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 86 | 97.2%
Mistral NeMO | 100 | 100 | 100 | 100 | 85 | 97.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 98 | 80 | 95.6%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 76 | 95.1%
Grok 4 | 100 | 100 | 100 | 89 | 84 | 94.6%
Claude 3.5 Sonnet | 100 | 100 | 100 | 91 | 81 | 94.4%
Claude 3.5 Haiku | 100 | 100 | 99 | 99 | 71 | 93.9%
Rocinante 12B | 100 | 100 | 98 | 89 | 81 | 93.6%
GPT-5 Mini | 100 | 100 | 100 | 100 | 68 | 93.6%
o4 Mini | 100 | 100 | 100 | 85 | 80 | 93.2%
Claude 3 Haiku | 100 | 100 | 100 | 82 | 69 | 90.1%
GPT-4o, May 13th (temp=1) | 100 | 100 | 91 | 88 | 65 | 88.8%
Llama 3.1 Nemotron 70B | 100 | 100 | 84 | 81 | 77 | 88.4%
Llama 3.1 8B | 100 | 100 | 100 | 74 | 66 | 88.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 80 | 60 | 87.9%
Stealth: Aurora Alpha | 100 | 100 | 85 | 83 | 68 | 87.1%
Qwen 2.5 72B | 100 | 100 | 99 | 72 | 64 | 86.9%
Mistral Small 3.2 24B | 100 | 100 | 92 | 73 | 56 | 84.3%
Hermes 3 70B | 100 | 100 | 96 | 76 | 47 | 83.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 75 | 69 | 69 | 82.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 97 | 77 | 69 | 67 | 82.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 60 | 46 | 81.2%
o4 Mini High | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 405B | 100 | 100 | 69 | 62 | 57 | 77.4%
Arcee AI: Trinity Mini | 100 | 100 | 99 | 28 | 13 | 68.0%
GPT-5 Nano | 72 | 52 | 50 | 49 | 43 | 53.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 98 | 99.6%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 98 | 99.5%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 94 | 98.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 92 | 98.4%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 90 | 98.1%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 95 | 92 | 97.5%
Gemma 3 12B | 100 | 100 | 100 | 100 | 86 | 97.1%
o4 Mini | 100 | 100 | 100 | 100 | 84 | 96.7%
GPT-4.1 Mini | 100 | 100 | 100 | 93 | 91 | 96.7%
Claude 3.5 Haiku | 100 | 100 | 100 | 93 | 85 | 95.5%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 77 | 95.4%
Llama 3.1 70B | 100 | 100 | 100 | 90 | 86 | 95.2%
Qwen 2.5 72B | 100 | 100 | 100 | 98 | 77 | 94.9%
Z.AI GLM 4.5 | 100 | 100 | 97 | 90 | 81 | 93.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 95 | 67 | 92.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 80 | 77 | 91.4%
Hermes 3 70B | 100 | 100 | 100 | 78 | 78 | 91.2%
Gemini 2.5 Flash Lite | 100 | 100 | 88 | 85 | 83 | 91.1%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 52 | 90.5%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 83 | 67 | 90.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 47 | 89.4%
Grok 4 | 100 | 100 | 96 | 89 | 61 | 89.2%
GPT-5 Mini | 100 | 100 | 96 | 92 | 57 | 89.0%
Grok 4 Fast | 100 | 100 | 89 | 85 | 69 | 88.5%
Mistral NeMO | 100 | 100 | 89 | 78 | 71 | 87.7%
GPT-4o Mini (temp=1) | 100 | 100 | 88 | 75 | 73 | 87.2%
Stealth: Aurora Alpha | 100 | 100 | 100 | 84 | 48 | 86.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 98 | 80 | 53 | 86.2%
Mistral Small 3.2 24B | 100 | 99 | 90 | 85 | 53 | 85.3%
Hermes 3 405B | 92 | 89 | 84 | 81 | 61 | 81.5%
ByteDance Seed 1.6 | 100 | 100 | 79 | 51 | 43 | 74.4%
GPT-4.1 Nano | 88 | 79 | 75 | 73 | 39 | 70.7%
Claude 3 Haiku | 100 | 80 | 76 | 63 | 33 | 70.3%
Llama 3.1 Nemotron 70B | 100 | 79 | 65 | 51 | 30 | 65.0%
Arcee AI: Trinity Mini | 88 | 78 | 63 | 61 | 14 | 60.6%
GPT-5 Nano | 55 | 54 | 51 | 50 | 45 | 51.0%
WizardLM 2 8x22b | 61 | 59 | 43 | 40 | 33 | 47.0%
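Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 95 | 99.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 95 | 99.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 98 | 96 | 98.7%
Mistral Large 2 | 100 | 100 | 100 | 100 | 93 | 98.6%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 98 | 93 | 98.3%
Ministral 3 3B | 100 | 100 | 100 | 100 | 91 | 98.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 90 | 98.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 87 | 97.5%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 87 | 97.5%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 81 | 96.2%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 80 | 96.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 80 | 96.0%
GPT-4.1 Mini | 100 | 100 | 100 | 98 | 71 | 93.6%
Z.AI GLM 4.6 | 100 | 100 | 100 | 85 | 80 | 93.0%
Claude Opus 4 | 100 | 100 | 100 | 89 | 70 | 91.9%
Gemini 2.5 Pro | 100 | 100 | 91 | 79 | 72 | 88.4%
Claude 3.5 Haiku | 100 | 95 | 90 | 84 | 68 | 87.4%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 36 | 87.2%
GPT-5 Mini | 100 | 100 | 100 | 69 | 63 | 86.5%
Cohere Command R+ (Aug. 2024) | 100 | 97 | 93 | 87 | 51 | 85.5%
Qwen 3.5 397B A17B | 100 | 87 | 83 | 79 | 76 | 85.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 99 | 65 | 56 | 83.9%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 88 | 80 | 51 | 83.8%
Grok 4 | 100 | 100 | 96 | 67 | 55 | 83.5%
Qwen 2.5 72B | 100 | 94 | 93 | 69 | 59 | 82.9%
Z.AI GLM 5 | 100 | 100 | 100 | 71 | 44 | 82.9%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 58 | 54 | 82.4%
Z.AI GLM 4.5 | 100 | 98 | 89 | 59 | 47 | 78.7%
Claude 3.5 Sonnet | 89 | 89 | 86 | 68 | 60 | 78.4%
GPT-4o, May 13th (temp=0) | 100 | 100 | 80 | 65 | 35 | 76.0%
Arcee AI: Trinity Mini | 100 | 100 | 87 | 69 | 22 | 75.7%
Rocinante 12B | 100 | 100 | 83 | 67 | 20 | 74.0%
ByteDance Seed 1.6 | 100 | 77 | 72 | 69 | 49 | 73.3%
Stealth: Aurora Alpha | 81 | 77 | 68 | 68 | 63 | 71.3%
Grok 4 Fast | 100 | 91 | 60 | 58 | 46 | 71.0%
Hermes 3 70B | 100 | 76 | 75 | 61 | 42 | 70.8%
GPT-4o Mini (temp=1) | 100 | 77 | 73 | 58 | 36 | 68.7%
GPT-4o, Aug. 6th (temp=1) | 97 | 95 | 78 | 41 | 31 | 68.5%
Claude Sonnet 4 | 93 | 78 | 61 | 55 | 49 | 67.1%
Hermes 3 405B | 100 | 84 | 61 | 59 | 30 | 66.6%
Claude 3 Haiku | 100 | 66 | 55 | 54 | 52 | 65.4%
GPT-4.1 Nano | 100 | 100 | 64 | 25 | 25 | 62.6%
o4 Mini High | 97 | 85 | 58 | 36 | 14 | 58.1%
Mistral NeMO | 79 | 78 | 56 | 49 | 25 | 57.4%
Gemini 2.5 Flash Lite | 86 | 61 | 53 | 48 | 36 | 56.8%
Llama 3.1 Nemotron 70B | 100 | 95 | 65 | 5 | 0 | 53.1%
GPT-4o, Aug. 6th (temp=0) | 98 | 64 | 44 | 37 | 6 | 49.9%
o4 Mini | 100 | 90 | 38 | 21 | 0 | 49.8%
Mistral Small 3.2 24B | 100 | 71 | 39 | 33 | 5 | 49.7%
Gemini 2.5 Flash | 64 | 62 | 60 | 45 | 14 | 49.1%
Llama 3.1 70B | 85 | 44 | 37 | 31 | 23 | 44.0%
GPT-5 Nano | 52 | 40 | 34 | 31 | 30 | 37.5%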

Novelcrafter Default Prompt

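Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 99 | 99.8%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 98 | 99.5%
Gemma 3 27B | 100 | 100 | 100 | 100 | 95 | 99.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 95 | 99.0%
GPT-5 Mini | 100 | 100 | 100 | 99 | 94 | 98.6%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 93 | 98.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 91 | 98.2%
GPT-4.1 | 100 | 100 | 100 | 94 | 94 | 97.7%
Grok 4 | 100 | 100 | 100 | 96 | 91 | 97.5%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 85 | 97.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 83 | 96.6%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 82 | 96.3%
DeepSeek V3.2 | 100 | 100 | 100 | 99 | 83 | 96.3%
Claude Sonnet 4 | 100 | 100 | 100 | 99 | 80 | 95.8%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 90 | 88 | 95.7%
Grok 4 Fast | 100 | 100 | 100 | 100 | 78 | 95.6%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 77 | 95.4%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 77 | 95.4%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 76 | 95.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 93 | 93 | 88 | 94.8%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 73 | 94.6%
GPT-4.1 Nano | 100 | 100 | 96 | 94 | 83 | 94.5%
Claude 3.7 Sonnet | 100 | 100 | 91 | 91 | 88 | 94.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 70 | 94.0%
GPT-4.1 Mini | 100 | 100 | 100 | 96 | 71 | 93.4%
Claude 3.5 Haiku | 100 | 98 | 95 | 89 | 73 | 90.8%
Gemini 2.5 Flash Lite | 100 | 100 | 95 | 88 | 70 | 90.6%
GPT-4o, May 13th (temp=0) | 100 | 100 | 93 | 90 | 69 | 90.2%
o4 Mini High | 100 | 100 | 95 | 81 | 73 | 89.6%
Qwen 2.5 72B | 100 | 100 | 92 | 80 | 72 | 88.9%
Claude Opus 4 | 100 | 100 | 94 | 77 | 63 | 87.0%
Z.AI GLM 4.5 | 100 | 95 | 92 | 87 | 56 | 86.0%
DeepSeek V3.1 | 100 | 96 | 96 | 88 | 49 | 85.9%
o4 Mini | 100 | 96 | 82 | 77 | 72 | 85.3%
Rocinante 12B | 100 | 100 | 100 | 83 | 42 | 85.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 75 | 50 | 85.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 96 | 28 | 84.8%
Arcee AI: Trinity Large (Preview) | 100 | 93 | 83 | 81 | 57 | 82.7%
Cohere Command R+ (Aug. 2024) | 100 | 93 | 83 | 68 | 64 | 81.6%
Llama 3.1 70B | 100 | 99 | 81 | 73 | 43 | 79.2%
Claude 3 Haiku | 97 | 94 | 92 | 58 | 55 | 79.0%
Hermes 3 70B | 100 | 100 | 93 | 58 | 18 | 73.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 79 | 74 | 65 | 48 | 73.3%
Mistral NeMO | 100 | 85 | 73 | 60 | 33 | 70.2%
GPT-5 Nano | 85 | 74 | 72 | 70 | 47 | 69.7%
Hermes 3 405B | 82 | 81 | 75 | 68 | 40 | 69.1%
Llama 3.1 Nemotron 70B | 73 | 70 | 67 | 66 | 51 | 65.5%
Claude 3.5 Sonnet | 76 | 71 | 66 | 56 | 45 | 62.8%
Stealth: Aurora Alpha | 84 | 76 | 59 | 50 | 13 | 56.7%
Mistral Small 3.2 24B | 64 | 61 | 60 | 44 | 25 | 50.8%
Arcee AI: Trinity Mini | 96 | 57 | 45 | 4 | 0 | 40.6%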
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 97 | 99.3%
Gemma 3 12B | 100 | 100 | 100 | 100 | 97 | 99.3%
Claude Opus 4 | 100 | 100 | 100 | 100 | 95 | 98.9%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 93 | 98.6%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 89 | 97.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 81 | 96.2%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 65 | 93.1%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 65 | 93.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 64 | 92.8%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 59 | 91.8%
Grok 4 | 100 | 100 | 100 | 100 | 55 | 90.9%
GPT-4.1 | 100 | 100 | 100 | 100 | 54 | 90.8%
Llama 3.1 Nemotron 70B | 95 | 94 | 90 | 75 | 74 | 85.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 84 | 74 | 70 | 85.6%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 18 | 83.5%
o4 Mini | 100 | 100 | 76 | 69 | 62 | 81.5%
Llama 3.1 70B | 100 | 88 | 88 | 62 | 60 | 79.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 71 | 67 | 54 | 78.5%
Z.AI GLM 4.5 | 100 | 100 | 77 | 71 | 39 | 77.4%
Arcee AI: Trinity Large (Preview) | 100 | 79 | 76 | 72 | 59 | 77.0%
Qwen 2.5 72B | 100 | 94 | 86 | 82 | 23 | 76.9%
Hermes 3 70B | 100 | 95 | 92 | 66 | 25 | 75.5%
o4 Mini High | 100 | 100 | 79 | 53 | 39 | 74.2%
Hermes 3 405B | 89 | 88 | 75 | 70 | 49 | 74.2%
GPT-4o, Aug. 6th (temp=0) | 88 | 86 | 76 | 67 | 45 | 72.5%
GPT-4.1 Nano | 100 | 77 | 74 | 53 | 49 | 70.7%
Stealth: Aurora Alpha | 89 | 78 | 69 | 60 | 45 | 68.0%
GPT-5 Nano | 75 | 74 | 74 | 59 | 51 | 66.5%
GPT-4o, May 13th (temp=0) | 97 | 93 | 58 | 49 | 18 | 63.0%
Gemini 2.5 Flash Lite | 100 | 74 | 67 | 53 | 18 | 62.5%
Rocinante 12B | 100 | 74 | 53 | 50 | 18 | 59.2%
Cohere Command R+ (Aug. 2024) | 99 | 82 | 66 | 40 | 0 | 57.4%
Claude 3.5 Sonnet | 77 | 76 | 55 | 49 | 26 | 56.6%
Arcee AI: Trinity Mini | 100 | 100 | 46 | 8 | 0 | 50.7%
Grok 4 Fast | 100 | 51 | 37 | 17 | 16 | 44.2%
Mistral NeMO | 85 | 59 | 44 | 9 | 5 | 40.4%
Claude 3 Haiku | 61 | 54 | 41 | 27 | 2 | 37.1%
Gemini 2.5 Flash | 67 | 44 | 37 | 27 | 0 | 35.1%
Mistral Small 3.2 24B | 34 | 26 | 18 | 6 | 0 | 16.8%
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 99 | 99.8%
GPT-5 Mini | 100 | 100 | 100 | 100 | 99 | 99.7%
Rocinante 12B | 100 | 100 | 100 | 96 | 90 | 97.3%
Claude 3.5 Sonnet | 100 | 100 | 100 | 94 | 92 | 97.2%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 83 | 96.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 80 | 96.0%
Grok 4 Fast | 100 | 100 | 100 | 91 | 88 | 95.7%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 77 | 95.5%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 75 | 95.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 62 | 92.5%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 87 | 72 | 91.9%
Llama 3.1 8B | 100 | 100 | 100 | 89 | 66 | 91.0%
Grok 4 | 100 | 100 | 95 | 79 | 70 | 88.9%
GPT-5 Nano | 100 | 100 | 98 | 73 | 59 | 86.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 76 | 74 | 72 | 84.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 98 | 75 | 39 | 82.4%
Hermes 3 70B | 100 | 100 | 81 | 67 | 62 | 82.0%
Mistral NeMO | 100 | 100 | 63 | 58 | 55 | 75.3%
Claude 3 Haiku | 75 | 74 | 65 | 52 | 43 | 61.9%
Arcee AI: Trinity Mini | 100 | 83 | 63 | 59 | 0 | 61.0%
Stealth: Aurora Alpha | 99 | 63 | 54 | 48 | 32 | 59.2%
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 98 | 99.6%
Gemma 3 4B | 100 | 100 | 100 | 100 | 96 | 99.2%
Mistral NeMO | 100 | 100 | 100 | 100 | 95 | 99.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 95 | 99.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 95 | 98.9%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 94 | 98.9%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 94 | 98.6%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 97 | 95 | 98.4%
Z.AI GLM 4.5 | 100 | 100 | 100 | 96 | 95 | 98.1%
Gemma 3 12B | 100 | 100 | 100 | 100 | 87 | 97.5%
Qwen 2.5 72B | 100 | 100 | 100 | 93 | 91 | 96.7%
Ministral 3B | 100 | 100 | 100 | 100 | 76 | 95.2%
o4 Mini High | 100 | 100 | 100 | 98 | 75 | 94.7%
Grok 4 Fast | 100 | 100 | 100 | 91 | 75 | 93.4%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 98 | 68 | 93.2%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 82 | 77 | 91.7%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 87 | 69 | 91.2%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 86 | 63 | 89.8%
Claude 3.5 Haiku | 100 | 100 | 100 | 91 | 55 | 89.2%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 44 | 88.9%
Claude 3 Haiku | 100 | 100 | 87 | 81 | 75 | 88.6%
Hermes 3 405B | 100 | 100 | 91 | 86 | 59 | 87.1%
Claude 3.5 Sonnet | 100 | 100 | 100 | 72 | 63 | 87.0%
Hermes 3 70B | 100 | 100 | 92 | 71 | 54 | 83.3%
Llama 3.1 Nemotron 70B | 100 | 98 | 89 | 79 | 49 | 83.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 91 | 90 | 86 | 45 | 82.3%
Stealth: Aurora Alpha | 100 | 84 | 82 | 77 | 67 | 82.0%
GPT-5 Nano | 90 | 78 | 73 | 66 | 45 | 70.3%
Mistral Small 3.2 24B | 100 | 100 | 52 | 49 | 9 | 62.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 99 | 99.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 98 | 99.5%
o4 Mini | 100 | 100 | 100 | 100 | 97 | 99.4%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 96 | 99.3%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 94 | 98.8%
o4 Mini High | 100 | 100 | 100 | 100 | 94 | 98.7%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 92 | 98.4%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 90 | 97.9%
Gemma 3 12B | 100 | 100 | 100 | 94 | 94 | 97.5%
Mistral NeMO | 100 | 100 | 100 | 99 | 87 | 97.2%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 85 | 97.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 85 | 97.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 83 | 96.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 98 | 94 | 90 | 96.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 62 | 92.3%
Rocinante 12B | 100 | 100 | 100 | 100 | 60 | 91.9%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 96 | 61 | 91.5%
Grok 4 | 100 | 96 | 92 | 90 | 80 | 91.5%
GPT-4o Mini (temp=1) | 100 | 100 | 93 | 83 | 79 | 91.1%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 91 | 83 | 78 | 90.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 94 | 78 | 74 | 89.3%
Gemini 2.5 Flash | 100 | 97 | 97 | 82 | 68 | 89.0%
Llama 3.1 70B | 100 | 100 | 100 | 96 | 46 | 88.4%
Arcee AI: Trinity Large (Preview) | 100 | 92 | 87 | 86 | 72 | 87.5%
GPT-5 Nano | 100 | 94 | 86 | 71 | 70 | 84.1%
Hermes 3 405B | 100 | 96 | 80 | 78 | 59 | 82.7%
Claude 3 Haiku | 96 | 93 | 79 | 78 | 56 | 80.3%
Hermes 3 70B | 100 | 88 | 75 | 70 | 64 | 79.5%
Llama 3.1 8B | 100 | 100 | 100 | 57 | 18 | 74.9%
Mistral Small 3.2 24B | 100 | 95 | 71 | 57 | 0 | 64.6%
Stealth: Aurora Alpha | 82 | 53 | 48 | 36 | 29 | 49.7%
Llama 3.1 Nemotron 70B | 65 | 57 | 57 | 48 | 21 | 49.6%
Arcee AI: Trinity Mini | 51 | 48 | 39 | 38 | 22 | 39.4%
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 99 | 99.8%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 96 | 99.3%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 96 | 99.2%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 94 | 98.9%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 94 | 98.7%
Gemma 3 4B | 100 | 100 | 100 | 100 | 93 | 98.6%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 92 | 98.4%
o4 Mini | 100 | 100 | 100 | 98 | 92 | 98.1%
Ministral 8B | 100 | 100 | 100 | 100 | 90 | 97.9%
GPT-4.1 Mini | 100 | 100 | 100 | 94 | 93 | 97.5%
Qwen 2.5 72B | 100 | 100 | 100 | 95 | 92 | 97.3%
o4 Mini High | 100 | 100 | 100 | 100 | 77 | 95.5%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 74 | 94.8%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 89 | 84 | 94.6%
Mistral Large | 100 | 100 | 100 | 100 | 68 | 93.6%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 66 | 93.2%
GPT-4.1 Nano | 100 | 100 | 100 | 87 | 78 | 93.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 96 | 91 | 76 | 92.7%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 39 | 87.8%
Mistral NeMO | 100 | 100 | 98 | 82 | 59 | 87.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 77 | 60 | 87.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 92 | 88 | 85 | 65 | 85.9%
Stealth: Aurora Alpha | 100 | 100 | 100 | 81 | 32 | 82.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 69 | 38 | 81.5%
GPT-5 Nano | 100 | 98 | 75 | 74 | 59 | 81.3%
Llama 3.1 8B | 100 | 100 | 100 | 86 | 19 | 81.2%
Arcee AI: Trinity Mini | 100 | 100 | 99 | 67 | 38 | 80.9%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 80 | 64 | 58 | 80.3%
Rocinante 12B | 100 | 100 | 97 | 62 | 40 | 79.6%
Claude 3 Haiku | 94 | 81 | 81 | 72 | 49 | 75.6%
Claude 3.5 Sonnet | 100 | 78 | 72 | 65 | 59 | 74.7%
Arcee AI: Trinity Large (Preview) | 98 | 87 | 83 | 48 | 46 | 72.1%
Hermes 3 70B | 100 | 87 | 67 | 57 | 45 | 71.2%
Llama 3.1 70B | 95 | 68 | 64 | 57 | 47 | 66.2%
Gemini 2.5 Flash | 100 | 69 | 69 | 63 | 3 | 60.8%
Gemini 2.5 Pro | 77 | 63 | 56 | 55 | 37 | 57.6%
Gemini 2.5 Flash Lite | 100 | 100 | 65 | 23 | 0 | 57.5%
Hermes 3 405B | 84 | 83 | 47 | 41 | 5 | 52.0%
Mistral Small 3.2 24B | 81 | 72 | 68 | 11 | 1 | 46.5%