Adverbs in dialogue tags

Test: Bad Writing Habits

Avg. Score: 89.8%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Stealth: Aurora Alpha | 99.3% | $0.0000 | 9.8s | 89%
2 | ByteDance Seed 1.6 Flash | 99.2% | $0.0013 | 27.3s | 88%
3 | Grok 4.1 Fast | 98.7% | $0.0018 | 37.8s | 89%
4 | Ministral 8B | 98.2% | $0.0004 | 10.4s | 78%
5 | Qwen 2.5 72B | 98.6% | $0.0010 | 36.7s | 81%
6 | Gemini 3 Flash (Preview) | 98.5% | $0.0078 | 19.6s | 78%
7 | o4 Mini | 97.4% | $0.015 | 25.7s | 83%
8 | Ministral 3 3B | 96.8% | $0.0005 | 11.1s | 70%
9 | Mistral NeMO | 96.1% | $0.0005 | 10.1s | 69%
10 | Qwen 3.5 Plus (2026-02-15) | 95.9% | $0.0060 | 31.5s | 70%
11 | Z.AI GLM 4.6 | 96.9% | $0.0065 | 51.5s | 73%
12 | Ministral 3B | 94.8% | $0.0001 | 8.1s | 63%
13 | GPT-4o, May 13th (temp=0) | 96.0% | $0.035 | 14.1s | 73%
14 | WizardLM 2 8x22b | 96.0% | $0.0026 | 1.8m | 80%
15 | Mistral Small Creative | 94.2% | $0.0007 | 9.1s | 57%
16 | Gemini 2.5 Pro | 96.2% | $0.036 | 36.2s | 73%
17 | GPT-4o Mini (temp=0) | 94.2% | $0.0012 | 34.8s | 62%
18 | Writer: Palmyra X5 | 94.2% | $0.011 | 22.0s | 62%
19 | Gemini 2.5 Flash Lite | 91.0% | $0.0009 | 9.5s | 57%
20 | Z.AI GLM 4.5 | 91.8% | $0.0051 | 42.1s | 65%
21 | GPT-4o, Aug. 6th (temp=0) | 93.1% | $0.023 | 22.7s | 63%
22 | Mistral Medium 3.1 | 93.4% | $0.0048 | 36.5s | 58%
23 | Grok 4 Fast | 92.2% | $0.0017 | 24.1s | 55%
24 | Gemini 2.5 Flash | 90.8% | $0.0052 | 10.6s | 53%
25 | GPT-4.1 | 94.5% | $0.018 | 44.7s | 60%
26 | Cohere Command R+ (Aug. 2024) | 94.0% | $0.020 | 52.5s | 63%
27 | Ministral 3 8B | 90.8% | $0.0008 | 19.6s | 52%
28 | Z.AI GLM 4.7 | 94.2% | $0.010 | 1.4m | 64%
29 | Gemini 3 Pro (Preview) | 95.7% | $0.055 | 54.4s | 71%
30 | Mistral Large 2 | 92.0% | $0.013 | 29.4s | 54%
31 | Qwen 3.5 397B A17B | 98.9% | $0.014 | 3.0m | 79%
32 | o4 Mini High | 93.3% | $0.025 | 47.2s | 59%
33 | Claude Sonnet 4.5 | 93.5% | $0.035 | 38.1s | 60%
34 | Gemma 3 4B | 89.6% | $0.0002 | 20.0s | 49%
35 | DeepSeek V3.2 | 94.2% | $0.0014 | 1.9m | 64%
36 | Mistral Large | 92.1% | $0.014 | 30.9s | 50%
37 | GPT-5 Mini | 90.7% | $0.0100 | 57.4s | 57%
38 | GPT-5.1 | 95.4% | $0.054 | 1.8m | 78%
39 | GPT-5.2 | 96.7% | $0.056 | 1.5m | 73%
40 | Z.AI GLM 4.7 Flash | 91.8% | $0.0017 | 1.2m | 55%
41 | Ministral 3 14B | 88.8% | $0.0007 | 11.7s | 43%
42 | Z.AI GLM 5 | 90.6% | $0.0084 | 1.2m | 57%
43 | Llama 3.1 8B | 90.9% | $0.0003 | 1.3m | 51%
44 | Claude Sonnet 4.6 | 91.5% | $0.031 | 39.3s | 52%
45 | Gemma 3 27B | 87.8% | $0.0006 | 52.6s | 49%
46 | Mistral Large 3 | 88.9% | $0.0033 | 30.3s | 41%
47 | Claude 3.5 Haiku | 87.3% | $0.0035 | 10.8s | 37%
48 | Claude 3 Haiku | 85.5% | $0.0025 | 14.9s | 39%
49 | GPT-4o Mini (temp=1) | 82.8% | $0.0012 | 34.8s | 47%
50 | Arcee AI: Trinity Mini | 83.6% | $0.0003 | 9.2s | 39%
51 | DeepSeek V3 (2025-03-24) | 86.7% | $0.0014 | 39.4s | 41%
52 | ByteDance Seed 1.6 | 94.4% | $0.013 | 2.5m | 59%
53 | Claude Opus 4.6 | 95.2% | $0.078 | 1.2m | 63%
54 | Llama 3.1 70B | 85.1% | $0.0015 | 29.4s | 38%
55 | GPT-5 | 96.8% | $0.065 | 2.8m | 76%
56 | Claude Opus 4.5 | 91.5% | $0.070 | 53.4s | 58%
57 | DeepSeek V3.1 | 88.1% | $0.0020 | 1.8m | 48%
58 | Llama 3.1 Nemotron 70B | 84.1% | $0.0038 | 31.7s | 36%
59 | Minimax M2.5 | 84.3% | $0.0034 | 1.3m | 45%
60 | DeepSeek V3 (2024-12-26) | 84.5% | $0.0021 | 54.6s | 39%
61 | DeepSeek-V2 Chat | 84.0% | $0.0021 | 53.3s | 39%
62 | Gemma 3 12B | 82.5% | $0.0004 | 41.3s | 36%
63 | Grok 4 | 92.2% | $0.048 | 1.7m | 53%
64 | MoonshotAI: Kimi K2.5 | 93.6% | $0.019 | 3.2m | 60%
65 | Claude Sonnet 4 | 84.8% | $0.032 | 43.7s | 41%
66 | Claude 3.5 Sonnet | 86.1% | $0.048 | 35.5s | 42%
67 | GPT-4o, May 13th (temp=1) | 83.0% | $0.033 | 14.4s | 37%
68 | Rocinante 12B | 81.7% | $0.0014 | 38.4s | 32%
69 | GPT-4o, Aug. 6th (temp=1) | 80.5% | $0.018 | 24.4s | 37%
70 | Claude 3.7 Sonnet | 82.8% | $0.042 | 46.7s | 46%
71 | Claude Haiku 4.5 | 76.9% | $0.011 | 21.6s | 32%
72 | Arcee AI: Trinity Large (Preview) | 78.8% | $0.0000 | 43.6s | 28%
73 | Gemini 3.1 Pro (Preview) | 93.3% | $0.107 | 1.8m | 59%
74 | GPT-4.1 Mini | 72.0% | $0.0027 | 19.0s | 29%
75 | Hermes 3 405B | 74.1% | $0.0032 | 53.2s | 22%
76 | GPT-5 Nano | 72.2% | $0.0042 | 1.4m | 27%
77 | Mistral Small 3.2 24B | 92.8% | $0.0069 | 5.7m | 50%
78 | Hermes 3 70B | 67.6% | $0.0010 | 1.2m | 19%
79 | Claude Opus 4 | 91.7% | $0.209 | 1.4m | 60%
80 | GPT-4.1 Nano | 50.7% | $0.0007 | 13.3s | 8%
Mean score across all 80 models: 89.76%
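The 89.76% figure matches the plain arithmetic mean of the 80 values in the Score column, and it is also what the 89.8% Avg. Score at the top rounds from. The page does not say how the headline average is computed, so the sketch below is only a consistency check under the assumption that it is an unweighted mean of the per-model scores:

```python
# Score column of the Overall Performance table, in rank order (ranks 1-80).
# Assumption: the headline Avg. Score is an unweighted mean of these values.
SCORES = [
    99.3, 99.2, 98.7, 98.2, 98.6, 98.5, 97.4, 96.8, 96.1, 95.9,  # ranks 1-10
    96.9, 94.8, 96.0, 96.0, 94.2, 96.2, 94.2, 94.2, 91.0, 91.8,  # ranks 11-20
    93.1, 93.4, 92.2, 90.8, 94.5, 94.0, 90.8, 94.2, 95.7, 92.0,  # ranks 21-30
    98.9, 93.3, 93.5, 89.6, 94.2, 92.1, 90.7, 95.4, 96.7, 91.8,  # ranks 31-40
    88.8, 90.6, 90.9, 91.5, 87.8, 88.9, 87.3, 85.5, 82.8, 83.6,  # ranks 41-50
    86.7, 94.4, 95.2, 85.1, 96.8, 91.5, 88.1, 84.1, 84.3, 84.5,  # ranks 51-60
    84.0, 82.5, 92.2, 93.6, 84.8, 86.1, 83.0, 81.7, 80.5, 82.8,  # ranks 61-70
    76.9, 78.8, 93.3, 72.0, 74.1, 72.2, 92.8, 67.6, 91.7, 50.7,  # ranks 71-80
]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


mean_score = mean(SCORES)
print(f"{mean_score:.2f}%")  # 89.76%
print(f"{mean_score:.1f}%")  # 89.8%
```

Because the per-model scores are themselves rounded to one decimal, a recomputed mean can only be expected to agree with the headline figure to about two decimal places.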

Individual Scenarios

Detailed Writing Rules

Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 92 | 98.4%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 91 | 98.2%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 89 | 97.8%
o4 Mini | 100 | 100 | 100 | 100 | 86 | 97.1%
Mistral Large | 100 | 100 | 100 | 100 | 86 | 97.1%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 85 | 96.9%
WizardLM 2 8x22b | 100 | 100 | 100 | 92 | 84 | 95.2%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 75 | 95.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 75 | 95.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 70 | 93.9%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 70 | 93.9%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 67 | 93.3%
Mistral Large 3 | 100 | 100 | 100 | 100 | 65 | 93.0%
Claude Opus 4 | 100 | 100 | 100 | 86 | 75 | 92.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 57 | 91.4%
Ministral 3B | 100 | 100 | 100 | 100 | 57 | 91.4%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 52 | 90.4%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 42 | 88.4%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Claude 3.7 Sonnet | 100 | 100 | 100 | 70 | 64 | 86.6%
GPT-5 Nano | 100 | 100 | 100 | 87 | 44 | 86.1%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 85 | 40 | 84.9%
GPT-5 Mini | 100 | 100 | 100 | 100 | 24 | 84.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 82 | 40 | 84.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 67 | 46 | 82.6%
Claude Sonnet 4 | 100 | 100 | 100 | 57 | 52 | 81.8%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 89 | 0 | 77.8%
Minimax M2.5 | 100 | 97 | 87 | 57 | 42 | 76.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 57 | 18 | 75.1%
ByteDance Seed 1.6 | 100 | 100 | 100 | 75 | 0 | 75.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 42 | 29 | 74.1%
Z.AI GLM 5 | 100 | 100 | 100 | 57 | 10 | 73.3%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 18 | 0 | 63.6%
Hermes 3 405B | 100 | 100 | 89 | 0 | 0 | 57.8%
Hermes 3 70B | 100 | 100 | 33 | 0 | 0 | 46.7%
GPT-4.1 Nano | 100 | 100 | 33 | 0 | 0 | 46.7%
Rocinante 12B | 100 | 52 | 0 | 0 | 0 | 30.4%
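In every row of these tables, the Avg value agrees with the arithmetic mean of the five numbered columns to within a few tenths of a point. That suggests, although the page does not say so, that Avg is a plain mean and that the column scores are displayed rounded to whole numbers. A minimal Python sketch checking that assumption against a few rows of the table above (the tolerance of 0.15 points is an illustrative choice, not something the benchmark specifies):

```python
# A few rows from the scenario table above: (model, five column scores, displayed Avg).
# Assumption: Avg is the unweighted mean of the five columns, and the columns
# are shown rounded, so recomputed values may differ slightly from the display.
ROWS = [
    ("Gemini 2.5 Flash", [100, 100, 100, 100, 92], 98.4),
    ("WizardLM 2 8x22b", [100, 100, 100, 92, 84], 95.2),
    ("DeepSeek V3 (2024-12-26)", [100, 100, 100, 67, 46], 82.6),
    ("Minimax M2.5", [100, 97, 87, 57, 42], 76.7),
    ("Rocinante 12B", [100, 52, 0, 0, 0], 30.4),
]

TOLERANCE = 0.15  # percentage points; hypothetical allowance for display rounding


def row_average(scores):
    """Unweighted mean of one row's five column scores."""
    return sum(scores) / len(scores)


for model, scores, shown in ROWS:
    recomputed = row_average(scores)
    status = "ok" if abs(recomputed - shown) <= TOLERANCE else "mismatch"
    print(f"{model}: recomputed {recomputed:.1f}%, shown {shown}% ({status})")
```

The Minimax M2.5 row is the instructive one: its whole-number columns average to 76.6, while the page shows 76.7, which is consistent with the underlying column scores carrying decimals that the table hides.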
Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 95 | 98.9%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 89 | 97.8%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 89 | 97.8%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 89 | 97.8%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 82 | 96.5%
GPT-5.1 | 100 | 100 | 100 | 100 | 75 | 95.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 57 | 91.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 57 | 91.4%
Llama 3.1 8B | 100 | 100 | 100 | 75 | 75 | 90.0%
Grok 4 | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 33 | 86.7%
Claude 3.7 Sonnet | 100 | 100 | 89 | 82 | 50 | 84.2%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 18 | 83.6%
GPT-4.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 100 | 89 | 0 | 77.8%
Writer: Palmyra X5 | 100 | 100 | 100 | 82 | 0 | 76.5%
Mistral Large | 100 | 100 | 100 | 82 | 0 | 76.5%
Hermes 3 405B | 100 | 100 | 100 | 75 | 0 | 75.0%
Gemma 3 4B | 100 | 100 | 89 | 75 | 0 | 72.8%
Gemini 2.5 Flash | 100 | 100 | 100 | 57 | 0 | 71.4%
GPT-5 Nano | 100 | 100 | 100 | 26 | 0 | 65.2%
GPT-4.1 Mini | 100 | 100 | 67 | 57 | 0 | 64.8%
GPT-5 Mini | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude Opus 4.6 | 100 | 100 | 100 | 0 | 0 | 60.0%
Minimax M2.5 | 100 | 100 | 100 | 0 | 0 | 60.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 0 | 0 | 60.0%
Ministral 3 14B | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude 3 Haiku | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude Sonnet 4 | 100 | 100 | 67 | 0 | 0 | 53.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 33 | 18 | 0 | 50.3%
GPT-4o Mini (temp=1) | 67 | 33 | 33 | 33 | 0 | 33.3%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 97 | 99.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 97 | 99.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 95 | 98.9%
Mistral Small Creative | 100 | 100 | 100 | 100 | 93 | 98.6%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 92 | 98.4%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 90 | 98.1%
Mistral NeMO | 100 | 100 | 100 | 100 | 89 | 97.8%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 87 | 97.5%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 85 | 96.9%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 82 | 96.5%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 82 | 96.5%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 80 | 96.0%
Llama 3.1 70B | 100 | 100 | 100 | 89 | 89 | 95.6%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 75 | 95.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 75 | 95.0%
Z.AI GLM 4.6 | 100 | 100 | 97 | 97 | 80 | 95.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 86 | 86 | 94.3%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 71 | 94.2%
Claude Opus 4 | 100 | 100 | 100 | 100 | 65 | 93.0%
Grok 4 | 100 | 100 | 100 | 100 | 62 | 92.4%
GPT-4o Mini (temp=0) | 100 | 100 | 92 | 89 | 79 | 91.9%
WizardLM 2 8x22b | 100 | 100 | 100 | 92 | 64 | 91.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 43 | 88.6%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 42 | 88.4%
Claude 3.7 Sonnet | 100 | 100 | 90 | 82 | 69 | 88.2%
Mistral Large 3 | 100 | 100 | 100 | 100 | 38 | 87.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 38 | 87.6%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 33 | 86.7%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 75 | 57 | 86.4%
GPT-4.1 Mini | 100 | 100 | 97 | 71 | 62 | 85.9%
Gemma 3 12B | 100 | 100 | 100 | 82 | 46 | 85.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 67 | 57 | 84.8%
o4 Mini High | 100 | 100 | 100 | 100 | 21 | 84.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 93 | 63 | 60 | 83.1%
Ministral 3 8B | 100 | 100 | 100 | 65 | 43 | 81.6%
DeepSeek V3.1 | 100 | 100 | 100 | 79 | 29 | 81.5%
Rocinante 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 14B | 100 | 100 | 100 | 67 | 31 | 79.4%
Z.AI GLM 4.5 | 100 | 100 | 77 | 53 | 51 | 76.1%
Gemma 3 4B | 100 | 100 | 100 | 80 | 0 | 76.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 39 | 36 | 75.0%
Claude 3 Haiku | 100 | 100 | 100 | 36 | 33 | 73.9%
Hermes 3 405B | 100 | 100 | 100 | 67 | 0 | 73.3%
Gemma 3 27B | 100 | 95 | 95 | 48 | 18 | 71.2%
GPT-4o Mini (temp=1) | 100 | 93 | 93 | 57 | 0 | 68.7%
Hermes 3 70B | 100 | 100 | 89 | 46 | 0 | 67.0%
Minimax M2.5 | 100 | 89 | 73 | 39 | 0 | 60.2%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 50 | 33 | 0 | 56.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 75 | 0 | 0 | 55.0%
Claude Haiku 4.5 | 100 | 71 | 40 | 3 | 0 | 42.8%
GPT-4.1 Nano | 40 | 24 | 0 | 0 | 0 | 12.7%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 89 | 97.8%
Ministral 3 3B | 100 | 100 | 100 | 100 | 82 | 96.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 80 | 96.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 80 | 96.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 79 | 95.8%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 79 | 95.8%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 78 | 95.5%
Ministral 3 8B | 100 | 100 | 100 | 100 | 75 | 95.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 97 | 77 | 94.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 70 | 93.9%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 64 | 92.9%
GPT-5 Nano | 100 | 100 | 100 | 100 | 62 | 92.4%
Minimax M2.5 | 100 | 100 | 100 | 85 | 73 | 91.5%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 57 | 91.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 50 | 90.0%
Grok 4 Fast | 100 | 100 | 100 | 79 | 71 | 90.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 31 | 86.1%
Mistral Medium 3.1 | 100 | 100 | 100 | 78 | 52 | 85.9%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 29 | 85.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 25 | 84.9%
Claude 3.5 Sonnet | 100 | 100 | 100 | 86 | 33 | 83.8%
Mistral Large | 100 | 100 | 100 | 100 | 18 | 83.6%
Hermes 3 70B | 100 | 100 | 100 | 100 | 18 | 83.6%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 89 | 29 | 83.5%
Gemma 3 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 95 | 0 | 78.9%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 40 | 0 | 68.0%
Hermes 3 405B | 100 | 100 | 71 | 57 | 0 | 65.6%
GPT-4.1 Mini | 100 | 79 | 72 | 57 | 14 | 64.4%
Claude 3.5 Haiku | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-4.1 Nano | 71 | 67 | 0 | 0 | 0 | 27.5%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 97 | 99.5%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 97 | 99.5%
Grok 4 | 100 | 100 | 100 | 100 | 95 | 98.9%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 95 | 98.9%
Grok 4 Fast | 100 | 100 | 100 | 100 | 95 | 98.9%
Ministral 3 14B | 100 | 100 | 100 | 100 | 89 | 97.8%
WizardLM 2 8x22b | 100 | 100 | 98 | 95 | 95 | 97.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 87 | 97.4%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 86 | 97.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 79 | 95.8%
o4 Mini High | 100 | 100 | 100 | 100 | 78 | 95.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 98 | 79 | 95.4%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 75 | 95.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 75 | 95.0%
Mistral Large 2 | 100 | 100 | 100 | 86 | 82 | 93.6%
GPT-5 Mini | 100 | 100 | 100 | 98 | 69 | 93.4%
Minimax M2.5 | 100 | 100 | 100 | 100 | 65 | 93.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 60 | 92.1%
Claude Haiku 4.5 | 100 | 100 | 97 | 84 | 78 | 91.8%
Z.AI GLM 4.5 | 100 | 100 | 100 | 95 | 63 | 91.6%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 52 | 90.4%
Mistral Medium 3.1 | 100 | 100 | 100 | 97 | 55 | 90.2%
Gemma 3 12B | 100 | 100 | 100 | 100 | 43 | 88.6%
Ministral 3 8B | 100 | 100 | 100 | 100 | 26 | 85.2%
GPT-4o, May 13th (temp=0) | 100 | 100 | 97 | 85 | 24 | 81.1%
GPT-4.1 Mini | 100 | 100 | 97 | 55 | 50 | 80.4%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 89 | 13 | 80.3%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o Mini (temp=1) | 100 | 95 | 89 | 85 | 28 | 79.2%
Gemma 3 4B | 100 | 100 | 97 | 57 | 33 | 77.6%
Rocinante 12B | 100 | 100 | 100 | 82 | 0 | 76.5%
GPT-5 Nano | 100 | 100 | 93 | 68 | 15 | 75.2%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 67 | 57 | 52 | 75.1%
Claude 3 Haiku | 100 | 100 | 100 | 54 | 18 | 74.4%
Claude 3.7 Sonnet | 100 | 100 | 100 | 43 | 26 | 73.8%
Hermes 3 70B | 100 | 100 | 100 | 13 | 0 | 62.5%
Gemma 3 27B | 100 | 100 | 57 | 46 | 5 | 61.6%
DeepSeek-V2 Chat | 100 | 79 | 62 | 8 | 0 | 49.8%
Llama 3.1 70B | 100 | 75 | 67 | 0 | 0 | 48.3%
Llama 3.1 Nemotron 70B | 100 | 75 | 33 | 18 | 0 | 45.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4.1 Nano | 62 | 18 | 13 | 0 | 0 | 18.6%
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 97 | 99.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 95 | 98.9%
Gemma 3 27B | 100 | 100 | 100 | 100 | 95 | 98.9%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 95 | 98.9%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 92 | 98.4%
o4 Mini | 100 | 100 | 100 | 100 | 89 | 97.8%
Claude Opus 4 | 100 | 100 | 100 | 100 | 89 | 97.8%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 82 | 96.5%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 75 | 95.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 71 | 94.2%
Minimax M2.5 | 100 | 100 | 100 | 92 | 75 | 93.4%
o4 Mini High | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 62 | 92.4%
Grok 4 Fast | 100 | 100 | 100 | 100 | 57 | 91.4%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 52 | 90.4%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 33 | 86.7%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 33 | 86.7%
Claude Sonnet 4 | 100 | 100 | 100 | 92 | 33 | 85.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 18 | 83.6%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 10 | 81.9%
DeepSeek V3.1 | 100 | 100 | 100 | 57 | 46 | 80.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 89 | 14 | 80.6%
Grok 4 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 0 | 80.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 79 | 0 | 75.8%
GPT-4.1 Mini | 100 | 100 | 100 | 57 | 18 | 75.1%
Gemma 3 12B | 100 | 100 | 100 | 52 | 18 | 74.0%
GPT-4.1 Nano | 100 | 100 | 100 | 33 | 0 | 66.7%
Rocinante 12B | 100 | 100 | 100 | 33 | 0 | 66.7%
Claude 3.7 Sonnet | 100 | 100 | 100 | 8 | 0 | 61.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude 3.5 Sonnet | 100 | 100 | 82 | 0 | 0 | 56.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 18 | 0 | 0 | 43.6%

genre

Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 97 | 99.5%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 97 | 99.5%
Ministral 8B | 100 | 100 | 100 | 100 | 97 | 99.5%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 97 | 99.3%
o4 Mini High | 100 | 100 | 100 | 100 | 94 | 98.9%
GPT-4.1 | 100 | 100 | 100 | 100 | 94 | 98.9%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 91 | 98.2%
Grok 4 | 100 | 100 | 100 | 100 | 89 | 97.8%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 89 | 97.8%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 88 | 97.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 88 | 97.5%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 88 | 97.5%
Grok 4 Fast | 100 | 100 | 100 | 100 | 86 | 97.1%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 83 | 96.7%
Mistral Large 3 | 100 | 100 | 100 | 100 | 82 | 96.5%
DeepSeek V3.2 | 100 | 100 | 100 | 94 | 88 | 96.4%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 79 | 95.7%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 75 | 95.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 75 | 95.0%
GPT-4.1 Mini | 100 | 100 | 100 | 91 | 79 | 93.9%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 94 | 75 | 93.9%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Haiku 4.5 | 100 | 100 | 94 | 91 | 79 | 92.8%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 97 | 67 | 92.8%
Claude Opus 4 | 100 | 100 | 100 | 92 | 70 | 92.5%
Ministral 3 14B | 100 | 100 | 100 | 100 | 57 | 91.4%
Claude Opus 4.6 | 100 | 100 | 89 | 85 | 82 | 91.1%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 85 | 67 | 90.3%
Rocinante 12B | 100 | 100 | 100 | 100 | 50 | 90.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 47 | 89.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 79 | 67 | 89.0%
Z.AI GLM 4.5 | 100 | 100 | 97 | 80 | 63 | 88.1%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 39 | 87.8%
Minimax M2.5 | 100 | 97 | 91 | 83 | 67 | 87.7%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 38 | 87.6%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 79 | 59 | 87.5%
GPT-5 | 100 | 100 | 99 | 90 | 47 | 87.1%
Gemma 3 12B | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 71 | 56 | 85.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 25 | 85.0%
Llama 3.1 70B | 100 | 100 | 100 | 50 | 50 | 80.0%
GPT-5.1 | 100 | 100 | 83 | 78 | 38 | 79.8%
Hermes 3 405B | 100 | 100 | 89 | 62 | 45 | 79.2%
Z.AI GLM 4.7 | 100 | 97 | 82 | 79 | 36 | 78.8%
Claude Sonnet 4.6 | 100 | 100 | 100 | 79 | 7 | 77.1%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 90 | 54 | 39 | 76.5%
GPT-4.1 Nano | 88 | 88 | 88 | 59 | 59 | 76.1%
GPT-5 Nano | 100 | 94 | 71 | 69 | 41 | 75.2%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 67 | 0 | 73.3%
Hermes 3 70B | 100 | 100 | 94 | 39 | 32 | 73.1%
Claude 3.5 Haiku | 100 | 100 | 100 | 7 | 0 | 61.4%
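Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 94 | 98.9%
Claude Opus 4 | 100 | 100 | 100 | 100 | 88 | 97.5%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 88 | 79 | 93.2%
GPT-5 Mini | 100 | 100 | 100 | 88 | 73 | 92.1%
o4 Mini | 100 | 100 | 100 | 100 | 50 | 90.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 50 | 90.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 46 | 89.2%
Stealth: Aurora Alpha | 100 | 100 | 100 | 88 | 50 | 87.5%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 88 | 50 | 87.5%
Hermes 3 70B | 100 | 100 | 100 | 67 | 67 | 86.7%
GPT-5 | 100 | 100 | 100 | 67 | 55 | 84.3%
Z.AI GLM 5 | 100 | 100 | 100 | 94 | 25 | 83.9%
GPT-5.1 | 100 | 100 | 83 | 82 | 50 | 83.0%
Claude 3.7 Sonnet | 100 | 100 | 91 | 88 | 25 | 80.7%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5.2 | 100 | 100 | 100 | 88 | 7 | 78.9%
Claude Sonnet 4 | 100 | 100 | 100 | 94 | 0 | 78.9%
Writer: Palmyra X5 | 100 | 100 | 100 | 79 | 0 | 75.7%
o4 Mini High | 100 | 100 | 79 | 67 | 0 | 69.0%
Llama 3.1 8B | 100 | 100 | 100 | 39 | 0 | 67.8%
GPT-5 Nano | 100 | 100 | 100 | 25 | 0 | 65.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 25 | 0 | 65.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 25 | 0 | 65.0%
GPT-4.1 Mini | 100 | 100 | 100 | 25 | 0 | 65.0%
Minimax M2.5 | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Large | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Medium 3.1 | 100 | 100 | 95 | 0 | 0 | 58.9%
Z.AI GLM 4.7 Flash | 100 | 100 | 79 | 0 | 0 | 55.7%
Mistral Large 2 | 100 | 100 | 25 | 0 | 0 | 45.0%
Mistral Large 3 | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 0 | 0 | 0 | 40.0%
Ministral 3 14B | 100 | 100 | 0 | 0 | 0 | 40.0%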
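Model | #1 | #2 | #3 | #4 | #5 | Avg
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 95 | 99.1%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 95 | 99.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 94 | 98.9%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 94 | 98.9%
GPT-4.1 | 100 | 100 | 100 | 100 | 93 | 98.7%
Claude Opus 4.6 | 100 | 100 | 100 | 96 | 94 | 98.2%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 97 | 92 | 97.9%
GPT-5 | 100 | 100 | 100 | 100 | 87 | 97.3%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 86 | 97.2%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 85 | 97.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 85 | 97.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 84 | 96.9%
Ministral 8B | 100 | 100 | 100 | 100 | 79 | 95.7%
Claude Sonnet 4.5 | 100 | 100 | 93 | 92 | 91 | 95.4%
Claude 3 Haiku | 100 | 100 | 97 | 91 | 88 | 95.2%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 71 | 94.2%
o4 Mini High | 100 | 100 | 100 | 90 | 81 | 94.2%
Mistral Medium 3.1 | 100 | 100 | 100 | 89 | 80 | 93.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 89 | 77 | 93.2%
Z.AI GLM 4.5 | 100 | 100 | 100 | 93 | 72 | 93.1%
Z.AI GLM 4.7 | 100 | 100 | 100 | 94 | 68 | 92.5%
Ministral 3 8B | 100 | 100 | 100 | 100 | 62 | 92.4%
Z.AI GLM 4.7 Flash | 100 | 100 | 97 | 85 | 79 | 92.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 88 | 73 | 92.1%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 59 | 91.8%
GPT-5.1 | 100 | 100 | 99 | 93 | 64 | 91.2%
Gemini 2.5 Pro | 100 | 100 | 100 | 85 | 65 | 90.1%
GPT-5.2 | 100 | 100 | 100 | 86 | 63 | 89.8%
o4 Mini | 100 | 100 | 100 | 81 | 64 | 89.0%
Minimax M2.5 | 100 | 100 | 100 | 97 | 42 | 87.9%
Grok 4.1 Fast | 100 | 97 | 89 | 85 | 57 | 85.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 94 | 79 | 77 | 75 | 85.1%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 25 | 85.0%
DeepSeek V3.2 | 100 | 100 | 90 | 75 | 59 | 84.9%
Mistral Large 2 | 100 | 100 | 100 | 73 | 50 | 84.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 91 | 31 | 84.4%
Mistral Large 3 | 100 | 100 | 89 | 78 | 50 | 83.3%
Ministral 3B | 100 | 100 | 100 | 100 | 15 | 83.0%
Claude Sonnet 4 | 100 | 100 | 85 | 70 | 57 | 82.6%
Z.AI GLM 5 | 100 | 100 | 88 | 67 | 57 | 82.2%
Claude Opus 4 | 96 | 91 | 88 | 69 | 67 | 82.1%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 85 | 75 | 45 | 80.9%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 88 | 14 | 80.4%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small Creative | 100 | 100 | 86 | 80 | 34 | 80.0%
Claude 3.5 Sonnet | 100 | 92 | 83 | 73 | 50 | 79.7%
GPT-5 Nano | 100 | 96 | 83 | 69 | 42 | 78.2%
Ministral 3 3B | 100 | 100 | 100 | 52 | 33 | 77.0%
GPT-5 Mini | 100 | 100 | 100 | 44 | 39 | 76.6%
GPT-4o Mini (temp=1) | 100 | 94 | 79 | 62 | 41 | 75.3%
Hermes 3 70B | 100 | 100 | 97 | 64 | 12 | 74.7%
GPT-4o, Aug. 6th (temp=1) | 94 | 83 | 80 | 59 | 55 | 74.4%
Rocinante 12B | 100 | 100 | 100 | 62 | 0 | 72.4%
Ministral 3 14B | 100 | 90 | 78 | 55 | 37 | 72.0%
Gemini 3 Pro (Preview) | 100 | 79 | 67 | 61 | 47 | 70.5%
Claude Opus 4.5 | 100 | 81 | 80 | 46 | 44 | 70.0%
Claude Haiku 4.5 | 100 | 100 | 75 | 57 | 0 | 66.4%
Hermes 3 405B | 100 | 100 | 97 | 25 | 0 | 64.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 62 | 53 | 0 | 63.0%
Grok 4 Fast | 100 | 85 | 70 | 54 | 0 | 61.6%
DeepSeek V3.1 | 89 | 71 | 59 | 50 | 39 | 61.6%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude 3.7 Sonnet | 75 | 74 | 69 | 51 | 27 | 59.0%
Claude 3.5 Haiku | 100 | 100 | 39 | 25 | 25 | 57.8%
Grok 4 | 100 | 100 | 82 | 0 | 0 | 56.5%
Arcee AI: Trinity Mini | 100 | 100 | 54 | 25 | 0 | 55.8%
DeepSeek-V2 Chat | 100 | 79 | 54 | 42 | 0 | 54.9%
Gemma 3 12B | 100 | 100 | 46 | 18 | 0 | 52.9%
Gemma 3 27B | 89 | 78 | 43 | 31 | 22 | 52.5%
Gemma 3 4B | 100 | 64 | 28 | 0 | 0 | 38.2%
GPT-4.1 Mini | 67 | 47 | 3 | 0 | 0 | 23.2%
GPT-4.1 Nano | 53 | 20 | 0 | 0 | 0 | 14.6%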
Model | #1 | #2 | #3 | #4 | #5 | Avg
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 97 | 99.5%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 97 | 99.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 96 | 99.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 95 | 98.9%
GPT-5 | 100 | 100 | 100 | 100 | 94 | 98.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 94 | 98.9%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 94 | 98.7%
Ministral 8B | 100 | 100 | 100 | 100 | 91 | 98.2%
o4 Mini High | 100 | 100 | 100 | 100 | 89 | 97.9%
Minimax M2.5 | 100 | 100 | 100 | 94 | 93 | 97.6%
Mistral Large | 100 | 100 | 100 | 100 | 88 | 97.5%
Gemini 2.5 Pro | 100 | 100 | 100 | 97 | 89 | 97.4%
Claude Sonnet 4.6 | 100 | 100 | 100 | 99 | 88 | 97.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 86 | 97.1%
Z.AI GLM 4.7 | 100 | 100 | 100 | 97 | 87 | 96.8%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 83 | 96.7%
Mistral Small Creative | 100 | 100 | 100 | 100 | 83 | 96.7%
Ministral 3B | 100 | 100 | 100 | 100 | 83 | 96.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 97 | 83 | 96.1%
Z.AI GLM 5 | 100 | 100 | 100 | 93 | 85 | 95.8%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 91 | 88 | 95.7%
Mistral Large 2 | 100 | 100 | 100 | 93 | 83 | 95.2%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 73 | 94.6%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 73 | 94.6%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 Mini | 100 | 100 | 100 | 89 | 76 | 93.1%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 64 | 92.9%
GPT-5.1 | 100 | 100 | 100 | 86 | 77 | 92.6%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 96 | 67 | 92.5%
Claude Opus 4.5 | 100 | 100 | 100 | 91 | 68 | 91.9%
Claude Opus 4 | 100 | 100 | 100 | 100 | 59 | 91.8%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 85 | 71 | 91.3%
Claude Opus 4.6 | 100 | 100 | 100 | 89 | 64 | 90.7%
Z.AI GLM 4.5 | 100 | 100 | 93 | 85 | 73 | 90.1%
Hermes 3 70B | 100 | 100 | 100 | 100 | 50 | 90.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 46 | 89.2%
Rocinante 12B | 100 | 100 | 100 | 79 | 67 | 89.0%
GPT-4o Mini (temp=1) | 100 | 100 | 96 | 91 | 53 | 88.2%
Writer: Palmyra X5 | 100 | 100 | 100 | 89 | 50 | 87.9%
GPT-5 Nano | 100 | 100 | 88 | 79 | 73 | 87.8%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 39 | 87.8%
DeepSeek V3.2 | 100 | 100 | 89 | 89 | 56 | 86.9%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 25 | 85.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 7 | 81.4%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 | 100 | 100 | 94 | 88 | 12 | 78.8%
Claude 3 Haiku | 100 | 100 | 79 | 67 | 47 | 78.5%
Gemini 3.1 Pro (Preview) | 100 | 100 | 92 | 57 | 30 | 75.8%
Hermes 3 405B | 100 | 79 | 75 | 73 | 39 | 73.1%
Claude 3.7 Sonnet | 100 | 95 | 79 | 64 | 27 | 73.0%
Claude 3.5 Sonnet | 100 | 81 | 79 | 50 | 50 | 71.9%
Claude Haiku 4.5 | 99 | 83 | 62 | 43 | 39 | 65.1%
Claude 3.5 Haiku | 100 | 100 | 100 | 25 | 0 | 65.0%
GPT-4.1 Mini | 100 | 79 | 50 | 50 | 39 | 63.5%
GPT-4.1 Nano | 91 | 59 | 0 | 0 | 0 | 30.1%
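Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 99 | 99.7%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 97 | 99.5%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 91 | 98.2%
GPT-5.2 | 100 | 100 | 100 | 100 | 89 | 97.9%
Grok 4 Fast | 100 | 100 | 100 | 100 | 89 | 97.8%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 89 | 97.8%
Mistral Large 2 | 100 | 100 | 100 | 100 | 88 | 97.5%
Ministral 3 14B | 100 | 100 | 100 | 100 | 87 | 97.4%
Ministral 3B | 100 | 100 | 100 | 100 | 85 | 97.1%
Minimax M2.5 | 100 | 100 | 100 | 95 | 90 | 97.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 96 | 88 | 96.7%
Ministral 8B | 100 | 100 | 100 | 99 | 83 | 96.4%
GPT-5 | 100 | 100 | 99 | 95 | 88 | 96.4%
o4 Mini High | 100 | 100 | 100 | 100 | 82 | 96.4%
Writer: Palmyra X5 | 100 | 100 | 100 | 99 | 82 | 96.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 81 | 96.2%
Ministral 3 8B | 100 | 100 | 100 | 100 | 80 | 95.9%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 98 | 80 | 95.6%
Z.AI GLM 4.7 | 100 | 100 | 100 | 96 | 80 | 95.3%
GPT-4.1 | 100 | 100 | 97 | 91 | 88 | 95.2%
Claude Opus 4.6 | 100 | 100 | 100 | 88 | 88 | 95.0%
o4 Mini | 100 | 100 | 93 | 91 | 89 | 94.8%
Grok 4 | 100 | 100 | 100 | 89 | 85 | 94.7%
Rocinante 12B | 100 | 100 | 100 | 94 | 79 | 94.6%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 71 | 94.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 97 | 88 | 85 | 94.1%
Gemma 3 12B | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 3.1 Pro (Preview) | 100 | 100 | 95 | 89 | 82 | 93.2%
Gemini 2.5 Flash | 100 | 100 | 100 | 94 | 67 | 92.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 97 | 94 | 67 | 91.7%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 56 | 91.3%
Z.AI GLM 4.5 | 100 | 100 | 99 | 99 | 53 | 90.2%
GPT-5.1 | 100 | 96 | 91 | 88 | 75 | 90.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 50 | 90.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 50 | 90.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 46 | 89.2%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 42 | 88.4%
DeepSeek V3.1 | 100 | 100 | 100 | 96 | 45 | 88.1%
Mistral NeMO | 100 | 100 | 100 | 100 | 39 | 87.8%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 89 | 71 | 67 | 85.4%
Claude 3.7 Sonnet | 100 | 100 | 100 | 73 | 53 | 85.2%
GPT-4o, May 13th (temp=0) | 100 | 100 | 91 | 71 | 59 | 84.3%
Arcee AI: Trinity Mini | 100 | 88 | 88 | 83 | 59 | 83.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 96 | 82 | 79 | 59 | 83.1%
Hermes 3 70B | 100 | 100 | 100 | 70 | 43 | 82.6%
Z.AI GLM 5 | 100 | 91 | 87 | 73 | 57 | 81.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 7 | 81.4%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 27B | 100 | 100 | 91 | 60 | 49 | 79.9%
Claude 3 Haiku | 100 | 100 | 100 | 99 | 0 | 79.7%
GPT-5 Mini | 100 | 100 | 68 | 68 | 61 | 79.4%
Claude Sonnet 4.5 | 100 | 100 | 94 | 52 | 47 | 78.5%
Llama 3.1 8B | 100 | 100 | 81 | 72 | 39 | 78.4%
Claude Sonnet 4.6 | 100 | 100 | 79 | 76 | 36 | 78.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 88 | 0 | 77.5%
DeepSeek-V2 Chat | 100 | 86 | 75 | 71 | 48 | 75.9%
Z.AI GLM 4.7 Flash | 100 | 94 | 85 | 57 | 43 | 75.8%
Claude Sonnet 4 | 100 | 100 | 79 | 62 | 31 | 74.4%
GPT-4o Mini (temp=1) | 100 | 79 | 73 | 70 | 50 | 74.3%
Gemini 2.5 Flash Lite | 100 | 92 | 75 | 59 | 42 | 73.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 83 | 35 | 25 | 68.6%
GPT-4.1 Mini | 100 | 79 | 59 | 47 | 47 | 66.2%
Claude Haiku 4.5 | 97 | 75 | 65 | 48 | 20 | 61.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-5 Nano | 89 | 74 | 67 | 29 | 25 | 56.9%
Llama 3.1 70B | 100 | 91 | 59 | 17 | 0 | 53.4%
Hermes 3 405B | 100 | 77 | 71 | 0 | 0 | 49.6%
DeepSeek V3 (2024-12-26) | 100 | 100 | 35 | 0 | 0 | 46.9%
GPT-4.1 Nano | 50 | 0 | 0 | 0 | 0 | 10.0%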
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 97 | 99.5%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 97 | 99.5%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 94 | 98.9%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 92 | 98.5%
Minimax M2.5 | 100 | 100 | 100 | 100 | 91 | 98.2%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 89 | 97.8%
o4 Mini | 100 | 100 | 100 | 100 | 88 | 97.5%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 88 | 97.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 88 | 97.5%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 88 | 97.5%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 82 | 96.5%
Gemma 3 12B | 100 | 100 | 100 | 100 | 82 | 96.5%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 79 | 95.7%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 79 | 95.7%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 73 | 94.6%
Mistral Large 2 | 100 | 100 | 100 | 100 | 73 | 94.6%
Rocinante 12B | 100 | 100 | 100 | 100 | 73 | 94.6%
Z.AI GLM 4.5 | 100 | 100 | 100 | 92 | 79 | 94.2%
GPT-4.1 | 100 | 100 | 100 | 88 | 83 | 94.2%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 67 | 93.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 67 | 93.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 96 | 70 | 93.2%
Z.AI GLM 4.7 | 100 | 100 | 94 | 88 | 79 | 92.1%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 91 | 59 | 90.1%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 83 | 67 | 90.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 45 | 88.9%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 73 | 67 | 87.9%
GPT-5.1 | 100 | 92 | 85 | 83 | 67 | 85.4%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 79 | 39 | 83.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 67 | 50 | 83.3%
GPT-5.2 | 100 | 100 | 100 | 91 | 25 | 83.2%
Claude Opus 4.5 | 100 | 100 | 77 | 73 | 62 | 82.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 7 | 81.4%
Mistral NeMO | 100 | 100 | 100 | 100 | 7 | 81.4%
Grok 4 | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5 | 100 | 96 | 95 | 91 | 17 | 79.9%
Claude 3.5 Sonnet | 100 | 100 | 100 | 79 | 17 | 79.0%
o4 Mini High | 100 | 100 | 100 | 79 | 0 | 75.7%
DeepSeek V3.1 | 100 | 100 | 100 | 79 | 0 | 75.7%
Ministral 3 8B | 100 | 100 | 100 | 75 | 0 | 75.0%
Hermes 3 405B | 100 | 100 | 100 | 73 | 0 | 74.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 59 | 7 | 73.2%
GPT-5 Mini | 100 | 100 | 59 | 50 | 39 | 69.6%
GPT-4.1 Nano | 100 | 100 | 79 | 39 | 25 | 68.5%
GPT-4.1 Mini | 100 | 100 | 100 | 0 | 0 | 60.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 25 | 25 | 25 | 55.0%
Claude Opus 4 | 89 | 79 | 59 | 25 | 0 | 50.4%
GPT-5 Nano | 100 | 50 | 45 | 0 | 0 | 38.9%

Novelcrafter Default Prompt

Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 97 | 99.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 95 | 98.9%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 89 | 97.8%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 82 | 96.5%
Minimax M2.5 | 100 | 100 | 100 | 100 | 82 | 96.5%
Gemma 3 4B | 100 | 100 | 100 | 100 | 82 | 96.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 52 | 90.4%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 52 | 90.4%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 46 | 89.2%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 46 | 89.2%
Rocinante 12B | 100 | 100 | 100 | 100 | 42 | 88.4%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 89 | 52 | 88.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 82 | 82 | 67 | 86.3%
Claude Opus 4 | 100 | 100 | 100 | 100 | 18 | 83.6%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 13 | 82.5%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 80 | 24 | 80.7%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 95 | 0 | 78.9%
DeepSeek-V2 Chat | 100 | 100 | 100 | 92 | 0 | 78.4%
Claude 3.7 Sonnet | 100 | 100 | 97 | 52 | 40 | 77.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 33 | 33 | 73.3%
Claude Haiku 4.5 | 100 | 100 | 89 | 67 | 0 | 71.1%
GPT-4.1 Nano | 100 | 89 | 89 | 0 | 0 | 55.6%
Claude 3.5 Sonnet | 100 | 100 | 75 | 0 | 0 | 55.0%
Claude Sonnet 4.6 | 100 | 100 | 59 | 0 | 0 | 51.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 13 | 0 | 0 | 42.5%
GPT-5 Nano | 100 | 62 | 29 | 0 | 0 | 38.3%
Hermes 3 70B | 100 | 26 | 0 | 0 | 0 | 25.2%
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 95 | 98.9%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 89 | 97.8%
Gemma 3 4B | 100 | 100 | 100 | 100 | 89 | 97.8%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 75 | 95.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 75 | 95.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 57 | 91.4%
Claude Opus 4 | 100 | 100 | 100 | 100 | 57 | 91.4%
Minimax M2.5 | 100 | 100 | 100 | 100 | 57 | 91.4%
Gemma 3 27B | 100 | 100 | 100 | 100 | 46 | 89.2%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 33 | 86.7%
Claude 3.7 Sonnet | 100 | 100 | 100 | 75 | 57 | 86.4%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 89 | 18 | 81.4%
o4 Mini High | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 0 | 80.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 75 | 0 | 75.0%
Grok 4 | 100 | 100 | 89 | 57 | 18 | 72.8%
Mistral Large 2 | 100 | 100 | 100 | 57 | 0 | 71.4%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 57 | 0 | 71.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 57 | 0 | 71.4%
Mistral Large | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Small Creative | 100 | 100 | 100 | 0 | 0 | 60.0%
Ministral 3 14B | 100 | 100 | 100 | 0 | 0 | 60.0%
Hermes 3 70B | 100 | 100 | 46 | 33 | 0 | 55.9%
Mistral Large 3 | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-5 Nano | 100 | 0 | 0 | 0 | 0 | 20.0%
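Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 97 | 99.4%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 95 | 98.9%
DeepSeek-V2 Chat | 100 | 100 | 100 | 98 | 95 | 98.6%
Grok 4 Fast | 100 | 100 | 100 | 100 | 93 | 98.6%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 91 | 98.2%
GPT-5 Nano | 100 | 100 | 100 | 100 | 75 | 95.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 75 | 95.0%
Mistral Large | 100 | 100 | 100 | 100 | 64 | 92.9%
Gemma 3 4B | 100 | 100 | 89 | 86 | 82 | 91.4%
Z.AI GLM 5 | 100 | 100 | 100 | 77 | 73 | 90.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 97 | 52 | 89.9%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 49 | 89.8%
GPT-4o Mini (temp=1) | 100 | 100 | 97 | 93 | 48 | 87.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 38 | 87.6%
Claude 3.5 Sonnet | 100 | 100 | 100 | 89 | 33 | 84.4%
Claude Sonnet 4 | 100 | 100 | 100 | 64 | 57 | 84.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 18 | 83.6%
Z.AI GLM 4.7 Flash | 100 | 97 | 92 | 75 | 46 | 82.1%
Claude Opus 4 | 100 | 100 | 100 | 97 | 13 | 82.0%
Llama 3.1 8B | 100 | 100 | 100 | 89 | 13 | 80.3%
Gemini 2.5 Flash Lite | 100 | 100 | 92 | 57 | 50 | 79.8%
DeepSeek V3.1 | 100 | 100 | 100 | 72 | 26 | 79.7%
Llama 3.1 70B | 100 | 100 | 100 | 89 | 0 | 77.8%
Ministral 3B | 100 | 100 | 100 | 62 | 26 | 77.6%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 42 | 33 | 75.1%
Mistral Small 3.2 24B | 100 | 100 | 100 | 67 | 0 | 73.3%
Ministral 3 8B | 100 | 100 | 92 | 71 | 0 | 72.6%
Hermes 3 405B | 100 | 100 | 100 | 62 | 0 | 72.4%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 60 | 0 | 72.1%
Mistral NeMO | 100 | 100 | 100 | 33 | 26 | 71.9%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 40 | 0 | 68.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 33 | 0 | 66.7%
Rocinante 12B | 100 | 100 | 46 | 46 | 40 | 66.5%
Claude 3.7 Sonnet | 100 | 100 | 74 | 54 | 0 | 65.6%
Claude 3 Haiku | 100 | 100 | 100 | 18 | 10 | 65.5%
Gemini 2.5 Flash | 95 | 75 | 67 | 64 | 0 | 60.0%
Hermes 3 70B | 100 | 100 | 97 | 0 | 0 | 59.5%
Minimax M2.5 | 93 | 67 | 63 | 62 | 8 | 58.5%
DeepSeek V3 (2024-12-26) | 100 | 91 | 62 | 22 | 0 | 55.0%
Arcee AI: Trinity Mini | 100 | 62 | 42 | 33 | 0 | 47.5%
GPT-4o Mini (temp=0) | 100 | 80 | 18 | 18 | 10 | 45.1%
Claude Haiku 4.5 | 100 | 64 | 43 | 0 | 0 | 41.4%
Gemma 3 12B | 100 | 43 | 21 | 21 | 0 | 37.2%
GPT-4.1 Mini | 82 | 54 | 43 | 5 | 0 | 36.8%
Arcee AI: Trinity Large (Preview) | 100 | 50 | 0 | 0 | 0 | 30.0%
GPT-4.1 Nano | 100 | 42 | 0 | 0 | 0 | 28.4%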
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 95 | 98.9%
GPT-4.1 Mini | 100 | 100 | 100 | 97 | 95 | 98.4%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 87 | 97.4%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 82 | 96.5%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 82 | 96.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 80 | 96.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 75 | 95.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 97 | 75 | 94.5%
Gemma 3 4B | 100 | 100 | 100 | 97 | 72 | 94.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 67 | 93.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 62 | 92.4%
WizardLM 2 8x22b | 100 | 100 | 100 | 87 | 71 | 91.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 57 | 91.4%
Mistral Large | 100 | 100 | 100 | 100 | 52 | 90.4%
Z.AI GLM 4.6 | 100 | 100 | 100 | 86 | 62 | 89.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 46 | 89.2%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 79 | 67 | 89.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 40 | 88.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 33 | 86.7%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 33 | 86.7%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 21 | 84.3%
GPT-4.1 Nano | 100 | 100 | 89 | 89 | 29 | 81.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 0 | 80.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 95 | 67 | 33 | 78.9%
GPT-5 Nano | 100 | 100 | 75 | 65 | 52 | 78.3%
Gemma 3 27B | 100 | 100 | 95 | 42 | 5 | 68.3%
Ministral 3B | 100 | 100 | 82 | 18 | 0 | 60.1%
Claude Haiku 4.5 | 100 | 100 | 67 | 26 | 0 | 58.6%
Hermes 3 70B | 100 | 100 | 0 | 0 | 0 | 40.0%
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 95 | 98.9%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 95 | 98.9%
Grok 4 Fast | 100 | 100 | 100 | 100 | 92 | 98.4%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 92 | 98.4%
Mistral Large 2 | 100 | 100 | 100 | 100 | 92 | 98.4%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 89 | 97.8%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 89 | 97.8%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 89 | 97.8%
Gemma 3 4B | 100 | 100 | 100 | 100 | 89 | 97.8%
Ministral 8B | 100 | 100 | 100 | 100 | 86 | 97.1%
Claude Opus 4 | 100 | 100 | 100 | 100 | 85 | 97.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 97 | 85 | 96.2%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 80 | 96.0%
o4 Mini High | 100 | 100 | 100 | 100 | 75 | 95.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 73 | 94.6%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 70 | 94.0%
o4 Mini | 100 | 100 | 100 | 100 | 54 | 90.7%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 81 | 71 | 90.3%
Ministral 3 3B | 100 | 100 | 100 | 100 | 40 | 88.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 31 | 86.1%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 30 | 86.0%
GPT-5 Mini | 100 | 100 | 90 | 70 | 68 | 85.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 75 | 52 | 85.4%
Gemma 3 27B | 100 | 100 | 100 | 87 | 18 | 81.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 0 | 80.0%
Minimax M2.5 | 100 | 100 | 100 | 57 | 39 | 79.2%
Z.AI GLM 5 | 100 | 100 | 100 | 57 | 36 | 78.6%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 46 | 40 | 77.2%
GPT-4.1 Mini | 100 | 100 | 100 | 52 | 29 | 76.1%
Claude Opus 4.5 | 100 | 100 | 100 | 38 | 25 | 72.6%
Hermes 3 405B | 100 | 100 | 100 | 52 | 0 | 70.4%
Gemini 2.5 Flash Lite | 100 | 100 | 75 | 57 | 10 | 68.3%
Claude Sonnet 4 | 100 | 100 | 100 | 18 | 0 | 63.6%
Rocinante 12B | 100 | 100 | 100 | 18 | 0 | 63.6%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 13 | 0 | 62.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 95 | 0 | 0 | 58.9%
Gemma 3 12B | 100 | 100 | 43 | 33 | 0 | 55.3%
GPT-4.1 Nano | 100 | 79 | 71 | 0 | 0 | 50.0%
Claude 3 Haiku | 100 | 100 | 33 | 14 | 0 | 49.5%
Arcee AI: Trinity Mini | 100 | 89 | 40 | 0 | 0 | 45.8%
Hermes 3 70B | 100 | 57 | 46 | 0 | 0 | 40.7%
Llama 3.1 8B | 100 | 89 | 0 | 0 | 0 | 37.8%
Llama 3.1 70B | 75 | 57 | 18 | 18 | 0 | 33.7%
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 95 | 98.9%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 95 | 98.9%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 92 | 98.4%
o4 Mini | 100 | 100 | 100 | 100 | 82 | 96.5%
Minimax M2.5 | 100 | 100 | 100 | 100 | 79 | 95.8%
GPT-4.1 | 100 | 100 | 100 | 100 | 75 | 95.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 75 | 95.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 75 | 95.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 67 | 93.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 89 | 70 | 91.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 46 | 89.2%
Grok 4 Fast | 100 | 100 | 100 | 100 | 33 | 86.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 33 | 86.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 18 | 83.6%
Gemma 3 27B | 100 | 100 | 100 | 95 | 10 | 80.9%
Z.AI GLM 4.7 | 100 | 100 | 100 | 57 | 46 | 80.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 89 | 57 | 57 | 80.6%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 5 | 100 | 100 | 100 | 89 | 0 | 77.8%
GPT-4.1 Mini | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 67 | 0 | 73.3%
GPT-5 Nano | 100 | 100 | 79 | 50 | 33 | 72.4%
Hermes 3 70B | 100 | 100 | 100 | 38 | 18 | 71.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 33 | 0 | 66.7%
GPT-4.1 Nano | 100 | 100 | 75 | 57 | 0 | 66.4%
Hermes 3 405B | 100 | 100 | 100 | 0 | 0 | 60.0%