Narrator intent-glossing

Test: Bad Writing Habits

Avg. Score: 69.4%
Scenarios: 18

Overall Performance

Rank ▲ | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Grok 4.1 Fast | 97.8% | $0.0018 | 37.8s | 81%
2 | o4 Mini | 92.2% | $0.015 | 25.7s | 68%
3 | o4 Mini High | 92.7% | $0.025 | 47.2s | 61%
4 | DeepSeek V3 (2025-03-24) | 85.8% | $0.0014 | 39.4s | 50%
5 | ByteDance Seed 1.6 Flash | 84.2% | $0.0013 | 27.3s | 48%
6 | Hermes 3 405B | 89.5% | $0.0032 | 53.2s | 50%
7 | Mistral Small Creative | 81.2% | $0.0007 | 9.1s | 45%
8 | Claude 3 Haiku | 83.6% | $0.0025 | 14.9s | 41%
9 | Rocinante 12B | 85.7% | $0.0014 | 38.4s | 42%
10 | Mistral NeMO | 82.1% | $0.0005 | 10.1s | 36%
11 | Writer: Palmyra X5 | 84.4% | $0.011 | 22.0s | 41%
12 | Mistral Medium 3.1 | 83.0% | $0.0048 | 36.5s | 42%
13 | Mistral Large | 80.7% | $0.014 | 30.9s | 42%
14 | Grok 4 Fast | 77.9% | $0.0017 | 24.1s | 37%
15 | Ministral 3 14B | 75.1% | $0.0007 | 11.7s | 33%
16 | Qwen 2.5 72B | 77.4% | $0.0010 | 36.7s | 34%
17 | GPT-4.1 | 79.6% | $0.018 | 44.7s | 40%
18 | GPT-4o, May 13th (temp=0) | 81.6% | $0.035 | 14.1s | 36%
19 | Gemini 3.1 Pro (Preview) | 95.7% | $0.107 | 1.8m | 72%
20 | DeepSeek V3 (2024-12-26) | 78.1% | $0.0021 | 54.6s | 35%
21 | DeepSeek-V2 Chat | 78.1% | $0.0021 | 53.3s | 34%
22 | Hermes 3 70B | 80.5% | $0.0010 | 1.2m | 36%
23 | Mistral Large 3 | 74.6% | $0.0033 | 30.3s | 31%
24 | Qwen 3.5 397B A17B | 89.1% | $0.014 | 3.0m | 58%
25 | GPT-4o Mini (temp=1) | 74.3% | $0.0012 | 34.8s | 31%
26 | GPT-4o Mini (temp=0) | 73.6% | $0.0012 | 34.8s | 30%
27 | Ministral 3 3B | 70.8% | $0.0005 | 11.1s | 24%
28 | Ministral 3B | 70.4% | $0.0001 | 8.1s | 22%
29 | Mistral Large 2 | 72.3% | $0.013 | 29.4s | 28%
30 | Ministral 3 8B | 70.1% | $0.0008 | 19.6s | 21%
31 | Gemma 3 12B | 68.9% | $0.0004 | 41.3s | 24%
32 | GPT-4o, Aug. 6th (temp=0) | 72.1% | $0.023 | 22.7s | 24%
33 | GPT-4o, May 13th (temp=1) | 71.4% | $0.033 | 14.4s | 27%
34 | DeepSeek V3.2 | 75.6% | $0.0014 | 1.9m | 34%
35 | Qwen 3.5 Plus (2026-02-15) | 67.8% | $0.0060 | 31.5s | 23%
36 | Ministral 8B | 65.3% | $0.0004 | 10.4s | 18%
37 | Arcee AI: Trinity Mini | 66.1% | $0.0003 | 9.2s | 16%
38 | Gemma 3 27B | 68.7% | $0.0006 | 52.6s | 23%
39 | Gemini 2.5 Pro | 72.1% | $0.036 | 36.2s | 29%
40 | Z.AI GLM 4.7 | 72.6% | $0.010 | 1.4m | 31%
41 | Cohere Command R+ (Aug. 2024) | 74.3% | $0.020 | 52.5s | 24%
42 | Gemma 3 4B | 63.6% | $0.0002 | 20.0s | 16%
43 | Gemini 3 Flash (Preview) | 61.2% | $0.0078 | 19.6s | 21%
44 | GPT-4.1 Mini | 63.4% | $0.0027 | 19.0s | 15%
45 | MoonshotAI: Kimi K2.5 | 82.7% | $0.019 | 3.2m | 45%
46 | Gemini 2.5 Flash | 58.6% | $0.0052 | 10.6s | 18%
47 | GPT-5.1 | 79.4% | $0.054 | 1.8m | 41%
48 | Gemini 3 Pro (Preview) | 74.3% | $0.055 | 54.4s | 32%
49 | Claude Sonnet 4.6 | 69.8% | $0.031 | 39.3s | 23%
50 | Z.AI GLM 4.7 Flash | 66.1% | $0.0017 | 1.2m | 23%
51 | GPT-4o, Aug. 6th (temp=1) | 64.3% | $0.018 | 24.4s | 17%
52 | ByteDance Seed 1.6 | 76.7% | $0.013 | 2.5m | 33%
53 | Z.AI GLM 5 | 65.9% | $0.0084 | 1.2m | 22%
54 | Z.AI GLM 4.5 | 61.9% | $0.0051 | 42.1s | 14%
55 | Llama 3.1 70B | 59.0% | $0.0015 | 29.4s | 12%
56 | Claude Sonnet 4.5 | 64.8% | $0.035 | 38.1s | 21%
57 | GPT-4.1 Nano | 53.9% | $0.0007 | 13.3s | 11%
58 | Arcee AI: Trinity Large (Preview) | 58.1% | $0.0000 | 43.6s | 12%
59 | Z.AI GLM 4.6 | 58.5% | $0.0065 | 51.5s | 14%
60 | WizardLM 2 8x22b | 66.3% | $0.0026 | 1.8m | 19%
61 | Gemini 2.5 Flash Lite | 48.0% | $0.0009 | 9.5s | 9%
62 | DeepSeek V3.1 | 63.9% | $0.0020 | 1.8m | 19%
63 | Grok 4 | 71.9% | $0.048 | 1.7m | 26%
64 | Claude Opus 4.6 | 68.8% | $0.078 | 1.2m | 29%
65 | Claude 3.7 Sonnet | 61.0% | $0.042 | 46.7s | 16%
66 | Llama 3.1 Nemotron 70B | 48.3% | $0.0038 | 31.7s | 9%
67 | Minimax M2.5 | 56.4% | $0.0034 | 1.3m | 11%
68 | Claude 3.5 Haiku | 48.7% | $0.0035 | 10.8s | 1%
69 | Claude 3.5 Sonnet | 60.4% | $0.048 | 35.5s | 13%
70 | Claude Haiku 4.5 | 45.4% | $0.011 | 21.6s | 7%
71 | Llama 3.1 8B | 50.4% | $0.0003 | 1.3m | 9%
72 | Claude Sonnet 4 | 54.4% | $0.032 | 43.7s | 9%
73 | GPT-5 | 72.1% | $0.065 | 2.8m | 31%
74 | GPT-5 Mini | 39.1% | $0.0100 | 57.4s | 8%
75 | Stealth: Aurora Alpha | 29.3% | $0.0000 | 9.8s | 0%
76 | GPT-5.2 | 54.5% | $0.056 | 1.5m | 15%
77 | Claude Opus 4.5 | 52.8% | $0.070 | 53.4s | 11%
78 | Mistral Small 3.2 24B | 80.0% | $0.0069 | 5.7m | 27%
79 | Claude Opus 4 | 71.2% | $0.209 | 1.4m | 29%
80 | GPT-5 Nano | 10.4% | $0.0042 | 1.4m | 0%
Average score: 69.41%

Individual Scenarios

Detailed Writing Rules

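Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 90 | 98.0%
Hermes 3 405B | 100 | 100 | 100 | 94 | 93 | 97.5%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 75 | 95.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 72 | 94.4%
Claude 3 Haiku | 100 | 100 | 100 | 97 | 73 | 94.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 56 | 91.1%
Claude Opus 4.6 | 100 | 100 | 85 | 82 | 79 | 89.0%
Grok 4 Fast | 100 | 100 | 92 | 84 | 60 | 87.1%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 93 | 36 | 85.8%
MoonshotAI: Kimi K2.5 | 100 | 100 | 98 | 64 | 62 | 84.8%
o4 Mini | 100 | 100 | 100 | 78 | 44 | 84.3%
Qwen 3.5 397B A17B | 100 | 93 | 90 | 79 | 59 | 84.3%
GPT-4.1 Mini | 100 | 100 | 100 | 71 | 38 | 81.7%
Mistral Large | 100 | 100 | 100 | 75 | 28 | 80.6%
Ministral 3 14B | 100 | 100 | 93 | 84 | 25 | 80.5%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 74 | 27 | 80.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 70B | 100 | 100 | 93 | 67 | 36 | 79.2%
GPT-4.1 | 100 | 90 | 89 | 89 | 25 | 78.7%
Mistral Small Creative | 100 | 100 | 89 | 57 | 29 | 75.1%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 54 | 18 | 74.5%
Mistral NeMO | 100 | 100 | 98 | 68 | 0 | 73.2%
Gemini 3 Pro (Preview) | 100 | 83 | 73 | 68 | 25 | 69.9%
ByteDance Seed 1.6 | 100 | 89 | 64 | 41 | 36 | 66.1%
Arcee AI: Trinity Mini | 100 | 100 | 76 | 54 | 0 | 66.1%
Mistral Large 2 | 100 | 96 | 72 | 50 | 8 | 65.2%
Qwen 2.5 72B | 100 | 71 | 67 | 44 | 38 | 64.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 51 | 47 | 22 | 64.0%
Claude Sonnet 4.5 | 100 | 100 | 90 | 23 | 0 | 62.5%
Writer: Palmyra X5 | 100 | 100 | 100 | 8 | 0 | 61.7%
Ministral 3B | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-4o, May 13th (temp=1) | 100 | 79 | 43 | 41 | 36 | 59.8%
Z.AI GLM 4.6 | 100 | 85 | 64 | 47 | 0 | 59.2%
Claude 3.5 Sonnet | 100 | 93 | 84 | 18 | 0 | 59.2%
GPT-5 | 85 | 84 | 80 | 42 | 0 | 58.3%
Claude Opus 4.5 | 100 | 74 | 61 | 53 | 1 | 57.7%
Cohere Command R+ (Aug. 2024) | 100 | 93 | 86 | 0 | 0 | 55.8%
Gemini 3 Flash (Preview) | 90 | 76 | 54 | 46 | 0 | 53.2%
Gemma 3 27B | 76 | 75 | 69 | 36 | 3 | 52.1%
Grok 4 | 100 | 100 | 59 | 0 | 0 | 51.8%
Mistral Medium 3.1 | 100 | 68 | 55 | 27 | 0 | 49.8%
Gemini 2.5 Pro | 100 | 70 | 41 | 33 | 0 | 48.9%
Gemini 2.5 Flash | 69 | 69 | 40 | 36 | 27 | 48.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 93 | 42 | 6 | 0 | 48.3%
Z.AI GLM 4.7 | 100 | 100 | 36 | 0 | 0 | 47.3%
DeepSeek-V2 Chat | 100 | 100 | 31 | 0 | 0 | 46.2%
Mistral Large 3 | 100 | 77 | 48 | 0 | 0 | 45.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 71 | 48 | 3 | 0 | 44.3%
DeepSeek V3.1 | 90 | 88 | 43 | 0 | 0 | 44.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 10 | 0 | 0 | 42.1%
Z.AI GLM 5 | 100 | 52 | 50 | 3 | 0 | 41.0%
Ministral 8B | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemma 3 12B | 100 | 50 | 44 | 0 | 0 | 38.9%
Llama 3.1 70B | 100 | 93 | 0 | 0 | 0 | 38.6%
Gemma 3 4B | 79 | 71 | 38 | 0 | 0 | 37.5%
Llama 3.1 8B | 100 | 83 | 0 | 0 | 0 | 36.7%
Claude Opus 4 | 79 | 60 | 17 | 12 | 0 | 33.6%
GPT-5 Mini | 67 | 58 | 20 | 8 | 0 | 30.6%
Claude Sonnet 4.6 | 59 | 52 | 39 | 0 | 0 | 30.0%
Llama 3.1 Nemotron 70B | 100 | 22 | 15 | 0 | 0 | 27.3%
Z.AI GLM 4.5 | 100 | 35 | 0 | 0 | 0 | 26.9%
GPT-4o Mini (temp=0) | 68 | 57 | 5 | 0 | 0 | 26.1%
Arcee AI: Trinity Large (Preview) | 100 | 23 | 0 | 0 | 0 | 24.7%
GPT-4.1 Nano | 100 | 14 | 0 | 0 | 0 | 22.7%
Rocinante 12B | 100 | 7 | 0 | 0 | 0 | 21.4%
Ministral 3 8B | 98 | 0 | 0 | 0 | 0 | 19.6%
Ministral 3 3B | 90 | 0 | 0 | 0 | 0 | 18.1%
Gemini 2.5 Flash Lite | 50 | 35 | 0 | 0 | 0 | 16.9%
Claude 3.7 Sonnet | 76 | 0 | 0 | 0 | 0 | 15.3%
WizardLM 2 8x22b | 55 | 10 | 0 | 0 | 0 | 12.9%
Z.AI GLM 4.7 Flash | 40 | 14 | 0 | 0 | 0 | 10.8%
Minimax M2.5 | 48 | 6 | 0 | 0 | 0 | 10.7%
Stealth: Aurora Alpha | 39 | 0 | 0 | 0 | 0 | 7.8%
GPT-5.2 | 17 | 3 | 0 | 0 | 0 | 4.1%
Claude Haiku 4.5 | 7 | 3 | 1 | 0 | 0 | 2.3%
Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%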
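
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 94 | 93 | 97.4%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 76 | 95.1%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 88 | 88 | 95.1%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 62 | 92.5%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 61 | 92.1%
Rocinante 12B | 100 | 100 | 100 | 100 | 57 | 91.5%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 79 | 78 | 91.2%
Qwen 3.5 397B A17B | 100 | 100 | 98 | 89 | 63 | 90.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 80 | 65 | 89.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 96 | 45 | 88.2%
ByteDance Seed 1.6 Flash | 100 | 100 | 90 | 88 | 61 | 87.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 81 | 54 | 86.9%
o4 Mini | 100 | 100 | 100 | 81 | 46 | 85.3%
GPT-4.1 | 100 | 88 | 88 | 83 | 65 | 84.8%
Mistral Large 3 | 100 | 100 | 89 | 79 | 54 | 84.3%
Ministral 3 8B | 100 | 100 | 100 | 65 | 54 | 83.8%
GPT-4.1 Mini | 100 | 100 | 82 | 78 | 59 | 83.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 16 | 83.2%
Writer: Palmyra X5 | 100 | 100 | 100 | 65 | 48 | 82.6%
Mistral Small Creative | 100 | 100 | 87 | 67 | 56 | 81.8%
Z.AI GLM 5 | 100 | 100 | 100 | 71 | 35 | 81.1%
DeepSeek V3.2 | 100 | 100 | 98 | 82 | 24 | 80.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 73 | 29 | 80.4%
Grok 4 Fast | 100 | 100 | 94 | 82 | 27 | 80.4%
Mistral Large | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-5 | 100 | 100 | 84 | 65 | 45 | 78.7%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 61 | 30 | 78.0%
Claude Opus 4 | 100 | 100 | 100 | 50 | 40 | 77.9%
Gemini 3 Pro (Preview) | 100 | 100 | 81 | 76 | 27 | 76.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 99 | 46 | 33 | 75.5%
Mistral Large 2 | 100 | 100 | 71 | 56 | 48 | 74.9%
Claude Opus 4.5 | 100 | 80 | 72 | 72 | 50 | 74.7%
Claude 3 Haiku | 100 | 100 | 100 | 57 | 11 | 73.7%
Gemini 2.5 Pro | 100 | 75 | 74 | 61 | 57 | 73.5%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 96 | 55 | 17 | 73.5%
GPT-5.2 | 100 | 95 | 69 | 52 | 48 | 72.9%
Claude Sonnet 4.5 | 100 | 100 | 88 | 76 | 0 | 72.8%
Claude Sonnet 4.6 | 100 | 100 | 100 | 57 | 3 | 72.1%
Claude Opus 4.6 | 97 | 85 | 81 | 76 | 20 | 71.9%
Ministral 3 14B | 100 | 96 | 96 | 44 | 16 | 70.2%
Qwen 2.5 72B | 95 | 89 | 82 | 75 | 0 | 68.1%
GPT-4o Mini (temp=0) | 100 | 86 | 85 | 53 | 15 | 67.7%
Claude Sonnet 4 | 100 | 92 | 67 | 65 | 14 | 67.5%
DeepSeek V3.1 | 100 | 100 | 100 | 27 | 6 | 66.5%
ByteDance Seed 1.6 | 100 | 73 | 57 | 54 | 48 | 66.5%
Hermes 3 405B | 100 | 100 | 84 | 48 | 0 | 66.4%
Z.AI GLM 4.7 Flash | 96 | 93 | 90 | 43 | 0 | 64.3%
Mistral NeMO | 100 | 100 | 88 | 33 | 0 | 64.2%
Hermes 3 70B | 100 | 100 | 48 | 48 | 22 | 63.5%
Z.AI GLM 4.6 | 100 | 100 | 100 | 0 | 0 | 60.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemma 3 4B | 88 | 84 | 59 | 39 | 22 | 58.3%
Gemini 2.5 Flash | 100 | 85 | 51 | 48 | 0 | 56.9%
Gemma 3 27B | 86 | 72 | 60 | 50 | 15 | 56.5%
Gemma 3 12B | 100 | 100 | 56 | 25 | 0 | 56.3%
Arcee AI: Trinity Mini | 100 | 100 | 81 | 0 | 0 | 56.1%
Ministral 8B | 100 | 92 | 71 | 6 | 0 | 53.7%
Claude Haiku 4.5 | 100 | 90 | 73 | 0 | 0 | 52.8%
Gemini 3 Flash (Preview) | 100 | 100 | 64 | 0 | 0 | 52.8%
Grok 4 | 100 | 81 | 50 | 21 | 11 | 52.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 36 | 25 | 0 | 52.3%
Ministral 3 3B | 100 | 81 | 46 | 25 | 0 | 50.3%
Ministral 3B | 100 | 72 | 57 | 7 | 0 | 47.3%
Claude 3.7 Sonnet | 100 | 73 | 34 | 16 | 3 | 45.1%
Llama 3.1 70B | 100 | 84 | 29 | 3 | 0 | 43.2%
GPT-4o Mini (temp=1) | 100 | 50 | 33 | 28 | 0 | 42.2%
Minimax M2.5 | 100 | 73 | 15 | 0 | 0 | 37.6%
GPT-5 Mini | 89 | 34 | 23 | 0 | 0 | 29.1%
GPT-4.1 Nano | 82 | 25 | 11 | 0 | 0 | 23.7%
Llama 3.1 Nemotron 70B | 79 | 18 | 6 | 0 | 0 | 20.6%
Gemini 2.5 Flash Lite | 100 | 1 | 0 | 0 | 0 | 20.2%
GPT-5 Nano | 12 | 9 | 5 | 0 | 0 | 5.2%
Llama 3.1 8B | 17 | 0 | 0 | 0 | 0 | 3.3%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%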
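
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 99 | 99.8%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 99 | 99.8%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 87 | 97.3%
Claude Sonnet 4.6 | 100 | 100 | 99 | 94 | 92 | 97.1%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 82 | 96.5%
Writer: Palmyra X5 | 100 | 100 | 100 | 96 | 86 | 96.3%
o4 Mini | 100 | 100 | 100 | 100 | 79 | 95.7%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 78 | 95.5%
Hermes 3 70B | 100 | 100 | 100 | 96 | 81 | 95.2%
Grok 4 | 100 | 100 | 100 | 100 | 72 | 94.4%
Ministral 3 3B | 100 | 100 | 100 | 100 | 72 | 94.4%
Claude Opus 4 | 100 | 100 | 100 | 90 | 78 | 93.6%
Minimax M2.5 | 100 | 100 | 100 | 93 | 72 | 93.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 82 | 80 | 92.2%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 59 | 91.8%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 59 | 91.7%
Grok 4 Fast | 100 | 100 | 100 | 100 | 56 | 91.3%
GPT-5 | 100 | 100 | 96 | 88 | 72 | 91.1%
GPT-4.1 | 100 | 100 | 100 | 100 | 54 | 90.8%
Mistral NeMO | 100 | 100 | 100 | 85 | 68 | 90.5%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 50 | 90.0%
Claude Sonnet 4 | 100 | 100 | 100 | 82 | 65 | 89.4%
Rocinante 12B | 100 | 100 | 100 | 100 | 46 | 89.3%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 46 | 89.2%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 84 | 56 | 88.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 40 | 87.9%
Z.AI GLM 5 | 100 | 100 | 92 | 74 | 72 | 87.6%
Mistral Large | 100 | 100 | 100 | 96 | 38 | 86.7%
Qwen 2.5 72B | 100 | 100 | 92 | 74 | 61 | 85.4%
DeepSeek V3.1 | 100 | 91 | 82 | 79 | 62 | 82.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 69 | 36 | 81.1%
Ministral 3 8B | 100 | 100 | 100 | 59 | 44 | 80.4%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek-V2 Chat | 100 | 100 | 72 | 65 | 62 | 79.9%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 99 | 0 | 79.8%
Claude Opus 4.5 | 100 | 100 | 96 | 82 | 20 | 79.6%
Claude Opus 4.6 | 100 | 100 | 100 | 57 | 40 | 79.4%
Gemini 2.5 Pro | 100 | 100 | 76 | 67 | 50 | 78.5%
Mistral Large 3 | 100 | 100 | 100 | 77 | 13 | 78.0%
Stealth: Aurora Alpha | 100 | 95 | 94 | 75 | 23 | 77.5%
Ministral 3B | 100 | 100 | 93 | 63 | 31 | 77.4%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 85 | 0 | 77.0%
DeepSeek V3.2 | 100 | 94 | 85 | 81 | 23 | 76.6%
o4 Mini High | 100 | 100 | 100 | 80 | 0 | 75.9%
Gemini 2.5 Flash | 100 | 100 | 82 | 69 | 18 | 73.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 52 | 1 | 70.5%
Ministral 3 14B | 100 | 100 | 99 | 34 | 17 | 69.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 48 | 0 | 69.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 46 | 0 | 69.2%
GPT-5.2 | 100 | 79 | 69 | 57 | 30 | 66.9%
Gemma 3 12B | 100 | 100 | 61 | 41 | 25 | 65.4%
Gemma 3 27B | 100 | 100 | 95 | 29 | 0 | 64.8%
Claude Haiku 4.5 | 100 | 76 | 73 | 61 | 0 | 62.1%
GPT-5.1 | 100 | 67 | 56 | 51 | 30 | 60.8%
Claude 3.5 Haiku | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-4.1 Mini | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-4o Mini (temp=1) | 100 | 76 | 68 | 35 | 0 | 55.8%
Llama 3.1 70B | 100 | 100 | 46 | 25 | 3 | 54.8%
GPT-5 Mini | 92 | 91 | 36 | 33 | 8 | 52.1%
Mistral Large 2 | 100 | 100 | 46 | 8 | 0 | 50.8%
GPT-4.1 Nano | 100 | 65 | 61 | 8 | 0 | 46.9%
Gemini 3 Flash (Preview) | 87 | 85 | 46 | 8 | 0 | 45.1%
Llama 3.1 8B | 100 | 100 | 25 | 0 | 0 | 45.0%
Gemma 3 4B | 100 | 82 | 16 | 3 | 3 | 40.9%
Gemini 2.5 Flash Lite | 100 | 36 | 20 | 0 | 0 | 31.2%
Llama 3.1 Nemotron 70B | 81 | 28 | 22 | 0 | 0 | 26.1%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%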
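
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 96 | 99.1%
Rocinante 12B | 100 | 100 | 100 | 100 | 93 | 98.5%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 90 | 98.1%
Mistral Large | 100 | 100 | 100 | 98 | 92 | 98.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 82 | 96.5%
o4 Mini | 100 | 100 | 100 | 92 | 90 | 96.5%
Llama 3.1 8B | 100 | 100 | 100 | 99 | 82 | 96.1%
Hermes 3 405B | 100 | 100 | 100 | 100 | 79 | 95.7%
GPT-4.1 | 100 | 100 | 100 | 100 | 76 | 95.3%
GPT-5.1 | 100 | 100 | 100 | 100 | 76 | 95.1%
GPT-4.1 Mini | 100 | 100 | 100 | 94 | 81 | 95.0%
GPT-5.2 | 100 | 100 | 97 | 91 | 86 | 94.8%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 73 | 94.6%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 69 | 93.9%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 67 | 93.3%
GPT-5 | 100 | 100 | 100 | 100 | 66 | 93.1%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 65 | 93.1%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 65 | 93.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 59 | 91.8%
Gemma 3 12B | 100 | 100 | 100 | 94 | 64 | 91.6%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 54 | 90.8%
Ministral 3 3B | 100 | 100 | 100 | 100 | 54 | 90.8%
Hermes 3 70B | 100 | 100 | 100 | 88 | 64 | 90.3%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 50 | 90.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 90 | 54 | 88.9%
Claude Sonnet 4.6 | 100 | 100 | 100 | 92 | 48 | 88.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 39 | 87.8%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 88 | 50 | 87.5%
Claude 3 Haiku | 100 | 100 | 100 | 88 | 50 | 87.5%
Claude Opus 4 | 100 | 100 | 100 | 73 | 62 | 87.2%
Claude 3.5 Sonnet | 100 | 100 | 100 | 72 | 57 | 85.9%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 27 | 85.4%
Gemma 3 4B | 100 | 100 | 100 | 76 | 47 | 84.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 88 | 73 | 62 | 84.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 67 | 54 | 84.1%
WizardLM 2 8x22b | 100 | 97 | 90 | 68 | 64 | 83.9%
Z.AI GLM 4.5 | 100 | 97 | 97 | 89 | 28 | 82.1%
Grok 4 | 100 | 100 | 85 | 78 | 46 | 81.7%
Z.AI GLM 4.7 | 100 | 100 | 97 | 74 | 35 | 81.3%
Minimax M2.5 | 100 | 100 | 100 | 57 | 49 | 81.2%
Writer: Palmyra X5 | 100 | 100 | 81 | 73 | 48 | 80.3%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 0 | 80.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3B | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small Creative | 100 | 100 | 100 | 98 | 0 | 79.6%
GPT-4.1 Nano | 100 | 100 | 99 | 56 | 41 | 79.2%
Claude Haiku 4.5 | 100 | 100 | 100 | 83 | 11 | 78.9%
DeepSeek V3.1 | 100 | 100 | 100 | 61 | 27 | 77.6%
Z.AI GLM 5 | 100 | 100 | 72 | 71 | 25 | 73.5%
Arcee AI: Trinity Mini | 100 | 100 | 96 | 67 | 0 | 72.5%
Claude Opus 4.6 | 100 | 83 | 80 | 72 | 25 | 72.0%
Gemini 2.5 Flash | 100 | 100 | 69 | 61 | 17 | 69.3%
Mistral Large 2 | 100 | 100 | 72 | 44 | 18 | 66.8%
Ministral 8B | 100 | 92 | 72 | 61 | 3 | 65.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 69 | 54 | 0 | 64.6%
Gemini 3 Flash (Preview) | 100 | 100 | 62 | 46 | 11 | 63.7%
Z.AI GLM 4.7 Flash | 100 | 100 | 97 | 8 | 0 | 61.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 89 | 16 | 0 | 61.0%
Claude 3.7 Sonnet | 100 | 100 | 57 | 35 | 0 | 58.4%
Stealth: Aurora Alpha | 100 | 100 | 91 | 0 | 0 | 58.2%
Grok 4 Fast | 100 | 79 | 67 | 33 | 0 | 55.6%
Gemini 3 Pro (Preview) | 100 | 86 | 57 | 23 | 7 | 54.8%
GPT-5 Mini | 100 | 72 | 57 | 29 | 16 | 54.6%
Ministral 3 8B | 100 | 100 | 50 | 0 | 0 | 50.0%
Z.AI GLM 4.6 | 80 | 74 | 62 | 7 | 0 | 44.6%
Ministral 3 14B | 100 | 67 | 36 | 18 | 0 | 44.3%
Gemini 2.5 Flash Lite | 100 | 80 | 35 | 7 | 0 | 44.3%
Claude Opus 4.5 | 100 | 54 | 46 | 18 | 0 | 43.6%
Claude Sonnet 4 | 100 | 100 | 0 | 0 | 0 | 40.0%
GPT-5 Nano | 9 | 0 | 0 | 0 | 0 | 1.8%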
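
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 99 | 99.9%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 98 | 99.6%
o4 Mini | 100 | 100 | 100 | 100 | 97 | 99.4%
Mistral NeMO | 100 | 100 | 100 | 100 | 97 | 99.4%
Mistral Large 3 | 100 | 100 | 100 | 100 | 94 | 98.9%
Z.AI GLM 4.5 | 100 | 100 | 100 | 98 | 96 | 98.7%
o4 Mini High | 100 | 100 | 100 | 100 | 92 | 98.4%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 82 | 96.5%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 81 | 96.1%
ByteDance Seed 1.6 | 100 | 100 | 100 | 89 | 89 | 95.6%
Ministral 8B | 100 | 100 | 100 | 100 | 77 | 95.4%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 76 | 95.3%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 76 | 95.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 93 | 65 | 91.7%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 56 | 91.1%
Ministral 3 14B | 100 | 100 | 100 | 100 | 46 | 89.2%
GPT-5.1 | 100 | 100 | 100 | 81 | 65 | 89.1%
DeepSeek-V2 Chat | 100 | 100 | 100 | 92 | 54 | 89.1%
Claude Opus 4 | 100 | 100 | 100 | 100 | 43 | 88.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 94 | 48 | 88.5%
GPT-4.1 | 100 | 100 | 100 | 100 | 35 | 86.9%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 92 | 41 | 86.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 29 | 85.8%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 92 | 34 | 85.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 67 | 57 | 84.8%
GPT-5 | 100 | 100 | 100 | 78 | 46 | 84.8%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 23 | 84.6%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 68 | 51 | 84.0%
Grok 4 | 100 | 100 | 100 | 74 | 43 | 83.4%
Z.AI GLM 5 | 100 | 100 | 96 | 64 | 54 | 82.7%
Gemini 2.5 Flash | 100 | 100 | 79 | 73 | 61 | 82.6%
Gemma 3 12B | 100 | 100 | 100 | 56 | 54 | 81.9%
Claude Sonnet 4.5 | 100 | 96 | 92 | 86 | 35 | 81.6%
Mistral Small Creative | 100 | 100 | 97 | 64 | 46 | 81.4%
Writer: Palmyra X5 | 100 | 100 | 92 | 88 | 25 | 80.9%
Claude Sonnet 4 | 100 | 100 | 88 | 62 | 52 | 80.3%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 3B | 100 | 100 | 100 | 61 | 36 | 79.4%
Minimax M2.5 | 100 | 100 | 100 | 93 | 3 | 79.1%
WizardLM 2 8x22b | 100 | 100 | 100 | 89 | 0 | 77.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 81 | 0 | 76.1%
Gemini 2.5 Pro | 100 | 100 | 71 | 56 | 51 | 75.5%
Claude 3.7 Sonnet | 100 | 100 | 100 | 38 | 35 | 74.5%
Mistral Large 2 | 100 | 100 | 76 | 56 | 36 | 73.7%
Gemini 3 Flash (Preview) | 100 | 100 | 78 | 53 | 27 | 71.4%
Claude Opus 4.5 | 100 | 100 | 100 | 56 | 0 | 71.3%
Gemma 3 27B | 100 | 100 | 73 | 67 | 11 | 70.2%
GPT-4o Mini (temp=1) | 100 | 100 | 62 | 59 | 27 | 69.7%
Claude Opus 4.6 | 100 | 84 | 81 | 56 | 11 | 66.3%
Ministral 3B | 100 | 100 | 64 | 41 | 25 | 66.0%
Claude Sonnet 4.6 | 92 | 90 | 84 | 62 | 0 | 65.8%
Z.AI GLM 4.6 | 100 | 100 | 69 | 52 | 0 | 64.3%
Gemma 3 4B | 100 | 100 | 100 | 8 | 6 | 62.9%
Grok 4 Fast | 100 | 83 | 68 | 46 | 17 | 62.8%
GPT-4.1 Nano | 100 | 97 | 96 | 21 | 0 | 62.6%
Stealth: Aurora Alpha | 100 | 100 | 75 | 30 | 0 | 61.0%
Ministral 3 8B | 100 | 82 | 74 | 44 | 0 | 60.1%
Claude 3.5 Haiku | 100 | 100 | 100 | 0 | 0 | 60.0%
DeepSeek V3.2 | 100 | 90 | 74 | 35 | 0 | 59.7%
Gemini 2.5 Flash Lite | 100 | 86 | 59 | 32 | 0 | 55.5%
Llama 3.1 Nemotron 70B | 88 | 64 | 44 | 6 | 0 | 40.1%
GPT-5 Nano | 100 | 47 | 13 | 6 | 0 | 33.3%
GPT-5 Mini | 100 | 30 | 20 | 0 | 0 | 29.9%
Llama 3.1 8B | 65 | 65 | 11 | 0 | 0 | 28.3%
GPT-5.2 | 51 | 39 | 28 | 14 | 4 | 27.2%
Claude Haiku 4.5 | 100 | 29 | 0 | 0 | 0 | 25.8%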
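
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 99 | 99.8%
Claude 3 Haiku | 100 | 100 | 100 | 99 | 97 | 99.2%
GPT-4.1 | 100 | 100 | 100 | 100 | 93 | 98.6%
Grok 4 Fast | 100 | 100 | 100 | 97 | 89 | 97.2%
Rocinante 12B | 100 | 100 | 100 | 100 | 80 | 95.9%
Claude Opus 4.6 | 100 | 99 | 98 | 98 | 82 | 95.7%
Hermes 3 405B | 100 | 100 | 100 | 100 | 71 | 94.1%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 69 | 93.9%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 83 | 82 | 93.0%
Z.AI GLM 4.5 | 100 | 100 | 96 | 88 | 82 | 92.9%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 64 | 92.8%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 82 | 82 | 92.6%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 62 | 92.4%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 57 | 91.5%
GPT-5.2 | 100 | 100 | 100 | 84 | 73 | 91.4%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 56 | 91.1%
Gemma 3 4B | 100 | 100 | 100 | 76 | 75 | 90.4%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 49 | 89.7%
GPT-5.1 | 100 | 100 | 100 | 87 | 56 | 88.6%
Ministral 3 8B | 100 | 100 | 89 | 74 | 74 | 87.5%
Grok 4 | 100 | 100 | 98 | 77 | 62 | 87.4%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 73 | 61 | 86.8%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 82 | 51 | 86.6%
Z.AI GLM 5 | 100 | 100 | 90 | 88 | 51 | 85.9%
GPT-5 | 100 | 100 | 93 | 75 | 52 | 84.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 18 | 83.7%
Mistral Small Creative | 100 | 100 | 100 | 68 | 50 | 83.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 93 | 92 | 67 | 65 | 83.3%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 15 | 83.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 14 | 82.7%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 11 | 82.2%
Ministral 3 14B | 100 | 100 | 98 | 73 | 35 | 81.1%
Z.AI GLM 4.7 Flash | 100 | 100 | 97 | 61 | 46 | 80.7%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Opus 4 | 100 | 100 | 90 | 59 | 49 | 79.5%
Z.AI GLM 4.7 | 100 | 100 | 89 | 57 | 46 | 78.4%
Mistral NeMO | 100 | 100 | 100 | 50 | 33 | 76.6%
Mistral Medium 3.1 | 98 | 93 | 81 | 73 | 33 | 75.6%
Claude Sonnet 4 | 100 | 100 | 78 | 59 | 36 | 74.5%
DeepSeek V3.2 | 100 | 98 | 62 | 62 | 50 | 74.4%
Z.AI GLM 4.6 | 100 | 100 | 86 | 82 | 0 | 73.7%
Gemma 3 12B | 100 | 100 | 93 | 72 | 0 | 72.9%
Llama 3.1 Nemotron 70B | 100 | 100 | 81 | 81 | 0 | 72.2%
DeepSeek V3.1 | 100 | 100 | 89 | 70 | 0 | 71.8%
Ministral 3 3B | 100 | 100 | 82 | 69 | 7 | 71.8%
Stealth: Aurora Alpha | 100 | 81 | 64 | 57 | 57 | 71.7%
Gemini 2.5 Pro | 100 | 100 | 69 | 60 | 22 | 70.3%
Claude 3.5 Sonnet | 100 | 100 | 86 | 61 | 0 | 69.3%
Mistral Large 3 | 100 | 100 | 91 | 38 | 15 | 68.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 84 | 54 | 0 | 67.6%
WizardLM 2 8x22b | 83 | 72 | 66 | 56 | 53 | 66.1%
Mistral Large | 100 | 100 | 100 | 27 | 0 | 65.4%
Mistral Large 2 | 100 | 100 | 98 | 16 | 0 | 62.8%
Claude Haiku 4.5 | 100 | 100 | 88 | 13 | 11 | 62.5%
Ministral 8B | 100 | 100 | 46 | 41 | 16 | 60.6%
Hermes 3 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemini 3 Pro (Preview) | 100 | 69 | 66 | 32 | 11 | 55.7%
Claude Opus 4.5 | 100 | 69 | 49 | 33 | 20 | 54.2%
Gemini 2.5 Flash | 93 | 68 | 62 | 46 | 1 | 53.9%
GPT-5 Mini | 97 | 67 | 45 | 36 | 15 | 52.0%
Claude 3.7 Sonnet | 99 | 84 | 66 | 0 | 0 | 49.9%
Gemini 2.5 Flash Lite | 100 | 78 | 71 | 0 | 0 | 49.6%
Minimax M2.5 | 100 | 100 | 29 | 0 | 0 | 45.8%
Gemini 3 Flash (Preview) | 100 | 65 | 64 | 0 | 0 | 45.7%
GPT-5 Nano | 44 | 29 | 0 | 0 | 0 | 14.6%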

Genre

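Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 91 | 90 | 96.2%
Gemini 3.1 Pro (Preview) | 100 | 95 | 94 | 88 | 83 | 92.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 54 | 90.9%
Mistral NeMO | 100 | 100 | 100 | 61 | 59 | 83.9%
Claude 3 Haiku | 100 | 100 | 92 | 72 | 48 | 82.3%
Hermes 3 405B | 100 | 100 | 81 | 67 | 50 | 79.4%
Claude Sonnet 4.6 | 100 | 100 | 97 | 75 | 11 | 76.7%
Grok 4 Fast | 100 | 88 | 73 | 66 | 52 | 75.7%
ByteDance Seed 1.6 | 100 | 100 | 100 | 41 | 22 | 72.6%
Ministral 3 14B | 100 | 100 | 90 | 60 | 0 | 69.9%
DeepSeek V3 (2025-03-24) | 100 | 100 | 61 | 44 | 34 | 67.6%
GPT-4o, May 13th (temp=0) | 100 | 94 | 69 | 62 | 11 | 67.3%
o4 Mini High | 100 | 92 | 88 | 56 | 0 | 67.0%
Llama 3.1 8B | 100 | 96 | 76 | 59 | 0 | 66.2%
Writer: Palmyra X5 | 100 | 100 | 72 | 31 | 21 | 64.7%
Rocinante 12B | 100 | 100 | 76 | 22 | 11 | 61.9%
GPT-4o Mini (temp=1) | 100 | 100 | 68 | 41 | 0 | 61.9%
DeepSeek-V2 Chat | 100 | 100 | 84 | 20 | 5 | 61.8%
ByteDance Seed 1.6 Flash | 78 | 62 | 59 | 57 | 52 | 61.4%
Gemini 3 Pro (Preview) | 100 | 71 | 64 | 60 | 0 | 58.9%
Qwen 3.5 397B A17B | 76 | 57 | 56 | 50 | 49 | 57.4%
Mistral Large | 100 | 76 | 72 | 18 | 16 | 56.6%
MoonshotAI: Kimi K2.5 | 81 | 62 | 52 | 47 | 27 | 53.6%
Gemini 2.5 Pro | 100 | 100 | 53 | 11 | 0 | 52.7%
Gemma 3 27B | 100 | 73 | 54 | 35 | 0 | 52.3%
Z.AI GLM 5 | 74 | 72 | 68 | 41 | 0 | 51.1%
DeepSeek V3.1 | 100 | 100 | 35 | 21 | 0 | 51.1%
Gemma 3 12B | 76 | 62 | 56 | 44 | 0 | 47.8%
GPT-4o Mini (temp=0) | 100 | 69 | 64 | 0 | 0 | 46.6%
Mistral Small Creative | 100 | 68 | 59 | 0 | 0 | 45.3%
Mistral Medium 3.1 | 100 | 91 | 34 | 0 | 0 | 45.0%
Z.AI GLM 4.7 Flash | 94 | 46 | 40 | 28 | 15 | 44.6%
Mistral Large 2 | 100 | 100 | 23 | 0 | 0 | 44.6%
GPT-5 | 100 | 72 | 40 | 4 | 0 | 43.1%
Gemini 2.5 Flash Lite | 88 | 82 | 34 | 9 | 0 | 42.7%
Claude Opus 4.6 | 89 | 82 | 23 | 17 | 0 | 42.1%
Hermes 3 70B | 96 | 96 | 16 | 0 | 0 | 41.5%
Ministral 3B | 100 | 76 | 25 | 3 | 0 | 40.9%
Z.AI GLM 4.6 | 100 | 86 | 17 | 0 | 0 | 40.5%
Gemini 2.5 Flash | 100 | 94 | 0 | 0 | 0 | 38.9%
DeepSeek V3 (2024-12-26) | 100 | 50 | 33 | 0 | 0 | 36.6%
GPT-4.1 | 75 | 65 | 30 | 1 | 0 | 34.2%
DeepSeek V3.2 | 62 | 53 | 47 | 6 | 0 | 33.5%
Grok 4 | 81 | 51 | 34 | 0 | 0 | 33.1%
GPT-4.1 Nano | 100 | 64 | 0 | 0 | 0 | 32.8%
Llama 3.1 70B | 81 | 76 | 0 | 0 | 0 | 31.4%
Mistral Large 3 | 64 | 61 | 16 | 16 | 0 | 31.2%
Claude 3.7 Sonnet | 93 | 38 | 22 | 0 | 0 | 30.4%
Cohere Command R+ (Aug. 2024) | 100 | 50 | 0 | 0 | 0 | 30.0%
Minimax M2.5 | 69 | 62 | 7 | 0 | 0 | 27.8%
GPT-4o, Aug. 6th (temp=1) | 65 | 50 | 16 | 6 | 0 | 27.4%
Qwen 2.5 72B | 88 | 44 | 0 | 0 | 0 | 26.4%
Claude Opus 4.5 | 86 | 41 | 0 | 0 | 0 | 25.4%
Llama 3.1 Nemotron 70B | 61 | 61 | 0 | 0 | 0 | 24.3%
Gemma 3 4B | 100 | 8 | 6 | 0 | 0 | 22.9%
Z.AI GLM 4.7 | 68 | 29 | 10 | 0 | 0 | 21.2%
Gemini 3 Flash (Preview) | 59 | 43 | 3 | 0 | 0 | 20.9%
Claude Sonnet 4.5 | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-5 Nano | 36 | 36 | 26 | 0 | 0 | 19.5%
GPT-5.1 | 90 | 8 | 0 | 0 | 0 | 19.4%
GPT-4.1 Mini | 33 | 29 | 16 | 11 | 6 | 19.0%
WizardLM 2 8x22b | 94 | 0 | 0 | 0 | 0 | 18.9%
Claude Opus 4 | 36 | 30 | 20 | 0 | 0 | 17.2%
Ministral 3 8B | 55 | 29 | 1 | 0 | 0 | 17.2%
Ministral 8B | 42 | 35 | 0 | 0 | 0 | 15.4%
Qwen 3.5 Plus (2026-02-15) | 47 | 20 | 0 | 0 | 0 | 13.5%
GPT-4o, May 13th (temp=1) | 50 | 11 | 6 | 0 | 0 | 13.4%
GPT-5 Mini | 33 | 30 | 0 | 0 | 0 | 12.6%
Arcee AI: Trinity Large (Preview) | 56 | 0 | 0 | 0 | 0 | 11.3%
Z.AI GLM 4.5 | 23 | 13 | 13 | 1 | 0 | 10.0%
GPT-5.2 | 50 | 0 | 0 | 0 | 0 | 10.0%
Ministral 3 3B | 48 | 0 | 0 | 0 | 0 | 9.6%
Claude 3.5 Sonnet | 28 | 0 | 0 | 0 | 0 | 5.6%
Claude Haiku 4.5 | 23 | 0 | 0 | 0 | 0 | 4.6%
Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%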
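
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 85 | 97.0%
o4 Mini High | 100 | 100 | 100 | 100 | 77 | 95.4%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 74 | 94.8%
Mistral Small Creative | 100 | 100 | 96 | 93 | 57 | 89.4%
Claude Sonnet 4.6 | 100 | 100 | 100 | 84 | 57 | 88.3%
Qwen 2.5 72B | 100 | 100 | 100 | 76 | 51 | 85.6%
o4 Mini | 100 | 100 | 81 | 79 | 62 | 84.3%
DeepSeek V3.2 | 100 | 100 | 96 | 83 | 36 | 83.0%
GPT-5.1 | 100 | 96 | 76 | 75 | 63 | 82.1%
Mistral Large 2 | 100 | 100 | 100 | 94 | 16 | 82.0%
Grok 4.1 Fast | 100 | 100 | 100 | 88 | 20 | 81.5%
Rocinante 12B | 100 | 100 | 100 | 69 | 36 | 81.1%
Hermes 3 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 97 | 0 | 79.4%
Z.AI GLM 4.7 | 100 | 100 | 95 | 56 | 44 | 79.0%
GPT-4.1 | 100 | 96 | 90 | 65 | 38 | 77.6%
ByteDance Seed 1.6 Flash | 100 | 88 | 83 | 83 | 24 | 75.6%
Gemini 2.5 Pro | 100 | 100 | 76 | 62 | 39 | 75.3%
Gemma 3 12B | 100 | 79 | 79 | 67 | 49 | 74.5%
DeepSeek-V2 Chat | 100 | 100 | 100 | 35 | 34 | 73.8%
Mistral Large 3 | 100 | 100 | 100 | 43 | 18 | 72.3%
Ministral 3 14B | 100 | 81 | 73 | 64 | 43 | 72.1%
Ministral 3 8B | 98 | 83 | 78 | 71 | 20 | 70.0%
Hermes 3 405B | 100 | 98 | 84 | 59 | 0 | 68.2%
Mistral Small 3.2 24B | 100 | 100 | 100 | 39 | 0 | 67.8%
Mistral Large | 100 | 100 | 89 | 46 | 0 | 67.0%
Qwen 3.5 397B A17B | 93 | 89 | 81 | 67 | 0 | 66.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 64 | 49 | 0 | 62.6%
Claude Opus 4.6 | 97 | 85 | 81 | 50 | 0 | 62.5%
Z.AI GLM 4.7 Flash | 100 | 84 | 60 | 39 | 30 | 62.3%
Gemini 3 Flash (Preview) | 98 | 73 | 68 | 68 | 0 | 61.6%
Gemini 3.1 Pro (Preview) | 100 | 81 | 70 | 25 | 23 | 59.8%
GPT-5 | 100 | 87 | 79 | 32 | 0 | 59.7%
Gemini 2.5 Flash | 100 | 100 | 37 | 35 | 23 | 59.0%
Gemini 2.5 Flash Lite | 100 | 78 | 53 | 45 | 0 | 55.0%
Z.AI GLM 5 | 100 | 100 | 74 | 0 | 0 | 54.8%
Claude 3.7 Sonnet | 72 | 55 | 51 | 50 | 42 | 54.0%
Ministral 8B | 100 | 65 | 65 | 28 | 6 | 52.9%
Minimax M2.5 | 100 | 100 | 43 | 13 | 0 | 51.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 35 | 18 | 0 | 50.6%
Claude Opus 4.5 | 100 | 100 | 53 | 0 | 0 | 50.6%
Arcee AI: Trinity Large (Preview) | 86 | 80 | 41 | 35 | 8 | 49.9%
Ministral 3 3B | 100 | 70 | 61 | 18 | 0 | 49.8%
Qwen 3.5 Plus (2026-02-15) | 80 | 79 | 77 | 0 | 0 | 47.3%
Gemma 3 27B | 100 | 85 | 47 | 3 | 0 | 47.1%
Grok 4 Fast | 100 | 98 | 32 | 0 | 0 | 46.1%
Claude Opus 4 | 100 | 82 | 43 | 5 | 0 | 46.1%
Claude Haiku 4.5 | 100 | 82 | 44 | 0 | 0 | 45.0%
Ministral 3B | 96 | 69 | 22 | 18 | 0 | 41.2%
Z.AI GLM 4.6 | 81 | 62 | 46 | 17 | 0 | 41.0%
GPT-4o Mini (temp=1) | 100 | 89 | 13 | 0 | 0 | 40.4%
Llama 3.1 70B | 100 | 96 | 6 | 0 | 0 | 40.3%
DeepSeek V3.1 | 100 | 57 | 27 | 17 | 0 | 40.2%
Gemini 3 Pro (Preview) | 77 | 71 | 30 | 18 | 0 | 39.2%
GPT-4o, May 13th (temp=1) | 78 | 56 | 49 | 11 | 0 | 38.7%
Claude Sonnet 4.5 | 100 | 84 | 9 | 0 | 0 | 38.7%
DeepSeek V3 (2024-12-26) | 83 | 44 | 36 | 22 | 0 | 37.2%
GPT-5.2 | 62 | 59 | 35 | 24 | 0 | 36.1%
Claude 3.5 Sonnet | 74 | 52 | 50 | 0 | 0 | 35.2%
Mistral NeMO | 78 | 56 | 38 | 3 | 0 | 34.8%
GPT-4.1 Nano | 74 | 68 | 25 | 0 | 0 | 33.5%
Claude Sonnet 4 | 89 | 36 | 0 | 0 | 0 | 25.1%
GPT-4o Mini (temp=0) | 53 | 41 | 25 | 0 | 0 | 23.8%
Grok 4 | 100 | 10 | 7 | 0 | 0 | 23.4%
Gemma 3 4B | 76 | 33 | 0 | 0 | 0 | 21.9%
GPT-5 Mini | 73 | 36 | 0 | 0 | 0 | 21.8%
Arcee AI: Trinity Mini | 41 | 38 | 21 | 9 | 0 | 21.8%
Z.AI GLM 4.5 | 88 | 13 | 0 | 0 | 0 | 20.1%
Llama 3.1 Nemotron 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4.1 Mini | 78 | 15 | 0 | 0 | 0 | 18.5%
GPT-4o, Aug. 6th (temp=0) | 68 | 5 | 3 | 1 | 0 | 15.4%
ByteDance Seed 1.6 | 44 | 31 | 1 | 0 | 0 | 15.2%
GPT-4o, May 13th (temp=0) | 61 | 7 | 0 | 0 | 0 | 13.6%
Llama 3.1 8B | 62 | 1 | 0 | 0 | 0 | 12.6%
WizardLM 2 8x22b | 23 | 18 | 14 | 5 | 0 | 11.9%
GPT-5 Nano | 33 | 24 | 0 | 0 | 0 | 11.5%
Claude 3.5 Haiku | 18 | 0 | 0 | 0 | 0 | 3.7%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%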
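
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 92 | 98.4%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 85 | 97.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 84 | 96.8%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 98 | 84 | 96.4%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 76 | 95.3%
Mistral Large 2 | 100 | 100 | 100 | 100 | 76 | 95.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 71 | 94.2%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 68 | 93.6%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 65 | 93.1%
o4 Mini High | 100 | 100 | 92 | 90 | 80 | 92.4%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 46 | 89.2%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 36 | 87.3%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 31 | 86.2%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 31 | 86.2%
Mistral Large | 100 | 100 | 100 | 81 | 47 | 85.6%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 89 | 36 | 85.1%
Gemma 3 27B | 100 | 88 | 86 | 79 | 69 | 84.3%
Mistral Small 3.2 24B | 100 | 100 | 100 | 82 | 36 | 83.6%
Rocinante 12B | 100 | 100 | 100 | 69 | 48 | 83.5%
Gemini 2.5 Pro | 100 | 100 | 100 | 91 | 25 | 83.2%
ByteDance Seed 1.6 Flash | 100 | 100 | 83 | 82 | 47 | 82.6%
Ministral 3 3B | 100 | 96 | 88 | 74 | 54 | 82.2%
Ministral 8B | 100 | 100 | 85 | 64 | 56 | 81.0%
Hermes 3 70B | 100 | 100 | 84 | 69 | 49 | 80.4%
Gemini 3 Pro (Preview) | 100 | 92 | 88 | 85 | 36 | 80.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 100 | 85 | 14 | 79.7%
Qwen 2.5 72B | 100 | 100 | 100 | 79 | 7 | 77.1%
Minimax M2.5 | 100 | 100 | 100 | 82 | 3 | 77.1%
Mistral Small Creative | 100 | 86 | 82 | 59 | 57 | 77.0%
Ministral 3 14B | 100 | 100 | 92 | 57 | 35 | 76.8%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 55 | 25 | 76.0%
Gemma 3 12B | 100 | 100 | 100 | 50 | 30 | 75.9%
GPT-5.1 | 100 | 87 | 77 | 66 | 41 | 74.2%
Claude Opus 4 | 100 | 100 | 100 | 47 | 18 | 73.1%
o4 Mini | 100 | 94 | 87 | 49 | 32 | 72.4%
GPT-4o, May 13th (temp=1) | 100 | 74 | 67 | 65 | 56 | 72.4%
Gemini 2.5 Flash Lite | 100 | 100 | 85 | 71 | 0 | 71.1%
Llama 3.1 8B | 100 | 96 | 94 | 62 | 0 | 70.5%
Claude Sonnet 4 | 100 | 100 | 72 | 67 | 9 | 69.5%
DeepSeek-V2 Chat | 100 | 100 | 100 | 46 | 0 | 69.2%
Grok 4 Fast | 100 | 100 | 67 | 48 | 29 | 68.7%
Mistral Large 3 | 100 | 100 | 100 | 41 | 0 | 68.3%
Z.AI GLM 4.5 | 100 | 100 | 81 | 39 | 21 | 68.0%
DeepSeek V3.2 | 100 | 100 | 95 | 28 | 17 | 68.0%
Z.AI GLM 5 | 100 | 100 | 59 | 50 | 27 | 67.2%
Z.AI GLM 4.7 | 100 | 100 | 89 | 38 | 8 | 67.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 35 | 0 | 66.9%
Claude Opus 4.6 | 100 | 100 | 81 | 51 | 0 | 66.4%
Gemini 3 Flash (Preview) | 85 | 76 | 75 | 50 | 41 | 65.6%
Qwen 3.5 397B A17B | 100 | 100 | 80 | 46 | 0 | 65.1%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 17 | 3 | 63.9%
Claude Opus 4.5 | 100 | 100 | 60 | 51 | 0 | 62.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 59 | 50 | 0 | 61.8%
GPT-4.1 | 100 | 78 | 47 | 43 | 40 | 61.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 78 | 67 | 61 | 0 | 61.0%
Mistral NeMO | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-5 | 100 | 89 | 79 | 25 | 4 | 59.5%
Claude 3.5 Haiku | 100 | 100 | 79 | 18 | 0 | 59.4%
Grok 4 | 89 | 80 | 65 | 60 | 0 | 58.6%
Ministral 3B | 100 | 100 | 72 | 14 | 0 | 57.1%
WizardLM 2 8x22b | 100 | 100 | 50 | 35 | 0 | 56.9%
Arcee AI: Trinity Mini | 100 | 97 | 85 | 0 | 0 | 56.4%
Llama 3.1 Nemotron 70B | 100 | 92 | 48 | 41 | 0 | 56.2%
Gemma 3 4B | 100 | 100 | 56 | 23 | 0 | 55.7%
Z.AI GLM 4.6 | 100 | 68 | 59 | 47 | 0 | 54.9%
DeepSeek V3.1 | 68 | 63 | 60 | 27 | 0 | 43.7%
Llama 3.1 70B | 100 | 100 | 16 | 0 | 0 | 43.2%
Gemini 2.5 Flash | 58 | 55 | 40 | 36 | 13 | 40.5%
GPT-4.1 Nano | 100 | 48 | 46 | 0 | 0 | 38.8%
GPT-4o Mini (temp=1) | 100 | 31 | 29 | 0 | 0 | 32.0%
Claude Sonnet 4.5 | 71 | 40 | 21 | 8 | 5 | 28.9%
Claude Haiku 4.5 | 61 | 0 | 0 | 0 | 0 | 12.1%
GPT-4.1 Mini | 41 | 18 | 0 | 0 | 0 | 11.9%
GPT-5.2 | 31 | 22 | 0 | 0 | 0 | 10.6%
GPT-5 Mini | 38 | 0 | 0 | 0 | 0 | 7.6%
GPT-5 Nano | 11 | 1 | 0 | 0 | 0 | 2.4%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%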
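
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 99 | 99.8%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 99 | 96 | 98.9%
Ministral 3 8B | 100 | 100 | 100 | 100 | 93 | 98.6%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 90 | 98.1%
Hermes 3 405B | 100 | 100 | 100 | 100 | 89 | 97.8%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 86 | 97.2%
o4 Mini | 100 | 100 | 100 | 100 | 82 | 96.5%
GPT-4.1 Mini | 100 | 100 | 100 | 92 | 89 | 96.2%
Ministral 3B | 100 | 100 | 100 | 92 | 74 | 93.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 64 | 92.8%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 64 | 92.8%
Mistral NeMO | 100 | 100 | 98 | 83 | 79 | 92.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 57 | 91.5%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 54 | 90.8%
o4 Mini High | 100 | 100 | 100 | 76 | 74 | 90.1%
Llama 3.1 70B | 100 | 100 | 90 | 82 | 76 | 89.9%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 47 | 89.4%
WizardLM 2 8x22b | 100 | 100 | 100 | 82 | 64 | 89.2%
GPT-5.1 | 100 | 100 | 94 | 75 | 75 | 88.8%
Hermes 3 70B | 100 | 100 | 93 | 89 | 50 | 86.4%
Grok 4 | 100 | 100 | 100 | 75 | 55 | 86.1%
Gemma 3 4B | 100 | 100 | 100 | 100 | 29 | 85.8%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 74 | 54 | 85.6%
DeepSeek V3 (2024-12-26) | 100 | 100 | 90 | 71 | 64 | 85.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 11 | 82.2%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 90 | 18 | 81.8%
GPT-4.1 | 100 | 100 | 100 | 84 | 20 | 80.9%
Gemma 3 27B | 100 | 100 | 72 | 65 | 64 | 80.2%
DeepSeek V3.1 | 100 | 100 | 100 | 65 | 33 | 79.6%
ByteDance Seed 1.6 | 100 | 100 | 100 | 76 | 18 | 79.0%
Claude Opus 4 | 100 | 100 | 100 | 62 | 31 | 78.6%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 67 | 22 | 77.7%
Minimax M2.5 | 100 | 100 | 65 | 64 | 56 | 76.9%
Z.AI GLM 4.7 | 88 | 86 | 84 | 68 | 57 | 76.7%
Mistral Large | 100 | 100 | 100 | 41 | 31 | 74.5%
Gemma 3 12B | 100 | 100 | 69 | 68 | 27 | 72.9%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 73 | 72 | 18 | 72.7%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 62 | 0 | 72.4%
GPT-5 Mini | 100 | 91 | 85 | 71 | 14 | 72.1%
GPT-5 | 100 | 100 | 85 | 66 | 9 | 72.0%
DeepSeek-V2 Chat | 100 | 100 | 90 | 36 | 31 | 71.6%
Ministral 3 3B | 100 | 100 | 100 | 34 | 17 | 70.1%
Mistral Large 3 | 100 | 100 | 100 | 46 | 0 | 69.2%
GPT-4.1 Nano | 100 | 98 | 84 | 57 | 0 | 67.9%
Mistral Large 2 | 100 | 94 | 67 | 48 | 25 | 66.8%
DeepSeek V3.2 | 100 | 85 | 62 | 61 | 25 | 66.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 16 | 0 | 63.2%
Claude 3.7 Sonnet | 100 | 100 | 65 | 49 | 0 | 62.8%
Z.AI GLM 4.6 | 100 | 82 | 76 | 50 | 0 | 61.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 47 | 31 | 27 | 61.1%
Ministral 3 14B | 100 | 100 | 100 | 5 | 0 | 61.0%
Gemini 2.5 Flash | 100 | 100 | 57 | 47 | 0 | 60.9%
Mistral Medium 3.1 | 100 | 82 | 61 | 51 | 6 | 60.0%
Ministral 8B | 100 | 94 | 89 | 16 | 0 | 59.9%
Claude Sonnet 4.6 | 100 | 100 | 57 | 38 | 0 | 59.1%
GPT-5.2 | 85 | 83 | 68 | 54 | 0 | 58.0%
Z.AI GLM 4.5 | 100 | 54 | 50 | 44 | 39 | 57.3%
Gemini 3 Flash (Preview) | 100 | 83 | 54 | 27 | 17 | 56.2%
Gemini 2.5 Flash Lite | 100 | 93 | 76 | 6 | 0 | 55.1%
Claude Opus 4.6 | 100 | 81 | 41 | 40 | 0 | 52.4%
Claude Haiku 4.5 | 100 | 98 | 57 | 0 | 0 | 51.1%
Gemini 2.5 Pro | 89 | 72 | 53 | 39 | 0 | 50.5%
Claude Opus 4.5 | 100 | 100 | 51 | 0 | 0 | 50.3%
Claude Sonnet 4.5 | 83 | 57 | 43 | 43 | 18 | 49.0%
Llama 3.1 8B | 100 | 74 | 45 | 25 | 0 | 48.8%
Arcee AI: Trinity Large (Preview) | 100 | 56 | 35 | 31 | 18 | 47.9%
MoonshotAI: Kimi K2.5 | 93 | 73 | 61 | 0 | 0 | 45.3%
Claude Sonnet 4 | 100 | 100 | 0 | 0 | 0 | 40.0%
Grok 4 Fast | 100 | 40 | 29 | 25 | 0 | 38.7%
Z.AI GLM 5 | 94 | 41 | 31 | 5 | 0 | 34.2%
Claude 3.5 Sonnet | 100 | 36 | 28 | 0 | 0 | 32.9%
Stealth: Aurora Alpha | 100 | 40 | 0 | 0 | 0 | 27.9%
GPT-4o, Aug. 6th (temp=1) | 98 | 0 | 0 | 0 | 0 | 19.6%
GPT-5 Nano | 43 | 10 | 0 | 0 | 0 | 10.6%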
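
Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 97 | 99.4%
Hermes 3 405B | 100 | 100 | 100 | 100 | 96 | 99.1%
Ministral 3 8B | 100 | 100 | 100 | 100 | 93 | 98.6%
Ministral 8B | 100 | 100 | 100 | 100 | 93 | 98.5%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 93 | 90 | 96.7%
Claude Opus 4 | 100 | 100 | 100 | 100 | 82 | 96.5%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 81 | 96.1%
Ministral 3 3B | 100 | 100 | 100 | 88 | 88 | 95.0%
Claude 3.7 Sonnet | 100 | 100 | 93 | 90 | 85 | 93.6%
DeepSeek V3.2 | 100 | 100 | 100 | 94 | 69 | 92.8%
Grok 4.1 Fast | 100 | 100 | 96 | 89 | 73 | 91.6%
Claude 3.5 Sonnet | 100 | 92 | 89 | 88 | 86 | 90.9%
Grok 4 Fast | 100 | 100 | 100 | 92 | 62 | 90.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 50 | 90.0%
Mistral Small Creative | 100 | 100 | 100 | 88 | 54 | 88.3%
Hermes 3 70B | 100 | 100 | 100 | 100 | 34 | 86.7%
Gemini 2.5 Flash | 100 | 100 | 98 | 97 | 35 | 86.2%
Claude Sonnet 4 | 100 | 100 | 100 | 82 | 33 | 82.9%
DeepSeek V3 (2024-12-26) | 100 | 100 | 92 | 71 | 49 | 82.2%
Mistral Large | 100 | 100 | 88 | 65 | 57 | 82.0%
Claude 3 Haiku | 100 | 100 | 100 | 99 | 11 | 82.0%
Gemma 3 12B | 100 | 100 | 74 | 68 | 62 | 80.9%
Grok 4 | 100 | 100 | 100 | 60 | 44 | 80.7%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Arcee AI: Trinity Mini | 100 | 97 | 96 | 76 | 25 | 78.8%
Ministral 3 14B | 100 | 100 | 100 | 93 | 0 | 78.6%
Gemini 2.5 Pro | 100 | 100 | 98 | 90 | 0 | 77.8%
Qwen 3.5 397B A17B | 100 | 100 | 76 | 75 | 35 | 77.4%
Llama 3.1 70B | 100 | 100 | 94 | 88 | 0 | 76.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 50 | 31 | 76.2%
o4 Mini | 100 | 100 | 87 | 59 | 31 | 75.2%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 93 | 69 | 11 | 74.7%
ByteDance Seed 1.6 Flash | 100 | 100 | 79 | 77 | 13 | 73.7%
Z.AI GLM 4.5 | 100 | 100 | 80 | 65 | 18 | 72.7%
Claude Sonnet 4.5 | 100 | 100 | 100 | 52 | 8 | 72.1%
Gemma 3 27B | 100 | 100 | 100 | 52 | 7 | 71.8%
DeepSeek-V2 Chat | 100 | 100 | 79 | 71 | 7 | 71.3%
Qwen 2.5 72B | 100 | 100 | 78 | 56 | 23 | 71.2%
o4 Mini High | 100 | 90 | 82 | 49 | 28 | 69.6%
Gemini 3 Flash (Preview) | 100 | 86 | 80 | 47 | 33 | 69.1%
Mistral Large 2 | 100 | 94 | 56 | 54 | 39 | 68.6%
Z.AI GLM 4.6 | 100 | 92 | 90 | 60 | 0 | 68.3%
Z.AI GLM 4.7 | 95 | 93 | 72 | 68 | 4 | 66.6%
Claude Sonnet 4.6 | 100 | 97 | 88 | 44 | 0 | 65.6%
Claude Opus 4.6 | 100 | 90 | 56 | 40 | 36 | 64.6%
Ministral 3B | 90 | 84 | 79 | 46 | 22 | 64.2%
Gemini 2.5 Flash Lite | 90 | 79 | 76 | 50 | 25 | 64.1%
Gemma 3 4B | 100 | 98 | 97 | 25 | 0 | 63.9%
GPT-5.1 | 100 | 85 | 44 | 41 | 38 | 61.7%
WizardLM 2 8x22b | 100 | 92 | 69 | 44 | 0 | 61.0%
GPT-4.1 Nano | 100 | 100 | 97 | 0 | 0 | 59.4%
Llama 3.1 8B | 100 | 79 | 58 | 33 | 22 | 58.4%
GPT-4.1 | 100 | 92 | 49 | 29 | 18 | 57.6%
GPT-4o, May 13th (temp=0) | 100 | 57 | 54 | 38 | 35 | 56.8%
Z.AI GLM 4.7 Flash | 79 | 65 | 53 | 51 | 30 | 55.6%
Claude Haiku 4.5 | 100 | 68 | 61 | 48 | 0 | 55.3%
Minimax M2.5 | 100 | 98 | 35 | 25 | 18 | 55.2%
Qwen 3.5 Plus (2026-02-15) | 100 | 73 | 64 | 38 | 0 | 55.0%
Claude Opus 4.5 | 100 | 100 | 71 | 0 | 0 | 54.1%
MoonshotAI: Kimi K2.5 | 100 | 100 | 69 | 0 | 0 | 53.9%
ByteDance Seed 1.6 | 100 | 52 | 44 | 31 | 27 | 50.7%
DeepSeek V3.1 | 100 | 90 | 50 | 13 | 0 | 50.7%
Mistral Large 3 | 100 | 67 | 36 | 0 | 0 | 40.6%
GPT-5 | 66 | 51 | 43 | 22 | 0 | 36.2%
GPT-5.2 | 60 | 33 | 30 | 24 | 17 | 32.8%
Z.AI GLM 5 | 83 | 75 | 1 | 0 | 0 | 31.9%
Llama 3.1 Nemotron 70B | 82 | 74 | 0 | 0 | 0 | 31.3%
GPT-4.1 Mini | 59 | 46 | 41 | 0 | 0 | 29.2%
GPT-5 Mini | 17 | 0 | 0 | 0 | 0 | 3.5%
Stealth: Aurora Alpha | 1 | 0 | 0 | 0 | 0 | 0.3%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%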
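Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 98 | 99.6%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 97 | 99.4%
Grok 4.1 Fast | 100 | 100 | 100 | 98 | 95 | 98.6%
Qwen 2.5 72B | 100 | 100 | 100 | 97 | 95 | 98.5%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 88 | 97.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 96 | 89 | 96.9%
Rocinante 12B | 100 | 100 | 100 | 100 | 82 | 96.5%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 81 | 96.1%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 80 | 95.9%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 76 | 95.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 76 | 95.3%
Hermes 3 70B | 100 | 100 | 100 | 97 | 80 | 95.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 92 | 84 | 95.2%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 75 | 95.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 75 | 95.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 86 | 84 | 94.0%
Ministral 3 14B | 100 | 100 | 100 | 91 | 69 | 92.1%
Claude Sonnet 4.5 | 100 | 100 | 100 | 82 | 78 | 91.8%
Z.AI GLM 4.7 | 100 | 100 | 100 | 93 | 62 | 91.0%
GPT-5 | 100 | 100 | 92 | 89 | 65 | 89.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 90 | 53 | 88.6%
GPT-4.1 | 100 | 96 | 90 | 86 | 69 | 88.3%
DeepSeek V3.2 | 100 | 100 | 94 | 82 | 62 | 87.5%
Mistral Large 3 | 100 | 99 | 98 | 92 | 47 | 87.2%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 29 | 85.8%
DeepSeek-V2 Chat | 100 | 100 | 84 | 81 | 59 | 84.8%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 62 | 49 | 82.2%
Mistral Large | 100 | 100 | 73 | 68 | 67 | 81.6%
Gemini 2.5 Pro | 100 | 100 | 86 | 72 | 50 | 81.6%
Gemma 3 4B | 100 | 100 | 98 | 73 | 36 | 81.5%
DeepSeek V3.1 | 100 | 100 | 73 | 67 | 61 | 80.0%
Gemma 3 12B | 100 | 100 | 88 | 71 | 38 | 79.4%
MoonshotAI: Kimi K2.5 | 92 | 84 | 81 | 74 | 61 | 78.3%
Ministral 3 8B | 100 | 100 | 100 | 79 | 0 | 75.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 56 | 16 | 74.3%
Mistral Small 3.2 24B | 100 | 100 | 97 | 72 | 0 | 73.8%
Mistral Medium 3.1 | 100 | 100 | 100 | 42 | 23 | 73.1%
Grok 4 Fast | 100 | 100 | 93 | 65 | 0 | 71.7%
Mistral Large 2 | 100 | 100 | 83 | 73 | 0 | 71.4%
DeepSeek V3 (2024-12-26) | 100 | 100 | 90 | 61 | 0 | 70.2%
Mistral Small Creative | 100 | 100 | 75 | 56 | 18 | 69.8%
Claude Haiku 4.5 | 100 | 100 | 86 | 59 | 0 | 69.0%
Grok 4 | 100 | 100 | 88 | 45 | 6 | 67.6%
DeepSeek V3 (2025-03-24) | 100 | 67 | 61 | 59 | 48 | 66.9%
Gemini 3 Pro (Preview) | 100 | 100 | 81 | 34 | 17 | 66.4%
GPT-4.1 Nano | 100 | 100 | 62 | 35 | 35 | 66.3%
Claude 3.7 Sonnet | 100 | 100 | 52 | 41 | 30 | 64.6%
Z.AI GLM 4.5 | 100 | 100 | 98 | 20 | 0 | 63.6%
Claude Opus 4 | 100 | 89 | 88 | 25 | 14 | 63.2%
Z.AI GLM 4.7 Flash | 86 | 74 | 70 | 46 | 27 | 60.7%
Qwen 3.5 Plus (2026-02-15) | 86 | 83 | 73 | 56 | 0 | 59.8%
GPT-5 Mini | 74 | 70 | 61 | 57 | 25 | 57.5%
GPT-4o Mini (temp=0) | 100 | 84 | 53 | 29 | 21 | 57.3%
Llama 3.1 70B | 100 | 100 | 39 | 36 | 11 | 57.3%
Gemma 3 27B | 100 | 100 | 61 | 20 | 0 | 56.2%
Ministral 8B | 100 | 91 | 67 | 17 | 0 | 54.9%
GPT-5.1 | 100 | 82 | 44 | 40 | 2 | 53.7%
Z.AI GLM 5 | 100 | 68 | 62 | 35 | 0 | 53.1%
Gemini 2.5 Flash Lite | 100 | 72 | 64 | 7 | 0 | 48.6%
Claude Sonnet 4 | 100 | 75 | 52 | 14 | 0 | 48.2%
Claude Sonnet 4.6 | 100 | 65 | 59 | 11 | 0 | 47.1%
Claude Opus 4.6 | 93 | 57 | 56 | 18 | 0 | 44.9%
Arcee AI: Trinity Large (Preview) | 82 | 74 | 42 | 17 | 3 | 43.8%
GPT-5.2 | 100 | 76 | 21 | 18 | 0 | 43.0%
Minimax M2.5 | 100 | 73 | 28 | 0 | 0 | 40.2%
Llama 3.1 8B | 100 | 61 | 25 | 5 | 0 | 38.1%
Claude Opus 4.5 | 100 | 69 | 13 | 1 | 0 | 36.6%
Gemini 3 Flash (Preview) | 97 | 59 | 24 | 0 | 0 | 35.8%
Z.AI GLM 4.6 | 86 | 46 | 17 | 16 | 3 | 33.4%
Claude 3.5 Haiku | 100 | 54 | 0 | 0 | 0 | 30.8%
Gemini 2.5 Flash | 83 | 38 | 25 | 0 | 0 | 29.3%
Claude 3.5 Sonnet | 100 | 41 | 0 | 0 | 0 | 28.3%
GPT-5 Nano | 47 | 0 | 0 | 0 | 0 | 9.3%
Stealth: Aurora Alpha | 45 | 0 | 0 | 0 | 0 | 8.9%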

Novelcrafter Default Prompt

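Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 84 | 96.8%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 81 | 96.1%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 88 | 82 | 94.0%
o4 Mini | 100 | 100 | 90 | 82 | 66 | 87.7%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 37 | 87.5%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 69 | 62 | 86.3%
o4 Mini High | 100 | 100 | 100 | 89 | 32 | 84.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 99 | 97 | 3 | 79.7%
Mistral Medium 3.1 | 100 | 100 | 100 | 55 | 23 | 75.6%
Grok 4 Fast | 100 | 100 | 68 | 67 | 41 | 75.3%
Llama 3.1 70B | 100 | 98 | 82 | 67 | 22 | 73.8%
DeepSeek-V2 Chat | 100 | 99 | 94 | 47 | 23 | 72.9%
GPT-4.1 | 100 | 100 | 86 | 47 | 23 | 71.3%
Mistral Large | 100 | 83 | 80 | 65 | 22 | 70.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 62 | 52 | 33 | 69.4%
Hermes 3 405B | 100 | 100 | 100 | 39 | 0 | 67.8%
Z.AI GLM 5 | 100 | 100 | 78 | 53 | 0 | 66.0%
GPT-5.1 | 100 | 98 | 76 | 56 | 0 | 65.9%
Gemini 3 Flash (Preview) | 100 | 89 | 81 | 44 | 0 | 62.6%
GPT-5 | 100 | 86 | 57 | 36 | 32 | 62.1%
Mistral Small Creative | 100 | 93 | 62 | 52 | 0 | 61.5%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 65 | 29 | 8 | 60.6%
Ministral 3 14B | 100 | 88 | 59 | 36 | 18 | 60.4%
Qwen 2.5 72B | 100 | 76 | 69 | 54 | 0 | 59.9%
Mistral Large 3 | 98 | 84 | 61 | 56 | 0 | 59.8%
Claude Opus 4 | 100 | 90 | 72 | 35 | 0 | 59.2%
Mistral NeMO | 100 | 100 | 96 | 0 | 0 | 59.1%
Gemini 2.5 Pro | 100 | 100 | 82 | 3 | 0 | 57.1%
Hermes 3 70B | 100 | 69 | 57 | 57 | 0 | 56.8%
ByteDance Seed 1.6 Flash | 100 | 100 | 59 | 25 | 0 | 56.8%
WizardLM 2 8x22b | 100 | 100 | 79 | 0 | 0 | 55.8%
ByteDance Seed 1.6 | 100 | 100 | 61 | 0 | 0 | 52.1%
Claude Sonnet 4.6 | 100 | 98 | 56 | 6 | 0 | 51.9%
Claude Opus 4.6 | 91 | 65 | 55 | 42 | 0 | 50.7%
Grok 4 | 95 | 91 | 32 | 30 | 0 | 49.6%
GPT-4.1 Mini | 100 | 100 | 41 | 6 | 0 | 49.4%
Writer: Palmyra X5 | 100 | 100 | 31 | 15 | 0 | 49.2%
DeepSeek V3 (2024-12-26) | 100 | 56 | 52 | 36 | 0 | 48.9%
Arcee AI: Trinity Mini | 100 | 100 | 44 | 0 | 0 | 48.7%
Mistral Large 2 | 100 | 94 | 49 | 0 | 0 | 48.5%
Gemini 3 Pro (Preview) | 100 | 72 | 35 | 32 | 0 | 47.9%
Claude Opus 4.5 | 100 | 100 | 35 | 3 | 0 | 47.5%
GPT-4o Mini (temp=0) | 100 | 79 | 47 | 11 | 0 | 47.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 31 | 0 | 0 | 46.2%
Gemma 3 12B | 100 | 80 | 47 | 0 | 0 | 45.4%
Gemini 2.5 Flash | 72 | 66 | 64 | 21 | 0 | 44.4%
Z.AI GLM 4.6 | 100 | 59 | 41 | 16 | 0 | 43.3%
Gemma 3 4B | 100 | 64 | 49 | 3 | 0 | 43.1%
Mistral Small 3.2 24B | 100 | 77 | 28 | 0 | 0 | 41.1%
Qwen 3.5 Plus (2026-02-15) | 86 | 86 | 30 | 1 | 0 | 40.4%
GPT-4o, May 13th (temp=1) | 81 | 71 | 46 | 0 | 0 | 39.4%
Z.AI GLM 4.7 | 87 | 36 | 31 | 26 | 17 | 39.4%
GPT-4.1 Nano | 71 | 68 | 56 | 0 | 0 | 38.9%
Gemini 2.5 Flash Lite | 100 | 72 | 16 | 0 | 0 | 37.6%
Ministral 3B | 100 | 86 | 0 | 0 | 0 | 37.2%
Arcee AI: Trinity Large (Preview) | 100 | 74 | 0 | 0 | 0 | 34.8%
Z.AI GLM 4.7 Flash | 81 | 54 | 36 | 0 | 0 | 34.2%
Claude Haiku 4.5 | 100 | 35 | 21 | 8 | 3 | 33.3%
Claude Sonnet 4.5 | 100 | 36 | 25 | 1 | 0 | 32.4%
Ministral 3 3B | 72 | 64 | 18 | 7 | 0 | 32.2%
Llama 3.1 8B | 76 | 74 | 0 | 0 | 0 | 30.1%
Gemma 3 27B | 73 | 59 | 18 | 0 | 0 | 30.1%
Ministral 3 8B | 81 | 54 | 16 | 0 | 0 | 30.1%
Claude 3 Haiku | 100 | 43 | 3 | 0 | 0 | 29.2%
Llama 3.1 Nemotron 70B | 67 | 64 | 11 | 0 | 0 | 28.3%
GPT-5.2 | 51 | 49 | 39 | 0 | 0 | 27.8%
Claude Sonnet 4 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 8B | 36 | 31 | 7 | 0 | 0 | 14.9%
Minimax M2.5 | 73 | 0 | 0 | 0 | 0 | 14.6%
DeepSeek V3.2 | 54 | 13 | 5 | 0 | 0 | 14.4%
Claude 3.7 Sonnet | 64 | 1 | 0 | 0 | 0 | 12.9%
GPT-5 Mini | 29 | 0 | 0 | 0 | 0 | 5.8%
DeepSeek V3.1 | 25 | 0 | 0 | 0 | 0 | 5.0%
Stealth: Aurora Alpha | 15 | 0 | 0 | 0 | 0 | 3.0%
GPT-5 Nano | 12 | 0 | 0 | 0 | 0 | 2.5%
Claude 3.5 Sonnet | 7 | 0 | 0 | 0 | 0 | 1.4%
Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%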
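Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 94 | 98.9%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 92 | 98.4%
Claude Sonnet 4.6 | 100 | 100 | 100 | 98 | 90 | 97.7%
GPT-5.1 | 100 | 100 | 100 | 100 | 80 | 96.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 76 | 95.3%
o4 Mini | 100 | 100 | 100 | 100 | 71 | 94.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 68 | 93.6%
Mistral Large 3 | 100 | 100 | 100 | 100 | 62 | 92.5%
Hermes 3 405B | 100 | 100 | 100 | 100 | 62 | 92.5%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 87 | 74 | 92.2%
DeepSeek V3.2 | 100 | 100 | 100 | 92 | 68 | 91.9%
Ministral 3 14B | 100 | 100 | 96 | 90 | 68 | 90.8%
Mistral Small Creative | 100 | 100 | 100 | 100 | 54 | 90.8%
GPT-5 | 100 | 100 | 100 | 74 | 55 | 85.8%
Grok 4 | 100 | 100 | 100 | 76 | 49 | 85.1%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 68 | 56 | 84.8%
WizardLM 2 8x22b | 100 | 100 | 97 | 76 | 50 | 84.6%
Claude Opus 4 | 100 | 100 | 100 | 74 | 49 | 84.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 78 | 18 | 79.2%
Hermes 3 70B | 100 | 100 | 100 | 94 | 0 | 78.9%
Llama 3.1 8B | 100 | 100 | 86 | 74 | 31 | 78.2%
Mistral Large 2 | 100 | 100 | 92 | 76 | 16 | 76.9%
Gemini 3 Flash (Preview) | 100 | 100 | 85 | 71 | 28 | 76.7%
Qwen 2.5 72B | 100 | 100 | 82 | 57 | 38 | 75.4%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 46 | 30 | 75.1%
Ministral 3 3B | 100 | 100 | 67 | 54 | 54 | 74.9%
Claude 3 Haiku | 100 | 100 | 100 | 69 | 0 | 73.9%
GPT-4.1 | 88 | 78 | 74 | 65 | 56 | 72.2%
Gemini 2.5 Flash Lite | 100 | 100 | 82 | 73 | 0 | 70.9%
Ministral 8B | 100 | 100 | 81 | 59 | 15 | 70.9%
Gemini 2.5 Pro | 100 | 100 | 100 | 47 | 0 | 69.5%
GPT-5.2 | 100 | 98 | 62 | 49 | 38 | 69.4%
Mistral Large | 100 | 82 | 69 | 64 | 31 | 69.3%
Z.AI GLM 5 | 100 | 100 | 73 | 63 | 3 | 67.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 82 | 56 | 0 | 67.6%
Claude 3.7 Sonnet | 100 | 92 | 76 | 66 | 0 | 66.7%
Z.AI GLM 4.7 | 100 | 100 | 71 | 60 | 0 | 66.1%
Qwen 3.5 Plus (2026-02-15) | 100 | 84 | 61 | 46 | 39 | 66.0%
GPT-4o, May 13th (temp=1) | 90 | 89 | 75 | 58 | 11 | 64.7%
Gemma 3 4B | 100 | 100 | 94 | 23 | 0 | 63.6%
Mistral NeMO | 100 | 85 | 72 | 59 | 0 | 63.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 86 | 22 | 0 | 61.5%
Gemma 3 27B | 100 | 100 | 92 | 15 | 0 | 61.3%
Gemini 3 Pro (Preview) | 100 | 81 | 61 | 34 | 27 | 60.4%
Gemini 2.5 Flash | 100 | 100 | 59 | 41 | 0 | 60.1%
GPT-4o, May 13th (temp=0) | 100 | 98 | 73 | 18 | 11 | 60.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 0 | 0 | 60.0%
Minimax M2.5 | 100 | 90 | 57 | 47 | 0 | 58.8%
Ministral 3B | 82 | 72 | 69 | 67 | 0 | 58.1%
DeepSeek-V2 Chat | 100 | 97 | 86 | 0 | 0 | 56.5%
Claude Sonnet 4.5 | 96 | 88 | 36 | 23 | 22 | 53.1%
Z.AI GLM 4.5 | 100 | 81 | 56 | 14 | 0 | 50.2%
Claude Opus 4.5 | 99 | 99 | 46 | 3 | 0 | 49.6%
Gemma 3 12B | 100 | 88 | 57 | 0 | 0 | 49.0%
Z.AI GLM 4.7 Flash | 99 | 54 | 48 | 43 | 0 | 48.8%
GPT-4o Mini (temp=1) | 84 | 60 | 40 | 38 | 18 | 48.0%
Ministral 3 8B | 100 | 100 | 39 | 0 | 0 | 47.8%
DeepSeek V3.1 | 100 | 100 | 38 | 0 | 0 | 47.5%
ByteDance Seed 1.6 | 100 | 56 | 41 | 38 | 0 | 47.1%
GPT-5 Mini | 84 | 81 | 69 | 0 | 0 | 46.8%
Claude Sonnet 4 | 100 | 100 | 27 | 0 | 0 | 45.4%
GPT-4.1 Mini | 100 | 61 | 40 | 11 | 0 | 42.3%
Z.AI GLM 4.6 | 56 | 55 | 36 | 31 | 28 | 41.3%
GPT-4o, Aug. 6th (temp=1) | 76 | 72 | 29 | 23 | 0 | 40.1%
Claude Haiku 4.5 | 100 | 68 | 17 | 15 | 0 | 40.0%
Arcee AI: Trinity Large (Preview) | 100 | 86 | 9 | 0 | 0 | 39.0%
Claude 3.5 Sonnet | 100 | 31 | 0 | 0 | 0 | 26.2%
Llama 3.1 70B | 48 | 44 | 34 | 0 | 0 | 25.1%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4.1 Nano | 41 | 33 | 17 | 5 | 0 | 19.2%
GPT-4o, Aug. 6th (temp=0) | 35 | 30 | 17 | 0 | 0 | 16.2%
Llama 3.1 Nemotron 70B | 36 | 0 | 0 | 0 | 0 | 7.3%
Arcee AI: Trinity Mini | 33 | 0 | 0 | 0 | 0 | 6.6%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%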
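Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 99 | 99.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 98 | 99.6%
Writer: Palmyra X5 | 100 | 100 | 100 | 99 | 97 | 99.2%
Mistral NeMO | 100 | 100 | 100 | 93 | 93 | 97.1%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 82 | 96.3%
o4 Mini High | 100 | 100 | 100 | 94 | 87 | 96.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 93 | 88 | 96.1%
Ministral 3B | 100 | 100 | 100 | 100 | 76 | 95.3%
Ministral 3 8B | 100 | 100 | 100 | 99 | 76 | 95.1%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 73 | 94.6%
Ministral 8B | 100 | 100 | 100 | 98 | 74 | 94.5%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 91 | 76 | 93.5%
Claude Opus 4 | 100 | 100 | 100 | 100 | 64 | 92.8%
Ministral 3 14B | 100 | 100 | 100 | 100 | 61 | 92.1%
Minimax M2.5 | 100 | 100 | 100 | 100 | 59 | 91.8%
Hermes 3 405B | 100 | 100 | 100 | 82 | 76 | 91.8%
o4 Mini | 100 | 100 | 100 | 100 | 59 | 91.7%
Z.AI GLM 4.6 | 100 | 100 | 100 | 89 | 57 | 89.3%
Mistral Large 3 | 100 | 100 | 100 | 86 | 56 | 88.3%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 32 | 86.5%
Hermes 3 70B | 100 | 100 | 100 | 96 | 36 | 86.4%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 31 | 86.2%
Z.AI GLM 4.7 | 100 | 100 | 100 | 80 | 51 | 86.2%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 22 | 84.4%
Mistral Medium 3.1 | 100 | 100 | 100 | 59 | 57 | 83.3%
Gemini 3 Flash (Preview) | 100 | 100 | 87 | 86 | 41 | 82.8%
GPT-5.2 | 100 | 100 | 89 | 61 | 59 | 81.9%
Claude 3.7 Sonnet | 100 | 100 | 100 | 57 | 52 | 81.9%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 2.5 72B | 100 | 100 | 99 | 94 | 0 | 78.7%
DeepSeek V3.2 | 100 | 88 | 88 | 85 | 33 | 78.6%
Claude Sonnet 4 | 100 | 100 | 100 | 92 | 0 | 78.4%
Z.AI GLM 4.5 | 100 | 100 | 98 | 67 | 27 | 78.3%
GPT-4o Mini (temp=0) | 100 | 100 | 99 | 64 | 25 | 77.6%
DeepSeek V3.1 | 100 | 100 | 82 | 82 | 18 | 76.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 81 | 0 | 76.1%
GPT-5.1 | 100 | 100 | 100 | 67 | 8 | 75.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 41 | 21 | 72.4%
GPT-4o Mini (temp=1) | 100 | 100 | 79 | 43 | 39 | 72.1%
Claude Opus 4.6 | 100 | 100 | 72 | 48 | 36 | 71.2%
Gemini 2.5 Pro | 100 | 100 | 81 | 60 | 11 | 70.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 89 | 39 | 18 | 69.3%
Mistral Large | 100 | 100 | 71 | 64 | 11 | 69.1%
GPT-5 | 99 | 96 | 61 | 55 | 32 | 68.8%
Z.AI GLM 4.7 Flash | 100 | 82 | 75 | 46 | 40 | 68.5%
Z.AI GLM 5 | 100 | 100 | 100 | 38 | 0 | 67.6%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 34 | 0 | 66.7%
Gemma 3 27B | 100 | 100 | 62 | 48 | 0 | 62.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 8 | 0 | 61.7%
Claude Opus 4.5 | 100 | 86 | 62 | 47 | 1 | 59.2%
Claude 3 Haiku | 100 | 71 | 46 | 41 | 27 | 57.0%
Gemma 3 12B | 100 | 79 | 68 | 36 | 0 | 56.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 90 | 62 | 16 | 14 | 56.3%
GPT-4.1 | 100 | 100 | 74 | 0 | 0 | 54.8%
GPT-4.1 Nano | 100 | 96 | 44 | 28 | 0 | 53.5%
Llama 3.1 8B | 100 | 98 | 31 | 28 | 0 | 51.4%
Claude Sonnet 4.6 | 90 | 89 | 41 | 34 | 0 | 50.9%
WizardLM 2 8x22b | 90 | 85 | 80 | 0 | 0 | 50.9%
GPT-5 Nano | 100 | 94 | 55 | 0 | 0 | 49.8%
GPT-4.1 Mini | 100 | 100 | 34 | 0 | 0 | 46.7%
Claude 3.5 Sonnet | 100 | 67 | 64 | 0 | 0 | 46.1%
Llama 3.1 Nemotron 70B | 100 | 92 | 18 | 18 | 0 | 45.7%
Stealth: Aurora Alpha | 100 | 78 | 41 | 0 | 0 | 43.8%
GPT-4o, Aug. 6th (temp=1) | 96 | 71 | 21 | 14 | 0 | 40.1%
Claude 3.5 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0%
Claude Haiku 4.5 | 100 | 29 | 18 | 15 | 0 | 32.5%
Llama 3.1 70B | 100 | 22 | 11 | 8 | 0 | 28.3%
Gemini 2.5 Flash | 62 | 54 | 15 | 0 | 0 | 26.2%
GPT-5 Mini | 44 | 27 | 25 | 10 | 5 | 22.1%
Gemini 2.5 Flash Lite | 88 | 8 | 0 | 0 | 0 | 19.2%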
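Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 97 | 99.4%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 93 | 98.6%
Grok 4 | 100 | 100 | 100 | 100 | 82 | 96.5%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 94 | 88 | 96.4%
Mistral NeMO | 100 | 100 | 100 | 100 | 81 | 96.1%
Qwen 3.5 397B A17B | 100 | 100 | 94 | 93 | 92 | 95.9%
GPT-5 Mini | 100 | 100 | 100 | 98 | 75 | 94.5%
Ministral 3 3B | 100 | 100 | 100 | 100 | 72 | 94.4%
Gemini 3 Flash (Preview) | 100 | 100 | 94 | 92 | 86 | 94.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 67 | 93.3%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 62 | 92.5%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 61 | 92.1%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 57 | 91.5%
Claude 3.5 Sonnet | 100 | 100 | 100 | 90 | 67 | 91.4%
Claude Sonnet 4 | 100 | 100 | 99 | 98 | 59 | 91.2%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 52 | 90.4%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 95 | 55 | 90.0%
Claude Opus 4.6 | 100 | 100 | 94 | 78 | 73 | 89.0%
Mistral Large | 100 | 100 | 100 | 92 | 48 | 88.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 40 | 87.9%
Claude 3 Haiku | 100 | 100 | 100 | 97 | 41 | 87.6%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 90 | 46 | 87.1%
GPT-5.2 | 100 | 100 | 100 | 82 | 52 | 87.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 34 | 86.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 69 | 64 | 86.6%
Gemma 3 4B | 100 | 100 | 100 | 100 | 27 | 85.4%
DeepSeek V3.2 | 100 | 100 | 79 | 78 | 62 | 83.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 89 | 86 | 41 | 83.2%
Mistral Medium 3.1 | 100 | 100 | 100 | 68 | 46 | 82.8%
Gemini 2.5 Flash | 100 | 100 | 93 | 84 | 36 | 82.8%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small Creative | 100 | 100 | 100 | 61 | 39 | 79.9%
GPT-4.1 Nano | 100 | 100 | 100 | 59 | 39 | 79.6%
DeepSeek V3 (2025-03-24) | 100 | 100 | 82 | 61 | 54 | 79.4%
Z.AI GLM 4.6 | 100 | 100 | 100 | 96 | 0 | 79.1%
WizardLM 2 8x22b | 100 | 100 | 100 | 88 | 5 | 78.4%
Mistral Large 3 | 100 | 100 | 100 | 52 | 39 | 78.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 48 | 41 | 77.9%
Z.AI GLM 5 | 100 | 94 | 76 | 76 | 41 | 77.7%
DeepSeek V3.1 | 100 | 100 | 99 | 89 | 0 | 77.6%
Claude Sonnet 4.5 | 100 | 100 | 69 | 65 | 52 | 77.3%
Hermes 3 70B | 100 | 100 | 100 | 57 | 11 | 73.7%
Claude 3.7 Sonnet | 100 | 100 | 100 | 54 | 11 | 73.0%
Rocinante 12B | 100 | 100 | 100 | 64 | 0 | 72.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 81 | 75 | 53 | 46 | 70.9%
Claude Haiku 4.5 | 100 | 100 | 100 | 50 | 0 | 70.0%
Ministral 3 14B | 100 | 100 | 89 | 34 | 25 | 69.5%
Mistral Large 2 | 100 | 100 | 100 | 46 | 0 | 69.2%
GPT-4o Mini (temp=0) | 100 | 100 | 82 | 52 | 6 | 68.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 38 | 0 | 67.6%
Gemini 2.5 Pro | 100 | 100 | 73 | 41 | 23 | 67.6%
Grok 4 Fast | 100 | 100 | 100 | 35 | 0 | 66.9%
Qwen 2.5 72B | 100 | 93 | 76 | 36 | 28 | 66.8%
Z.AI GLM 4.7 | 100 | 82 | 57 | 35 | 33 | 61.5%
Claude Opus 4.5 | 100 | 68 | 60 | 56 | 18 | 60.6%
Claude 3.5 Haiku | 100 | 100 | 100 | 0 | 0 | 60.0%
Ministral 3B | 100 | 100 | 67 | 16 | 3 | 57.1%
Z.AI GLM 4.5 | 100 | 96 | 89 | 0 | 0 | 56.9%
Arcee AI: Trinity Large (Preview) | 100 | 97 | 74 | 0 | 0 | 54.2%
Ministral 8B | 100 | 100 | 64 | 0 | 0 | 52.8%
Writer: Palmyra X5 | 100 | 100 | 31 | 28 | 0 | 51.8%
Llama 3.1 8B | 100 | 100 | 54 | 0 | 0 | 50.8%
Minimax M2.5 | 100 | 52 | 50 | 39 | 0 | 48.2%
Stealth: Aurora Alpha | 100 | 84 | 44 | 10 | 0 | 47.5%
Claude Opus 4 | 59 | 40 | 39 | 36 | 5 | 35.8%
GPT-5 Nano | 48 | 0 | 0 | 0 | 0 | 9.5%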
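Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 97 | 99.4%
o4 Mini High | 100 | 100 | 100 | 100 | 96 | 99.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 99 | 97 | 99.2%
Mistral NeMO | 100 | 100 | 100 | 100 | 92 | 98.4%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 90 | 98.0%
Ministral 3B | 100 | 100 | 100 | 99 | 82 | 96.2%
Grok 4 Fast | 100 | 100 | 100 | 100 | 80 | 95.9%
Hermes 3 70B | 100 | 100 | 100 | 100 | 74 | 94.8%
Claude Opus 4 | 100 | 100 | 100 | 96 | 76 | 94.4%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 72 | 94.4%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 68 | 93.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 99 | 67 | 93.2%
Gemma 3 4B | 100 | 100 | 100 | 100 | 65 | 93.1%
Mistral Large 2 | 100 | 100 | 100 | 93 | 72 | 93.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 64 | 92.8%
Mistral Large 3 | 100 | 100 | 100 | 100 | 64 | 92.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 61 | 92.1%
GPT-5.1 | 100 | 100 | 100 | 100 | 54 | 90.8%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 50 | 90.0%
DeepSeek V3.1 | 100 | 94 | 90 | 90 | 68 | 88.4%
Ministral 3 8B | 100 | 100 | 100 | 94 | 46 | 88.1%
Mistral Small Creative | 100 | 100 | 100 | 69 | 62 | 86.3%
Gemma 3 12B | 100 | 100 | 100 | 100 | 29 | 85.8%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 23 | 84.6%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 65 | 57 | 84.5%
Claude Sonnet 4.5 | 100 | 100 | 89 | 84 | 48 | 84.2%
DeepSeek V3 (2024-12-26) | 100 | 100 | 96 | 90 | 25 | 82.2%
Llama 3.1 Nemotron 70B | 100 | 100 | 82 | 79 | 44 | 80.9%
Gemini 2.5 Flash | 100 | 100 | 100 | 76 | 27 | 80.6%
Ministral 3 14B | 100 | 100 | 100 | 98 | 3 | 80.2%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 5 | 100 | 100 | 87 | 73 | 38 | 79.6%
Claude 3.7 Sonnet | 100 | 100 | 100 | 78 | 16 | 78.7%
WizardLM 2 8x22b | 100 | 100 | 100 | 76 | 13 | 77.9%
Ministral 3 3B | 100 | 100 | 100 | 46 | 29 | 75.0%
Rocinante 12B | 100 | 100 | 75 | 54 | 33 | 72.3%
GPT-5.2 | 100 | 100 | 87 | 73 | 0 | 72.0%
Stealth: Aurora Alpha | 100 | 98 | 87 | 42 | 32 | 71.9%
Z.AI GLM 4.7 | 100 | 100 | 82 | 43 | 31 | 71.2%
Gemini 3 Flash (Preview) | 100 | 88 | 75 | 56 | 34 | 70.5%
Gemma 3 27B | 100 | 100 | 72 | 54 | 21 | 69.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 46 | 0 | 69.2%
GPT-4.1 Nano | 100 | 100 | 82 | 34 | 25 | 68.2%
Qwen 2.5 72B | 100 | 100 | 96 | 39 | 0 | 66.9%
Z.AI GLM 4.7 Flash | 100 | 76 | 72 | 40 | 40 | 65.4%
Claude Sonnet 4.6 | 100 | 100 | 88 | 15 | 0 | 60.5%
Arcee AI: Trinity Mini | 100 | 100 | 54 | 39 | 6 | 59.7%
Claude 3 Haiku | 100 | 96 | 54 | 48 | 0 | 59.5%
Minimax M2.5 | 100 | 89 | 71 | 18 | 8 | 57.3%
GPT-5 | 83 | 76 | 62 | 41 | 14 | 55.3%
Z.AI GLM 4.6 | 100 | 100 | 47 | 9 | 0 | 51.2%
Gemini 2.5 Flash Lite | 100 | 95 | 44 | 5 | 0 | 48.9%
GPT-4o, Aug. 6th (temp=0) | 100 | 59 | 57 | 18 | 0 | 47.0%
Claude Opus 4.6 | 100 | 75 | 40 | 15 | 0 | 46.0%
Claude Opus 4.5 | 100 | 82 | 25 | 5 | 0 | 42.5%
Llama 3.1 8B | 100 | 100 | 11 | 0 | 0 | 42.2%
Claude Haiku 4.5 | 100 | 59 | 36 | 0 | 0 | 39.1%
GPT-5 Mini | 100 | 54 | 30 | 0 | 0 | 36.7%
GPT-4.1 Mini | 100 | 28 | 28 | 22 | 0 | 35.6%
Llama 3.1 70B | 100 | 64 | 3 | 0 | 0 | 33.3%
GPT-5 Nano | 9 | 0 | 0 | 0 | 0 | 1.8%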
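Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 98 | 99.6%
Mistral Large | 100 | 100 | 100 | 100 | 96 | 99.1%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 96 | 91 | 97.4%
Rocinante 12B | 100 | 100 | 100 | 100 | 79 | 95.7%
Ministral 3B | 100 | 100 | 100 | 100 | 76 | 95.3%
GPT-5.2 | 100 | 100 | 100 | 99 | 75 | 94.8%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 71 | 94.1%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 94 | 75 | 94.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 88 | 75 | 92.7%
Gemma 3 27B | 100 | 100 | 100 | 82 | 81 | 92.6%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 57 | 91.5%
Mistral Large 2 | 100 | 100 | 100 | 96 | 61 | 91.3%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 50 | 90.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 49 | 89.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 93 | 56 | 89.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 46 | 89.2%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 44 | 88.7%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 41 | 88.3%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 72 | 69 | 88.2%
GPT-4o, May 13th (temp=1) | 100 | 88 | 88 | 88 | 72 | 87.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 90 | 46 | 87.3%
Ministral 3 3B | 100 | 100 | 92 | 89 | 54 | 86.9%
Gemma 3 12B | 100 | 100 | 97 | 88 | 44 | 85.9%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 29 | 85.8%
Ministral 8B | 100 | 100 | 100 | 81 | 44 | 85.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 62 | 61 | 84.6%
DeepSeek V3.2 | 100 | 100 | 100 | 63 | 59 | 84.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 62 | 57 | 83.8%
Z.AI GLM 5 | 100 | 100 | 94 | 66 | 59 | 83.7%
Claude Haiku 4.5 | 100 | 88 | 86 | 73 | 61 | 81.4%
Z.AI GLM 4.7 | 100 | 100 | 84 | 65 | 57 | 81.3%
Ministral 3 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Opus 4 | 100 | 100 | 100 | 59 | 40 | 79.8%
Claude Opus 4.6 | 100 | 100 | 85 | 65 | 47 | 79.3%
Llama 3.1 70B | 100 | 94 | 93 | 82 | 15 | 77.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 93 | 50 | 41 | 76.8%
Z.AI GLM 4.5 | 100 | 100 | 100 | 59 | 23 | 76.4%
GPT-5 | 100 | 100 | 100 | 50 | 30 | 76.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 86 | 64 | 28 | 75.5%
Gemini 3 Flash (Preview) | 100 | 100 | 65 | 59 | 48 | 74.3%
GPT-5 Mini | 100 | 100 | 95 | 55 | 22 | 74.3%
Gemma 3 4B | 100 | 100 | 90 | 49 | 31 | 73.9%
Ministral 3 14B | 100 | 96 | 94 | 44 | 34 | 73.5%
Gemini 3 Pro (Preview) | 100 | 100 | 84 | 73 | 7 | 72.9%
Mistral Small Creative | 100 | 100 | 69 | 64 | 29 | 72.4%
Llama 3.1 8B | 100 | 98 | 97 | 61 | 0 | 71.1%
DeepSeek V3.1 | 100 | 100 | 79 | 41 | 34 | 70.7%
Mistral Medium 3.1 | 100 | 100 | 100 | 46 | 0 | 69.2%
Minimax M2.5 | 100 | 100 | 100 | 44 | 0 | 68.9%
Claude 3.5 Sonnet | 100 | 100 | 89 | 39 | 7 | 67.0%
Gemini 2.5 Flash Lite | 100 | 100 | 61 | 43 | 28 | 66.3%
Claude Sonnet 4.6 | 100 | 100 | 64 | 50 | 0 | 62.8%
Grok 4 | 81 | 78 | 77 | 76 | 0 | 62.5%
Gemini 2.5 Flash | 100 | 91 | 74 | 25 | 17 | 61.4%
Claude Sonnet 4.5 | 79 | 73 | 57 | 41 | 18 | 53.8%
Z.AI GLM 4.6 | 100 | 100 | 55 | 3 | 0 | 51.6%
Claude 3.7 Sonnet | 100 | 74 | 43 | 33 | 0 | 50.0%
Stealth: Aurora Alpha | 83 | 81 | 69 | 7 | 0 | 47.9%
Llama 3.1 Nemotron 70B | 100 | 100 | 22 | 15 | 0 | 47.3%
Claude 3.5 Haiku | 100 | 100 | 3 | 0 | 0 | 40.6%
Claude Opus 4.5 | 93 | 64 | 0 | 0 | 0 | 31.4%
Claude Sonnet 4 | 67 | 31 | 21 | 18 | 0 | 27.3%
Arcee AI: Trinity Large (Preview) | 52 | 36 | 20 | 0 | 0 | 21.7%
GPT-5 Nano | 52 | 28 | 0 | 0 | 0 | 16.1%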