Em-dash & semicolon overuse

Test: Bad Writing Habits

Avg. Score: 45.9%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | GPT-5.4 Mini | 97.4% | $0.015 | 16.8s | 81%
2 | GPT-5.4 Mini (Reasoning) | 98.0% | $0.022 | 28.1s | 85%
3 | GPT-5.4 Mini (Reasoning, Low) | 95.0% | $0.015 | 16.8s | 72%
4 | Mistral NeMO | 95.0% | $0.0005 | 10.1s | 63%
5 | Qwen 3.5 Flash | 94.4% | $0.0025 | 47.5s | 67%
6 | Qwen 2.5 72B | 93.0% | $0.0010 | 36.7s | 64%
7 | Qwen 3.5 9B | 95.9% | $0.0011 | 1.4m | 72%
8 | GPT-4o, May 13th (temp=0) | 92.9% | $0.035 | 14.1s | 64%
9 | Qwen 3.5 122B | 94.6% | $0.025 | 1.1m | 69%
10 | Qwen 3.5 35B | 94.3% | $0.018 | 1.0m | 64%
11 | Gemini 2.5 Flash | 86.7% | $0.0052 | 10.6s | 51%
12 | Gemini 2.5 Flash Lite | 82.2% | $0.0009 | 9.5s | 48%
13 | Qwen 3.5 27B | 94.5% | $0.020 | 1.6m | 66%
14 | Qwen 3.5 397B A17B | 98.3% | $0.014 | 3.0m | 78%
15 | Gemini 2.5 Flash (Reasoning) | 83.4% | $0.011 | 21.5s | 42%
16 | Z.AI GLM 4.6 | 84.0% | $0.0065 | 51.5s | 43%
17 | Gemini 2.5 Pro | 83.7% | $0.036 | 36.2s | 50%
18 | GPT-5.4 (Reasoning, Low) | 91.3% | $0.055 | 1.4m | 62%
19 | Inception Mercury | 82.3% | $0.011 | 17.6s | 30%
20 | GPT-5.4 | 88.6% | $0.049 | 1.4m | 52%
21 | GPT-4o, Aug. 6th (temp=0) | 79.3% | $0.023 | 22.7s | 30%
22 | Gemma 3 27B | 73.5% | $0.0006 | 52.6s | 30%
23 | Gemini 3.1 Pro (Preview) | 95.9% | $0.107 | 1.8m | 67%
24 | Qwen 3.5 Plus (2026-02-15) | 71.3% | $0.0060 | 31.5s | 22%
25 | Hermes 3 405B | 73.8% | $0.0032 | 53.2s | 24%
26 | ByteDance Seed 2.0 Lite | 83.7% | $0.012 | 2.2m | 38%
27 | GPT-4o Mini (temp=0) | 66.3% | $0.0012 | 34.8s | 24%
28 | GPT-5.4 (Reasoning) | 91.6% | $0.089 | 2.6m | 63%
29 | Claude 3 Haiku | 60.2% | $0.0025 | 14.9s | 12%
30 | Grok 4.20 (Beta) | 64.3% | $0.018 | 15.8s | 9%
31 | Gemini 2.5 Flash Lite (Reasoning) | 50.7% | $0.0028 | 30.8s | 15%
32 | Stealth: Aurora Alpha | 50.9% | $0.0000 | 9.8s | 8%
33 | Inception Mercury 2 | 52.0% | $0.0032 | 7.0s | 7%
34 | Z.AI GLM 4.7 | 64.3% | $0.010 | 1.4m | 17%
35 | Grok 4.20 (Beta, Reasoning) | 69.7% | $0.039 | 34.0s | 10%
36 | Llama 3.1 70B | 54.4% | $0.0015 | 29.4s | 6%
37 | Cohere Command R+ (Aug. 2024) | 60.2% | $0.020 | 52.5s | 12%
38 | DeepSeek V3.2 | 60.4% | $0.0014 | 1.9m | 19%
39 | DeepSeek V3.1 | 59.9% | $0.0020 | 1.8m | 19%
40 | Hermes 3 70B | 57.7% | $0.0010 | 1.2m | 11%
41 | Z.AI GLM 5 Turbo | 55.8% | $0.0081 | 33.2s | 6%
42 | Aion 2.0 | 55.9% | $0.0064 | 1.3m | 13%
43 | GPT-5.4 Nano (Reasoning, Low) | 47.6% | $0.0055 | 20.6s | 3%
44 | Stealth: Healer Alpha | 41.4% | $0.0000 | 23.7s | 7%
45 | GPT-5.4 Nano | 46.2% | $0.0057 | 26.3s | 4%
46 | GPT-5.4 Nano (Reasoning) | 45.9% | $0.0061 | 24.5s | 3%
47 | Rocinante 12B | 46.1% | $0.0014 | 38.4s | 4%
48 | Gemma 3 12B | 41.2% | $0.0004 | 41.3s | 8%
49 | Z.AI GLM 4.7 Flash | 47.2% | $0.0017 | 1.2m | 8%
50 | GPT-5.1 | 69.9% | $0.054 | 1.8m | 18%
51 | Mistral Small 3.2 24B | 88.0% | $0.0069 | 5.7m | 43%
52 | WizardLM 2 8x22b | 54.5% | $0.0026 | 1.8m | 11%
53 | Arcee AI: Trinity Mini | 32.0% | $0.0003 | 9.2s | 2%
54 | Gemini 3 Flash (Preview, Reasoning) | 41.5% | $0.012 | 30.1s | 4%
55 | Arcee AI: Trinity Large (Preview) | 39.9% | $0.0000 | 43.6s | 3%
56 | MiniMax M2.7 | 46.7% | $0.0040 | 1.1m | 3%
57 | Llama 3.1 Nemotron 70B | 34.5% | $0.0038 | 31.7s | 3%
58 | o4 Mini | 38.1% | $0.015 | 25.7s | 2%
59 | GPT-4o, May 13th (temp=1) | 40.0% | $0.033 | 14.4s | 5%
60 | Stealth: Hunter Alpha | 36.1% | $0.0000 | 55.0s | 5%
61 | Claude 3.5 Haiku | 26.7% | $0.0035 | 10.8s | 0%
62 | Claude Sonnet 4.5 | 46.5% | $0.035 | 38.1s | 2%
63 | o4 Mini High | 42.8% | $0.025 | 47.2s | 4%
64 | Gemini 3.1 Flash Lite (Preview) | 24.7% | $0.0030 | 8.4s | 0%
65 | Ministral 3 3B | 22.9% | $0.0005 | 11.1s | 0%
66 | Z.AI GLM 5 | 41.9% | $0.0084 | 1.2m | 0%
67 | Gemini 3 Flash (Preview) | 25.7% | $0.0078 | 19.6s | 0%
68 | Claude Haiku 4.5 | 27.6% | $0.011 | 21.6s | 0%
69 | Z.AI GLM 4.5 | 28.3% | $0.0051 | 42.1s | 2%
70 | Ministral 3B | 17.9% | $0.0001 | 8.1s | 0%
71 | Ministral 3 8B | 20.2% | $0.0008 | 19.6s | 0%
72 | Mistral Large 3 | 23.8% | $0.0033 | 30.3s | 0%
73 | LFM2 24B | 20.6% | $0.0002 | 28.4s | 0%
74 | Nemotron 3 Nano | 32.5% | $0.0010 | 1.1m | 0%
75 | Mistral Large 2 | 25.7% | $0.013 | 29.4s | 0%
76 | GPT-5.2 | 56.0% | $0.056 | 1.5m | 7%
77 | Ministral 8B | 14.0% | $0.0004 | 10.4s | 0%
78 | Claude Sonnet 4.6 | 36.5% | $0.031 | 39.3s | 0%
79 | DeepSeek V3 (2024-12-26) | 27.1% | $0.0021 | 54.6s | 0%
80 | Mistral Large | 25.4% | $0.014 | 30.9s | 0%
81 | Grok 4.1 Fast | 21.2% | $0.0018 | 37.8s | 0%
82 | Mistral Small Creative | 12.2% | $0.0007 | 9.1s | 0%
83 | GPT-5 | 69.5% | $0.065 | 2.8m | 18%
84 | Claude Sonnet 4 | 36.4% | $0.032 | 43.7s | 0%
85 | DeepSeek-V2 Chat | 24.3% | $0.0021 | 53.3s | 1%
86 | Claude 3.5 Sonnet | 40.3% | $0.048 | 35.5s | 0%
87 | Llama 3.1 8B | 29.3% | $0.0003 | 1.3m | 0%
88 | GPT-5 Mini | 27.9% | $0.0100 | 57.4s | 0%
89 | Gemma 3 4B | 11.5% | $0.0002 | 20.0s | 0%
90 | Ministral 3 14B | 8.0% | $0.0007 | 11.7s | 0%
91 | MiniMax M2.5 | 24.0% | $0.0034 | 1.3m | 0%
92 | GPT-4o Mini (temp=1) | 10.0% | $0.0012 | 34.8s | 0%
93 | GPT-4.1 Mini | 5.0% | $0.0027 | 19.0s | 0%
94 | Mistral Small 4 | 4.0% | $0.0014 | 18.2s | 0%
95 | GPT-4.1 Nano | 2.2% | $0.0007 | 13.3s | 0%
96 | Writer: Palmyra X5 | 8.0% | $0.011 | 22.0s | 0%
97 | Mistral Medium 3.1 | 9.1% | $0.0048 | 36.5s | 0%
98 | ByteDance Seed 1.6 Flash | 4.7% | $0.0013 | 27.3s | 0%
99 | Qwen 3 32B | 12.3% | $0.0015 | 54.6s | 0%
100 | Gemini 3 Pro (Preview) | 33.2% | $0.055 | 54.4s | 3%
101 | Mistral Small 4 (Reasoning) | 4.2% | $0.0022 | 30.2s | 0%
102 | DeepSeek V3 (2025-03-24) | 6.3% | $0.0014 | 39.4s | 0%
103 | Grok 4 Fast | 1.4% | $0.0017 | 24.1s | 0%
104 | Nemotron 3 Super | 17.5% | $0.0000 | 1.4m | 0%
105 | GPT-4o, Aug. 6th (temp=1) | 7.0% | $0.018 | 24.4s | 0%
106 | Qwen3 235B A22B Instruct 2507 | 6.9% | $0.0011 | 59.2s | 0%
107 | Claude Opus 4.5 | 36.6% | $0.070 | 53.4s | 0%
108 | Claude Sonnet 4.6 (Reasoning) | 33.3% | $0.060 | 1.2m | 0%
109 | GPT-4.1 | 5.6% | $0.018 | 44.7s | 0%
110 | Claude 3.7 Sonnet | 9.6% | $0.042 | 46.7s | 0%
111 | Claude Opus 4.6 | 33.4% | $0.078 | 1.2m | 0%
112 | GPT-5 Nano | 0.8% | $0.0042 | 1.4m | 0%
113 | MoonshotAI: Kimi K2.5 | 37.7% | $0.019 | 3.2m | 0%
114 | Claude Opus 4.6 (Reasoning) | 33.9% | $0.088 | 1.4m | 0%
115 | ByteDance Seed 2.0 Mini | 42.2% | $0.0045 | 4.9m | 3%
116 | Grok 4 | 5.5% | $0.048 | 1.7m | 0%
117 | ByteDance Seed 1.6 | 0.0% | $0.013 | 2.5m | 0%
118 | Claude Opus 4 | 34.6% | $0.209 | 1.4m | 0%
Avg. Score: 45.85%

Individual Scenarios

Detailed Writing Rules

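Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 98 | 99.5%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 97 | 99.4%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 94 | 98.8%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 69 | 93.8%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 55 | 91.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 70 | 62 | 86.4%
Z.AI GLM 4.7 | 100 | 100 | 100 | 99 | 18 | 83.3%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 0 | 79.9%
GPT-4o Mini (temp=0) | 100 | 100 | 91 | 83 | 18 | 78.4%
Gemini 2.5 Pro | 100 | 100 | 100 | 50 | 38 | 77.6%
Claude Opus 4 | 100 | 100 | 100 | 64 | 19 | 76.5%
Z.AI GLM 4.6 | 100 | 100 | 75 | 65 | 38 | 75.5%
o4 Mini High | 100 | 100 | 96 | 66 | 11 | 74.6%
Gemini 2.5 Flash | 100 | 100 | 100 | 37 | 29 | 73.1%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 57 | 55 | 38 | 70.1%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 38 | 0 | 67.6%
Gemini 2.5 Flash Lite | 100 | 100 | 60 | 49 | 19 | 65.4%
MiniMax M2.7 | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude Sonnet 4 | 100 | 100 | 100 | 0 | 0 | 60.0%
Hermes 3 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemma 3 27B | 100 | 61 | 58 | 56 | 13 | 57.7%
GPT-5 Mini | 100 | 83 | 71 | 22 | 10 | 57.1%
DeepSeek V3.2 | 100 | 95 | 67 | 24 | 0 | 57.1%
ByteDance Seed 2.0 Mini | 100 | 100 | 73 | 9 | 0 | 56.4%
Inception Mercury 2 | 100 | 79 | 76 | 23 | 0 | 55.7%
Stealth: Healer Alpha | 100 | 96 | 78 | 0 | 0 | 54.9%
Nemotron 3 Nano | 100 | 83 | 79 | 7 | 0 | 53.9%
GPT-4o, May 13th (temp=1) | 100 | 69 | 66 | 11 | 0 | 49.0%
Llama 3.1 70B | 100 | 100 | 43 | 0 | 0 | 48.5%
DeepSeek V3.1 | 86 | 48 | 43 | 40 | 25 | 48.4%
Gemini 2.5 Flash Lite (Reasoning) | 78 | 54 | 52 | 50 | 0 | 46.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 20 | 0 | 0 | 44.1%
o4 Mini | 100 | 61 | 37 | 20 | 0 | 43.6%
WizardLM 2 8x22b | 78 | 66 | 54 | 6 | 0 | 40.6%
Z.AI GLM 4.7 Flash | 100 | 60 | 43 | 0 | 0 | 40.4%
MiniMax M2.5 | 100 | 100 | 0 | 0 | 0 | 40.0%
Claude 3.5 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0%
Hermes 3 405B | 100 | 100 | 0 | 0 | 0 | 40.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 0 | 0 | 0 | 40.0%
Llama 3.1 8B | 100 | 92 | 0 | 0 | 0 | 38.4%
Claude Haiku 4.5 | 100 | 76 | 0 | 0 | 0 | 35.3%
Stealth: Hunter Alpha | 100 | 27 | 25 | 21 | 0 | 34.6%
Aion 2.0 | 73 | 50 | 26 | 16 | 0 | 33.1%
Llama 3.1 Nemotron 70B | 100 | 48 | 13 | 0 | 0 | 32.1%
Claude 3.5 Sonnet | 90 | 59 | 0 | 0 | 0 | 29.8%
Rocinante 12B | 100 | 43 | 0 | 0 | 0 | 28.6%
DeepSeek-V2 Chat | 80 | 36 | 0 | 0 | 0 | 23.2%
Arcee AI: Trinity Large (Preview) | 48 | 44 | 0 | 0 | 0 | 18.5%
DeepSeek V3 (2024-12-26) | 76 | 7 | 0 | 0 | 0 | 16.6%
GPT-4.1 | 78 | 0 | 0 | 0 | 0 | 15.7%
Qwen 3 32B | 67 | 9 | 0 | 0 | 0 | 15.1%
Z.AI GLM 4.5 | 43 | 24 | 0 | 0 | 0 | 13.5%
Gemini 3.1 Flash Lite (Preview) | 47 | 16 | 3 | 0 | 0 | 13.2%
Gemma 3 12B | 52 | 2 | 0 | 0 | 0 | 10.7%
Arcee AI: Trinity Mini | 25 | 24 | 0 | 0 | 0 | 9.8%
Grok 4.1 Fast | 32 | 0 | 0 | 0 | 0 | 6.4%
Gemini 3 Pro (Preview) | 31 | 0 | 0 | 0 | 0 | 6.2%
GPT-4o, Aug. 6th (temp=1) | 30 | 0 | 0 | 0 | 0 | 6.0%
Mistral Large 2 | 22 | 0 | 0 | 0 | 0 | 4.4%
Nemotron 3 Super | 14 | 0 | 0 | 0 | 0 | 2.8%
Mistral Large 3 | 11 | 0 | 0 | 0 | 0 | 2.2%
Claude 3.7 Sonnet | 3 | 0 | 0 | 0 | 0 | 0.7%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen3 235B A22B Instruct 2507 | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%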
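Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 99 | 99.8%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 96 | 94 | 97.9%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 83 | 96.7%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 80 | 96.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 70 | 93.9%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 68 | 93.7%
Hermes 3 405B | 100 | 100 | 99 | 94 | 75 | 93.5%
Gemma 3 12B | 100 | 100 | 100 | 94 | 63 | 91.3%
Z.AI GLM 4.6 | 100 | 100 | 100 | 86 | 66 | 90.5%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 99 | 47 | 89.2%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 25 | 84.9%
DeepSeek V3.2 | 100 | 100 | 100 | 85 | 19 | 81.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 0 | 80.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Rocinante 12B | 100 | 100 | 100 | 96 | 0 | 79.1%
Hermes 3 70B | 100 | 100 | 100 | 90 | 0 | 78.0%
Gemma 3 27B | 100 | 100 | 100 | 71 | 18 | 77.8%
Claude Opus 4 | 100 | 100 | 100 | 87 | 0 | 77.4%
Claude 3.5 Sonnet | 100 | 100 | 100 | 70 | 11 | 76.2%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 68 | 59 | 50 | 75.3%
Z.AI GLM 4.7 | 94 | 89 | 77 | 68 | 36 | 73.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 93 | 41 | 0 | 66.8%
DeepSeek V3.1 | 100 | 89 | 77 | 29 | 24 | 63.8%
Claude 3.5 Haiku | 100 | 100 | 100 | 13 | 0 | 62.6%
Aion 2.0 | 100 | 83 | 57 | 54 | 7 | 60.2%
o4 Mini | 100 | 98 | 52 | 50 | 0 | 60.0%
Inception Mercury | 100 | 100 | 87 | 9 | 0 | 59.2%
WizardLM 2 8x22b | 100 | 100 | 93 | 2 | 0 | 59.0%
o4 Mini High | 100 | 100 | 74 | 12 | 0 | 57.3%
Z.AI GLM 4.5 | 100 | 100 | 49 | 35 | 2 | 57.1%
GPT-5 Mini | 100 | 96 | 46 | 41 | 0 | 56.5%
Llama 3.1 8B | 100 | 98 | 75 | 0 | 0 | 54.6%
Stealth: Hunter Alpha | 91 | 66 | 59 | 46 | 7 | 53.8%
Llama 3.1 70B | 100 | 100 | 68 | 0 | 0 | 53.5%
Gemini 3 Pro (Preview) | 100 | 86 | 77 | 0 | 0 | 52.6%
GPT-4o, May 13th (temp=1) | 81 | 81 | 52 | 41 | 0 | 51.0%
GPT-4o Mini (temp=0) | 100 | 80 | 53 | 0 | 0 | 46.5%
Gemini 2.5 Flash Lite (Reasoning) | 83 | 79 | 42 | 11 | 0 | 42.9%
Cohere Command R+ (Aug. 2024) | 100 | 66 | 44 | 0 | 0 | 42.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 7 | 0 | 0 | 41.4%
Gemini 3 Flash (Preview) | 64 | 38 | 37 | 17 | 0 | 31.1%
Claude 3 Haiku | 100 | 52 | 0 | 0 | 0 | 30.4%
Gemini 3.1 Flash Lite (Preview) | 77 | 63 | 8 | 0 | 0 | 29.8%
Inception Mercury 2 | 100 | 47 | 0 | 0 | 0 | 29.3%
Mistral Large | 54 | 32 | 32 | 0 | 0 | 23.5%
DeepSeek V3 (2024-12-26) | 62 | 41 | 0 | 0 | 0 | 20.6%
Mistral Small 4 | 100 | 0 | 0 | 0 | 0 | 20.0%
Stealth: Aurora Alpha | 84 | 0 | 0 | 0 | 0 | 16.8%
MiniMax M2.5 | 74 | 0 | 0 | 0 | 0 | 14.9%
Mistral Small 4 (Reasoning) | 64 | 0 | 0 | 0 | 0 | 12.8%
GPT-4.1 | 57 | 0 | 0 | 0 | 0 | 11.4%
Ministral 3 3B | 51 | 3 | 0 | 0 | 0 | 10.8%
Qwen 3 32B | 32 | 9 | 0 | 0 | 0 | 8.2%
Nemotron 3 Nano | 17 | 11 | 0 | 0 | 0 | 5.6%
Ministral 8B | 26 | 0 | 0 | 0 | 0 | 5.2%
Ministral 3 14B | 14 | 0 | 0 | 0 | 0 | 2.8%
Claude 3.7 Sonnet | 12 | 0 | 0 | 0 | 0 | 2.5%
DeepSeek-V2 Chat | 12 | 0 | 0 | 0 | 0 | 2.4%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0%
Nemotron 3 Super | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen3 235B A22B Instruct 2507 | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%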
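Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 95 | 98.9%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 93 | 98.5%
Aion 2.0 | 100 | 100 | 100 | 100 | 92 | 98.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 89 | 97.8%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 88 | 97.6%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 83 | 96.7%
Rocinante 12B | 100 | 100 | 100 | 95 | 85 | 96.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 93 | 84 | 95.4%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 76 | 95.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 76 | 95.3%
Z.AI GLM 4.5 | 100 | 100 | 94 | 85 | 83 | 92.6%
WizardLM 2 8x22b | 100 | 100 | 96 | 89 | 75 | 92.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 82 | 76 | 91.7%
Claude Opus 4 | 100 | 100 | 100 | 100 | 58 | 91.6%
o4 Mini | 100 | 100 | 92 | 78 | 78 | 89.6%
Inception Mercury | 100 | 100 | 100 | 100 | 48 | 89.6%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 39 | 87.8%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 95 | 37 | 86.4%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 32 | 86.4%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 30 | 86.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 74 | 47 | 84.1%
Hermes 3 70B | 100 | 100 | 100 | 100 | 0 | 80.0%
MiniMax M2.7 | 100 | 100 | 98 | 98 | 0 | 79.3%
Stealth: Aurora Alpha | 100 | 100 | 100 | 60 | 29 | 77.9%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 69 | 17 | 77.1%
Gemma 3 27B | 100 | 100 | 99 | 71 | 0 | 74.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 66 | 54 | 51 | 74.0%
Ministral 3 3B | 100 | 100 | 100 | 70 | 0 | 73.9%
Qwen3 235B A22B Instruct 2507 | 100 | 88 | 83 | 68 | 19 | 71.7%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 75 | 75 | 0 | 69.9%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 26 | 21 | 69.4%
Stealth: Healer Alpha | 100 | 100 | 100 | 27 | 18 | 69.0%
Gemini 3 Flash (Preview) | 100 | 100 | 77 | 28 | 26 | 66.3%
GPT-4o Mini (temp=0) | 100 | 100 | 85 | 46 | 0 | 66.2%
GPT-5 Mini | 100 | 100 | 68 | 57 | 0 | 65.0%
o4 Mini High | 100 | 100 | 87 | 35 | 0 | 64.4%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 16 | 0 | 63.2%
Stealth: Hunter Alpha | 100 | 77 | 65 | 43 | 22 | 61.4%
LFM2 24B | 100 | 87 | 70 | 39 | 0 | 59.1%
Mistral Large 3 | 82 | 63 | 55 | 54 | 42 | 59.0%
Llama 3.1 70B | 100 | 100 | 93 | 0 | 0 | 58.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 73 | 14 | 0 | 57.5%
Gemini 3 Pro (Preview) | 83 | 58 | 51 | 49 | 34 | 54.9%
Nemotron 3 Nano | 100 | 100 | 66 | 0 | 0 | 53.1%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 86 | 76 | 0 | 0 | 52.5%
Writer: Palmyra X5 | 100 | 98 | 33 | 25 | 2 | 51.6%
Qwen 3 32B | 100 | 72 | 71 | 14 | 0 | 51.5%
Llama 3.1 Nemotron 70B | 100 | 89 | 68 | 0 | 0 | 51.3%
Gemma 3 12B | 100 | 75 | 46 | 32 | 0 | 50.7%
Mistral Small Creative | 100 | 77 | 59 | 11 | 5 | 50.3%
Nemotron 3 Super | 100 | 91 | 59 | 0 | 0 | 50.0%
DeepSeek-V2 Chat | 100 | 89 | 40 | 17 | 0 | 49.2%
Grok 4 | 100 | 73 | 60 | 13 | 0 | 49.1%
GPT-4.1 | 100 | 100 | 30 | 13 | 0 | 48.7%
Claude 3.7 Sonnet | 100 | 100 | 28 | 0 | 0 | 45.6%
Mistral Large 2 | 61 | 56 | 55 | 44 | 2 | 43.9%
Mistral Large | 100 | 59 | 55 | 0 | 0 | 42.8%
Ministral 3B | 68 | 60 | 37 | 34 | 0 | 39.8%
Ministral 3 8B | 64 | 51 | 39 | 7 | 5 | 33.3%
Ministral 3 14B | 100 | 56 | 7 | 0 | 0 | 32.6%
Ministral 8B | 85 | 37 | 28 | 12 | 0 | 32.3%
Mistral Small 4 (Reasoning) | 85 | 43 | 0 | 0 | 0 | 25.7%
Mistral Medium 3.1 | 70 | 38 | 0 | 0 | 0 | 21.7%
DeepSeek V3 (2025-03-24) | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4.1 Mini | 45 | 32 | 22 | 0 | 0 | 19.9%
ByteDance Seed 1.6 Flash | 69 | 27 | 0 | 0 | 0 | 19.2%
Gemma 3 4B | 59 | 37 | 0 | 0 | 0 | 19.2%
Llama 3.1 8B | 54 | 24 | 0 | 0 | 0 | 15.5%
Mistral Small 4 | 68 | 8 | 0 | 0 | 0 | 15.1%
GPT-4o Mini (temp=1) | 66 | 0 | 0 | 0 | 0 | 13.3%
Grok 4 Fast | 15 | 0 | 0 | 0 | 0 | 3.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%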
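Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 95 | 99.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 93 | 98.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 92 | 98.4%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 75 | 95.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 88 | 78 | 93.1%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 63 | 92.6%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 79 | 71 | 90.2%
Hermes 3 405B | 100 | 100 | 100 | 100 | 44 | 88.9%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 44 | 88.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 73 | 63 | 87.2%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 70 | 60 | 86.1%
DeepSeek V3.2 | 100 | 100 | 81 | 77 | 72 | 86.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 89 | 39 | 85.6%
o4 Mini | 100 | 100 | 100 | 91 | 32 | 84.6%
Claude Opus 4 | 100 | 100 | 100 | 59 | 57 | 83.2%
Inception Mercury 2 | 100 | 100 | 86 | 63 | 62 | 82.2%
Gemma 3 27B | 100 | 100 | 100 | 95 | 11 | 81.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash Lite | 100 | 100 | 89 | 65 | 39 | 78.5%
GPT-5 Mini | 100 | 100 | 100 | 82 | 10 | 78.4%
DeepSeek V3.1 | 100 | 100 | 100 | 79 | 0 | 75.7%
Nemotron 3 Nano | 100 | 100 | 100 | 62 | 0 | 72.5%
Inception Mercury | 100 | 100 | 100 | 62 | 0 | 72.4%
Rocinante 12B | 100 | 100 | 100 | 44 | 11 | 71.1%
ByteDance Seed 2.0 Mini | 100 | 100 | 82 | 70 | 0 | 70.3%
Claude 3.5 Sonnet | 100 | 100 | 100 | 43 | 5 | 69.4%
MiniMax M2.5 | 100 | 100 | 92 | 36 | 16 | 69.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 74 | 66 | 5 | 68.9%
GPT-4o, May 13th (temp=1) | 100 | 86 | 73 | 70 | 13 | 68.3%
Claude 3.5 Haiku | 100 | 100 | 61 | 44 | 29 | 66.8%
Llama 3.1 70B | 100 | 100 | 100 | 7 | 0 | 61.4%
Mistral Large 3 | 100 | 91 | 62 | 54 | 0 | 61.3%
Hermes 3 70B | 100 | 100 | 100 | 0 | 0 | 60.0%
Stealth: Aurora Alpha | 100 | 100 | 59 | 31 | 0 | 57.9%
Llama 3.1 8B | 100 | 100 | 79 | 0 | 0 | 55.9%
Stealth: Hunter Alpha | 100 | 100 | 77 | 0 | 0 | 55.4%
Llama 3.1 Nemotron 70B | 100 | 73 | 68 | 16 | 13 | 53.9%
Z.AI GLM 4.5 | 100 | 100 | 44 | 15 | 9 | 53.7%
Gemini 3.1 Flash Lite (Preview) | 100 | 87 | 70 | 0 | 0 | 51.4%
GPT-4o Mini (temp=0) | 100 | 83 | 48 | 16 | 0 | 49.4%
Mistral Large 2 | 94 | 55 | 40 | 38 | 16 | 48.5%
Nemotron 3 Super | 66 | 66 | 61 | 30 | 0 | 44.5%
Gemini 3 Flash (Preview) | 73 | 62 | 47 | 32 | 0 | 42.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 99 | 0 | 0 | 0 | 39.7%
Gemini 3 Pro (Preview) | 100 | 71 | 14 | 2 | 0 | 37.3%
Arcee AI: Trinity Mini | 100 | 59 | 22 | 0 | 0 | 36.2%
Mistral Large | 100 | 32 | 32 | 0 | 0 | 32.8%
Arcee AI: Trinity Large (Preview) | 74 | 58 | 0 | 0 | 0 | 26.4%
DeepSeek-V2 Chat | 100 | 27 | 2 | 2 | 0 | 26.3%
ByteDance Seed 1.6 Flash | 79 | 42 | 0 | 0 | 0 | 24.2%
LFM2 24B | 66 | 46 | 0 | 0 | 0 | 22.3%
DeepSeek V3 (2024-12-26) | 70 | 27 | 5 | 0 | 0 | 20.5%
Gemma 3 12B | 53 | 48 | 0 | 0 | 0 | 20.2%
Ministral 3 3B | 54 | 37 | 3 | 0 | 0 | 18.8%
Qwen 3 32B | 78 | 4 | 0 | 0 | 0 | 16.4%
Ministral 3 8B | 73 | 0 | 0 | 0 | 0 | 14.6%
Writer: Palmyra X5 | 29 | 14 | 0 | 0 | 0 | 8.4%
Qwen3 235B A22B Instruct 2507 | 29 | 13 | 0 | 0 | 0 | 8.3%
GPT-4.1 Mini | 35 | 0 | 0 | 0 | 0 | 7.0%
DeepSeek V3 (2025-03-24) | 35 | 0 | 0 | 0 | 0 | 7.0%
Mistral Small Creative | 27 | 3 | 2 | 0 | 0 | 6.5%
Gemma 3 4B | 31 | 0 | 0 | 0 | 0 | 6.2%
Claude 3.7 Sonnet | 26 | 0 | 0 | 0 | 0 | 5.2%
Ministral 3B | 19 | 0 | 0 | 0 | 0 | 3.7%
Ministral 8B | 17 | 0 | 0 | 0 | 0 | 3.4%
Mistral Medium 3.1 | 1 | 0 | 0 | 0 | 0 | 0.3%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%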
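Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 99 | 99.8%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 99 | 99.8%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 98 | 99.6%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 96 | 99.2%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 89 | 97.9%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 87 | 97.5%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 85 | 96.9%
Stealth: Hunter Alpha | 100 | 100 | 100 | 93 | 89 | 96.4%
o4 Mini | 100 | 100 | 100 | 100 | 81 | 96.3%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 79 | 95.8%
Gemini 2.5 Pro | 100 | 100 | 97 | 92 | 86 | 95.1%
Inception Mercury 2 | 100 | 100 | 100 | 83 | 74 | 91.5%
Hermes 3 405B | 100 | 100 | 100 | 100 | 51 | 90.1%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 94 | 55 | 90.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 94 | 39 | 86.7%
Mistral NeMO | 100 | 100 | 100 | 87 | 45 | 86.6%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 71 | 57 | 85.5%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 21 | 84.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 97 | 88 | 35 | 83.9%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 14 | 82.8%
Z.AI GLM 4.7 Flash | 100 | 100 | 79 | 67 | 57 | 80.4%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 0 | 80.0%
Inception Mercury | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 12B | 100 | 100 | 77 | 74 | 43 | 79.0%
GPT-5 Mini | 100 | 87 | 74 | 69 | 61 | 78.1%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 82 | 0 | 76.4%
DeepSeek V3.1 | 100 | 86 | 73 | 68 | 54 | 76.3%
Hermes 3 70B | 100 | 100 | 100 | 79 | 0 | 75.9%
DeepSeek V3.2 | 100 | 100 | 100 | 43 | 34 | 75.3%
Stealth: Healer Alpha | 100 | 100 | 92 | 69 | 0 | 72.1%
o4 Mini High | 100 | 100 | 100 | 60 | 0 | 72.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 54 | 0 | 70.9%
Stealth: Aurora Alpha | 100 | 100 | 94 | 44 | 14 | 70.4%
Llama 3.1 70B | 100 | 100 | 98 | 48 | 0 | 69.0%
MiniMax M2.5 | 100 | 100 | 100 | 34 | 0 | 66.9%
Claude 3 Haiku | 100 | 100 | 100 | 29 | 3 | 66.4%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 82 | 50 | 0 | 66.4%
Gemma 3 27B | 100 | 89 | 61 | 56 | 19 | 64.9%
DeepSeek V3 (2024-12-26) | 97 | 89 | 65 | 33 | 27 | 62.1%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 55 | 54 | 0 | 61.9%
Gemini 3 Pro (Preview) | 100 | 72 | 69 | 65 | 0 | 61.2%
DeepSeek-V2 Chat | 100 | 81 | 73 | 26 | 15 | 58.9%
Ministral 3 3B | 98 | 84 | 52 | 43 | 0 | 55.3%
Mistral Large 2 | 89 | 68 | 43 | 40 | 19 | 51.6%
Nemotron 3 Nano | 100 | 96 | 50 | 0 | 0 | 49.2%
Mistral Large | 100 | 65 | 63 | 0 | 0 | 45.5%
Ministral 8B | 100 | 65 | 39 | 22 | 0 | 45.3%
Ministral 3 8B | 98 | 54 | 47 | 16 | 3 | 43.5%
Z.AI GLM 4.5 | 100 | 63 | 25 | 14 | 8 | 42.2%
Claude Haiku 4.5 | 100 | 63 | 41 | 0 | 0 | 40.7%
Rocinante 12B | 100 | 100 | 0 | 0 | 0 | 40.0%
Claude 3.5 Haiku | 68 | 44 | 41 | 0 | 0 | 30.6%
Writer: Palmyra X5 | 60 | 53 | 25 | 5 | 0 | 28.4%
Llama 3.1 8B | 100 | 29 | 10 | 0 | 0 | 27.7%
Gemini 3.1 Flash Lite (Preview) | 70 | 38 | 20 | 0 | 0 | 25.6%
Mistral Large 3 | 63 | 50 | 11 | 4 | 0 | 25.4%
Qwen 3 32B | 63 | 36 | 24 | 0 | 0 | 24.5%
Gemini 3 Flash (Preview) | 63 | 46 | 12 | 0 | 0 | 24.2%
GPT-4o Mini (temp=1) | 79 | 32 | 0 | 0 | 0 | 22.2%
Ministral 3 14B | 60 | 25 | 18 | 5 | 0 | 21.6%
Claude 3.7 Sonnet | 100 | 5 | 0 | 0 | 0 | 21.0%
Grok 4 Fast | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, May 13th (temp=1) | 92 | 7 | 0 | 0 | 0 | 19.7%
Ministral 3B | 55 | 36 | 0 | 0 | 0 | 18.2%
Mistral Small Creative | 37 | 33 | 11 | 1 | 0 | 16.5%
Arcee AI: Trinity Mini | 37 | 32 | 10 | 0 | 0 | 15.7%
Qwen3 235B A22B Instruct 2507 | 53 | 11 | 0 | 0 | 0 | 12.8%
LFM2 24B | 60 | 1 | 0 | 0 | 0 | 12.2%
ByteDance Seed 1.6 Flash | 29 | 20 | 0 | 0 | 0 | 9.9%
GPT-4.1 Mini | 34 | 0 | 0 | 0 | 0 | 6.8%
Nemotron 3 Super | 31 | 0 | 0 | 0 | 0 | 6.2%
Mistral Medium 3.1 | 29 | 0 | 0 | 0 | 0 | 5.8%
Gemma 3 4B | 23 | 0 | 0 | 0 | 0 | 4.5%
GPT-4.1 | 14 | 4 | 0 | 0 | 0 | 3.7%
DeepSeek V3 (2025-03-24) | 11 | 0 | 0 | 0 | 0 | 2.2%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%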
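Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 94 | 98.7%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 88 | 97.6%
Qwen 2.5 72B | 100 | 100 | 100 | 97 | 90 | 97.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 95 | 77 | 94.4%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 96 | 75 | 94.2%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 68 | 93.5%
Aion 2.0 | 100 | 100 | 100 | 100 | 65 | 93.0%
Gemma 3 27B | 100 | 94 | 94 | 82 | 60 | 86.2%
DeepSeek V3.2 | 100 | 100 | 100 | 96 | 35 | 86.2%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 68 | 62 | 86.0%
GPT-5 Mini | 100 | 100 | 100 | 88 | 20 | 81.6%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 80.0%
Claude 3.5 Haiku | 100 | 100 | 82 | 79 | 33 | 78.9%
GPT-4o Mini (temp=0) | 100 | 98 | 83 | 76 | 16 | 74.9%
Mistral Small 3.2 24B | 100 | 100 | 98 | 75 | 0 | 74.5%
Stealth: Healer Alpha | 100 | 87 | 60 | 58 | 48 | 70.6%
Claude Opus 4 | 100 | 100 | 100 | 53 | 0 | 70.6%
Claude Haiku 4.5 | 100 | 100 | 100 | 46 | 0 | 69.2%
Z.AI GLM 4.7 Flash | 100 | 100 | 91 | 45 | 7 | 68.5%
Gemini 2.5 Flash Lite | 100 | 87 | 76 | 58 | 16 | 67.5%
DeepSeek V3.1 | 100 | 100 | 74 | 45 | 12 | 66.2%
Llama 3.1 70B | 100 | 100 | 90 | 21 | 0 | 62.2%
Stealth: Hunter Alpha | 100 | 100 | 96 | 0 | 0 | 59.2%
Hermes 3 405B | 100 | 100 | 88 | 0 | 0 | 57.6%
MiniMax M2.5 | 100 | 100 | 63 | 12 | 0 | 54.9%
Inception Mercury 2 | 100 | 69 | 46 | 31 | 17 | 52.6%
o4 Mini | 93 | 74 | 59 | 35 | 0 | 52.1%
Llama 3.1 Nemotron 70B | 100 | 76 | 71 | 3 | 0 | 50.3%
Claude 3 Haiku | 100 | 100 | 37 | 2 | 2 | 48.2%
WizardLM 2 8x22b | 100 | 81 | 59 | 1 | 0 | 48.2%
Gemini 3 Pro (Preview) | 97 | 78 | 55 | 6 | 0 | 47.1%
Rocinante 12B | 100 | 99 | 31 | 0 | 0 | 46.0%
o4 Mini High | 100 | 59 | 45 | 25 | 0 | 45.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 13 | 0 | 0 | 42.6%
Stealth: Aurora Alpha | 100 | 91 | 0 | 0 | 0 | 38.1%
Hermes 3 70B | 81 | 56 | 29 | 0 | 0 | 33.1%
Gemini 2.5 Flash Lite (Reasoning) | 71 | 51 | 34 | 0 | 0 | 31.3%
Gemini 3 Flash (Preview) | 73 | 36 | 28 | 13 | 0 | 29.9%
Z.AI GLM 4.5 | 97 | 25 | 13 | 12 | 0 | 29.4%
Mistral Large | 65 | 49 | 29 | 0 | 0 | 28.5%
Ministral 3 8B | 87 | 51 | 1 | 0 | 0 | 27.9%
Arcee AI: Trinity Large (Preview) | 85 | 54 | 0 | 0 | 0 | 27.8%
Gemma 3 12B | 67 | 52 | 14 | 0 | 0 | 26.7%
Grok 4.1 Fast | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3 (2024-12-26) | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 8B | 100 | 0 | 0 | 0 | 0 | 20.0%
Writer: Palmyra X5 | 45 | 28 | 27 | 0 | 0 | 20.0%
DeepSeek-V2 Chat | 97 | 0 | 0 | 0 | 0 | 19.4%
Ministral 3B | 88 | 0 | 0 | 0 | 0 | 17.5%
Nemotron 3 Nano | 50 | 31 | 0 | 0 | 0 | 16.1%
Qwen 3 32B | 25 | 23 | 22 | 0 | 0 | 13.9%
Nemotron 3 Super | 63 | 0 | 0 | 0 | 0 | 12.7%
Arcee AI: Trinity Mini | 63 | 0 | 0 | 0 | 0 | 12.7%
Gemini 3.1 Flash Lite (Preview) | 47 | 10 | 0 | 0 | 0 | 11.4%
GPT-4o, May 13th (temp=1) | 52 | 0 | 0 | 0 | 0 | 10.3%
Mistral Small Creative | 43 | 0 | 0 | 0 | 0 | 8.5%
Ministral 3 3B | 39 | 0 | 0 | 0 | 0 | 7.8%
Claude 3.7 Sonnet | 29 | 2 | 0 | 0 | 0 | 6.1%
Ministral 3 14B | 19 | 0 | 0 | 0 | 0 | 3.7%
Mistral Medium 3.1 | 18 | 0 | 0 | 0 | 0 | 3.6%
ByteDance Seed 1.6 Flash | 15 | 0 | 0 | 0 | 0 | 3.0%
Qwen3 235B A22B Instruct 2507 | 12 | 0 | 0 | 0 | 0 | 2.4%
Ministral 8B | 10 | 0 | 0 | 0 | 0 | 2.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%
LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%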

genre

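Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 94 | 98.9%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 98 | 80 | 95.5%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 98 | 68 | 93.2%
Mistral NeMO | 100 | 100 | 89 | 88 | 85 | 92.5%
Qwen 2.5 72B | 100 | 100 | 100 | 83 | 71 | 91.0%
Llama 3.1 70B | 100 | 100 | 100 | 73 | 56 | 85.9%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 71 | 38 | 82.0%
Gemini 2.5 Flash | 100 | 97 | 91 | 72 | 47 | 81.4%
Claude 3 Haiku | 100 | 100 | 100 | 63 | 30 | 78.7%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 81 | 0 | 76.1%
GPT-5.4 Mini | 100 | 95 | 82 | 63 | 38 | 75.7%
Qwen 3.5 122B | 100 | 100 | 87 | 54 | 23 | 72.9%
Qwen 3.5 9B | 100 | 100 | 97 | 49 | 12 | 71.6%
Gemini 2.5 Flash Lite | 100 | 100 | 88 | 34 | 34 | 71.3%
GPT-5.4 (Reasoning, Low) | 90 | 88 | 60 | 51 | 45 | 66.9%
GPT-4o, May 13th (temp=1) | 100 | 85 | 80 | 60 | 0 | 65.0%
Gemma 3 27B | 100 | 100 | 100 | 10 | 2 | 62.3%
Gemini 3.1 Pro (Preview) | 100 | 99 | 61 | 20 | 19 | 59.8%
Gemini 2.5 Pro | 89 | 75 | 63 | 61 | 0 | 57.7%
Qwen 3.5 Flash | 85 | 81 | 62 | 60 | 0 | 57.3%
GPT-5.4 Mini (Reasoning, Low) | 100 | 75 | 67 | 32 | 12 | 57.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 83 | 0 | 0 | 56.5%
Qwen 3.5 35B | 100 | 100 | 61 | 15 | 0 | 55.3%
Llama 3.1 Nemotron 70B | 83 | 71 | 63 | 41 | 0 | 51.8%
Hermes 3 405B | 100 | 100 | 35 | 13 | 0 | 49.6%
DeepSeek V3.2 | 100 | 82 | 43 | 14 | 0 | 47.9%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 20 | 12 | 0 | 46.3%
Qwen 3.5 27B | 95 | 63 | 40 | 32 | 0 | 46.1%
Inception Mercury | 100 | 100 | 25 | 0 | 0 | 45.0%
GPT-5.4 (Reasoning) | 77 | 65 | 41 | 25 | 13 | 44.2%
Hermes 3 70B | 100 | 80 | 15 | 0 | 0 | 39.0%
Z.AI GLM 4.6 | 76 | 69 | 40 | 0 | 0 | 36.8%
ByteDance Seed 2.0 Lite | 73 | 56 | 29 | 24 | 0 | 36.4%
WizardLM 2 8x22b | 83 | 48 | 37 | 2 | 2 | 34.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 64 | 0 | 0 | 0 | 32.8%
Gemini 2.5 Flash Lite (Reasoning) | 60 | 48 | 3 | 0 | 0 | 22.0%
Arcee AI: Trinity Mini | 76 | 29 | 0 | 0 | 0 | 21.0%
DeepSeek V3.1 | 84 | 19 | 0 | 0 | 0 | 20.5%
MiniMax M2.7 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Sonnet | 94 | 0 | 0 | 0 | 0 | 18.9%
GPT-5.4 | 33 | 30 | 7 | 3 | 0 | 14.6%
GPT-4o Mini (temp=1) | 71 | 0 | 0 | 0 | 0 | 14.1%
Gemini 3.1 Flash Lite (Preview) | 67 | 0 | 0 | 0 | 0 | 13.3%
Rocinante 12B | 56 | 10 | 0 | 0 | 0 | 13.2%
ByteDance Seed 2.0 Mini | 49 | 0 | 0 | 0 | 0 | 9.7%
Ministral 3 8B | 30 | 11 | 0 | 0 | 0 | 8.3%
DeepSeek V3 (2024-12-26) | 36 | 0 | 0 | 0 | 0 | 7.2%
Z.AI GLM 4.7 | 27 | 8 | 0 | 0 | 0 | 6.9%
Stealth: Hunter Alpha | 32 | 0 | 0 | 0 | 0 | 6.3%
Gemma 3 12B | 29 | 0 | 0 | 0 | 0 | 5.7%
o4 Mini High | 25 | 0 | 0 | 0 | 0 | 4.9%
Gemini 3 Pro (Preview) | 23 | 0 | 0 | 0 | 0 | 4.6%
GPT-5.1 | 15 | 6 | 0 | 0 | 0 | 4.1%
Stealth: Aurora Alpha | 19 | 0 | 0 | 0 | 0 | 3.7%
Aion 2.0 | 10 | 6 | 0 | 0 | 0 | 3.1%
Arcee AI: Trinity Large (Preview) | 13 | 0 | 0 | 0 | 0 | 2.6%
Gemini 3 Flash (Preview, Reasoning) | 11 | 0 | 0 | 0 | 0 | 2.1%
Qwen 3 32B | 10 | 0 | 0 | 0 | 0 | 2.0%
GPT-5 | 7 | 0 | 0 | 0 | 0 | 1.4%
DeepSeek-V2 Chat | 4 | 0 | 0 | 0 | 0 | 0.8%
Claude Opus 4.6 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 5 Turbo | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.6 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.20 (Beta, Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
MoonshotAI: Kimi K2.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.2 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
MiniMax M2.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
o4 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Healer Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Haiku 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.7 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Nemotron 3 Super | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.20 (Beta) | 0 | 0 | 0 | 0 | 0 | 0.0%
Inception Mercury 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.4 Nano (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen3 235B A22B Instruct 2507 | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.4 Nano (Reasoning, Low) | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Medium 3.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Nemotron 3 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5.4 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0.0%
LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%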
Model #1 #2 #3 #4 #5 Avg
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100.0%
GPT-5.4 Mini 100 100 100 100 100 100.0%
Mistral NeMO 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 64 92.8%
ByteDance Seed 2.0 Lite 100 100 100 93 71 92.7%
Gemma 3 27B 100 100 100 93 70 92.5%
Qwen 3.5 9B 100 100 100 97 63 92.0%
GPT-5.4 Mini (Reasoning, Low) 100 100 97 86 68 90.2%
GPT-4o, Aug. 6th (temp=0) 100 100 100 100 49 89.8%
Qwen 2.5 72B 100 100 100 100 44 88.9%
GPT-5.4 Mini (Reasoning) 100 100 94 82 62 87.6%
Gemini 2.5 Flash 100 100 100 75 56 86.2%
Qwen 3.5 Flash 100 92 91 78 59 84.1%
GPT-5.4 (Reasoning) 100 100 78 65 58 80.3%
Mistral Small 3.2 24B 100 100 100 100 0 80.0%
Qwen 3.5 35B 100 85 83 73 38 75.7%
Llama 3.1 70B 100 100 90 86 0 75.1%
Qwen 3.5 27B 100 100 77 59 38 74.8%
Z.AI GLM 4.6 100 94 93 84 0 74.1%
GPT-4o, May 13th (temp=0) 100 86 66 59 44 71.1%
Qwen 3.5 122B 100 90 78 61 24 70.6%
Gemini 2.5 Pro 100 84 57 56 29 65.2%
GPT-5.4 (Reasoning, Low) 100 78 45 43 39 61.2%
WizardLM 2 8x22b 98 94 91 0 0 56.5%
Gemini 2.5 Flash Lite 74 67 56 51 27 55.0%
Inception Mercury 100 100 68 0 0 53.6%
GPT-5.4 74 74 48 42 24 52.6%
Claude 3 Haiku 100 100 54 7 0 52.1%
Cohere Command R+ (Aug. 2024) 100 100 50 8 0 51.6%
Gemini 2.5 Flash (Reasoning) 100 94 38 11 3 49.3%
Hermes 3 70B 100 76 62 0 0 47.8%
DeepSeek V3.1 100 72 33 20 9 46.8%
Hermes 3 405B 100 78 51 0 0 45.7%
Gemma 3 12B 79 65 33 32 16 45.0%
Grok 4.20 (Beta) 100 91 0 0 0 38.2%
GPT-4o Mini (temp=0) 76 69 44 0 0 37.7%
Arcee AI: Trinity Large (Preview) 100 70 0 0 0 33.9%
Stealth: Hunter Alpha 85 42 26 0 0 30.8%
Aion 2.0 60 36 32 24 0 30.3%
Ministral 3 3B 66 55 26 0 0 29.4%
DeepSeek V3.2 39 34 33 23 13 28.4%
GPT-5 67 34 16 14 3 26.9%
Gemini 2.5 Flash Lite (Reasoning) 76 38 13 0 0 25.4%
Stealth: Healer Alpha 45 43 27 7 0 24.6%
Rocinante 12B 57 53 9 0 0 23.8%
Claude Sonnet 4.5 65 30 0 0 0 19.1%
GPT-5.1 36 25 17 12 0 18.0%
Arcee AI: Trinity Mini 44 44 0 0 0 17.7%
Grok 4.20 (Beta, Reasoning) 79 0 0 0 0 15.9%
Qwen 3.5 Plus (2026-02-15) 78 0 0 0 0 15.7%
Llama 3.1 Nemotron 70B 73 0 0 0 0 14.6%
Llama 3.1 8B 52 19 0 0 0 14.2%
Z.AI GLM 5 Turbo 32 30 0 0 0 12.4%
Stealth: Aurora Alpha 53 0 0 0 0 10.6%
Ministral 8B 23 14 0 0 0 7.4%
GPT-4o, May 13th (temp=1) 34 0 0 0 0 6.9%
GPT-5.2 34 0 0 0 0 6.8%
Mistral Small Creative 17 15 1 0 0 6.6%
Z.AI GLM 4.7 28 0 0 0 0 5.6%
Z.AI GLM 4.7 Flash 28 0 0 0 0 5.5%
Z.AI GLM 5 25 0 0 0 0 5.0%
Inception Mercury 2 18 0 0 0 0 3.5%
DeepSeek V3 (2024-12-26) 18 0 0 0 0 3.5%
Claude Opus 4 17 0 0 0 0 3.4%
DeepSeek-V2 Chat 4 0 0 0 0 0.8%
Qwen 3 32B 1 0 0 0 0 0.3%
Claude 3.7 Sonnet 1 0 0 0 0 0.2%
Claude Opus 4.6 (Reasoning) 0 0 0 0 0 0.0%
Claude Sonnet 4.6 (Reasoning) 0 0 0 0 0 0.0%
GPT-5 Mini 0 0 0 0 0 0.0%
Claude Opus 4.6 0 0 0 0 0 0.0%
Claude Sonnet 4.6 0 0 0 0 0 0.0%
MoonshotAI: Kimi K2.5 0 0 0 0 0 0.0%
ByteDance Seed 1.6 0 0 0 0 0 0.0%
Gemini 3 Flash (Preview, Reasoning) 0 0 0 0 0 0.0%
o4 Mini High 0 0 0 0 0 0.0%
Claude Opus 4.5 0 0 0 0 0 0.0%
Grok 4.1 Fast 0 0 0 0 0 0.0%
MiniMax M2.7 0 0 0 0 0 0.0%
Gemini 3 Pro (Preview) 0 0 0 0 0 0.0%
Claude Sonnet 4 0 0 0 0 0 0.0%
MiniMax M2.5 0 0 0 0 0 0.0%
GPT-4.1 0 0 0 0 0 0.0%
o4 Mini 0 0 0 0 0 0.0%
Grok 4 0 0 0 0 0 0.0%
ByteDance Seed 2.0 Mini 0 0 0 0 0 0.0%
Z.AI GLM 4.5 0 0 0 0 0 0.0%
Grok 4 Fast 0 0 0 0 0 0.0%
Gemini 3.1 Flash Lite (Preview) 0 0 0 0 0 0.0%
Mistral Large 3 0 0 0 0 0 0.0%
Gemini 3 Flash (Preview) 0 0 0 0 0 0.0%
Claude Haiku 4.5 0 0 0 0 0 0.0%
Nemotron 3 Super 0 0 0 0 0 0.0%
Claude 3.5 Sonnet 0 0 0 0 0 0.0%
Claude 3.5 Haiku 0 0 0 0 0 0.0%
GPT-4.1 Mini 0 0 0 0 0 0.0%
GPT-4o, Aug. 6th (temp=1) 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0.0%
Mistral Large 2 0 0 0 0 0 0.0%
Mistral Small 4 (Reasoning) 0 0 0 0 0 0.0%
DeepSeek V3 (2025-03-24) 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning) 0 0 0 0 0 0.0%
Mistral Large 0 0 0 0 0 0.0%
Qwen3 235B A22B Instruct 2507 0 0 0 0 0 0.0%
Writer: Palmyra X5 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning, Low) 0 0 0 0 0 0.0%
GPT-4o Mini (temp=1) 0 0 0 0 0 0.0%
Mistral Medium 3.1 0 0 0 0 0 0.0%
Nemotron 3 Nano 0 0 0 0 0 0.0%
Mistral Small 4 0 0 0 0 0 0.0%
GPT-5.4 Nano 0 0 0 0 0 0.0%
ByteDance Seed 1.6 Flash 0 0 0 0 0 0.0%
Ministral 3 14B 0 0 0 0 0 0.0%
GPT-4.1 Nano 0 0 0 0 0 0.0%
Ministral 3 8B 0 0 0 0 0 0.0%
Gemma 3 4B 0 0 0 0 0 0.0%
Ministral 3B 0 0 0 0 0 0.0%
LFM2 24B 0 0 0 0 0 0.0%
Model #1 #2 #3 #4 #5 Avg
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
Qwen 3.5 27B 100 100 100 100 100 100.0%
Qwen 3.5 Flash 100 100 100 100 100 100.0%
GPT-5.4 Mini (Reasoning, Low) 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 100 100.0%
Mistral NeMO 100 100 100 100 100 100.0%
Inception Mercury 100 100 100 100 97 99.4%
Mistral Small 3.2 24B 100 100 100 100 90 98.0%
GPT-5.4 Mini 100 100 100 100 86 97.2%
GPT-5.4 Mini (Reasoning) 100 100 100 94 85 96.0%
Gemini 2.5 Flash (Reasoning) 100 100 100 100 71 94.2%
Qwen 3.5 9B 100 100 100 100 71 94.2%
GPT-5.4 100 100 100 100 66 93.3%
Gemini 2.5 Flash Lite 100 100 100 100 48 89.4%
Gemini 2.5 Pro 100 100 100 100 32 86.5%
Z.AI GLM 4.6 100 100 92 85 53 86.0%
Qwen 3.5 35B 100 100 100 98 29 85.3%
Qwen 3.5 122B 100 100 92 90 42 84.9%
ByteDance Seed 2.0 Lite 100 100 96 94 33 84.7%
GPT-5.4 (Reasoning) 100 100 82 73 65 84.0%
Gemini 2.5 Flash 100 94 90 88 46 83.4%
GPT-4o Mini (temp=0) 100 100 94 84 37 82.9%
GPT-5.4 (Reasoning, Low) 100 100 98 82 31 82.1%
Aion 2.0 100 100 92 73 43 81.6%
Hermes 3 405B 100 100 100 100 3 80.7%
GPT-4o, Aug. 6th (temp=0) 100 100 100 100 0 80.0%
Qwen 2.5 72B 100 100 100 60 38 79.8%
Hermes 3 70B 100 100 100 92 0 78.4%
Gemma 3 27B 100 100 93 81 0 74.8%
WizardLM 2 8x22b 100 100 100 54 0 70.7%
DeepSeek V3.2 100 100 73 47 31 70.2%
Gemini 2.5 Flash Lite (Reasoning) 100 100 100 26 0 65.2%
DeepSeek V3.1 97 80 63 47 19 61.3%
Claude 3.5 Sonnet 100 100 69 34 0 60.6%
Arcee AI: Trinity Large (Preview) 100 100 100 0 0 60.0%
Ministral 3 3B 100 63 51 48 33 59.1%
Claude 3 Haiku 100 68 52 46 0 53.2%
Z.AI GLM 4.5 100 100 36 24 0 51.9%
MiniMax M2.7 99 88 63 0 0 50.2%
Rocinante 12B 88 79 76 0 0 48.5%
Ministral 3B 85 83 74 0 0 48.3%
Gemma 3 12B 100 100 27 11 0 47.5%
Qwen 3.5 Plus (2026-02-15) 100 85 29 19 0 46.4%
Z.AI GLM 5 Turbo 100 100 12 8 0 44.0%
Cohere Command R+ (Aug. 2024) 66 54 47 34 9 41.8%
Llama 3.1 8B 100 100 0 0 0 40.0%
Mistral Small Creative 69 63 61 0 0 38.7%
Inception Mercury 2 100 59 29 0 0 37.8%
Arcee AI: Trinity Mini 100 43 43 0 0 37.0%
GPT-5 83 70 28 0 0 36.3%
GPT-5.1 67 65 28 17 0 35.6%
GPT-4o, May 13th (temp=1) 100 45 20 0 0 33.1%
Gemma 3 4B 79 77 3 0 0 31.9%
Claude 3.7 Sonnet 88 39 29 0 0 31.3%
Llama 3.1 70B 100 30 5 0 0 26.9%
ByteDance Seed 2.0 Mini 100 25 0 0 0 25.1%
Llama 3.1 Nemotron 70B 100 19 7 0 0 25.1%
Z.AI GLM 4.7 Flash 63 59 0 0 0 24.5%
DeepSeek V3 (2024-12-26) 100 21 0 0 0 24.3%
Mistral Large 70 47 0 0 0 23.3%
Claude Sonnet 4 66 22 14 13 0 22.9%
Ministral 3 8B 58 34 21 0 0 22.8%
Ministral 8B 69 45 0 0 0 22.8%
Stealth: Healer Alpha 50 47 14 0 0 22.2%
Stealth: Aurora Alpha 81 28 0 0 0 21.8%
Z.AI GLM 4.7 76 32 0 0 0 21.7%
Stealth: Hunter Alpha 59 22 16 0 0 19.5%
MoonshotAI: Kimi K2.5 47 35 10 0 0 18.3%
DeepSeek-V2 Chat 65 23 0 0 0 17.6%
Qwen 3 32B 61 21 0 0 0 16.4%
Mistral Large 2 65 16 1 0 0 16.3%
Gemini 3 Pro (Preview) 63 13 4 0 0 16.1%
GPT-4o Mini (temp=1) 56 11 0 0 0 13.6%
Gemini 3 Flash (Preview) 65 0 0 0 0 13.0%
Mistral Medium 3.1 58 4 0 0 0 12.5%
Claude Opus 4 40 19 0 0 0 11.7%
Mistral Large 3 37 18 0 0 0 11.0%
Gemini 3.1 Flash Lite (Preview) 31 23 0 0 0 10.7%
Writer: Palmyra X5 48 0 0 0 0 9.6%
ByteDance Seed 1.6 Flash 47 0 0 0 0 9.3%
Claude 3.5 Haiku 41 0 0 0 0 8.2%
Ministral 3 14B 30 8 0 0 0 7.8%
GPT-5.4 Nano (Reasoning, Low) 34 0 0 0 0 6.9%
Gemini 3 Flash (Preview, Reasoning) 33 0 0 0 0 6.6%
Claude Sonnet 4.6 22 8 0 0 0 6.0%
o4 Mini High 22 0 0 0 0 4.4%
o4 Mini 16 0 0 0 0 3.2%
MiniMax M2.5 13 0 0 0 0 2.6%
GPT-5.4 Nano (Reasoning) 13 0 0 0 0 2.6%
GPT-4o, Aug. 6th (temp=1) 9 0 0 0 0 1.8%
Claude Sonnet 4.6 (Reasoning) 1 0 0 0 0 0.2%
Claude Opus 4.6 (Reasoning) 0 0 0 0 0 0.0%
GPT-5 Mini 0 0 0 0 0 0.0%
Claude Opus 4.6 0 0 0 0 0 0.0%
Grok 4.20 (Beta, Reasoning) 0 0 0 0 0 0.0%
Z.AI GLM 5 0 0 0 0 0 0.0%
ByteDance Seed 1.6 0 0 0 0 0 0.0%
GPT-5.2 0 0 0 0 0 0.0%
Claude Opus 4.5 0 0 0 0 0 0.0%
Grok 4.1 Fast 0 0 0 0 0 0.0%
GPT-4.1 0 0 0 0 0 0.0%
Grok 4 0 0 0 0 0 0.0%
Claude Sonnet 4.5 0 0 0 0 0 0.0%
Grok 4 Fast 0 0 0 0 0 0.0%
Claude Haiku 4.5 0 0 0 0 0 0.0%
Nemotron 3 Super 0 0 0 0 0 0.0%
Grok 4.20 (Beta) 0 0 0 0 0 0.0%
GPT-4.1 Mini 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0.0%
Mistral Small 4 (Reasoning) 0 0 0 0 0 0.0%
DeepSeek V3 (2025-03-24) 0 0 0 0 0 0.0%
Qwen3 235B A22B Instruct 2507 0 0 0 0 0 0.0%
Nemotron 3 Nano 0 0 0 0 0 0.0%
Mistral Small 4 0 0 0 0 0 0.0%
GPT-5.4 Nano 0 0 0 0 0 0.0%
GPT-4.1 Nano 0 0 0 0 0 0.0%
LFM2 24B 0 0 0 0 0 0.0%
Model #1 #2 #3 #4 #5 Avg
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
GPT-5.4 Mini (Reasoning) 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 100 100.0%
Qwen 3.5 35B 100 100 100 100 100 99.9%
GPT-5.4 Mini (Reasoning, Low) 100 100 100 100 97 99.3%
Gemini 3.1 Pro (Preview) 100 100 100 100 93 98.6%
Qwen 3.5 Flash 100 100 100 100 90 98.0%
Qwen 3.5 9B 100 100 100 100 89 97.8%
Gemini 2.5 Flash 100 100 100 100 86 97.3%
Gemini 2.5 Flash Lite 100 100 100 100 81 96.2%
Qwen 3.5 122B 100 100 100 100 81 96.1%
Qwen 3.5 27B 100 100 100 100 77 95.5%
Qwen 2.5 72B 100 100 100 100 66 93.1%
Hermes 3 405B 100 100 95 85 81 92.2%
Gemini 2.5 Pro 100 100 100 80 78 91.6%
GPT-5.4 Mini 100 100 100 100 57 91.4%
GPT-5.4 (Reasoning) 100 100 86 84 83 90.7%
Z.AI GLM 4.6 100 100 100 100 51 90.3%
GPT-5.4 (Reasoning, Low) 100 100 100 100 29 85.7%
GPT-5.4 100 100 100 54 42 79.2%
Cohere Command R+ (Aug. 2024) 100 100 95 55 41 78.2%
DeepSeek V3.2 89 84 82 70 58 76.6%
Gemini 2.5 Flash (Reasoning) 100 100 86 83 0 73.7%
Mistral NeMO 100 100 93 68 0 72.2%
DeepSeek V3.1 100 100 80 49 30 71.7%
Gemma 3 12B 100 93 83 74 5 71.2%
Gemma 3 27B 100 100 77 74 0 70.1%
ByteDance Seed 2.0 Lite 100 100 94 21 13 65.6%
Inception Mercury 100 100 100 26 0 65.2%
Stealth: Healer Alpha 98 95 59 40 33 65.1%
Claude 3 Haiku 100 98 77 24 0 59.8%
Arcee AI: Trinity Mini 100 100 60 21 0 56.3%
Z.AI GLM 4.7 Flash 99 83 82 14 0 55.7%
GPT-4o Mini (temp=0) 100 100 48 22 0 54.0%
Z.AI GLM 4.7 100 100 29 19 0 49.7%
Grok 4.20 (Beta, Reasoning) 100 100 43 0 0 48.5%
Llama 3.1 70B 100 100 40 0 0 47.9%
Gemini 2.5 Flash Lite (Reasoning) 91 58 55 35 0 47.8%
Mistral Large 3 100 62 40 31 0 46.6%
Z.AI GLM 5 Turbo 100 91 27 0 0 43.6%
Stealth: Aurora Alpha 100 70 42 0 0 42.4%
Hermes 3 70B 100 61 44 0 0 41.1%
Ministral 3 8B 100 42 27 20 0 37.8%
Gemini 3 Pro (Preview) 88 74 26 0 0 37.6%
Grok 4.20 (Beta) 100 87 0 0 0 37.4%
Mistral Large 85 56 11 7 0 31.9%
Stealth: Hunter Alpha 71 55 20 8 1 31.0%
Mistral Large 2 100 50 5 0 0 30.9%
Mistral Small 3.2 24B 78 50 25 0 0 30.7%
Aion 2.0 71 38 24 20 0 30.6%
Qwen 3.5 Plus (2026-02-15) 100 33 15 0 0 29.7%
Rocinante 12B 100 40 0 0 0 27.9%
Ministral 3 3B 100 32 0 0 0 26.3%
GPT-4o, Aug. 6th (temp=0) 100 24 7 0 0 26.1%
Claude Haiku 4.5 100 0 0 0 0 20.0%
Claude 3.5 Sonnet 100 0 0 0 0 20.0%
Claude 3.5 Haiku 100 0 0 0 0 20.0%
GPT-5.1 58 20 13 0 0 18.2%
Arcee AI: Trinity Large (Preview) 56 27 0 0 0 16.7%
GPT-5 62 16 1 0 0 15.8%
GPT-4o, May 13th (temp=1) 72 0 0 0 0 14.5%
DeepSeek V3 (2025-03-24) 60 0 0 0 0 11.9%
WizardLM 2 8x22b 37 15 0 0 0 10.4%
Claude Opus 4.5 51 0 0 0 0 10.3%
Mistral Small 4 (Reasoning) 47 0 0 0 0 9.4%
MiniMax M2.5 45 0 0 0 0 9.0%
Gemini 3 Flash (Preview, Reasoning) 31 13 0 0 0 8.8%
Ministral 3 14B 35 5 0 0 0 8.0%
LFM2 24B 35 0 0 0 0 7.0%
Mistral Small Creative 34 0 0 0 0 6.8%
o4 Mini 29 5 0 0 0 6.6%
Claude Opus 4 32 0 0 0 0 6.4%
Gemini 3 Flash (Preview) 26 0 0 0 0 5.2%
Z.AI GLM 4.5 25 0 0 0 0 5.0%
GPT-5.2 21 2 0 0 0 4.7%
MiniMax M2.7 21 0 0 0 0 4.2%
Llama 3.1 8B 19 0 0 0 0 3.7%
Gemini 3.1 Flash Lite (Preview) 18 0 0 0 0 3.5%
Claude Sonnet 4.5 12 0 0 0 0 2.4%
Z.AI GLM 5 10 0 0 0 0 1.9%
Mistral Medium 3.1 8 0 0 0 0 1.5%
o4 Mini High 7 0 0 0 0 1.4%
Qwen 3 32B 3 0 0 0 0 0.5%
GPT-5.4 Nano (Reasoning) 1 0 0 0 0 0.3%
MoonshotAI: Kimi K2.5 1 0 0 0 0 0.2%
Claude Opus 4.6 (Reasoning) 0 0 0 0 0 0.0%
Claude Sonnet 4.6 (Reasoning) 0 0 0 0 0 0.0%
GPT-5 Mini 0 0 0 0 0 0.0%
Claude Opus 4.6 0 0 0 0 0 0.0%
Claude Sonnet 4.6 0 0 0 0 0 0.0%
ByteDance Seed 1.6 0 0 0 0 0 0.0%
Grok 4.1 Fast 0 0 0 0 0 0.0%
Claude Sonnet 4 0 0 0 0 0 0.0%
GPT-4.1 0 0 0 0 0 0.0%
Grok 4 0 0 0 0 0 0.0%
ByteDance Seed 2.0 Mini 0 0 0 0 0 0.0%
Grok 4 Fast 0 0 0 0 0 0.0%
DeepSeek-V2 Chat 0 0 0 0 0 0.0%
Nemotron 3 Super 0 0 0 0 0 0.0%
Inception Mercury 2 0 0 0 0 0 0.0%
DeepSeek V3 (2024-12-26) 0 0 0 0 0 0.0%
Claude 3.7 Sonnet 0 0 0 0 0 0.0%
GPT-4.1 Mini 0 0 0 0 0 0.0%
GPT-4o, Aug. 6th (temp=1) 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0.0%
Qwen3 235B A22B Instruct 2507 0 0 0 0 0 0.0%
Writer: Palmyra X5 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning, Low) 0 0 0 0 0 0.0%
GPT-4o Mini (temp=1) 0 0 0 0 0 0.0%
Nemotron 3 Nano 0 0 0 0 0 0.0%
Mistral Small 4 0 0 0 0 0 0.0%
Llama 3.1 Nemotron 70B 0 0 0 0 0 0.0%
GPT-5.4 Nano 0 0 0 0 0 0.0%
ByteDance Seed 1.6 Flash 0 0 0 0 0 0.0%
GPT-4.1 Nano 0 0 0 0 0 0.0%
Gemma 3 4B 0 0 0 0 0 0.0%
Ministral 8B 0 0 0 0 0 0.0%
Ministral 3B 0 0 0 0 0 0.0%
Model #1 #2 #3 #4 #5 Avg
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
GPT-5.4 Mini 100 100 100 100 100 100.0%
Qwen 3.5 122B 100 100 100 100 99 99.8%
Qwen 3.5 27B 100 100 100 100 93 98.5%
Gemma 3 27B 100 100 100 100 92 98.5%
Mistral NeMO 100 100 100 100 91 98.2%
Qwen 3.5 Flash 100 100 100 98 78 95.2%
GPT-5.4 Mini (Reasoning) 100 100 100 100 63 92.7%
GPT-5.4 (Reasoning) 100 100 100 88 74 92.5%
Qwen 3.5 35B 100 100 97 85 79 92.3%
Inception Mercury 100 100 100 100 54 90.7%
Qwen 3.5 9B 100 100 100 92 54 89.1%
Qwen 2.5 72B 100 100 100 86 58 88.7%
Gemini 2.5 Flash 100 100 94 86 61 88.3%
Gemini 3.1 Pro (Preview) 100 100 100 68 68 87.2%
GPT-5.4 Mini (Reasoning, Low) 100 100 83 83 65 86.3%
GPT-4o, Aug. 6th (temp=0) 100 100 96 74 51 84.1%
GPT-4o, May 13th (temp=0) 100 100 100 74 47 84.1%
Mistral Small 3.2 24B 100 100 100 100 13 82.6%
GPT-5.4 (Reasoning, Low) 100 100 88 82 43 82.5%
GPT-5.4 100 100 83 69 60 82.5%
Gemini 2.5 Flash Lite 100 100 82 58 49 77.8%
Gemini 2.5 Flash (Reasoning) 100 89 86 73 40 77.5%
Hermes 3 405B 100 100 100 44 30 74.8%
DeepSeek V3.1 100 100 100 63 0 72.7%
Cohere Command R+ (Aug. 2024) 100 100 75 63 0 67.7%
Llama 3.1 70B 100 100 100 37 0 67.4%
Gemini 2.5 Pro 100 91 78 40 23 66.3%
Gemini 2.5 Flash Lite (Reasoning) 100 100 58 27 21 61.3%
Ministral 3 8B 100 87 86 3 0 55.0%
GPT-4o Mini (temp=0) 87 81 61 24 0 50.8%
DeepSeek V3.2 91 90 61 10 0 50.3%
Arcee AI: Trinity Large (Preview) 100 96 51 0 0 49.3%
Aion 2.0 85 76 48 22 7 47.6%
Mistral Large 2 94 74 39 27 0 46.8%
Z.AI GLM 4.6 100 96 29 3 0 45.7%
Qwen 3.5 Plus (2026-02-15) 100 55 54 11 0 44.3%
Gemma 3 4B 74 60 53 20 0 41.4%
Claude 3 Haiku 100 56 33 9 0 39.6%
Arcee AI: Trinity Mini 100 49 35 5 0 37.8%
Z.AI GLM 4.7 76 61 26 14 10 37.4%
Stealth: Aurora Alpha 100 73 13 0 0 37.3%
Llama 3.1 Nemotron 70B 100 75 7 3 0 37.0%
ByteDance Seed 2.0 Lite 100 73 7 0 0 36.0%
Stealth: Hunter Alpha 76 58 43 0 0 35.4%
Rocinante 12B 100 65 0 0 0 32.9%
Mistral Large 82 70 0 0 0 30.2%
Z.AI GLM 5 Turbo 100 43 5 0 0 29.7%
Ministral 8B 60 45 30 0 0 27.0%
Gemma 3 12B 100 31 0 0 0 26.2%
GPT-5 83 45 0 0 0 25.8%
GPT-5.1 65 58 0 0 0 24.6%
Claude 3.5 Sonnet 72 38 9 0 0 23.9%
Hermes 3 70B 41 41 37 0 0 23.7%
Claude Opus 4 42 38 29 0 0 21.8%
DeepSeek-V2 Chat 52 37 17 0 0 21.2%
DeepSeek V3 (2024-12-26) 37 36 29 3 0 20.9%
Z.AI GLM 5 100 0 0 0 0 20.0%
DeepSeek V3 (2025-03-24) 100 0 0 0 0 20.0%
WizardLM 2 8x22b 100 0 0 0 0 20.0%
Ministral 3 3B 100 0 0 0 0 20.0%
Gemini 3.1 Flash Lite (Preview) 65 34 0 0 0 19.7%
Claude Sonnet 4.5 86 0 0 0 0 17.3%
Ministral 3B 80 0 0 0 0 16.0%
Inception Mercury 2 78 0 0 0 0 15.6%
Mistral Small Creative 76 0 0 0 0 15.1%
GPT-5.2 61 12 0 0 0 14.7%
Stealth: Healer Alpha 48 21 0 0 0 13.8%
Nemotron 3 Nano 66 0 0 0 0 13.1%
Grok 4.20 (Beta) 32 20 9 0 0 12.2%
ByteDance Seed 2.0 Mini 51 10 0 0 0 12.1%
Z.AI GLM 4.7 Flash 39 19 0 0 0 11.6%
MiniMax M2.7 40 18 0 0 0 11.6%
Mistral Large 3 26 24 0 0 0 9.9%
Qwen 3 32B 29 19 0 0 0 9.4%
GPT-4o, May 13th (temp=1) 42 0 0 0 0 8.4%
Z.AI GLM 4.5 42 0 0 0 0 8.3%
Claude Sonnet 4.6 40 0 0 0 0 8.0%
Claude Opus 4.5 27 0 0 0 0 5.3%
Gemini 3 Pro (Preview) 11 8 3 0 0 4.6%
Mistral Medium 3.1 20 0 0 0 0 4.1%
Claude Sonnet 4.6 (Reasoning) 20 0 0 0 0 4.0%
Writer: Palmyra X5 19 0 0 0 0 3.7%
Claude 3.7 Sonnet 15 0 0 0 0 3.0%
Grok 4.20 (Beta, Reasoning) 12 0 0 0 0 2.3%
Mistral Small 4 (Reasoning) 11 0 0 0 0 2.2%
GPT-4o Mini (temp=1) 10 0 0 0 0 2.0%
Mistral Small 4 7 0 0 0 0 1.4%
Claude Opus 4.6 (Reasoning) 0 0 0 0 0 0.0%
GPT-5 Mini 0 0 0 0 0 0.0%
Claude Opus 4.6 0 0 0 0 0 0.0%
MoonshotAI: Kimi K2.5 0 0 0 0 0 0.0%
ByteDance Seed 1.6 0 0 0 0 0 0.0%
Gemini 3 Flash (Preview, Reasoning) 0 0 0 0 0 0.0%
o4 Mini High 0 0 0 0 0 0.0%
Grok 4.1 Fast 0 0 0 0 0 0.0%
Claude Sonnet 4 0 0 0 0 0 0.0%
MiniMax M2.5 0 0 0 0 0 0.0%
GPT-4.1 0 0 0 0 0 0.0%
o4 Mini 0 0 0 0 0 0.0%
Grok 4 0 0 0 0 0 0.0%
Grok 4 Fast 0 0 0 0 0 0.0%
Gemini 3 Flash (Preview) 0 0 0 0 0 0.0%
Claude Haiku 4.5 0 0 0 0 0 0.0%
Nemotron 3 Super 0 0 0 0 0 0.0%
Claude 3.5 Haiku 0 0 0 0 0 0.0%
GPT-4.1 Mini 0 0 0 0 0 0.0%
GPT-4o, Aug. 6th (temp=1) 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning) 0 0 0 0 0 0.0%
Qwen3 235B A22B Instruct 2507 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning, Low) 0 0 0 0 0 0.0%
GPT-5.4 Nano 0 0 0 0 0 0.0%
ByteDance Seed 1.6 Flash 0 0 0 0 0 0.0%
Ministral 3 14B 0 0 0 0 0 0.0%
GPT-4.1 Nano 0 0 0 0 0 0.0%
Llama 3.1 8B 0 0 0 0 0 0.0%
LFM2 24B 0 0 0 0 0 0.0%
Model #1 #2 #3 #4 #5 Avg
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
Inception Mercury 100 100 100 100 100 100.0%
WizardLM 2 8x22b 100 100 100 100 100 100.0%
Mistral NeMO 100 100 100 100 100 100.0%
Qwen 2.5 72B 100 100 100 100 76 95.3%
GPT-5.4 Mini (Reasoning) 100 100 100 100 58 91.6%
GPT-5.4 Mini 100 100 100 85 63 89.6%
Qwen 3.5 35B 100 100 100 94 54 89.4%
Mistral Small 3.2 24B 100 100 100 100 42 88.3%
Qwen 3.5 27B 100 100 100 78 53 86.1%
GPT-5.4 Mini (Reasoning, Low) 100 100 85 77 67 85.9%
Qwen 3.5 9B 100 100 85 74 48 81.2%
GPT-4o, May 13th (temp=0) 100 100 100 100 4 80.8%
GPT-5.4 100 89 88 69 57 80.7%
Qwen 3.5 122B 100 100 93 55 53 80.2%
Gemini 3.1 Pro (Preview) 100 100 100 100 0 80.0%
Hermes 3 405B 100 100 100 100 0 80.0%
Gemini 2.5 Pro 100 100 72 65 55 78.3%
GPT-5.4 (Reasoning, Low) 100 97 88 73 30 77.7%
Gemini 2.5 Flash 100 100 100 86 0 77.3%
Gemini 2.5 Flash Lite 100 100 90 64 27 76.3%
GPT-5.4 (Reasoning) 100 100 100 51 18 73.8%
Gemma 3 27B 100 94 68 51 37 69.9%
Qwen 3.5 Flash 100 100 88 32 24 68.7%
Hermes 3 70B 100 100 95 41 0 67.2%
Z.AI GLM 4.6 100 82 55 54 43 67.0%
Gemini 2.5 Flash (Reasoning) 100 85 82 36 18 64.2%
Z.AI GLM 5 Turbo 100 100 100 0 0 60.0%
Gemini 2.5 Flash Lite (Reasoning) 100 53 50 41 0 48.8%
DeepSeek V3.1 75 71 48 19 12 45.2%
GPT-4o, Aug. 6th (temp=0) 97 63 42 22 0 44.9%
ByteDance Seed 2.0 Lite 100 78 26 13 0 43.4%
Claude 3 Haiku 100 94 17 2 0 42.8%
Cohere Command R+ (Aug. 2024) 100 100 7 0 0 41.4%
Gemma 3 12B 100 24 19 0 0 28.5%
Qwen 3.5 Plus (2026-02-15) 61 60 0 0 0 24.3%
Z.AI GLM 4.7 Flash 44 41 21 11 0 23.3%
Z.AI GLM 4.7 53 52 0 0 0 20.9%
DeepSeek V3.2 68 28 9 0 0 20.8%
GPT-5 56 47 0 0 0 20.5%
Grok 4 100 0 0 0 0 20.0%
Llama 3.1 70B 100 0 0 0 0 20.0%
Arcee AI: Trinity Mini 91 0 0 0 0 18.2%
Rocinante 12B 68 16 0 0 0 16.7%
Ministral 3B 80 2 0 0 0 16.5%
Claude Sonnet 4 67 0 0 0 0 13.3%
DeepSeek-V2 Chat 63 0 0 0 0 12.6%
Ministral 3 3B 42 19 0 0 0 12.1%
Stealth: Healer Alpha 42 0 0 0 0 8.3%
GPT-4o Mini (temp=0) 24 18 0 0 0 8.3%
Grok 4.20 (Beta) 37 0 0 0 0 7.4%
Arcee AI: Trinity Large (Preview) 36 0 0 0 0 7.2%
Aion 2.0 19 11 2 0 0 6.3%
Stealth: Hunter Alpha 19 12 0 0 0 6.3%
Llama 3.1 Nemotron 70B 13 13 0 0 0 5.2%
Mistral Large 2 16 0 0 0 0 3.3%
GPT-4o, May 13th (temp=1) 14 0 0 0 0 2.9%
MiniMax M2.7 11 0 0 0 0 2.1%
GPT-5.1 8 0 0 0 0 1.7%
Mistral Large 2 0 0 0 0 0.4%
Claude Opus 4.6 (Reasoning) 0 0 0 0 0 0.0%
Claude Sonnet 4.6 (Reasoning) 0 0 0 0 0 0.0%
GPT-5 Mini 0 0 0 0 0 0.0%
Claude Opus 4.6 0 0 0 0 0 0.0%
Grok 4.20 (Beta, Reasoning) 0 0 0 0 0 0.0%
Z.AI GLM 5 0 0 0 0 0 0.0%
Claude Sonnet 4.6 0 0 0 0 0 0.0%
MoonshotAI: Kimi K2.5 0 0 0 0 0 0.0%
ByteDance Seed 1.6 0 0 0 0 0 0.0%
Gemini 3 Flash (Preview, Reasoning) 0 0 0 0 0 0.0%
o4 Mini High 0 0 0 0 0 0.0%
GPT-5.2 0 0 0 0 0 0.0%
Claude Opus 4.5 0 0 0 0 0 0.0%
Grok 4.1 Fast 0 0 0 0 0 0.0%
Gemini 3 Pro (Preview) 0 0 0 0 0 0.0%
MiniMax M2.5 0 0 0 0 0 0.0%
GPT-4.1 0 0 0 0 0 0.0%
o4 Mini 0 0 0 0 0 0.0%
Claude Sonnet 4.5 0 0 0 0 0 0.0%
Claude Opus 4 0 0 0 0 0 0.0%
ByteDance Seed 2.0 Mini 0 0 0 0 0 0.0%
Z.AI GLM 4.5 0 0 0 0 0 0.0%
Grok 4 Fast 0 0 0 0 0 0.0%
Gemini 3.1 Flash Lite (Preview) 0 0 0 0 0 0.0%
Mistral Large 3 0 0 0 0 0 0.0%
Gemini 3 Flash (Preview) 0 0 0 0 0 0.0%
Claude Haiku 4.5 0 0 0 0 0 0.0%
Nemotron 3 Super 0 0 0 0 0 0.0%
Claude 3.5 Sonnet 0 0 0 0 0 0.0%
Inception Mercury 2 0 0 0 0 0 0.0%
Stealth: Aurora Alpha 0 0 0 0 0 0.0%
Claude 3.5 Haiku 0 0 0 0 0 0.0%
DeepSeek V3 (2024-12-26) 0 0 0 0 0 0.0%
Claude 3.7 Sonnet 0 0 0 0 0 0.0%
GPT-4.1 Mini 0 0 0 0 0 0.0%
GPT-4o, Aug. 6th (temp=1) 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0.0%
Mistral Small 4 (Reasoning) 0 0 0 0 0 0.0%
Qwen 3 32B 0 0 0 0 0 0.0%
DeepSeek V3 (2025-03-24) 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning) 0 0 0 0 0 0.0%
Qwen3 235B A22B Instruct 2507 0 0 0 0 0 0.0%
Writer: Palmyra X5 0 0 0 0 0 0.0%
GPT-5.4 Nano (Reasoning, Low) 0 0 0 0 0 0.0%
GPT-4o Mini (temp=1) 0 0 0 0 0 0.0%
Mistral Medium 3.1 0 0 0 0 0 0.0%
Nemotron 3 Nano 0 0 0 0 0 0.0%
Mistral Small 4 0 0 0 0 0 0.0%
GPT-5.4 Nano 0 0 0 0 0 0.0%
ByteDance Seed 1.6 Flash 0 0 0 0 0 0.0%
Mistral Small Creative 0 0 0 0 0 0.0%
Ministral 3 14B 0 0 0 0 0 0.0%
GPT-4.1 Nano 0 0 0 0 0 0.0%
Ministral 3 8B 0 0 0 0 0 0.0%
Gemma 3 4B 0 0 0 0 0 0.0%
Ministral 8B 0 0 0 0 0 0.0%
Llama 3.1 8B 0 0 0 0 0 0.0%
LFM2 24B 0 0 0 0 0 0.0%

Novelcrafter Default Prompt

Model #1 #2 #3 #4 #5 Avg
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
Qwen 3.5 122B 100 100 100 100 100 100.0%
Grok 4.20 (Beta, Reasoning) 100 100 100 100 100 100.0%
Qwen 3.5 27B 100 100 100 100 100 100.0%
GPT-5.4 Mini (Reasoning) 100 100 100 100 100 100.0%
Qwen 3.5 35B 100 100 100 100 100 100.0%
Qwen 3.5 9B 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 100 100.0%
GPT-5.4 Mini 100 100 100 100 100 100.0%
Inception Mercury 100 100 100 100 100 100.0%
Qwen 2.5 72B 100 100 100 100 99 99.8%
Mistral NeMO 100 100 100 100 94 98.9%
Qwen 3.5 Flash 100 100 100 100 94 98.7%
Cohere Command R+ (Aug. 2024) 100 100 100 98 91 97.7%
GPT-5.4 100 100 100 93 93 97.2%
Mistral Small 3.2 24B 100 100 100 100 73 94.6%
GPT-5.4 (Reasoning) 100 100 100 94 63 91.5%
GPT-5.4 Mini (Reasoning, Low) 100 100 100 93 62 90.9%
Claude 3 Haiku 100 100 100 79 60 87.9%
GPT-5.4 (Reasoning, Low) 100 100 100 82 56 87.7%
Inception Mercury 2 100 100 100 93 41 86.8%
Hermes 3 405B 100 100 100 93 35 85.6%
GPT-5.1 100 100 100 76 33 81.9%
Z.AI GLM 4.6 100 100 98 56 48 80.6%
GPT-4o, May 13th (temp=1) 100 100 100 100 0 80.0%
GPT-4o, Aug. 6th (temp=0) 100 100 100 100 0 80.0%
GPT-4o Mini (temp=0) 100 95 91 60 52 79.4%
ByteDance Seed 2.0 Lite 100 100 100 56 24 76.0%
Gemini 2.5 Pro 100 100 77 65 38 76.0%
Z.AI GLM 4.7 96 92 85 69 27 73.6%
Nemotron 3 Nano 100 100 90 70 0 72.0%
Qwen 3.5 Plus (2026-02-15) 100 97 75 48 23 68.5%
Gemini 2.5 Flash Lite 100 100 67 56 0 64.6%
Gemini 2.5 Flash 100 100 79 25 0 60.7%
Llama 3.1 70B 100 100 100 0 0 60.0%
DeepSeek V3.2 100 88 57 49 3 59.5%
Stealth: Aurora Alpha 93 87 76 38 0 58.8%
Gemini 2.5 Flash (Reasoning) 100 85 53 45 7 58.1%
Rocinante 12B 100 100 72 0 0 54.4%
Llama 3.1 8B 100 100 66 0 0 53.1%
GPT-5 100 100 45 20 0 53.1%
Hermes 3 70B 100 100 65 0 0 52.9%
Arcee AI: Trinity Mini 89 87 87 0 0 52.5%
DeepSeek-V2 Chat 100 87 56 0 0 48.6%
Gemini 2.5 Flash Lite (Reasoning) 100 100 18 7 0 44.9%
o4 Mini High 99 58 46 21 0 44.7%
ByteDance Seed 2.0 Mini 100 36 33 29 0 39.5%
GPT-5.2 68 64 38 24 0 39.0%
Llama 3.1 Nemotron 70B 100 54 13 0 0 33.3%
DeepSeek V3.1 79 69 14 0 0 32.5%
Nemotron 3 Super 100 41 16 0 0 31.3%
Gemini 3.1 Flash Lite (Preview) 79 68 0 0 0 29.4%
GPT-4o Mini (temp=1) 76 40 17 13 0 29.3%
Gemma 3 27B 76 48 14 0 0 27.5%
DeepSeek V3 (2024-12-26) 100 33 0 0 0 26.6%
o4 Mini 69 57 0 0 0 25.3%
Z.AI GLM 4.7 Flash 62 32 24 0 0 23.6%
Gemini 3 Flash (Preview) 100 0 0 0 0 20.0%
Grok 4.20 (Beta) 100 0 0 0 0 20.0%
Ministral 3 8B 100 0 0 0 0 20.0%
Stealth: Healer Alpha 48 43 0 0 0 18.2%
WizardLM 2 8x22b 85 0 0 0 0 17.1%
Aion 2.0 40 22 0 0 0 12.3%
Gemma 3 12B 25 19 0 0 0 8.7%
Claude Haiku 4.5 43 0 0 0 0 8.6%
Gemma 3 4B 43 0 0 0 0 8.6%
Arcee AI: Trinity Large (Preview) 41 0 0 0 0 8.2%
GPT-5.4 Nano (Reasoning) 37 0 0 0 0 7.3%
Z.AI GLM 5 Turbo 30 0 0 0 0 6.0%
LFM2 24B 28 0 0 0 0 5.5%
Claude Sonnet 4 24 0 0 0 0 4.8%
Stealth: Hunter Alpha 12 5 0 0 0 3.5%
GPT-4o, Aug. 6th (temp=1) 17 0 0 0 0 3.4%
Ministral 3B 17 0 0 0 0 3.4%
GPT-5.4 Nano 7 0 0 0 0 1.4%
Qwen 3 32B 6 0 0 0 0 1.1%
Z.AI GLM 4.5 5 0 0 0 0 1.0%
GPT-5.4 Nano (Reasoning, Low) 2 0 0 0 0 0.3%
Claude Opus 4.6 (Reasoning) 0 0 0 0 0 0.0%
Claude Sonnet 4.6 (Reasoning) 0 0 0 0 0 0.0%
GPT-5 Mini 0 0 0 0 0 0.0%
Claude Opus 4.6 0 0 0 0 0 0.0%
Z.AI GLM 5 0 0 0 0 0 0.0%
Claude Sonnet 4.6 0 0 0 0 0 0.0%
MoonshotAI: Kimi K2.5 0 0 0 0 0 0.0%
ByteDance Seed 1.6 0 0 0 0 0 0.0%
Gemini 3 Flash (Preview, Reasoning) 0 0 0 0 0 0.0%
Claude Opus 4.5 0 0 0 0 0 0.0%
Grok 4.1 Fast 0 0 0 0 0 0.0%
MiniMax M2.7 0 0 0 0 0 0.0%
Gemini 3 Pro (Preview) 0 0 0 0 0 0.0%
MiniMax M2.5 0 0 0 0 0 0.0%
GPT-4.1 0 0 0 0 0 0.0%
Grok 4 0 0 0 0 0 0.0%
Claude Sonnet 4.5 0 0 0 0 0 0.0%
Claude Opus 4 0 0 0 0 0 0.0%
Grok 4 Fast 0 0 0 0 0 0.0%
Mistral Large 3 0 0 0 0 0 0.0%
Claude 3.5 Sonnet 0 0 0 0 0 0.0%
Claude 3.5 Haiku 0 0 0 0 0 0.0%
Claude 3.7 Sonnet 0 0 0 0 0 0.0%
GPT-4.1 Mini 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0.0%
Mistral Large 2 0 0 0 0 0 0.0%
Mistral Small 4 (Reasoning) 0 0 0 0 0 0.0%
DeepSeek V3 (2025-03-24) 0 0 0 0 0 0.0%
Mistral Large 0 0 0 0 0 0.0%
Qwen3 235B A22B Instruct 2507 0 0 0 0 0 0.0%
Writer: Palmyra X5 0 0 0 0 0 0.0%
Mistral Medium 3.1 0 0 0 0 0 0.0%
Mistral Small 4 0 0 0 0 0 0.0%
ByteDance Seed 1.6 Flash 0 0 0 0 0 0.0%
Mistral Small Creative 0 0 0 0 0 0.0%
Ministral 3 14B 0 0 0 0 0 0.0%
GPT-4.1 Nano 0 0 0 0 0 0.0%
Ministral 3 3B 0 0 0 0 0 0.0%
Ministral 8B 0 0 0 0 0 0.0%
Model #1 #2 #3 #4 #5 Avg
GPT-5 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
Grok 4.20 (Beta, Reasoning) 100 100 100 100 100 100.0%
GPT-5.4 (Reasoning, Low) 100 100 100 100 100 100.0%
Qwen 3.5 27B 100 100 100 100 100 100.0%
GPT-5.4 Mini (Reasoning) 100 100 100 100 100 100.0%
Qwen 3.5 35B 100 100 100 100 100 100.0%
Gemini 2.5 Flash (Reasoning) 100 100 100 100 100 100.0%
Qwen 3.5 Flash 100 100 100 100 100 100.0%
Qwen 3.5 9B 100 100 100 100 100 100.0%
GPT-5.4 Mini (Reasoning, Low) 100 100 100 100 100 100.0%
GPT-5.4 Mini 100 100 100 100 100 100.0%
Gemini 2.5 Flash Lite 100 100 100 100 100 100.0%
Qwen 2.5 72B 100 100 100 100 100 100.0%
Mistral NeMO 100 100 100 100 100 100.0%
Gemini 3.1 Pro (Preview) 100 100 100 100 100 99.9%
GPT-4o, May 13th (temp=0) 100 100 100 100 97 99.4%
Qwen 3.5 122B 100 100 100 100 96 99.1%
ByteDance Seed 2.0 Lite 100 100 100 100 84 96.8%
GPT-5.4 100 100 100 100 77 95.5%
Z.AI GLM 4.6 100 100 100 96 77 94.7%
Gemma 3 27B 100 100 100 100 71 94.3%
GPT-4o Mini (temp=0) 100 100 100 88 82 94.0%
GPT-5.4 (Reasoning) 100 100 100 100 59 91.8%
Gemini 2.5 Pro 100 100 96 86 43 85.0%
GPT-5.1 100 100 100 61 61 84.4%
Arcee AI: Trinity Large (Preview) 100 100 94 94 25 82.4%
Grok 4.20 (Beta) 100 100 100 62 46 81.7%
Mistral Small 3.2 24B 100 100 100 100 0 80.0%
Llama 3.1 70B 100 100 100 85 0 76.9%
Gemini 2.5 Flash 100 100 79 53 43 74.8%
GPT-5.2 100 82 80 63 48 74.5%
Hermes 3 405B 100 92 81 80 0 70.5%
Aion 2.0 100 93 86 48 9 67.1%
Qwen 3.5 Plus (2026-02-15) 100 91 74 41 0 61.2%
o4 Mini High 80 77 72 72 0 60.2%
WizardLM 2 8x22b 100 70 69 62 0 60.1%
Inception Mercury 100 100 100 0 0 60.0%
Stealth: Aurora Alpha 100 100 88 0 0 57.7%
MoonshotAI: Kimi K2.5 100 99 59 24 0 56.5%
GPT-4o, Aug. 6th (temp=0) 100 100 69 5 0 54.7%
Inception Mercury 2 100 100 50 13 0 52.6%
Rocinante 12B 100 100 55 0 0 51.0%
Z.AI GLM 4.7 Flash 100 89 61 0 0 50.0%
Gemini 2.5 Flash Lite (Reasoning) 82 70 42 41 14 50.0%
Z.AI GLM 4.7 93 60 56 37 0 49.2%
GPT-5.4 Nano (Reasoning, Low) 89 70 38 29 0 45.3%
DeepSeek V3.1 98 86 42 0 0 45.1%
Hermes 3 70B 98 91 37 0 0 45.1%
Nemotron 3 Nano 100 83 37 0 0 44.1%
GPT-5.4 Nano 76 59 41 31 9 43.2%
Claude Sonnet 4.5 100 79 34 0 0 42.7%
Gemini 3 Pro (Preview) 81 57 57 18 0 42.5%
Gemma 3 12B 66 47 43 35 21 42.4%
DeepSeek V3.2 93 55 32 31 0 42.2%
GPT-4o, May 13th (temp=1) 78 74 32 25 0 41.9%
Claude 3.5 Haiku 100 100 0 0 0 40.0%
Z.AI GLM 5 Turbo 100 98 0 0 0 39.6%
Gemini 3 Flash (Preview, Reasoning) 72 60 59 0 0 38.1%
LFM2 24B 100 71 7 0 0 35.5%
ByteDance Seed 2.0 Mini 100 27 24 11 7 33.8%
Arcee AI: Trinity Mini 100 61 0 0 0 32.2%
Stealth: Hunter Alpha 100 59 0 0 0 31.8%
Claude 3 Haiku 100 46 7 0 0 30.6%
Z.AI GLM 5 100 51 0 0 0 30.3%
MiniMax M2.7 67 52 10 0 0 25.7%
Gemini 3 Flash (Preview) 74 20 16 15 0 24.8%
Z.AI GLM 4.5 84 34 0 0 0 23.6%
Llama 3.1 Nemotron 70B 89 29 0 0 0 23.5%
DeepSeek V3 (2024-12-26) 61 45 0 0 0 21.3%
Cohere Command R+ (Aug. 2024) 55 51 0 0 0 21.1%
DeepSeek-V2 Chat 58 36 8 0 0 20.5%
Mistral Large 2 74 21 5 0 0 20.1%
Nemotron 3 Super 89 11 0 0 0 20.0%
Ministral 3 3B 75 21 0 0 0 19.2%
GPT-5.4 Nano (Reasoning) 60 23 0 0 0 16.7%
Stealth: Healer Alpha 61 19 0 0 0 16.0%
Mistral Large 3 74 0 0 0 0 14.7%
Mistral Large 57 0 0 0 0 11.4%
o4 Mini 56 0 0 0 0 11.3%
Mistral Small Creative 55 0 0 0 0 11.0%
Ministral 8B 29 25 0 0 0 10.8%
Gemini 3.1 Flash Lite (Preview) 40 0 0 0 0 7.9%
Grok 4 39 0 0 0 0 7.7%
Claude 3.5 Sonnet 37 0 0 0 0 7.4%
Gemma 3 4B 32 0 0 0 0 6.4%
Claude Opus 4.5 16 15 0 0 0 6.3%
Ministral 3 14B 26 0 0 0 0 5.2%
Claude Sonnet 4 13 0 0 0 0 2.6%
GPT-5 Mini 10 0 0 0 0 2.1%
Ministral 3B 9 0 0 0 0 1.8%
Ministral 3 8B 5 0 0 0 0 0.9%
Claude Opus 4.6 (Reasoning) 0 0 0 0 0 0.0%
Claude Sonnet 4.6 (Reasoning) 0 0 0 0 0 0.0%
Claude Opus 4.6 0 0 0 0 0 0.0%
Claude Sonnet 4.6 0 0 0 0 0 0.0%
ByteDance Seed 1.6 0 0 0 0 0 0.0%
Grok 4.1 Fast 0 0 0 0 0 0.0%
MiniMax M2.5 0 0 0 0 0 0.0%
GPT-4.1 0 0 0 0 0 0.0%
Claude Opus 4 0 0 0 0 0 0.0%
Grok 4 Fast 0 0 0 0 0 0.0%
Claude Haiku 4.5 0 0 0 0 0 0.0%
Claude 3.7 Sonnet 0 0 0 0 0 0.0%
GPT-4.1 Mini 0 0 0 0 0 0.0%
GPT-4o, Aug. 6th (temp=1) 0 0 0 0 0 0.0%
GPT-5 Nano 0 0 0 0 0 0.0%
Mistral Small 4 (Reasoning) 0 0 0 0 0 0.0%
Qwen 3 32B 0 0 0 0 0 0.0%
DeepSeek V3 (2025-03-24) 0 0 0 0 0 0.0%
Qwen3 235B A22B Instruct 2507 0 0 0 0 0 0.0%
Writer: Palmyra X5 0 0 0 0 0 0.0%
GPT-4o Mini (temp=1) 0 0 0 0 0 0.0%
Mistral Medium 3.1 0 0 0 0 0 0.0%
Mistral Small 4 0 0 0 0 0 0.0%
ByteDance Seed 1.6 Flash 0 0 0 0 0 0.0%
GPT-4.1 Nano 0 0 0 0 0 0.0%
Llama 3.1 8B 0 0 0 0 0 0.0%
Model #1 #2 #3 #4 #5 Avg
Gemini 3.1 Pro (Preview) 100 100 100 100 100 100.0%
GPT-5.4 (Reasoning) 100 100 100 100 100 100.0%
Qwen 3.5 397B A17B 100 100 100 100 100 100.0%
Qwen 3.5 122B 100 100 100 100 100 100.0%
Grok 4.20 (Beta, Reasoning) 100 100 100 100 100 100.0%
GPT-5.4 (Reasoning, Low) 100 100 100 100 100 100.0%
Qwen 3.5 27B 100 100 100 100 100 100.0%
GPT-5.4 Mini (Reasoning) 100 100 100 100 100 100.0%
Qwen 3.5 35B 100 100 100 100 100 100.0%
Gemini 2.5 Flash (Reasoning) 100 100 100 100 100 100.0%
Qwen 3.5 Flash 100 100 100 100 100 100.0%
Qwen 3.5 9B 100 100 100 100 100 100.0%
GPT-5.4 Mini (Reasoning, Low) 100 100 100 100 100 100.0%
GPT-4o, May 13th (temp=0) 100 100 100 100 100 100.0%
GPT-5.4 100 100 100 100 100 100.0%
Grok 4.20 (Beta) 100 100 100 100 100 100.0%
Inception Mercury 2 100 100 100 100 100 100.0%
GPT-5.4 Mini 100 100 100 100 100 100.0%
Inception Mercury 100 100 100 100 100 100.0%
Mistral Small 3.2 24B 100 100 100 100 100 100.0%
Mistral NeMO 100 100 100 100 100 100.0%
Z.AI GLM 4.6 100 100 100 100 100 99.9%
ByteDance Seed 2.0 Lite 100 100 100 100 95 99.0%
Gemini 2.5 Pro 100 100 100 100 88 97.7%
GPT-5.1 100 100 100 100 88 97.6%
Gemini 2.5 Flash 100 100 100 100 85 96.9%
GPT-4o Mini (temp=0) 100 100 100 100 83 96.7%
Gemini 2.5 Flash Lite 100 100 100 100 69 93.7%
Qwen 2.5 72B 100 100 100 100 69 93.7%
Gemini 3 Flash (Preview) 100 100 100 94 73 93.2%
Qwen 3.5 Plus (2026-02-15) 100 100 100 100 64 92.8%
GPT-5 100 100 100 100 63 92.6%
Aion 2.0 100 100 92 84 82 91.8%
Stealth: Aurora Alpha 100 100 100 100 59 91.8%
Cohere Command R+ (Aug. 2024) 100 100 100 82 71 90.7%
Z.AI GLM 4.7 100 100 100 82 64 89.2%
GPT-4o, Aug. 6th (temp=0) 100 100 100 73 68 88.2%
o4 Mini High 100 100 100 100 40 88.0%
Gemini 3.1 Flash Lite (Preview) 100 100 100 67 66 86.5%
Hermes 3 405B 100 100 100 89 39 85.6%
GPT-5.4 Nano (Reasoning) 100 95 89 73 54 82.5%
Arcee AI: Trinity Large (Preview) 100 100 100 100 0 80.0%
Claude 3 Haiku 100 98 86 68 48 79.8%
Gemini 3 Flash (Preview, Reasoning) 100 100 100 99 0 79.8%
GPT-5.2 100 100 100 95 3 79.6%
Gemini 3 Pro (Preview) 100 98 88 62 35 76.5%
Gemini 2.5 Flash Lite (Reasoning) 100 78 76 67 58 75.9%
Gemma 3 27B 100 100 64 61 51 75.4%
ByteDance Seed 2.0 Mini 100 100 100 69 0 73.7%
Hermes 3 70B 100 100 73 44 44 72.4%
WizardLM 2 8x22b 100 100 68 56 30 70.8%
DeepSeek V3.1 100 100 94 48 0 68.3%
GPT-5.4 Nano (Reasoning, Low) 93 89 72 48 35 67.5%
GPT-4o, May 13th (temp=1) 100 100 86 32 19 67.3%
MiniMax M2.7 100 100 100 36 0 67.2%
LFM2 24B 100 100 83 51 0 66.7%
Mistral Large 2 83 77 69 68 17 62.9%
Nemotron 3 Nano 100 100 84 24 5 62.6%
Z.AI GLM 5 Turbo 100 100 98 12 0 62.0%
DeepSeek V3.2 100 85 81 36 0 60.4%
Llama 3.1 8B 100 100 100 0 0 60.0%
Rocinante 12B 100 100 100 0 0 60.0%
Gemma 3 12B 100 73 69 46 0 57.6%
Stealth: Hunter Alpha 100 96 74 14 0 56.9%
o4 Mini 100 100 73 0 0 54.6%
Claude Sonnet 4.5 100 100 73 0 0 54.5%
GPT-4o, Aug. 6th (temp=1) 100 100 47 25 0 54.4%
Z.AI GLM 4.7 Flash 100 71 69 18 14 54.3%
Z.AI GLM 4.5 82 71 64 54 0 54.1%
GPT-5.4 Nano 87 62 44 37 37 53.3%
Mistral Large 3 100 100 40 19 7 53.1%
Llama 3.1 Nemotron 70B 92 82 35 29 26 52.8%
Mistral Large 66 60 43 43 41 50.5%
GPT-5 Mini 96 63 57 32 0 49.7%
Mistral Medium 3.1 91 56 53 47 0 49.4%
Ministral 3 3B 100 64 56 11 6 47.3%
Arcee AI: Trinity Mini 90 62 43 29 0 44.6%
Gemma 3 4B 100 100 20 0 0 44.0%
DeepSeek-V2 Chat 100 99 11 9 0 43.9%
Ministral 3 8B 58 53 52 46 0 41.8%
Claude Sonnet 4.6 58 45 41 38 26 41.7%
Claude 3.5 Sonnet 100 98 7 0 0 40.9%
GPT-4o Mini (temp=1) 100 100 0 0 0 40.0%
Claude Sonnet 4 72 50 36 20 17 38.8%
DeepSeek V3 (2024-12-26) 100 52 27 0 0 35.7%
GPT-4.1 Nano 100 43 27 0 0 34.0%
Nemotron 3 Super 100 46 24 0 0 34.0%
Llama 3.1 70B 100 65 0 0 0 32.9%
Stealth: Healer Alpha 86 63 4 0 0 30.8%
DeepSeek V3 (2025-03-24) 100 33 14 0 0 29.5%
Mistral Small 4 79 32 23 13 0 29.3%
Ministral 3B 87 57 0 0 0 28.9%
Ministral 8B 70 25 25 9 0 25.8%
Claude 3.5 Haiku 100 29 0 0 0 25.7%
Mistral Small Creative 49 44 23 11 0 25.3%
MiniMax M2.5 79 32 5 0 0 23.2%
GPT-4.1 Mini 74 32 0 0 0 21.1%
Grok 4 41 37 26 0 0 20.8%
Ministral 3 14B 59 27 14 0 0 19.9%
Claude Haiku 4.5 85 0 0 0 0 17.0%
Qwen3 235B A22B Instruct 2507 74 1 0 0 0 15.0%
GPT-5 Nano 37 21 13 0 0 14.3%
Claude Sonnet 4.6 (Reasoning) 31 22 5 0 0 11.5%
GPT-4.1 55 0 0 0 0 11.0%
ByteDance Seed 1.6 Flash 55 0 0 0 0 11.0%
Claude Opus 4 54 0 0 0 0 10.7%
Claude 3.7 Sonnet 21 20 10 0 0 10.1%
Claude Opus 4.6 (Reasoning) 40 0 0 0 0 8.0%
Qwen 3 32B 24 0 0 0 0 4.8%
Writer: Palmyra X5 24 0 0 0 0 4.8%
Claude Opus 4.5 18 0 0 0 0 3.6%
Grok 4.1 Fast 15 0 0 0 0 3.1%
Grok 4 Fast 8 0 0 0 0 1.5%
Mistral Small 4 (Reasoning) 6 0 0 0 0 1.2%
Claude Opus 4.6 4 0 0 0 0 0.8%
MoonshotAI: Kimi K2.5 3 0 0 0 0 0.6%
Z.AI GLM 5 0 0 0 0 0 0.0%
ByteDance Seed 1.6 0 0 0 0 0 0.0%
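Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 99 | 99.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 91 | 98.3%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 89 | 97.9%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 85 | 97.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 92 | 91 | 96.7%
Stealth: Aurora Alpha | 100 | 100 | 98 | 91 | 82 | 94.1%
DeepSeek V3.1 | 100 | 100 | 100 | 87 | 78 | 93.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 54 | 90.9%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 37 | 87.4%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 36 | 87.1%
Hermes 3 405B | 100 | 100 | 100 | 90 | 43 | 86.6%
Hermes 3 70B | 100 | 100 | 100 | 68 | 59 | 85.3%
Gemma 3 27B | 100 | 100 | 90 | 81 | 49 | 84.0%
GPT-5 | 100 | 100 | 100 | 61 | 57 | 83.6%
Mistral NeMO | 100 | 100 | 100 | 100 | 11 | 82.2%
GPT-5.2 | 100 | 100 | 100 | 55 | 48 | 80.6%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 2 | 80.5%
Claude 3 Haiku | 100 | 100 | 100 | 58 | 44 | 80.4%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 71 | 60 | 60 | 78.4%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 78 | 0 | 75.6%
GPT-4o Mini (temp=0) | 100 | 100 | 86 | 53 | 36 | 74.8%
Aion 2.0 | 100 | 92 | 88 | 70 | 11 | 72.2%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 46 | 11 | 71.3%
GPT-5.4 Nano | 98 | 65 | 65 | 64 | 54 | 69.1%
GPT-5.4 Nano (Reasoning) | 88 | 77 | 77 | 53 | 41 | 66.9%
Inception Mercury | 100 | 100 | 78 | 55 | 0 | 66.8%
DeepSeek V3.2 | 100 | 96 | 62 | 58 | 12 | 65.7%
Gemini 3 Pro (Preview) | 100 | 100 | 75 | 45 | 0 | 64.1%
Z.AI GLM 4.7 Flash | 96 | 93 | 61 | 49 | 17 | 63.3%
Mistral Large 2 | 100 | 65 | 56 | 51 | 32 | 60.7%
GPT-5.4 Nano (Reasoning, Low) | 95 | 80 | 80 | 30 | 9 | 58.8%
Inception Mercury 2 | 100 | 100 | 94 | 0 | 0 | 58.8%
MiniMax M2.7 | 100 | 100 | 51 | 42 | 0 | 58.6%
Mistral Large 3 | 100 | 100 | 59 | 33 | 0 | 58.4%
MoonshotAI: Kimi K2.5 | 100 | 100 | 82 | 0 | 0 | 56.4%
Llama 3.1 8B | 100 | 88 | 85 | 0 | 0 | 54.5%
LFM2 24B | 100 | 63 | 58 | 34 | 13 | 53.5%
o4 Mini | 100 | 97 | 65 | 0 | 0 | 52.3%
Z.AI GLM 5 Turbo | 100 | 100 | 42 | 1 | 0 | 48.7%
GPT-4o, Aug. 6th (temp=0) | 100 | 76 | 67 | 0 | 0 | 48.6%
Mistral Large | 100 | 86 | 42 | 7 | 0 | 47.0%
Gemma 3 12B | 92 | 67 | 40 | 29 | 0 | 45.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 14 | 8 | 0 | 44.6%
WizardLM 2 8x22b | 73 | 56 | 48 | 43 | 0 | 44.0%
Llama 3.1 70B | 100 | 100 | 19 | 0 | 0 | 43.7%
Claude Opus 4 | 81 | 65 | 60 | 10 | 0 | 43.1%
Z.AI GLM 4.5 | 68 | 66 | 50 | 27 | 3 | 42.9%
Claude Sonnet 4.5 | 86 | 65 | 48 | 11 | 0 | 42.0%
Gemini 3 Flash (Preview) | 100 | 53 | 52 | 0 | 0 | 40.9%
o4 Mini High | 96 | 70 | 17 | 17 | 0 | 40.1%
Stealth: Healer Alpha | 100 | 85 | 11 | 0 | 0 | 39.3%
GPT-4o, May 13th (temp=1) | 100 | 69 | 23 | 0 | 0 | 38.3%
Stealth: Hunter Alpha | 88 | 75 | 23 | 0 | 0 | 37.1%
Nemotron 3 Super | 100 | 58 | 16 | 0 | 0 | 34.8%
Gemini 3 Flash (Preview, Reasoning) | 73 | 48 | 30 | 18 | 0 | 33.8%
Z.AI GLM 5 | 81 | 64 | 15 | 0 | 0 | 32.0%
DeepSeek-V2 Chat | 90 | 44 | 23 | 0 | 0 | 31.4%
Ministral 3B | 64 | 56 | 35 | 0 | 0 | 31.1%
Rocinante 12B | 100 | 49 | 0 | 0 | 0 | 29.7%
Mistral Medium 3.1 | 65 | 37 | 25 | 0 | 0 | 25.3%
Qwen 3 32B | 67 | 27 | 23 | 0 | 0 | 23.5%
Claude 3.5 Sonnet | 93 | 21 | 0 | 0 | 0 | 22.8%
GPT-4o, Aug. 6th (temp=1) | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Haiku 4.5 | 99 | 0 | 0 | 0 | 0 | 19.7%
Claude 3.7 Sonnet | 54 | 42 | 0 | 0 | 0 | 19.1%
Nemotron 3 Nano | 63 | 29 | 0 | 0 | 0 | 18.6%
Llama 3.1 Nemotron 70B | 68 | 24 | 0 | 0 | 0 | 18.3%
GPT-5 Mini | 44 | 34 | 0 | 0 | 0 | 15.7%
Arcee AI: Trinity Large (Preview) | 43 | 33 | 0 | 0 | 0 | 15.2%
Ministral 3 8B | 74 | 0 | 0 | 0 | 0 | 14.8%
Mistral Small Creative | 54 | 14 | 0 | 0 | 0 | 13.6%
ByteDance Seed 2.0 Mini | 55 | 11 | 0 | 0 | 0 | 13.4%
Gemma 3 4B | 55 | 9 | 0 | 0 | 0 | 12.8%
Claude Opus 4.5 | 60 | 0 | 0 | 0 | 0 | 12.0%
GPT-4.1 Mini | 35 | 19 | 0 | 0 | 0 | 10.7%
Gemini 3.1 Flash Lite (Preview) | 21 | 13 | 5 | 0 | 0 | 8.0%
GPT-4o Mini (temp=1) | 33 | 0 | 0 | 0 | 0 | 6.6%
Mistral Small 4 | 32 | 0 | 0 | 0 | 0 | 6.4%
Ministral 8B | 30 | 0 | 0 | 0 | 0 | 6.0%
Claude Sonnet 4.6 (Reasoning) | 20 | 0 | 0 | 0 | 0 | 4.0%
GPT-4.1 Nano | 20 | 0 | 0 | 0 | 0 | 4.0%
MiniMax M2.5 | 10 | 1 | 0 | 0 | 0 | 2.2%
Claude Opus 4.6 (Reasoning) | 9 | 0 | 0 | 0 | 0 | 1.8%
Ministral 3 3B | 9 | 0 | 0 | 0 | 0 | 1.8%
Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen3 235B A22B Instruct 2507 | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%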
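Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-5 | 100 | 100 | 100 | 100 | 96 | 99.1%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 94 | 98.8%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 88 | 97.6%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 98 | 82 | 96.1%
Nemotron 3 Nano | 100 | 100 | 100 | 96 | 81 | 95.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 72 | 94.5%
GPT-5.1 | 100 | 100 | 100 | 100 | 69 | 93.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 95 | 89 | 85 | 93.8%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 68 | 93.7%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 63 | 92.7%
Inception Mercury 2 | 100 | 100 | 100 | 86 | 69 | 90.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 53 | 90.5%
Qwen 2.5 72B | 100 | 100 | 100 | 79 | 72 | 90.3%
Gemini 2.5 Flash | 100 | 100 | 100 | 90 | 55 | 89.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 33 | 86.6%
Gemini 2.5 Flash Lite | 100 | 100 | 99 | 86 | 34 | 83.9%
Mistral Large | 100 | 100 | 91 | 69 | 52 | 82.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 70 | 40 | 81.8%
GPT-5.2 | 100 | 94 | 92 | 80 | 36 | 80.4%
Gemma 3 27B | 100 | 100 | 100 | 52 | 45 | 79.4%
Z.AI GLM 4.7 | 100 | 100 | 82 | 76 | 33 | 78.3%
Ministral 3B | 100 | 100 | 95 | 93 | 0 | 77.6%
Claude 3.5 Sonnet | 99 | 96 | 94 | 91 | 0 | 75.9%
Mistral Small 3.2 24B | 100 | 100 | 100 | 77 | 0 | 75.4%
Mistral Large 2 | 100 | 100 | 100 | 63 | 0 | 72.7%
LFM2 24B | 100 | 100 | 100 | 43 | 15 | 71.8%
Gemini 2.5 Pro | 100 | 100 | 69 | 52 | 36 | 71.3%
Llama 3.1 8B | 100 | 100 | 100 | 56 | 0 | 71.3%
Grok 4.20 (Beta) | 100 | 100 | 82 | 73 | 0 | 70.9%
Hermes 3 405B | 100 | 100 | 100 | 37 | 16 | 70.6%
WizardLM 2 8x22b | 100 | 83 | 62 | 51 | 39 | 67.1%
GPT-5.4 Nano (Reasoning, Low) | 79 | 78 | 77 | 71 | 23 | 65.7%
ByteDance Seed 2.0 Mini | 100 | 100 | 83 | 43 | 0 | 65.1%
o4 Mini High | 100 | 100 | 73 | 49 | 0 | 64.4%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 17 | 0 | 63.4%
Aion 2.0 | 100 | 87 | 85 | 29 | 12 | 62.7%
DeepSeek V3 (2024-12-26) | 81 | 70 | 63 | 60 | 38 | 62.5%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 5 | 0 | 60.9%
Hermes 3 70B | 87 | 85 | 63 | 54 | 15 | 60.7%
GPT-4o Mini (temp=0) | 96 | 81 | 50 | 41 | 36 | 60.6%
Claude Sonnet 4.5 | 100 | 84 | 67 | 49 | 0 | 59.9%
DeepSeek V3.1 | 96 | 73 | 60 | 32 | 31 | 58.4%
GPT-5.4 Nano | 92 | 60 | 55 | 45 | 33 | 57.1%
Gemini 2.5 Flash Lite (Reasoning) | 91 | 86 | 81 | 24 | 0 | 56.6%
o4 Mini | 100 | 100 | 58 | 12 | 7 | 55.4%
DeepSeek V3.2 | 67 | 66 | 51 | 45 | 38 | 53.6%
Llama 3.1 70B | 100 | 79 | 71 | 0 | 0 | 50.2%
Gemini 3 Pro (Preview) | 100 | 74 | 39 | 38 | 0 | 50.1%
Gemini 3 Flash (Preview, Reasoning) | 92 | 57 | 52 | 25 | 13 | 47.8%
Z.AI GLM 5 | 100 | 56 | 51 | 23 | 0 | 45.9%
Llama 3.1 Nemotron 70B | 86 | 75 | 61 | 7 | 0 | 45.7%
Ministral 3 14B | 93 | 54 | 38 | 29 | 0 | 42.7%
Claude 3 Haiku | 100 | 71 | 24 | 15 | 0 | 42.0%
Gemini 3.1 Flash Lite (Preview) | 93 | 62 | 45 | 0 | 0 | 40.1%
Z.AI GLM 5 Turbo | 84 | 83 | 24 | 0 | 0 | 38.3%
GPT-4o Mini (temp=1) | 100 | 54 | 29 | 7 | 0 | 37.8%
Gemma 3 12B | 84 | 60 | 24 | 16 | 0 | 36.7%
Mistral Medium 3.1 | 72 | 71 | 23 | 17 | 0 | 36.6%
MiniMax M2.7 | 80 | 56 | 39 | 0 | 0 | 35.0%
GPT-5.4 Nano (Reasoning) | 94 | 46 | 33 | 0 | 0 | 34.5%
Z.AI GLM 4.5 | 77 | 43 | 30 | 20 | 0 | 34.2%
Nemotron 3 Super | 100 | 52 | 17 | 0 | 0 | 33.8%
Ministral 8B | 85 | 43 | 35 | 0 | 0 | 32.5%
Qwen 3 32B | 100 | 38 | 24 | 0 | 0 | 32.4%
Rocinante 12B | 100 | 52 | 3 | 0 | 0 | 31.0%
Ministral 3 3B | 100 | 52 | 2 | 0 | 0 | 30.9%
Z.AI GLM 4.7 Flash | 63 | 30 | 24 | 18 | 14 | 29.8%
Claude Sonnet 4 | 62 | 44 | 20 | 19 | 0 | 29.2%
Claude 3.7 Sonnet | 74 | 48 | 19 | 3 | 0 | 28.7%
Gemma 3 4B | 57 | 42 | 20 | 19 | 0 | 27.6%
MoonshotAI: Kimi K2.5 | 78 | 51 | 5 | 0 | 0 | 26.8%
MiniMax M2.5 | 100 | 30 | 0 | 0 | 0 | 25.9%
Mistral Small 4 (Reasoning) | 60 | 59 | 0 | 0 | 0 | 23.7%
Ministral 3 8B | 71 | 29 | 18 | 0 | 0 | 23.6%
DeepSeek V3 (2025-03-24) | 61 | 56 | 0 | 0 | 0 | 23.5%
Gemini 3 Flash (Preview) | 77 | 32 | 3 | 0 | 0 | 22.3%
DeepSeek-V2 Chat | 48 | 42 | 8 | 3 | 1 | 20.6%
Claude Opus 4.5 | 55 | 48 | 0 | 0 | 0 | 20.6%
Stealth: Hunter Alpha | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Sonnet 4.6 (Reasoning) | 43 | 31 | 20 | 0 | 0 | 18.9%
Writer: Palmyra X5 | 71 | 13 | 7 | 0 | 0 | 18.2%
Stealth: Healer Alpha | 48 | 39 | 0 | 0 | 0 | 17.5%
Qwen3 235B A22B Instruct 2507 | 71 | 0 | 0 | 0 | 0 | 14.1%
GPT-5 Mini | 52 | 15 | 0 | 0 | 0 | 13.5%
GPT-4.1 Mini | 67 | 0 | 0 | 0 | 0 | 13.3%
Mistral Small Creative | 41 | 16 | 0 | 0 | 0 | 11.3%
GPT-4.1 | 51 | 0 | 0 | 0 | 0 | 10.3%
Arcee AI: Trinity Mini | 36 | 7 | 0 | 0 | 0 | 8.5%
ByteDance Seed 1.6 Flash | 38 | 0 | 0 | 0 | 0 | 7.5%
Claude Opus 4 | 11 | 5 | 0 | 0 | 0 | 3.2%
Claude Haiku 4.5 | 11 | 0 | 0 | 0 | 0 | 2.3%
GPT-4.1 Nano | 11 | 0 | 0 | 0 | 0 | 2.3%
Claude Sonnet 4.6 | 10 | 0 | 0 | 0 | 0 | 2.0%
Grok 4 | 7 | 2 | 0 | 0 | 0 | 1.7%
Claude Opus 4.6 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0.0%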
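Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 | 100 | 100 | 100 | 100 | 79 | 95.9%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 76 | 95.1%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 56 | 91.3%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 52 | 90.4%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 76 | 73 | 89.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 45 | 89.1%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 82 | 50 | 86.3%
Gemini 2.5 Flash Lite | 100 | 100 | 93 | 82 | 49 | 84.7%
Qwen 2.5 72B | 100 | 100 | 100 | 98 | 9 | 81.4%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 83 | 17 | 80.0%
Inception Mercury 2 | 100 | 100 | 90 | 75 | 31 | 79.3%
Rocinante 12B | 100 | 100 | 100 | 95 | 0 | 79.0%
Z.AI GLM 4.7 | 100 | 94 | 72 | 69 | 29 | 72.8%
GPT-4o Mini (temp=0) | 100 | 100 | 50 | 45 | 44 | 67.9%
Gemini 2.5 Flash | 100 | 100 | 76 | 34 | 0 | 62.2%
MiniMax M2.7 | 100 | 100 | 91 | 1 | 0 | 58.5%
Stealth: Aurora Alpha | 100 | 100 | 52 | 35 | 0 | 57.4%
Gemma 3 27B | 93 | 91 | 80 | 0 | 0 | 52.9%
o4 Mini | 100 | 96 | 52 | 11 | 0 | 51.8%
Gemma 3 12B | 100 | 83 | 48 | 13 | 0 | 48.7%
Gemini 3 Flash (Preview) | 82 | 71 | 60 | 30 | 0 | 48.5%
o4 Mini High | 100 | 74 | 53 | 11 | 0 | 47.5%
Claude 3 Haiku | 100 | 60 | 37 | 29 | 0 | 45.1%
Nemotron 3 Super | 100 | 78 | 43 | 0 | 0 | 44.1%
DeepSeek V3 (2024-12-26) | 97 | 72 | 47 | 0 | 0 | 43.1%
Gemini 3 Pro (Preview) | 97 | 61 | 54 | 0 | 0 | 42.5%
Z.AI GLM 4.7 Flash | 100 | 85 | 24 | 1 | 0 | 42.1%
Hermes 3 405B | 100 | 100 | 7 | 0 | 0 | 41.4%
DeepSeek-V2 Chat | 100 | 79 | 21 | 0 | 0 | 40.0%
Llama 3.1 70B | 100 | 82 | 16 | 0 | 0 | 39.6%
Hermes 3 70B | 100 | 72 | 19 | 0 | 0 | 38.2%
Arcee AI: Trinity Mini | 79 | 70 | 38 | 0 | 0 | 37.5%
LFM2 24B | 85 | 83 | 16 | 0 | 0 | 36.7%
Ministral 8B | 100 | 59 | 0 | 0 | 0 | 31.8%
DeepSeek V3.1 | 70 | 58 | 22 | 5 | 3 | 31.5%
Gemini 2.5 Flash Lite (Reasoning) | 74 | 35 | 33 | 0 | 0 | 28.4%
Nemotron 3 Nano | 100 | 41 | 0 | 0 | 0 | 28.2%
MiniMax M2.5 | 72 | 43 | 24 | 0 | 0 | 27.9%
GPT-5.2 | 53 | 43 | 30 | 11 | 0 | 27.4%
Z.AI GLM 5 | 100 | 30 | 0 | 0 | 0 | 26.1%
WizardLM 2 8x22b | 72 | 58 | 0 | 0 | 0 | 26.0%
DeepSeek V3.2 | 75 | 30 | 25 | 0 | 0 | 25.8%
GPT-5.4 Nano (Reasoning) | 69 | 51 | 2 | 0 | 0 | 24.5%
Gemini 3.1 Flash Lite (Preview) | 100 | 22 | 0 | 0 | 0 | 24.4%
Stealth: Healer Alpha | 56 | 33 | 25 | 0 | 0 | 22.7%
Claude Opus 4 | 60 | 52 | 0 | 0 | 0 | 22.5%
Gemini 3 Flash (Preview, Reasoning) | 49 | 47 | 15 | 0 | 0 | 22.3%
Z.AI GLM 5 Turbo | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude Sonnet 4.6 | 100 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 8B | 98 | 0 | 0 | 0 | 0 | 19.6%
Ministral 3B | 49 | 47 | 0 | 0 | 0 | 19.1%
Claude Sonnet 4.5 | 81 | 11 | 0 | 0 | 0 | 18.3%
Llama 3.1 8B | 90 | 0 | 0 | 0 | 0 | 18.0%
Aion 2.0 | 39 | 21 | 20 | 1 | 0 | 16.3%
Claude Haiku 4.5 | 46 | 21 | 3 | 0 | 0 | 13.9%
GPT-5.4 Nano (Reasoning, Low) | 26 | 21 | 14 | 0 | 0 | 12.4%
GPT-4o, May 13th (temp=1) | 58 | 0 | 0 | 0 | 0 | 11.6%
GPT-4.1 Mini | 54 | 0 | 0 | 0 | 0 | 10.7%
Stealth: Hunter Alpha | 47 | 0 | 0 | 0 | 0 | 9.4%
Mistral Small Creative | 46 | 0 | 0 | 0 | 0 | 9.2%
GPT-5.4 Nano | 42 | 0 | 0 | 0 | 0 | 8.4%
Mistral Large | 29 | 7 | 0 | 0 | 0 | 7.1%
Cohere Command R+ (Aug. 2024) | 24 | 2 | 0 | 0 | 0 | 5.1%
Arcee AI: Trinity Large (Preview) | 25 | 0 | 0 | 0 | 0 | 5.1%
GPT-5 Mini | 23 | 0 | 0 | 0 | 0 | 4.5%
Gemma 3 4B | 22 | 0 | 0 | 0 | 0 | 4.4%
Mistral Medium 3.1 | 12 | 0 | 0 | 0 | 0 | 2.5%
GPT-4o Mini (temp=1) | 10 | 0 | 0 | 0 | 0 | 2.0%
Qwen 3 32B | 5 | 0 | 0 | 0 | 0 | 1.1%
Mistral Large 2 | 5 | 0 | 0 | 0 | 0 | 0.9%
Llama 3.1 Nemotron 70B | 3 | 0 | 0 | 0 | 0 | 0.7%
Claude Opus 4.6 (Reasoning) | 2 | 0 | 0 | 0 | 0 | 0.4%
Claude Sonnet 4.6 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
MoonshotAI: Kimi K2.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Opus 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 2.0 Mini | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.7 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-5 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 (Reasoning) | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek V3 (2025-03-24) | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen3 235B A22B Instruct 2507 | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 4 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0.0%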