Adverb-first sentence starts

Test: Bad Writing Habits

Avg. Score: 51.8%
Scenarios: 18

Overall Performance

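| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
|---|---|---|---|---|---|
| 1 | Writer: Palmyra X5 | 94.5% | $0.011 | 22.0s | 64% |
| 2 | Mistral Small Creative | 82.4% | $0.0007 | 9.1s | 46% |
| 3 | Mistral Medium 3.1 | 83.9% | $0.0048 | 36.5s | 51% |
| 4 | Ministral 3 14B | 81.0% | $0.0007 | 11.7s | 43% |
| 5 | Mistral Large 3 | 77.9% | $0.0033 | 30.3s | 33% |
| 6 | Mistral Large | 79.7% | $0.014 | 30.9s | 35% |
| 7 | Mistral Large 2 | 78.0% | $0.013 | 29.4s | 33% |
| 8 | Gemma 3 4B | 71.3% | $0.0002 | 20.0s | 28% |
| 9 | GPT-4.1 Nano | 71.8% | $0.0007 | 13.3s | 26% |
| 10 | Ministral 8B | 70.1% | $0.0004 | 10.4s | 22% |
| 11 | Ministral 3 8B | 69.8% | $0.0008 | 19.6s | 22% |
| 12 | Gemma 3 27B | 67.9% | $0.0006 | 52.6s | 25% |
| 13 | Minimax M2.5 | 69.8% | $0.0034 | 1.3m | 25% |
| 14 | Claude Haiku 4.5 | 64.8% | $0.011 | 21.6s | 20% |
| 15 | Claude Sonnet 4.5 | 71.0% | $0.035 | 38.1s | 26% |
| 16 | Z.AI GLM 5 | 69.5% | $0.0084 | 1.2m | 24% |
| 17 | Gemma 3 12B | 65.3% | $0.0004 | 41.3s | 18% |
| 18 | Ministral 3 3B | 60.5% | $0.0005 | 11.1s | 15% |
| 19 | Ministral 3B | 61.2% | $0.0001 | 8.1s | 13% |
| 20 | DeepSeek V3 (2025-03-24) | 63.3% | $0.0014 | 39.4s | 14% |
| 21 | Gemini 2.5 Flash Lite | 54.3% | $0.0009 | 9.5s | 12% |
| 22 | ByteDance Seed 1.6 Flash | 53.3% | $0.0013 | 27.3s | 14% |
| 23 | GPT-4.1 | 59.6% | $0.018 | 44.7s | 18% |
| 24 | o4 Mini | 53.5% | $0.015 | 25.7s | 17% |
| 25 | o4 Mini High | 57.3% | $0.025 | 47.2s | 20% |
| 26 | GPT-5.1 | 69.8% | $0.054 | 1.8m | 31% |
| 27 | Gemini 2.5 Flash | 52.2% | $0.0052 | 10.6s | 10% |
| 28 | Rocinante 12B | 57.3% | $0.0014 | 38.4s | 9% |
| 29 | Mistral NeMO | 50.3% | $0.0005 | 10.1s | 8% |
| 30 | Claude 3 Haiku | 48.6% | $0.0025 | 14.9s | 11% |
| 31 | GPT-4.1 Mini | 49.8% | $0.0027 | 19.0s | 10% |
| 32 | GPT-5 Nano | 55.1% | $0.0042 | 1.4m | 18% |
| 33 | DeepSeek-V2 Chat | 56.8% | $0.0021 | 53.3s | 10% |
| 34 | GPT-4o, Aug. 6th (temp=1) | 54.9% | $0.018 | 24.4s | 10% |
| 35 | Z.AI GLM 4.7 Flash | 50.4% | $0.0017 | 1.2m | 17% |
| 36 | Arcee AI: Trinity Large (Preview) | 53.1% | $0.0000 | 43.6s | 8% |
| 37 | GPT-4o Mini (temp=1) | 50.4% | $0.0012 | 34.8s | 9% |
| 38 | GPT-4o, May 13th (temp=1) | 51.9% | $0.033 | 14.4s | 13% |
| 39 | Qwen 3.5 Plus (2026-02-15) | 45.0% | $0.0060 | 31.5s | 12% |
| 40 | Gemini 3 Flash (Preview) | 41.4% | $0.0078 | 19.6s | 12% |
| 41 | Claude Sonnet 4 | 54.7% | $0.032 | 43.7s | 15% |
| 42 | Z.AI GLM 4.6 | 48.7% | $0.0065 | 51.5s | 11% |
| 43 | DeepSeek V3 (2024-12-26) | 50.8% | $0.0021 | 54.6s | 8% |
| 44 | Claude Opus 4.5 | 63.6% | $0.070 | 53.4s | 19% |
| 45 | Claude Sonnet 4.6 | 56.6% | $0.031 | 39.3s | 9% |
| 46 | Claude Opus 4.6 | 65.2% | $0.078 | 1.2m | 24% |
| 47 | Hermes 3 70B | 53.2% | $0.0010 | 1.2m | 9% |
| 48 | GPT-5.2 | 59.9% | $0.056 | 1.5m | 20% |
| 49 | Hermes 3 405B | 43.6% | $0.0032 | 53.2s | 8% |
| 50 | Grok 4 Fast | 32.5% | $0.0017 | 24.1s | 8% |
| 51 | Z.AI GLM 4.5 | 38.7% | $0.0051 | 42.1s | 8% |
| 52 | DeepSeek V3.2 | 50.1% | $0.0014 | 1.9m | 12% |
| 53 | Gemini 2.5 Pro | 48.0% | $0.036 | 36.2s | 9% |
| 54 | Z.AI GLM 4.7 | 44.2% | $0.010 | 1.4m | 12% |
| 55 | GPT-5 Mini | 39.9% | $0.0100 | 57.4s | 10% |
| 56 | Grok 4.1 Fast | 30.5% | $0.0018 | 37.8s | 9% |
| 57 | Claude 3.5 Sonnet | 49.2% | $0.048 | 35.5s | 9% |
| 58 | Cohere Command R+ (Aug. 2024) | 42.4% | $0.020 | 52.5s | 8% |
| 59 | Arcee AI: Trinity Mini | 33.6% | $0.0003 | 9.2s | 0% |
| 60 | Claude 3.5 Haiku | 33.3% | $0.0035 | 10.8s | 0% |
| 61 | Qwen 3.5 397B A17B | 54.6% | $0.014 | 3.0m | 20% |
| 62 | Claude 3.7 Sonnet | 41.6% | $0.042 | 46.7s | 12% |
| 63 | GPT-4o, May 13th (temp=0) | 32.0% | $0.035 | 14.1s | 10% |
| 64 | DeepSeek V3.1 | 43.2% | $0.0020 | 1.8m | 10% |
| 65 | WizardLM 2 8x22b | 42.8% | $0.0026 | 1.8m | 9% |
| 66 | Gemini 3 Pro (Preview) | 43.2% | $0.055 | 54.4s | 13% |
| 67 | Llama 3.1 Nemotron 70B | 29.8% | $0.0038 | 31.7s | 0% |
| 68 | Llama 3.1 8B | 38.1% | $0.0003 | 1.3m | 0% |
| 69 | Llama 3.1 70B | 26.1% | $0.0015 | 29.4s | 0% |
| 70 | GPT-4o Mini (temp=0) | 25.8% | $0.0012 | 34.8s | 0% |
| 71 | Qwen 2.5 72B | 25.0% | $0.0010 | 36.7s | 0% |
| 72 | GPT-4o, Aug. 6th (temp=0) | 23.1% | $0.023 | 22.7s | 0% |
| 73 | Stealth: Aurora Alpha | 7.5% | $0.0000 | 9.8s | 0% |
| 74 | MoonshotAI: Kimi K2.5 | 47.1% | $0.019 | 3.2m | 10% |
| 75 | Grok 4 | 34.3% | $0.048 | 1.7m | 9% |
| 76 | GPT-5 | 40.7% | $0.065 | 2.8m | 17% |
| 77 | Claude Opus 4 | 65.4% | $0.209 | 1.4m | 23% |
| 78 | ByteDance Seed 1.6 | 20.3% | $0.013 | 2.5m | 0% |
| 79 | Gemini 3.1 Pro (Preview) | 28.9% | $0.107 | 1.8m | 7% |
| 80 | Mistral Small 3.2 24B | 19.1% | $0.0069 | 5.7m | 0% |

Average score: 51.83%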

Individual Scenarios

Detailed Writing Rules

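| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 93 | 98.5% |
| Claude Sonnet 4.6 | 100 | 100 | 98 | 98 | 82 | 95.7% |
| Gemma 3 4B | 100 | 100 | 100 | 88 | 46 | 86.6% |
| Z.AI GLM 5 | 100 | 100 | 100 | 80 | 40 | 84.0% |
| Claude Opus 4.6 | 100 | 100 | 73 | 72 | 67 | 82.3% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Claude Opus 4 | 100 | 100 | 71 | 64 | 62 | 79.3% |
| WizardLM 2 8x22b | 100 | 100 | 83 | 81 | 28 | 78.4% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 71 | 64 | 51 | 77.3% |
| Mistral Small Creative | 100 | 100 | 100 | 44 | 39 | 76.6% |
| Hermes 3 405B | 100 | 100 | 100 | 71 | 0 | 74.2% |
| Gemini 2.5 Flash | 100 | 98 | 94 | 78 | 0 | 73.9% |
| MoonshotAI: Kimi K2.5 | 100 | 93 | 67 | 49 | 48 | 71.3% |
| GPT-5 Nano | 100 | 97 | 84 | 42 | 32 | 70.9% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 79 | 40 | 35 | 70.8% |
| Rocinante 12B | 100 | 100 | 100 | 46 | 0 | 69.1% |
| DeepSeek V3 (2024-12-26) | 100 | 90 | 81 | 72 | 0 | 68.8% |
| Gemma 3 12B | 100 | 100 | 100 | 40 | 0 | 67.9% |
| GPT-5.2 | 100 | 97 | 59 | 43 | 34 | 66.8% |
| Claude Opus 4.5 | 100 | 100 | 100 | 30 | 0 | 66.0% |
| Mistral Large | 98 | 96 | 72 | 26 | 25 | 63.6% |
| GPT-5.1 | 89 | 72 | 72 | 50 | 16 | 59.9% |
| Ministral 3B | 100 | 100 | 95 | 0 | 0 | 59.0% |
| Minimax M2.5 | 100 | 87 | 55 | 52 | 0 | 58.7% |
| Gemini 2.5 Flash Lite | 100 | 100 | 91 | 0 | 0 | 58.3% |
| Ministral 3 14B | 100 | 100 | 87 | 0 | 0 | 57.3% |
| Ministral 8B | 100 | 100 | 82 | 0 | 0 | 56.5% |
| Z.AI GLM 4.6 | 100 | 91 | 51 | 38 | 0 | 56.1% |
| Qwen 2.5 72B | 99 | 88 | 87 | 0 | 0 | 54.7% |
| GPT-4.1 Nano | 100 | 63 | 56 | 54 | 0 | 54.3% |
| Mistral Large 3 | 91 | 79 | 53 | 40 | 0 | 52.7% |
| Hermes 3 70B | 100 | 95 | 67 | 0 | 0 | 52.4% |
| Claude Sonnet 4 | 100 | 65 | 54 | 42 | 0 | 52.2% |
| Cohere Command R+ (Aug. 2024) | 90 | 71 | 51 | 46 | 0 | 51.6% |
| Ministral 3 8B | 100 | 100 | 57 | 0 | 0 | 51.5% |
| Gemini 3 Pro (Preview) | 100 | 84 | 34 | 28 | 0 | 49.3% |
| Qwen 3.5 397B A17B | 100 | 74 | 55 | 14 | 0 | 48.8% |
| Arcee AI: Trinity Large (Preview) | 100 | 96 | 46 | 0 | 0 | 48.3% |
| Gemini 2.5 Pro | 94 | 72 | 42 | 29 | 0 | 47.6% |
| DeepSeek V3 (2025-03-24) | 100 | 88 | 49 | 0 | 0 | 47.3% |
| GPT-4o, May 13th (temp=1) | 100 | 95 | 40 | 0 | 0 | 47.1% |
| Mistral Large 2 | 100 | 100 | 32 | 0 | 0 | 46.5% |
| GPT-4.1 Mini | 85 | 50 | 49 | 46 | 0 | 46.0% |
| Claude Haiku 4.5 | 100 | 79 | 51 | 0 | 0 | 46.0% |
| GPT-4.1 | 98 | 66 | 32 | 31 | 0 | 45.5% |
| Mistral Small 3.2 24B | 100 | 80 | 46 | 0 | 0 | 45.3% |
| Mistral NeMO | 74 | 67 | 44 | 33 | 0 | 43.4% |
| Gemma 3 27B | 79 | 46 | 43 | 40 | 0 | 41.7% |
| Gemini 3 Flash (Preview) | 67 | 61 | 47 | 29 | 0 | 40.8% |
| Claude 3.5 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0% |
| GPT-4o Mini (temp=1) | 100 | 97 | 0 | 0 | 0 | 39.3% |
| DeepSeek V3.2 | 69 | 65 | 33 | 28 | 0 | 38.9% |
| Mistral Medium 3.1 | 61 | 39 | 34 | 32 | 25 | 38.1% |
| Claude 3.7 Sonnet | 75 | 67 | 46 | 0 | 0 | 37.4% |
| o4 Mini High | 78 | 44 | 43 | 22 | 0 | 37.3% |
| Llama 3.1 Nemotron 70B | 100 | 85 | 0 | 0 | 0 | 37.1% |
| Z.AI GLM 4.7 | 70 | 41 | 32 | 28 | 0 | 34.3% |
| o4 Mini | 48 | 45 | 42 | 31 | 0 | 33.4% |
| Qwen 3.5 Plus (2026-02-15) | 65 | 35 | 32 | 31 | 0 | 32.7% |
| ByteDance Seed 1.6 Flash | 66 | 55 | 35 | 0 | 0 | 31.3% |
| GPT-5 Mini | 89 | 23 | 23 | 20 | 0 | 31.0% |
| GPT-5 | 57 | 28 | 21 | 14 | 0 | 24.0% |
| GPT-4o, Aug. 6th (temp=0) | 63 | 47 | 0 | 0 | 0 | 22.0% |
| Grok 4 | 58 | 44 | 0 | 0 | 0 | 20.5% |
| Claude 3.5 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Ministral 3 3B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 8B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Grok 4.1 Fast | 34 | 31 | 28 | 0 | 0 | 18.8% |
| ByteDance Seed 1.6 | 90 | 0 | 0 | 0 | 0 | 18.0% |
| GPT-4o Mini (temp=0) | 43 | 41 | 0 | 0 | 0 | 16.8% |
| Llama 3.1 70B | 74 | 0 | 0 | 0 | 0 | 14.8% |
| Arcee AI: Trinity Mini | 72 | 0 | 0 | 0 | 0 | 14.5% |
| DeepSeek V3.1 | 37 | 33 | 0 | 0 | 0 | 14.1% |
| Claude 3 Haiku | 68 | 0 | 0 | 0 | 0 | 13.6% |
| Gemini 3.1 Pro (Preview) | 31 | 22 | 0 | 0 | 0 | 10.6% |
| GPT-4o, May 13th (temp=0) | 50 | 0 | 0 | 0 | 0 | 10.0% |
| Z.AI GLM 4.5 | 43 | 0 | 0 | 0 | 0 | 8.7% |
| Stealth: Aurora Alpha | 34 | 0 | 0 | 0 | 0 | 6.9% |
| Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
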
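| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 91 | 98.3% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 89 | 97.9% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 83 | 96.5% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 81 | 96.3% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 77 | 95.3% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 75 | 95.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 68 | 93.6% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 94 | 73 | 93.5% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 67 | 93.3% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 91 | 71 | 92.3% |
| GPT-4.1 Mini | 100 | 100 | 100 | 80 | 80 | 92.1% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 55 | 90.9% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 83 | 72 | 90.9% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 81 | 67 | 89.7% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 48 | 89.5% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 47 | 89.4% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 46 | 89.3% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 88 | 58 | 89.2% |
| Claude Opus 4 | 100 | 100 | 100 | 81 | 62 | 88.7% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 38 | 87.7% |
| GPT-5 Nano | 100 | 100 | 100 | 70 | 68 | 87.5% |
| Grok 4 Fast | 100 | 100 | 85 | 71 | 70 | 85.3% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 78 | 46 | 84.9% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 71 | 53 | 84.8% |
| GPT-5.1 | 100 | 100 | 90 | 67 | 60 | 83.6% |
| Claude 3 Haiku | 100 | 100 | 100 | 60 | 57 | 83.4% |
| Gemini 2.5 Flash Lite | 100 | 100 | 98 | 85 | 32 | 83.1% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 70 | 33 | 80.7% |
| GPT-5.2 | 97 | 90 | 83 | 74 | 60 | 80.5% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Grok 4 | 100 | 100 | 88 | 82 | 28 | 79.7% |
| Ministral 3 8B | 100 | 100 | 83 | 69 | 45 | 79.6% |
| Arcee AI: Trinity Mini | 100 | 95 | 83 | 78 | 38 | 78.9% |
| Hermes 3 70B | 100 | 100 | 100 | 90 | 0 | 78.0% |
| o4 Mini | 100 | 88 | 73 | 65 | 59 | 77.1% |
| Rocinante 12B | 100 | 100 | 100 | 83 | 0 | 76.6% |
| GPT-4.1 | 100 | 90 | 86 | 69 | 34 | 75.9% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 83 | 61 | 31 | 75.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 67 | 0 | 73.3% |
| Minimax M2.5 | 100 | 100 | 94 | 38 | 34 | 73.2% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 62 | 0 | 72.3% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 85 | 77 | 56 | 37 | 71.2% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 57 | 53 | 45 | 71.1% |
| Ministral 3B | 100 | 100 | 85 | 69 | 0 | 71.0% |
| Gemini 3.1 Pro (Preview) | 100 | 87 | 71 | 68 | 28 | 70.8% |
| Qwen 3.5 397B A17B | 100 | 87 | 66 | 46 | 44 | 68.7% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 78 | 37 | 29 | 68.7% |
| Gemini 3 Pro (Preview) | 100 | 100 | 61 | 44 | 31 | 67.2% |
| Z.AI GLM 4.7 Flash | 100 | 87 | 69 | 47 | 32 | 66.9% |
| Claude 3.5 Sonnet | 100 | 89 | 55 | 45 | 45 | 66.7% |
| GPT-5 | 100 | 71 | 65 | 50 | 47 | 66.5% |
| Claude Opus 4.6 | 100 | 94 | 82 | 46 | 0 | 64.4% |
| DeepSeek V3.1 | 100 | 100 | 79 | 29 | 0 | 61.5% |
| o4 Mini High | 100 | 100 | 77 | 18 | 0 | 59.0% |
| Hermes 3 405B | 100 | 100 | 88 | 0 | 0 | 57.5% |
| Llama 3.1 8B | 100 | 79 | 58 | 43 | 0 | 56.1% |
| GPT-4o, May 13th (temp=0) | 100 | 80 | 61 | 37 | 0 | 55.6% |
| GPT-4o Mini (temp=0) | 100 | 81 | 43 | 41 | 0 | 53.0% |
| MoonshotAI: Kimi K2.5 | 100 | 77 | 41 | 37 | 0 | 51.0% |
| Gemini 3 Flash (Preview) | 100 | 62 | 60 | 28 | 0 | 50.1% |
| Grok 4.1 Fast | 87 | 61 | 49 | 0 | 0 | 39.5% |
| GPT-4o, Aug. 6th (temp=0) | 93 | 56 | 42 | 5 | 0 | 39.0% |
| GPT-5 Mini | 50 | 48 | 39 | 24 | 22 | 36.6% |
| ByteDance Seed 1.6 | 62 | 61 | 60 | 0 | 0 | 36.4% |
| Qwen 2.5 72B | 100 | 61 | 0 | 0 | 0 | 32.2% |
| Llama 3.1 70B | 88 | 48 | 17 | 0 | 0 | 30.5% |
| Stealth: Aurora Alpha | 54 | 37 | 23 | 0 | 0 | 22.7% |
| Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Mistral Small 3.2 24B | 65 | 12 | 4 | 1 | 0 | 16.4% |
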
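| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 98 | 99.6% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 92 | 98.3% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 85 | 97.1% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 83 | 82 | 93.1% |
| Ministral 8B | 100 | 100 | 100 | 100 | 48 | 89.7% |
| Gemma 3 27B | 100 | 100 | 98 | 77 | 72 | 89.4% |
| GPT-5 Nano | 100 | 100 | 100 | 76 | 57 | 86.6% |
| Ministral 3 14B | 100 | 100 | 98 | 85 | 43 | 85.2% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| GPT-4.1 | 100 | 89 | 72 | 70 | 54 | 77.1% |
| Claude Opus 4 | 100 | 100 | 80 | 75 | 26 | 76.3% |
| Mistral Medium 3.1 | 100 | 100 | 71 | 63 | 46 | 76.0% |
| Minimax M2.5 | 100 | 100 | 81 | 51 | 40 | 74.5% |
| Hermes 3 70B | 100 | 100 | 100 | 58 | 0 | 71.7% |
| Ministral 3B | 100 | 100 | 100 | 58 | 0 | 71.7% |
| DeepSeek V3.2 | 100 | 100 | 65 | 59 | 34 | 71.6% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 52 | 0 | 70.4% |
| GPT-4.1 Nano | 100 | 100 | 100 | 49 | 0 | 69.8% |
| GPT-5.2 | 100 | 78 | 61 | 57 | 41 | 67.3% |
| o4 Mini High | 100 | 73 | 62 | 57 | 42 | 66.9% |
| Z.AI GLM 5 | 100 | 73 | 66 | 56 | 37 | 66.3% |
| Claude Sonnet 4.5 | 100 | 100 | 61 | 35 | 31 | 65.3% |
| Claude Opus 4.5 | 100 | 97 | 88 | 40 | 0 | 64.8% |
| Qwen 3.5 397B A17B | 92 | 81 | 73 | 49 | 24 | 63.9% |
| o4 Mini | 100 | 68 | 63 | 56 | 30 | 63.5% |
| Claude Opus 4.6 | 100 | 93 | 83 | 30 | 0 | 61.2% |
| Ministral 3 3B | 100 | 100 | 51 | 42 | 1 | 58.9% |
| GPT-5 | 100 | 100 | 31 | 29 | 28 | 57.7% |
| Claude Sonnet 4.6 | 100 | 53 | 48 | 44 | 43 | 57.7% |
| Mistral Small Creative | 100 | 93 | 61 | 33 | 0 | 57.3% |
| Claude 3.5 Sonnet | 100 | 98 | 85 | 0 | 0 | 56.7% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 74 | 0 | 0 | 54.8% |
| Claude Sonnet 4 | 100 | 88 | 43 | 43 | 0 | 54.7% |
| WizardLM 2 8x22b | 100 | 93 | 32 | 32 | 0 | 51.3% |
| Z.AI GLM 4.7 | 72 | 68 | 42 | 35 | 32 | 50.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 50 | 0 | 0 | 50.0% |
| GPT-5 Mini | 68 | 68 | 45 | 41 | 25 | 49.5% |
| Gemma 3 12B | 100 | 100 | 46 | 0 | 0 | 49.3% |
| ByteDance Seed 1.6 Flash | 100 | 78 | 32 | 24 | 0 | 47.0% |
| Cohere Command R+ (Aug. 2024) | 90 | 83 | 61 | 0 | 0 | 46.8% |
| Gemini 2.5 Pro | 100 | 84 | 44 | 0 | 0 | 45.7% |
| Z.AI GLM 4.6 | 77 | 62 | 55 | 35 | 0 | 45.6% |
| Gemini 3 Pro (Preview) | 100 | 88 | 33 | 0 | 0 | 44.1% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 64 | 53 | 0 | 0 | 43.4% |
| GPT-5.1 | 81 | 72 | 30 | 29 | 0 | 42.3% |
| GPT-4o, May 13th (temp=1) | 100 | 57 | 49 | 0 | 0 | 41.3% |
| Llama 3.1 70B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Rocinante 12B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Claude 3.5 Haiku | 100 | 98 | 0 | 0 | 0 | 39.6% |
| GPT-4o Mini (temp=1) | 100 | 93 | 0 | 0 | 0 | 38.5% |
| Llama 3.1 8B | 100 | 88 | 0 | 0 | 0 | 37.5% |
| Gemini 2.5 Flash | 100 | 49 | 37 | 0 | 0 | 37.3% |
| ByteDance Seed 1.6 | 95 | 72 | 0 | 0 | 0 | 33.5% |
| DeepSeek V3.1 | 97 | 31 | 31 | 0 | 0 | 31.6% |
| Mistral NeMO | 100 | 56 | 0 | 0 | 0 | 31.1% |
| Hermes 3 405B | 100 | 55 | 0 | 0 | 0 | 31.0% |
| MoonshotAI: Kimi K2.5 | 89 | 63 | 0 | 0 | 0 | 30.4% |
| GPT-4o, May 13th (temp=0) | 100 | 48 | 0 | 0 | 0 | 29.5% |
| Grok 4.1 Fast | 100 | 43 | 0 | 0 | 0 | 28.5% |
| Gemini 2.5 Flash Lite | 95 | 42 | 0 | 0 | 0 | 27.4% |
| GPT-4.1 Mini | 76 | 61 | 0 | 0 | 0 | 27.3% |
| Qwen 3.5 Plus (2026-02-15) | 76 | 33 | 0 | 0 | 0 | 21.8% |
| Gemini 3 Flash (Preview) | 38 | 35 | 34 | 0 | 0 | 21.3% |
| Z.AI GLM 4.7 Flash | 41 | 31 | 30 | 0 | 0 | 20.5% |
| Claude 3.7 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Grok 4 Fast | 56 | 34 | 0 | 0 | 0 | 17.9% |
| Gemini 3.1 Pro (Preview) | 35 | 32 | 0 | 0 | 0 | 13.5% |
| Qwen 2.5 72B | 53 | 0 | 0 | 0 | 0 | 10.6% |
| Mistral Small 3.2 24B | 50 | 1 | 1 | 0 | 0 | 10.5% |
| DeepSeek V3 (2024-12-26) | 48 | 0 | 0 | 0 | 0 | 9.5% |
| Z.AI GLM 4.5 | 46 | 0 | 0 | 0 | 0 | 9.3% |
| Arcee AI: Trinity Mini | 42 | 0 | 0 | 0 | 0 | 8.4% |
| GPT-4o Mini (temp=0) | 36 | 0 | 0 | 0 | 0 | 7.2% |
| Grok 4 | 34 | 0 | 0 | 0 | 0 | 6.7% |
| Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
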
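| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 94 | 70 | 92.8% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 56 | 91.3% |
| Mistral Large 3 | 100 | 100 | 91 | 86 | 74 | 90.3% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 39 | 87.8% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 97 | 27 | 84.8% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 56 | 49 | 81.1% |
| Ministral 3 3B | 100 | 100 | 100 | 93 | 0 | 78.5% |
| Ministral 3 14B | 100 | 100 | 100 | 91 | 0 | 78.3% |
| Ministral 3B | 100 | 100 | 100 | 65 | 0 | 73.1% |
| Claude Opus 4 | 100 | 93 | 70 | 49 | 40 | 70.6% |
| Minimax M2.5 | 100 | 100 | 100 | 48 | 0 | 69.5% |
| Gemini 2.5 Flash Lite | 100 | 100 | 78 | 63 | 0 | 68.3% |
| Gemini 2.5 Flash | 100 | 100 | 94 | 43 | 0 | 67.3% |
| Z.AI GLM 4.7 | 100 | 69 | 68 | 65 | 33 | 67.1% |
| Ministral 3 8B | 100 | 100 | 91 | 40 | 0 | 66.3% |
| Gemma 3 4B | 100 | 95 | 83 | 46 | 0 | 65.0% |
| Z.AI GLM 5 | 88 | 82 | 76 | 41 | 34 | 64.1% |
| Qwen 3.5 397B A17B | 100 | 100 | 81 | 36 | 0 | 63.3% |
| GPT-5.1 | 97 | 74 | 60 | 43 | 41 | 62.9% |
| Mistral Large 2 | 100 | 100 | 74 | 40 | 0 | 62.8% |
| Gemma 3 27B | 100 | 82 | 43 | 43 | 40 | 61.7% |
| Ministral 8B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 88 | 0 | 0 | 57.5% |
| Z.AI GLM 4.7 Flash | 100 | 80 | 78 | 24 | 0 | 56.6% |
| Hermes 3 70B | 100 | 78 | 55 | 42 | 0 | 54.8% |
| Rocinante 12B | 95 | 78 | 65 | 34 | 0 | 54.4% |
| Claude 3.5 Sonnet | 100 | 100 | 71 | 0 | 0 | 54.2% |
| DeepSeek V3.2 | 100 | 84 | 78 | 0 | 0 | 52.3% |
| GPT-4o, May 13th (temp=0) | 65 | 58 | 49 | 44 | 43 | 52.0% |
| o4 Mini | 100 | 68 | 34 | 33 | 22 | 51.5% |
| GPT-5.2 | 91 | 73 | 35 | 34 | 20 | 50.6% |
| GPT-5 Nano | 88 | 60 | 44 | 44 | 0 | 47.1% |
| o4 Mini High | 100 | 73 | 43 | 19 | 0 | 46.9% |
| GPT-4o, May 13th (temp=1) | 100 | 48 | 46 | 41 | 0 | 46.9% |
| Gemma 3 12B | 100 | 83 | 50 | 0 | 0 | 46.6% |
| Claude Opus 4.5 | 95 | 91 | 46 | 0 | 0 | 46.6% |
| Qwen 3.5 Plus (2026-02-15) | 68 | 60 | 52 | 31 | 22 | 46.4% |
| DeepSeek V3.1 | 100 | 89 | 35 | 0 | 0 | 44.8% |
| Arcee AI: Trinity Mini | 100 | 65 | 55 | 0 | 0 | 44.0% |
| WizardLM 2 8x22b | 97 | 58 | 33 | 23 | 0 | 42.3% |
| Claude Opus 4.6 | 100 | 62 | 43 | 0 | 0 | 41.1% |
| Grok 4.1 Fast | 87 | 53 | 37 | 26 | 0 | 40.5% |
| Claude Sonnet 4.6 | 100 | 52 | 48 | 0 | 0 | 40.1% |
| Gemini 3 Flash (Preview) | 89 | 73 | 37 | 0 | 0 | 39.8% |
| Gemini 3 Pro (Preview) | 100 | 35 | 31 | 28 | 0 | 38.8% |
| GPT-4o, Aug. 6th (temp=0) | 76 | 58 | 56 | 0 | 0 | 38.0% |
| Grok 4 | 100 | 41 | 40 | 0 | 0 | 36.2% |
| Claude Haiku 4.5 | 73 | 64 | 30 | 0 | 0 | 33.5% |
| Z.AI GLM 4.5 | 93 | 72 | 0 | 0 | 0 | 33.0% |
| GPT-5 | 50 | 46 | 37 | 14 | 14 | 32.1% |
| MoonshotAI: Kimi K2.5 | 100 | 58 | 0 | 0 | 0 | 31.7% |
| GPT-4.1 Nano | 100 | 54 | 0 | 0 | 0 | 30.8% |
| GPT-4o Mini (temp=1) | 100 | 50 | 0 | 0 | 0 | 30.0% |
| Claude 3.7 Sonnet | 67 | 39 | 39 | 0 | 0 | 29.1% |
| GPT-4o, Aug. 6th (temp=1) | 74 | 69 | 0 | 0 | 0 | 28.7% |
| ByteDance Seed 1.6 Flash | 100 | 42 | 0 | 0 | 0 | 28.3% |
| Gemini 2.5 Pro | 100 | 40 | 0 | 0 | 0 | 28.0% |
| DeepSeek V3 (2024-12-26) | 100 | 34 | 0 | 0 | 0 | 26.7% |
| GPT-5 Mini | 94 | 37 | 0 | 0 | 0 | 26.2% |
| Z.AI GLM 4.6 | 47 | 38 | 37 | 0 | 0 | 24.5% |
| Claude Sonnet 4 | 68 | 42 | 0 | 0 | 0 | 22.0% |
| GPT-4.1 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Qwen 2.5 72B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Grok 4 Fast | 45 | 41 | 0 | 0 | 0 | 17.1% |
| Gemini 3.1 Pro (Preview) | 38 | 36 | 0 | 0 | 0 | 14.8% |
| Llama 3.1 8B | 41 | 31 | 0 | 0 | 0 | 14.4% |
| GPT-4.1 Mini | 46 | 0 | 0 | 0 | 0 | 9.3% |
| GPT-4o Mini (temp=0) | 44 | 0 | 0 | 0 | 0 | 8.8% |
| Mistral NeMO | 41 | 0 | 0 | 0 | 0 | 8.2% |
| DeepSeek-V2 Chat | 40 | 0 | 0 | 0 | 0 | 7.9% |
| Mistral Small 3.2 24B | 1 | 0 | 0 | 0 | 0 | 0.2% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0% |
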
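| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Writer: Palmyra X5 | 100 | 100 | 100 | 93 | 90 | 96.5% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 82 | 96.5% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 65 | 93.0% |
| Ministral 8B | 100 | 100 | 100 | 82 | 72 | 90.7% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 79 | 66 | 89.1% |
| GPT-5 Nano | 100 | 100 | 100 | 75 | 66 | 88.2% |
| Mistral Large 2 | 100 | 100 | 100 | 71 | 46 | 83.3% |
| Gemma 3 4B | 100 | 100 | 100 | 56 | 52 | 81.7% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Gemini 2.5 Flash Lite | 100 | 91 | 73 | 70 | 48 | 76.6% |
| Rocinante 12B | 100 | 100 | 100 | 79 | 0 | 75.9% |
| Ministral 3 14B | 100 | 100 | 64 | 63 | 50 | 75.3% |
| Gemma 3 27B | 100 | 100 | 100 | 74 | 0 | 74.8% |
| Hermes 3 70B | 100 | 100 | 100 | 74 | 0 | 74.7% |
| Minimax M2.5 | 100 | 100 | 91 | 74 | 0 | 73.1% |
| Claude Opus 4.6 | 100 | 88 | 72 | 68 | 36 | 72.7% |
| Mistral Small Creative | 100 | 100 | 77 | 72 | 0 | 69.7% |
| Mistral Large 3 | 100 | 100 | 100 | 48 | 0 | 69.5% |
| WizardLM 2 8x22b | 100 | 100 | 64 | 35 | 30 | 66.0% |
| Ministral 3 3B | 100 | 100 | 74 | 38 | 17 | 65.8% |
| GPT-4.1 Nano | 100 | 100 | 65 | 62 | 0 | 65.4% |
| Z.AI GLM 5 | 100 | 100 | 80 | 46 | 0 | 65.3% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 64 | 61 | 0 | 64.9% |
| Claude Sonnet 4 | 100 | 97 | 93 | 32 | 0 | 64.4% |
| Ministral 3 8B | 100 | 100 | 100 | 14 | 0 | 62.7% |
| GPT-5 Mini | 100 | 83 | 70 | 28 | 26 | 61.6% |
| ByteDance Seed 1.6 Flash | 100 | 76 | 61 | 37 | 31 | 61.0% |
| GPT-4.1 Mini | 100 | 100 | 65 | 38 | 0 | 60.4% |
| Grok 4 Fast | 100 | 90 | 82 | 28 | 0 | 60.2% |
| o4 Mini | 79 | 71 | 69 | 53 | 28 | 60.1% |
| GPT-5.1 | 95 | 85 | 69 | 25 | 22 | 59.1% |
| Ministral 3B | 100 | 100 | 79 | 16 | 0 | 58.9% |
| Claude Opus 4.5 | 100 | 100 | 50 | 44 | 0 | 58.8% |
| Claude Haiku 4.5 | 100 | 100 | 89 | 0 | 0 | 57.7% |
| Hermes 3 405B | 100 | 69 | 63 | 55 | 0 | 57.4% |
| Claude Sonnet 4.5 | 100 | 100 | 41 | 39 | 0 | 56.0% |
| GPT-4o, May 13th (temp=0) | 100 | 60 | 57 | 44 | 0 | 52.3% |
| Claude 3.5 Sonnet | 100 | 85 | 68 | 0 | 0 | 50.7% |
| Gemini 2.5 Pro | 80 | 72 | 47 | 44 | 0 | 48.8% |
| Z.AI GLM 4.7 Flash | 78 | 72 | 55 | 31 | 0 | 47.3% |
| o4 Mini High | 65 | 63 | 52 | 28 | 27 | 47.2% |
| MoonshotAI: Kimi K2.5 | 100 | 93 | 40 | 0 | 0 | 46.6% |
| Qwen 3.5 397B A17B | 100 | 53 | 42 | 19 | 0 | 42.8% |
| Arcee AI: Trinity Large (Preview) | 100 | 78 | 36 | 0 | 0 | 42.8% |
| Grok 4.1 Fast | 100 | 48 | 41 | 24 | 0 | 42.6% |
| Claude 3 Haiku | 100 | 62 | 48 | 0 | 0 | 41.9% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 52 | 52 | 0 | 0 | 40.8% |
| GPT-4o, May 13th (temp=1) | 100 | 52 | 50 | 0 | 0 | 40.4% |
| Claude Sonnet 4.6 | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 0 | 0 | 0 | 40.0% |
| Mistral NeMO | 100 | 100 | 0 | 0 | 0 | 40.0% |
| DeepSeek V3 (2025-03-24) | 100 | 61 | 39 | 0 | 0 | 40.0% |
| Llama 3.1 8B | 93 | 55 | 51 | 0 | 0 | 39.5% |
| Gemini 3 Pro (Preview) | 56 | 37 | 35 | 34 | 32 | 38.9% |
| Gemini 2.5 Flash | 94 | 48 | 39 | 0 | 0 | 36.1% |
| DeepSeek V3.1 | 95 | 45 | 38 | 0 | 0 | 35.7% |
| Claude 3.7 Sonnet | 100 | 40 | 38 | 0 | 0 | 35.7% |
| GPT-4.1 | 88 | 45 | 42 | 0 | 0 | 35.0% |
| Grok 4 | 100 | 35 | 34 | 0 | 0 | 33.8% |
| Gemini 3 Flash (Preview) | 50 | 47 | 40 | 31 | 0 | 33.4% |
| Arcee AI: Trinity Mini | 63 | 51 | 48 | 0 | 0 | 32.5% |
| GPT-4o Mini (temp=1) | 100 | 54 | 0 | 0 | 0 | 30.8% |
| Z.AI GLM 4.7 | 64 | 45 | 39 | 0 | 0 | 29.6% |
| Z.AI GLM 4.6 | 100 | 48 | 0 | 0 | 0 | 29.5% |
| ByteDance Seed 1.6 | 78 | 67 | 0 | 0 | 0 | 28.8% |
| Cohere Command R+ (Aug. 2024) | 93 | 28 | 0 | 0 | 0 | 24.2% |
| DeepSeek-V2 Chat | 72 | 42 | 0 | 0 | 0 | 22.9% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-5 | 59 | 21 | 19 | 0 | 0 | 19.9% |
| Mistral Small 3.2 24B | 88 | 0 | 0 | 0 | 0 | 17.7% |
| DeepSeek V3.2 | 39 | 32 | 0 | 0 | 0 | 14.2% |
| GPT-5.2 | 48 | 18 | 0 | 0 | 0 | 13.0% |
| DeepSeek V3 (2024-12-26) | 58 | 0 | 0 | 0 | 0 | 11.7% |
| Qwen 2.5 72B | 48 | 0 | 0 | 0 | 0 | 9.7% |
| Stealth: Aurora Alpha | 44 | 0 | 0 | 0 | 0 | 8.9% |
| Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0.0% |
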
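| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 89 | 97.8% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 87 | 97.3% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 84 | 96.7% |
| Ministral 3B | 100 | 100 | 98 | 95 | 85 | 95.7% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 78 | 95.7% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 38 | 87.6% |
| Hermes 3 70B | 100 | 100 | 100 | 69 | 64 | 86.7% |
| GPT-5 Nano | 100 | 89 | 81 | 81 | 76 | 85.3% |
| Qwen 3.5 397B A17B | 100 | 95 | 91 | 71 | 55 | 82.4% |
| Mistral Large 2 | 100 | 100 | 100 | 57 | 43 | 80.2% |
| Mistral Large | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Gemini 2.5 Flash | 100 | 98 | 83 | 53 | 33 | 73.5% |
| Ministral 3 8B | 100 | 100 | 89 | 67 | 0 | 71.4% |
| Gemma 3 27B | 100 | 100 | 85 | 38 | 33 | 71.3% |
| Z.AI GLM 4.5 | 100 | 100 | 72 | 44 | 38 | 71.0% |
| GPT-5.1 | 100 | 85 | 74 | 47 | 44 | 70.0% |
| Minimax M2.5 | 100 | 100 | 100 | 50 | 0 | 70.0% |
| Ministral 3 3B | 100 | 100 | 74 | 72 | 0 | 69.3% |
| Gemma 3 12B | 100 | 100 | 97 | 35 | 0 | 66.4% |
| Gemma 3 4B | 100 | 100 | 46 | 45 | 41 | 66.4% |
| Claude 3 Haiku | 100 | 100 | 78 | 51 | 0 | 65.8% |
| Rocinante 12B | 100 | 100 | 69 | 29 | 26 | 65.0% |
| Claude Sonnet 4.6 | 100 | 100 | 82 | 43 | 0 | 65.0% |
| GPT-4.1 | 91 | 81 | 60 | 56 | 29 | 63.3% |
| Mistral NeMO | 100 | 69 | 58 | 49 | 37 | 62.7% |
| Z.AI GLM 4.7 Flash | 96 | 89 | 84 | 44 | 0 | 62.7% |
| GPT-4.1 Mini | 100 | 88 | 83 | 42 | 0 | 62.6% |
| Claude Haiku 4.5 | 100 | 80 | 69 | 36 | 27 | 62.5% |
| Claude Opus 4.5 | 100 | 69 | 61 | 43 | 37 | 62.0% |
| Claude Opus 4 | 86 | 65 | 63 | 52 | 41 | 61.3% |
| Claude 3.5 Sonnet | 100 | 90 | 63 | 49 | 0 | 60.4% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 0 | 0 | 59.9% |
| Claude Opus 4.6 | 74 | 65 | 64 | 48 | 45 | 59.4% |
| Arcee AI: Trinity Mini | 100 | 71 | 65 | 56 | 0 | 58.4% |
| DeepSeek-V2 Chat | 100 | 100 | 91 | 0 | 0 | 58.3% |
| Gemini 3 Flash (Preview) | 100 | 100 | 60 | 31 | 0 | 58.1% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 85 | 61 | 44 | 0 | 58.0% |
| Z.AI GLM 4.6 | 100 | 87 | 38 | 31 | 31 | 57.5% |
| Gemini 2.5 Pro | 100 | 85 | 70 | 31 | 0 | 57.4% |
| o4 Mini High | 100 | 88 | 59 | 38 | 0 | 57.0% |
| Hermes 3 405B | 100 | 100 | 81 | 0 | 0 | 56.3% |
| WizardLM 2 8x22b | 100 | 80 | 57 | 38 | 0 | 55.0% |
| DeepSeek V3.2 | 100 | 69 | 52 | 29 | 25 | 54.9% |
| Claude Sonnet 4.5 | 100 | 56 | 56 | 31 | 29 | 54.5% |
| Ministral 8B | 100 | 89 | 81 | 0 | 0 | 54.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 64 | 61 | 44 | 0 | 53.8% |
| Z.AI GLM 4.7 | 88 | 75 | 52 | 33 | 21 | 53.8% |
| Claude Sonnet 4 | 80 | 78 | 73 | 37 | 0 | 53.6% |
| GPT-5.2 | 100 | 62 | 35 | 34 | 31 | 52.3% |
| ByteDance Seed 1.6 Flash | 88 | 67 | 56 | 25 | 23 | 51.9% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 54 | 53 | 50 | 0 | 51.3% |
| o4 Mini | 75 | 58 | 45 | 43 | 35 | 51.1% |
| MoonshotAI: Kimi K2.5 | 74 | 49 | 48 | 37 | 35 | 48.6% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 36 | 0 | 0 | 47.2% |
| GPT-4o, May 13th (temp=1) | 100 | 81 | 44 | 0 | 0 | 45.0% |
| GPT-4o Mini (temp=1) | 91 | 82 | 49 | 0 | 0 | 44.5% |
| Qwen 2.5 72B | 83 | 61 | 35 | 34 | 0 | 42.7% |
| Grok 4.1 Fast | 100 | 44 | 41 | 23 | 0 | 41.6% |
| Gemini 2.5 Flash Lite | 80 | 40 | 39 | 37 | 0 | 39.2% |
| Grok 4 | 55 | 39 | 34 | 32 | 27 | 37.3% |
| DeepSeek V3.1 | 80 | 49 | 28 | 23 | 0 | 36.1% |
| Llama 3.1 8B | 100 | 67 | 0 | 0 | 0 | 33.3% |
| GPT-4o Mini (temp=0) | 47 | 40 | 33 | 33 | 0 | 30.6% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 52 | 0 | 0 | 0 | 30.4% |
| ByteDance Seed 1.6 | 76 | 68 | 0 | 0 | 0 | 28.8% |
| GPT-4o, May 13th (temp=0) | 100 | 39 | 0 | 0 | 0 | 27.8% |
| Grok 4 Fast | 83 | 29 | 22 | 0 | 0 | 26.7% |
| GPT-5 | 49 | 27 | 27 | 14 | 9 | 25.3% |
| Gemini 3 Pro (Preview) | 67 | 26 | 20 | 0 | 0 | 22.6% |
| Claude 3.7 Sonnet | 79 | 27 | 0 | 0 | 0 | 21.2% |
| Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Mistral Small 3.2 24B | 79 | 1 | 0 | 0 | 0 | 16.1% |
| Llama 3.1 70B | 67 | 0 | 0 | 0 | 0 | 13.3% |
| Stealth: Aurora Alpha | 35 | 19 | 0 | 0 | 0 | 10.8% |
| Gemini 3.1 Pro (Preview) | 27 | 0 | 0 | 0 | 0 | 5.4% |
| GPT-5 Mini | 19 | 0 | 0 | 0 | 0 | 3.9% |
| Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0% |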

genre

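| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Mistral Medium 3.1 | 100 | 100 | 100 | 84 | 82 | 93.3% |
| Mistral Large 2 | 100 | 100 | 91 | 89 | 68 | 89.6% |
| GPT-5.2 | 100 | 100 | 100 | 93 | 53 | 89.2% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 43 | 88.7% |
| GPT-4.1 | 100 | 100 | 85 | 81 | 37 | 80.7% |
| DeepSeek-V2 Chat | 100 | 100 | 76 | 72 | 52 | 80.1% |
| Minimax M2.5 | 100 | 100 | 78 | 78 | 44 | 80.0% |
| Z.AI GLM 5 | 95 | 89 | 87 | 81 | 45 | 79.4% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 56 | 41 | 79.2% |
| Gemma 3 12B | 100 | 100 | 100 | 45 | 44 | 77.8% |
| GPT-5.1 | 100 | 100 | 68 | 65 | 53 | 77.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 80 | 57 | 44 | 76.3% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 32 | 26 | 71.5% |
| GPT-4.1 Nano | 100 | 100 | 100 | 56 | 0 | 71.1% |
| Ministral 3 14B | 100 | 100 | 68 | 52 | 25 | 69.0% |
| Claude 3.5 Sonnet | 100 | 90 | 79 | 74 | 0 | 68.7% |
| Mistral Large | 100 | 93 | 83 | 39 | 29 | 68.7% |
| GPT-4o Mini (temp=1) | 100 | 100 | 87 | 50 | 0 | 67.3% |
| Gemma 3 4B | 100 | 63 | 57 | 57 | 56 | 66.7% |
| Claude Opus 4.6 | 100 | 93 | 65 | 37 | 34 | 65.8% |
| GPT-5 | 81 | 71 | 71 | 62 | 42 | 65.4% |
| Gemini 2.5 Flash | 100 | 89 | 67 | 33 | 31 | 64.0% |
| Gemma 3 27B | 100 | 89 | 84 | 45 | 0 | 63.7% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 78 | 41 | 0 | 63.6% |
| Hermes 3 405B | 100 | 100 | 67 | 42 | 0 | 61.7% |
| DeepSeek V3.2 | 100 | 100 | 68 | 37 | 0 | 61.1% |
| Claude Sonnet 4.6 | 100 | 84 | 78 | 40 | 0 | 60.6% |
| Mistral Large 3 | 100 | 93 | 59 | 51 | 0 | 60.6% |
| Ministral 8B | 100 | 95 | 64 | 41 | 0 | 60.0% |
| Qwen 3.5 397B A17B | 94 | 78 | 50 | 40 | 36 | 59.6% |
| GPT-5 Mini | 90 | 75 | 74 | 57 | 0 | 59.2% |
| o4 Mini | 100 | 87 | 75 | 32 | 0 | 58.7% |
| Claude Sonnet 4 | 64 | 62 | 57 | 54 | 48 | 56.9% |
| Mistral NeMO | 100 | 75 | 71 | 35 | 0 | 56.3% |
| Llama 3.1 Nemotron 70B | 100 | 95 | 79 | 0 | 0 | 54.9% |
| GPT-4o, May 13th (temp=0) | 100 | 57 | 48 | 35 | 32 | 54.6% |
| Llama 3.1 8B | 100 | 100 | 71 | 0 | 0 | 54.2% |
| Ministral 3B | 100 | 100 | 68 | 0 | 0 | 53.6% |
| Claude Opus 4.5 | 100 | 84 | 42 | 39 | 0 | 53.2% |
| GPT-5 Nano | 100 | 64 | 49 | 33 | 19 | 53.0% |
| Gemini 2.5 Pro | 100 | 100 | 35 | 29 | 0 | 52.8% |
| Claude Sonnet 4.5 | 100 | 100 | 53 | 0 | 0 | 50.6% |
| Claude Haiku 4.5 | 100 | 53 | 52 | 47 | 0 | 50.4% |
| Z.AI GLM 4.5 | 87 | 72 | 51 | 41 | 0 | 49.9% |
| Hermes 3 70B | 74 | 62 | 56 | 55 | 0 | 49.4% |
| Mistral Small Creative | 83 | 54 | 49 | 33 | 23 | 48.2% |
| Claude 3.7 Sonnet | 100 | 70 | 69 | 0 | 0 | 47.9% |
| DeepSeek V3 (2024-12-26) | 100 | 56 | 43 | 41 | 0 | 47.9% |
| o4 Mini High | 100 | 85 | 53 | 0 | 0 | 47.6% |
| Gemini 3.1 Pro (Preview) | 84 | 76 | 38 | 33 | 0 | 46.4% |
| Z.AI GLM 4.6 | 100 | 72 | 46 | 0 | 0 | 43.6% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 56 | 51 | 0 | 0 | 41.4% |
| Z.AI GLM 4.7 | 100 | 44 | 32 | 31 | 0 | 41.3% |
| GPT-4o Mini (temp=0) | 53 | 50 | 48 | 46 | 0 | 39.5% |
| Claude 3 Haiku | 67 | 56 | 51 | 0 | 0 | 34.7% |
| Gemini 2.5 Flash Lite | 100 | 37 | 35 | 0 | 0 | 34.5% |
| Ministral 3 3B | 100 | 60 | 7 | 0 | 0 | 33.4% |
| Gemini 3 Flash (Preview) | 85 | 78 | 0 | 0 | 0 | 32.8% |
| Gemini 3 Pro (Preview) | 68 | 65 | 28 | 0 | 0 | 32.2% |
| ByteDance Seed 1.6 | 85 | 69 | 0 | 0 | 0 | 31.0% |
| Z.AI GLM 4.7 Flash | 64 | 42 | 41 | 0 | 0 | 29.3% |
| Ministral 3 8B | 87 | 45 | 7 | 0 | 0 | 27.7% |
| DeepSeek V3.1 | 54 | 46 | 37 | 0 | 0 | 27.3% |
| Qwen 2.5 72B | 74 | 46 | 0 | 0 | 0 | 23.9% |
| Claude Opus 4 | 44 | 36 | 35 | 0 | 0 | 23.0% |
| GPT-4o, Aug. 6th (temp=0) | 61 | 49 | 0 | 0 | 0 | 21.9% |
| Cohere Command R+ (Aug. 2024) | 62 | 46 | 0 | 0 | 0 | 21.5% |
| Mistral Small 3.2 24B | 100 | 5 | 0 | 0 | 0 | 21.1% |
| Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-4.1 Mini | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Rocinante 12B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Grok 4 Fast | 62 | 36 | 0 | 0 | 0 | 19.7% |
| ByteDance Seed 1.6 Flash | 32 | 31 | 25 | 0 | 0 | 17.6% |
| Arcee AI: Trinity Large (Preview) | 49 | 37 | 0 | 0 | 0 | 17.3% |
| Llama 3.1 70B | 65 | 0 | 0 | 0 | 0 | 13.1% |
| WizardLM 2 8x22b | 65 | 0 | 0 | 0 | 0 | 13.1% |
| Grok 4 | 36 | 0 | 0 | 0 | 0 | 7.2% |
| Grok 4.1 Fast | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0% |
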
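| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 95 | 99.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 95 | 98.9% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 94 | 98.9% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 87 | 97.3% |
| Grok 4 Fast | 100 | 100 | 100 | 95 | 86 | 96.3% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 81 | 96.3% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 81 | 96.3% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 79 | 95.9% |
| GPT-4o Mini (temp=1) | 100 | 100 | 98 | 97 | 79 | 94.8% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 71 | 94.2% |
| GPT-5 Mini | 100 | 100 | 100 | 90 | 71 | 92.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 55 | 90.9% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 48 | 89.6% |
| Z.AI GLM 5 | 100 | 100 | 100 | 75 | 69 | 88.8% |
| Ministral 3 14B | 100 | 100 | 100 | 79 | 64 | 88.7% |
| DeepSeek V3.1 | 100 | 100 | 100 | 69 | 67 | 87.1% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 76 | 60 | 87.1% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 35 | 86.9% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 34 | 86.9% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 34 | 86.8% |
| Mistral NeMO | 100 | 100 | 100 | 88 | 44 | 86.3% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 31 | 86.3% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 31 | 86.1% |
| Z.AI GLM 4.7 | 100 | 97 | 87 | 85 | 60 | 85.8% |
| o4 Mini High | 100 | 100 | 81 | 80 | 68 | 85.7% |
| Mistral Large | 100 | 100 | 100 | 100 | 26 | 85.3% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 93 | 32 | 85.1% |
| Ministral 3 3B | 100 | 100 | 100 | 94 | 23 | 83.4% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 74 | 39 | 82.6% |
| Qwen 3.5 397B A17B | 100 | 99 | 95 | 79 | 38 | 82.2% |
| GPT-5 Nano | 100 | 99 | 88 | 81 | 41 | 81.7% |
| Mistral Medium 3.1 | 100 | 100 | 92 | 69 | 44 | 81.1% |
| GPT-4.1 Mini | 100 | 100 | 89 | 72 | 42 | 80.7% |
| o4 Mini | 100 | 100 | 80 | 70 | 52 | 80.3% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 0 | 80.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 71 | 65 | 63 | 79.9% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 93 | 0 | 78.5% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 56 | 24 | 76.0% |
| Claude 3.5 Sonnet | 100 | 100 | 61 | 60 | 58 | 75.7% |
| Gemini 3 Flash (Preview) | 87 | 86 | 84 | 58 | 56 | 74.2% |
| Claude Opus 4 | 100 | 98 | 88 | 74 | 0 | 72.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 85 | 74 | 0 | 71.7% |
| Grok 4 | 100 | 100 | 100 | 57 | 0 | 71.5% |
| Grok 4.1 Fast | 100 | 100 | 70 | 59 | 23 | 70.3% |
| Claude 3 Haiku | 100 | 100 | 82 | 56 | 0 | 67.6% |
| Rocinante 12B | 100 | 100 | 100 | 37 | 0 | 67.4% |
| Gemini 3.1 Pro (Preview) | 100 | 85 | 78 | 43 | 23 | 65.9% |
| Gemini 3 Pro (Preview) | 100 | 82 | 63 | 47 | 37 | 65.9% |
| Ministral 8B | 100 | 100 | 78 | 47 | 0 | 64.9% |
| Arcee AI: Trinity Mini | 100 | 100 | 91 | 31 | 0 | 64.6% |
| GPT-5 | 98 | 60 | 52 | 49 | 43 | 60.5% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 74 | 52 | 41 | 32 | 59.8% |
| Hermes 3 405B | 100 | 79 | 60 | 57 | 0 | 59.3% |
| Claude Sonnet 4.6 | 100 | 84 | 63 | 43 | 0 | 58.0% |
| Claude 3.7 Sonnet | 100 | 63 | 58 | 56 | 0 | 55.6% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 56 | 0 | 0 | 51.1% |
| Arcee AI: Trinity Large (Preview) | 100 | 83 | 71 | 0 | 0 | 50.9% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 50 | 0 | 0 | 49.9% |
| Llama 3.1 8B | 100 | 91 | 56 | 0 | 0 | 49.4% |
| Ministral 3B | 100 | 68 | 48 | 0 | 0 | 43.2% |
| Llama 3.1 70B | 74 | 67 | 62 | 0 | 0 | 40.5% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 94 | 0 | 0 | 0 | 38.8% |
| ByteDance Seed 1.6 | 100 | 91 | 0 | 0 | 0 | 38.3% |
| GPT-4o, May 13th (temp=0) | 100 | 77 | 0 | 0 | 0 | 35.3% |
| Mistral Small 3.2 24B | 100 | 43 | 28 | 5 | 0 | 35.1% |
| Stealth: Aurora Alpha | 68 | 52 | 25 | 21 | 0 | 33.1% |
| WizardLM 2 8x22b | 58 | 56 | 0 | 0 | 0 | 23.0% |
| Qwen 2.5 72B | 88 | 0 | 0 | 0 | 0 | 17.5% |
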
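| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 95 | 99.0% |
| Ministral 3B | 100 | 100 | 100 | 98 | 81 | 95.9% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 67 | 93.3% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 58 | 91.7% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 93 | 56 | 89.6% |
| GPT-5.1 | 100 | 100 | 94 | 93 | 60 | 89.3% |
| Gemma 3 4B | 100 | 100 | 100 | 93 | 51 | 88.8% |
| Ministral 3 14B | 100 | 100 | 100 | 87 | 56 | 88.6% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 36 | 87.2% |
| Claude Haiku 4.5 | 100 | 98 | 95 | 91 | 46 | 86.1% |
| GPT-5.2 | 100 | 93 | 84 | 79 | 68 | 84.9% |
| o4 Mini | 100 | 100 | 100 | 68 | 33 | 80.3% |
| GPT-5 Nano | 100 | 100 | 95 | 67 | 20 | 76.5% |
| Minimax M2.5 | 100 | 100 | 93 | 40 | 39 | 74.6% |
| GPT-4.1 | 100 | 82 | 76 | 75 | 37 | 74.0% |
| Ministral 8B | 100 | 100 | 100 | 34 | 32 | 73.2% |
| Claude Opus 4.6 | 100 | 100 | 70 | 60 | 32 | 72.5% |
| Llama 3.1 70B | 100 | 100 | 100 | 60 | 0 | 71.8% |
| Claude Opus 4 | 100 | 88 | 78 | 48 | 44 | 71.7% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 93 | 62 | 0 | 70.9% |
| Claude Sonnet 4 | 91 | 88 | 87 | 48 | 38 | 70.4% |
| o4 Mini High | 100 | 78 | 67 | 66 | 34 | 69.2% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 44 | 0 | 68.8% |
| Mistral Large 3 | 100 | 100 | 79 | 46 | 0 | 65.0% |
| Claude Sonnet 4.5 | 100 | 89 | 84 | 51 | 0 | 64.9% |
| Claude 3 Haiku | 100 | 100 | 67 | 56 | 0 | 64.4% |
| DeepSeek-V2 Chat | 100 | 80 | 55 | 52 | 33 | 64.1% |
| Ministral 3 8B | 100 | 99 | 96 | 20 | 0 | 63.1% |
| Claude 3.5 Haiku | 100 | 74 | 69 | 64 | 0 | 61.5% |
| Rocinante 12B | 100 | 100 | 55 | 49 | 0 | 60.8% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 37 | 32 | 32 | 60.1% |
| Gemini 2.5 Flash Lite | 100 | 96 | 42 | 34 | 28 | 60.1% |
| GPT-4o, May 13th (temp=1) | 97 | 90 | 58 | 53 | 0 | 59.6% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 56 | 40 | 0 | 59.3% |
| Arcee AI: Trinity Large (Preview) | 100 | 85 | 60 | 43 | 0 | 57.5% |
| Z.AI GLM 5 | 100 | 90 | 60 | 34 | 0 | 56.7% |
| Claude Opus 4.5 | 88 | 88 | 67 | 40 | 0 | 56.4% |
| Gemma 3 12B | 100 | 78 | 67 | 37 | 0 | 56.2% |
| GPT-5 | 100 | 71 | 45 | 44 | 18 | 55.6% |
| Mistral NeMO | 100 | 100 | 72 | 0 | 0 | 54.5% |
| Claude Sonnet 4.6 | 100 | 81 | 52 | 37 | 0 | 54.1% |
| Z.AI GLM 4.7 Flash | 98 | 69 | 35 | 33 | 32 | 53.2% |
| Gemini 2.5 Flash | 100 | 78 | 56 | 31 | 0 | 53.1% |
| Ministral 3 3B | 100 | 61 | 56 | 41 | 0 | 51.7% |
| Gemini 3.1 Pro (Preview) | 100 | 72 | 53 | 29 | 0 | 50.9% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 51 | 47 | 44 | 0 | 48.4% |
| Hermes 3 405B | 100 | 76 | 62 | 0 | 0 | 47.5% |
| Gemini 3 Pro (Preview) | 93 | 62 | 46 | 35 | 0 | 47.1% |
| DeepSeek V3.2 | 100 | 98 | 36 | 0 | 0 | 46.8% |
| MoonshotAI: Kimi K2.5 | 100 | 81 | 48 | 0 | 0 | 45.8% |
| GPT-4.1 Nano | 100 | 60 | 55 | 0 | 0 | 42.8% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 57 | 56 | 0 | 0 | 42.8% |
| GPT-4o Mini (temp=1) | 52 | 50 | 49 | 45 | 0 | 39.2% |
| Z.AI GLM 4.6 | 85 | 42 | 37 | 28 | 0 | 38.5% |
| Qwen 3.5 397B A17B | 68 | 52 | 36 | 33 | 0 | 37.8% |
| DeepSeek V3.1 | 99 | 56 | 29 | 0 | 0 | 36.8% |
| Mistral Small 3.2 24B | 100 | 78 | 0 | 0 | 0 | 35.6% |
| Gemma 3 27B | 71 | 66 | 41 | 0 | 0 | 35.6% |
| Claude 3.5 Sonnet | 100 | 76 | 0 | 0 | 0 | 35.2% |
| Llama 3.1 Nemotron 70B | 100 | 76 | 0 | 0 | 0 | 35.2% |
| Gemini 3 Flash (Preview) | 91 | 40 | 39 | 0 | 0 | 34.0% |
| GPT-5 Mini | 100 | 23 | 22 | 22 | 0 | 33.4% |
| GPT-4o Mini (temp=0) | 60 | 51 | 48 | 0 | 0 | 31.5% |
| Grok 4 | 72 | 66 | 0 | 0 | 0 | 27.7% |
| Grok 4 Fast | 65 | 38 | 31 | 0 | 0 | 26.7% |
| GPT-4.1 Mini | 67 | 61 | 0 | 0 | 0 | 25.5% |
| Grok 4.1 Fast | 59 | 35 | 31 | 0 | 0 | 25.1% |
| Z.AI GLM 4.5 | 78 | 46 | 0 | 0 | 0 | 24.6% |
| WizardLM 2 8x22b | 72 | 48 | 0 | 0 | 0 | 24.2% |
| Qwen 2.5 72B | 64 | 51 | 0 | 0 | 0 | 22.9% |
| Claude 3.7 Sonnet | 38 | 38 | 34 | 0 | 0 | 22.0% |
| Z.AI GLM 4.7 | 35 | 35 | 31 | 0 | 0 | 20.3% |
| Hermes 3 70B | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Llama 3.1 8B | 98 | 0 | 0 | 0 | 0 | 19.6% |
| GPT-4o, Aug. 6th (temp=0) | 44 | 40 | 0 | 0 | 0 | 16.9% |
| Cohere Command R+ (Aug. 2024) | 64 | 0 | 0 | 0 | 0 | 12.8% |
| Arcee AI: Trinity Mini | 52 | 0 | 0 | 0 | 0 | 10.4% |
| Stealth: Aurora Alpha | 33 | 0 | 0 | 0 | 0 | 6.7% |
| ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0% |
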
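| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 77 | 95.3% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 74 | 94.8% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 67 | 93.3% |
| Mistral Medium 3.1 | 100 | 100 | 92 | 88 | 67 | 89.5% |
| Mistral Small Creative | 100 | 91 | 90 | 89 | 76 | 89.1% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 93 | 38 | 86.1% |
| Mistral Large 3 | 100 | 100 | 100 | 81 | 44 | 85.1% |
| Qwen 3.5 397B A17B | 100 | 100 | 77 | 63 | 62 | 80.4% |
| Claude Sonnet 4.6 | 100 | 100 | 97 | 53 | 42 | 78.2% |
| GPT-4.1 Nano | 100 | 91 | 69 | 64 | 57 | 76.5% |
| Claude Opus 4 | 100 | 100 | 100 | 82 | 0 | 76.5% |
| Mistral Large 2 | 100 | 100 | 95 | 85 | 0 | 76.1% |
| GPT-5.1 | 100 | 88 | 67 | 63 | 57 | 75.1% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 75 | 0 | 75.0% |
| Gemma 3 12B | 100 | 100 | 93 | 41 | 38 | 74.2% |
| DeepSeek-V2 Chat | 100 | 93 | 68 | 62 | 41 | 72.6% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 81 | 72 | 0 | 70.8% |
| Ministral 8B | 100 | 100 | 90 | 46 | 0 | 67.2% |
| ByteDance Seed 1.6 Flash | 100 | 90 | 65 | 49 | 21 | 65.0% |
| Minimax M2.5 | 100 | 97 | 83 | 41 | 0 | 64.1% |
| Gemini 3.1 Pro (Preview) | 100 | 94 | 85 | 34 | 0 | 62.7% |
| Ministral 3 3B | 100 | 84 | 81 | 37 | 4 | 61.2% |
| Mistral Large | 100 | 72 | 50 | 44 | 35 | 60.3% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Claude 3 Haiku | 98 | 74 | 62 | 60 | 0 | 58.7% |
| Rocinante 12B | 100 | 95 | 76 | 0 | 0 | 54.2% |
| Ministral 3B | 100 | 62 | 60 | 46 | 0 | 53.5% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 85 | 42 | 39 | 0 | 53.3% |
| Z.AI GLM 4.7 Flash | 69 | 67 | 60 | 37 | 29 | 52.7% |
| GPT-5 | 100 | 85 | 39 | 38 | 0 | 52.5% |
| Gemma 3 4B | 58 | 52 | 51 | 49 | 48 | 51.8% |
| Z.AI GLM 4.7 | 95 | 71 | 56 | 31 | 0 | 50.5% |
| GPT-5.2 | 100 | 67 | 45 | 17 | 16 | 49.1% |
| Mistral NeMO | 100 | 76 | 63 | 0 | 0 | 47.7% |
| o4 Mini | 100 | 100 | 36 | 0 | 0 | 47.2% |
| Claude Opus 4.6 | 84 | 56 | 32 | 31 | 29 | 46.6% |
| Cohere Command R+ (Aug. 2024) | 93 | 85 | 51 | 0 | 0 | 45.9% |
| Gemini 2.5 Pro | 100 | 33 | 31 | 31 | 30 | 45.0% |
| Gemini 3 Pro (Preview) | 82 | 78 | 62 | 0 | 0 | 44.4% |
| GPT-4o, May 13th (temp=1) | 91 | 78 | 51 | 0 | 0 | 44.2% |
| Claude Opus 4.5 | 100 | 73 | 36 | 0 | 0 | 41.9% |
| GPT-4.1 Mini | 100 | 56 | 51 | 0 | 0 | 41.4% |
| Z.AI GLM 5 | 100 | 69 | 31 | 0 | 0 | 40.0% |
| o4 Mini High | 100 | 67 | 29 | 0 | 0 | 39.2% |
| Ministral 3 8B | 100 | 57 | 36 | 1 | 0 | 38.9% |
| Hermes 3 405B | 100 | 90 | 0 | 0 | 0 | 38.0% |
| MoonshotAI: Kimi K2.5 | 90 | 58 | 40 | 0 | 0 | 37.7% |
| Gemma 3 27B | 100 | 39 | 36 | 0 | 0 | 35.0% |
| GPT-4o Mini (temp=1) | 57 | 56 | 55 | 0 | 0 | 33.5% |
| DeepSeek V3.1 | 77 | 44 | 39 | 0 | 0 | 31.9% |
| GPT-4.1 | 78 | 40 | 34 | 0 | 0 | 30.4% |
| DeepSeek V3.2 | 60 | 58 | 33 | 0 | 0 | 30.4% |
| Claude 3.7 Sonnet | 43 | 37 | 35 | 35 | 0 | 30.1% |
| Gemini 2.5 Flash | 44 | 33 | 31 | 30 | 0 | 27.6% |
| Grok 4 | 69 | 68 | 0 | 0 | 0 | 27.5% |
| Grok 4 Fast | 48 | 44 | 37 | 0 | 0 | 25.8% |
| Claude Sonnet 4 | 62 | 61 | 0 | 0 | 0 | 24.5% |
| GPT-5 Nano | 57 | 49 | 15 | 0 | 0 | 24.2% |
| Llama 3.1 8B | 79 | 41 | 0 | 0 | 0 | 24.0% |
| DeepSeek V3 (2024-12-26) | 83 | 33 | 0 | 0 | 0 | 23.3% |
| GPT-4o, Aug. 6th (temp=0) | 60 | 54 | 0 | 0 | 0 | 22.7% |
| Z.AI GLM 4.6 | 41 | 37 | 35 | 0 | 0 | 22.5% |
| Claude Haiku 4.5 | 56 | 51 | 0 | 0 | 0 | 21.4% |
| Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Arcee AI: Trinity Mini | 98 | 0 | 0 | 0 | 0 | 19.6% |
| Arcee AI: Trinity Large (Preview) | 94 | 0 | 0 | 0 | 0 | 18.8% |
| Llama 3.1 Nemotron 70B | 90 | 0 | 0 | 0 | 0 | 18.0% |
| Gemini 2.5 Flash Lite | 42 | 36 | 0 | 0 | 0 | 15.6% |
| Llama 3.1 70B | 56 | 0 | 0 | 0 | 0 | 11.3% |
| GPT-4o, May 13th (temp=0) | 56 | 0 | 0 | 0 | 0 | 11.1% |
| ByteDance Seed 1.6 | 50 | 0 | 0 | 0 | 0 | 10.0% |
| Z.AI GLM 4.5 | 43 | 0 | 0 | 0 | 0 | 8.5% |
| Grok 4.1 Fast | 26 | 0 | 0 | 0 | 0 | 5.3% |
| GPT-5 Mini | 21 | 0 | 0 | 0 | 0 | 4.3% |
| Mistral Small 3.2 24B | 3 | 0 | 0 | 0 | 0 | 0.5% |
| Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0% |
| GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Hermes 3 70B | 0 | 0 | 0 | 0 | 0 | 0.0% |
| WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0.0% |
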
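| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| GPT-4.1 | 100 | 100 | 100 | 89 | 65 | 90.7% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 43 | 88.5% |
| GPT-5.1 | 100 | 100 | 100 | 79 | 62 | 88.3% |
| Mistral Large | 100 | 100 | 100 | 100 | 34 | 86.7% |
| Mistral Medium 3.1 | 100 | 100 | 83 | 74 | 62 | 83.7% |
| Gemini 3 Flash (Preview) | 100 | 100 | 90 | 83 | 38 | 82.3% |
| Mistral Small Creative | 100 | 100 | 96 | 62 | 43 | 80.2% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 0 | 80.0% |
| Claude Opus 4 | 100 | 88 | 73 | 72 | 64 | 79.4% |
| Claude Opus 4.6 | 100 | 100 | 98 | 79 | 0 | 75.5% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 89 | 45 | 41 | 75.0% |
| Minimax M2.5 | 100 | 100 | 94 | 46 | 33 | 74.5% |
| GPT-5 Mini | 100 | 100 | 89 | 47 | 37 | 74.5% |
| Ministral 3 14B | 100 | 100 | 71 | 57 | 44 | 74.5% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 69 | 0 | 73.8% |
| Mistral Large 2 | 100 | 100 | 100 | 44 | 25 | 73.8% |
| Claude 3.5 Sonnet | 100 | 76 | 71 | 62 | 60 | 73.6% |
| DeepSeek V3 (2024-12-26) | 100 | 84 | 81 | 52 | 49 | 73.4% |
| Ministral 3 3B | 100 | 100 | 61 | 53 | 46 | 71.8% |
| GPT-5.2 | 100 | 82 | 70 | 54 | 51 | 71.4% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 56 | 51 | 48 | 70.9% |
| Claude 3 Haiku | 100 | 100 | 100 | 51 | 0 | 70.3% |
| o4 Mini | 100 | 100 | 81 | 66 | 0 | 69.5% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 41 | 0 | 68.1% |
| Z.AI GLM 4.6 | 100 | 78 | 70 | 55 | 35 | 67.7% |
| Ministral 3 8B | 96 | 96 | 77 | 61 | 6 | 67.1% |
| o4 Mini High | 100 | 100 | 69 | 65 | 0 | 67.0% |
| GPT-5 Nano | 100 | 79 | 57 | 55 | 22 | 62.7% |
| Z.AI GLM 5 | 100 | 100 | 73 | 37 | 0 | 62.1% |
| Gemma 3 27B | 78 | 75 | 68 | 45 | 42 | 61.7% |
| DeepSeek V3 (2025-03-24) | 100 | 90 | 63 | 56 | 0 | 61.7% |
| Gemini 2.5 Pro | 100 | 82 | 62 | 58 | 0 | 60.4% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Rocinante 12B | 100 | 100 | 100 | 0 | 0 | 60.0% |
| Claude Opus 4.5 | 100 | 100 | 44 | 42 | 0 | 57.2% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 51 | 34 | 0 | 56.9% |
| GPT-5 | 100 | 55 | 50 | 41 | 36 | 56.4% |
| Qwen 3.5 397B A17B | 100 | 64 | 63 | 51 | 0 | 55.7% |
| Ministral 3B | 100 | 100 | 65 | 0 | 0 | 53.1% |
| Z.AI GLM 4.7 | 64 | 63 | 57 | 52 | 27 | 52.9% |
| Claude 3.7 Sonnet | 90 | 74 | 62 | 37 | 0 | 52.6% |
| GPT-4.1 Mini | 100 | 100 | 63 | 0 | 0 | 52.6% |
| GPT-4.1 Nano | 100 | 100 | 63 | 0 | 0 | 52.6% |
| DeepSeek V3.2 | 77 | 76 | 76 | 32 | 0 | 52.0% |
| Claude Haiku 4.5 | 100 | 55 | 51 | 51 | 0 | 51.4% |
| Gemini 3 Pro (Preview) | 65 | 54 | 52 | 44 | 34 | 50.0% |
| Grok 4 Fast | 98 | 69 | 42 | 36 | 0 | 49.1% |
| DeepSeek V3.1 | 100 | 72 | 38 | 35 | 0 | 49.0% |
| Gemma 3 4B | 100 | 95 | 48 | 0 | 0 | 48.6% |
| Grok 4.1 Fast | 95 | 89 | 30 | 26 | 0 | 48.0% |
| Claude Sonnet 4 | 100 | 100 | 40 | 0 | 0 | 47.9% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 93 | 46 | 0 | 0 | 47.7% |
| Llama 3.1 8B | 100 | 100 | 19 | 0 | 0 | 43.8% |
| ByteDance Seed 1.6 Flash | 100 | 80 | 36 | 0 | 0 | 43.3% |
| Z.AI GLM 4.7 Flash | 78 | 72 | 30 | 26 | 0 | 41.2% |
| Grok 4 | 91 | 42 | 37 | 32 | 0 | 40.3% |
| GPT-4o Mini (temp=1) | 100 | 51 | 44 | 0 | 0 | 39.0% |
| GPT-4o, May 13th (temp=0) | 100 | 46 | 45 | 0 | 0 | 38.1% |
| Arcee AI: Trinity Mini | 100 | 90 | 0 | 0 | 0 | 38.0% |
| DeepSeek-V2 Chat | 93 | 44 | 39 | 0 | 0 | 35.2% |
| Llama 3.1 70B | 100 | 72 | 0 | 0 | 0 | 34.5% |
| Gemini 2.5 Flash | 84 | 56 | 26 | 0 | 0 | 33.4% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 65 | 0 | 0 | 0 | 33.1% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 65 | 0 | 0 | 0 | 33.1% |
| Gemini 2.5 Flash Lite | 96 | 40 | 28 | 0 | 0 | 32.8% |
| Llama 3.1 Nemotron 70B | 85 | 76 | 0 | 0 | 0 | 32.2% |
| Hermes 3 70B | 74 | 71 | 0 | 0 | 0 | 29.0% |
| Qwen 2.5 72B | 53 | 48 | 44 | 0 | 0 | 29.0% |
| Stealth: Aurora Alpha | 100 | 43 | 0 | 0 | 0 | 28.5% |
| Ministral 8B | 100 | 41 | 0 | 0 | 0 | 28.2% |
| MoonshotAI: Kimi K2.5 | 100 | 36 | 0 | 0 | 0 | 27.2% |
| Hermes 3 405B | 78 | 51 | 0 | 0 | 0 | 25.8% |
| Z.AI GLM 4.5 | 88 | 40 | 0 | 0 | 0 | 25.6% |
| WizardLM 2 8x22b | 69 | 54 | 0 | 0 | 0 | 24.6% |
| Claude Sonnet 4.6 | 100 | 0 | 0 | 0 | 0 | 20.0% |
| Mistral NeMO | 100 | 0 | 0 | 0 | 0 | 20.0% |
| GPT-4o Mini (temp=0) | 93 | 0 | 0 | 0 | 0 | 18.5% |
| Gemma 3 12B | 93 | 0 | 0 | 0 | 0 | 18.5% |
| ByteDance Seed 1.6 | 65 | 0 | 0 | 0 | 0 | 13.1% |
| Mistral Small 3.2 24B | 45 | 0 | 0 | 0 | 0 | 9.0% |
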
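
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 85 | 97.1%
GPT-5.2 | 100 | 100 | 100 | 89 | 77 | 93.2%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 88 | 71 | 91.7%
Claude Haiku 4.5 | 100 | 100 | 100 | 94 | 64 | 91.6%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 56 | 91.1%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 49 | 89.8%
Gemma 3 4B | 100 | 100 | 100 | 90 | 58 | 89.7%
Mistral Large 3 | 100 | 100 | 100 | 79 | 57 | 87.4%
Mistral NeMO | 100 | 100 | 100 | 93 | 39 | 86.3%
Mistral Small Creative | 100 | 100 | 100 | 80 | 51 | 86.3%
Claude 3.5 Sonnet | 100 | 100 | 100 | 62 | 58 | 84.0%
Ministral 3 8B | 100 | 100 | 94 | 78 | 39 | 82.1%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek-V2 Chat | 100 | 98 | 78 | 71 | 52 | 79.7%
GPT-5.1 | 100 | 100 | 100 | 68 | 27 | 79.1%
Claude Sonnet 4.5 | 100 | 100 | 100 | 89 | 0 | 77.8%
Gemma 3 27B | 100 | 100 | 92 | 63 | 31 | 77.2%
Minimax M2.5 | 100 | 100 | 100 | 41 | 39 | 76.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 62 | 0 | 72.3%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 52 | 0 | 70.4%
Ministral 3 3B | 100 | 100 | 90 | 37 | 23 | 70.1%
GPT-4.1 | 100 | 100 | 91 | 60 | 0 | 70.1%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 44 | 0 | 68.8%
Claude Sonnet 4.6 | 100 | 100 | 94 | 44 | 0 | 67.8%
Ministral 8B | 100 | 100 | 69 | 34 | 31 | 66.7%
Qwen 3.5 397B A17B | 89 | 78 | 63 | 54 | 45 | 66.0%
Ministral 3B | 95 | 79 | 65 | 56 | 33 | 65.7%
Ministral 3 14B | 100 | 100 | 52 | 42 | 32 | 65.1%
Mistral Large | 100 | 87 | 81 | 56 | 0 | 64.7%
Rocinante 12B | 100 | 83 | 72 | 49 | 0 | 60.9%
MoonshotAI: Kimi K2.5 | 100 | 100 | 61 | 37 | 0 | 59.4%
o4 Mini High | 100 | 81 | 68 | 24 | 23 | 59.3%
DeepSeek V3.2 | 86 | 78 | 56 | 50 | 25 | 59.0%
Claude Opus 4 | 100 | 87 | 72 | 33 | 0 | 58.4%
Z.AI GLM 5 | 100 | 90 | 62 | 38 | 0 | 58.0%
Claude 3.5 Haiku | 100 | 100 | 90 | 0 | 0 | 58.0%
Gemini 2.5 Flash | 100 | 100 | 49 | 38 | 0 | 57.5%
Z.AI GLM 4.7 Flash | 100 | 69 | 54 | 35 | 28 | 57.2%
Claude Opus 4.5 | 90 | 86 | 56 | 32 | 0 | 52.8%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 60 | 0 | 0 | 51.9%
DeepSeek V3 (2024-12-26) | 100 | 94 | 56 | 0 | 0 | 50.2%
ByteDance Seed 1.6 Flash | 100 | 60 | 51 | 35 | 0 | 49.0%
Mistral Large 2 | 100 | 100 | 42 | 0 | 0 | 48.3%
GPT-5 Mini | 66 | 55 | 53 | 44 | 21 | 47.7%
Claude Opus 4.6 | 81 | 60 | 56 | 34 | 0 | 46.2%
Z.AI GLM 4.5 | 100 | 85 | 42 | 0 | 0 | 45.3%
Hermes 3 70B | 100 | 67 | 57 | 0 | 0 | 44.8%
GPT-5 | 91 | 45 | 43 | 24 | 16 | 43.7%
Llama 3.1 70B | 72 | 71 | 71 | 0 | 0 | 42.9%
Claude 3 Haiku | 100 | 71 | 39 | 0 | 0 | 41.9%
Claude 3.7 Sonnet | 72 | 40 | 37 | 29 | 29 | 41.5%
DeepSeek V3.1 | 77 | 48 | 31 | 28 | 0 | 36.8%
Gemini 3 Flash (Preview) | 81 | 65 | 35 | 0 | 0 | 36.3%
GPT-4o, May 13th (temp=1) | 100 | 43 | 37 | 0 | 0 | 36.1%
Qwen 3.5 Plus (2026-02-15) | 65 | 39 | 38 | 37 | 0 | 35.7%
Gemini 3.1 Pro (Preview) | 57 | 55 | 32 | 29 | 0 | 34.7%
Gemma 3 12B | 100 | 73 | 0 | 0 | 0 | 34.7%
Grok 4.1 Fast | 78 | 63 | 31 | 0 | 0 | 34.4%
Claude Sonnet 4 | 60 | 56 | 48 | 0 | 0 | 32.5%
Hermes 3 405B | 85 | 72 | 0 | 0 | 0 | 31.6%
GPT-4o Mini (temp=1) | 56 | 53 | 46 | 0 | 0 | 31.0%
o4 Mini | 68 | 52 | 33 | 0 | 0 | 30.6%
Qwen 2.5 72B | 82 | 35 | 31 | 0 | 0 | 29.7%
Z.AI GLM 4.7 | 49 | 48 | 25 | 24 | 0 | 29.3%
Arcee AI: Trinity Mini | 79 | 67 | 0 | 0 | 0 | 29.2%
Gemini 3 Pro (Preview) | 54 | 29 | 28 | 26 | 0 | 27.5%
Z.AI GLM 4.6 | 37 | 35 | 32 | 27 | 0 | 26.2%
GPT-5 Nano | 74 | 26 | 14 | 13 | 0 | 25.4%
GPT-4o, Aug. 6th (temp=0) | 54 | 48 | 0 | 0 | 0 | 20.4%
WizardLM 2 8x22b | 100 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o Mini (temp=0) | 51 | 42 | 0 | 0 | 0 | 18.5%
Llama 3.1 Nemotron 70B | 88 | 0 | 0 | 0 | 0 | 17.5%
GPT-4o, May 13th (temp=0) | 36 | 36 | 0 | 0 | 0 | 14.5%
Gemini 2.5 Pro | 52 | 0 | 0 | 0 | 0 | 10.5%
Grok 4 | 25 | 25 | 0 | 0 | 0 | 10.0%
Cohere Command R+ (Aug. 2024) | 46 | 0 | 0 | 0 | 0 | 9.1%
Grok 4 Fast | 27 | 0 | 0 | 0 | 0 | 5.4%
Mistral Small 3.2 24B | 6 | 3 | 0 | 0 | 0 | 1.9%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%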

Novelcrafter Default Prompt

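Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 72 | 94.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 54 | 90.8%
Mistral Small Creative | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 5 | 100 | 100 | 100 | 85 | 0 | 77.1%
Claude Opus 4.5 | 100 | 100 | 81 | 49 | 42 | 74.5%
Claude Sonnet 4.5 | 100 | 100 | 88 | 60 | 0 | 69.4%
Minimax M2.5 | 100 | 100 | 83 | 60 | 0 | 68.6%
Claude Opus 4 | 100 | 90 | 89 | 61 | 0 | 68.0%
Ministral 3 14B | 100 | 85 | 58 | 56 | 40 | 67.8%
Mistral Large | 100 | 90 | 89 | 56 | 0 | 67.1%
Gemma 3 27B | 97 | 90 | 52 | 52 | 43 | 66.7%
Claude Sonnet 4 | 100 | 100 | 68 | 64 | 0 | 66.4%
Claude Opus 4.6 | 100 | 100 | 67 | 59 | 0 | 65.3%
GPT-5.1 | 100 | 100 | 47 | 40 | 39 | 64.9%
Ministral 3 3B | 93 | 88 | 72 | 68 | 0 | 64.2%
Mistral Large 3 | 100 | 100 | 47 | 42 | 30 | 63.7%
GPT-4.1 Mini | 100 | 94 | 63 | 56 | 0 | 62.7%
o4 Mini High | 100 | 84 | 57 | 46 | 26 | 62.6%
Mistral Large 2 | 100 | 100 | 76 | 35 | 0 | 62.2%
Gemini 2.5 Flash Lite | 95 | 58 | 56 | 52 | 37 | 59.8%
GPT-4o Mini (temp=1) | 100 | 95 | 51 | 51 | 0 | 59.2%
Ministral 8B | 100 | 100 | 56 | 40 | 0 | 59.2%
Gemma 3 4B | 100 | 53 | 50 | 48 | 42 | 58.5%
Claude 3 Haiku | 100 | 76 | 63 | 51 | 0 | 57.8%
WizardLM 2 8x22b | 100 | 89 | 61 | 34 | 0 | 56.7%
Gemma 3 12B | 100 | 100 | 41 | 40 | 0 | 56.1%
Claude 3.7 Sonnet | 100 | 100 | 42 | 36 | 0 | 55.6%
GPT-5.2 | 100 | 80 | 40 | 34 | 19 | 54.6%
MoonshotAI: Kimi K2.5 | 100 | 96 | 48 | 29 | 0 | 54.5%
Gemini 2.5 Flash | 100 | 72 | 55 | 37 | 0 | 52.7%
Z.AI GLM 4.7 Flash | 100 | 71 | 65 | 27 | 0 | 52.5%
DeepSeek V3 (2025-03-24) | 100 | 100 | 60 | 0 | 0 | 51.9%
GPT-5 Mini | 74 | 60 | 58 | 42 | 22 | 51.0%
Ministral 3 8B | 100 | 100 | 54 | 0 | 0 | 50.8%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 51 | 0 | 0 | 50.1%
DeepSeek V3.1 | 100 | 51 | 47 | 46 | 0 | 48.6%
Claude Haiku 4.5 | 100 | 88 | 53 | 0 | 0 | 48.1%
DeepSeek V3 (2024-12-26) | 100 | 88 | 40 | 0 | 0 | 45.5%
Mistral Medium 3.1 | 100 | 97 | 28 | 0 | 0 | 45.1%
o4 Mini | 93 | 43 | 30 | 29 | 20 | 43.1%
GPT-4o, May 13th (temp=1) | 91 | 61 | 44 | 0 | 0 | 39.2%
Z.AI GLM 4.5 | 100 | 47 | 46 | 0 | 0 | 38.6%
Mistral NeMO | 100 | 58 | 34 | 0 | 0 | 38.6%
Hermes 3 70B | 100 | 48 | 39 | 0 | 0 | 37.4%
GPT-5 Nano | 80 | 38 | 34 | 17 | 15 | 36.6%
Cohere Command R+ (Aug. 2024) | 100 | 81 | 0 | 0 | 0 | 36.3%
Qwen 2.5 72B | 78 | 52 | 47 | 0 | 0 | 35.3%
Qwen 3.5 397B A17B | 69 | 46 | 41 | 17 | 0 | 34.7%
Mistral Small 3.2 24B | 95 | 77 | 0 | 0 | 0 | 34.3%
GPT-4.1 | 100 | 36 | 33 | 0 | 0 | 33.8%
Gemini 3 Flash (Preview) | 72 | 53 | 43 | 0 | 0 | 33.6%
ByteDance Seed 1.6 Flash | 68 | 31 | 31 | 26 | 0 | 31.3%
Claude Sonnet 4.6 | 100 | 56 | 0 | 0 | 0 | 31.1%
Qwen 3.5 Plus (2026-02-15) | 45 | 37 | 36 | 34 | 0 | 30.5%
Ministral 3B | 83 | 68 | 0 | 0 | 0 | 30.3%
Gemini 2.5 Pro | 98 | 44 | 0 | 0 | 0 | 28.5%
Z.AI GLM 4.6 | 53 | 47 | 40 | 0 | 0 | 27.9%
Rocinante 12B | 100 | 37 | 0 | 0 | 0 | 27.3%
Z.AI GLM 4.7 | 70 | 38 | 27 | 0 | 0 | 27.0%
Gemini 3.1 Pro (Preview) | 45 | 27 | 26 | 25 | 0 | 24.5%
Grok 4 Fast | 62 | 32 | 26 | 0 | 0 | 24.1%
GPT-5 | 56 | 33 | 27 | 0 | 0 | 23.3%
GPT-4o, Aug. 6th (temp=0) | 62 | 51 | 0 | 0 | 0 | 22.4%
Grok 4.1 Fast | 58 | 26 | 26 | 0 | 0 | 21.9%
Claude 3.5 Sonnet | 100 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 50 | 46 | 0 | 0 | 0 | 19.2%
GPT-4o Mini (temp=0) | 85 | 0 | 0 | 0 | 0 | 17.1%
DeepSeek V3.2 | 43 | 39 | 0 | 0 | 0 | 16.4%
Grok 4 | 28 | 26 | 25 | 0 | 0 | 15.8%
Llama 3.1 8B | 71 | 0 | 0 | 0 | 0 | 14.2%
GPT-4o, May 13th (temp=0) | 41 | 27 | 0 | 0 | 0 | 13.6%
Gemini 3 Pro (Preview) | 59 | 0 | 0 | 0 | 0 | 11.8%
DeepSeek-V2 Chat | 51 | 0 | 0 | 0 | 0 | 10.3%
ByteDance Seed 1.6 | 32 | 0 | 0 | 0 | 0 | 6.4%
Stealth: Aurora Alpha | 25 | 0 | 0 | 0 | 0 | 5.1%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%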
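
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 99 | 99.8%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 98 | 99.6%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 89 | 97.8%
Claude Opus 4 | 100 | 100 | 100 | 100 | 87 | 97.3%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 86 | 97.2%
Rocinante 12B | 100 | 100 | 100 | 95 | 83 | 95.7%
Gemma 3 27B | 100 | 100 | 100 | 100 | 77 | 95.3%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 64 | 92.8%
GPT-5.1 | 100 | 100 | 100 | 96 | 64 | 92.1%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 55 | 91.0%
Ministral 3 8B | 100 | 100 | 100 | 87 | 67 | 90.6%
Mistral Large 2 | 100 | 100 | 100 | 100 | 48 | 89.7%
Ministral 8B | 100 | 100 | 100 | 100 | 48 | 89.5%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 46 | 89.3%
WizardLM 2 8x22b | 100 | 100 | 98 | 90 | 56 | 88.8%
Z.AI GLM 4.6 | 100 | 100 | 93 | 89 | 60 | 88.4%
DeepSeek V3 (2024-12-26) | 100 | 100 | 93 | 81 | 68 | 88.4%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 41 | 88.2%
Mistral Large 3 | 100 | 100 | 100 | 100 | 38 | 87.7%
GPT-5.2 | 100 | 100 | 92 | 92 | 51 | 86.9%
Z.AI GLM 4.7 | 100 | 100 | 92 | 87 | 56 | 86.8%
GPT-4o, May 13th (temp=1) | 100 | 100 | 81 | 78 | 72 | 86.3%
Minimax M2.5 | 100 | 95 | 91 | 88 | 55 | 86.0%
Hermes 3 70B | 100 | 100 | 100 | 64 | 56 | 84.1%
DeepSeek V3.1 | 100 | 100 | 100 | 58 | 57 | 83.1%
GPT-4.1 | 100 | 98 | 75 | 70 | 67 | 82.0%
Claude Sonnet 4 | 100 | 100 | 100 | 58 | 50 | 81.6%
DeepSeek V3.2 | 100 | 100 | 95 | 57 | 49 | 80.4%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o Mini (temp=0) | 100 | 100 | 83 | 78 | 39 | 79.9%
Gemini 3 Pro (Preview) | 100 | 100 | 84 | 62 | 53 | 79.9%
Claude Sonnet 4.6 | 100 | 100 | 100 | 98 | 0 | 79.6%
Gemini 2.5 Flash | 100 | 84 | 77 | 73 | 63 | 79.6%
GPT-4.1 Mini | 100 | 100 | 100 | 57 | 40 | 79.5%
Claude Haiku 4.5 | 100 | 100 | 72 | 63 | 57 | 78.6%
Grok 4 | 100 | 100 | 100 | 72 | 18 | 78.0%
o4 Mini High | 100 | 100 | 100 | 85 | 0 | 77.1%
o4 Mini | 100 | 100 | 80 | 59 | 45 | 76.7%
Gemini 2.5 Pro | 100 | 100 | 92 | 80 | 0 | 74.4%
Z.AI GLM 4.7 Flash | 100 | 78 | 72 | 65 | 55 | 74.1%
Ministral 3B | 100 | 100 | 100 | 61 | 0 | 72.1%
Llama 3.1 70B | 100 | 93 | 93 | 71 | 0 | 71.2%
ByteDance Seed 1.6 Flash | 100 | 91 | 87 | 74 | 0 | 70.3%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 41 | 0 | 68.1%
Hermes 3 405B | 100 | 100 | 78 | 43 | 0 | 64.1%
Grok 4.1 Fast | 96 | 86 | 64 | 36 | 28 | 62.1%
GPT-5 Mini | 100 | 82 | 71 | 39 | 0 | 58.4%
Qwen 3.5 Plus (2026-02-15) | 100 | 76 | 72 | 44 | 0 | 58.4%
Llama 3.1 Nemotron 70B | 100 | 100 | 81 | 0 | 0 | 56.3%
Ministral 3 3B | 100 | 93 | 81 | 0 | 0 | 54.8%
Mistral Small 3.2 24B | 100 | 100 | 41 | 8 | 3 | 50.4%
Qwen 3.5 397B A17B | 75 | 69 | 67 | 23 | 14 | 49.6%
Grok 4 Fast | 100 | 59 | 54 | 26 | 0 | 47.8%
Gemini 3.1 Pro (Preview) | 87 | 51 | 41 | 25 | 25 | 45.6%
Claude 3.7 Sonnet | 100 | 67 | 29 | 27 | 0 | 44.7%
Llama 3.1 8B | 100 | 79 | 40 | 0 | 0 | 43.8%
Qwen 2.5 72B | 100 | 46 | 43 | 0 | 0 | 37.7%
GPT-5 Nano | 69 | 41 | 34 | 22 | 17 | 36.5%
GPT-4o, Aug. 6th (temp=0) | 100 | 44 | 38 | 0 | 0 | 36.3%
GPT-5 | 88 | 43 | 17 | 13 | 0 | 32.3%
GPT-4o, May 13th (temp=0) | 63 | 45 | 33 | 0 | 0 | 28.2%
Claude 3.5 Sonnet | 74 | 61 | 0 | 0 | 0 | 26.9%
MoonshotAI: Kimi K2.5 | 58 | 32 | 28 | 0 | 0 | 23.8%
Gemini 3 Flash (Preview) | 53 | 27 | 22 | 0 | 0 | 20.5%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Stealth: Aurora Alpha | 33 | 30 | 0 | 0 | 0 | 12.7%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%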
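
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Ministral 3 8B | 100 | 100 | 100 | 95 | 82 | 95.5%
Ministral 3 14B | 100 | 100 | 100 | 100 | 78 | 95.5%
Mistral Small Creative | 100 | 100 | 100 | 100 | 58 | 91.6%
Mistral Large 2 | 100 | 100 | 100 | 100 | 48 | 89.5%
Mistral Medium 3.1 | 100 | 100 | 89 | 83 | 72 | 88.8%
Writer: Palmyra X5 | 100 | 100 | 100 | 97 | 37 | 86.7%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 12B | 100 | 94 | 85 | 83 | 37 | 79.9%
Mistral Large | 100 | 98 | 97 | 81 | 0 | 75.2%
Mistral NeMO | 100 | 100 | 100 | 67 | 0 | 73.3%
Z.AI GLM 5 | 100 | 100 | 83 | 46 | 34 | 72.6%
Claude Sonnet 4.6 | 100 | 100 | 100 | 61 | 0 | 72.1%
o4 Mini High | 100 | 100 | 65 | 50 | 35 | 69.9%
Gemma 3 27B | 100 | 90 | 85 | 37 | 0 | 62.6%
Claude Opus 4.5 | 100 | 74 | 67 | 39 | 30 | 62.0%
Claude Sonnet 4.5 | 100 | 100 | 61 | 48 | 0 | 61.6%
Ministral 3 3B | 100 | 100 | 41 | 29 | 28 | 59.7%
DeepSeek-V2 Chat | 100 | 100 | 89 | 0 | 0 | 57.8%
Claude Sonnet 4 | 100 | 89 | 51 | 42 | 0 | 56.3%
Minimax M2.5 | 100 | 100 | 78 | 0 | 0 | 55.5%
DeepSeek V3 (2025-03-24) | 100 | 56 | 56 | 55 | 0 | 53.2%
MoonshotAI: Kimi K2.5 | 100 | 100 | 62 | 0 | 0 | 52.5%
Claude 3.7 Sonnet | 95 | 48 | 45 | 39 | 35 | 52.4%
GPT-4.1 | 100 | 56 | 51 | 46 | 0 | 50.8%
ByteDance Seed 1.6 Flash | 100 | 100 | 41 | 0 | 0 | 48.2%
Ministral 8B | 100 | 48 | 48 | 44 | 0 | 48.1%
GPT-4.1 Nano | 67 | 64 | 58 | 51 | 0 | 48.0%
Claude Haiku 4.5 | 100 | 51 | 46 | 38 | 0 | 47.2%
Z.AI GLM 4.6 | 100 | 100 | 35 | 0 | 0 | 46.9%
Gemma 3 4B | 100 | 89 | 40 | 0 | 0 | 45.8%
Ministral 3B | 100 | 80 | 41 | 0 | 0 | 44.3%
Qwen 2.5 72B | 100 | 61 | 58 | 0 | 0 | 43.8%
Z.AI GLM 4.5 | 98 | 67 | 50 | 0 | 0 | 42.9%
GPT-4o, May 13th (temp=1) | 54 | 54 | 46 | 44 | 0 | 39.5%
Gemini 3 Pro (Preview) | 100 | 63 | 34 | 0 | 0 | 39.5%
GPT-4o Mini (temp=1) | 100 | 52 | 44 | 0 | 0 | 39.3%
o4 Mini | 100 | 61 | 26 | 0 | 0 | 37.5%
GPT-4o, Aug. 6th (temp=1) | 65 | 56 | 55 | 0 | 0 | 35.3%
Rocinante 12B | 100 | 74 | 0 | 0 | 0 | 34.7%
GPT-5.1 | 57 | 52 | 39 | 25 | 0 | 34.6%
Llama 3.1 Nemotron 70B | 100 | 72 | 0 | 0 | 0 | 34.5%
Hermes 3 70B | 100 | 69 | 0 | 0 | 0 | 33.9%
Llama 3.1 8B | 100 | 68 | 0 | 0 | 0 | 33.6%
Claude Opus 4.6 | 100 | 39 | 28 | 0 | 0 | 33.4%
DeepSeek V3.2 | 93 | 34 | 31 | 0 | 0 | 31.5%
GPT-5.2 | 56 | 53 | 22 | 15 | 0 | 29.2%
Gemini 2.5 Flash Lite | 100 | 46 | 0 | 0 | 0 | 29.1%
WizardLM 2 8x22b | 68 | 40 | 35 | 0 | 0 | 28.7%
Grok 4.1 Fast | 71 | 40 | 28 | 0 | 0 | 27.9%
Claude Opus 4 | 48 | 46 | 42 | 0 | 0 | 27.2%
GPT-5 | 40 | 38 | 31 | 22 | 0 | 26.1%
GPT-4o, May 13th (temp=0) | 45 | 42 | 42 | 0 | 0 | 25.8%
DeepSeek V3.1 | 79 | 48 | 0 | 0 | 0 | 25.4%
Arcee AI: Trinity Large (Preview) | 74 | 48 | 0 | 0 | 0 | 24.3%
GPT-5 Nano | 67 | 30 | 24 | 0 | 0 | 24.1%
Qwen 3.5 397B A17B | 46 | 31 | 14 | 13 | 0 | 20.6%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Gemini 2.5 Flash | 100 | 0 | 0 | 0 | 0 | 20.0%
Cohere Command R+ (Aug. 2024) | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 85 | 0 | 0 | 0 | 0 | 17.1%
Gemini 2.5 Pro | 79 | 0 | 0 | 0 | 0 | 15.9%
Hermes 3 405B | 78 | 0 | 0 | 0 | 0 | 15.5%
Gemini 3 Flash (Preview) | 40 | 36 | 0 | 0 | 0 | 15.2%
GPT-5 Mini | 28 | 26 | 22 | 0 | 0 | 15.2%
Z.AI GLM 4.7 | 39 | 36 | 0 | 0 | 0 | 14.9%
Claude 3 Haiku | 65 | 0 | 0 | 0 | 0 | 13.1%
Z.AI GLM 4.7 Flash | 62 | 0 | 0 | 0 | 0 | 12.3%
GPT-4.1 Mini | 62 | 0 | 0 | 0 | 0 | 12.3%
Grok 4 | 33 | 28 | 0 | 0 | 0 | 12.2%
Grok 4 Fast | 34 | 25 | 0 | 0 | 0 | 11.8%
GPT-4o Mini (temp=0) | 53 | 0 | 0 | 0 | 0 | 10.6%
ByteDance Seed 1.6 | 36 | 0 | 0 | 0 | 0 | 7.2%
Mistral Small 3.2 24B | 20 | 0 | 0 | 0 | 0 | 4.0%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 3.5 Plus (2026-02-15) | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Mini | 0 | 0 | 0 | 0 | 0 | 0.0%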
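
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 88 | 97.5%
Mistral Small Creative | 100 | 100 | 100 | 100 | 72 | 94.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 8B | 100 | 100 | 100 | 94 | 0 | 78.8%
Ministral 3B | 100 | 100 | 100 | 76 | 0 | 75.2%
Ministral 3 8B | 100 | 100 | 100 | 65 | 0 | 73.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 95 | 68 | 0 | 72.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 50 | 0 | 70.0%
Rocinante 12B | 100 | 100 | 95 | 51 | 0 | 69.3%
Mistral Large 2 | 100 | 100 | 100 | 36 | 0 | 67.2%
Claude Haiku 4.5 | 100 | 93 | 54 | 51 | 35 | 66.4%
Ministral 3 14B | 100 | 100 | 60 | 52 | 0 | 62.3%
Claude Opus 4.6 | 100 | 84 | 71 | 55 | 0 | 62.0%
Mistral Large 3 | 100 | 100 | 100 | 0 | 0 | 60.0%
GPT-4.1 Nano | 100 | 100 | 100 | 0 | 0 | 60.0%
Gemma 3 27B | 100 | 85 | 53 | 38 | 0 | 55.3%
Qwen 3.5 Plus (2026-02-15) | 100 | 69 | 41 | 35 | 28 | 54.6%
GPT-5.1 | 69 | 68 | 59 | 53 | 21 | 54.0%
Hermes 3 405B | 100 | 81 | 67 | 0 | 0 | 49.6%
Mistral Large | 100 | 96 | 48 | 0 | 0 | 48.8%
GPT-4.1 | 100 | 100 | 36 | 0 | 0 | 47.2%
Arcee AI: Trinity Large (Preview) | 88 | 83 | 61 | 0 | 0 | 46.3%
ByteDance Seed 1.6 Flash | 100 | 84 | 47 | 0 | 0 | 46.2%
GPT-4.1 Mini | 100 | 78 | 51 | 0 | 0 | 45.8%
o4 Mini High | 88 | 83 | 32 | 23 | 0 | 45.5%
Claude Opus 4.5 | 100 | 67 | 60 | 0 | 0 | 45.4%
DeepSeek V3 (2024-12-26) | 100 | 79 | 44 | 0 | 0 | 44.6%
GPT-4o Mini (temp=1) | 100 | 71 | 49 | 0 | 0 | 44.0%
Gemma 3 4B | 99 | 78 | 35 | 0 | 0 | 42.3%
Z.AI GLM 4.7 Flash | 100 | 71 | 41 | 0 | 0 | 42.3%
MoonshotAI: Kimi K2.5 | 100 | 66 | 45 | 0 | 0 | 42.2%
Hermes 3 70B | 100 | 100 | 0 | 0 | 0 | 40.0%
Grok 4 | 92 | 40 | 34 | 33 | 0 | 39.6%
Qwen 3.5 397B A17B | 92 | 71 | 31 | 0 | 0 | 38.6%
o4 Mini | 88 | 52 | 51 | 0 | 0 | 38.1%
Gemini 3 Flash (Preview) | 64 | 56 | 37 | 32 | 0 | 37.9%
DeepSeek V3.1 | 94 | 83 | 0 | 0 | 0 | 35.4%
Gemini 2.5 Flash Lite | 100 | 75 | 0 | 0 | 0 | 35.0%
Gemma 3 12B | 82 | 48 | 40 | 0 | 0 | 34.0%
Claude Sonnet 4.5 | 85 | 51 | 32 | 0 | 0 | 33.6%
Mistral NeMO | 63 | 58 | 44 | 0 | 0 | 33.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 64 | 0 | 0 | 0 | 32.8%
Z.AI GLM 5 | 100 | 34 | 29 | 0 | 0 | 32.6%
Claude Opus 4 | 66 | 48 | 45 | 0 | 0 | 31.9%
WizardLM 2 8x22b | 64 | 37 | 28 | 27 | 0 | 31.1%
GPT-4o, Aug. 6th (temp=0) | 100 | 49 | 0 | 0 | 0 | 29.8%
GPT-5 | 52 | 39 | 36 | 23 | 0 | 29.7%
GPT-5 Nano | 56 | 49 | 38 | 0 | 0 | 28.7%
Ministral 3 3B | 71 | 65 | 0 | 0 | 0 | 27.3%
DeepSeek V3.2 | 64 | 38 | 32 | 0 | 0 | 26.8%
Claude 3 Haiku | 68 | 61 | 0 | 0 | 0 | 25.7%
Gemini 2.5 Pro | 79 | 49 | 0 | 0 | 0 | 25.7%
Gemini 2.5 Flash | 92 | 35 | 0 | 0 | 0 | 25.4%
Gemini 3 Pro (Preview) | 32 | 31 | 30 | 25 | 0 | 23.6%
Z.AI GLM 4.6 | 39 | 39 | 38 | 0 | 0 | 23.3%
Qwen 2.5 72B | 71 | 41 | 0 | 0 | 0 | 22.4%
Claude Sonnet 4.6 | 62 | 49 | 0 | 0 | 0 | 22.1%
GPT-4o, May 13th (temp=1) | 56 | 48 | 0 | 0 | 0 | 20.8%
GPT-5.2 | 34 | 27 | 22 | 19 | 0 | 20.3%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek-V2 Chat | 53 | 46 | 0 | 0 | 0 | 19.7%
Claude 3.7 Sonnet | 51 | 46 | 0 | 0 | 0 | 19.4%
Llama 3.1 Nemotron 70B | 90 | 0 | 0 | 0 | 0 | 18.0%
Arcee AI: Trinity Mini | 90 | 0 | 0 | 0 | 0 | 18.0%
Grok 4 Fast | 54 | 31 | 0 | 0 | 0 | 17.0%
GPT-5 Mini | 39 | 25 | 19 | 0 | 0 | 16.6%
Claude 3.5 Sonnet | 79 | 0 | 0 | 0 | 0 | 15.9%
ByteDance Seed 1.6 | 78 | 0 | 0 | 0 | 0 | 15.5%
Mistral Small 3.2 24B | 68 | 1 | 0 | 0 | 0 | 13.9%
Minimax M2.5 | 58 | 0 | 0 | 0 | 0 | 11.7%
Claude Sonnet 4 | 58 | 0 | 0 | 0 | 0 | 11.7%
Llama 3.1 8B | 58 | 0 | 0 | 0 | 0 | 11.7%
Z.AI GLM 4.5 | 50 | 0 | 0 | 0 | 0 | 10.0%
GPT-4o Mini (temp=0) | 48 | 0 | 0 | 0 | 0 | 9.7%
Grok 4.1 Fast | 26 | 20 | 0 | 0 | 0 | 9.2%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.7 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%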
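
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Mistral Medium 3.1 | 100 | 100 | 100 | 89 | 78 | 93.3%
Minimax M2.5 | 100 | 100 | 100 | 97 | 67 | 92.7%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 57 | 91.5%
Mistral Small Creative | 100 | 100 | 100 | 73 | 71 | 88.8%
Ministral 3 14B | 100 | 100 | 100 | 100 | 37 | 87.3%
Claude Opus 4 | 100 | 100 | 100 | 84 | 38 | 84.5%
Claude Opus 4.6 | 100 | 100 | 100 | 92 | 29 | 84.1%
Ministral 8B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 82 | 56 | 51 | 77.8%
Gemma 3 27B | 100 | 100 | 100 | 85 | 0 | 77.0%
Mistral Large | 100 | 100 | 100 | 81 | 0 | 76.3%
Claude Opus 4.5 | 100 | 100 | 98 | 42 | 38 | 75.7%
Hermes 3 405B | 100 | 84 | 72 | 61 | 58 | 75.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 40 | 34 | 74.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 55 | 0 | 70.9%
o4 Mini High | 100 | 90 | 85 | 56 | 23 | 70.9%
Claude 3 Haiku | 100 | 81 | 60 | 56 | 53 | 69.9%
MoonshotAI: Kimi K2.5 | 100 | 96 | 73 | 42 | 37 | 69.7%
Mistral Large 2 | 100 | 71 | 67 | 55 | 42 | 66.8%
DeepSeek V3 (2025-03-24) | 100 | 95 | 89 | 50 | 0 | 66.8%
Ministral 3 3B | 100 | 100 | 79 | 53 | 0 | 66.5%
GPT-5.1 | 100 | 100 | 85 | 39 | 0 | 64.8%
Claude Haiku 4.5 | 89 | 62 | 61 | 53 | 50 | 62.8%
Claude 3.7 Sonnet | 93 | 90 | 43 | 42 | 37 | 61.0%
Llama 3.1 8B | 100 | 100 | 100 | 0 | 0 | 60.0%
Mistral Large 3 | 100 | 77 | 75 | 37 | 0 | 57.7%
Gemma 3 12B | 100 | 100 | 44 | 38 | 0 | 56.5%
GPT-4.1 Nano | 100 | 74 | 55 | 53 | 0 | 56.3%
Claude 3.5 Sonnet | 100 | 93 | 78 | 0 | 0 | 54.0%
GPT-5 Nano | 100 | 64 | 61 | 26 | 17 | 53.7%
GPT-4o, May 13th (temp=1) | 91 | 63 | 56 | 55 | 0 | 52.9%
Z.AI GLM 4.6 | 87 | 84 | 54 | 39 | 0 | 52.6%
Grok 4 | 78 | 67 | 58 | 32 | 28 | 52.5%
GPT-4.1 | 91 | 90 | 74 | 0 | 0 | 51.1%
GPT-5.2 | 100 | 45 | 44 | 37 | 14 | 48.1%
WizardLM 2 8x22b | 100 | 62 | 44 | 34 | 0 | 47.9%
DeepSeek V3.1 | 100 | 95 | 43 | 0 | 0 | 47.6%
Qwen 3.5 397B A17B | 100 | 64 | 51 | 19 | 0 | 46.7%
Z.AI GLM 4.7 Flash | 75 | 51 | 48 | 41 | 0 | 42.8%
DeepSeek V3 (2024-12-26) | 100 | 65 | 44 | 0 | 0 | 41.8%
GPT-4o Mini (temp=1) | 100 | 57 | 48 | 0 | 0 | 41.0%
Ministral 3B | 100 | 57 | 44 | 0 | 0 | 40.1%
Claude Sonnet 4.6 | 100 | 100 | 0 | 0 | 0 | 40.0%
DeepSeek-V2 Chat | 100 | 100 | 0 | 0 | 0 | 40.0%
Hermes 3 70B | 100 | 100 | 0 | 0 | 0 | 40.0%
Arcee AI: Trinity Mini | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Pro | 100 | 97 | 0 | 0 | 0 | 39.4%
Claude Sonnet 4 | 76 | 64 | 56 | 0 | 0 | 39.3%
Claude Sonnet 4.5 | 100 | 51 | 45 | 0 | 0 | 39.1%
Gemini 3 Pro (Preview) | 77 | 56 | 53 | 0 | 0 | 37.2%
Grok 4 Fast | 98 | 42 | 41 | 0 | 0 | 36.2%
Z.AI GLM 5 | 100 | 42 | 31 | 0 | 0 | 34.6%
Gemini 2.5 Flash Lite | 82 | 59 | 31 | 0 | 0 | 34.5%
GPT-5 | 69 | 43 | 37 | 23 | 0 | 34.5%
Z.AI GLM 4.5 | 79 | 48 | 41 | 0 | 0 | 33.8%
Gemma 3 4B | 100 | 55 | 0 | 0 | 0 | 30.9%
Arcee AI: Trinity Large (Preview) | 100 | 43 | 0 | 0 | 0 | 28.7%
DeepSeek V3.2 | 65 | 44 | 31 | 0 | 0 | 28.2%
GPT-4.1 Mini | 68 | 67 | 0 | 0 | 0 | 26.9%
Z.AI GLM 4.7 | 100 | 31 | 0 | 0 | 0 | 26.2%
Grok 4.1 Fast | 50 | 32 | 26 | 22 | 0 | 26.0%
GPT-5 Mini | 28 | 27 | 26 | 25 | 21 | 25.3%
o4 Mini | 91 | 31 | 0 | 0 | 0 | 24.4%
Rocinante 12B | 63 | 38 | 0 | 0 | 0 | 20.1%
Claude 3.5 Haiku | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 20.0%
Cohere Command R+ (Aug. 2024) | 100 | 0 | 0 | 0 | 0 | 20.0%
Gemini 3 Flash (Preview) | 36 | 35 | 29 | 0 | 0 | 20.0%
Qwen 3.5 Plus (2026-02-15) | 56 | 39 | 0 | 0 | 0 | 18.9%
Gemini 2.5 Flash | 47 | 35 | 0 | 0 | 0 | 16.4%
Mistral Small 3.2 24B | 78 | 0 | 0 | 0 | 0 | 15.5%
Llama 3.1 Nemotron 70B | 71 | 0 | 0 | 0 | 0 | 14.2%
Qwen 2.5 72B | 58 | 0 | 0 | 0 | 0 | 11.7%
GPT-4o, May 13th (temp=0) | 56 | 0 | 0 | 0 | 0 | 11.1%
Mistral NeMO | 42 | 0 | 0 | 0 | 0 | 8.4%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%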
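
Model | #1 | #2 | #3 | #4 | #5 | Avg ▼
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 89 | 97.9%
Ministral 3 14B | 100 | 100 | 100 | 100 | 85 | 97.0%
Ministral 8B | 100 | 100 | 100 | 100 | 71 | 94.2%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 51 | 90.3%
Claude Sonnet 4.5 | 100 | 93 | 85 | 84 | 78 | 88.2%
Mistral Large | 100 | 100 | 90 | 84 | 54 | 85.6%
Gemma 3 4B | 100 | 87 | 79 | 72 | 68 | 81.3%
Hermes 3 70B | 100 | 100 | 80 | 68 | 56 | 80.8%
Rocinante 12B | 100 | 100 | 100 | 100 | 0 | 80.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 95 | 65 | 38 | 79.6%
Claude 3 Haiku | 100 | 100 | 65 | 56 | 48 | 73.8%
Z.AI GLM 5 | 100 | 100 | 82 | 81 | 0 | 72.7%
Mistral Large 2 | 100 | 100 | 82 | 79 | 0 | 72.3%
Claude Haiku 4.5 | 100 | 100 | 83 | 33 | 32 | 69.8%
Mistral Medium 3.1 | 100 | 95 | 74 | 43 | 35 | 69.3%
GPT-4o Mini (temp=1) | 100 | 100 | 98 | 48 | 0 | 69.3%
Mistral Small Creative | 100 | 100 | 73 | 35 | 35 | 68.8%
Mistral Large 3 | 100 | 100 | 100 | 43 | 0 | 68.7%
DeepSeek-V2 Chat | 100 | 98 | 93 | 47 | 0 | 67.5%
Ministral 3 8B | 100 | 100 | 87 | 39 | 0 | 65.2%
Arcee AI: Trinity Large (Preview) | 100 | 88 | 80 | 56 | 0 | 64.9%
Claude Sonnet 4 | 100 | 100 | 67 | 53 | 0 | 63.9%
Gemma 3 12B | 100 | 91 | 58 | 35 | 27 | 62.5%
GPT-5.1 | 100 | 65 | 56 | 55 | 31 | 61.3%
Claude Opus 4.6 | 100 | 84 | 55 | 39 | 24 | 60.4%
Ministral 3 3B | 100 | 100 | 100 | 0 | 0 | 59.9%
GPT-4.1 | 100 | 93 | 71 | 34 | 0 | 59.7%
GPT-4.1 Mini | 100 | 100 | 98 | 0 | 0 | 59.6%
Gemini 3 Pro (Preview) | 88 | 75 | 61 | 40 | 20 | 56.8%
Mistral NeMO | 100 | 100 | 73 | 0 | 0 | 54.7%
Minimax M2.5 | 100 | 77 | 65 | 29 | 0 | 54.3%
Z.AI GLM 4.6 | 100 | 100 | 38 | 30 | 0 | 53.7%
Gemma 3 27B | 100 | 100 | 36 | 32 | 0 | 53.6%
Z.AI GLM 4.7 Flash | 100 | 58 | 44 | 34 | 28 | 53.0%
Gemini 2.5 Flash Lite | 100 | 65 | 46 | 46 | 0 | 51.5%
Llama 3.1 8B | 100 | 88 | 68 | 0 | 0 | 51.1%
Claude Sonnet 4.6 | 100 | 100 | 56 | 0 | 0 | 51.1%
Claude Opus 4.5 | 73 | 63 | 53 | 31 | 29 | 49.8%
DeepSeek V3 (2025-03-24) | 100 | 100 | 46 | 0 | 0 | 49.3%
Arcee AI: Trinity Mini | 83 | 60 | 54 | 45 | 0 | 48.3%
GPT-4o, May 13th (temp=0) | 100 | 70 | 42 | 29 | 0 | 48.2%
Ministral 3B | 100 | 68 | 55 | 0 | 0 | 44.5%
DeepSeek V3.1 | 100 | 75 | 45 | 0 | 0 | 44.0%
Cohere Command R+ (Aug. 2024) | 91 | 73 | 53 | 0 | 0 | 43.5%
Qwen 3.5 Plus (2026-02-15) | 89 | 76 | 23 | 22 | 0 | 42.0%
Qwen 3.5 397B A17B | 71 | 67 | 48 | 20 | 0 | 41.1%
Claude 3.5 Haiku | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemini 3 Flash (Preview) | 72 | 39 | 38 | 29 | 21 | 39.9%
o4 Mini | 73 | 52 | 31 | 20 | 19 | 39.0%
DeepSeek V3.2 | 98 | 34 | 30 | 27 | 0 | 37.8%
Gemini 2.5 Pro | 76 | 63 | 38 | 0 | 0 | 35.4%
GPT-4o Mini (temp=0) | 91 | 76 | 0 | 0 | 0 | 33.4%
MoonshotAI: Kimi K2.5 | 100 | 66 | 0 | 0 | 0 | 33.2%
Claude 3.7 Sonnet | 84 | 44 | 33 | 0 | 0 | 32.3%
DeepSeek V3 (2024-12-26) | 100 | 34 | 26 | 0 | 0 | 32.0%
GPT-5 Mini | 73 | 38 | 24 | 23 | 0 | 31.6%
Z.AI GLM 4.7 | 75 | 29 | 26 | 24 | 0 | 31.0%
Z.AI GLM 4.5 | 50 | 48 | 46 | 0 | 0 | 28.9%
Llama 3.1 70B | 72 | 71 | 0 | 0 | 0 | 28.7%
Claude 3.5 Sonnet | 72 | 68 | 0 | 0 | 0 | 28.1%
GPT-5 | 48 | 37 | 36 | 18 | 0 | 27.8%
WizardLM 2 8x22b | 41 | 39 | 31 | 27 | 0 | 27.6%
ByteDance Seed 1.6 | 100 | 30 | 0 | 0 | 0 | 26.0%
Gemini 2.5 Flash | 81 | 45 | 0 | 0 | 0 | 25.3%
GPT-4o, Aug. 6th (temp=0) | 43 | 41 | 40 | 0 | 0 | 24.6%
GPT-5 Nano | 57 | 26 | 17 | 16 | 0 | 23.1%
o4 Mini High | 73 | 23 | 17 | 0 | 0 | 22.6%
GPT-5.2 | 37 | 36 | 19 | 18 | 0 | 22.1%
GPT-4o, Aug. 6th (temp=1) | 52 | 51 | 0 | 0 | 0 | 20.5%
Grok 4 | 36 | 25 | 21 | 20 | 0 | 20.4%
GPT-4o, May 13th (temp=1) | 98 | 0 | 0 | 0 | 0 | 19.6%
Grok 4 Fast | 37 | 33 | 23 | 0 | 0 | 18.6%
Claude Opus 4 | 53 | 39 | 0 | 0 | 0 | 18.3%
Mistral Small 3.2 24B | 44 | 36 | 6 | 0 | 0 | 17.2%
Llama 3.1 Nemotron 70B | 78 | 0 | 0 | 0 | 0 | 15.5%
Grok 4.1 Fast | 19 | 18 | 0 | 0 | 0 | 7.4%
Qwen 2.5 72B | 30 | 0 | 0 | 0 | 0 | 6.0%
Gemini 3.1 Pro (Preview) | 0 | 0 | 0 | 0 | 0 | 0.0%
Stealth: Aurora Alpha | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0.0%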