Sentence opener variety

Test: Bad Writing Habits

Avg. Score: 59.4%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Claude 3.5 Haiku | 93.0% | $0.0035 | 10.8s | 81%
2 | Grok 4.1 Fast | 87.6% | $0.0018 | 37.8s | 72%
3 | GPT-4o Mini (temp=1) | 82.2% | $0.0012 | 34.8s | 66%
4 | GPT-4o, Aug. 6th (temp=1) | 84.8% | $0.018 | 24.4s | 66%
5 | Llama 3.1 Nemotron 70B | 82.4% | $0.0038 | 31.7s | 63%
6 | Rocinante 12B | 82.5% | $0.0014 | 38.4s | 57%
7 | Llama 3.1 8B | 83.5% | $0.0003 | 1.3m | 59%
8 | DeepSeek V3 (2025-03-24) | 77.8% | $0.0014 | 39.4s | 56%
9 | Hermes 3 405B | 78.7% | $0.0032 | 53.2s | 57%
10 | Grok 4 Fast | 72.0% | $0.0017 | 24.1s | 57%
11 | Claude 3.5 Sonnet | 79.7% | $0.048 | 35.5s | 62%
12 | GPT-4.1 Mini | 69.6% | $0.0027 | 19.0s | 55%
13 | Claude Sonnet 4 | 75.8% | $0.032 | 43.7s | 58%
14 | Claude 3 Haiku | 68.6% | $0.0025 | 14.9s | 52%
15 | GPT-4o, May 13th (temp=1) | 72.7% | $0.033 | 14.4s | 55%
16 | Claude Sonnet 4.5 | 72.8% | $0.035 | 38.1s | 58%
17 | GPT-4.1 Nano | 64.7% | $0.0007 | 13.3s | 51%
18 | Cohere Command R+ (Aug. 2024) | 73.8% | $0.020 | 52.5s | 52%
19 | Hermes 3 70B | 75.6% | $0.0010 | 1.2m | 49%
20 | Z.AI GLM 4.5 | 68.1% | $0.0051 | 42.1s | 50%
21 | GPT-4.1 | 65.8% | $0.018 | 44.7s | 54%
22 | Claude Haiku 4.5 | 63.3% | $0.011 | 21.6s | 49%
23 | Gemma 3 12B | 63.1% | $0.0004 | 41.3s | 49%
24 | Gemma 3 27B | 65.9% | $0.0006 | 52.6s | 48%
25 | Claude 3.7 Sonnet | 67.8% | $0.042 | 46.7s | 54%
26 | Grok 4 | 73.8% | $0.048 | 1.7m | 58%
27 | GPT-4o Mini (temp=0) | 62.1% | $0.0012 | 34.8s | 45%
28 | DeepSeek V3 (2024-12-26) | 63.9% | $0.0021 | 54.6s | 47%
29 | DeepSeek-V2 Chat | 63.5% | $0.0021 | 53.3s | 46%
30 | Arcee AI: Trinity Large (Preview) | 61.4% | $0.0000 | 43.6s | 45%
31 | Gemma 3 4B | 57.9% | $0.0002 | 20.0s | 44%
32 | Writer: Palmyra X5 | 58.2% | $0.011 | 22.0s | 47%
33 | Llama 3.1 70B | 60.5% | $0.0015 | 29.4s | 43%
34 | Z.AI GLM 5 | 63.0% | $0.0084 | 1.2m | 48%
35 | Gemini 2.5 Flash Lite | 56.3% | $0.0009 | 9.5s | 42%
36 | Gemini 2.5 Flash | 58.2% | $0.0052 | 10.6s | 39%
37 | Qwen 3.5 Plus (2026-02-15) | 54.6% | $0.0060 | 31.5s | 43%
38 | Claude Sonnet 4.6 | 62.6% | $0.031 | 39.3s | 43%
39 | o4 Mini | 53.0% | $0.015 | 25.7s | 45%
40 | Arcee AI: Trinity Mini | 55.9% | $0.0003 | 9.2s | 36%
41 | GPT-4o, Aug. 6th (temp=0) | 57.0% | $0.023 | 22.7s | 42%
42 | Qwen 2.5 72B | 52.8% | $0.0010 | 36.7s | 42%
43 | Mistral Medium 3.1 | 52.0% | $0.0048 | 36.5s | 42%
44 | Mistral Large 2 | 51.7% | $0.013 | 29.4s | 42%
45 | Claude Opus 4.5 | 62.7% | $0.070 | 53.4s | 49%
46 | Mistral Small Creative | 47.2% | $0.0007 | 9.1s | 39%
47 | Minimax M2.5 | 56.9% | $0.0034 | 1.3m | 41%
48 | Mistral Large 3 | 49.7% | $0.0033 | 30.3s | 40%
49 | ByteDance Seed 1.6 Flash | 50.8% | $0.0013 | 27.3s | 37%
50 | Mistral Large | 50.6% | $0.014 | 30.9s | 41%
51 | Ministral 3 14B | 48.2% | $0.0007 | 11.7s | 37%
52 | GPT-4o, May 13th (temp=0) | 53.3% | $0.035 | 14.1s | 40%
53 | Ministral 3B | 46.8% | $0.0001 | 8.1s | 36%
54 | Ministral 3 8B | 45.3% | $0.0008 | 19.6s | 38%
55 | Ministral 8B | 44.3% | $0.0004 | 10.4s | 37%
56 | o4 Mini High | 53.1% | $0.025 | 47.2s | 40%
57 | WizardLM 2 8x22b | 57.4% | $0.0026 | 1.8m | 40%
58 | Mistral NeMO | 46.4% | $0.0005 | 10.1s | 34%
59 | Ministral 3 3B | 44.7% | $0.0005 | 11.1s | 35%
60 | Gemini 2.5 Pro | 50.7% | $0.036 | 36.2s | 39%
61 | Stealth: Aurora Alpha | 40.2% | $0.0000 | 9.8s | 34%
62 | Z.AI GLM 4.6 | 47.4% | $0.0065 | 51.5s | 36%
63 | Gemini 3 Flash (Preview) | 43.7% | $0.0078 | 19.6s | 34%
64 | DeepSeek V3.2 | 47.8% | $0.0014 | 1.9m | 40%
65 | GPT-5.1 | 54.4% | $0.054 | 1.8m | 44%
66 | GPT-5 Mini | 41.9% | $0.0100 | 57.4s | 36%
67 | DeepSeek V3.1 | 51.6% | $0.0020 | 1.8m | 33%
68 | Z.AI GLM 4.7 Flash | 43.1% | $0.0017 | 1.2m | 34%
69 | Z.AI GLM 4.7 | 45.1% | $0.010 | 1.4m | 37%
70 | Claude Opus 4.6 | 55.1% | $0.078 | 1.2m | 42%
71 | Gemini 3 Pro (Preview) | 48.5% | $0.055 | 54.4s | 38%
72 | MoonshotAI: Kimi K2.5 | 57.7% | $0.019 | 3.2m | 42%
73 | Claude Opus 4 | 72.7% | $0.209 | 1.4m | 57%
74 | ByteDance Seed 1.6 | 51.7% | $0.013 | 2.5m | 34%
75 | GPT-5.2 | 42.7% | $0.056 | 1.5m | 41%
76 | GPT-5 Nano | 33.5% | $0.0042 | 1.4m | 31%
77 | Qwen 3.5 397B A17B | 42.7% | $0.014 | 3.0m | 34%
78 | Gemini 3.1 Pro (Preview) | 49.1% | $0.107 | 1.8m | 37%
79 | GPT-5 | 44.5% | $0.065 | 2.8m | 36%
80 | Mistral Small 3.2 24B | 37.5% | $0.0069 | 5.7m | 30%

Individual Scenarios

Detailed Writing Rules

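Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Haiku | 100 | 100 | 100 | 97 | 95 | 98.4%
Rocinante 12B | 100 | 100 | 95 | 93 | 89 | 95.4%
DeepSeek V3 (2025-03-24) | 100 | 98 | 94 | 92 | 83 | 93.4%
GPT-4o Mini (temp=1) | 100 | 98 | 89 | 87 | 82 | 91.2%
Hermes 3 405B | 99 | 96 | 87 | 84 | 63 | 85.8%
Claude 3.5 Sonnet | 90 | 86 | 85 | 84 | 82 | 85.3%
GPT-4o, Aug. 6th (temp=1) | 98 | 97 | 79 | 77 | 71 | 84.4%
Hermes 3 70B | 99 | 97 | 82 | 72 | 70 | 84.0%
Claude Sonnet 4 | 100 | 90 | 84 | 75 | 70 | 83.9%
Grok 4.1 Fast | 91 | 87 | 86 | 83 | 70 | 83.4%
Llama 3.1 Nemotron 70B | 98 | 97 | 79 | 73 | 69 | 83.1%
DeepSeek V3 (2024-12-26) | 91 | 84 | 82 | 81 | 65 | 80.8%
DeepSeek-V2 Chat | 98 | 95 | 81 | 65 | 61 | 79.9%
Z.AI GLM 5 | 84 | 83 | 81 | 79 | 71 | 79.6%
Claude Sonnet 4.5 | 94 | 81 | 80 | 80 | 62 | 79.4%
Llama 3.1 8B | 100 | 99 | 97 | 51 | 48 | 79.0%
Gemma 3 12B | 99 | 87 | 74 | 72 | 62 | 78.6%
Grok 4 | 86 | 81 | 80 | 73 | 72 | 78.4%
Claude Sonnet 4.6 | 80 | 80 | 79 | 75 | 74 | 77.7%
Claude Haiku 4.5 | 86 | 76 | 73 | 72 | 70 | 75.3%
Gemma 3 4B | 94 | 80 | 73 | 65 | 62 | 74.8%
GPT-4o, May 13th (temp=1) | 94 | 76 | 73 | 65 | 63 | 74.2%
Grok 4 Fast | 86 | 77 | 76 | 66 | 64 | 73.7%
Gemma 3 27B | 83 | 82 | 78 | 66 | 58 | 73.3%
Gemini 2.5 Flash | 92 | 84 | 77 | 58 | 54 | 73.1%
ByteDance Seed 1.6 | 85 | 72 | 72 | 71 | 60 | 72.3%
Z.AI GLM 4.5 | 83 | 76 | 75 | 70 | 56 | 72.1%
Claude Opus 4 | 82 | 80 | 72 | 71 | 53 | 71.4%
Claude 3 Haiku | 79 | 73 | 72 | 69 | 63 | 71.3%
GPT-4o Mini (temp=0) | 79 | 76 | 75 | 65 | 61 | 70.9%
GPT-4.1 | 84 | 72 | 70 | 64 | 64 | 70.8%
MoonshotAI: Kimi K2.5 | 75 | 73 | 72 | 65 | 64 | 69.8%
Claude Opus 4.5 | 75 | 72 | 68 | 67 | 66 | 69.6%
Cohere Command R+ (Aug. 2024) | 90 | 73 | 67 | 58 | 55 | 68.7%
Claude 3.7 Sonnet | 85 | 84 | 60 | 56 | 55 | 67.9%
Arcee AI: Trinity Large (Preview) | 77 | 77 | 69 | 58 | 53 | 66.7%
Claude Opus 4.6 | 73 | 71 | 63 | 62 | 61 | 65.9%
GPT-4.1 Mini | 75 | 68 | 65 | 64 | 57 | 65.8%
Mistral Large 2 | 76 | 75 | 70 | 57 | 46 | 64.7%
Gemini 2.5 Flash Lite | 77 | 71 | 64 | 63 | 47 | 64.6%
Writer: Palmyra X5 | 72 | 66 | 61 | 59 | 55 | 62.5%
GPT-4.1 Nano | 69 | 68 | 65 | 58 | 50 | 62.2%
Gemini 2.5 Pro | 65 | 63 | 60 | 56 | 56 | 59.9%
WizardLM 2 8x22b | 73 | 59 | 59 | 53 | 52 | 59.2%
Gemini 3 Pro (Preview) | 71 | 67 | 59 | 50 | 48 | 58.9%
Gemini 3.1 Pro (Preview) | 76 | 62 | 53 | 52 | 51 | 58.7%
Minimax M2.5 | 80 | 69 | 55 | 48 | 40 | 58.4%
ByteDance Seed 1.6 Flash | 81 | 63 | 58 | 47 | 43 | 58.4%
Qwen 3.5 Plus (2026-02-15) | 67 | 65 | 55 | 49 | 47 | 56.5%
o4 Mini | 64 | 62 | 55 | 54 | 47 | 56.4%
Ministral 3 3B | 98 | 53 | 50 | 46 | 33 | 56.1%
Ministral 3 8B | 73 | 55 | 54 | 48 | 46 | 55.2%
Llama 3.1 70B | 68 | 62 | 51 | 48 | 46 | 55.0%
Z.AI GLM 4.7 | 63 | 57 | 55 | 54 | 45 | 54.8%
Arcee AI: Trinity Mini | 72 | 61 | 49 | 46 | 44 | 54.4%
DeepSeek V3.2 | 61 | 55 | 54 | 54 | 47 | 54.3%
DeepSeek V3.1 | 67 | 61 | 52 | 46 | 45 | 54.3%
GPT-5.1 | 60 | 58 | 56 | 56 | 41 | 54.3%
Mistral Large | 73 | 64 | 52 | 42 | 40 | 54.1%
o4 Mini High | 70 | 54 | 48 | 47 | 46 | 53.0%
Ministral 3 14B | 69 | 53 | 51 | 51 | 41 | 52.8%
Mistral Medium 3.1 | 56 | 56 | 55 | 50 | 44 | 52.4%
GPT-4o, May 13th (temp=0) | 64 | 58 | 50 | 46 | 37 | 51.1%
GPT-5 | 60 | 55 | 50 | 48 | 37 | 50.0%
Mistral Large 3 | 58 | 51 | 49 | 47 | 44 | 50.0%
GPT-4o, Aug. 6th (temp=0) | 50 | 50 | 49 | 49 | 46 | 48.7%
Z.AI GLM 4.7 Flash | 63 | 50 | 46 | 45 | 39 | 48.6%
Z.AI GLM 4.6 | 56 | 55 | 50 | 45 | 36 | 48.4%
Qwen 3.5 397B A17B | 59 | 52 | 45 | 44 | 42 | 48.4%
Qwen 2.5 72B | 61 | 52 | 46 | 43 | 40 | 48.3%
GPT-5 Mini | 56 | 53 | 47 | 41 | 38 | 47.1%
Ministral 8B | 49 | 48 | 47 | 44 | 41 | 45.8%
GPT-5.2 | 49 | 45 | 45 | 44 | 44 | 45.5%
Ministral 3B | 58 | 46 | 46 | 41 | 34 | 45.1%
Gemini 3 Flash (Preview) | 57 | 50 | 42 | 38 | 38 | 44.8%
Mistral NeMO | 56 | 54 | 43 | 35 | 34 | 44.4%
Mistral Small Creative | 47 | 46 | 44 | 43 | 40 | 44.0%
Stealth: Aurora Alpha | 47 | 40 | 38 | 34 | 33 | 38.6%
Mistral Small 3.2 24B | 45 | 41 | 40 | 38 | 25 | 37.7%
GPT-5 Nano | 35 | 34 | 32 | 28 | 25 | 30.8%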
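Model | #1 | #2 | #3 | #4 | #5 | Avg
DeepSeek V3 (2025-03-24) | 100 | 94 | 85 | 79 | 77 | 86.9%
Claude 3.5 Haiku | 100 | 91 | 90 | 74 | 72 | 85.4%
GPT-4o Mini (temp=1) | 86 | 86 | 85 | 85 | 73 | 82.9%
Grok 4.1 Fast | 96 | 94 | 91 | 68 | 65 | 82.5%
Hermes 3 70B | 100 | 98 | 84 | 75 | 35 | 78.5%
GPT-4o, Aug. 6th (temp=1) | 99 | 83 | 77 | 72 | 53 | 76.7%
Cohere Command R+ (Aug. 2024) | 95 | 78 | 75 | 73 | 58 | 75.8%
Rocinante 12B | 100 | 100 | 92 | 61 | 25 | 75.7%
Hermes 3 405B | 100 | 100 | 64 | 57 | 56 | 75.5%
Claude Sonnet 4 | 83 | 74 | 73 | 70 | 69 | 74.0%
Llama 3.1 Nemotron 70B | 99 | 89 | 67 | 57 | 55 | 73.4%
Claude Sonnet 4.5 | 79 | 76 | 75 | 73 | 58 | 72.2%
DeepSeek V3 (2024-12-26) | 78 | 76 | 76 | 64 | 44 | 67.4%
Claude Opus 4 | 87 | 82 | 63 | 58 | 44 | 67.0%
Gemma 3 27B | 78 | 67 | 66 | 64 | 58 | 66.8%
Llama 3.1 8B | 99 | 63 | 62 | 59 | 50 | 66.5%
GPT-4.1 | 71 | 70 | 68 | 63 | 61 | 66.4%
Claude 3.5 Sonnet | 74 | 70 | 63 | 63 | 62 | 66.3%
Claude Opus 4.5 | 74 | 66 | 66 | 61 | 60 | 65.3%
Grok 4 | 80 | 71 | 59 | 57 | 56 | 64.6%
Claude 3 Haiku | 68 | 67 | 65 | 60 | 60 | 64.0%
Claude 3.7 Sonnet | 70 | 66 | 64 | 62 | 53 | 63.2%
DeepSeek-V2 Chat | 79 | 79 | 70 | 44 | 44 | 63.0%
GPT-4.1 Mini | 66 | 64 | 63 | 53 | 51 | 59.5%
Z.AI GLM 4.5 | 73 | 63 | 58 | 57 | 46 | 59.2%
GPT-4o, May 13th (temp=1) | 69 | 63 | 62 | 56 | 45 | 58.9%
Z.AI GLM 5 | 72 | 56 | 56 | 54 | 53 | 58.1%
Grok 4 Fast | 66 | 64 | 62 | 52 | 45 | 57.7%
Arcee AI: Trinity Large (Preview) | 69 | 62 | 58 | 52 | 48 | 57.6%
Writer: Palmyra X5 | 72 | 60 | 52 | 52 | 45 | 56.4%
Claude Sonnet 4.6 | 60 | 59 | 56 | 53 | 46 | 54.9%
Claude Opus 4.6 | 62 | 61 | 56 | 54 | 41 | 54.9%
Minimax M2.5 | 56 | 55 | 55 | 54 | 53 | 54.6%
MoonshotAI: Kimi K2.5 | 65 | 56 | 52 | 49 | 48 | 53.8%
Claude Haiku 4.5 | 60 | 56 | 53 | 51 | 43 | 52.7%
Gemini 3.1 Pro (Preview) | 64 | 56 | 51 | 47 | 41 | 51.8%
GPT-4.1 Nano | 60 | 54 | 51 | 47 | 45 | 51.7%
DeepSeek V3.1 | 81 | 50 | 46 | 41 | 40 | 51.6%
GPT-4o Mini (temp=0) | 59 | 53 | 50 | 47 | 47 | 51.2%
Arcee AI: Trinity Mini | 76 | 50 | 43 | 43 | 40 | 50.5%
WizardLM 2 8x22b | 68 | 59 | 44 | 41 | 38 | 50.1%
Gemini 2.5 Flash Lite | 56 | 53 | 52 | 48 | 38 | 49.6%
Mistral Large 3 | 56 | 53 | 52 | 45 | 37 | 48.6%
Gemini 3 Pro (Preview) | 53 | 51 | 50 | 48 | 40 | 48.5%
Gemma 3 12B | 60 | 50 | 46 | 46 | 39 | 48.3%
GPT-5.1 | 54 | 51 | 50 | 44 | 40 | 47.8%
Ministral 3 14B | 71 | 52 | 42 | 37 | 36 | 47.6%
GPT-4o, Aug. 6th (temp=0) | 56 | 47 | 46 | 45 | 43 | 47.5%
Gemma 3 4B | 53 | 53 | 49 | 37 | 37 | 45.9%
Gemini 2.5 Pro | 54 | 50 | 45 | 40 | 38 | 45.5%
Gemini 2.5 Flash | 56 | 45 | 43 | 42 | 40 | 45.3%
Stealth: Aurora Alpha | 61 | 45 | 44 | 39 | 38 | 45.3%
Mistral Medium 3.1 | 55 | 49 | 47 | 42 | 32 | 45.2%
Mistral NeMO | 55 | 53 | 44 | 38 | 35 | 45.2%
Llama 3.1 70B | 70 | 44 | 43 | 41 | 25 | 44.9%
o4 Mini | 47 | 47 | 46 | 44 | 41 | 44.8%
Z.AI GLM 4.7 Flash | 54 | 45 | 45 | 42 | 37 | 44.5%
o4 Mini High | 50 | 48 | 45 | 41 | 39 | 44.4%
Mistral Large | 56 | 48 | 40 | 39 | 38 | 44.1%
Z.AI GLM 4.7 | 54 | 51 | 48 | 34 | 33 | 44.0%
Mistral Large 2 | 54 | 47 | 44 | 36 | 33 | 42.7%
Ministral 3 3B | 55 | 45 | 40 | 40 | 27 | 41.5%
ByteDance Seed 1.6 | 56 | 49 | 36 | 34 | 31 | 41.1%
Qwen 3.5 397B A17B | 44 | 43 | 40 | 40 | 35 | 40.5%
Qwen 2.5 72B | 43 | 43 | 41 | 39 | 36 | 40.5%
DeepSeek V3.2 | 47 | 46 | 42 | 34 | 33 | 40.2%
ByteDance Seed 1.6 Flash | 42 | 41 | 40 | 40 | 38 | 40.0%
Qwen 3.5 Plus (2026-02-15) | 43 | 42 | 41 | 38 | 34 | 39.7%
Ministral 3 8B | 45 | 42 | 39 | 39 | 34 | 39.7%
GPT-5 | 46 | 42 | 37 | 37 | 35 | 39.4%
GPT-5 Mini | 44 | 41 | 40 | 37 | 33 | 38.9%
Ministral 3B | 48 | 44 | 37 | 37 | 27 | 38.7%
GPT-5.2 | 45 | 40 | 38 | 38 | 34 | 38.7%
GPT-4o, May 13th (temp=0) | 45 | 44 | 40 | 34 | 25 | 37.6%
Mistral Small Creative | 43 | 42 | 37 | 35 | 30 | 37.4%
Z.AI GLM 4.6 | 43 | 36 | 34 | 33 | 29 | 35.2%
Ministral 8B | 38 | 36 | 35 | 33 | 33 | 35.1%
Gemini 3 Flash (Preview) | 37 | 36 | 35 | 35 | 32 | 35.0%
Mistral Small 3.2 24B | 44 | 38 | 37 | 25 | 25 | 33.6%
GPT-5 Nano | 25 | 25 | 25 | 25 | 25 | 25.0%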
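Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Haiku | 100 | 100 | 98 | 94 | 90 | 96.4%
Grok 4.1 Fast | 98 | 98 | 96 | 95 | 89 | 95.2%
GPT-4o, Aug. 6th (temp=1) | 99 | 98 | 98 | 91 | 87 | 94.4%
Llama 3.1 Nemotron 70B | 96 | 94 | 92 | 88 | 84 | 90.7%
Llama 3.1 8B | 100 | 98 | 97 | 92 | 60 | 89.7%
GPT-4o Mini (temp=1) | 98 | 95 | 93 | 81 | 79 | 89.3%
Claude 3.5 Sonnet | 92 | 90 | 90 | 86 | 84 | 88.5%
DeepSeek V3 (2025-03-24) | 100 | 92 | 86 | 80 | 69 | 85.3%
GPT-4o, May 13th (temp=1) | 91 | 89 | 88 | 84 | 70 | 84.4%
Hermes 3 70B | 96 | 95 | 83 | 79 | 67 | 83.9%
Claude Sonnet 4 | 92 | 89 | 85 | 76 | 71 | 82.5%
Cohere Command R+ (Aug. 2024) | 95 | 82 | 82 | 77 | 75 | 82.1%
Grok 4 | 99 | 82 | 81 | 74 | 70 | 81.3%
DeepSeek V3 (2024-12-26) | 90 | 88 | 77 | 77 | 74 | 81.2%
Hermes 3 405B | 93 | 88 | 78 | 77 | 60 | 79.4%
Claude Sonnet 4.5 | 95 | 80 | 77 | 75 | 68 | 79.3%
Claude 3 Haiku | 94 | 89 | 73 | 72 | 64 | 78.5%
Claude Opus 4 | 88 | 82 | 76 | 75 | 70 | 78.4%
Rocinante 12B | 99 | 93 | 85 | 67 | 47 | 78.1%
Grok 4 Fast | 84 | 82 | 81 | 72 | 71 | 78.0%
Z.AI GLM 4.5 | 90 | 82 | 78 | 69 | 67 | 77.0%
Gemini 2.5 Flash | 100 | 76 | 75 | 68 | 64 | 76.6%
GPT-4.1 Mini | 91 | 82 | 78 | 68 | 55 | 74.8%
Claude 3.7 Sonnet | 81 | 77 | 74 | 73 | 63 | 73.7%
GPT-4o Mini (temp=0) | 76 | 75 | 75 | 74 | 68 | 73.6%
DeepSeek-V2 Chat | 95 | 84 | 67 | 66 | 53 | 72.9%
Gemma 3 12B | 82 | 74 | 72 | 67 | 53 | 69.7%
Z.AI GLM 5 | 89 | 68 | 66 | 61 | 59 | 68.4%
Mistral Medium 3.1 | 90 | 74 | 63 | 61 | 53 | 68.1%
Claude Opus 4.5 | 78 | 73 | 66 | 62 | 55 | 66.6%
GPT-4o, Aug. 6th (temp=0) | 82 | 74 | 65 | 62 | 50 | 66.6%
Z.AI GLM 4.6 | 81 | 72 | 66 | 63 | 51 | 66.6%
GPT-4.1 Nano | 74 | 72 | 72 | 62 | 52 | 66.3%
GPT-4.1 | 69 | 69 | 65 | 65 | 64 | 66.2%
Minimax M2.5 | 73 | 73 | 69 | 62 | 52 | 65.8%
ByteDance Seed 1.6 | 97 | 69 | 55 | 55 | 54 | 65.7%
Gemma 3 27B | 75 | 67 | 64 | 61 | 59 | 65.3%
Qwen 3.5 Plus (2026-02-15) | 69 | 67 | 63 | 60 | 57 | 63.3%
Mistral Large 2 | 75 | 68 | 58 | 58 | 57 | 63.2%
Arcee AI: Trinity Large (Preview) | 79 | 77 | 64 | 59 | 34 | 62.8%
Writer: Palmyra X5 | 71 | 70 | 65 | 52 | 51 | 61.9%
Gemini 2.5 Flash Lite | 73 | 67 | 59 | 56 | 55 | 61.9%
Arcee AI: Trinity Mini | 93 | 71 | 58 | 47 | 39 | 61.5%
GPT-4o, May 13th (temp=0) | 65 | 63 | 62 | 58 | 56 | 60.6%
Claude Sonnet 4.6 | 79 | 60 | 59 | 53 | 51 | 60.4%
Mistral Large | 70 | 66 | 60 | 56 | 45 | 59.5%
WizardLM 2 8x22b | 71 | 64 | 59 | 53 | 51 | 59.4%
Gemma 3 4B | 71 | 65 | 58 | 52 | 45 | 58.4%
Qwen 2.5 72B | 64 | 58 | 57 | 56 | 55 | 58.0%
Claude Opus 4.6 | 65 | 63 | 61 | 54 | 47 | 57.9%
Claude Haiku 4.5 | 75 | 59 | 55 | 53 | 48 | 57.9%
Llama 3.1 70B | 68 | 68 | 62 | 58 | 32 | 57.4%
o4 Mini High | 77 | 64 | 50 | 49 | 43 | 56.7%
MoonshotAI: Kimi K2.5 | 66 | 62 | 57 | 53 | 43 | 56.3%
Mistral Large 3 | 61 | 59 | 57 | 52 | 46 | 55.1%
o4 Mini | 59 | 59 | 54 | 48 | 46 | 52.9%
ByteDance Seed 1.6 Flash | 58 | 56 | 53 | 51 | 45 | 52.8%
Gemini 2.5 Pro | 62 | 59 | 58 | 46 | 38 | 52.5%
DeepSeek V3.2 | 53 | 52 | 50 | 50 | 49 | 50.9%
Gemini 3.1 Pro (Preview) | 53 | 50 | 48 | 48 | 47 | 49.0%
Gemini 3 Pro (Preview) | 54 | 50 | 49 | 45 | 45 | 48.5%
Ministral 3 14B | 53 | 53 | 47 | 46 | 42 | 48.2%
Ministral 3B | 53 | 48 | 48 | 46 | 45 | 48.1%
Ministral 3 3B | 55 | 47 | 47 | 45 | 42 | 47.4%
Ministral 3 8B | 51 | 48 | 46 | 46 | 45 | 47.1%
Ministral 8B | 48 | 47 | 47 | 45 | 41 | 45.9%
Mistral Small Creative | 48 | 46 | 46 | 45 | 44 | 45.8%
DeepSeek V3.1 | 56 | 48 | 44 | 41 | 33 | 44.6%
GPT-5.1 | 53 | 45 | 43 | 42 | 40 | 44.5%
GPT-5.2 | 46 | 44 | 44 | 43 | 42 | 43.8%
Z.AI GLM 4.7 | 54 | 47 | 43 | 37 | 37 | 43.7%
Stealth: Aurora Alpha | 47 | 45 | 43 | 42 | 42 | 43.7%
Mistral NeMO | 49 | 48 | 44 | 43 | 34 | 43.6%
Qwen 3.5 397B A17B | 48 | 45 | 44 | 41 | 38 | 43.3%
GPT-5 | 55 | 43 | 39 | 37 | 37 | 42.2%
Mistral Small 3.2 24B | 50 | 50 | 45 | 33 | 29 | 41.5%
GPT-5 Mini | 41 | 41 | 40 | 39 | 39 | 40.2%
Gemini 3 Flash (Preview) | 40 | 38 | 38 | 38 | 38 | 38.5%
Z.AI GLM 4.7 Flash | 43 | 43 | 34 | 34 | 33 | 37.6%
GPT-5 Nano | 40 | 38 | 37 | 36 | 35 | 37.1%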
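Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 98 | 98 | 92 | 97.7%
Claude 3.5 Haiku | 100 | 100 | 96 | 94 | 92 | 96.5%
Hermes 3 405B | 99 | 98 | 95 | 95 | 91 | 95.7%
Grok 4.1 Fast | 100 | 100 | 98 | 93 | 87 | 95.4%
Claude 3.5 Sonnet | 100 | 99 | 97 | 96 | 86 | 95.4%
Llama 3.1 Nemotron 70B | 100 | 100 | 97 | 89 | 83 | 93.9%
DeepSeek V3 (2025-03-24) | 100 | 99 | 96 | 88 | 81 | 92.7%
Rocinante 12B | 100 | 98 | 95 | 85 | 84 | 92.5%
Cohere Command R+ (Aug. 2024) | 97 | 97 | 95 | 94 | 69 | 90.5%
Z.AI GLM 4.5 | 100 | 97 | 88 | 84 | 77 | 89.1%
Claude 3 Haiku | 100 | 97 | 87 | 83 | 74 | 88.1%
Hermes 3 70B | 100 | 96 | 92 | 85 | 63 | 87.3%
GPT-4o Mini (temp=1) | 92 | 89 | 89 | 84 | 74 | 85.6%
Claude Opus 4 | 94 | 90 | 89 | 77 | 73 | 84.6%
Claude Sonnet 4.5 | 97 | 90 | 85 | 75 | 75 | 84.5%
Claude Sonnet 4.6 | 90 | 87 | 84 | 84 | 74 | 83.7%
GPT-4o Mini (temp=0) | 89 | 87 | 87 | 81 | 74 | 83.4%
Claude Sonnet 4 | 95 | 84 | 80 | 79 | 74 | 82.5%
Llama 3.1 8B | 98 | 98 | 97 | 64 | 52 | 81.9%
Claude Opus 4.5 | 90 | 87 | 84 | 82 | 64 | 81.4%
GPT-4o, May 13th (temp=1) | 94 | 92 | 75 | 74 | 69 | 80.8%
Grok 4 Fast | 86 | 86 | 80 | 78 | 74 | 80.8%
GPT-4.1 Mini | 94 | 79 | 79 | 74 | 73 | 79.7%
Grok 4 | 92 | 80 | 76 | 74 | 72 | 78.8%
Z.AI GLM 5 | 82 | 82 | 80 | 77 | 73 | 78.8%
Gemini 2.5 Flash | 100 | 82 | 73 | 72 | 63 | 78.0%
Gemma 3 27B | 90 | 84 | 75 | 70 | 69 | 77.4%
MoonshotAI: Kimi K2.5 | 90 | 89 | 81 | 71 | 56 | 77.3%
GPT-4o, Aug. 6th (temp=0) | 88 | 85 | 78 | 68 | 67 | 77.2%
GPT-4o, May 13th (temp=0) | 98 | 84 | 80 | 67 | 54 | 76.2%
Claude 3.7 Sonnet | 86 | 82 | 78 | 67 | 66 | 75.7%
Minimax M2.5 | 95 | 77 | 71 | 68 | 66 | 75.3%
Gemma 3 12B | 78 | 76 | 70 | 70 | 68 | 72.5%
GPT-4.1 | 80 | 80 | 69 | 69 | 64 | 72.3%
GPT-4.1 Nano | 81 | 78 | 68 | 67 | 66 | 72.0%
Claude Opus 4.6 | 93 | 82 | 66 | 59 | 59 | 71.8%
WizardLM 2 8x22b | 90 | 88 | 63 | 61 | 54 | 71.2%
DeepSeek-V2 Chat | 84 | 83 | 75 | 60 | 54 | 71.0%
DeepSeek V3.1 | 95 | 67 | 65 | 63 | 59 | 69.9%
Writer: Palmyra X5 | 81 | 70 | 68 | 65 | 64 | 69.8%
DeepSeek V3 (2024-12-26) | 84 | 75 | 74 | 67 | 48 | 69.6%
Gemini 3 Pro (Preview) | 82 | 76 | 72 | 62 | 56 | 69.5%
Llama 3.1 70B | 82 | 74 | 66 | 61 | 60 | 68.5%
Arcee AI: Trinity Large (Preview) | 78 | 75 | 72 | 70 | 47 | 68.3%
ByteDance Seed 1.6 | 74 | 70 | 66 | 64 | 63 | 67.2%
Claude Haiku 4.5 | 75 | 73 | 70 | 62 | 56 | 67.2%
GPT-5.1 | 74 | 74 | 68 | 59 | 58 | 66.8%
Arcee AI: Trinity Mini | 79 | 76 | 66 | 62 | 50 | 66.6%
Gemini 3.1 Pro (Preview) | 77 | 76 | 71 | 66 | 42 | 66.2%
Gemini 2.5 Flash Lite | 88 | 65 | 64 | 59 | 57 | 66.2%
Qwen 2.5 72B | 73 | 68 | 68 | 59 | 59 | 65.4%
Ministral 3 3B | 70 | 69 | 65 | 63 | 49 | 63.1%
ByteDance Seed 1.6 Flash | 87 | 70 | 60 | 54 | 43 | 62.9%
Ministral 3 14B | 79 | 79 | 54 | 50 | 49 | 62.1%
Ministral 3B | 83 | 79 | 53 | 51 | 45 | 62.1%
Gemini 2.5 Pro | 72 | 71 | 69 | 49 | 48 | 61.7%
o4 Mini High | 74 | 73 | 58 | 54 | 48 | 61.4%
Mistral NeMO | 70 | 65 | 63 | 62 | 45 | 60.9%
Gemma 3 4B | 74 | 63 | 56 | 55 | 54 | 60.5%
o4 Mini | 73 | 62 | 62 | 54 | 51 | 60.4%
Mistral Large | 63 | 61 | 60 | 57 | 53 | 58.9%
DeepSeek V3.2 | 60 | 58 | 56 | 56 | 56 | 57.3%
Z.AI GLM 4.6 | 69 | 63 | 61 | 47 | 46 | 57.2%
Gemini 3 Flash (Preview) | 70 | 59 | 52 | 52 | 47 | 56.0%
Mistral Medium 3.1 | 67 | 62 | 53 | 48 | 47 | 55.6%
Qwen 3.5 Plus (2026-02-15) | 62 | 57 | 55 | 52 | 46 | 54.3%
Qwen 3.5 397B A17B | 74 | 59 | 49 | 45 | 44 | 54.1%
Z.AI GLM 4.7 | 75 | 53 | 47 | 45 | 44 | 52.9%
Ministral 8B | 67 | 64 | 47 | 44 | 39 | 52.3%
Mistral Small Creative | 59 | 58 | 52 | 52 | 41 | 52.2%
GPT-5 | 57 | 52 | 52 | 50 | 48 | 51.6%
GPT-5 Mini | 61 | 56 | 49 | 46 | 36 | 49.6%
Ministral 3 8B | 59 | 57 | 48 | 43 | 42 | 49.6%
Mistral Large 3 | 66 | 55 | 46 | 42 | 38 | 49.4%
Z.AI GLM 4.7 Flash | 62 | 51 | 50 | 42 | 39 | 48.8%
Mistral Large 2 | 61 | 56 | 43 | 42 | 42 | 48.7%
Mistral Small 3.2 24B | 71 | 50 | 50 | 43 | 26 | 47.7%
GPT-5.2 | 47 | 46 | 44 | 43 | 43 | 44.4%
Stealth: Aurora Alpha | 46 | 43 | 42 | 41 | 34 | 41.3%
GPT-5 Nano | 38 | 35 | 33 | 32 | 31 | 33.9%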
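Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Haiku | 100 | 97 | 96 | 90 | 89 | 94.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 95 | 91 | 90 | 86 | 92.5%
Grok 4.1 Fast | 99 | 98 | 88 | 88 | 69 | 88.6%
GPT-4o Mini (temp=1) | 100 | 91 | 86 | 80 | 75 | 86.5%
Llama 3.1 Nemotron 70B | 98 | 91 | 88 | 82 | 73 | 86.4%
Hermes 3 70B | 98 | 93 | 91 | 89 | 60 | 86.0%
Claude 3.5 Sonnet | 97 | 92 | 84 | 79 | 77 | 86.0%
GPT-4o, May 13th (temp=1) | 95 | 93 | 81 | 80 | 68 | 83.4%
Rocinante 12B | 100 | 98 | 97 | 72 | 48 | 82.9%
Z.AI GLM 4.5 | 93 | 88 | 81 | 81 | 72 | 82.9%
Cohere Command R+ (Aug. 2024) | 100 | 93 | 91 | 69 | 54 | 81.2%
Hermes 3 405B | 99 | 83 | 78 | 73 | 69 | 80.4%
DeepSeek V3 (2025-03-24) | 86 | 80 | 78 | 77 | 76 | 79.5%
DeepSeek V3 (2024-12-26) | 97 | 81 | 81 | 69 | 57 | 76.8%
Grok 4 | 84 | 82 | 79 | 75 | 64 | 76.8%
Grok 4 Fast | 89 | 77 | 76 | 75 | 67 | 76.8%
Llama 3.1 8B | 100 | 95 | 91 | 45 | 35 | 73.3%
Claude Sonnet 4 | 78 | 76 | 73 | 70 | 65 | 72.5%
GPT-4.1 Mini | 86 | 79 | 76 | 63 | 59 | 72.4%
GPT-4o Mini (temp=0) | 86 | 80 | 69 | 66 | 60 | 72.1%
GPT-4.1 Nano | 86 | 83 | 68 | 63 | 59 | 71.7%
DeepSeek-V2 Chat | 78 | 74 | 67 | 67 | 67 | 70.6%
Claude 3 Haiku | 89 | 71 | 64 | 63 | 57 | 69.0%
Llama 3.1 70B | 100 | 66 | 59 | 59 | 58 | 68.6%
Claude Sonnet 4.5 | 73 | 71 | 70 | 68 | 59 | 68.3%
GPT-4.1 | 75 | 73 | 70 | 61 | 58 | 67.4%
Gemma 3 27B | 77 | 73 | 69 | 62 | 54 | 67.0%
MoonshotAI: Kimi K2.5 | 78 | 71 | 68 | 59 | 56 | 66.4%
Gemma 3 4B | 79 | 76 | 66 | 57 | 52 | 66.2%
Gemini 2.5 Flash | 83 | 72 | 59 | 57 | 53 | 64.7%
Claude Sonnet 4.6 | 87 | 61 | 59 | 59 | 57 | 64.7%
Claude 3.7 Sonnet | 70 | 66 | 65 | 63 | 59 | 64.7%
Qwen 2.5 72B | 70 | 69 | 64 | 59 | 59 | 64.2%
Claude Haiku 4.5 | 76 | 67 | 60 | 59 | 59 | 64.1%
Claude Opus 4 | 70 | 67 | 66 | 59 | 57 | 64.0%
GPT-4o, May 13th (temp=0) | 70 | 69 | 69 | 56 | 48 | 62.3%
Z.AI GLM 4.6 | 73 | 65 | 58 | 57 | 53 | 61.2%
Gemini 2.5 Flash Lite | 82 | 59 | 58 | 53 | 49 | 60.3%
Writer: Palmyra X5 | 64 | 63 | 61 | 57 | 55 | 60.2%
GPT-4o, Aug. 6th (temp=0) | 78 | 63 | 58 | 51 | 49 | 59.7%
Claude Opus 4.5 | 66 | 66 | 61 | 56 | 48 | 59.5%
Claude Opus 4.6 | 68 | 64 | 59 | 58 | 47 | 59.0%
Gemma 3 12B | 77 | 57 | 56 | 56 | 50 | 59.0%
ByteDance Seed 1.6 | 66 | 62 | 61 | 59 | 45 | 58.6%
o4 Mini | 63 | 61 | 61 | 57 | 48 | 58.0%
Arcee AI: Trinity Mini | 77 | 56 | 52 | 48 | 46 | 55.9%
Minimax M2.5 | 67 | 57 | 54 | 54 | 47 | 55.7%
Arcee AI: Trinity Large (Preview) | 66 | 65 | 55 | 47 | 44 | 55.4%
Mistral NeMO | 68 | 61 | 54 | 49 | 46 | 55.4%
Z.AI GLM 5 | 67 | 60 | 60 | 52 | 36 | 55.1%
ByteDance Seed 1.6 Flash | 68 | 57 | 55 | 51 | 45 | 55.0%
Mistral Medium 3.1 | 62 | 57 | 53 | 53 | 47 | 54.3%
Mistral Large 3 | 67 | 57 | 50 | 47 | 46 | 53.4%
Gemini 3 Pro (Preview) | 60 | 60 | 56 | 49 | 43 | 53.4%
DeepSeek V3.2 | 60 | 55 | 52 | 50 | 50 | 53.2%
Mistral Large 2 | 62 | 56 | 54 | 45 | 44 | 52.2%
WizardLM 2 8x22b | 61 | 60 | 53 | 43 | 42 | 51.9%
Ministral 3B | 70 | 68 | 48 | 39 | 35 | 51.8%
Gemini 2.5 Pro | 62 | 56 | 48 | 48 | 45 | 51.7%
Qwen 3.5 Plus (2026-02-15) | 63 | 52 | 49 | 49 | 45 | 51.5%
Mistral Large | 63 | 54 | 50 | 46 | 44 | 51.2%
Stealth: Aurora Alpha | 60 | 59 | 49 | 48 | 40 | 50.9%
Gemini 3.1 Pro (Preview) | 69 | 48 | 48 | 45 | 40 | 50.2%
DeepSeek V3.1 | 59 | 51 | 51 | 50 | 37 | 49.9%
Mistral Small Creative | 61 | 49 | 49 | 47 | 41 | 49.5%
Z.AI GLM 4.7 | 62 | 51 | 47 | 45 | 37 | 48.6%
Ministral 3 14B | 56 | 48 | 46 | 45 | 45 | 48.0%
o4 Mini High | 54 | 48 | 48 | 46 | 42 | 47.6%
Ministral 3 3B | 53 | 50 | 45 | 45 | 43 | 47.1%
GPT-5.1 | 54 | 48 | 47 | 45 | 40 | 46.6%
Ministral 8B | 50 | 47 | 47 | 46 | 39 | 45.8%
Ministral 3 8B | 46 | 46 | 45 | 44 | 43 | 44.7%
GPT-5 | 57 | 44 | 42 | 41 | 36 | 44.0%
GPT-5.2 | 45 | 45 | 44 | 44 | 42 | 44.0%
Z.AI GLM 4.7 Flash | 51 | 46 | 45 | 39 | 32 | 42.6%
GPT-5 Mini | 48 | 45 | 42 | 41 | 38 | 42.5%
Qwen 3.5 397B A17B | 51 | 46 | 43 | 40 | 31 | 42.2%
Mistral Small 3.2 24B | 52 | 50 | 48 | 34 | 25 | 41.7%
Gemini 3 Flash (Preview) | 46 | 42 | 40 | 39 | 38 | 40.9%
GPT-5 Nano | 41 | 40 | 39 | 37 | 35 | 38.3%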
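Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4.1 Fast | 100 | 96 | 93 | 93 | 87 | 93.9%
Claude 3.5 Haiku | 100 | 100 | 90 | 89 | 80 | 91.9%
DeepSeek V3 (2025-03-24) | 100 | 100 | 93 | 84 | 77 | 90.8%
Hermes 3 70B | 100 | 98 | 86 | 82 | 71 | 87.6%
Rocinante 12B | 96 | 94 | 87 | 86 | 67 | 85.9%
Hermes 3 405B | 98 | 89 | 86 | 77 | 76 | 85.2%
Llama 3.1 Nemotron 70B | 100 | 93 | 83 | 81 | 68 | 85.0%
Claude 3.5 Sonnet | 99 | 97 | 77 | 77 | 64 | 82.7%
Llama 3.1 8B | 100 | 100 | 98 | 61 | 48 | 81.4%
GPT-4o, Aug. 6th (temp=1) | 97 | 90 | 85 | 77 | 57 | 81.1%
GPT-4o Mini (temp=1) | 96 | 82 | 80 | 79 | 68 | 81.1%
Claude Sonnet 4.5 | 88 | 83 | 79 | 78 | 67 | 79.0%
Claude 3 Haiku | 99 | 76 | 75 | 70 | 69 | 77.6%
Minimax M2.5 | 85 | 83 | 81 | 69 | 59 | 75.5%
Claude Sonnet 4 | 90 | 83 | 80 | 69 | 51 | 74.5%
Claude Opus 4 | 79 | 76 | 76 | 67 | 65 | 72.5%
Grok 4 Fast | 80 | 80 | 72 | 68 | 62 | 72.5%
Writer: Palmyra X5 | 81 | 74 | 73 | 67 | 65 | 72.2%
Claude Opus 4.5 | 94 | 72 | 66 | 65 | 61 | 71.8%
Claude 3.7 Sonnet | 78 | 76 | 73 | 65 | 64 | 71.2%
DeepSeek-V2 Chat | 85 | 79 | 67 | 65 | 51 | 69.5%
Claude Sonnet 4.6 | 80 | 75 | 71 | 61 | 59 | 69.3%
Claude Haiku 4.5 | 76 | 75 | 73 | 64 | 58 | 69.3%
Cohere Command R+ (Aug. 2024) | 90 | 70 | 64 | 63 | 57 | 69.0%
GPT-4o, May 13th (temp=1) | 71 | 70 | 70 | 69 | 63 | 68.5%
Grok 4 | 77 | 72 | 71 | 63 | 56 | 67.7%
Z.AI GLM 4.5 | 74 | 66 | 66 | 65 | 65 | 67.2%
Z.AI GLM 5 | 81 | 71 | 65 | 60 | 58 | 67.1%
GPT-4.1 Mini | 73 | 69 | 66 | 63 | 63 | 66.9%
MoonshotAI: Kimi K2.5 | 77 | 72 | 66 | 59 | 56 | 66.0%
Claude Opus 4.6 | 75 | 71 | 70 | 58 | 53 | 65.3%
DeepSeek V3 (2024-12-26) | 75 | 73 | 66 | 57 | 48 | 64.0%
GPT-4.1 | 71 | 67 | 62 | 62 | 50 | 62.8%
o4 Mini High | 77 | 72 | 59 | 56 | 49 | 62.6%
GPT-4.1 Nano | 71 | 63 | 60 | 60 | 50 | 60.7%
Gemma 3 12B | 70 | 70 | 66 | 46 | 40 | 58.3%
Arcee AI: Trinity Large (Preview) | 66 | 64 | 63 | 53 | 43 | 57.8%
Gemini 3.1 Pro (Preview) | 67 | 63 | 56 | 53 | 48 | 57.2%
WizardLM 2 8x22b | 66 | 66 | 56 | 55 | 42 | 57.0%
Gemma 3 4B | 66 | 63 | 62 | 51 | 43 | 56.9%
Qwen 3.5 397B A17B | 74 | 58 | 54 | 53 | 43 | 56.2%
GPT-4o Mini (temp=0) | 64 | 58 | 53 | 52 | 50 | 55.5%
Llama 3.1 70B | 67 | 58 | 53 | 50 | 46 | 55.1%
Gemma 3 27B | 58 | 57 | 55 | 52 | 51 | 54.6%
Arcee AI: Trinity Mini | 67 | 67 | 47 | 46 | 46 | 54.5%
Z.AI GLM 4.7 Flash | 63 | 55 | 55 | 50 | 49 | 54.1%
Gemini 3 Pro (Preview) | 62 | 59 | 54 | 53 | 42 | 53.9%
ByteDance Seed 1.6 Flash | 67 | 60 | 60 | 43 | 39 | 53.8%
GPT-5.1 | 61 | 56 | 54 | 53 | 42 | 53.2%
Gemini 2.5 Flash | 56 | 54 | 54 | 53 | 49 | 53.2%
Mistral Small Creative | 62 | 53 | 53 | 49 | 48 | 53.1%
o4 Mini | 71 | 55 | 51 | 47 | 41 | 53.0%
ByteDance Seed 1.6 | 59 | 54 | 53 | 51 | 45 | 52.4%
Qwen 3.5 Plus (2026-02-15) | 60 | 59 | 50 | 48 | 42 | 52.0%
Mistral Medium 3.1 | 61 | 53 | 51 | 49 | 42 | 51.3%
DeepSeek V3.2 | 57 | 56 | 47 | 46 | 42 | 49.5%
Mistral Large 2 | 61 | 55 | 43 | 43 | 42 | 48.6%
Ministral 3B | 62 | 50 | 47 | 43 | 39 | 48.1%
GPT-4o, Aug. 6th (temp=0) | 59 | 47 | 45 | 43 | 42 | 47.5%
Gemini 2.5 Flash Lite | 54 | 53 | 48 | 41 | 39 | 47.0%
Gemini 2.5 Pro | 54 | 52 | 46 | 42 | 41 | 47.0%
Mistral Large | 57 | 45 | 45 | 44 | 39 | 46.0%
Ministral 3 14B | 60 | 44 | 43 | 43 | 38 | 45.5%
DeepSeek V3.1 | 49 | 49 | 46 | 43 | 39 | 45.2%
Z.AI GLM 4.7 | 52 | 48 | 43 | 42 | 41 | 45.1%
Ministral 8B | 48 | 47 | 44 | 43 | 42 | 44.8%
Qwen 2.5 72B | 47 | 45 | 45 | 45 | 41 | 44.6%
Ministral 3 8B | 50 | 46 | 42 | 42 | 41 | 44.1%
Ministral 3 3B | 56 | 45 | 42 | 38 | 38 | 43.9%
GPT-4o, May 13th (temp=0) | 48 | 45 | 44 | 42 | 40 | 43.9%
Z.AI GLM 4.6 | 56 | 49 | 48 | 38 | 29 | 43.8%
Stealth: Aurora Alpha | 49 | 46 | 41 | 40 | 37 | 42.5%
GPT-5.2 | 44 | 43 | 43 | 41 | 40 | 42.2%
Mistral Large 3 | 48 | 42 | 41 | 41 | 37 | 42.1%
GPT-5 | 50 | 43 | 41 | 39 | 37 | 42.1%
Gemini 3 Flash (Preview) | 50 | 43 | 41 | 37 | 33 | 40.9%
GPT-5 Mini | 37 | 36 | 36 | 34 | 34 | 35.4%
GPT-5 Nano | 37 | 36 | 34 | 31 | 30 | 33.5%
Mistral NeMO | 42 | 39 | 33 | 25 | 25 | 32.8%
Mistral Small 3.2 24B | 25 | 25 | 25 | 25 | 25 | 25.0%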

genre

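Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Haiku | 100 | 100 | 96 | 92 | 90 | 95.6%
Rocinante 12B | 100 | 96 | 95 | 88 | 78 | 91.4%
Llama 3.1 8B | 100 | 97 | 87 | 83 | 60 | 85.5%
GPT-4o Mini (temp=1) | 96 | 92 | 86 | 79 | 71 | 85.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 96 | 85 | 69 | 64 | 82.8%
Llama 3.1 Nemotron 70B | 96 | 95 | 94 | 71 | 55 | 82.2%
Claude 3.5 Sonnet | 92 | 87 | 86 | 77 | 65 | 81.3%
Grok 4.1 Fast | 86 | 84 | 83 | 76 | 71 | 80.0%
Claude Sonnet 4 | 99 | 97 | 84 | 67 | 53 | 79.8%
Hermes 3 405B | 100 | 81 | 74 | 72 | 69 | 79.0%
Hermes 3 70B | 100 | 79 | 76 | 73 | 67 | 78.9%
DeepSeek V3 (2025-03-24) | 98 | 89 | 74 | 72 | 62 | 78.8%
ByteDance Seed 1.6 | 90 | 81 | 77 | 72 | 65 | 77.0%
GPT-4.1 | 87 | 81 | 77 | 71 | 62 | 75.6%
Gemma 3 4B | 82 | 75 | 75 | 71 | 68 | 73.9%
Grok 4 Fast | 79 | 76 | 72 | 68 | 63 | 71.8%
GPT-4.1 Mini | 83 | 83 | 74 | 64 | 53 | 71.3%
GPT-4o, May 13th (temp=1) | 78 | 77 | 67 | 66 | 66 | 70.7%
Claude 3 Haiku | 76 | 72 | 71 | 71 | 62 | 70.3%
Claude Opus 4 | 79 | 76 | 71 | 66 | 56 | 69.4%
Grok 4 | 72 | 71 | 71 | 67 | 62 | 68.6%
Z.AI GLM 5 | 87 | 82 | 70 | 53 | 48 | 68.1%
Claude Haiku 4.5 | 85 | 72 | 71 | 58 | 50 | 67.2%
Claude Sonnet 4.5 | 86 | 74 | 65 | 57 | 53 | 67.0%
Cohere Command R+ (Aug. 2024) | 86 | 71 | 65 | 57 | 53 | 66.6%
DeepSeek V3 (2024-12-26) | 75 | 69 | 69 | 64 | 55 | 66.2%
Claude 3.7 Sonnet | 77 | 67 | 64 | 63 | 60 | 66.1%
GPT-4o Mini (temp=0) | 74 | 70 | 65 | 60 | 59 | 65.7%
Gemma 3 12B | 71 | 70 | 66 | 64 | 57 | 65.6%
DeepSeek-V2 Chat | 73 | 69 | 65 | 64 | 53 | 65.0%
GPT-4.1 Nano | 78 | 68 | 62 | 55 | 53 | 63.2%
Z.AI GLM 4.5 | 66 | 65 | 63 | 63 | 58 | 63.0%
GPT-5.1 | 70 | 66 | 62 | 59 | 53 | 62.2%
Gemma 3 27B | 68 | 63 | 62 | 61 | 55 | 61.8%
Llama 3.1 70B | 81 | 71 | 59 | 53 | 45 | 61.6%
Qwen 3.5 Plus (2026-02-15) | 66 | 64 | 60 | 59 | 58 | 61.4%
Claude Sonnet 4.6 | 67 | 65 | 63 | 59 | 51 | 60.9%
Writer: Palmyra X5 | 73 | 60 | 59 | 57 | 56 | 60.8%
Claude Opus 4.6 | 67 | 65 | 61 | 58 | 51 | 60.5%
Gemini 3.1 Pro (Preview) | 75 | 64 | 59 | 54 | 43 | 59.3%
GPT-4o, Aug. 6th (temp=0) | 70 | 64 | 62 | 51 | 49 | 59.1%
Claude Opus 4.5 | 66 | 63 | 62 | 57 | 47 | 59.0%
WizardLM 2 8x22b | 63 | 59 | 57 | 55 | 50 | 57.0%
Gemini 3 Flash (Preview) | 79 | 57 | 51 | 50 | 44 | 56.1%
Arcee AI: Trinity Mini | 66 | 63 | 61 | 48 | 42 | 56.1%
GPT-5 Mini | 64 | 60 | 54 | 53 | 48 | 55.9%
Z.AI GLM 4.7 Flash | 70 | 63 | 55 | 47 | 45 | 55.8%
Gemini 2.5 Flash | 75 | 58 | 51 | 48 | 46 | 55.3%
Gemini 2.5 Flash Lite | 72 | 60 | 51 | 48 | 45 | 55.1%
DeepSeek V3.1 | 64 | 64 | 53 | 52 | 41 | 54.9%
Arcee AI: Trinity Large (Preview) | 61 | 55 | 55 | 52 | 50 | 54.7%
Minimax M2.5 | 67 | 64 | 54 | 46 | 39 | 54.0%
o4 Mini | 61 | 59 | 53 | 49 | 46 | 53.6%
Qwen 3.5 397B A17B | 65 | 52 | 52 | 51 | 47 | 53.3%
Gemini 2.5 Pro | 62 | 58 | 50 | 49 | 46 | 53.1%
MoonshotAI: Kimi K2.5 | 65 | 56 | 50 | 46 | 46 | 52.8%
GPT-4o, May 13th (temp=0) | 65 | 56 | 49 | 47 | 46 | 52.6%
Z.AI GLM 4.7 | 62 | 52 | 52 | 50 | 47 | 52.6%
ByteDance Seed 1.6 Flash | 64 | 59 | 53 | 45 | 41 | 52.3%
Qwen 2.5 72B | 63 | 60 | 49 | 45 | 45 | 52.2%
Mistral Large 3 | 59 | 56 | 54 | 49 | 43 | 52.1%
Z.AI GLM 4.6 | 56 | 56 | 55 | 47 | 45 | 51.9%
Mistral Large 2 | 66 | 54 | 47 | 47 | 45 | 51.8%
Mistral Medium 3.1 | 60 | 54 | 51 | 49 | 43 | 51.7%
Mistral Large | 66 | 60 | 49 | 42 | 42 | 51.5%
o4 Mini High | 56 | 56 | 53 | 52 | 40 | 51.4%
Ministral 3B | 72 | 54 | 43 | 42 | 42 | 50.7%
Mistral NeMO | 62 | 55 | 53 | 37 | 37 | 49.1%
Mistral Small Creative | 65 | 46 | 46 | 43 | 41 | 48.2%
Gemini 3 Pro (Preview) | 59 | 58 | 44 | 40 | 40 | 48.2%
Ministral 3 8B | 56 | 48 | 46 | 45 | 42 | 47.5%
DeepSeek V3.2 | 58 | 47 | 46 | 43 | 40 | 46.8%
GPT-5 | 53 | 50 | 46 | 44 | 40 | 46.5%
GPT-5.2 | 47 | 45 | 44 | 43 | 42 | 44.3%
Ministral 3 14B | 47 | 43 | 43 | 42 | 38 | 42.6%
Ministral 8B | 50 | 42 | 39 | 37 | 37 | 40.9%
Ministral 3 3B | 42 | 41 | 39 | 38 | 36 | 39.4%
Stealth: Aurora Alpha | 39 | 38 | 36 | 32 | 30 | 35.0%
Mistral Small 3.2 24B | 50 | 43 | 29 | 25 | 25 | 34.3%
GPT-5 Nano | 35 | 35 | 33 | 31 | 31 | 32.9%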
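Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Haiku | 100 | 96 | 96 | 90 | 68 | 90.1%
Llama 3.1 8B | 99 | 96 | 95 | 86 | 43 | 83.6%
Grok 4 | 100 | 99 | 70 | 64 | 62 | 79.0%
Rocinante 12B | 96 | 90 | 76 | 73 | 55 | 78.1%
Grok 4.1 Fast | 80 | 80 | 77 | 75 | 74 | 77.1%
GPT-4o, Aug. 6th (temp=1) | 85 | 79 | 66 | 63 | 61 | 70.7%
GPT-4o Mini (temp=1) | 80 | 69 | 69 | 67 | 65 | 70.0%
Grok 4 Fast | 100 | 69 | 62 | 61 | 57 | 69.9%
Llama 3.1 Nemotron 70B | 83 | 77 | 72 | 64 | 51 | 69.6%
DeepSeek V3 (2025-03-24) | 77 | 72 | 64 | 64 | 51 | 65.4%
Hermes 3 405B | 83 | 69 | 59 | 58 | 47 | 63.2%
Claude Opus 4 | 69 | 67 | 56 | 55 | 55 | 60.3%
GPT-4o, May 13th (temp=1) | 74 | 59 | 59 | 57 | 52 | 60.1%
Hermes 3 70B | 73 | 72 | 62 | 52 | 41 | 59.9%
Claude Sonnet 4 | 63 | 62 | 58 | 57 | 52 | 58.4%
Claude Sonnet 4.5 | 63 | 62 | 59 | 56 | 49 | 57.9%
Cohere Command R+ (Aug. 2024) | 69 | 68 | 53 | 50 | 47 | 57.3%
Arcee AI: Trinity Large (Preview) | 73 | 59 | 56 | 47 | 45 | 55.9%
Claude 3.5 Sonnet | 69 | 56 | 53 | 53 | 46 | 55.4%
GPT-4.1 Mini | 59 | 59 | 57 | 56 | 46 | 55.3%
Claude 3 Haiku | 79 | 52 | 50 | 47 | 47 | 55.0%
DeepSeek-V2 Chat | 76 | 62 | 55 | 44 | 38 | 54.8%
Claude Haiku 4.5 | 68 | 57 | 49 | 49 | 46 | 53.6%
Claude 3.7 Sonnet | 55 | 54 | 54 | 53 | 49 | 53.1%
Z.AI GLM 4.5 | 63 | 54 | 51 | 50 | 46 | 52.9%
Writer: Palmyra X5 | 61 | 56 | 53 | 49 | 44 | 52.6%
Claude Opus 4.5 | 65 | 59 | 51 | 45 | 42 | 52.5%
GPT-4.1 | 62 | 52 | 52 | 49 | 42 | 51.7%
Gemma 3 27B | 54 | 53 | 51 | 48 | 42 | 49.6%
GPT-4.1 Nano | 63 | 48 | 43 | 43 | 40 | 47.4%
Claude Sonnet 4.6 | 53 | 50 | 48 | 43 | 41 | 47.2%
Mistral NeMO | 57 | 56 | 50 | 39 | 34 | 47.1%
Gemma 3 12B | 58 | 48 | 45 | 44 | 40 | 46.9%
Gemini 2.5 Flash | 59 | 48 | 45 | 39 | 39 | 46.1%
DeepSeek V3 (2024-12-26) | 52 | 48 | 47 | 43 | 37 | 45.3%
o4 Mini High | 58 | 51 | 42 | 40 | 35 | 45.2%
Llama 3.1 70B | 48 | 48 | 46 | 44 | 39 | 44.9%
Minimax M2.5 | 62 | 50 | 43 | 42 | 25 | 44.4%
Mistral Large 2 | 62 | 53 | 36 | 36 | 34 | 44.4%
GPT-5.1 | 49 | 48 | 44 | 39 | 39 | 43.9%
GPT-4o Mini (temp=0) | 46 | 44 | 44 | 43 | 42 | 43.8%
Qwen 2.5 72B | 58 | 42 | 41 | 39 | 38 | 43.4%
Gemma 3 4B | 49 | 46 | 41 | 40 | 39 | 43.1%
Gemini 2.5 Flash Lite | 49 | 45 | 42 | 40 | 39 | 43.0%
GPT-4o, Aug. 6th (temp=0) | 49 | 46 | 42 | 40 | 37 | 43.0%
Z.AI GLM 5 | 48 | 44 | 42 | 42 | 39 | 42.8%
o4 Mini | 44 | 43 | 42 | 42 | 35 | 41.0%
Mistral Medium 3.1 | 48 | 45 | 41 | 34 | 34 | 40.6%
WizardLM 2 8x22b | 46 | 41 | 40 | 39 | 36 | 40.2%
Mistral Small Creative | 45 | 39 | 39 | 38 | 35 | 39.2%
Mistral Large 3 | 46 | 41 | 40 | 38 | 31 | 39.1%
Gemini 2.5 Pro | 46 | 41 | 40 | 35 | 34 | 39.1%
Gemini 3.1 Pro (Preview) | 45 | 44 | 37 | 34 | 34 | 38.8%
ByteDance Seed 1.6 | 62 | 35 | 34 | 31 | 31 | 38.6%
MoonshotAI: Kimi K2.5 | 42 | 41 | 38 | 35 | 35 | 38.0%
ByteDance Seed 1.6 Flash | 44 | 38 | 38 | 38 | 32 | 37.9%
Qwen 3.5 Plus (2026-02-15) | 47 | 43 | 35 | 32 | 31 | 37.7%
DeepSeek V3.2 | 44 | 40 | 35 | 35 | 33 | 37.5%
Gemini 3 Flash (Preview) | 44 | 40 | 36 | 35 | 32 | 37.3%
GPT-4o, May 13th (temp=0) | 44 | 42 | 37 | 33 | 30 | 37.2%
Gemini 3 Pro (Preview) | 41 | 40 | 37 | 35 | 32 | 36.9%
Ministral 3 14B | 39 | 38 | 36 | 36 | 34 | 36.5%
GPT-5.2 | 40 | 39 | 36 | 34 | 33 | 36.5%
Ministral 3 8B | 39 | 37 | 36 | 36 | 34 | 36.4%
Mistral Large | 45 | 38 | 35 | 34 | 30 | 36.4%
Claude Opus 4.6 | 40 | 39 | 39 | 33 | 31 | 36.2%
Z.AI GLM 4.6 | 39 | 38 | 37 | 37 | 29 | 36.1%
Z.AI GLM 4.7 | 43 | 37 | 37 | 34 | 26 | 35.4%
DeepSeek V3.1 | 39 | 38 | 36 | 33 | 31 | 35.4%
Arcee AI: Trinity Mini | 39 | 37 | 35 | 34 | 32 | 35.4%
GPT-5 | 38 | 37 | 34 | 31 | 30 | 34.1%
Ministral 8B | 36 | 36 | 35 | 33 | 29 | 34.1%
Ministral 3B | 36 | 36 | 33 | 32 | 31 | 33.8%
GPT-5 Mini | 36 | 35 | 35 | 31 | 31 | 33.7%
Z.AI GLM 4.7 Flash | 35 | 34 | 34 | 33 | 30 | 33.2%
Ministral 3 3B | 38 | 35 | 34 | 31 | 27 | 33.0%
Qwen 3.5 397B A17B | 34 | 33 | 32 | 31 | 30 | 31.9%
Stealth: Aurora Alpha | 36 | 36 | 28 | 27 | 26 | 30.6%
GPT-5 Nano | 32 | 27 | 26 | 25 | 25 | 27.0%
Mistral Small 3.2 24B | 25 | 25 | 25 | 25 | 25 | 25.0%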
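Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-4o, Aug. 6th (temp=1) | 99 | 95 | 85 | 79 | 73 | 86.2%
Grok 4.1 Fast | 100 | 87 | 81 | 81 | 76 | 85.1%
DeepSeek V3 (2025-03-24) | 100 | 94 | 93 | 80 | 55 | 84.5%
GPT-4o Mini (temp=1) | 89 | 87 | 86 | 84 | 76 | 84.3%
Claude 3.5 Sonnet | 91 | 87 | 87 | 82 | 71 | 83.6%
Llama 3.1 Nemotron 70B | 99 | 90 | 81 | 78 | 67 | 82.7%
Claude 3.5 Haiku | 93 | 90 | 77 | 77 | 75 | 82.5%
Llama 3.1 8B | 100 | 98 | 97 | 68 | 37 | 80.1%
GPT-4.1 Mini | 93 | 79 | 78 | 75 | 72 | 79.6%
Cohere Command R+ (Aug. 2024) | 79 | 79 | 77 | 73 | 68 | 75.3%
Hermes 3 405B | 86 | 78 | 72 | 70 | 70 | 75.3%
Grok 4 | 85 | 80 | 74 | 69 | 67 | 74.9%
GPT-4.1 Nano | 86 | 71 | 70 | 69 | 66 | 72.6%
Grok 4 Fast | 85 | 75 | 73 | 69 | 60 | 72.5%
Claude 3.7 Sonnet | 79 | 76 | 70 | 67 | 67 | 72.0%
GPT-4o, May 13th (temp=1) | 79 | 73 | 72 | 67 | 57 | 69.6%
WizardLM 2 8x22b | 92 | 78 | 69 | 56 | 53 | 69.5%
Claude Opus 4 | 82 | 77 | 67 | 60 | 59 | 69.2%
Llama 3.1 70B | 94 | 70 | 68 | 60 | 52 | 68.7%
Z.AI GLM 4.5 | 81 | 70 | 69 | 67 | 57 | 68.6%
Claude Sonnet 4.5 | 76 | 72 | 68 | 64 | 59 | 67.8%
DeepSeek V3 (2024-12-26) | 75 | 74 | 69 | 62 | 53 | 66.7%
Claude Sonnet 4 | 75 | 67 | 66 | 62 | 60 | 65.9%
Arcee AI: Trinity Large (Preview) | 96 | 62 | 58 | 57 | 56 | 65.6%
Gemma 3 12B | 69 | 66 | 66 | 65 | 60 | 65.3%
GPT-4.1 | 71 | 69 | 68 | 67 | 50 | 64.9%
Rocinante 12B | 98 | 84 | 58 | 43 | 38 | 64.2%
DeepSeek-V2 Chat | 86 | 75 | 59 | 51 | 48 | 63.7%
Claude 3 Haiku | 68 | 66 | 62 | 61 | 60 | 63.5%
Z.AI GLM 5 | 91 | 68 | 59 | 54 | 45 | 63.3%
Hermes 3 70B | 80 | 76 | 65 | 51 | 42 | 62.8%
Gemma 3 27B | 71 | 66 | 62 | 55 | 51 | 61.1%
GPT-4o, Aug. 6th (temp=0) | 67 | 64 | 61 | 59 | 52 | 60.7%
GPT-4o Mini (temp=0) | 73 | 70 | 58 | 55 | 47 | 60.6%
Qwen 2.5 72B | 65 | 64 | 60 | 59 | 55 | 60.3%
GPT-4o, May 13th (temp=0) | 72 | 61 | 57 | 56 | 50 | 59.1%
Claude Haiku 4.5 | 62 | 60 | 60 | 58 | 54 | 58.8%
Gemma 3 4B | 63 | 63 | 56 | 56 | 53 | 58.1%
Arcee AI: Trinity Mini | 67 | 66 | 60 | 50 | 38 | 56.1%
Writer: Palmyra X5 | 60 | 58 | 55 | 54 | 50 | 55.6%
Qwen 3.5 Plus (2026-02-15) | 61 | 59 | 54 | 52 | 51 | 55.6%
Claude Opus 4.5 | 64 | 57 | 57 | 52 | 47 | 55.4%
Claude Sonnet 4.6 | 65 | 58 | 54 | 48 | 46 | 54.5%
Mistral Large 2 | 60 | 59 | 59 | 49 | 45 | 54.4%
MoonshotAI: Kimi K2.5 | 65 | 61 | 52 | 48 | 43 | 53.7%
Gemini 2.5 Flash Lite | 58 | 54 | 54 | 52 | 48 | 53.0%
Minimax M2.5 | 59 | 55 | 54 | 51 | 43 | 52.3%
Mistral Medium 3.1 | 59 | 51 | 51 | 50 | 48 | 52.0%
GPT-5.1 | 58 | 56 | 52 | 49 | 45 | 51.9%
Mistral Small Creative | 65 | 50 | 48 | 46 | 45 | 50.8%
Mistral Small 3.2 24B | 69 | 57 | 47 | 41 | 37 | 50.3%
Mistral Large 3 | 64 | 49 | 47 | 45 | 44 | 49.9%
Gemini 2.5 Flash | 59 | 51 | 48 | 47 | 43 | 49.6%
Ministral 3 14B | 56 | 52 | 52 | 46 | 40 | 49.1%
Mistral Large | 60 | 55 | 45 | 44 | 41 | 49.1%
o4 Mini | 51 | 50 | 47 | 45 | 43 | 47.1%
Claude Opus 4.6 | 52 | 51 | 45 | 44 | 43 | 46.9%
o4 Mini High | 57 | 46 | 45 | 45 | 41 | 46.9%
ByteDance Seed 1.6 Flash | 55 | 47 | 45 | 44 | 42 | 46.8%
Gemini 3 Pro (Preview) | 53 | 45 | 45 | 43 | 40 | 45.2%
Ministral 3 3B | 48 | 48 | 46 | 41 | 40 | 44.4%
Gemini 2.5 Pro | 54 | 49 | 42 | 39 | 36 | 44.2%
ByteDance Seed 1.6 | 54 | 44 | 41 | 40 | 36 | 43.0%
Mistral NeMO | 64 | 53 | 35 | 33 | 29 | 42.8%
Ministral 3B | 48 | 44 | 42 | 41 | 38 | 42.6%
GPT-5 | 45 | 44 | 41 | 41 | 40 | 42.3%
DeepSeek V3.2 | 49 | 47 | 43 | 38 | 34 | 42.3%
Stealth: Aurora Alpha | 46 | 45 | 45 | 38 | 36 | 42.0%
GPT-5.2 | 44 | 42 | 42 | 42 | 40 | 42.0%
Ministral 8B | 45 | 43 | 43 | 41 | 36 | 41.7%
Gemini 3.1 Pro (Preview) | 56 | 41 | 40 | 39 | 33 | 41.6%
Ministral 3 8B | 43 | 42 | 41 | 41 | 38 | 41.1%
GPT-5 Mini | 43 | 43 | 39 | 38 | 36 | 39.8%
Gemini 3 Flash (Preview) | 44 | 44 | 40 | 36 | 34 | 39.7%
DeepSeek V3.1 | 42 | 41 | 40 | 37 | 37 | 39.6%
GPT-5 Nano | 41 | 39 | 38 | 38 | 37 | 38.9%
Z.AI GLM 4.6 | 41 | 39 | 39 | 38 | 33 | 38.0%
Z.AI GLM 4.7 | 45 | 42 | 38 | 32 | 31 | 37.9%
Z.AI GLM 4.7 Flash | 40 | 39 | 32 | 31 | 30 | 34.4%
Qwen 3.5 397B A17B | 37 | 37 | 35 | 30 | 25 | 32.9%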
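Model | #1 | #2 | #3 | #4 | #5 | Avg
Rocinante 12B | 99 | 98 | 98 | 95 | 86 | 95.2%
Claude 3.5 Haiku | 100 | 97 | 96 | 91 | 85 | 93.8%
Claude 3.5 Sonnet | 99 | 94 | 90 | 90 | 84 | 91.3%
GPT-4o Mini (temp=1) | 100 | 98 | 89 | 84 | 84 | 91.1%
Hermes 3 405B | 98 | 96 | 91 | 86 | 76 | 89.5%
Grok 4.1 Fast | 97 | 88 | 87 | 85 | 80 | 87.2%
Llama 3.1 8B | 99 | 97 | 95 | 70 | 69 | 86.2%
Llama 3.1 Nemotron 70B | 97 | 87 | 81 | 79 | 77 | 84.0%
Cohere Command R+ (Aug. 2024) | 98 | 88 | 86 | 84 | 59 | 82.9%
GPT-4o, Aug. 6th (temp=1) | 90 | 86 | 80 | 78 | 76 | 82.0%
Grok 4 Fast | 89 | 83 | 79 | 76 | 75 | 80.3%
Claude Sonnet 4 | 99 | 79 | 79 | 78 | 66 | 80.0%
GPT-4o, May 13th (temp=1) | 89 | 80 | 76 | 73 | 72 | 77.9%
Claude Sonnet 4.5 | 87 | 83 | 77 | 74 | 70 | 77.9%
Claude Opus 4 | 93 | 78 | 75 | 73 | 61 | 75.9%
Grok 4 | 85 | 79 | 74 | 74 | 64 | 75.3%
DeepSeek V3 (2025-03-24) | 88 | 82 | 79 | 75 | 42 | 73.1%
Claude 3.7 Sonnet | 81 | 80 | 69 | 67 | 66 | 72.8%
GPT-4.1 Nano | 80 | 76 | 74 | 68 | 64 | 72.5%
GPT-4.1 Mini | 87 | 77 | 71 | 68 | 58 | 72.1%
Hermes 3 70B | 89 | 82 | 78 | 68 | 41 | 71.5%
Z.AI GLM 4.5 | 78 | 74 | 70 | 70 | 64 | 71.1%
Claude 3 Haiku | 81 | 78 | 70 | 68 | 53 | 70.2%
Claude Haiku 4.5 | 85 | 80 | 72 | 68 | 46 | 70.0%
GPT-4.1 | 78 | 74 | 72 | 69 | 56 | 69.9%
Gemma 3 27B | 94 | 69 | 64 | 59 | 58 | 68.8%
Claude Sonnet 4.6 | 76 | 70 | 68 | 64 | 63 | 68.3%
GPT-5.1 | 72 | 70 | 68 | 64 | 64 | 67.5%
Arcee AI: Trinity Large (Preview) | 100 | 75 | 72 | 47 | 41 | 67.2%
GPT-4o Mini (temp=0) | 77 | 68 | 67 | 66 | 57 | 67.0%
Qwen 2.5 72B | 82 | 68 | 65 | 63 | 56 | 66.8%
Arcee AI: Trinity Mini | 91 | 77 | 69 | 48 | 46 | 66.3%
WizardLM 2 8x22b | 83 | 62 | 62 | 61 | 58 | 65.1%
Gemini 2.5 Flash Lite | 87 | 67 | 63 | 60 | 49 | 65.1%
DeepSeek-V2 Chat | 78 | 71 | 65 | 57 | 52 | 64.7%
ByteDance Seed 1.6 | 76 | 68 | 66 | 65 | 48 | 64.7%
Claude Opus 4.5 | 76 | 69 | 67 | 57 | 54 | 64.5%
Qwen 3.5 Plus (2026-02-15) | 85 | 68 | 60 | 60 | 49 | 64.3%
Z.AI GLM 5 | 70 | 67 | 67 | 59 | 57 | 63.9%
Llama 3.1 70B | 76 | 63 | 61 | 59 | 58 | 63.4%
o4 Mini High | 73 | 64 | 61 | 57 | 54 | 61.9%
Minimax M2.5 | 72 | 65 | 61 | 60 | 49 | 61.6%
Gemma 3 12B | 77 | 64 | 63 | 59 | 46 | 61.5%
GPT-4o, Aug. 6th (temp=0) | 81 | 57 | 56 | 55 | 54 | 60.6%
Writer: Palmyra X5 | 78 | 62 | 56 | 52 | 49 | 59.4%
Gemma 3 4B | 70 | 68 | 60 | 51 | 47 | 59.2%
Gemini 3 Flash (Preview) | 68 | 65 | 57 | 53 | 49 | 58.5%
o4 Mini | 68 | 63 | 56 | 55 | 48 | 58.2%
GPT-5 | 65 | 62 | 56 | 54 | 50 | 57.3%
Gemini 2.5 Flash | 74 | 66 | 54 | 49 | 42 | 57.2%
Gemini 2.5 Pro | 64 | 63 | 60 | 53 | 44 | 56.7%
DeepSeek V3 (2024-12-26) | 75 | 53 | 49 | 49 | 48 | 55.1%
Mistral Large 3 | 64 | 55 | 54 | 51 | 51 | 55.0%
Mistral Large | 57 | 55 | 55 | 53 | 53 | 54.5%
DeepSeek V3.2 | 74 | 60 | 50 | 45 | 42 | 54.1%
MoonshotAI: Kimi K2.5 | 58 | 57 | 55 | 52 | 48 | 54.0%
GPT-4o, May 13th (temp=0) | 69 | 53 | 52 | 51 | 44 | 53.8%
Mistral NeMO | 71 | 58 | 51 | 47 | 39 | 53.2%
Claude Opus 4.6 | 63 | 56 | 55 | 49 | 42 | 53.2%
Gemini 3.1 Pro (Preview) | 62 | 53 | 50 | 49 | 48 | 52.4%
Mistral Large 2 | 57 | 57 | 55 | 46 | 46 | 52.1%
DeepSeek V3.1 | 61 | 61 | 50 | 47 | 41 | 52.0%
Gemini 3 Pro (Preview) | 62 | 50 | 50 | 44 | 40 | 49.2%
Mistral Medium 3.1 | 57 | 50 | 49 | 46 | 41 | 48.6%
Qwen 3.5 397B A17B | 63 | 52 | 44 | 43 | 41 | 48.5%
Mistral Small Creative | 59 | 48 | 46 | 45 | 44 | 48.2%
Z.AI GLM 4.6 | 57 | 54 | 51 | 40 | 36 | 47.7%
Z.AI GLM 4.7 | 52 | 50 | 49 | 47 | 41 | 47.7%
Z.AI GLM 4.7 Flash | 57 | 50 | 45 | 41 | 40 | 46.8%
Ministral 3 8B | 53 | 45 | 45 | 43 | 41 | 45.5%
Ministral 8B | 50 | 47 | 46 | 43 | 40 | 45.5%
ByteDance Seed 1.6 Flash | 48 | 46 | 46 | 45 | 43 | 45.3%
Ministral 3B | 48 | 48 | 42 | 42 | 41 | 44.2%
GPT-5.2 | 49 | 45 | 43 | 42 | 41 | 43.8%
GPT-5 Mini | 49 | 44 | 43 | 41 | 41 | 43.8%
Ministral 3 14B | 48 | 43 | 42 | 40 | 38 | 42.0%
Ministral 3 3B | 48 | 43 | 43 | 40 | 25 | 39.6%
Mistral Small 3.2 24B | 44 | 42 | 42 | 35 | 34 | 39.3%
GPT-5 Nano | 37 | 37 | 36 | 32 | 28 | 34.1%
Stealth: Aurora Alpha | 37 | 36 | 30 | 29 | 25 | 31.5%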
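Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Haiku | 100 | 100 | 96 | 96 | 93 | 97.2%
Llama 3.1 8B | 100 | 97 | 94 | 93 | 90 | 94.7%
Rocinante 12B | 100 | 94 | 94 | 94 | 66 | 89.5%
GPT-4o, Aug. 6th (temp=1) | 95 | 95 | 94 | 85 | 68 | 87.3%
GPT-4.1 Mini | 87 | 84 | 75 | 74 | 72 | 78.3%
Grok 4.1 Fast | 86 | 85 | 77 | 74 | 70 | 78.2%
Claude 3.5 Sonnet | 83 | 82 | 78 | 77 | 67 | 77.4%
GPT-4o Mini (temp=1) | 82 | 81 | 79 | 78 | 67 | 77.4%
Hermes 3 405B | 96 | 93 | 80 | 63 | 53 | 77.1%
Llama 3.1 Nemotron 70B | 88 | 83 | 81 | 75 | 57 | 76.6%
DeepSeek V3 (2025-03-24) | 90 | 80 | 79 | 68 | 61 | 75.3%
GPT-4o, May 13th (temp=1) | 88 | 79 | 75 | 69 | 65 | 75.1%
Hermes 3 70B | 98 | 80 | 77 | 67 | 46 | 73.6%
Cohere Command R+ (Aug. 2024) | 86 | 74 | 73 | 69 | 64 | 73.0%
Grok 4 | 81 | 75 | 71 | 69 | 61 | 71.5%
Claude Sonnet 4.5 | 80 | 77 | 66 | 66 | 65 | 70.8%
Claude 3 Haiku | 91 | 79 | 64 | 64 | 53 | 70.2%
Claude Opus 4 | 76 | 75 | 68 | 67 | 64 | 69.9%
GPT-4.1 Nano | 84 | 76 | 71 | 60 | 59 | 69.9%
Grok 4 Fast | 75 | 74 | 70 | 61 | 59 | 68.0%
Claude Sonnet 4 | 74 | 72 | 70 | 62 | 59 | 67.3%
Claude 3.7 Sonnet | 71 | 70 | 65 | 63 | 62 | 66.2%
GPT-4o, Aug. 6th (temp=0) | 68 | 68 | 67 | 64 | 63 | 66.0%
DeepSeek V3 (2024-12-26) | 76 | 67 | 67 | 66 | 53 | 65.9%
Claude Haiku 4.5 | 71 | 70 | 65 | 62 | 57 | 64.9%
WizardLM 2 8x22b | 85 | 67 | 67 | 55 | 51 | 64.8%
DeepSeek-V2 Chat | 74 | 63 | 63 | 61 | 55 | 63.3%
Llama 3.1 70B | 77 | 64 | 57 | 55 | 54 | 61.5%
Claude Opus 4.5 | 66 | 62 | 61 | 58 | 58 | 61.1%
Z.AI GLM 4.5 | 70 | 69 | 60 | 55 | 51 | 60.9%
Gemma 3 12B | 65 | 65 | 62 | 59 | 48 | 59.7%
GPT-4.1 | 65 | 61 | 61 | 55 | 55 | 59.6%
GPT-4o, May 13th (temp=0) | 67 | 62 | 58 | 55 | 55 | 59.4%
Gemma 3 27B | 72 | 70 | 59 | 48 | 47 | 59.1%
Arcee AI: Trinity Mini | 80 | 63 | 61 | 52 | 35 | 58.3%
Mistral Large | 73 | 67 | 57 | 45 | 45 | 57.4%
Qwen 3.5 Plus (2026-02-15) | 62 | 60 | 54 | 51 | 50 | 55.3%
Mistral Large 2 | 61 | 58 | 55 | 51 | 47 | 54.6%
MoonshotAI: Kimi K2.5 | 67 | 52 | 52 | 51 | 49 | 54.3%
Gemini 2.5 Flash Lite | 61 | 54 | 53 | 52 | 47 | 53.6%
Gemini 2.5 Flash | 72 | 53 | 51 | 46 | 44 | 53.4%
Writer: Palmyra X5 | 62 | 57 | 53 | 46 | 46 | 53.0%
Gemma 3 4B | 64 | 59 | 56 | 44 | 42 | 52.9%
Qwen 2.5 72B | 70 | 54 | 50 | 45 | 42 | 52.3%
DeepSeek V3.1 | 70 | 52 | 50 | 47 | 41 | 51.8%
Ministral 3 14B | 56 | 53 | 52 | 51 | 46 | 51.4%
o4 Mini | 59 | 58 | 48 | 47 | 43 | 51.1%
Mistral Large 3 | 61 | 53 | 49 | 48 | 44 | 51.0%
Arcee AI: Trinity Large (Preview) | 76 | 49 | 45 | 45 | 40 | 50.9%
Z.AI GLM 5 | 56 | 54 | 52 | 47 | 45 | 50.9%
GPT-4o Mini (temp=0) | 54 | 52 | 50 | 49 | 48 | 50.6%
Gemini 2.5 Pro | 56 | 53 | 49 | 48 | 46 | 50.4%
GPT-5.1 | 59 | 50 | 49 | 46 | 44 | 49.6%
ByteDance Seed 1.6 | 64 | 56 | 49 | 42 | 37 | 49.6%
Mistral Small Creative | 55 | 53 | 49 | 47 | 44 | 49.5%
Mistral Medium 3.1 | 58 | 52 | 45 | 44 | 44 | 48.4%
o4 Mini High | 58 | 53 | 46 | 44 | 41 | 48.4%
Claude Opus 4.6 | 53 | 50 | 49 | 49 | 39 | 48.1%
Claude Sonnet 4.6 | 56 | 52 | 51 | 43 | 38 | 47.9%
Z.AI GLM 4.6 | 54 | 51 | 48 | 46 | 39 | 47.8%
Mistral NeMO | 79 | 46 | 43 | 39 | 32 | 47.6%
Ministral 3 8B | 64 | 46 | 45 | 42 | 41 | 47.4%
Gemini 3 Flash (Preview) | 52 | 49 | 49 | 44 | 42 | 47.3%
DeepSeek V3.2 | 49 | 49 | 47 | 46 | 42 | 46.5%
ByteDance Seed 1.6 Flash | 50 | 50 | 47 | 43 | 39 | 45.9%
Ministral 3B | 47 | 46 | 44 | 42 | 40 | 43.8%
Ministral 3 3B | 47 | 46 | 45 | 43 | 38 | 43.7%
Minimax M2.5 | 56 | 47 | 44 | 41 | 31 | 43.6%
Mistral Small 3.2 24B | 48 | 47 | 42 | 41 | 38 | 43.4%
Gemini 3 Pro (Preview) | 52 | 43 | 43 | 41 | 37 | 43.3%
GPT-5.2 | 45 | 44 | 43 | 42 | 41 | 43.2%
GPT-5 Mini | 45 | 44 | 43 | 41 | 40 | 42.7%
GPT-5 | 47 | 43 | 41 | 40 | 39 | 41.9%
Stealth: Aurora Alpha | 45 | 44 | 40 | 38 | 37 | 40.8%
Ministral 8B | 49 | 44 | 39 | 36 | 35 | 40.7%
GPT-5 Nano | 41 | 40 | 39 | 38 | 38 | 39.1%
Gemini 3.1 Pro (Preview) | 52 | 38 | 37 | 37 | 29 | 38.6%
Z.AI GLM 4.7 Flash | 44 | 39 | 38 | 37 | 33 | 38.2%
Qwen 3.5 397B A17B | 43 | 40 | 39 | 38 | 30 | 37.7%
Z.AI GLM 4.7 | 40 | 38 | 37 | 36 | 35 | 37.1%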
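Model | #1 | #2 | #3 | #4 | #5 | Avg
Llama 3.1 8B | 100 | 100 | 100 | 99 | 99 | 99.7%
Claude 3.5 Haiku | 97 | 94 | 92 | 83 | 74 | 87.9%
Grok 4.1 Fast | 100 | 87 | 83 | 82 | 80 | 86.5%
WizardLM 2 8x22b | 100 | 100 | 77 | 73 | 63 | 82.6%
Llama 3.1 Nemotron 70B | 91 | 83 | 81 | 76 | 75 | 81.3%
GPT-4o, Aug. 6th (temp=1) | 94 | 80 | 80 | 78 | 71 | 80.9%
DeepSeek V3 (2025-03-24) | 94 | 80 | 78 | 76 | 63 | 78.3%
Claude 3.5 Sonnet | 88 | 81 | 75 | 71 | 67 | 76.2%
Grok 4 Fast | 82 | 81 | 73 | 69 | 69 | 74.6%
Claude Sonnet 4 | 83 | 75 | 75 | 73 | 61 | 73.4%
GPT-4o Mini (temp=1) | 82 | 82 | 68 | 67 | 64 | 72.6%
Grok 4 | 100 | 79 | 70 | 57 | 54 | 72.1%
Claude 3 Haiku | 98 | 76 | 67 | 60 | 55 | 71.0%
GPT-4.1 Mini | 76 | 75 | 69 | 69 | 65 | 70.9%
DeepSeek-V2 Chat | 85 | 74 | 73 | 63 | 47 | 68.4%
GPT-4o, May 13th (temp=1) | 85 | 66 | 61 | 61 | 59 | 66.7%
Claude Sonnet 4.5 | 81 | 68 | 64 | 63 | 55 | 66.2%
Claude Opus 4 | 75 | 71 | 63 | 62 | 59 | 66.1%
Claude 3.7 Sonnet | 77 | 67 | 66 | 59 | 59 | 65.6%
Z.AI GLM 4.5 | 79 | 77 | 63 | 57 | 49 | 64.8%
Llama 3.1 70B | 76 | 71 | 67 | 62 | 47 | 64.7%
DeepSeek V3 (2024-12-26) | 77 | 71 | 66 | 60 | 46 | 63.9%
MoonshotAI: Kimi K2.5 | 80 | 63 | 60 | 59 | 53 | 62.9%
Claude Haiku 4.5 | 72 | 71 | 60 | 57 | 53 | 62.6%
Hermes 3 405B | 78 | 76 | 70 | 47 | 41 | 62.3%
Rocinante 12B | 83 | 66 | 61 | 54 | 43 | 61.6%
Cohere Command R+ (Aug. 2024) | 71 | 68 | 65 | 60 | 44 | 61.5%
GPT-4.1 | 72 | 67 | 62 | 59 | 47 | 61.2%
Gemma 3 4B | 68 | 66 | 63 | 60 | 48 | 61.1%
Writer: Palmyra X5 | 68 | 66 | 57 | 56 | 51 | 59.7%
Claude Opus 4.5 | 71 | 62 | 59 | 54 | 48 | 58.8%
o4 Mini | 71 | 59 | 58 | 55 | 46 | 57.8%
Gemma 3 12B | 65 | 64 | 57 | 54 | 50 | 57.8%
GPT-4.1 Nano | 68 | 64 | 56 | 50 | 49 | 57.4%
Hermes 3 70B | 100 | 75 | 42 | 36 | 33 | 57.4%
Arcee AI: Trinity Large (Preview) | 74 | 66 | 50 | 49 | 45 | 56.8%
GPT-5.1 | 62 | 59 | 56 | 54 | 49 | 55.9%
Gemini 2.5 Flash | 78 | 59 | 49 | 47 | 45 | 55.4%
Arcee AI: Trinity Mini | 71 | 63 | 58 | 41 | 39 | 54.3%
GPT-4o, May 13th (temp=0) | 70 | 52 | 51 | 48 | 45 | 53.4%
Z.AI GLM 5 | 67 | 61 | 56 | 44 | 38 | 53.1%
Minimax M2.5 | 64 | 53 | 52 | 49 | 40 | 51.7%
o4 Mini High | 58 | 50 | 48 | 48 | 46 | 50.3%
Qwen 3.5 Plus (2026-02-15) | 61 | 59 | 51 | 47 | 34 | 50.2%
Mistral Medium 3.1 | 69 | 52 | 45 | 44 | 40 | 50.0%
GPT-4o, Aug. 6th (temp=0) | 59 | 50 | 49 | 47 | 44 | 49.9%
DeepSeek V3.1 | 73 | 61 | 39 | 38 | 36 | 49.3%
ByteDance Seed 1.6 Flash | 63 | 50 | 47 | 45 | 39 | 48.7%
GPT-4o Mini (temp=0) | 55 | 50 | 49 | 47 | 41 | 48.6%
Mistral Small Creative | 57 | 52 | 43 | 42 | 39 | 46.7%
Mistral Large | 54 | 50 | 46 | 43 | 40 | 46.6%
Gemini 3.1 Pro (Preview) | 60 | 49 | 44 | 40 | 39 | 46.5%
Gemma 3 27B | 54 | 52 | 44 | 43 | 37 | 45.9%
DeepSeek V3.2 | 54 | 47 | 46 | 45 | 38 | 45.9%
ByteDance Seed 1.6 | 61 | 51 | 44 | 39 | 33 | 45.7%
Claude Opus 4.6 | 55 | 53 | 48 | 42 | 28 | 45.2%
Ministral 3B | 55 | 46 | 43 | 40 | 39 | 44.6%
Claude Sonnet 4.6 | 55 | 53 | 42 | 39 | 33 | 44.5%
Mistral Large 2 | 53 | 45 | 45 | 40 | 39 | 44.5%
Mistral Large 3 | 59 | 46 | 40 | 38 | 36 | 43.8%
Qwen 2.5 72B | 53 | 42 | 42 | 40 | 40 | 43.6%
Gemini 3 Flash (Preview) | 52 | 48 | 43 | 38 | 36 | 43.5%
GPT-5.2 | 46 | 44 | 44 | 43 | 38 | 42.9%
Ministral 3 8B | 47 | 44 | 43 | 41 | 37 | 42.5%
Ministral 3 14B | 47 | 42 | 41 | 41 | 39 | 42.1%
Gemini 2.5 Flash Lite | 44 | 43 | 43 | 42 | 36 | 41.6%
Ministral 3 3B | 46 | 44 | 41 | 38 | 36 | 41.3%
Qwen 3.5 397B A17B | 50 | 47 | 43 | 33 | 33 | 41.1%
Gemini 2.5 Pro | 44 | 43 | 42 | 41 | 33 | 40.6%
Z.AI GLM 4.6 | 53 | 42 | 41 | 33 | 31 | 40.0%
Gemini 3 Pro (Preview) | 48 | 45 | 39 | 35 | 34 | 40.0%
Z.AI GLM 4.7 Flash | 51 | 39 | 39 | 35 | 29 | 38.6%
GPT-5 | 40 | 39 | 39 | 37 | 36 | 38.3%
Ministral 8B | 48 | 40 | 35 | 33 | 28 | 36.7%
GPT-5 Mini | 41 | 40 | 35 | 34 | 32 | 36.5%
Z.AI GLM 4.7 | 40 | 37 | 37 | 36 | 33 | 36.4%
Mistral NeMO | 40 | 39 | 34 | 31 | 25 | 33.8%
Stealth: Aurora Alpha | 40 | 33 | 31 | 31 | 25 | 32.2%
Mistral Small 3.2 24B | 37 | 25 | 25 | 25 | 25 | 27.4%
GPT-5 Nano | 32 | 27 | 27 | 25 | 25 | 27.1%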

Novelcrafter Default Prompt

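Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Haiku | 100 | 100 | 100 | 97 | 93 | 97.9%
GPT-4o Mini (temp=1) | 98 | 96 | 95 | 90 | 77 | 91.1%
Llama 3.1 8B | 98 | 97 | 95 | 81 | 81 | 90.5%
Grok 4.1 Fast | 99 | 94 | 85 | 84 | 75 | 87.4%
Rocinante 12B | 100 | 98 | 91 | 79 | 65 | 86.4%
Hermes 3 405B | 98 | 86 | 82 | 80 | 72 | 83.7%
Claude Sonnet 4 | 94 | 92 | 84 | 78 | 67 | 83.2%
Claude Opus 4 | 95 | 92 | 87 | 73 | 63 | 82.1%
Claude Sonnet 4.6 | 84 | 83 | 83 | 78 | 70 | 79.5%
Claude Sonnet 4.5 | 90 | 80 | 79 | 78 | 60 | 77.4%
GPT-4o, Aug. 6th (temp=1) | 87 | 84 | 75 | 73 | 68 | 77.3%
Llama 3.1 Nemotron 70B | 96 | 85 | 83 | 61 | 59 | 76.9%
GPT-4o, May 13th (temp=1) | 94 | 84 | 76 | 63 | 57 | 74.5%
Gemma 3 27B | 89 | 79 | 75 | 66 | 64 | 74.4%
Claude 3.5 Sonnet | 85 | 76 | 74 | 69 | 67 | 74.1%
Claude Opus 4.5 | 80 | 75 | 72 | 70 | 69 | 73.0%
GPT-4.1 | 86 | 80 | 78 | 65 | 55 | 72.7%
Gemma 3 12B | 88 | 79 | 71 | 70 | 54 | 72.4%
Cohere Command R+ (Aug. 2024) | 86 | 74 | 72 | 66 | 62 | 72.0%
Grok 4 | 87 | 75 | 75 | 64 | 56 | 71.2%
Gemma 3 4B | 75 | 75 | 72 | 67 | 65 | 70.7%
GPT-4.1 Nano | 83 | 71 | 70 | 69 | 60 | 70.5%
Z.AI GLM 5 | 77 | 76 | 71 | 67 | 61 | 70.3%
Arcee AI: Trinity Large (Preview) | 86 | 72 | 70 | 60 | 56 | 69.1%
Claude 3 Haiku | 81 | 70 | 65 | 65 | 59 | 68.1%
DeepSeek V3 (2025-03-24) | 100 | 74 | 59 | 54 | 51 | 67.6%
Grok 4 Fast | 73 | 72 | 69 | 63 | 61 | 67.5%
Gemini 2.5 Flash Lite | 80 | 68 | 64 | 60 | 60 | 66.3%
Gemini 2.5 Flash | 77 | 74 | 72 | 65 | 44 | 66.3%
Hermes 3 70B | 73 | 70 | 68 | 60 | 58 | 65.9%
Claude 3.7 Sonnet | 81 | 71 | 67 | 57 | 53 | 65.8%
GPT-4.1 Mini | 72 | 71 | 68 | 57 | 57 | 64.9%
Claude Haiku 4.5 | 79 | 69 | 64 | 59 | 53 | 64.8%
Gemini 2.5 Pro | 89 | 70 | 57 | 56 | 48 | 64.0%
GPT-5.1 | 74 | 71 | 68 | 54 | 51 | 63.6%
Z.AI GLM 4.5 | 72 | 63 | 61 | 60 | 59 | 63.1%
GPT-4o Mini (temp=0) | 71 | 67 | 59 | 59 | 58 | 62.9%
WizardLM 2 8x22b | 72 | 69 | 61 | 57 | 55 | 62.7%
Llama 3.1 70B | 78 | 65 | 62 | 62 | 46 | 62.6%
DeepSeek V3.1 | 98 | 58 | 58 | 51 | 47 | 62.0%
Minimax M2.5 | 66 | 66 | 61 | 59 | 57 | 61.7%
Claude Opus 4.6 | 77 | 61 | 59 | 54 | 54 | 61.1%
GPT-4o, Aug. 6th (temp=0) | 76 | 61 | 56 | 50 | 43 | 57.1%
Qwen 3.5 Plus (2026-02-15) | 63 | 60 | 55 | 55 | 51 | 56.8%
MoonshotAI: Kimi K2.5 | 73 | 59 | 54 | 49 | 48 | 56.8%
DeepSeek V3 (2024-12-26) | 75 | 60 | 52 | 51 | 43 | 56.2%
o4 Mini High | 63 | 61 | 56 | 52 | 47 | 55.8%
ByteDance Seed 1.6 | 64 | 63 | 62 | 46 | 43 | 55.6%
o4 Mini | 60 | 59 | 54 | 53 | 51 | 55.5%
Writer: Palmyra X5 | 59 | 57 | 56 | 55 | 49 | 55.0%
Z.AI GLM 4.6 | 70 | 63 | 57 | 43 | 39 | 54.4%
GPT-5 | 63 | 54 | 53 | 52 | 50 | 54.3%
Mistral Small Creative | 67 | 59 | 57 | 46 | 42 | 54.0%
ByteDance Seed 1.6 Flash | 58 | 55 | 55 | 54 | 48 | 53.8%
Mistral Medium 3.1 | 64 | 56 | 54 | 49 | 45 | 53.5%
GPT-4o, May 13th (temp=0) | 64 | 58 | 52 | 43 | 43 | 51.8%
Z.AI GLM 4.7 | 60 | 55 | 54 | 49 | 41 | 51.7%
DeepSeek V3.2 | 57 | 54 | 50 | 49 | 41 | 50.2%
Mistral NeMO | 60 | 59 | 47 | 43 | 38 | 49.5%
Mistral Large | 53 | 52 | 51 | 46 | 44 | 49.1%
Qwen 2.5 72B | 60 | 48 | 47 | 45 | 44 | 48.8%
Arcee AI: Trinity Mini | 65 | 53 | 48 | 46 | 31 | 48.7%
Gemini 3 Pro (Preview) | 58 | 54 | 44 | 43 | 43 | 48.4%
Ministral 8B | 55 | 54 | 45 | 45 | 43 | 48.3%
Mistral Large 3 | 57 | 50 | 45 | 42 | 41 | 47.0%
DeepSeek-V2 Chat | 62 | 45 | 45 | 42 | 40 | 46.7%
Ministral 3 14B | 55 | 49 | 45 | 43 | 41 | 46.7%
Ministral 3B | 70 | 46 | 41 | 38 | 38 | 46.6%
Gemini 3.1 Pro (Preview) | 63 | 46 | 42 | 42 | 37 | 46.0%
Mistral Large 2 | 53 | 47 | 45 | 43 | 42 | 45.9%
GPT-5 Mini | 53 | 45 | 44 | 43 | 42 | 45.2%
Ministral 3 8B | 59 | 48 | 42 | 38 | 34 | 44.4%
Gemini 3 Flash (Preview) | 53 | 50 | 42 | 39 | 38 | 44.3%
Z.AI GLM 4.7 Flash | 51 | 47 | 42 | 42 | 38 | 44.0%
GPT-5.2 | 47 | 44 | 43 | 42 | 41 | 43.5%
Qwen 3.5 397B A17B | 50 | 45 | 43 | 41 | 38 | 43.2%
Mistral Small 3.2 24B | 46 | 43 | 42 | 41 | 36 | 41.7%
Ministral 3 3B | 48 | 42 | 42 | 41 | 33 | 41.2%
Stealth: Aurora Alpha | 45 | 39 | 37 | 37 | 31 | 37.9%
GPT-5 Nano | 34 | 31 | 31 | 30 | 25 | 30.2%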
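Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Haiku | 100 | 97 | 94 | 89 | 79 | 91.6%
Rocinante 12B | 100 | 99 | 98 | 81 | 76 | 90.6%
Grok 4.1 Fast | 100 | 93 | 89 | 85 | 84 | 90.0%
Llama 3.1 8B | 100 | 100 | 96 | 88 | 66 | 89.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 91 | 83 | 82 | 78 | 86.8%
Hermes 3 70B | 100 | 80 | 71 | 64 | 58 | 74.7%
Cohere Command R+ (Aug. 2024) | 97 | 75 | 70 | 61 | 61 | 72.7%
Llama 3.1 Nemotron 70B | 79 | 76 | 69 | 69 | 64 | 71.5%
Claude Sonnet 4 | 78 | 75 | 72 | 65 | 60 | 69.8%
GPT-4o Mini (temp=1) | 74 | 73 | 67 | 66 | 63 | 68.5%
Arcee AI: Trinity Large (Preview) | 83 | 74 | 68 | 63 | 50 | 67.7%
Hermes 3 405B | 80 | 74 | 68 | 65 | 52 | 67.7%
DeepSeek V3 (2025-03-24) | 81 | 78 | 67 | 58 | 51 | 66.8%
Gemma 3 27B | 75 | 74 | 62 | 59 | 58 | 65.5%
Grok 4 | 73 | 69 | 63 | 58 | 57 | 63.9%
Claude Opus 4 | 79 | 64 | 61 | 60 | 50 | 62.7%
GPT-4o, May 13th (temp=1) | 78 | 67 | 60 | 55 | 47 | 61.3%
Grok 4 Fast | 70 | 61 | 58 | 56 | 56 | 60.2%
Claude Sonnet 4.5 | 67 | 66 | 59 | 56 | 53 | 60.1%
Claude 3 Haiku | 68 | 61 | 60 | 56 | 55 | 60.0%
Claude 3.5 Sonnet | 64 | 61 | 60 | 57 | 57 | 59.8%
GPT-4.1 Mini | 63 | 63 | 60 | 57 | 47 | 58.2%
GPT-4.1 | 61 | 54 | 54 | 54 | 46 | 53.8%
Claude 3.7 Sonnet | 61 | 55 | 51 | 49 | 47 | 52.7%
Gemma 3 12B | 63 | 57 | 51 | 50 | 41 | 52.6%
Claude Haiku 4.5 | 66 | 64 | 50 | 41 | 41 | 52.5%
DeepSeek V3 (2024-12-26) | 88 | 51 | 46 | 42 | 35 | 52.4%
GPT-5.1 | 62 | 57 | 51 | 47 | 44 | 51.9%
GPT-4o Mini (temp=0) | 54 | 52 | 50 | 50 | 47 | 50.7%
Z.AI GLM 5 | 58 | 51 | 50 | 46 | 45 | 49.9%
Claude Opus 4.5 | 57 | 51 | 50 | 49 | 40 | 49.6%
Writer: Palmyra X5 | 52 | 51 | 51 | 50 | 43 | 49.5%
GPT-4o, Aug. 6th (temp=0) | 55 | 51 | 50 | 46 | 45 | 49.4%
MoonshotAI: Kimi K2.5 | 67 | 54 | 47 | 40 | 35 | 48.5%
Llama 3.1 70B | 62 | 50 | 50 | 42 | 38 | 48.4%
Claude Sonnet 4.6 | 67 | 50 | 50 | 45 | 29 | 48.4%
GPT-4.1 Nano | 60 | 49 | 44 | 42 | 42 | 47.2%
Claude Opus 4.6 | 56 | 49 | 48 | 44 | 38 | 47.0%
o4 Mini | 54 | 47 | 46 | 42 | 41 | 46.1%
DeepSeek-V2 Chat | 57 | 45 | 43 | 41 | 40 | 45.2%
o4 Mini High | 51 | 47 | 44 | 43 | 40 | 45.0%
Minimax M2.5 | 58 | 57 | 37 | 35 | 34 | 44.3%
Z.AI GLM 4.5 | 57 | 44 | 42 | 40 | 37 | 43.8%
Z.AI GLM 4.7 | 52 | 46 | 43 | 39 | 36 | 43.3%
Mistral Medium 3.1 | 47 | 45 | 44 | 43 | 37 | 43.1%
ByteDance Seed 1.6 Flash | 47 | 46 | 43 | 41 | 38 | 43.1%
Gemini 2.5 Flash Lite | 45 | 44 | 44 | 41 | 40 | 42.8%
GPT-4o, May 13th (temp=0) | 45 | 45 | 45 | 38 | 37 | 41.9%
Mistral Large 2 | 55 | 41 | 40 | 39 | 34 | 41.8%
DeepSeek V3.1 | 59 | 45 | 35 | 34 | 32 | 41.1%
Mistral Large | 52 | 44 | 43 | 40 | 27 | 41.1%
Mistral Large 3 | 48 | 46 | 38 | 36 | 36 | 41.1%
GPT-5 | 44 | 42 | 40 | 39 | 39 | 41.0%
Arcee AI: Trinity Mini | 52 | 46 | 43 | 37 | 27 | 40.9%
Gemini 2.5 Pro | 52 | 43 | 42 | 35 | 32 | 40.8%
Gemini 3 Pro (Preview) | 49 | 40 | 39 | 39 | 36 | 40.8%
Mistral Small Creative | 56 | 45 | 36 | 35 | 31 | 40.5%
Qwen 2.5 72B | 42 | 41 | 40 | 40 | 39 | 40.4%
Ministral 3B | 53 | 41 | 40 | 35 | 31 | 40.2%
Qwen 3.5 397B A17B | 49 | 43 | 37 | 36 | 35 | 40.1%
Gemma 3 4B | 45 | 40 | 39 | 39 | 38 | 40.1%
Ministral 3 8B | 45 | 40 | 40 | 38 | 37 | 39.9%
Ministral 3 14B | 53 | 41 | 37 | 35 | 33 | 39.8%
Stealth: Aurora Alpha | 44 | 42 | 40 | 39 | 35 | 39.8%
GPT-5.2 | 40 | 40 | 39 | 39 | 37 | 39.0%
Ministral 8B | 41 | 40 | 39 | 38 | 35 | 38.6%
GPT-5 Mini | 43 | 40 | 38 | 35 | 35 | 38.3%
Qwen 3.5 Plus (2026-02-15) | 46 | 40 | 36 | 34 | 34 | 37.8%
WizardLM 2 8x22b | 46 | 39 | 35 | 34 | 34 | 37.4%
DeepSeek V3.2 | 42 | 38 | 36 | 35 | 34 | 37.2%
Ministral 3 3B | 45 | 38 | 38 | 35 | 30 | 37.1%
Gemini 2.5 Flash | 47 | 40 | 39 | 31 | 28 | 37.0%
Gemini 3.1 Pro (Preview) | 42 | 39 | 35 | 35 | 34 | 36.9%
Z.AI GLM 4.7 Flash | 39 | 38 | 37 | 35 | 34 | 36.6%
Mistral NeMO | 48 | 37 | 37 | 34 | 25 | 36.4%
Z.AI GLM 4.6 | 47 | 38 | 32 | 31 | 30 | 35.7%
Gemini 3 Flash (Preview) | 42 | 37 | 35 | 35 | 29 | 35.5%
ByteDance Seed 1.6 | 43 | 41 | 33 | 31 | 30 | 35.5%
Mistral Small 3.2 24B | 33 | 31 | 25 | 25 | 25 | 27.9%
GPT-5 Nano | 30 | 28 | 27 | 25 | 25 | 27.2%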
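Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 93 | 98.5%
Grok 4.1 Fast | 99 | 98 | 98 | 96 | 90 | 96.1%
Llama 3.1 8B | 100 | 100 | 100 | 75 | 62 | 87.3%
Llama 3.1 Nemotron 70B | 91 | 90 | 87 | 83 | 81 | 86.2%
GPT-4o Mini (temp=1) | 94 | 93 | 85 | 82 | 77 | 86.1%
Grok 4 | 90 | 87 | 87 | 87 | 77 | 85.5%
GPT-4o, Aug. 6th (temp=1) | 90 | 89 | 84 | 82 | 69 | 82.8%
Cohere Command R+ (Aug. 2024) | 98 | 88 | 88 | 86 | 51 | 82.0%
Hermes 3 405B | 88 | 83 | 82 | 82 | 73 | 81.5%
Claude Sonnet 4 | 94 | 86 | 81 | 80 | 59 | 80.1%
Grok 4 Fast | 84 | 82 | 77 | 76 | 75 | 78.7%
Hermes 3 70B | 98 | 97 | 76 | 74 | 43 | 77.4%
Rocinante 12B | 100 | 79 | 74 | 64 | 61 | 75.5%
Claude 3.5 Sonnet | 89 | 75 | 74 | 72 | 67 | 75.4%
Claude Opus 4 | 80 | 80 | 77 | 67 | 62 | 73.1%
Claude Sonnet 4.5 | 97 | 76 | 75 | 59 | 58 | 72.9%
Z.AI GLM 4.5 | 94 | 76 | 69 | 66 | 59 | 72.6%
GPT-4.1 | 77 | 77 | 71 | 71 | 67 | 72.5%
Claude 3.7 Sonnet | 86 | 72 | 68 | 68 | 63 | 71.4%
GPT-4o Mini (temp=0) | 93 | 71 | 65 | 64 | 63 | 71.1%
GPT-4o, May 13th (temp=1) | 87 | 70 | 68 | 67 | 64 | 71.1%
GPT-4.1 Mini | 85 | 74 | 73 | 71 | 49 | 70.5%
Gemma 3 27B | 92 | 78 | 68 | 57 | 56 | 70.2%
GPT-4.1 Nano | 82 | 70 | 65 | 65 | 64 | 69.4%
Gemma 3 12B | 71 | 71 | 69 | 69 | 64 | 68.9%
Arcee AI: Trinity Mini | 89 | 69 | 63 | 63 | 61 | 68.9%
DeepSeek V3 (2024-12-26) | 85 | 78 | 67 | 57 | 56 | 68.7%
Gemini 2.5 Flash Lite | 77 | 71 | 65 | 61 | 58 | 66.2%
Qwen 3.5 Plus (2026-02-15) | 77 | 72 | 67 | 63 | 51 | 66.1%
Z.AI GLM 5 | 76 | 69 | 67 | 63 | 55 | 66.1%
DeepSeek V3 (2025-03-24) | 79 | 70 | 64 | 63 | 54 | 66.0%
GPT-4o, Aug. 6th (temp=0) | 70 | 66 | 62 | 61 | 59 | 63.5%
o4 Mini High | 81 | 78 | 56 | 50 | 50 | 62.9%
Llama 3.1 70B | 100 | 60 | 58 | 53 | 42 | 62.4%
DeepSeek-V2 Chat | 80 | 64 | 63 | 55 | 47 | 61.9%
Mistral Medium 3.1 | 72 | 68 | 60 | 55 | 52 | 61.4%
Mistral Large 3 | 70 | 69 | 61 | 60 | 38 | 59.6%
Gemini 2.5 Flash | 75 | 74 | 52 | 52 | 43 | 59.4%
Claude Opus 4.5 | 66 | 62 | 58 | 56 | 55 | 59.3%
GPT-4o, May 13th (temp=0) | 71 | 67 | 57 | 53 | 45 | 58.7%
Mistral Large 2 | 72 | 60 | 58 | 53 | 51 | 58.6%
Claude Haiku 4.5 | 63 | 60 | 59 | 58 | 52 | 58.3%
Claude 3 Haiku | 63 | 63 | 56 | 55 | 51 | 57.3%
Arcee AI: Trinity Large (Preview) | 79 | 64 | 55 | 51 | 28 | 55.5%
Qwen 2.5 72B | 67 | 60 | 55 | 47 | 47 | 55.1%
Claude Sonnet 4.6 | 62 | 58 | 58 | 57 | 36 | 54.5%
Mistral NeMO | 68 | 59 | 58 | 46 | 35 | 53.0%
Gemma 3 4B | 58 | 56 | 55 | 53 | 42 | 52.8%
WizardLM 2 8x22b | 69 | 55 | 45 | 44 | 44 | 51.4%
Gemini 2.5 Pro | 58 | 54 | 52 | 47 | 44 | 50.8%
o4 Mini | 54 | 51 | 50 | 48 | 48 | 50.1%
Mistral Large | 56 | 55 | 50 | 47 | 42 | 50.0%
MoonshotAI: Kimi K2.5 | 56 | 52 | 50 | 47 | 42 | 49.4%
Writer: Palmyra X5 | 53 | 52 | 49 | 48 | 44 | 49.0%
Ministral 3 8B | 55 | 51 | 48 | 46 | 45 | 48.9%
Mistral Small Creative | 61 | 52 | 50 | 43 | 38 | 48.9%
Claude Opus 4.6 | 57 | 52 | 48 | 46 | 40 | 48.8%
Minimax M2.5 | 55 | 52 | 47 | 46 | 43 | 48.7%
ByteDance Seed 1.6 Flash | 58 | 48 | 46 | 45 | 44 | 48.4%
GPT-5.1 | 54 | 48 | 47 | 45 | 43 | 47.2%
Stealth: Aurora Alpha | 50 | 49 | 49 | 42 | 39 | 45.8%
Ministral 3 14B | 49 | 49 | 48 | 42 | 40 | 45.8%
DeepSeek V3.1 | 68 | 44 | 42 | 38 | 36 | 45.8%
DeepSeek V3.2 | 59 | 47 | 45 | 39 | 37 | 45.3%
Ministral 8B | 48 | 48 | 46 | 44 | 40 | 45.1%
Mistral Small 3.2 24B | 50 | 49 | 49 | 48 | 25 | 44.0%
GPT-5.2 | 46 | 45 | 43 | 43 | 42 | 43.7%
Gemini 3.1 Pro (Preview) | 44 | 44 | 43 | 43 | 42 | 43.2%
Ministral 3 3B | 45 | 45 | 42 | 42 | 39 | 42.6%
Z.AI GLM 4.7 | 48 | 46 | 41 | 40 | 38 | 42.6%
Gemini 3 Pro (Preview) | 50 | 49 | 41 | 39 | 33 | 42.3%
Ministral 3B | 49 | 43 | 41 | 41 | 34 | 41.6%
Z.AI GLM 4.6 | 50 | 46 | 39 | 36 | 35 | 41.3%
GPT-5 Mini | 43 | 42 | 41 | 41 | 37 | 40.7%
GPT-5 Nano | 43 | 41 | 38 | 37 | 35 | 38.9%
ByteDance Seed 1.6 | 50 | 48 | 36 | 30 | 30 | 38.9%
Gemini 3 Flash (Preview) | 41 | 40 | 39 | 38 | 36 | 38.6%
GPT-5 | 42 | 37 | 37 | 36 | 36 | 37.5%
Qwen 3.5 397B A17B | 38 | 37 | 37 | 36 | 34 | 36.4%
Z.AI GLM 4.7 Flash | 45 | 37 | 37 | 32 | 30 | 36.3%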
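Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Sonnet | 96 | 95 | 94 | 92 | 88 | 92.9%
Grok 4.1 Fast | 96 | 91 | 87 | 87 | 81 | 88.5%
Claude Opus 4 | 95 | 93 | 89 | 82 | 81 | 88.1%
GPT-4o, Aug. 6th (temp=1) | 100 | 99 | 82 | 79 | 79 | 87.7%
Claude 3.5 Haiku | 94 | 94 | 86 | 83 | 80 | 87.4%
Cohere Command R+ (Aug. 2024) | 94 | 93 | 86 | 84 | 77 | 86.9%
Rocinante 12B | 100 | 100 | 100 | 71 | 61 | 86.6%
Llama 3.1 Nemotron 70B | 90 | 89 | 83 | 83 | 79 | 85.0%
Gemma 3 27B | 99 | 89 | 80 | 78 | 77 | 84.5%
Hermes 3 405B | 100 | 85 | 81 | 81 | 75 | 84.3%
Grok 4 Fast | 95 | 92 | 86 | 83 | 65 | 84.3%
DeepSeek V3.1 | 96 | 94 | 89 | 72 | 64 | 83.0%
GPT-4o Mini (temp=1) | 92 | 88 | 85 | 79 | 71 | 82.9%
Claude Sonnet 4.6 | 94 | 85 | 81 | 78 | 75 | 82.9%
Arcee AI: Trinity Mini | 89 | 86 | 84 | 81 | 69 | 82.0%
Claude 3.7 Sonnet | 96 | 88 | 79 | 74 | 71 | 81.6%
Hermes 3 70B | 100 | 95 | 82 | 74 | 57 | 81.5%
GPT-4o, May 13th (temp=1) | 98 | 87 | 80 | 73 | 70 | 81.4%
Claude Sonnet 4.5 | 93 | 90 | 85 | 69 | 69 | 81.3%
Claude Sonnet 4 | 87 | 83 | 82 | 80 | 73 | 81.1%
Z.AI GLM 4.5 | 99 | 81 | 75 | 75 | 70 | 80.0%
Grok 4 | 91 | 87 | 74 | 70 | 69 | 78.2%
Claude Haiku 4.5 | 84 | 83 | 73 | 73 | 73 | 77.3%
DeepSeek V3 (2025-03-24) | 98 | 88 | 70 | 68 | 62 | 77.1%
Llama 3.1 8B | 100 | 93 | 66 | 59 | 58 | 75.4%
GPT-4o Mini (temp=0) | 80 | 80 | 74 | 72 | 70 | 75.2%
Z.AI GLM 5 | 89 | 73 | 71 | 67 | 67 | 73.5%
GPT-4.1 Mini | 86 | 74 | 71 | 69 | 67 | 73.4%
Arcee AI: Trinity Large (Preview) | 90 | 75 | 72 | 69 | 58 | 72.8%
GPT-4.1 | 78 | 78 | 74 | 66 | 65 | 72.4%
Claude 3 Haiku | 82 | 76 | 67 | 66 | 63 | 70.9%
Gemma 3 12B | 81 | 77 | 71 | 65 | 61 | 70.8%
Gemini 2.5 Flash Lite | 79 | 76 | 74 | 61 | 61 | 70.2%
Llama 3.1 70B | 100 | 69 | 66 | 58 | 54 | 69.5%
Minimax M2.5 | 76 | 73 | 69 | 67 | 61 | 69.2%
GPT-4.1 Nano | 82 | 71 | 68 | 63 | 57 | 68.3%
GPT-5.1 | 71 | 71 | 70 | 64 | 61 | 67.5%
Qwen 3.5 Plus (2026-02-15) | 72 | 71 | 64 | 63 | 63 | 66.6%
MoonshotAI: Kimi K2.5 | 77 | 73 | 64 | 61 | 57 | 66.4%
Claude Opus 4.6 | 77 | 76 | 66 | 59 | 47 | 65.2%
Claude Opus 4.5 | 73 | 69 | 64 | 60 | 54 | 64.2%
Ministral 8B | 81 | 68 | 58 | 57 | 56 | 63.9%
Qwen 2.5 72B | 75 | 71 | 61 | 55 | 53 | 63.0%
Mistral Large 2 | 65 | 65 | 64 | 64 | 56 | 62.8%
GPT-4o, Aug. 6th (temp=0) | 81 | 67 | 65 | 50 | 50 | 62.7%
GPT-4o, May 13th (temp=0) | 70 | 65 | 64 | 62 | 53 | 62.7%
Writer: Palmyra X5 | 66 | 66 | 61 | 60 | 54 | 61.3%
Gemini 2.5 Pro | 78 | 62 | 58 | 54 | 54 | 61.1%
Ministral 3 14B | 79 | 72 | 66 | 46 | 42 | 61.0%
Ministral 3B | 71 | 67 | 63 | 62 | 42 | 60.8%
DeepSeek-V2 Chat | 66 | 66 | 62 | 59 | 51 | 60.8%
Gemini 2.5 Flash | 70 | 67 | 62 | 53 | 50 | 60.6%
o4 Mini High | 67 | 67 | 65 | 56 | 46 | 60.3%
Mistral Large 3 | 83 | 63 | 57 | 51 | 45 | 59.8%
ByteDance Seed 1.6 Flash | 68 | 63 | 62 | 55 | 46 | 58.7%
Gemma 3 4B | 68 | 64 | 62 | 59 | 40 | 58.7%
Gemini 3 Pro (Preview) | 69 | 66 | 65 | 50 | 44 | 58.6%
Mistral Large | 68 | 63 | 59 | 58 | 43 | 58.2%
DeepSeek V3 (2024-12-26) | 75 | 69 | 53 | 47 | 46 | 58.0%
o4 Mini | 66 | 60 | 58 | 53 | 52 | 57.8%
Z.AI GLM 4.7 | 64 | 59 | 57 | 56 | 50 | 57.1%
Z.AI GLM 4.6 | 64 | 57 | 56 | 52 | 47 | 55.1%
WizardLM 2 8x22b | 62 | 57 | 54 | 50 | 45 | 53.8%
Mistral Medium 3.1 | 61 | 58 | 56 | 47 | 47 | 53.7%
GPT-5 | 64 | 58 | 52 | 51 | 42 | 53.1%
Mistral NeMO | 65 | 62 | 54 | 50 | 30 | 52.4%
Ministral 3 8B | 63 | 60 | 53 | 42 | 41 | 51.7%
DeepSeek V3.2 | 57 | 55 | 49 | 48 | 46 | 50.9%
Ministral 3 3B | 64 | 53 | 45 | 45 | 43 | 49.7%
Gemini 3.1 Pro (Preview) | 59 | 51 | 49 | 42 | 42 | 48.6%
Mistral Small Creative | 56 | 50 | 46 | 43 | 40 | 47.2%
Gemini 3 Flash (Preview) | 61 | 47 | 44 | 42 | 41 | 46.9%
Z.AI GLM 4.7 Flash | 58 | 45 | 43 | 43 | 41 | 45.9%
GPT-5 Mini | 50 | 47 | 45 | 44 | 42 | 45.6%
Qwen 3.5 397B A17B | 58 | 53 | 42 | 38 | 37 | 45.6%
GPT-5.2 | 48 | 46 | 46 | 45 | 40 | 44.8%
ByteDance Seed 1.6 | 43 | 43 | 43 | 42 | 41 | 42.3%
Mistral Small 3.2 24B | 58 | 50 | 39 | 25 | 25 | 39.4%
Stealth: Aurora Alpha | 42 | 42 | 39 | 39 | 35 | 39.1%
GPT-5 Nano | 37 | 34 | 34 | 33 | 32 | 33.7%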
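Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude 3.5 Haiku | 100 | 100 | 95 | 89 | 88 | 94.4%
Llama 3.1 Nemotron 70B | 95 | 95 | 87 | 83 | 79 | 87.8%
GPT-4o, Aug. 6th (temp=1) | 95 | 95 | 84 | 79 | 77 | 86.2%
Claude 3.5 Sonnet | 90 | 84 | 84 | 84 | 78 | 84.1%
Grok 4.1 Fast | 88 | 87 | 86 | 79 | 78 | 83.7%
Claude Sonnet 4 | 91 | 89 | 85 | 80 | 68 | 82.4%
GPT-4o, May 13th (temp=1) | 93 | 84 | 83 | 77 | 71 | 81.5%
Hermes 3 405B | 91 | 90 | 75 | 72 | 72 | 80.0%
GPT-4o Mini (temp=1) | 86 | 81 | 80 | 77 | 73 | 79.5%
Llama 3.1 8B | 100 | 76 | 75 | 75 | 69 | 78.8%
Gemma 3 27B | 93 | 81 | 75 | 69 | 63 | 76.4%
Hermes 3 70B | 98 | 82 | 76 | 66 | 59 | 76.2%
Rocinante 12B | 100 | 97 | 76 | 50 | 46 | 74.0%
Claude Opus 4 | 87 | 73 | 69 | 68 | 66 | 72.7%
GPT-4.1 Mini | 82 | 77 | 72 | 68 | 58 | 71.4%
Claude Sonnet 4.5 | 86 | 75 | 74 | 60 | 60 | 71.1%
Grok 4 | 75 | 74 | 70 | 67 | 64 | 69.9%
DeepSeek V3 (2025-03-24) | 83 | 73 | 67 | 64 | 61 | 69.4%
GPT-4.1 Nano | 84 | 76 | 69 | 64 | 52 | 69.0%
Cohere Command R+ (Aug. 2024) | 79 | 70 | 69 | 64 | 64 | 69.0%
Z.AI GLM 4.5 | 72 | 71 | 67 | 66 | 65 | 68.1%
Llama 3.1 70B | 88 | 69 | 65 | 63 | 55 | 67.8%
Claude 3.7 Sonnet | 73 | 70 | 66 | 64 | 59 | 66.2%
Claude Haiku 4.5 | 74 | 72 | 63 | 61 | 59 | 65.8%
GPT-4o Mini (temp=0) | 76 | 70 | 62 | 61 | 54 | 64.7%
Gemma 3 12B | 78 | 65 | 64 | 61 | 55 | 64.6%
DeepSeek-V2 Chat | 76 | 65 | 61 | 59 | 58 | 63.9%
Grok 4 Fast | 71 | 70 | 62 | 62 | 54 | 63.5%
GPT-4.1 | 76 | 65 | 61 | 59 | 55 | 63.2%
DeepSeek V3 (2024-12-26) | 70 | 70 | 60 | 58 | 57 | 62.8%
GPT-4o, Aug. 6th (temp=0) | 74 | 73 | 55 | 52 | 52 | 61.4%
Claude Sonnet 4.6 | 70 | 67 | 61 | 52 | 50 | 60.3%
Qwen 2.5 72B | 64 | 64 | 59 | 58 | 56 | 60.2%
Claude 3 Haiku | 74 | 68 | 60 | 55 | 43 | 60.0%
Qwen 3.5 Plus (2026-02-15) | 75 | 63 | 60 | 60 | 42 | 59.9%
Gemma 3 4B | 65 | 63 | 60 | 53 | 51 | 58.5%
Arcee AI: Trinity Large (Preview) | 67 | 64 | 59 | 53 | 47 | 57.9%
Z.AI GLM 5 | 66 | 65 | 53 | 52 | 51 | 57.5%
Gemini 2.5 Flash | 65 | 62 | 55 | 54 | 48 | 56.9%
Mistral Large 2 | 67 | 60 | 58 | 50 | 48 | 56.4%
ByteDance Seed 1.6 Flash | 70 | 64 | 57 | 48 | 41 | 55.9%
Claude Opus 4.5 | 64 | 57 | 57 | 54 | 47 | 55.7%
Mistral Medium 3.1 | 65 | 56 | 55 | 51 | 49 | 55.2%
GPT-4o, May 13th (temp=0) | 60 | 57 | 57 | 51 | 50 | 55.1%
o4 Mini | 62 | 61 | 54 | 49 | 48 | 54.8%
Gemini 2.5 Flash Lite | 71 | 60 | 49 | 45 | 41 | 53.4%
Gemini 2.5 Pro | 65 | 53 | 52 | 48 | 45 | 52.3%
Mistral Large | 59 | 58 | 58 | 45 | 41 | 52.0%
Ministral 3 14B | 68 | 54 | 51 | 44 | 43 | 51.9%
Ministral 3 8B | 61 | 58 | 48 | 46 | 44 | 51.4%
Mistral Small Creative | 62 | 56 | 47 | 43 | 43 | 50.3%
Mistral Large 3 | 58 | 51 | 51 | 47 | 43 | 49.9%
o4 Mini High | 63 | 48 | 48 | 46 | 44 | 49.8%
MoonshotAI: Kimi K2.5 | 54 | 52 | 50 | 49 | 42 | 49.4%
WizardLM 2 8x22b | 60 | 54 | 47 | 44 | 42 | 49.4%
Claude Opus 4.6 | 51 | 51 | 50 | 50 | 45 | 49.4%
Ministral 3 3B | 63 | 48 | 48 | 44 | 43 | 49.3%
Writer: Palmyra X5 | 58 | 57 | 44 | 43 | 41 | 48.6%
GPT-5.1 | 53 | 49 | 46 | 44 | 44 | 47.3%
Minimax M2.5 | 51 | 49 | 48 | 46 | 42 | 47.2%
Mistral NeMO | 67 | 49 | 44 | 42 | 33 | 47.2%
Z.AI GLM 4.6 | 63 | 49 | 47 | 38 | 35 | 46.5%
Ministral 8B | 50 | 49 | 48 | 46 | 40 | 46.5%
Ministral 3B | 49 | 48 | 47 | 47 | 38 | 45.9%
DeepSeek V3.2 | 52 | 51 | 46 | 41 | 40 | 45.8%
Arcee AI: Trinity Mini | 54 | 53 | 45 | 38 | 37 | 45.3%
DeepSeek V3.1 | 58 | 43 | 43 | 41 | 40 | 44.9%
GPT-5.2 | 46 | 45 | 44 | 43 | 43 | 44.0%
Gemini 3 Flash (Preview) | 47 | 45 | 45 | 44 | 39 | 44.0%
Stealth: Aurora Alpha | 49 | 46 | 43 | 41 | 40 | 43.8%
Z.AI GLM 4.7 | 46 | 46 | 44 | 42 | 37 | 43.0%
Gemini 3.1 Pro (Preview) | 44 | 43 | 43 | 42 | 42 | 42.8%
Mistral Small 3.2 24B | 49 | 47 | 43 | 41 | 34 | 42.5%
Gemini 3 Pro (Preview) | 47 | 45 | 42 | 39 | 34 | 41.6%
GPT-5 Nano | 43 | 42 | 41 | 40 | 40 | 41.0%
GPT-5 | 43 | 42 | 40 | 40 | 38 | 40.6%
Z.AI GLM 4.7 Flash | 47 | 40 | 38 | 38 | 38 | 40.2%
GPT-5 Mini | 42 | 41 | 40 | 39 | 37 | 39.6%
ByteDance Seed 1.6 | 50 | 38 | 38 | 37 | 33 | 39.3%
Qwen 3.5 397B A17B | 40 | 39 | 35 | 34 | 27 | 35.0%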
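Model | #1 | #2 | #3 | #4 | #5 | Avg
Grok 4.1 Fast | 100 | 100 | 100 | 99 | 92 | 98.2%
Claude 3.5 Haiku | 100 | 97 | 96 | 91 | 83 | 93.4%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 88 | 80 | 73 | 88.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 92 | 75 | 69 | 87.1%
Claude Opus 4 | 96 | 88 | 76 | 74 | 74 | 81.6%
Rocinante 12B | 97 | 97 | 86 | 62 | 60 | 80.7%
Llama 3.1 8B | 100 | 97 | 93 | 60 | 52 | 80.4%
Claude 3.5 Sonnet | 91 | 91 | 74 | 72 | 64 | 78.5%
Claude Sonnet 4.5 | 83 | 82 | 78 | 74 | 71 | 77.6%
GPT-4o Mini (temp=1) | 86 | 74 | 74 | 72 | 63 | 73.8%
Hermes 3 70B | 92 | 80 | 74 | 72 | 48 | 73.1%
Claude Sonnet 4 | 84 | 78 | 78 | 62 | 60 | 72.6%
GPT-4.1 Nano | 84 | 78 | 75 | 69 | 56 | 72.3%
Claude 3.7 Sonnet | 86 | 72 | 70 | 67 | 62 | 71.4%
Grok 4 | 83 | 83 | 79 | 57 | 54 | 71.1%
Hermes 3 405B | 93 | 82 | 66 | 64 | 46 | 70.4%
Claude 3 Haiku | 78 | 71 | 69 | 68 | 64 | 69.8%
Z.AI GLM 4.5 | 77 | 76 | 69 | 66 | 61 | 69.7%
DeepSeek V3 (2025-03-24) | 76 | 71 | 70 | 66 | 62 | 68.8%
GPT-4.1 Mini | 76 | 70 | 68 | 65 | 62 | 68.2%
GPT-4o, May 13th (temp=1) | 76 | 72 | 69 | 68 | 56 | 68.1%
Z.AI GLM 5 | 72 | 71 | 68 | 66 | 63 | 68.0%
Claude Sonnet 4.6 | 77 | 70 | 68 | 61 | 57 | 66.6%
Grok 4 Fast | 80 | 72 | 64 | 58 | 53 | 65.4%
Gemma 3 27B | 69 | 67 | 63 | 61 | 60 | 64.0%
Llama 3.1 70B | 76 | 74 | 65 | 59 | 42 | 63.4%
Gemma 3 12B | 89 | 69 | 54 | 54 | 47 | 62.5%
Cohere Command R+ (Aug. 2024) | 77 | 68 | 66 | 52 | 49 | 62.5%
MoonshotAI: Kimi K2.5 | 83 | 79 | 58 | 53 | 39 | 62.3%
Claude Opus 4.5 | 65 | 63 | 62 | 61 | 58 | 61.9%
Arcee AI: Trinity Large (Preview) | 66 | 66 | 66 | 57 | 54 | 61.7%
GPT-4.1 | 73 | 66 | 60 | 55 | 49 | 60.9%
Writer: Palmyra X5 | 72 | 69 | 56 | 55 | 51 | 60.8%
Minimax M2.5 | 81 | 58 | 58 | 55 | 51 | 60.7%
Gemini 2.5 Flash | 82 | 59 | 56 | 55 | 46 | 59.4%
DeepSeek-V2 Chat | 74 | 64 | 61 | 55 | 39 | 58.4%
Claude Haiku 4.5 | 63 | 59 | 57 | 54 | 53 | 57.1%
GPT-5.1 | 59 | 58 | 58 | 58 | 52 | 57.1%
Gemini 3.1 Pro (Preview) | 67 | 60 | 53 | 50 | 50 | 55.9%
Claude Opus 4.6 | 68 | 66 | 52 | 48 | 44 | 55.6%
o4 Mini | 65 | 60 | 55 | 51 | 42 | 54.7%
ByteDance Seed 1.6 Flash | 86 | 53 | 49 | 44 | 41 | 54.7%
Qwen 3.5 Plus (2026-02-15) | 59 | 56 | 54 | 52 | 51 | 54.4%
Ministral 3 14B | 63 | 62 | 58 | 44 | 42 | 53.9%
Ministral 3B | 61 | 59 | 54 | 49 | 46 | 53.8%
DeepSeek V3.1 | 74 | 56 | 52 | 46 | 41 | 53.6%
Gemini 2.5 Flash Lite | 67 | 56 | 50 | 48 | 43 | 53.1%
DeepSeek V3.2 | 65 | 53 | 51 | 50 | 45 | 52.8%
o4 Mini High | 55 | 55 | 52 | 50 | 45 | 51.5%
Mistral Medium 3.1 | 66 | 58 | 48 | 42 | 42 | 51.2%
Arcee AI: Trinity Mini | 69 | 50 | 50 | 44 | 41 | 50.8%
Gemma 3 4B | 56 | 54 | 49 | 48 | 47 | 50.7%
Mistral Large | 61 | 52 | 49 | 46 | 45 | 50.7%
GPT-4o Mini (temp=0) | 54 | 54 | 50 | 46 | 45 | 49.9%
WizardLM 2 8x22b | 60 | 57 | 52 | 42 | 37 | 49.6%
Z.AI GLM 4.7 Flash | 57 | 55 | 53 | 45 | 37 | 49.4%
DeepSeek V3 (2024-12-26) | 71 | 46 | 44 | 41 | 40 | 48.5%
Mistral Large 3 | 53 | 52 | 47 | 46 | 44 | 48.3%
Gemini 3 Pro (Preview) | 50 | 50 | 48 | 44 | 41 | 46.6%
Z.AI GLM 4.6 | 71 | 49 | 39 | 37 | 33 | 45.7%
Ministral 8B | 52 | 48 | 45 | 43 | 40 | 45.7%
GPT-4o, Aug. 6th (temp=0) | 54 | 45 | 45 | 42 | 39 | 45.1%
GPT-5 | 46 | 46 | 45 | 44 | 41 | 44.4%
ByteDance Seed 1.6 | 70 | 45 | 36 | 35 | 35 | 44.1%
Ministral 3 3B | 46 | 46 | 45 | 41 | 41 | 43.6%
Mistral Large 2 | 45 | 44 | 44 | 43 | 42 | 43.5%
Mistral Small Creative | 47 | 46 | 42 | 42 | 38 | 43.2%
Qwen 2.5 72B | 45 | 44 | 43 | 42 | 40 | 42.7%
GPT-5.2 | 45 | 44 | 43 | 41 | 39 | 42.5%
Stealth: Aurora Alpha | 46 | 44 | 44 | 39 | 37 | 42.1%
Gemini 2.5 Pro | 49 | 43 | 40 | 39 | 38 | 42.0%
Mistral NeMO | 47 | 43 | 42 | 41 | 35 | 41.5%
GPT-4o, May 13th (temp=0) | 56 | 45 | 38 | 34 | 33 | 41.2%
Qwen 3.5 397B A17B | 51 | 39 | 39 | 34 | 31 | 38.6%
Gemini 3 Flash (Preview) | 41 | 41 | 37 | 36 | 36 | 38.4%
Ministral 3 8B | 51 | 40 | 38 | 32 | 29 | 38.1%
Z.AI GLM 4.7 | 41 | 40 | 38 | 37 | 34 | 38.0%
GPT-5 Mini | 44 | 42 | 39 | 36 | 28 | 37.8%
GPT-5 Nano | 36 | 35 | 35 | 34 | 33 | 34.5%
Mistral Small 3.2 24B | 42 | 39 | 29 | 25 | 25 | 32.0%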