Pronoun-first sentence starts

Test: Bad Writing Habits

Avg. Score: 74.7%
Scenarios: 18

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Claude 3.5 Haiku | 97.7% | $0.0035 | 10.8s | 87%
2 | Llama 3.1 Nemotron 70B | 97.4% | $0.0038 | 31.7s | 83%
3 | Llama 3.1 8B | 98.4% | $0.0003 | 1.3m | 87%
4 | Llama 3.1 70B | 96.0% | $0.0015 | 29.4s | 71%
5 | Claude 3 Haiku | 90.0% | $0.0025 | 14.9s | 62%
6 | Mistral Small Creative | 87.8% | $0.0007 | 9.1s | 58%
7 | Grok 4.1 Fast | 90.5% | $0.0018 | 37.8s | 61%
8 | Ministral 3B | 88.2% | $0.0001 | 8.1s | 56%
9 | Ministral 3 14B | 86.8% | $0.0007 | 11.7s | 57%
10 | DeepSeek V3 (2025-03-24) | 91.5% | $0.0014 | 39.4s | 57%
11 | Z.AI GLM 4.5 | 90.5% | $0.0051 | 42.1s | 58%
12 | Ministral 3 3B | 85.9% | $0.0005 | 11.1s | 52%
13 | Mistral Medium 3.1 | 86.9% | $0.0048 | 36.5s | 54%
14 | GPT-5.2 | 93.6% | $0.056 | 1.5m | 75%
15 | Mistral Large 3 | 86.6% | $0.0033 | 30.3s | 51%
16 | Claude Sonnet 4 | 90.6% | $0.032 | 43.7s | 58%
17 | Claude Haiku 4.5 | 87.1% | $0.011 | 21.6s | 50%
18 | Ministral 3 8B | 83.7% | $0.0008 | 19.6s | 49%
19 | Rocinante 12B | 86.6% | $0.0014 | 38.4s | 49%
20 | Claude Sonnet 4.5 | 91.5% | $0.035 | 38.1s | 55%
21 | Grok 4 Fast | 83.1% | $0.0017 | 24.1s | 50%
22 | Qwen 2.5 72B | 83.5% | $0.0010 | 36.7s | 51%
23 | Mistral Large | 85.9% | $0.014 | 30.9s | 51%
24 | Mistral Large 2 | 85.8% | $0.013 | 29.4s | 49%
25 | Ministral 8B | 81.3% | $0.0004 | 10.4s | 44%
26 | Z.AI GLM 5 | 86.8% | $0.0084 | 1.2m | 54%
27 | Qwen 3.5 Plus (2026-02-15) | 82.2% | $0.0060 | 31.5s | 49%
28 | Claude 3.7 Sonnet | 88.2% | $0.042 | 46.7s | 55%
29 | Writer: Palmyra X5 | 83.8% | $0.011 | 22.0s | 43%
30 | Claude Opus 4.5 | 88.6% | $0.070 | 53.4s | 56%
31 | GPT-4o, May 13th (temp=1) | 81.1% | $0.033 | 14.4s | 42%
32 | GPT-4.1 Mini | 78.4% | $0.0027 | 19.0s | 35%
33 | GPT-4o, Aug. 6th (temp=1) | 78.4% | $0.018 | 24.4s | 38%
34 | GPT-5 Nano | 80.9% | $0.0042 | 1.4m | 43%
35 | Hermes 3 405B | 77.8% | $0.0032 | 53.2s | 36%
36 | Minimax M2.5 | 80.5% | $0.0034 | 1.3m | 37%
37 | ByteDance Seed 1.6 Flash | 75.8% | $0.0013 | 27.3s | 31%
38 | Claude 3.5 Sonnet | 82.6% | $0.048 | 35.5s | 37%
39 | Cohere Command R+ (Aug. 2024) | 77.8% | $0.020 | 52.5s | 36%
40 | GPT-4o, Aug. 6th (temp=0) | 75.1% | $0.023 | 22.7s | 32%
41 | Hermes 3 70B | 76.7% | $0.0010 | 1.2m | 33%
42 | Arcee AI: Trinity Large (Preview) | 73.9% | $0.0000 | 43.6s | 30%
43 | Stealth: Aurora Alpha | 70.6% | $0.0000 | 9.8s | 24%
44 | Claude Opus 4.6 | 81.7% | $0.078 | 1.2m | 48%
45 | Gemma 3 12B | 70.5% | $0.0004 | 41.3s | 26%
46 | GPT-4o Mini (temp=1) | 68.7% | $0.0012 | 34.8s | 23%
47 | DeepSeek V3 (2024-12-26) | 70.7% | $0.0021 | 54.6s | 25%
48 | Gemini 3 Flash (Preview) | 63.9% | $0.0078 | 19.6s | 27%
49 | Gemini 2.5 Flash | 66.9% | $0.0052 | 10.6s | 19%
50 | MoonshotAI: Kimi K2.5 | 81.0% | $0.019 | 3.2m | 48%
51 | Gemini 2.5 Flash Lite | 61.6% | $0.0009 | 9.5s | 21%
52 | DeepSeek-V2 Chat | 68.9% | $0.0021 | 53.3s | 22%
53 | Claude Sonnet 4.6 | 70.1% | $0.031 | 39.3s | 26%
54 | GPT-4.1 | 69.0% | $0.018 | 44.7s | 23%
55 | WizardLM 2 8x22b | 70.3% | $0.0026 | 1.8m | 29%
56 | GPT-5.1 | 79.1% | $0.054 | 1.8m | 36%
57 | Arcee AI: Trinity Mini | 57.4% | $0.0003 | 9.2s | 17%
58 | GPT-4o, May 13th (temp=0) | 66.4% | $0.035 | 14.1s | 19%
59 | Gemma 3 27B | 61.1% | $0.0006 | 52.6s | 18%
60 | o4 Mini | 61.2% | $0.015 | 25.7s | 17%
61 | Grok 4 | 73.8% | $0.048 | 1.7m | 28%
62 | Z.AI GLM 4.6 | 57.8% | $0.0065 | 51.5s | 21%
63 | GPT-5 Mini | 59.6% | $0.0100 | 57.4s | 20%
64 | GPT-4o Mini (temp=0) | 56.7% | $0.0012 | 34.8s | 14%
65 | DeepSeek V3.2 | 63.0% | $0.0014 | 1.9m | 22%
66 | Gemma 3 4B | 52.5% | $0.0002 | 20.0s | 11%
67 | o4 Mini High | 59.5% | $0.025 | 47.2s | 16%
68 | GPT-4.1 Nano | 46.4% | $0.0007 | 13.3s | 15%
69 | Gemini 3.1 Pro (Preview) | 75.8% | $0.107 | 1.8m | 36%
70 | Mistral NeMO | 38.9% | $0.0005 | 10.1s | 16%
71 | Qwen 3.5 397B A17B | 65.1% | $0.014 | 3.0m | 24%
72 | Z.AI GLM 4.7 Flash | 47.9% | $0.0017 | 1.2m | 16%
73 | Gemini 3 Pro (Preview) | 56.8% | $0.055 | 54.4s | 19%
74 | Claude Opus 4 | 87.3% | $0.209 | 1.4m | 44%
75 | Gemini 2.5 Pro | 49.0% | $0.036 | 36.2s | 16%
76 | Z.AI GLM 4.7 | 49.8% | $0.010 | 1.4m | 16%
77 | DeepSeek V3.1 | 51.7% | $0.0020 | 1.8m | 15%
78 | ByteDance Seed 1.6 | 55.5% | $0.013 | 2.5m | 17%
79 | GPT-5 | 51.6% | $0.065 | 2.8m | 17%
80 | Mistral Small 3.2 24B | 42.5% | $0.0069 | 5.7m | 10%

Individual Scenarios

Detailed Writing Rules

Model #1 #2 #3 #4 #5 Avg
Qwen 3.5 397B A17B100100100100100100.0%
Claude Opus 4.6100100100100100100.0%
GPT-5.1100100100100100100.0%
o4 Mini High100100100100100100.0%
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
o4 Mini100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
GPT-4.1100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
Mistral Large 3100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
GPT-4o, May 13th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
GPT-4o Mini (temp=0)100100100100100100.0%
Gemma 3 12B100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Gemini 2.5 Flash Lite100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Hermes 3 70B100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Ministral 3B100100100100100100.0%
Rocinante 12B100100100100100100.0%
Ministral 3 3B1001001001009999.8%
Claude 3.5 Sonnet1001001001009999.8%
Claude 3 Haiku1001001001009999.7%
Claude Sonnet 4.61001001001009899.6%
Qwen 3.5 Plus (2026-02-15)1001001001009899.5%
Gemini 2.5 Flash1001001001009699.2%
GPT-4.1 Mini1001001001009598.9%
Gemini 2.5 Pro100100100979698.6%
Grok 4 Fast1001001001009298.4%
DeepSeek V3.110010097979798.2%
Arcee AI: Trinity Mini1001001001008997.8%
Gemma 3 4B1001001001008897.7%
GPT-4.1 Nano100100100959397.5%
Hermes 3 405B1001001001008496.9%
ByteDance Seed 1.6 Flash1001001001007294.5%
Cohere Command R+ (Aug. 2024)100100100937994.4%
Ministral 8B1001001001007294.4%
GPT-5.21001001001007094.1%
DeepSeek V3.210010095958093.9%
Gemini 3 Flash (Preview)1001001001006693.3%
WizardLM 2 8x22b1001001001006392.7%
GPT-51001001001005691.2%
Gemma 3 27B100100100876890.9%
Z.AI GLM 4.6100100100975490.1%
ByteDance Seed 1.610010093787589.2%
GPT-5 Mini10010097757188.8%
Arcee AI: Trinity Large (Preview)100100100865487.9%
GPT-4o, May 13th (temp=0)100100100831880.1%
Z.AI GLM 4.7 Flash10010081472470.4%
Gemini 3.1 Pro (Preview)858172555168.7%
Gemini 3 Pro (Preview)1009851442663.8%
Mistral Small 3.2 24B938566222057.1%
Mistral NeMO1001002011647.5%
Z.AI GLM 4.71004233282645.8%
Model #1 #2 #3 #4 #5 Avg
Claude Opus 4100100100100100100.0%
Minimax M2.5100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
Mistral Large 3100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
Mistral Large 2100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Small Creative100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Writer: Palmyra X51001001001009999.7%
Z.AI GLM 51001001001009899.7%
Mistral Large1001001001009799.3%
MoonshotAI: Kimi K2.51001001001009799.3%
Claude Opus 4.51001001001009498.9%
Claude 3.7 Sonnet1001001001009498.9%
Claude Opus 4.61001001001009498.7%
ByteDance Seed 1.6 Flash100100100999498.6%
Ministral 3 3B1001001001009098.1%
Grok 4.1 Fast100100100989297.9%
Hermes 3 405B1001001001008997.8%
Ministral 3 14B100100100969397.7%
GPT-5.2100100100988797.2%
Ministral 8B100100100978897.0%
Llama 3.1 8B1001001001008496.9%
Ministral 3B100100100978796.7%
Stealth: Aurora Alpha1001001001008396.5%
Grok 4 Fast10010098968896.5%
GPT-4o, May 13th (temp=1)10010097948795.7%
Qwen 3.5 Plus (2026-02-15)100100100888494.5%
Qwen 3.5 397B A17B999893918793.4%
WizardLM 2 8x22b10010095908393.4%
GPT-4o, Aug. 6th (temp=1)100100100887692.7%
Ministral 3 8B10010097858192.6%
Llama 3.1 70B100100100946792.2%
GPT-4.1100100100877091.3%
DeepSeek V3 (2024-12-26)100100100806889.4%
Cohere Command R+ (Aug. 2024)10010096965589.3%
Grok 410010092776987.6%
GPT-5 Nano10010088826687.2%
GPT-5.11008484848387.0%
Gemini 2.5 Flash1009994746787.0%
Gemma 3 27B10010084767086.0%
Rocinante 12B1001001001002985.9%
Claude Sonnet 4.610010080796584.6%
Gemini 3.1 Pro (Preview)1009892755584.0%
Gemini 3 Flash (Preview)918585837583.8%
Qwen 2.5 72B1009385805983.5%
DeepSeek V3.21009078757383.1%
DeepSeek-V2 Chat10010087646282.6%
Hermes 3 70B10010098921781.3%
Arcee AI: Trinity Mini10010080606080.1%
o4 Mini10010088772177.2%
o4 Mini High1007573706576.3%
GPT-4o Mini (temp=0)927473676674.3%
GPT-5 Mini1008071655073.3%
Z.AI GLM 4.6937968675472.5%
Arcee AI: Trinity Large (Preview)10010080572472.4%
Gemma 3 4B988564545470.9%
Gemma 3 12B989573454370.8%
DeepSeek V3.11007570664170.5%
GPT-4o, Aug. 6th (temp=0)888762595169.5%
Gemini 2.5 Pro887268595067.5%
GPT-4.1 Nano1007361594467.4%
GPT-5767462515062.4%
ByteDance Seed 1.6847160573862.1%
Gemini 2.5 Flash Lite896060514460.6%
Z.AI GLM 4.7 Flash757157551454.4%
Gemini 3 Pro (Preview)706455353251.2%
GPT-4o, May 13th (temp=0)95874526050.7%
Mistral NeMO72575444045.6%
Z.AI GLM 4.768554641042.0%
Mistral Small 3.2 24B5430160020.0%
Model #1 #2 #3 #4 #5 Avg
GPT-5.2100100100100100100.0%
Claude Sonnet 4.6100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Llama 3.1 70B1001001001009899.5%
Llama 3.1 8B1001001001009699.2%
Mistral Large 31001001001008296.5%
Claude Sonnet 41001001001007695.2%
Stealth: Aurora Alpha100100100908394.5%
Z.AI GLM 4.51001001001006593.1%
Ministral 3 14B10010097848092.3%
Mistral Small Creative1009696858392.0%
Claude Sonnet 4.51001001001005891.5%
Rocinante 12B1001001001005791.4%
Qwen 2.5 72B1001001001005791.3%
Writer: Palmyra X5100100100975991.2%
GPT-4o, Aug. 6th (temp=0)100100100995691.1%
Claude Opus 4.61001001001004989.8%
DeepSeek V3 (2025-03-24)10010096936089.7%
Ministral 3 3B10010092787388.7%
Ministral 3B1001001001004288.4%
Minimax M2.510010096885687.9%
MoonshotAI: Kimi K2.5100100100696486.6%
Claude Haiku 4.51001001001002584.9%
Claude 3 Haiku969484846684.8%
Claude Opus 4.510010087726384.5%
Ministral 8B100100100852882.6%
Mistral Medium 3.1100100100595482.6%
ByteDance Seed 1.6 Flash100100100881380.2%
GPT-4o, May 13th (temp=0)100100100100080.0%
Claude Opus 4100100100100079.9%
Grok 4.1 Fast10010088654679.8%
o4 Mini High10010010096079.2%
Ministral 3 8B1009379744979.1%
GPT-4.1 Mini988582605676.2%
ByteDance Seed 1.610010066604674.4%
Qwen 3.5 Plus (2026-02-15)1008479683873.8%
Mistral Large10010067525073.7%
Grok 4 Fast10010064534672.7%
Claude 3.7 Sonnet1008367524970.0%
GPT-5 Nano988263633568.3%
WizardLM 2 8x22b1009440393762.1%
Mistral Small 3.2 24B10010061222160.7%
Z.AI GLM 51006653463860.5%
GPT-4.11001001001060.3%
GPT-4o, Aug. 6th (temp=1)10010045391760.2%
Mistral Large 21009344362960.2%
GPT-5.11001001000060.0%
o4 Mini100100971059.6%
GPT-4o Mini (temp=1)1001007417659.4%
GPT-4o, May 13th (temp=1)100896838058.9%
Qwen 3.5 397B A17B896253433857.0%
Cohere Command R+ (Aug. 2024)100964930055.2%
DeepSeek-V2 Chat100856526055.2%
DeepSeek V3 (2024-12-26)100100610052.3%
DeepSeek V3.210088640050.5%
Hermes 3 405B676450353249.6%
Claude 3.5 Sonnet766444441648.9%
Gemini 3.1 Pro (Preview)100554937048.2%
Z.AI GLM 4.61007720201045.4%
Hermes 3 70B99583831045.1%
Gemini 3 Flash (Preview)83603736043.1%
Z.AI GLM 4.7 Flash81645212041.9%
Gemini 2.5 Flash10010080041.6%
Grok 410047464039.5%
GPT-5 Mini10057305138.5%
Mistral NeMO9452332036.3%
GPT-573422820834.2%
Gemini 2.5 Flash Lite9542158032.1%
Arcee AI: Trinity Large (Preview)100500021.1%
Gemini 2.5 Pro691100016.0%
Gemini 3 Pro (Preview)67000013.5%
Gemma 3 12B23155008.6%
Gemma 3 27B2498008.2%
Arcee AI: Trinity Mini1777006.1%
GPT-4o Mini (temp=0)2400004.9%
DeepSeek V3.11400002.9%
Z.AI GLM 4.7700001.4%
GPT-4.1 Nano220000.9%
Gemma 3 4B000000.0%
Model #1 #2 #3 #4 #5 Avg
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
Z.AI GLM 5100100100100100100.0%
GPT-5.2100100100100100100.0%
Claude Opus 4100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
Stealth: Aurora Alpha100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
DeepSeek V3 (2024-12-26)100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
ByteDance Seed 1.6 Flash100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Ministral 3 8B100100100100100100.0%
Ministral 3 3B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Ministral 8B100100100100100100.0%
Ministral 3B100100100100100100.0%
Claude 3 Haiku1001001001009799.4%
GPT-5 Mini100100100999298.2%
GPT-5.11001001001009198.1%
GPT-4o, Aug. 6th (temp=1)1001001001008997.8%
GPT-4o, Aug. 6th (temp=0)1001001001008797.3%
GPT-4o, May 13th (temp=1)1001001001008797.3%
Claude Opus 4.61001001001008095.9%
Gemini 2.5 Flash Lite100100100918494.9%
Mistral Small 3.2 24B1001001001007194.3%
Minimax M2.51001001001007194.3%
Grok 4 Fast100100100908194.3%
DeepSeek V3.11001001001007194.1%
Hermes 3 70B100100100947593.6%
Qwen 3.5 397B A17B10010096927793.0%
GPT-4o Mini (temp=0)100100100857892.5%
Qwen 2.5 72B100100100807891.7%
Claude Sonnet 4.610010096927091.7%
o4 Mini High1001001001005791.4%
Gemini 2.5 Flash100100100827591.2%
WizardLM 2 8x22b1009896926991.2%
Rocinante 12B10010099906089.7%
o4 Mini100100100736086.7%
Gemini 3 Pro (Preview)1008883827886.2%
DeepSeek V3.2989388767586.1%
Cohere Command R+ (Aug. 2024)10010096785685.9%
Mistral Small Creative10010088766285.1%
Gemini 3 Flash (Preview)1009979766583.8%
Gemma 3 12B10010099605182.0%
Hermes 3 405B10010098703780.9%
GPT-51008980775980.9%
Z.AI GLM 4.71009181745580.2%
ByteDance Seed 1.610010080713777.6%
Z.AI GLM 4.7 Flash908884843776.5%
GPT-4o Mini (temp=1)918971665774.7%
Mistral Large1008468615573.8%
Mistral Large 2100100100452073.0%
Mistral NeMO937271645671.2%
Arcee AI: Trinity Mini10010070444171.0%
Z.AI GLM 4.6968782454470.8%
GPT-4.1888072594969.6%
Arcee AI: Trinity Large (Preview)897167605167.5%
Gemini 3.1 Pro (Preview)10010042414064.6%
Gemma 3 4B1007263454264.3%
Mistral Large 3886865473660.8%
Gemini 2.5 Pro806855494759.7%
Gemma 3 27B1007849431256.4%
GPT-4.1 Nano655044332643.7%
Model #1 #2 #3 #4 #5 Avg
Claude 3.5 Haiku1001001001009498.7%
Mistral Small Creative100100100989698.7%
Llama 3.1 70B1001001001009298.3%
Llama 3.1 8B1001001001007594.9%
Grok 4.1 Fast100100100997194.0%
Claude Sonnet 4.5100100100986592.6%
Claude 3 Haiku100100100817992.0%
Z.AI GLM 4.5100100100916791.6%
DeepSeek V3 (2025-03-24)10010097827991.6%
GPT-5.210010095866789.5%
Llama 3.1 Nemotron 70B10010099725184.3%
Claude Sonnet 4100100100744684.0%
GPT-5 Nano10010095733480.4%
Ministral 3 3B10010010096079.1%
Ministral 3 8B10010068646078.4%
Stealth: Aurora Alpha10010078605177.9%
Ministral 3 14B1009993544177.5%
Rocinante 12B1001009673374.6%
Cohere Command R+ (Aug. 2024)10010090523074.3%
Ministral 8B1009487513573.5%
Mistral Large 31001009167071.5%
Claude Haiku 4.51008985463571.0%
Writer: Palmyra X51007670565270.8%
Qwen 3.5 Plus (2026-02-15)988767534970.7%
Claude Opus 4.61006562614666.8%
Mistral Large1007953493863.7%
Mistral Medium 3.1877062563862.6%
GPT-4o, May 13th (temp=0)100767062061.6%
Qwen 2.5 72B90816967061.4%
ByteDance Seed 1.610010071241061.0%
Ministral 3B1009844432060.9%
Z.AI GLM 5100928317960.3%
Hermes 3 70B100905147057.5%
Claude 3.5 Sonnet1009955161657.3%
Claude 3.7 Sonnet968551441157.3%
Minimax M2.5756460511252.4%
GPT-4.1 Mini1007735242051.1%
Claude Opus 4100736615050.9%
MoonshotAI: Kimi K2.5757347322049.4%
GPT-4o, Aug. 6th (temp=1)100814620049.3%
Claude Opus 4.5855840332949.0%
GPT-4o, May 13th (temp=1)100664520947.8%
Grok 4 Fast634744423446.1%
Arcee AI: Trinity Mini84624039045.1%
Claude Sonnet 4.6774747371745.0%
Arcee AI: Trinity Large (Preview)83625027044.4%
Mistral Small 3.2 24B100483818041.0%
DeepSeek-V2 Chat10010000040.0%
GPT-4o, Aug. 6th (temp=0)84484125039.7%
Grok 4664235351839.1%
Mistral Large 278472624034.9%
Mistral NeMO6757415234.2%
WizardLM 2 8x22b65323125030.5%
Gemini 2.5 Flash Lite46393427029.2%
Z.AI GLM 4.653353026028.6%
o4 Mini6352270028.4%
ByteDance Seed 1.6 Flash50443013027.3%
DeepSeek V3.25343263024.8%
Qwen 3.5 397B A17B363326191024.5%
GPT-5 Mini37373315024.3%
GPT-5.164222014123.9%
Z.AI GLM 4.7 Flash4236200019.7%
Gemini 3.1 Pro (Preview)541500013.8%
Gemini 3 Flash (Preview)68000013.5%
Gemma 3 12B491300012.2%
GPT-4o Mini (temp=1)332410011.4%
Gemini 2.5 Flash282620011.3%
DeepSeek V3.125220009.4%
GPT-524128008.5%
Z.AI GLM 4.724160008.0%
GPT-4.13242007.6%
Hermes 3 405B20170007.3%
Gemma 3 4B2810005.9%
GPT-4.1 Nano2330005.3%
Gemini 2.5 Pro2060005.2%
Gemma 3 27B1160003.4%
Gemini 3 Pro (Preview)700001.3%
GPT-4o Mini (temp=0)600001.1%
DeepSeek V3 (2024-12-26)500001.0%
o4 Mini High300000.6%
Model #1 #2 #3 #4 #5 Avg
Claude Opus 4.5100100100100100100.0%
Claude Opus 4100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
GPT-4.1100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
Mistral Large 3100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
GPT-4.1 Mini100100100100100100.0%
Writer: Palmyra X5100100100100100100.0%
Mistral Large 2100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
MoonshotAI: Kimi K2.51001001001009999.8%
Claude Haiku 4.51001001001009999.8%
Mistral Medium 3.11001001001009899.6%
Grok 4100100100999899.5%
Rocinante 12B1001001001009799.5%
Z.AI GLM 4.5100100100999799.3%
Qwen 3.5 Plus (2026-02-15)1001001001009599.0%
Gemma 3 4B1001001001009498.8%
Claude Opus 4.6100100100979698.5%
Mistral Small Creative1001001001009298.4%
Stealth: Aurora Alpha1001001001009198.2%
Ministral 8B1001001001009098.0%
Minimax M2.51001001001008997.7%
GPT-5.2100100100959497.7%
Z.AI GLM 51001001001008797.3%
Hermes 3 405B1001001001008597.0%
Ministral 3 3B1001001001008396.7%
Ministral 3B1001001001008296.3%
ByteDance Seed 1.6 Flash100100100958495.8%
Ministral 3 8B1001001001007294.5%
Ministral 3 14B1001001001007294.4%
GPT-4o, Aug. 6th (temp=1)100100100937793.9%
Mistral Large1001001001007093.9%
Llama 3.1 8B1001001001006893.7%
WizardLM 2 8x22b10010097936891.5%
Qwen 3.5 397B A17B1009793916889.9%
Grok 4 Fast100100100816288.5%
GPT-4o, Aug. 6th (temp=0)100100100766487.9%
Qwen 2.5 72B10010088807087.6%
GPT-4o, May 13th (temp=1)1009190747385.5%
Gemini 3 Flash (Preview)989788747085.3%
Gemini 3.1 Pro (Preview)949386816483.4%
Gemini 2.5 Flash1008783826483.3%
DeepSeek V3.21009685686482.5%
Cohere Command R+ (Aug. 2024)100100100684382.2%
GPT-4o, May 13th (temp=0)1009286844982.1%
Z.AI GLM 4.61009785656181.6%
GPT-4o Mini (temp=1)1008979716781.2%
Gemini 2.5 Flash Lite909085786280.8%
ByteDance Seed 1.61008783815280.5%
Hermes 3 70B1008787754478.4%
Gemma 3 12B10010072615978.3%
GPT-5 Nano928476736477.5%
Arcee AI: Trinity Mini10010087792077.1%
DeepSeek V3 (2024-12-26)100100100542976.7%
Claude Sonnet 4.61009471625075.4%
GPT-5.1807977646272.4%
DeepSeek-V2 Chat10010062484671.3%
o4 Mini938481484770.7%
GPT-4o Mini (temp=0)847770575668.6%
DeepSeek V3.1948070524367.6%
Gemini 2.5 Pro827568684467.4%
GPT-4.1 Nano1006563594866.9%
o4 Mini High867473722666.1%
Arcee AI: Trinity Large (Preview)10010068471064.8%
Gemma 3 27B777770544264.1%
Z.AI GLM 4.7 Flash936462441656.1%
GPT-5 Mini605651493550.0%
GPT-574696033548.2%
Z.AI GLM 4.777705826948.1%
Gemini 3 Pro (Preview)100764317047.3%
Mistral NeMO7770420037.8%
Mistral Small 3.2 24B600001.3%

genre

Model #1 #2 #3 #4 #5 Avg
Gemini 3.1 Pro (Preview)100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
Claude Opus 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
ByteDance Seed 1.6100100100100100100.0%
Qwen 3.5 Plus (2026-02-15)100100100100100100.0%
GPT-5 Nano100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Mistral Large 3100100100100100100.0%
DeepSeek-V2 Chat100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
Mistral Medium 3.1100100100100100100.0%
Mistral Large 2100100100100100100.0%
Hermes 3 405B100100100100100100.0%
GPT-4o Mini (temp=1)100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Qwen 2.5 72B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Cohere Command R+ (Aug. 2024)100100100100100100.0%
DeepSeek V3.21001001001009999.7%
GPT-4o, May 13th (temp=1)1001001001009999.7%
GPT-4o Mini (temp=0)1001001001009899.7%
GPT-4.11001001001009899.6%
Claude Sonnet 41001001001009799.5%
Rocinante 12B1001001001009799.4%
Ministral 3 14B1001001001009699.2%
Writer: Palmyra X51001001001009599.1%
Gemini 2.5 Flash1001001001009599.0%
Hermes 3 70B1001001001009498.8%
GPT-5.11001001001009498.7%
Z.AI GLM 51001001001009298.4%
DeepSeek V3 (2024-12-26)1001001001009198.1%
Claude 3.5 Sonnet1001001001008797.3%
GPT-5.2100100100969097.1%
GPT-5 Mini10010099988796.7%
Minimax M2.51001001001008496.7%
GPT-4o, Aug. 6th (temp=1)1001001001008296.3%
Gemma 3 27B1001001001008196.3%
o4 Mini100100100948796.2%
Z.AI GLM 4.7100100100998296.1%
ByteDance Seed 1.6 Flash1001001001007995.9%
Gemma 3 12B1001001001007995.8%
Ministral 8B100100100898895.5%
GPT-4.1 Mini1001001001007595.0%
MoonshotAI: Kimi K2.5100100100938395.0%
Gemini 3 Flash (Preview)10010092929094.8%
Ministral 3 3B100100100928294.7%
Ministral 3 8B100100100957393.7%
Claude 3 Haiku1001001001006593.0%
Mistral Small Creative10010094878493.0%
Llama 3.1 Nemotron 70B1001001001006392.6%
Arcee AI: Trinity Large (Preview)1001001001006392.5%
GPT-51009493908392.1%
GPT-4o, May 13th (temp=0)100100100966391.9%
Gemini 2.5 Flash Lite1009692878491.6%
Stealth: Aurora Alpha100100100887091.5%
Ministral 3B1001001001005791.4%
o4 Mini High999889848490.8%
WizardLM 2 8x22b100100100827290.8%
Gemini 3 Pro (Preview)1009592896788.8%
Claude Sonnet 4.610010088787187.4%
Arcee AI: Trinity Mini10010082777687.0%
Gemma 3 4B10010096706987.0%
DeepSeek V3.11009187817185.8%
Qwen 3.5 397B A17B10010088726885.6%
Gemini 2.5 Pro969387817085.3%
GPT-4.1 Nano968787726781.6%
Z.AI GLM 4.6959381605677.2%
Claude Opus 4.61008481763675.5%
GPT-4o, Aug. 6th (temp=0)1008280733273.4%
Z.AI GLM 4.7 Flash897755544463.9%
Mistral NeMO494536241634.1%
Mistral Small 3.2 24B7239390030.0%
Model #1 #2 #3 #4 #5 Avg
Claude Opus 4.5100100100100100100.0%
Claude Opus 4100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
Z.AI GLM 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=1)100100100100100100.0%
Claude 3.5 Haiku100100100100100100.0%
Llama 3.1 70B100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Grok 4 Fast1001001001009999.9%
Claude 3.7 Sonnet1001001001009899.6%
Grok 41001001001009799.4%
Rocinante 12B1001001001009699.0%
GPT-4.1 Mini1001001001009498.8%
Writer: Palmyra X51001001001009098.0%
Claude 3 Haiku1001001001008096.0%
Hermes 3 405B10010096968795.7%
Arcee AI: Trinity Large (Preview)100100100938495.3%
Claude Haiku 4.51001001001007494.8%
GPT-4o Mini (temp=1)100100100888594.6%
Mistral Small Creative1009594918993.6%
Ministral 3B100100100947493.6%
Mistral Medium 3.11001001001006492.9%
Hermes 3 70B100100100877792.7%
MoonshotAI: Kimi K2.510010097957192.4%
Ministral 3 8B10010098857591.6%
Mistral Large10010095887190.7%
ByteDance Seed 1.6 Flash10010098837190.4%
Mistral Large 210010090897089.9%
WizardLM 2 8x22b100100100826489.2%
Cohere Command R+ (Aug. 2024)100100100736988.4%
GPT-4o, Aug. 6th (temp=0)100100100855587.9%
Ministral 3 14B949291807887.1%
Ministral 8B1009490806886.4%
Qwen 2.5 72B100100100696286.3%
GPT-5.1939089847085.2%
Arcee AI: Trinity Mini10010084726884.8%
Z.AI GLM 51008882787684.7%
GPT-4o, May 13th (temp=1)10010094696084.5%
Mistral Large 310010085726584.5%
DeepSeek V3 (2024-12-26)10010075726682.5%
GPT-5 Nano1009189676181.5%
GPT-5.2888583806880.8%
DeepSeek V3.2848180797680.0%
Gemini 3.1 Pro (Preview)1008478785779.4%
Stealth: Aurora Alpha999686665079.1%
GPT-4.1938877756278.9%
Minimax M2.5999282664677.1%
GPT-4o Mini (temp=0)1009381605177.0%
Gemma 3 12B1008567655875.2%
Gemma 3 27B928985753174.5%
Qwen 3.5 Plus (2026-02-15)887473696874.4%
DeepSeek-V2 Chat100100100551173.3%
GPT-5 Mini957871625271.7%
Claude Sonnet 4.6807573725671.2%
Claude Opus 4.6898175565370.6%
Gemini 2.5 Flash837671694969.5%
Ministral 3 3B927876582666.1%
o4 Mini High906760503760.6%
Z.AI GLM 4.7736961563959.8%
Qwen 3.5 397B A17B696055535157.7%
Z.AI GLM 4.6676157484755.8%
DeepSeek V3.1716559512854.7%
Gemma 3 4B726764363354.4%
Gemini 3 Pro (Preview)686056483753.5%
o4 Mini806059491252.0%
GPT-4.1 Nano645958433251.1%
Gemini 2.5 Flash Lite825342413851.0%
Z.AI GLM 4.7 Flash615943412746.0%
Gemini 2.5 Pro625249412545.8%
Gemini 3 Flash (Preview)624642363443.9%
GPT-5675037242340.1%
ByteDance Seed 1.6734934271238.9%
GPT-4o, May 13th (temp=0)565147181838.0%
Mistral NeMO574239251736.0%
Mistral Small 3.2 24B3500007.0%
Model #1 #2 #3 #4 #5 Avg
GPT-5.11001001001009599.1%
Z.AI GLM 510010099968395.5%
Gemma 3 12B1001001001007695.2%
Claude 3.5 Haiku1001001001006993.8%
GPT-5.2100100100976091.5%
GPT-4o, May 13th (temp=1)1001001001004388.6%
Llama 3.1 8B100100100806388.6%
Llama 3.1 70B1001001001002785.4%
Llama 3.1 Nemotron 70B1009082787584.9%
Mistral Large10010075736682.9%
Claude Opus 410010083834081.3%
DeepSeek V3 (2025-03-24)100100100100681.3%
Arcee AI: Trinity Large (Preview)1008371706678.1%
Hermes 3 70B10010090801376.6%
WizardLM 2 8x22b1009887623576.5%
Qwen 2.5 72B10010073595076.3%
Mistral Large 2100100100562576.1%
Claude Sonnet 4.5100100100671175.5%
Claude Opus 4.510010072623874.3%
Claude Opus 4.610010093562274.2%
Mistral Small 3.2 24B100100100521473.2%
Grok 4.1 Fast1009278642972.5%
Claude Sonnet 41008169575071.4%
Z.AI GLM 4.5949468573569.5%
Claude Haiku 4.51007978491363.5%
Ministral 3B1006561543763.5%
Claude 3.7 Sonnet1006559433861.1%
MoonshotAI: Kimi K2.51007257433060.3%
Cohere Command R+ (Aug. 2024)1007455512060.0%
Claude Sonnet 4.610099950058.9%
Grok 4 Fast100746941457.7%
Writer: Palmyra X51001007114057.1%
Claude 3 Haiku1006047453056.5%
Claude 3.5 Sonnet1001004137256.1%
Hermes 3 405B846757353355.1%
Qwen 3.5 Plus (2026-02-15)907143392954.4%
Gemini 3 Pro (Preview)757264431353.3%
GPT-5100585751053.3%
Mistral Medium 3.1888564161153.0%
Rocinante 12B10082728052.3%
Minimax M2.51009327141349.5%
Mistral NeMO1001002220048.4%
DeepSeek V3 (2024-12-26)716057292248.0%
GPT-5 Mini73736726047.9%
GPT-4o, Aug. 6th (temp=1)1005133302347.5%
Grok 4705549332045.2%
GPT-4o Mini (temp=1)100473929744.3%
GPT-4o, May 13th (temp=0)10077420043.7%
Mistral Small Creative66594537742.8%
GPT-4o, Aug. 6th (temp=0)9078400041.6%
Mistral Large 372585223041.0%
Ministral 3 3B1004530161341.0%
ByteDance Seed 1.6 Flash100652810040.5%
Gemini 3.1 Pro (Preview)66524141040.0%
Ministral 3 14B545149241738.9%
Ministral 8B51494536036.1%
ByteDance Seed 1.67753335033.5%
Gemini 2.5 Flash Lite80531110531.7%
Ministral 3 8B55532814531.2%
GPT-4o Mini (temp=0)10027208031.0%
DeepSeek-V2 Chat1002660026.4%
Gemini 3 Flash (Preview)39322927025.6%
GPT-4.18233100025.1%
GPT-5 Nano303029211124.0%
GPT-4.1 Mini6920129723.4%
Arcee AI: Trinity Mini6033200022.5%
Qwen 3.5 397B A17B5528180020.4%
Stealth: Aurora Alpha3814123013.4%
Gemma 3 27B3022110012.7%
Gemini 2.5 Flash322030010.9%
Z.AI GLM 4.62018140010.4%
Z.AI GLM 4.7 Flash51000010.3%
Gemini 2.5 Pro2350005.6%
DeepSeek V3.22300004.6%
GPT-4.1 Nano1660004.4%
Z.AI GLM 4.71600003.2%
o4 Mini1040002.7%
o4 Mini High000000.0%
DeepSeek V3.1000000.0%
Gemma 3 4B000000.0%
Model #1 #2 #3 #4 #5 Avg
MoonshotAI: Kimi K2.5100100100100100100.0%
Claude Opus 4.5100100100100100100.0%
Claude Opus 4100100100100100100.0%
Claude Sonnet 4100100100100100100.0%
Grok 4100100100100100100.0%
Claude Sonnet 4.5100100100100100100.0%
Grok 4.1 Fast100100100100100100.0%
DeepSeek V3 (2025-03-24)100100100100100100.0%
Grok 4 Fast100100100100100100.0%
Claude 3.5 Sonnet100100100100100100.0%
GPT-4o, May 13th (temp=0)100100100100100100.0%
Claude 3.7 Sonnet100100100100100100.0%
Claude Haiku 4.5100100100100100100.0%
GPT-4o, Aug. 6th (temp=0)100100100100100100.0%
Mistral Large 2100100100100100100.0%
Llama 3.1 Nemotron 70B100100100100100100.0%
Mistral Large100100100100100100.0%
Ministral 3 14B100100100100100100.0%
Claude 3 Haiku100100100100100100.0%
Llama 3.1 8B100100100100100100.0%
Ministral 8B100100100100100100.0%
Ministral 3B100100100100100100.0%
Qwen 2.5 72B1001001001009999.8%
Cohere Command R+ (Aug. 2024)1001001001009999.8%
Llama 3.1 70B1001001001009999.8%
GPT-5.2100100100999698.9%
Gemma 3 12B1001001001009498.9%
Z.AI GLM 5100100100989498.4%
Z.AI GLM 4.51001001001009198.2%
Claude 3.5 Haiku1001001001009198.2%
GPT-4o, May 13th (temp=1)1001001001009198.2%
Gemini 3.1 Pro (Preview)1001001001008897.6%
ByteDance Seed 1.6 Flash1001001001008897.6%
Hermes 3 405B1001001001008797.3%
Writer: Palmyra X5100100100949197.0%
GPT-4.1 Mini1001001001008496.9%
Mistral Small Creative1001001001008296.4%
Minimax M2.5100100100988396.3%
Gemma 3 4B100100100919196.3%
DeepSeek-V2 Chat100100100938796.0%
WizardLM 2 8x22b10010097958795.8%
Arcee AI: Trinity Large (Preview)100100100918795.6%
GPT-5 Nano100100100918595.3%
DeepSeek V3.210010097928795.2%
DeepSeek V3 (2024-12-26)1001001001007695.2%
o4 Mini High10010099928394.8%
Ministral 3 8B100100100888594.6%
GPT-4o Mini (temp=0)10010099947894.3%
Qwen 3.5 Plus (2026-02-15)10010098928194.1%
Gemma 3 27B1001001001006793.5%
Claude Sonnet 4.610010099986993.4%
GPT-4o, Aug. 6th (temp=1)1001001001006693.2%
Gemini 2.5 Flash10010094917892.5%
GPT-5 Mini10010089888191.6%
Gemini 2.5 Flash Lite100100100896891.5%
Mistral Large 310010089887189.5%
GPT-4.1100100100816388.8%
Ministral 3 3B1001001001004488.8%
Qwen 3.5 397B A17B979290867688.1%
Gemini 2.5 Pro1009694796987.7%
Gemini 3 Pro (Preview)1009987797287.2%
Mistral Medium 3.11009182818086.7%
GPT-5.11008783807885.7%
Claude Opus 4.61009892765984.8%
Rocinante 12B100100100655684.3%
GPT-4o Mini (temp=1)10010089765383.6%
o4 Mini919179777682.9%
ByteDance Seed 1.610010082755882.9%
DeepSeek V3.11009891734280.8%
Hermes 3 70B100100100682879.2%
Z.AI GLM 4.61009375615977.6%
Z.AI GLM 4.7909079705877.5%
Gemini 3 Flash (Preview)1009876713676.1%
Z.AI GLM 4.7 Flash998779594673.9%
Stealth: Aurora Alpha100938662068.1%
GPT-5807972703867.7%
GPT-4.1 Nano787061502857.4%
Mistral Small 3.2 24B100925331055.1%
Arcee AI: Trinity Mini625249443648.5%
Mistral NeMO100443831744.0%
Model #1 #2 #3 #4 #5 Avg
Llama 3.1 8B100100100100100100.0%
Llama 3.1 Nemotron 70B1001001001009899.6%
Mistral Large1001001001008997.7%
Llama 3.1 70B10010098919095.9%
Claude 3.5 Haiku10010093796687.5%
Mistral Large 31009893745483.8%
DeepSeek V3 (2025-03-24)10010084725381.9%
Rocinante 12B10010078656181.0%
Ministral 3 3B1008375756880.0%
Mistral Small Creative100100100603579.1%
Claude 3 Haiku997862585470.3%
Ministral 3B958269525169.5%
Mistral Large 210010076412368.2%
Qwen 2.5 72B996161605467.2%
Grok 4 Fast757262555363.4%
Ministral 3 8B1008945393862.5%
GPT-4o, May 13th (temp=1)100877539060.0%
GPT-5.2775755545158.9%
Grok 4.1 Fast706656492954.1%
Ministral 8B846758252251.1%
Claude 3.7 Sonnet685857422449.9%
Gemma 3 27B100100420048.3%
Claude Sonnet 4.51004735302547.4%
WizardLM 2 8x22b787244202046.7%
Z.AI GLM 567635932946.1%
Mistral Small 3.2 24B655858351245.7%
Claude Opus 4.558585747745.5%
Claude Haiku 4.5784442372344.9%
Mistral Medium 3.175733835244.6%
Claude Sonnet 4945434311044.6%
Hermes 3 405B100674212044.1%
Ministral 3 14B645741332042.8%
Qwen 3.5 Plus (2026-02-15)83613426942.3%
Cohere Command R+ (Aug. 2024)75733112438.7%
Hermes 3 70B1009300038.6%
MoonshotAI: Kimi K2.556555422037.3%
Claude Opus 4.67663266034.2%
Mistral NeMO7749413034.0%
Gemini 3 Flash (Preview)543531222232.8%
Claude Opus 46766221031.3%
ByteDance Seed 1.68142330031.0%
Z.AI GLM 4.55147369028.5%
Gemini 3.1 Pro (Preview)434227181028.1%
DeepSeek V3 (2024-12-26)8920148026.1%
GPT-5 Nano6730253024.9%
GPT-4o, Aug. 6th (temp=1)873500024.3%
DeepSeek V3.26527188324.2%
GPT-4o, Aug. 6th (temp=0)694200022.1%
Arcee AI: Trinity Large (Preview)902000022.1%
Grok 4561614131022.0%
GPT-4o, May 13th (temp=0)852000021.0%
Minimax M2.55028144019.2%
GPT-4.1 Mini31291711017.6%
DeepSeek-V2 Chat5715140017.3%
Writer: Palmyra X53330230017.1%
Gemini 2.5 Flash5118160017.1%
Claude 3.5 Sonnet27241511015.5%
GPT-4.123202014015.3%
GPT-5 Mini3127108015.2%
o4 Mini3530100015.0%
Gemini 2.5 Pro2220188414.5%
Gemma 3 12B462320014.2%
Stealth: Aurora Alpha3127130014.2%
Qwen 3.5 397B A17B3024170014.2%
GPT-5.129191111013.9%
Gemini 2.5 Flash Lite3220120012.9%
GPT-4o Mini (temp=1)2020107612.5%
ByteDance Seed 1.6 Flash60000012.0%
Gemini 3 Pro (Preview)351280010.9%
Z.AI GLM 4.7282400010.3%
GPT-4.1 Nano3590008.8%
Claude Sonnet 4.63842008.6%
Arcee AI: Trinity Mini24160008.0%
DeepSeek V3.118184008.0%
GPT-4o Mini (temp=0)2800005.6%
GPT-52000004.0%
Z.AI GLM 4.6910002.1%
Gemma 3 4B500001.0%
o4 Mini High000000.0%
Z.AI GLM 4.7 Flash000000.0%
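Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 99 | 99.7%
Claude Opus 4 | 100 | 100 | 100 | 100 | 98 | 99.7%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 98 | 97 | 98.9%
Mistral Small Creative | 100 | 100 | 98 | 98 | 95 | 98.2%
Mistral Large 3 | 100 | 100 | 100 | 100 | 91 | 98.2%
Mistral Large | 100 | 100 | 100 | 100 | 90 | 98.1%
Claude Haiku 4.5 | 100 | 100 | 100 | 96 | 93 | 97.8%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 87 | 97.3%
Mistral Medium 3.1 | 100 | 100 | 97 | 94 | 93 | 96.8%
Grok 4.1 Fast | 100 | 100 | 99 | 95 | 89 | 96.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 79 | 95.8%
Rocinante 12B | 100 | 100 | 100 | 100 | 79 | 95.8%
Grok 4 | 100 | 100 | 100 | 96 | 82 | 95.7%
GPT-5.2 | 100 | 100 | 98 | 91 | 89 | 95.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 79 | 95.7%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 76 | 95.3%
Mistral Large 2 | 100 | 100 | 100 | 100 | 75 | 95.0%
Ministral 3 14B | 100 | 100 | 100 | 98 | 73 | 94.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 95 | 92 | 80 | 93.4%
Writer: Palmyra X5 | 100 | 100 | 98 | 92 | 67 | 91.3%
GPT-5 Nano | 100 | 100 | 100 | 81 | 73 | 90.9%
Minimax M2.5 | 100 | 100 | 88 | 83 | 82 | 90.6%
Ministral 3 8B | 100 | 100 | 99 | 98 | 55 | 90.3%
GPT-5.1 | 100 | 99 | 94 | 86 | 67 | 89.0%
Z.AI GLM 5 | 100 | 100 | 94 | 80 | 69 | 88.5%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 76 | 65 | 88.3%
DeepSeek-V2 Chat | 100 | 100 | 91 | 84 | 62 | 87.3%
GPT-4o, Aug. 6th (temp=0) | 97 | 97 | 91 | 87 | 63 | 87.0%
Hermes 3 405B | 100 | 100 | 100 | 75 | 55 | 86.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 26 | 85.1%
GPT-4o, May 13th (temp=0) | 100 | 100 | 81 | 79 | 64 | 84.8%
Qwen 2.5 72B | 100 | 98 | 90 | 74 | 60 | 84.4%
Ministral 8B | 100 | 99 | 92 | 67 | 59 | 83.3%
MoonshotAI: Kimi K2.5 | 100 | 87 | 87 | 75 | 65 | 82.7%
GPT-4o, May 13th (temp=1) | 100 | 91 | 82 | 71 | 67 | 82.3%
GPT-4o Mini (temp=1) | 100 | 87 | 81 | 73 | 62 | 80.7%
Gemini 3.1 Pro (Preview) | 98 | 93 | 91 | 61 | 58 | 80.4%
Ministral 3 3B | 100 | 94 | 79 | 76 | 49 | 79.7%
Gemma 3 12B | 100 | 97 | 85 | 64 | 49 | 79.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 76 | 75 | 72 | 69 | 78.2%
Ministral 3B | 100 | 100 | 75 | 72 | 40 | 77.3%
Gemma 3 4B | 100 | 89 | 79 | 66 | 48 | 76.4%
GPT-4.1 | 100 | 100 | 63 | 61 | 56 | 76.0%
o4 Mini | 100 | 86 | 81 | 61 | 47 | 75.0%
DeepSeek V3.2 | 89 | 79 | 76 | 65 | 59 | 73.6%
Qwen 3.5 397B A17B | 84 | 75 | 74 | 63 | 59 | 70.9%
GPT-4o, Aug. 6th (temp=1) | 89 | 80 | 70 | 70 | 43 | 70.3%
GPT-4o Mini (temp=0) | 78 | 73 | 69 | 68 | 61 | 70.0%
Gemini 2.5 Flash Lite | 100 | 79 | 67 | 52 | 47 | 68.9%
Z.AI GLM 4.6 | 100 | 87 | 64 | 57 | 26 | 66.8%
Claude Opus 4.6 | 92 | 91 | 61 | 44 | 43 | 66.5%
WizardLM 2 8x22b | 100 | 100 | 79 | 53 | 0 | 66.3%
o4 Mini High | 92 | 68 | 67 | 66 | 36 | 66.0%
Gemini 2.5 Pro | 85 | 80 | 57 | 52 | 49 | 64.5%
GPT-5 Mini | 82 | 70 | 67 | 56 | 41 | 63.1%
Gemma 3 27B | 81 | 80 | 50 | 42 | 33 | 57.3%
DeepSeek V3.1 | 75 | 74 | 59 | 33 | 31 | 54.5%
GPT-4.1 Nano | 85 | 76 | 60 | 44 | 7 | 54.2%
Gemini 3 Pro (Preview) | 61 | 54 | 52 | 51 | 44 | 52.5%
Gemini 3 Flash (Preview) | 62 | 59 | 43 | 40 | 37 | 48.1%
Z.AI GLM 4.7 | 61 | 57 | 49 | 46 | 20 | 46.6%
Hermes 3 70B | 84 | 62 | 57 | 22 | 0 | 45.0%
Stealth: Aurora Alpha | 79 | 77 | 31 | 26 | 0 | 42.6%
ByteDance Seed 1.6 | 68 | 50 | 47 | 20 | 15 | 39.8%
Z.AI GLM 4.7 Flash | 65 | 51 | 41 | 37 | 5 | 39.7%
Claude Sonnet 4.6 | 60 | 39 | 35 | 33 | 28 | 38.9%
GPT-5 | 45 | 37 | 30 | 28 | 17 | 31.4%
Mistral NeMO | 55 | 31 | 25 | 20 | 18 | 29.8%
Mistral Small 3.2 24B | 30 | 25 | 0 | 0 | 0 | 11.1%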

Novelcrafter Default Prompt

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 99.9%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 98 | 99.7%
Ministral 3 3B | 100 | 100 | 100 | 100 | 98 | 99.5%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 97 | 99.5%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 96 | 99.2%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 95 | 99.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 92 | 98.5%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 91 | 98.3%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 91 | 98.2%
Ministral 3 8B | 100 | 100 | 100 | 100 | 91 | 98.2%
GPT-4.1 | 100 | 100 | 100 | 100 | 91 | 98.1%
Gemini 2.5 Flash | 100 | 100 | 100 | 95 | 94 | 97.9%
Gemma 3 4B | 100 | 100 | 100 | 100 | 87 | 97.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 82 | 96.4%
Arcee AI: Trinity Mini | 100 | 99 | 99 | 97 | 83 | 95.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 77 | 95.5%
GPT-5 Mini | 100 | 100 | 100 | 99 | 76 | 95.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 96 | 78 | 94.8%
Claude Sonnet 4.6 | 100 | 100 | 100 | 89 | 84 | 94.6%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 73 | 94.6%
Stealth: Aurora Alpha | 100 | 100 | 100 | 87 | 85 | 94.4%
DeepSeek V3.1 | 100 | 100 | 100 | 87 | 83 | 93.9%
o4 Mini | 100 | 100 | 100 | 99 | 65 | 92.7%
Z.AI GLM 4.7 Flash | 100 | 100 | 92 | 91 | 80 | 92.7%
GPT-5 | 100 | 100 | 100 | 93 | 67 | 92.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 95 | 88 | 76 | 91.7%
WizardLM 2 8x22b | 100 | 100 | 100 | 92 | 65 | 91.5%
o4 Mini High | 100 | 100 | 93 | 81 | 69 | 88.7%
Gemini 2.5 Flash Lite | 100 | 100 | 95 | 73 | 71 | 88.0%
Gemini 2.5 Pro | 100 | 98 | 91 | 83 | 61 | 86.6%
Z.AI GLM 4.6 | 100 | 96 | 75 | 74 | 62 | 81.5%
Mistral Small 3.2 24B | 100 | 61 | 49 | 43 | 18 | 54.2%
Mistral NeMO | 100 | 63 | 41 | 32 | 0 | 47.3%
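Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 99.9%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 99 | 99.8%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 98 | 99.6%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 97 | 99.5%
Hermes 3 70B | 100 | 100 | 100 | 100 | 97 | 99.4%
Mistral Large 2 | 100 | 100 | 100 | 100 | 97 | 99.3%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 91 | 98.2%
Claude Opus 4.6 | 100 | 100 | 100 | 98 | 93 | 98.1%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 90 | 98.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 95 | 94 | 97.8%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 89 | 97.8%
GPT-5.1 | 100 | 100 | 100 | 100 | 87 | 97.3%
Ministral 3B | 100 | 100 | 100 | 100 | 87 | 97.3%
Ministral 3 8B | 100 | 100 | 96 | 95 | 92 | 96.8%
Mistral Large | 100 | 100 | 100 | 100 | 83 | 96.7%
GPT-5 Nano | 100 | 100 | 100 | 91 | 90 | 96.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 96 | 83 | 95.7%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 77 | 95.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 74 | 94.8%
Minimax M2.5 | 100 | 100 | 96 | 94 | 83 | 94.6%
Mistral Small Creative | 100 | 100 | 100 | 95 | 75 | 94.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 88 | 76 | 92.7%
GPT-5.2 | 100 | 100 | 91 | 88 | 84 | 92.6%
Gemma 3 27B | 100 | 98 | 96 | 93 | 72 | 91.8%
GPT-4.1 Mini | 100 | 98 | 92 | 85 | 82 | 91.4%
Gemma 3 12B | 100 | 100 | 99 | 88 | 68 | 91.0%
Qwen 3.5 397B A17B | 100 | 100 | 95 | 83 | 74 | 90.4%
GPT-4o Mini (temp=1) | 100 | 98 | 94 | 85 | 67 | 88.6%
Ministral 3 14B | 100 | 100 | 100 | 85 | 57 | 88.5%
Hermes 3 405B | 100 | 100 | 100 | 99 | 42 | 88.3%
Ministral 8B | 100 | 100 | 100 | 77 | 62 | 87.8%
GPT-4o, Aug. 6th (temp=0) | 100 | 99 | 97 | 92 | 50 | 87.5%
GPT-4o, May 13th (temp=1) | 100 | 100 | 98 | 76 | 63 | 87.3%
Grok 4 | 100 | 96 | 95 | 77 | 68 | 87.1%
Qwen 3.5 Plus (2026-02-15) | 100 | 92 | 85 | 84 | 70 | 86.3%
GPT-4.1 | 100 | 100 | 98 | 94 | 39 | 86.2%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 77 | 53 | 86.1%
Ministral 3 3B | 100 | 87 | 83 | 82 | 76 | 85.6%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 64 | 53 | 83.3%
Gemini 2.5 Flash | 100 | 83 | 77 | 73 | 68 | 80.2%
MoonshotAI: Kimi K2.5 | 100 | 88 | 80 | 67 | 64 | 79.8%
Qwen 2.5 72B | 100 | 99 | 83 | 70 | 44 | 79.3%
Stealth: Aurora Alpha | 96 | 94 | 88 | 64 | 33 | 75.0%
DeepSeek V3.1 | 100 | 73 | 70 | 70 | 50 | 72.6%
Z.AI GLM 4.6 | 87 | 76 | 72 | 66 | 47 | 69.6%
Gemini 3 Pro (Preview) | 76 | 72 | 68 | 67 | 52 | 67.0%
Gemini 3 Flash (Preview) | 83 | 82 | 59 | 56 | 51 | 66.5%
DeepSeek V3.2 | 93 | 72 | 58 | 53 | 49 | 64.8%
Gemini 2.5 Flash Lite | 93 | 73 | 71 | 46 | 41 | 64.7%
WizardLM 2 8x22b | 75 | 67 | 66 | 65 | 49 | 64.6%
GPT-4o Mini (temp=0) | 90 | 71 | 59 | 57 | 39 | 63.3%
Z.AI GLM 4.7 | 77 | 67 | 63 | 62 | 44 | 62.5%
o4 Mini | 77 | 75 | 66 | 64 | 31 | 62.4%
GPT-5 Mini | 95 | 63 | 60 | 53 | 31 | 60.2%
GPT-4.1 Nano | 75 | 59 | 59 | 54 | 46 | 58.5%
GPT-4o, May 13th (temp=0) | 77 | 63 | 61 | 46 | 35 | 56.4%
o4 Mini High | 91 | 81 | 46 | 37 | 24 | 55.8%
DeepSeek-V2 Chat | 100 | 100 | 33 | 30 | 15 | 55.5%
Claude Sonnet 4.6 | 87 | 69 | 49 | 40 | 29 | 55.0%
GPT-5 | 70 | 69 | 51 | 38 | 35 | 52.4%
Arcee AI: Trinity Mini | 74 | 65 | 56 | 30 | 14 | 47.8%
Gemini 2.5 Pro | 59 | 51 | 47 | 42 | 33 | 46.3%
Z.AI GLM 4.7 Flash | 65 | 64 | 58 | 20 | 3 | 42.1%
Gemma 3 4B | 61 | 56 | 30 | 20 | 20 | 37.3%
Mistral NeMO | 62 | 16 | 7 | 4 | 0 | 17.9%
Mistral Small 3.2 24B | 52 | 20 | 10 | 0 | 0 | 16.5%
ByteDance Seed 1.6 | 7 | 0 | 0 | 0 | 0 | 1.4%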
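Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 97 | 99.4%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 93 | 98.5%
Claude 3.5 Sonnet | 100 | 100 | 100 | 87 | 87 | 94.7%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 90 | 80 | 93.9%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 67 | 93.4%
Claude Opus 4 | 100 | 100 | 100 | 100 | 65 | 93.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 35 | 87.1%
Claude Opus 4.6 | 100 | 100 | 99 | 76 | 57 | 86.4%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 22 | 84.5%
Claude 3.7 Sonnet | 100 | 100 | 91 | 83 | 49 | 84.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 63 | 47 | 82.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 80 | 0 | 75.9%
Ministral 3 14B | 100 | 93 | 68 | 62 | 55 | 75.7%
Mistral Medium 3.1 | 100 | 75 | 74 | 65 | 59 | 74.6%
GPT-5 Nano | 100 | 96 | 63 | 63 | 51 | 74.5%
Z.AI GLM 4.5 | 100 | 100 | 67 | 59 | 44 | 73.9%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 87 | 60 | 14 | 72.2%
Grok 4.1 Fast | 100 | 100 | 94 | 31 | 30 | 70.9%
Writer: Palmyra X5 | 94 | 87 | 74 | 52 | 47 | 70.7%
Mistral Large 2 | 95 | 94 | 67 | 52 | 40 | 69.6%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 47 | 0 | 69.3%
Claude Sonnet 4 | 100 | 100 | 65 | 50 | 20 | 67.0%
Ministral 3 3B | 100 | 82 | 75 | 57 | 18 | 66.5%
Rocinante 12B | 97 | 71 | 64 | 59 | 33 | 64.8%
Gemini 3 Pro (Preview) | 100 | 92 | 91 | 32 | 0 | 63.0%
MoonshotAI: Kimi K2.5 | 100 | 63 | 54 | 52 | 38 | 61.3%
Hermes 3 70B | 83 | 70 | 68 | 45 | 40 | 61.2%
GPT-4.1 | 100 | 90 | 48 | 44 | 20 | 60.3%
Arcee AI: Trinity Large (Preview) | 100 | 84 | 78 | 26 | 11 | 59.8%
Ministral 3 8B | 94 | 91 | 68 | 45 | 0 | 59.6%
Mistral Small Creative | 87 | 81 | 69 | 55 | 0 | 58.3%
Claude Haiku 4.5 | 100 | 100 | 45 | 42 | 0 | 57.4%
Ministral 3B | 100 | 100 | 52 | 18 | 13 | 56.5%
Grok 4 Fast | 75 | 72 | 53 | 48 | 28 | 55.2%
Claude 3 Haiku | 100 | 85 | 51 | 29 | 8 | 54.6%
ByteDance Seed 1.6 Flash | 96 | 72 | 60 | 32 | 12 | 54.5%
Mistral Large 3 | 100 | 91 | 52 | 11 | 0 | 50.9%
Hermes 3 405B | 95 | 63 | 62 | 33 | 0 | 50.6%
Mistral NeMO | 100 | 92 | 42 | 17 | 0 | 50.2%
Gemma 3 12B | 100 | 100 | 23 | 13 | 7 | 48.5%
o4 Mini High | 100 | 100 | 40 | 0 | 0 | 48.0%
DeepSeek-V2 Chat | 100 | 100 | 33 | 0 | 0 | 46.7%
Gemini 2.5 Flash | 100 | 96 | 37 | 0 | 0 | 46.7%
Mistral Large | 100 | 100 | 32 | 0 | 0 | 46.4%
GPT-4.1 Mini | 89 | 57 | 30 | 27 | 12 | 43.1%
GPT-4o, May 13th (temp=1) | 100 | 46 | 36 | 26 | 1 | 41.8%
Gemini 2.5 Flash Lite | 100 | 83 | 25 | 0 | 0 | 41.5%
Z.AI GLM 4.7 | 100 | 52 | 18 | 17 | 5 | 38.4%
GPT-5 | 100 | 87 | 4 | 0 | 0 | 38.1%
GPT-4o, Aug. 6th (temp=0) | 71 | 45 | 41 | 33 | 0 | 37.7%
Gemini 3 Flash (Preview) | 46 | 44 | 39 | 35 | 17 | 36.2%
Cohere Command R+ (Aug. 2024) | 82 | 76 | 15 | 3 | 0 | 34.9%
WizardLM 2 8x22b | 98 | 50 | 24 | 0 | 0 | 34.4%
GPT-4o Mini (temp=1) | 75 | 49 | 34 | 8 | 0 | 33.2%
Z.AI GLM 4.7 Flash | 80 | 48 | 18 | 11 | 0 | 31.3%
Stealth: Aurora Alpha | 100 | 35 | 20 | 0 | 0 | 31.0%
Ministral 8B | 100 | 22 | 11 | 0 | 0 | 26.6%
DeepSeek V3 (2024-12-26) | 67 | 26 | 20 | 0 | 0 | 22.5%
GPT-4o, May 13th (temp=0) | 100 | 7 | 0 | 0 | 0 | 21.5%
Gemma 3 27B | 45 | 30 | 30 | 0 | 0 | 20.9%
o4 Mini | 100 | 0 | 0 | 0 | 0 | 20.0%
ByteDance Seed 1.6 | 65 | 24 | 0 | 0 | 0 | 18.0%
Qwen 3.5 397B A17B | 28 | 18 | 14 | 11 | 2 | 14.6%
Z.AI GLM 4.6 | 30 | 28 | 0 | 0 | 0 | 11.7%
GPT-5 Mini | 46 | 0 | 0 | 0 | 0 | 9.2%
Grok 4 | 35 | 0 | 0 | 0 | 0 | 7.0%
Arcee AI: Trinity Mini | 33 | 0 | 0 | 0 | 0 | 6.6%
DeepSeek V3.2 | 27 | 0 | 0 | 0 | 0 | 5.5%
Gemini 2.5 Pro | 20 | 7 | 0 | 0 | 0 | 5.3%
GPT-4o Mini (temp=0) | 16 | 4 | 0 | 0 | 0 | 4.1%
DeepSeek V3.1 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0.0%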
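Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 99.9%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 99.9%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 99 | 99.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 99 | 99 | 99.6%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 98 | 99.6%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 98 | 99.6%
Mistral Large 2 | 100 | 100 | 100 | 100 | 96 | 99.3%
GPT-5.1 | 100 | 100 | 100 | 98 | 98 | 99.2%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 94 | 98.9%
Gemma 3 27B | 100 | 100 | 100 | 100 | 93 | 98.5%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 93 | 98.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 88 | 97.6%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 88 | 97.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 87 | 97.3%
GPT-5 Mini | 100 | 100 | 100 | 94 | 92 | 97.3%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 85 | 97.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 85 | 97.0%
Ministral 3 8B | 100 | 100 | 100 | 98 | 84 | 96.5%
Claude 3.5 Haiku | 100 | 100 | 100 | 95 | 87 | 96.3%
Rocinante 12B | 100 | 100 | 100 | 97 | 82 | 95.9%
Mistral Small Creative | 100 | 100 | 100 | 91 | 87 | 95.5%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 71 | 94.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 99 | 71 | 94.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 68 | 93.7%
Minimax M2.5 | 100 | 100 | 100 | 100 | 66 | 93.1%
Qwen 2.5 72B | 100 | 100 | 98 | 92 | 76 | 93.1%
Mistral Large 3 | 100 | 100 | 100 | 97 | 68 | 93.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 64 | 92.9%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 90 | 73 | 92.5%
Mistral Large | 100 | 100 | 100 | 93 | 67 | 92.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 99 | 59 | 91.5%
Z.AI GLM 4.7 Flash | 100 | 100 | 93 | 84 | 79 | 91.3%
Mistral Medium 3.1 | 100 | 100 | 96 | 88 | 72 | 91.1%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 55 | 91.1%
Gemini 3 Flash (Preview) | 100 | 100 | 95 | 78 | 75 | 89.4%
GPT-4.1 Mini | 100 | 100 | 100 | 82 | 62 | 88.8%
Gemini 2.5 Flash | 100 | 100 | 85 | 80 | 79 | 88.7%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 84 | 58 | 88.3%
Qwen 3.5 397B A17B | 100 | 100 | 84 | 78 | 76 | 87.5%
Z.AI GLM 4.7 | 100 | 100 | 100 | 87 | 42 | 85.7%
Hermes 3 70B | 100 | 100 | 93 | 76 | 54 | 84.6%
Claude Sonnet 4.6 | 100 | 100 | 84 | 79 | 59 | 84.4%
GPT-5 | 99 | 89 | 80 | 77 | 76 | 84.3%
DeepSeek-V2 Chat | 100 | 100 | 100 | 87 | 34 | 84.1%
GPT-4.1 | 100 | 100 | 100 | 77 | 42 | 83.7%
WizardLM 2 8x22b | 100 | 100 | 87 | 82 | 27 | 79.1%
DeepSeek V3.2 | 100 | 85 | 76 | 68 | 65 | 79.1%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 89 | 49 | 47 | 77.1%
Gemini 2.5 Flash Lite | 100 | 100 | 81 | 68 | 36 | 77.0%
Gemma 3 4B | 100 | 97 | 94 | 62 | 31 | 76.8%
Z.AI GLM 4.6 | 100 | 79 | 77 | 63 | 53 | 74.4%
Gemini 2.5 Pro | 94 | 90 | 68 | 67 | 50 | 73.8%
o4 Mini High | 100 | 75 | 73 | 71 | 45 | 72.7%
DeepSeek V3.1 | 100 | 80 | 60 | 48 | 45 | 66.6%
Arcee AI: Trinity Mini | 79 | 71 | 71 | 62 | 33 | 63.4%
GPT-4.1 Nano | 81 | 74 | 61 | 60 | 36 | 62.6%
ByteDance Seed 1.6 | 71 | 54 | 49 | 35 | 34 | 48.6%
Mistral NeMO | 51 | 50 | 41 | 26 | 16 | 36.8%
Mistral Small 3.2 24B | 100 | 38 | 0 | 0 | 0 | 27.7%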
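Model | #1 | #2 | #3 | #4 | #5 | Avg
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 99 | 95 | 98.8%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 84 | 96.8%
GPT-5.2 | 100 | 100 | 100 | 100 | 62 | 92.3%
Gemini 3.1 Pro (Preview) | 97 | 95 | 89 | 87 | 82 | 89.9%
Mistral Large 3 | 100 | 100 | 100 | 87 | 58 | 89.0%
Claude 3.5 Haiku | 100 | 100 | 87 | 80 | 77 | 88.8%
Ministral 3 3B | 100 | 100 | 100 | 100 | 13 | 82.6%
Mistral Medium 3.1 | 100 | 100 | 100 | 85 | 13 | 79.7%
Mistral Large 2 | 100 | 100 | 76 | 61 | 58 | 79.0%
Llama 3.1 70B | 100 | 100 | 95 | 84 | 10 | 77.9%
Z.AI GLM 4.5 | 100 | 100 | 76 | 57 | 52 | 76.9%
Claude 3 Haiku | 83 | 80 | 77 | 69 | 61 | 74.2%
Ministral 3 14B | 89 | 88 | 73 | 68 | 51 | 74.0%
Claude Sonnet 4 | 100 | 82 | 75 | 75 | 24 | 71.1%
Ministral 8B | 98 | 81 | 62 | 55 | 54 | 69.9%
Claude 3.7 Sonnet | 100 | 100 | 75 | 48 | 7 | 66.0%
Rocinante 12B | 100 | 100 | 80 | 41 | 0 | 64.2%
GPT-5 Nano | 100 | 93 | 85 | 40 | 0 | 63.8%
Claude Haiku 4.5 | 100 | 100 | 47 | 36 | 33 | 63.2%
Mistral Small Creative | 92 | 83 | 69 | 37 | 31 | 62.6%
Grok 4.1 Fast | 100 | 56 | 55 | 53 | 49 | 62.5%
Hermes 3 405B | 100 | 100 | 58 | 45 | 2 | 60.8%
Ministral 3 8B | 100 | 79 | 69 | 37 | 8 | 58.6%
Z.AI GLM 5 | 80 | 63 | 53 | 50 | 40 | 57.3%
Arcee AI: Trinity Large (Preview) | 75 | 71 | 70 | 44 | 25 | 56.9%
Gemini 3 Flash (Preview) | 87 | 81 | 46 | 37 | 23 | 54.7%
Hermes 3 70B | 100 | 76 | 70 | 20 | 0 | 53.1%
DeepSeek V3 (2024-12-26) | 100 | 75 | 74 | 0 | 0 | 49.8%
GPT-4o, Aug. 6th (temp=1) | 89 | 74 | 69 | 4 | 1 | 47.3%
Claude Sonnet 4.5 | 100 | 66 | 66 | 2 | 0 | 46.8%
GPT-4o, Aug. 6th (temp=0) | 96 | 68 | 59 | 9 | 0 | 46.6%
Claude Opus 4.5 | 63 | 55 | 43 | 36 | 34 | 46.2%
Qwen 3.5 Plus (2026-02-15) | 72 | 43 | 41 | 40 | 29 | 44.9%
Claude Opus 4.6 | 71 | 54 | 52 | 22 | 13 | 42.2%
Mistral Large | 70 | 68 | 52 | 20 | 0 | 41.9%
GPT-4o, May 13th (temp=1) | 78 | 64 | 47 | 16 | 0 | 41.0%
Qwen 2.5 72B | 100 | 52 | 37 | 9 | 6 | 40.9%
Writer: Palmyra X5 | 100 | 61 | 22 | 17 | 0 | 40.1%
DeepSeek V3 (2025-03-24) | 60 | 54 | 33 | 27 | 23 | 39.5%
Grok 4 Fast | 58 | 56 | 45 | 20 | 12 | 38.2%
Mistral Small 3.2 24B | 99 | 43 | 40 | 9 | 0 | 38.2%
Claude Opus 4 | 53 | 48 | 41 | 38 | 0 | 35.9%
MoonshotAI: Kimi K2.5 | 54 | 49 | 48 | 24 | 2 | 35.3%
Z.AI GLM 4.6 | 91 | 34 | 18 | 15 | 12 | 34.2%
Gemma 3 12B | 59 | 40 | 39 | 23 | 6 | 33.2%
GPT-4.1 Mini | 68 | 51 | 40 | 0 | 0 | 31.9%
GPT-5.1 | 53 | 33 | 27 | 21 | 8 | 28.5%
Qwen 3.5 397B A17B | 50 | 33 | 19 | 14 | 12 | 25.7%
ByteDance Seed 1.6 Flash | 47 | 46 | 27 | 6 | 0 | 25.3%
Cohere Command R+ (Aug. 2024) | 84 | 23 | 7 | 0 | 0 | 22.7%
Claude 3.5 Sonnet | 34 | 26 | 20 | 20 | 5 | 20.9%
DeepSeek V3.2 | 55 | 37 | 12 | 0 | 0 | 20.8%
ByteDance Seed 1.6 | 73 | 20 | 0 | 0 | 0 | 18.7%
Gemma 3 27B | 30 | 27 | 17 | 15 | 0 | 17.8%
Z.AI GLM 4.7 | 39 | 24 | 15 | 9 | 0 | 17.5%
Grok 4 | 43 | 32 | 10 | 1 | 0 | 17.1%
DeepSeek-V2 Chat | 64 | 14 | 8 | 0 | 0 | 17.1%
GPT-4.1 | 69 | 14 | 0 | 0 | 0 | 16.6%
Mistral NeMO | 48 | 32 | 0 | 0 | 0 | 16.0%
Gemini 3 Pro (Preview) | 38 | 20 | 12 | 0 | 0 | 14.1%
Arcee AI: Trinity Mini | 43 | 12 | 0 | 0 | 0 | 11.1%
Stealth: Aurora Alpha | 22 | 20 | 13 | 0 | 0 | 10.9%
o4 Mini | 53 | 0 | 0 | 0 | 0 | 10.5%
Claude Sonnet 4.6 | 24 | 11 | 7 | 2 | 0 | 8.7%
GPT-4o Mini (temp=1) | 20 | 13 | 10 | 0 | 0 | 8.7%
WizardLM 2 8x22b | 34 | 4 | 0 | 0 | 0 | 7.7%
Minimax M2.5 | 28 | 6 | 3 | 0 | 0 | 7.4%
Gemma 3 4B | 17 | 14 | 0 | 0 | 0 | 6.2%
o4 Mini High | 20 | 7 | 4 | 0 | 0 | 6.2%
Gemini 2.5 Flash | 23 | 7 | 0 | 0 | 0 | 6.0%
GPT-5 Mini | 28 | 0 | 0 | 0 | 0 | 5.6%
DeepSeek V3.1 | 28 | 0 | 0 | 0 | 0 | 5.5%
Z.AI GLM 4.7 Flash | 23 | 2 | 0 | 0 | 0 | 5.1%
GPT-4o, May 13th (temp=0) | 25 | 0 | 0 | 0 | 0 | 5.1%
Gemini 2.5 Flash Lite | 17 | 0 | 0 | 0 | 0 | 3.5%
GPT-4.1 Nano | 17 | 0 | 0 | 0 | 0 | 3.4%
Gemini 2.5 Pro | 4 | 0 | 0 | 0 | 0 | 0.8%
GPT-5 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0.0%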
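Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 98 | 99.7%
Grok 4 Fast | 100 | 100 | 100 | 100 | 98 | 99.6%
Ministral 3 14B | 100 | 100 | 100 | 100 | 97 | 99.5%
Ministral 3 3B | 100 | 100 | 100 | 100 | 93 | 98.7%
GPT-4.1 Mini | 100 | 100 | 100 | 98 | 93 | 98.2%
GPT-5.2 | 100 | 100 | 100 | 100 | 88 | 97.6%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 88 | 97.6%
Claude 3.5 Sonnet | 100 | 100 | 100 | 99 | 83 | 96.4%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 80 | 96.0%
Ministral 3B | 100 | 100 | 100 | 98 | 82 | 96.0%
Mistral Large | 100 | 100 | 100 | 97 | 79 | 95.4%
Hermes 3 70B | 100 | 100 | 97 | 90 | 89 | 95.3%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 76 | 95.1%
Minimax M2.5 | 100 | 100 | 100 | 93 | 79 | 94.3%
GPT-4o, May 13th (temp=1) | 100 | 100 | 99 | 87 | 85 | 94.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 68 | 93.6%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 84 | 81 | 93.0%
Z.AI GLM 5 | 100 | 100 | 100 | 92 | 70 | 92.4%
GPT-5 Nano | 100 | 100 | 100 | 84 | 78 | 92.3%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 62 | 92.3%
Mistral Small Creative | 100 | 100 | 100 | 94 | 67 | 92.1%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 55 | 91.0%
Grok 4 | 100 | 100 | 98 | 82 | 67 | 89.4%
Z.AI GLM 4.6 | 100 | 100 | 100 | 90 | 57 | 89.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 91 | 83 | 71 | 89.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 87 | 58 | 89.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 88 | 78 | 77 | 88.6%
Ministral 3 8B | 100 | 100 | 95 | 90 | 56 | 88.2%
Writer: Palmyra X5 | 100 | 100 | 92 | 85 | 60 | 87.5%
DeepSeek-V2 Chat | 100 | 100 | 100 | 68 | 67 | 86.9%
Gemini 2.5 Flash | 100 | 100 | 83 | 74 | 73 | 86.2%
Stealth: Aurora Alpha | 100 | 100 | 100 | 75 | 53 | 85.6%
Gemma 3 12B | 100 | 100 | 83 | 80 | 66 | 85.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 75 | 52 | 85.4%
GPT-4.1 | 100 | 100 | 100 | 92 | 33 | 85.1%
GPT-5.1 | 100 | 93 | 87 | 84 | 61 | 84.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 76 | 73 | 72 | 84.2%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 89 | 81 | 45 | 82.8%
Ministral 8B | 100 | 100 | 100 | 59 | 50 | 81.8%
Rocinante 12B | 100 | 100 | 100 | 79 | 25 | 80.8%
Qwen 2.5 72B | 100 | 100 | 94 | 60 | 49 | 80.6%
Gemini 3 Flash (Preview) | 96 | 90 | 90 | 71 | 49 | 78.9%
Gemma 3 27B | 87 | 85 | 78 | 75 | 69 | 78.8%
Z.AI GLM 4.7 | 88 | 87 | 86 | 82 | 50 | 78.3%
Gemini 3 Pro (Preview) | 98 | 95 | 90 | 55 | 48 | 77.1%
GPT-4o Mini (temp=1) | 100 | 100 | 81 | 79 | 22 | 76.4%
Gemma 3 4B | 100 | 100 | 90 | 46 | 39 | 75.1%
o4 Mini High | 100 | 92 | 88 | 60 | 25 | 72.8%
GPT-4.1 Nano | 100 | 85 | 78 | 54 | 44 | 72.2%
ByteDance Seed 1.6 Flash | 84 | 83 | 66 | 66 | 54 | 70.7%
o4 Mini | 100 | 94 | 76 | 74 | 8 | 70.1%
DeepSeek V3.2 | 75 | 74 | 71 | 67 | 61 | 69.6%
DeepSeek V3.1 | 84 | 70 | 67 | 63 | 39 | 64.6%
Claude Sonnet 4.6 | 100 | 80 | 79 | 30 | 30 | 63.9%
WizardLM 2 8x22b | 92 | 73 | 69 | 44 | 33 | 62.2%
Qwen 3.5 397B A17B | 92 | 89 | 83 | 32 | 0 | 59.1%
Mistral Small 3.2 24B | 78 | 78 | 70 | 41 | 14 | 56.0%
Gemini 2.5 Pro | 65 | 62 | 50 | 46 | 36 | 52.1%
DeepSeek V3 (2024-12-26) | 100 | 48 | 39 | 32 | 23 | 48.3%
GPT-5 | 77 | 57 | 57 | 44 | 4 | 47.9%
Z.AI GLM 4.7 Flash | 91 | 59 | 51 | 38 | 0 | 47.7%
GPT-5 Mini | 76 | 44 | 42 | 32 | 27 | 46.3%
GPT-4o Mini (temp=0) | 53 | 52 | 40 | 34 | 33 | 42.3%
ByteDance Seed 1.6 | 78 | 71 | 29 | 14 | 13 | 40.8%
Mistral NeMO | 59 | 45 | 43 | 11 | 9 | 33.2%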