Matches word count

Test: N-Length Sentences

Avg. Score: 76.6%
Scenarios: 3

Overall Performance

| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Flash Lite (Preview) | 100.0% | $0.0001 | 1.2s | 100% |
| 2 | Inception Mercury 2 | 99.9% | $0.0007 | 1.2s | 98% |
| 3 | GPT-5.4 Nano (Reasoning) | 100.0% | $0.0010 | 7.5s | 100% |
| 4 | GPT-5.4 Mini (Reasoning, Low) | 100.0% | $0.0025 | 6.3s | 100% |
| 5 | Gemini 3 Flash (Preview) | 99.2% | $0.0004 | 1.9s | 95% |
| 6 | Nemotron 3 Super | 100.0% | $0.0000 | 16.0s | 100% |
| 7 | GPT-5.4 Mini (Reasoning) | 100.0% | $0.0038 | 8.1s | 100% |
| 8 | GPT-5.4 Nano (Reasoning, Low) | 99.5% | $0.0008 | 5.7s | 95% |
| 9 | Inception Mercury | 98.0% | $0.0001 | 1.6s | 90% |
| 10 | GPT-5 Nano | 100.0% | $0.0010 | 28.2s | 100% |
| 11 | Stealth: Aurora Alpha | 98.5% |  | 1.7s | 90% |
| 12 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.0093 | 10.4s | 100% |
| 13 | o4 Mini | 100.0% | $0.0083 | 20.8s | 100% |
| 14 | GPT-5.2 | 100.0% | $0.011 | 15.0s | 100% |
| 15 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.010 | 17.7s | 100% |
| 16 | Qwen 3.5 Flash | 100.0% | $0.0024 | 38.6s | 100% |
| 17 | GPT-5 Mini | 99.3% | $0.0043 | 26.2s | 94% |
| 18 | Z.AI GLM 5 Turbo | 100.0% | $0.011 | 26.6s | 100% |
| 19 | GPT-5.4 (Reasoning) | 99.8% | $0.013 | 19.5s | 98% |
| 20 | o4 Mini High | 100.0% | $0.011 | 27.6s | 100% |
| 21 | Grok 4.20 (Beta, Reasoning) | 99.1% | $0.014 | 8.0s | 92% |
| 22 | MiniMax M2.5 | 98.8% | $0.0031 | 39.2s | 90% |
| 23 | MiniMax M2.7 | 98.9% | $0.0041 | 44.4s | 93% |
| 24 | Gemini 3 Pro (Preview) | 99.1% | $0.018 | 13.0s | 93% |
| 25 | GPT-5.1 | 99.7% | $0.017 | 26.6s | 97% |
| 26 | Claude Opus 4.5 | 93.6% | $0.0052 | 6.8s | 79% |
| 27 | Qwen 3.5 35B | 99.7% | $0.013 | 39.4s | 96% |
| 28 | ByteDance Seed 2.0 Lite | 99.3% | $0.0051 | 58.7s | 96% |
| 29 | MoonshotAI: Kimi K2.5 | 99.8% | $0.0086 | 55.1s | 98% |
| 30 | GPT-5.4 Mini | 88.4% | $0.0006 | 1.1s | 71% |
| 31 | GPT-4.1 | 88.1% | $0.0012 | 2.8s | 73% |
| 32 | GPT-4o Mini (temp=0) | 86.1% | $0.0001 | 9.2s | 69% |
| 33 | Mistral Small 4 (Reasoning) | 90.4% | $0.0015 | 16.7s | 70% |
| 34 | Qwen 3.5 27B | 100.0% | $0.016 | 1.0m | 100% |
| 35 | Aion 2.0 | 96.6% | $0.0034 | 50.7s | 82% |
| 36 | ByteDance Seed 1.6 | 95.8% | $0.0027 | 31.1s | 71% |
| 37 | GPT-5.4 | 86.7% | $0.0019 | 4.4s | 63% |
| 38 | GPT-4o, May 13th (temp=0) | 86.9% | $0.0025 | 4.6s | 61% |
| 39 | Qwen 3.5 122B | 100.0% | $0.026 | 57.1s | 100% |
| 40 | Llama 3.1 70B | 83.6% | $0.0002 | 2.1s | 56% |
| 41 | ByteDance Seed 1.6 Flash | 87.4% | $0.0007 | 13.5s | 57% |
| 42 | GPT-4o Mini (temp=1) | 84.5% | $0.0001 | 6.1s | 56% |
| 43 | Qwen 3.5 9B | 100.0% | $0.0013 | 2.1m | 100% |
| 44 | GPT-5 | 99.9% | $0.031 | 51.9s | 99% |
| 45 | Mistral Medium 3.1 | 81.0% | $0.0003 | 4.3s | 54% |
| 46 | Claude Opus 4 | 87.2% | $0.015 | 13.2s | 66% |
| 47 | Z.AI GLM 4.7 Flash | 91.7% | $0.0018 | 1.3m | 77% |
| 48 | Claude Opus 4.6 | 83.8% | $0.0055 | 7.6s | 53% |
| 49 | GPT-4.1 Nano | 77.8% | $0.0001 | 2.4s | 50% |
| 50 | Z.AI GLM 5 | 98.5% | $0.011 | 1.6m | 89% |
| 51 | Llama 3.1 Nemotron 70B | 81.2% | $0.0001 | 5.7s | 47% |
| 52 | GPT-4o, May 13th (temp=1) | 80.5% | $0.0021 | 4.7s | 49% |
| 53 | GPT-4.1 Mini | 79.6% | $0.0002 | 2.2s | 45% |
| 54 | Grok 4 | 82.8% | $0.0072 | 15.0s | 56% |
| 55 | Claude Opus 4.6 (Reasoning) | 99.8% | $0.052 | 27.5s | 98% |
| 56 | Claude Sonnet 4.5 | 81.9% | $0.0033 | 6.0s | 46% |
| 57 | Claude 3.5 Haiku | 79.9% | $0.0006 | 2.5s | 41% |
| 58 | Llama 3.1 8B | 79.5% | $0.0000 | 910ms | 39% |
| 59 | Nemotron 3 Nano | 99.7% | $0.0023 | 2.5m | 97% |
| 60 | Gemini 3.1 Pro (Preview) | 100.0% | $0.051 | 43.9s | 100% |
| 61 | Claude Sonnet 4 | 79.0% | $0.0029 | 5.2s | 40% |
| 62 | Qwen 2.5 72B | 77.4% | $0.0003 | 16.6s | 44% |
| 63 | Gemini 2.5 Pro | 85.2% | $0.018 | 16.6s | 57% |
| 64 | ByteDance Seed 2.0 Mini | 99.5% | $0.0026 | 2.7m | 98% |
| 65 | Claude 3.7 Sonnet | 77.0% | $0.0032 | 5.1s | 41% |
| 66 | Stealth: Healer Alpha | 79.4% | $0.0000 | 12.1s | 36% |
| 67 | GPT-4o, Aug. 6th (temp=1) | 74.4% | $0.0015 | 2.4s | 35% |
| 68 | Gemma 3 27B | 72.9% | $0.0000 | 5.4s | 31% |
| 69 | Ministral 3 14B | 65.9% | $0.0001 | 2.0s | 37% |
| 70 | Gemini 2.5 Flash Lite (Reasoning) | 70.0% | $0.0005 | 5.6s | 34% |
| 71 | DeepSeek V3 (2025-03-24) | 70.2% | $0.0001 | 6.9s | 28% |
| 72 | Qwen3 235B A22B Instruct 2507 | 66.5% | $0.0001 | 6.9s | 32% |
| 73 | Claude 3.5 Sonnet | 71.3% | $0.0028 | 4.6s | 28% |
| 74 | Claude Haiku 4.5 | 67.1% | $0.0009 | 2.9s | 28% |
| 75 | Claude Sonnet 4.6 | 70.1% | $0.0024 | 4.6s | 25% |
| 76 | Grok 4.1 Fast | 69.6% | $0.0006 | 10.7s | 25% |
| 77 | GPT-4o, Aug. 6th (temp=0) | 67.5% | $0.0013 | 2.2s | 23% |
| 78 | DeepSeek V3 (2024-12-26) | 65.5% | $0.0002 | 6.6s | 26% |
| 79 | Writer: Palmyra X5 | 62.2% | $0.0014 | 7.9s | 32% |
| 80 | Mistral Small Creative | 57.8% | $0.0001 | 1.2s | 32% |
| 81 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.078 | 49.2s | 100% |
| 82 | Z.AI GLM 4.7 | 95.1% | $0.0065 | 2.5m | 71% |
| 83 | Stealth: Hunter Alpha | 67.2% | $0.0000 | 16.6s | 24% |
| 84 | Mistral Small 4 | 58.1% | $0.0001 | 1.6s | 23% |
| 85 | Qwen 3.5 397B A17B | 100.0% | $0.025 | 3.0m | 100% |
| 86 | GPT-5.4 Nano | 58.0% | $0.0002 | 1.4s | 20% |
| 87 | Gemma 3 12B | 62.9% | $0.0000 | 4.1s | 15% |
| 88 | Gemini 2.5 Flash (Reasoning) | 64.0% | $0.0038 | 7.4s | 20% |
| 89 | Mistral Small 3.2 24B | 54.7% | $0.0001 | 2.6s | 22% |
| 90 | Grok 4.20 (Beta) | 58.4% | $0.0006 | 898ms | 17% |
| 91 | Grok 4 Fast | 53.1% | $0.0003 | 4.0s | 22% |
| 92 | Gemma 3 4B | 57.3% | $0.0000 | 1.8s | 15% |
| 93 | Hermes 3 405B | 58.2% | $0.0000 | 11.9s | 18% |
| 94 | Mistral Large 3 | 56.9% | $0.0003 | 4.6s | 16% |
| 95 | Qwen 3.5 Plus (2026-02-15) | 52.4% | $0.0003 | 5.9s | 20% |
| 96 | Gemini 2.5 Flash Lite | 51.9% | $0.0000 | 785ms | 16% |
| 97 | Claude 3 Haiku | 48.5% | $0.0002 | 2.7s | 20% |
| 98 | LFM2 24B | 51.5% | $0.0000 | 3.1s | 15% |
| 99 | DeepSeek V3.2 | 60.8% | $0.0003 | 16.5s | 11% |
| 100 | DeepSeek V3.1 | 53.6% | $0.0001 | 9.1s | 14% |
| 101 | Ministral 3 3B | 41.8% | $0.0000 | 1.0s | 21% |
| 102 | Cohere Command R+ (Aug. 2024) | 44.8% | $0.0008 | 2.0s | 17% |
| 103 | Arcee AI: Trinity Large (Preview) | 42.9% | $0.0000 | 3.6s | 18% |
| 104 | WizardLM 2 8x22b | 43.7% | $0.0002 | 8.4s | 19% |
| 105 | Gemini 2.5 Flash | 44.0% | $0.0003 | 1.3s | 12% |
| 106 | Ministral 3 8B | 41.6% | $0.0000 | 1.5s | 10% |
| 107 | Hermes 3 70B | 42.1% | $0.0001 | 5.8s | 11% |
| 108 | Qwen 3 32B | 38.4% | $0.0003 | 13.8s | 19% |
| 109 | Z.AI GLM 4.5 | 41.9% | $0.0003 | 5.8s | 8% |
| 110 | Z.AI GLM 4.6 | 57.6% | $0.0034 | 55.9s | 15% |
| 111 | Mistral Large 2 | 37.5% | $0.0007 | 2.9s | 7% |
| 112 | Mistral Large | 40.3% | $0.0037 | 5.3s | 8% |
| 113 | Arcee AI: Trinity Mini | 31.3% | $0.0001 | 3.2s | 11% |
| 114 | Ministral 3B | 26.1% | $0.0000 | 768ms | 14% |
| 115 | Ministral 8B | 28.4% | $0.0000 | 904ms | 7% |
| 116 | DeepSeek-V2 Chat | 33.3% | $0.0001 | 10.5s | 3% |
| 117 | Rocinante 12B | 27.0% | $0.0001 | 8.4s | 9% |
| 118 | Mistral NeMO | 18.6% | $0.0000 | 1.9s | 7% |

Average score: 76.55%

Individual Scenarios

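| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 99.9% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 99.6% |
| Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 95 | 99.5% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 94 | 99.4% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.3% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 97 | 99.3% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 97 | 97 | 99.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 98.9% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 89 | 98.9% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 89 | 98.9% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 97 | 96 | 98.9% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 96 | 96 | 98.7% |
| GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 97 | 96 | 98.6% |
| GPT-4o Mini (temp=0) | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 96 | 98.3% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 99 | 98 | 98 | 98 | 97 | 97 | 96 | 98.3% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 99 | 98 | 98 | 97 | 96 | 93 | 98.1% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 96 | 96 | 93 | 98.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 95 | 94 | 93 | 97.9% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 91 | 89 | 97.9% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 95 | 94 | 93 | 97.8% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 95 | 94 | 93 | 97.8% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 78 | 97.8% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 93 | 87 | 97.6% |
| Gemini 2.5 Flash (Reasoning) | 100 | 99 | 99 | 99 | 99 | 97 | 97 | 96 | 95 | 95 | 97.5% |
| GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 96 | 91 | 89 | 97.5% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 90 | 83 | 97.1% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 96 | 91 | 89 | 97.1% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 98 | 97 | 96 | 94 | 94 | 91 | 96.9% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 88 | 81 | 96.3% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 87 | 80 | 96.3% |
| Gemma 3 12B | 100 | 100 | 100 | 98 | 96 | 96 | 96 | 96 | 93 | 87 | 96.0% |
| Z.AI GLM 4.6 | 100 | 98 | 98 | 98 | 97 | 96 | 94 | 93 | 92 | 90 | 95.7% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 97 | 97 | 93 | 92 | 91 | 86 | 95.6% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 99 | 99 | 96 | 95 | 94 | 93 | 91 | 89 | 95.5% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 99 | 96 | 94 | 91 | 89 | 86 | 95.5% |
| Gemma 3 4B | 100 | 100 | 100 | 98 | 98 | 98 | 94 | 93 | 89 | 77 | 94.6% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 93 | 86 | 70 | 94.5% |
| GPT-4o, Aug. 6th (temp=0) | 98 | 96 | 96 | 94 | 93 | 93 | 93 | 93 | 93 | 92 | 94.2% |
| Claude 3.5 Haiku | 100 | 98 | 98 | 96 | 92 | 92 | 91 | 91 | 91 | 91 | 93.9% |
| GPT-4.1 | 100 | 98 | 98 | 98 | 97 | 96 | 96 | 89 | 88 | 78 | 93.7% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 99 | 93 | 87 | 78 | 78 | 93.6% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 98 | 98 | 98 | 89 | 87 | 82 | 81 | 93.2% |
| DeepSeek V3.1 | 100 | 100 | 99 | 98 | 97 | 97 | 90 | 85 | 84 | 82 | 93.2% |
| Claude Haiku 4.5 | 99 | 99 | 99 | 98 | 97 | 92 | 90 | 88 | 86 | 84 | 93.1% |
| Qwen3 235B A22B Instruct 2507 | 99 | 98 | 94 | 93 | 93 | 92 | 91 | 90 | 86 | 80 | 91.6% |
| Mistral Large 3 | 96 | 93 | 93 | 93 | 91 | 91 | 89 | 89 | 89 | 89 | 91.4% |
| Mistral Small 4 | 100 | 96 | 96 | 96 | 93 | 92 | 91 | 87 | 82 | 76 | 90.8% |
| Gemini 2.5 Flash Lite | 100 | 98 | 96 | 95 | 93 | 91 | 89 | 86 | 86 | 72 | 90.7% |
| LFM2 24B | 93 | 92 | 92 | 92 | 91 | 91 | 89 | 89 | 89 | 85 | 90.2% |
| Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 98 | 98 | 83 | 81 | 78 | 63 | 89.9% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 95 | 93 | 0 | 88.9% |
| Stealth: Hunter Alpha | 100 | 100 | 100 | 98 | 94 | 91 | 80 | 77 | 73 | 67 | 88.0% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 96 | 91 | 91 | 89 | 87 | 85 | 72 | 60 | 87.0% |
| Mistral Large | 96 | 95 | 93 | 90 | 89 | 89 | 83 | 82 | 79 | 73 | 86.9% |
| DeepSeek-V2 Chat | 100 | 93 | 91 | 89 | 89 | 82 | 82 | 80 | 78 | 72 | 85.6% |
| Ministral 3 14B | 93 | 91 | 90 | 89 | 88 | 87 | 80 | 80 | 79 | 72 | 84.9% |
| Mistral Small 3.2 24B | 95 | 93 | 91 | 90 | 87 | 85 | 83 | 80 | 77 | 63 | 84.6% |
| Gemini 2.5 Flash | 98 | 98 | 95 | 93 | 93 | 92 | 88 | 84 | 64 | 40 | 84.3% |
| Hermes 3 70B | 100 | 100 | 98 | 92 | 90 | 87 | 87 | 69 | 58 | 56 | 83.7% |
| Grok 4.1 Fast | 100 | 99 | 91 | 80 | 78 | 78 | 78 | 78 | 78 | 78 | 83.7% |
| Hermes 3 405B | 90 | 90 | 88 | 87 | 85 | 84 | 83 | 80 | 76 | 69 | 83.1% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 91 | 82 | 78 | 78 | 78 | 78 | 78 | 78 | 78 | 81.9% |
| GPT-5.4 Nano | 89 | 87 | 85 | 78 | 78 | 78 | 78 | 75 | 72 | 70 | 78.9% |
| Writer: Palmyra X5 | 100 | 98 | 96 | 90 | 83 | 82 | 78 | 76 | 73 | 11 | 78.6% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 78 | 7 | 1 | 78.5% |
| Arcee AI: Trinity Large (Preview) | 93 | 87 | 81 | 81 | 80 | 77 | 72 | 69 | 66 | 56 | 76.2% |
| Grok 4 Fast | 89 | 78 | 78 | 78 | 78 | 78 | 70 | 70 | 67 | 53 | 73.8% |
| Mistral Small Creative | 80 | 80 | 78 | 72 | 72 | 72 | 72 | 72 | 70 | 63 | 73.3% |
| Mistral Large 2 | 91 | 91 | 91 | 89 | 87 | 82 | 70 | 60 | 52 | 16 | 73.0% |
| Cohere Command R+ (Aug. 2024) | 94 | 89 | 82 | 78 | 76 | 76 | 65 | 53 | 51 | 36 | 69.9% |
| Ministral 3 3B | 93 | 84 | 74 | 74 | 64 | 63 | 56 | 56 | 52 | 50 | 66.5% |
| Claude 3 Haiku | 89 | 85 | 81 | 78 | 72 | 72 | 68 | 65 | 47 | 1 | 65.8% |
| WizardLM 2 8x22b | 91 | 89 | 89 | 85 | 81 | 68 | 64 | 50 | 23 | 10 | 65.0% |
| Ministral 8B | 100 | 100 | 78 | 78 | 64 | 56 | 45 | 36 | 32 | 20 | 61.0% |
| Arcee AI: Trinity Mini | 97 | 89 | 76 | 73 | 63 | 53 | 45 | 45 | 27 | 24 | 59.3% |
| Rocinante 12B | 88 | 78 | 72 | 66 | 62 | 59 | 59 | 40 | 34 | 22 | 58.1% |
| Ministral 3 8B | 96 | 93 | 90 | 85 | 55 | 36 | 34 | 9 | 7 | 7 | 51.1% |
| Ministral 3B | 71 | 64 | 64 | 64 | 52 | 52 | 40 | 37 | 20 | 0 | 46.5% |
| Qwen 3 32B | 90 | 84 | 67 | 54 | 34 | 29 | 22 | 13 | 1 | 0 | 39.5% |
| Mistral NeMO | 72 | 47 | 45 | 37 | 35 | 22 | 21 | 17 | 1 | 1 | 29.8% |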
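
| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 99.7% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 99.7% |
| Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 99.6% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 99.5% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 95 | 99.5% |
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 94 | 99.3% |
| Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 96 | 99.2% |
| Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 91 | 99.1% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 97 | 97 | 99.1% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 90 | 99.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 88 | 98.8% |
| MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 88 | 98.8% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 90 | 98.7% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 97 | 97 | 95 | 98.7% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 89 | 89 | 97.8% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 74 | 97.4% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 90 | 87 | 97.2% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 98 | 95 | 95 | 94 | 90 | 97.1% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 95 | 87 | 84 | 96.4% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 97 | 96 | 95 | 95 | 91 | 77 | 95.1% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 98 | 76 | 74 | 94.7% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 97 | 95 | 95 | 95 | 95 | 95 | 95 | 90 | 90 | 94.6% |
| GPT-5.4 | 100 | 100 | 100 | 97 | 95 | 95 | 94 | 90 | 87 | 84 | 94.2% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 96 | 96 | 95 | 91 | 90 | 85 | 84 | 93.9% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 95 | 95 | 95 | 92 | 92 | 92 | 92 | 90 | 90 | 93.3% |
| Claude 3.7 Sonnet | 100 | 96 | 95 | 94 | 94 | 93 | 93 | 93 | 91 | 81 | 93.1% |
| GPT-4o Mini (temp=1) | 97 | 95 | 95 | 95 | 92 | 92 | 92 | 92 | 90 | 83 | 92.4% |
| GPT-4.1 Mini | 97 | 97 | 95 | 92 | 92 | 92 | 91 | 90 | 88 | 84 | 92.0% |
| GPT-4.1 | 100 | 97 | 97 | 95 | 92 | 92 | 92 | 90 | 83 | 79 | 91.8% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 97 | 90 | 88 | 74 | 69 | 91.8% |
| Claude 3.5 Sonnet | 100 | 100 | 97 | 95 | 92 | 92 | 90 | 86 | 83 | 78 | 91.3% |
| Grok 4.1 Fast | 100 | 100 | 97 | 94 | 92 | 91 | 87 | 85 | 84 | 82 | 91.2% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 90 | 74 | 39 | 90.3% |
| Gemma 3 27B | 95 | 92 | 92 | 92 | 92 | 90 | 90 | 87 | 84 | 83 | 89.7% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 96 | 96 | 79 | 77 | 74 | 74 | 89.7% |
| DeepSeek V3.2 | 100 | 100 | 97 | 95 | 95 | 90 | 90 | 90 | 72 | 59 | 88.8% |
| GPT-4o Mini (temp=0) | 92 | 90 | 90 | 90 | 90 | 90 | 90 | 87 | 84 | 84 | 88.6% |
| GPT-4o, May 13th (temp=0) | 97 | 97 | 97 | 97 | 92 | 92 | 87 | 87 | 70 | 63 | 88.2% |
| Hermes 3 405B | 97 | 97 | 92 | 92 | 88 | 87 | 87 | 87 | 84 | 67 | 87.9% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 95 | 42 | 41 | 87.8% |
| Gemma 3 12B | 95 | 92 | 92 | 90 | 90 | 87 | 87 | 85 | 84 | 74 | 87.6% |
| GPT-5.4 Nano | 97 | 95 | 92 | 92 | 90 | 87 | 84 | 84 | 81 | 71 | 87.4% |
| GPT-5.4 Mini | 94 | 94 | 93 | 91 | 89 | 89 | 87 | 86 | 85 | 65 | 87.3% |
| GPT-4o, May 13th (temp=1) | 95 | 92 | 92 | 91 | 90 | 87 | 83 | 80 | 80 | 72 | 86.1% |
| Stealth: Hunter Alpha | 100 | 100 | 99 | 93 | 92 | 90 | 90 | 90 | 49 | 32 | 83.4% |
| Qwen 2.5 72B | 100 | 100 | 92 | 91 | 90 | 88 | 85 | 83 | 65 | 37 | 83.3% |
| Mistral Medium 3.1 | 95 | 92 | 90 | 90 | 90 | 79 | 79 | 77 | 72 | 68 | 83.2% |
| DeepSeek V3 (2025-03-24) | 97 | 97 | 92 | 92 | 84 | 74 | 74 | 74 | 72 | 72 | 83.1% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 95 | 94 | 83 | 77 | 76 | 3 | 82.8% |
| Grok 4 | 100 | 97 | 87 | 87 | 84 | 81 | 77 | 77 | 74 | 56 | 82.3% |
| Gemini 2.5 Flash (Reasoning) | 97 | 89 | 87 | 86 | 85 | 84 | 78 | 75 | 71 | 65 | 81.7% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 95 | 79 | 71 | 39 | 30 | 81.4% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 97 | 90 | 87 | 80 | 75 | 65 | 58 | 50 | 80.2% |
| DeepSeek V3 (2024-12-26) | 92 | 90 | 89 | 87 | 87 | 82 | 77 | 70 | 65 | 59 | 79.9% |
| Claude Haiku 4.5 | 89 | 89 | 86 | 86 | 82 | 77 | 74 | 74 | 71 | 67 | 79.5% |
| Claude Sonnet 4.6 | 100 | 97 | 74 | 74 | 74 | 74 | 74 | 74 | 74 | 74 | 78.9% |
| Mistral Small Creative | 90 | 90 | 90 | 87 | 80 | 79 | 75 | 70 | 65 | 63 | 78.9% |
| Mistral Large 3 | 90 | 90 | 90 | 84 | 84 | 84 | 82 | 66 | 59 | 59 | 78.8% |
| Grok 4 Fast | 97 | 93 | 92 | 90 | 87 | 84 | 72 | 68 | 48 | 39 | 76.8% |
| GPT-4.1 Nano | 84 | 84 | 83 | 83 | 82 | 77 | 75 | 72 | 66 | 55 | 76.1% |
| Ministral 3 14B | 92 | 88 | 85 | 80 | 76 | 75 | 72 | 71 | 69 | 52 | 75.9% |
| Gemma 3 4B | 88 | 88 | 87 | 80 | 77 | 75 | 72 | 65 | 63 | 52 | 74.8% |
| Grok 4.20 (Beta) | 96 | 94 | 79 | 79 | 74 | 70 | 67 | 67 | 62 | 62 | 74.8% |
| Writer: Palmyra X5 | 89 | 87 | 87 | 78 | 76 | 75 | 73 | 73 | 54 | 48 | 73.9% |
| Qwen3 235B A22B Instruct 2507 | 95 | 92 | 90 | 87 | 85 | 79 | 73 | 62 | 43 | 23 | 73.0% |
| Claude 3 Haiku | 97 | 84 | 83 | 82 | 78 | 75 | 74 | 63 | 61 | 31 | 72.9% |
| Ministral 3 8B | 90 | 87 | 85 | 84 | 80 | 72 | 72 | 63 | 50 | 42 | 72.6% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 84 | 84 | 74 | 72 | 72 | 70 | 65 | 52 | 35 | 70.9% |
| Mistral Small 4 | 95 | 80 | 80 | 78 | 76 | 75 | 61 | 59 | 48 | 45 | 69.8% |
| Gemini 2.5 Flash Lite (Reasoning) | 95 | 87 | 87 | 84 | 83 | 75 | 68 | 43 | 40 | 30 | 69.2% |
| DeepSeek V3.1 | 84 | 84 | 83 | 82 | 79 | 75 | 67 | 39 | 37 | 31 | 66.2% |
| Z.AI GLM 4.6 | 100 | 93 | 87 | 80 | 74 | 70 | 55 | 42 | 35 | 18 | 65.2% |
| Mistral Small 3.2 24B | 80 | 79 | 77 | 70 | 64 | 59 | 57 | 52 | 51 | 48 | 63.8% |
| LFM2 24B | 76 | 75 | 75 | 68 | 67 | 58 | 57 | 57 | 53 | 46 | 63.3% |
| Cohere Command R+ (Aug. 2024) | 95 | 84 | 81 | 70 | 70 | 59 | 56 | 56 | 18 | 10 | 60.0% |
| WizardLM 2 8x22b | 77 | 74 | 66 | 66 | 59 | 57 | 54 | 52 | 46 | 20 | 57.1% |
| Gemini 2.5 Flash Lite | 85 | 74 | 73 | 61 | 61 | 60 | 50 | 39 | 37 | 22 | 56.2% |
| Ministral 3 3B | 82 | 78 | 65 | 64 | 58 | 56 | 54 | 53 | 15 | 7 | 52.9% |
| Arcee AI: Trinity Large (Preview) | 71 | 67 | 65 | 59 | 58 | 50 | 45 | 41 | 39 | 32 | 52.5% |
| Gemini 2.5 Flash | 69 | 62 | 62 | 54 | 52 | 50 | 45 | 41 | 27 | 14 | 47.6% |
| Qwen 3 32B | 73 | 66 | 63 | 46 | 45 | 37 | 34 | 27 | 24 | 16 | 43.1% |
| Mistral Large 2 | 81 | 73 | 57 | 46 | 43 | 28 | 19 | 19 | 17 | 14 | 39.6% |
| Hermes 3 70B | 81 | 57 | 42 | 41 | 40 | 36 | 30 | 8 | 7 | 6 | 34.6% |
| Mistral Large | 65 | 51 | 51 | 42 | 38 | 26 | 21 | 21 | 16 | 3 | 33.4% |
| Z.AI GLM 4.5 | 56 | 54 | 48 | 39 | 39 | 39 | 25 | 11 | 8 | 5 | 32.5% |
| Arcee AI: Trinity Mini | 64 | 52 | 51 | 38 | 34 | 31 | 24 | 15 | 7 | 7 | 32.3% |
| Mistral NeMO | 57 | 48 | 45 | 24 | 24 | 15 | 14 | 11 | 10 | 9 | 25.7% |
| Ministral 3B | 39 | 32 | 31 | 30 | 27 | 26 | 26 | 21 | 7 | 6 | 24.5% |
| Rocinante 12B | 56 | 43 | 35 | 31 | 26 | 21 | 14 | 3 | 1 | 0 | 23.1% |
| Ministral 8B | 49 | 36 | 34 | 28 | 28 | 16 | 15 | 10 | 0 | 0 | 21.6% |
| DeepSeek-V2 Chat | 25 | 23 | 18 | 18 | 15 | 13 | 10 | 8 | 7 | 4 | 14.2% |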
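
| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 94 | 99.4% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 93 | 99.3% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 86 | 98.6% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 83 | 98.3% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 80 | 98.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 78 | 97.8% |
| MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 75 | 97.5% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 75 | 97.5% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 95 | 95 | 94 | 90 | 97.5% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 93 | 80 | 97.4% |
| Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 84 | 84 | 84 | 95.2% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 51 | 95.1% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 94 | 93 | 90 | 90 | 89 | 84 | 94.1% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 93 | 92 | 54 | 93.9% |
| ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 87 | 83 | 78 | 75 | 92.3% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 93 | 90 | 74 | 74 | 60 | 89.2% |
| Claude Opus 4.5 | 100 | 96 | 88 | 88 | 84 | 80 | 75 | 75 | 74 | 60 | 82.1% |
| GPT-5.4 Mini | 100 | 92 | 87 | 83 | 78 | 76 | 74 | 74 | 70 | 59 | 79.3% |
| GPT-4.1 | 93 | 92 | 85 | 84 | 84 | 80 | 72 | 72 | 67 | 57 | 78.7% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 37 | 34 | 20 | 78.3% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 88 | 85 | 80 | 80 | 73 | 64 | 43 | 18 | 73.2% |
| GPT-4o Mini (temp=0) | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 68 | 71.3% |
| GPT-5.4 | 100 | 90 | 80 | 70 | 67 | 66 | 59 | 59 | 51 | 40 | 68.4% |
| Grok 4 | 84 | 78 | 76 | 69 | 68 | 68 | 67 | 61 | 54 | 39 | 66.3% |
| Claude Opus 4 | 83 | 75 | 70 | 66 | 66 | 61 | 61 | 61 | 56 | 53 | 65.1% |
| GPT-4o Mini (temp=1) | 100 | 80 | 76 | 72 | 70 | 65 | 59 | 55 | 34 | 23 | 63.2% |
| Gemini 2.5 Pro | 83 | 80 | 80 | 80 | 78 | 59 | 57 | 51 | 38 | 20 | 62.7% |
| Mistral Medium 3.1 | 76 | 72 | 68 | 68 | 67 | 61 | 58 | 57 | 54 | 29 | 60.9% |
| GPT-4.1 Nano | 76 | 76 | 76 | 65 | 64 | 59 | 59 | 57 | 38 | 30 | 60.1% |
| Qwen 2.5 72B | 100 | 61 | 61 | 61 | 61 | 61 | 61 | 61 | 40 | 35 | 60.0% |
| Claude Opus 4.6 | 100 | 93 | 80 | 61 | 55 | 47 | 47 | 37 | 37 | 32 | 58.9% |
| GPT-4o, May 13th (temp=1) | 92 | 87 | 75 | 67 | 64 | 54 | 48 | 39 | 30 | 11 | 56.7% |
| Llama 3.1 70B | 74 | 69 | 60 | 58 | 54 | 53 | 50 | 46 | 43 | 38 | 54.5% |
| Claude Sonnet 4.5 | 92 | 72 | 69 | 65 | 59 | 58 | 49 | 42 | 20 | 2 | 52.9% |
| Claude Sonnet 4 | 91 | 78 | 76 | 72 | 66 | 49 | 32 | 17 | 12 | 3 | 49.7% |
| GPT-4.1 Mini | 87 | 82 | 59 | 59 | 51 | 42 | 38 | 34 | 32 | 4 | 48.8% |
| Claude 3.5 Haiku | 84 | 84 | 83 | 56 | 49 | 47 | 39 | 24 | 16 | 3 | 48.6% |
| Llama 3.1 Nemotron 70B | 66 | 60 | 53 | 52 | 45 | 39 | 39 | 37 | 29 | 29 | 44.9% |
| Llama 3.1 8B | 84 | 74 | 74 | 48 | 45 | 33 | 27 | 24 | 21 | 5 | 43.6% |
| Gemini 2.5 Flash Lite (Reasoning) | 67 | 65 | 55 | 53 | 52 | 49 | 35 | 25 | 13 | 10 | 42.4% |
| Claude 3.7 Sonnet | 59 | 53 | 52 | 47 | 47 | 45 | 45 | 27 | 24 | 2 | 40.1% |
| Ministral 3 14B | 65 | 51 | 49 | 46 | 44 | 42 | 32 | 32 | 4 | 3 | 36.8% |
| Qwen3 235B A22B Instruct 2507 | 75 | 58 | 57 | 39 | 39 | 31 | 24 | 13 | 12 | 0 | 34.8% |
| Writer: Palmyra X5 | 71 | 58 | 51 | 44 | 38 | 37 | 22 | 14 | 8 | 0 | 34.2% |
| Grok 4.1 Fast | 100 | 95 | 85 | 21 | 19 | 15 | 2 | 0 | 0 | 0 | 33.8% |
| GPT-4o, Aug. 6th (temp=1) | 59 | 57 | 52 | 46 | 34 | 27 | 22 | 14 | 12 | 9 | 33.2% |
| Qwen 3 32B | 56 | 55 | 48 | 47 | 38 | 28 | 24 | 20 | 7 | 3 | 32.5% |
| Claude Sonnet 4.6 | 87 | 77 | 34 | 32 | 23 | 23 | 22 | 15 | 1 | 0 | 31.4% |
| DeepSeek V3 (2025-03-24) | 74 | 58 | 51 | 39 | 27 | 24 | 18 | 11 | 8 | 0 | 31.1% |
| Gemma 3 27B | 60 | 56 | 53 | 47 | 44 | 28 | 17 | 2 | 2 | 2 | 30.9% |
| Stealth: Hunter Alpha | 83 | 72 | 66 | 56 | 25 | 0 | 0 | 0 | 0 | 0 | 30.3% |
| Claude Haiku 4.5 | 67 | 66 | 57 | 50 | 36 | 10 | 0 | 0 | 0 | 0 | 28.7% |
| Claude 3.5 Sonnet | 51 | 47 | 45 | 40 | 19 | 16 | 13 | 13 | 6 | 1 | 25.1% |
| DeepSeek V3 (2024-12-26) | 49 | 44 | 35 | 21 | 21 | 19 | 18 | 10 | 2 | 0 | 22.0% |
| Mistral Small Creative | 44 | 40 | 28 | 27 | 22 | 18 | 16 | 13 | 3 | 0 | 21.1% |
| Mistral Small 3.2 24B | 51 | 47 | 27 | 19 | 7 | 5 | 1 | 0 | 0 | 0 | 15.7% |
| GPT-4o, Aug. 6th (temp=0) | 25 | 19 | 17 | 15 | 13 | 13 | 12 | 12 | 12 | 11 | 14.9% |
| Mistral Small 4 | 49 | 25 | 19 | 16 | 13 | 9 | 6 | 0 | 0 | 0 | 13.7% |
| Gemini 2.5 Flash (Reasoning) | 51 | 34 | 12 | 10 | 9 | 6 | 3 | 1 | 0 | 0 | 12.8% |
| Z.AI GLM 4.6 | 59 | 36 | 7 | 6 | 5 | 3 | 2 | 0 | 0 | 0 | 11.9% |
| Grok 4.20 (Beta) | 66 | 15 | 11 | 5 | 3 | 3 | 2 | 0 | 0 | 0 | 10.4% |
| Gemini 2.5 Flash Lite | 47 | 34 | 4 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 8.9% |
| WizardLM 2 8x22b | 41 | 21 | 14 | 5 | 5 | 1 | 1 | 0 | 0 | 0 | 8.9% |
| Grok 4 Fast | 33 | 23 | 15 | 8 | 7 | 1 | 0 | 0 | 0 | 0 | 8.6% |
| Hermes 3 70B | 27 | 16 | 15 | 8 | 7 | 6 | 0 | 0 | 0 | 0 | 7.9% |
| GPT-5.4 Nano | 47 | 11 | 4 | 4 | 4 | 3 | 3 | 1 | 0 | 0 | 7.7% |
| Ministral 3B | 23 | 19 | 15 | 11 | 4 | 0 | 0 | 0 | 0 | 0 | 7.3% |
| Claude 3 Haiku | 27 | 15 | 13 | 10 | 4 | 0 | 0 | 0 | 0 | 0 | 6.9% |
| Ministral 3 3B | 23 | 19 | 10 | 4 | 2 | 2 | 0 | 0 | 0 | 0 | 5.9% |
| Gemma 3 12B | 13 | 13 | 9 | 9 | 2 | 2 | 2 | 0 | 0 | 0 | 5.1% |
| Cohere Command R+ (Aug. 2024) | 25 | 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.5% |
| Qwen 3.5 Plus (2026-02-15) | 23 | 12 | 3 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 4.4% |
| Hermes 3 405B | 20 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.6% |
| Gemma 3 4B | 12 | 10 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2.5% |
| Ministral 8B | 13 | 7 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 2.5% |
| Arcee AI: Trinity Mini | 8 | 8 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.2% |
| DeepSeek V3.1 | 7 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.3% |
| Ministral 3 8B | 9 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0% |
| LFM2 24B | 6 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9% |
| Mistral Large | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.6% |
| Mistral Large 3 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5% |
| Mistral NeMO | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2% |
| DeepSeek V3.2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1% |
| Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Arcee AI: Trinity Large (Preview) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| DeepSeek-V2 Chat | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |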