Matches word count

Test: N-Length Sentences

Avg. Score: 70.7%
Scenarios: 3

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Gemini 3 Flash (Preview) | 98.9% | $0.0004 | 1.9s | 95%
2 | Stealth: Aurora Alpha | 98.2% | - | 1.7s | 90%
3 | GPT-5 Nano | 99.9% | $0.0010 | 28.2s | 99%
4 | GPT-5 Mini | 98.6% | $0.0043 | 26.2s | 93%
5 | GPT-5.2 | 99.7% | $0.011 | 15.0s | 98%
6 | o4 Mini | 97.9% | $0.0083 | 20.8s | 91%
7 | Minimax M2.5 | 98.4% | $0.0031 | 39.2s | 90%
8 | Claude Opus 4.5 | 93.5% | $0.0052 | 6.8s | 79%
9 | o4 Mini High | 98.2% | $0.011 | 27.6s | 92%
10 | GPT-4.1 | 88.0% | $0.0012 | 2.8s | 71%
11 | Gemini 3 Pro (Preview) | 99.1% | $0.018 | 13.0s | 93%
12 | MoonshotAI: Kimi K2.5 | 99.2% | $0.0086 | 55.1s | 95%
13 | GPT-4o Mini (temp=0) | 86.1% | $0.0001 | 9.2s | 69%
14 | GPT-5.1 | 98.9% | $0.017 | 26.6s | 94%
15 | GPT-4o, May 13th (temp=0) | 87.0% | $0.0025 | 4.6s | 60%
16 | Llama 3.1 70B | 84.3% | $0.0002 | 2.1s | 57%
17 | GPT-4o Mini (temp=1) | 84.8% | $0.0001 | 6.1s | 57%
18 | ByteDance Seed 1.6 Flash | 85.1% | $0.0007 | 13.5s | 55%
19 | Mistral Medium 3.1 | 80.5% | $0.0003 | 4.3s | 52%
20 | GPT-4o, May 13th (temp=1) | 82.0% | $0.0022 | 4.7s | 52%
21 | ByteDance Seed 1.6 | 88.0% | $0.0027 | 31.1s | 58%
22 | GPT-4.1 Mini | 80.3% | $0.0002 | 2.2s | 45%
23 | GPT-4.1 Nano | 77.1% | $0.0001 | 2.4s | 48%
24 | Claude Opus 4.6 | 83.8% | $0.0055 | 7.6s | 53%
25 | Z.AI GLM 4.7 Flash | 91.0% | $0.0018 | 1.3m | 74%
26 | Claude Opus 4 | 87.7% | $0.015 | 13.2s | 67%
27 | Llama 3.1 Nemotron 70B | 79.8% | $0.0001 | 5.7s | 43%
28 | Llama 3.1 8B | 79.9% | $0.0000 | 910ms | 40%
29 | Z.AI GLM 5 | 98.5% | $0.011 | 1.6m | 89%
30 | Grok 4 | 82.8% | $0.0072 | 15.0s | 56%
31 | Claude Sonnet 4.5 | 81.0% | $0.0033 | 6.0s | 45%
32 | Claude 3.5 Haiku | 78.9% | $0.0006 | 2.5s | 39%
33 | GPT-5 | 99.6% | $0.031 | 51.9s | 98%
34 | Qwen 2.5 72B | 77.3% | $0.0003 | 16.6s | 44%
35 | Claude 3.7 Sonnet | 77.1% | $0.0032 | 5.1s | 42%
36 | Claude Sonnet 4 | 78.2% | $0.0029 | 5.2s | 39%
37 | Gemini 2.5 Pro | 87.0% | $0.018 | 16.6s | 64%
38 | GPT-4o, Aug. 6th (temp=1) | 74.8% | $0.0015 | 2.4s | 37%
39 | Gemma 3 27B | 73.4% | $0.0000 | 5.4s | 33%
40 | Ministral 3 14B | 66.1% | $0.0001 | 2.0s | 38%
41 | DeepSeek V3 (2025-03-24) | 71.1% | $0.0001 | 6.9s | 30%
42 | Claude 3.5 Sonnet | 72.3% | $0.0028 | 4.6s | 29%
43 | Claude Haiku 4.5 | 67.0% | $0.0009 | 2.9s | 27%
44 | Claude Sonnet 4.6 | 70.5% | $0.0024 | 4.6s | 25%
45 | Grok 4.1 Fast | 69.7% | $0.0006 | 10.7s | 26%
46 | DeepSeek V3 (2024-12-26) | 65.7% | $0.0002 | 6.6s | 26%
47 | GPT-4o, Aug. 6th (temp=0) | 67.5% | $0.0013 | 2.2s | 24%
48 | Mistral Small Creative | 58.0% | $0.0001 | 1.2s | 32%
49 | Writer: Palmyra X5 | 62.8% | $0.0014 | 7.9s | 32%
50 | Gemini 3.1 Pro (Preview) | 100.0% | $0.051 | 43.9s | 100%
51 | Z.AI GLM 4.7 | 95.1% | $0.0065 | 2.5m | 71%
52 | Gemma 3 12B | 63.0% | $0.0000 | 4.1s | 15%
53 | Mistral Small 3.2 24B | 54.7% | $0.0001 | 2.6s | 22%
54 | Grok 4 Fast | 53.2% | $0.0003 | 4.0s | 22%
55 | Gemma 3 4B | 57.5% | $0.0000 | 1.8s | 15%
56 | Hermes 3 405B | 58.3% | $0.0000 | 11.9s | 18%
57 | Mistral Large 3 | 57.0% | $0.0003 | 4.6s | 16%
58 | Qwen 3.5 Plus (2026-02-15) | 52.4% | $0.0003 | 5.9s | 20%
59 | Gemini 2.5 Flash Lite | 52.6% | $0.0000 | 785ms | 16%
60 | Claude 3 Haiku | 48.4% | $0.0002 | 2.7s | 20%
61 | DeepSeek V3.2 | 60.8% | $0.0003 | 16.5s | 11%
62 | DeepSeek V3.1 | 54.0% | $0.0001 | 9.1s | 15%
63 | Qwen 3.5 397B A17B | 100.0% | $0.025 | 3.0m | 100%
64 | Ministral 3 3B | 41.8% | $0.0000 | 1.0s | 21%
65 | Cohere Command R+ (Aug. 2024) | 44.9% | $0.0008 | 2.0s | 17%
66 | Arcee AI: Trinity Large (Preview) | 42.9% | $0.0000 | 3.6s | 18%
67 | WizardLM 2 8x22b | 43.8% | $0.0002 | 8.4s | 19%
68 | Gemini 2.5 Flash | 44.0% | $0.0003 | 1.3s | 12%
69 | Ministral 3 8B | 41.9% | $0.0000 | 1.5s | 10%
70 | Hermes 3 70B | 41.9% | $0.0001 | 5.8s | 11%
71 | Z.AI GLM 4.5 | 42.0% | $0.0003 | 5.8s | 8%
72 | Z.AI GLM 4.6 | 58.2% | $0.0034 | 55.9s | 16%
73 | Mistral Large 2 | 37.4% | $0.0007 | 2.9s | 7%
74 | Arcee AI: Trinity Mini | 31.3% | $0.0001 | 3.2s | 12%
75 | Mistral Large | 40.4% | $0.0037 | 5.3s | 8%
76 | Ministral 3B | 26.5% | $0.0000 | 768ms | 15%
77 | Ministral 8B | 28.2% | $0.0000 | 904ms | 7%
78 | DeepSeek-V2 Chat | 33.3% | $0.0001 | 10.5s | 3%
79 | Rocinante 12B | 27.1% | $0.0001 | 8.4s | 9%
80 | Mistral NeMO | 18.6% | $0.0000 | 1.9s | 7%
Average score: 70.71%

Individual Scenarios

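Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 99.9%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 99.7%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 99.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 94 | 99.4%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 98 | 98 | 98 | 99.3%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 97 | 97 | 99.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 99 | 99 | 92 | 98.9%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 89 | 98.9%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 97 | 96 | 98.9%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 96 | 96 | 98.7%
GPT-4o Mini (temp=0) | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 96 | 98.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 99 | 98 | 98 | 97 | 96 | 93 | 98.1%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 96 | 96 | 93 | 98.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 95 | 94 | 93 | 97.9%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 95 | 94 | 93 | 97.8%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 95 | 94 | 93 | 97.8%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 78 | 97.8%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 91 | 89 | 97.8%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 96 | 93 | 87 | 97.3%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 96 | 91 | 89 | 97.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 88 | 81 | 96.7%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 87 | 80 | 96.3%
Gemma 3 12B | 100 | 100 | 100 | 98 | 96 | 96 | 96 | 96 | 93 | 87 | 96.0%
Z.AI GLM 4.6 | 100 | 98 | 98 | 98 | 97 | 96 | 94 | 93 | 92 | 90 | 95.7%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 97 | 97 | 93 | 92 | 91 | 86 | 95.6%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 99 | 96 | 94 | 91 | 89 | 86 | 95.5%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 99 | 97 | 96 | 95 | 94 | 93 | 91 | 89 | 95.4%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 93 | 86 | 70 | 94.7%
Gemma 3 4B | 100 | 100 | 100 | 98 | 98 | 98 | 94 | 93 | 89 | 77 | 94.6%
GPT-4o, Aug. 6th (temp=0) | 98 | 96 | 96 | 93 | 93 | 93 | 93 | 93 | 92 | 92 | 94.0%
Claude 3.5 Haiku | 100 | 98 | 98 | 96 | 92 | 92 | 91 | 91 | 91 | 91 | 93.9%
DeepSeek V3.1 | 100 | 100 | 99 | 98 | 98 | 97 | 90 | 85 | 84 | 82 | 93.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 98 | 98 | 98 | 89 | 87 | 82 | 81 | 93.2%
Claude Haiku 4.5 | 99 | 99 | 99 | 98 | 97 | 92 | 90 | 88 | 86 | 84 | 93.1%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 99 | 88 | 87 | 78 | 78 | 93.0%
GPT-4.1 | 100 | 98 | 98 | 98 | 97 | 96 | 96 | 89 | 88 | 70 | 92.9%
Mistral Large 3 | 96 | 93 | 93 | 93 | 91 | 91 | 89 | 89 | 89 | 89 | 91.4%
Gemini 2.5 Flash Lite | 100 | 98 | 96 | 95 | 93 | 91 | 89 | 86 | 86 | 72 | 90.7%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 95 | 93 | 0 | 88.9%
Mistral Large | 96 | 95 | 93 | 93 | 89 | 89 | 83 | 82 | 79 | 73 | 87.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 96 | 91 | 91 | 89 | 87 | 85 | 72 | 60 | 87.0%
DeepSeek-V2 Chat | 100 | 93 | 91 | 89 | 89 | 82 | 82 | 80 | 78 | 72 | 85.6%
Mistral Small 3.2 24B | 95 | 93 | 91 | 90 | 87 | 85 | 83 | 80 | 77 | 63 | 84.6%
Gemini 2.5 Flash | 98 | 98 | 95 | 93 | 93 | 92 | 88 | 84 | 64 | 40 | 84.3%
Ministral 3 14B | 91 | 90 | 88 | 88 | 87 | 87 | 80 | 80 | 79 | 72 | 84.3%
Hermes 3 70B | 100 | 100 | 98 | 92 | 90 | 87 | 87 | 69 | 58 | 56 | 83.7%
Grok 4.1 Fast | 100 | 99 | 91 | 80 | 78 | 78 | 78 | 78 | 78 | 78 | 83.7%
Hermes 3 405B | 90 | 90 | 88 | 87 | 85 | 84 | 83 | 80 | 76 | 69 | 83.1%
Qwen 3.5 Plus (2026-02-15) | 100 | 91 | 82 | 78 | 78 | 78 | 78 | 78 | 78 | 78 | 81.9%
Writer: Palmyra X5 | 100 | 98 | 95 | 90 | 83 | 82 | 78 | 76 | 70 | 11 | 78.2%
Arcee AI: Trinity Large (Preview) | 93 | 87 | 81 | 81 | 80 | 77 | 72 | 69 | 66 | 56 | 76.2%
Grok 4 Fast | 89 | 78 | 78 | 78 | 78 | 78 | 70 | 70 | 67 | 53 | 73.8%
Mistral Small Creative | 80 | 80 | 78 | 72 | 72 | 72 | 72 | 72 | 70 | 63 | 73.3%
Mistral Large 2 | 91 | 91 | 91 | 89 | 87 | 82 | 70 | 60 | 52 | 16 | 73.0%
Cohere Command R+ (Aug. 2024) | 94 | 89 | 78 | 78 | 76 | 76 | 65 | 53 | 51 | 36 | 69.6%
Ministral 3 3B | 93 | 84 | 74 | 74 | 64 | 63 | 56 | 56 | 52 | 50 | 66.5%
WizardLM 2 8x22b | 91 | 89 | 89 | 85 | 81 | 68 | 64 | 50 | 23 | 10 | 65.0%
Claude 3 Haiku | 89 | 81 | 78 | 77 | 72 | 72 | 68 | 65 | 47 | 1 | 65.0%
Ministral 8B | 100 | 100 | 78 | 78 | 64 | 56 | 45 | 36 | 32 | 20 | 61.0%
Arcee AI: Trinity Mini | 97 | 89 | 76 | 73 | 63 | 53 | 45 | 45 | 27 | 24 | 59.3%
Rocinante 12B | 88 | 78 | 72 | 66 | 62 | 59 | 59 | 40 | 34 | 22 | 58.1%
Ministral 3 8B | 96 | 93 | 90 | 85 | 55 | 36 | 34 | 9 | 7 | 7 | 51.1%
Ministral 3B | 71 | 64 | 64 | 64 | 52 | 52 | 40 | 37 | 20 | 0 | 46.5%
Mistral NeMO | 72 | 47 | 45 | 37 | 35 | 22 | 21 | 17 | 1 | 1 | 29.8%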
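
Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 99.7%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 99.7%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 98 | 99.6%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 99.5%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 96 | 96 | 99.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 88 | 98.8%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 97 | 97 | 97 | 98.8%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 97 | 97 | 97 | 95 | 95 | 98.2%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 95 | 88 | 97.7%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 74 | 97.4%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 97 | 97 | 95 | 95 | 95 | 92 | 97.2%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 98 | 97 | 96 | 90 | 87 | 96.8%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 97 | 97 | 97 | 95 | 87 | 87 | 96.1%
Llama 3.1 8B | 100 | 100 | 100 | 98 | 97 | 96 | 95 | 93 | 91 | 77 | 94.7%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 98 | 76 | 74 | 94.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 97 | 95 | 95 | 95 | 92 | 92 | 90 | 90 | 90 | 93.6%
Gemini 2.5 Pro | 100 | 97 | 96 | 96 | 95 | 95 | 93 | 88 | 88 | 84 | 93.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 95 | 95 | 95 | 92 | 92 | 92 | 92 | 90 | 90 | 93.3%
GPT-4.1 Mini | 97 | 97 | 96 | 95 | 92 | 92 | 92 | 90 | 88 | 84 | 92.4%
GPT-4o Mini (temp=1) | 97 | 95 | 95 | 95 | 92 | 92 | 92 | 92 | 90 | 83 | 92.4%
Claude 3.7 Sonnet | 100 | 94 | 94 | 93 | 93 | 93 | 93 | 91 | 91 | 81 | 92.3%
Grok 4.1 Fast | 100 | 100 | 97 | 95 | 94 | 91 | 90 | 85 | 84 | 84 | 92.0%
GPT-4.1 | 100 | 95 | 95 | 95 | 92 | 92 | 92 | 90 | 83 | 79 | 91.3%
Claude 3.5 Sonnet | 100 | 100 | 97 | 95 | 92 | 90 | 90 | 86 | 83 | 78 | 91.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 97 | 95 | 90 | 88 | 74 | 64 | 90.8%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 90 | 74 | 39 | 90.3%
Gemma 3 27B | 95 | 92 | 92 | 92 | 92 | 92 | 90 | 87 | 87 | 80 | 90.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 96 | 96 | 79 | 77 | 74 | 74 | 89.7%
DeepSeek V3.2 | 100 | 100 | 97 | 97 | 92 | 92 | 90 | 90 | 75 | 59 | 89.3%
GPT-4o Mini (temp=0) | 92 | 90 | 90 | 90 | 90 | 90 | 90 | 87 | 84 | 84 | 88.6%
GPT-4o, May 13th (temp=0) | 97 | 97 | 97 | 97 | 92 | 92 | 87 | 87 | 70 | 63 | 88.2%
Hermes 3 405B | 97 | 97 | 92 | 92 | 90 | 88 | 87 | 87 | 84 | 67 | 88.2%
Gemma 3 12B | 95 | 92 | 92 | 90 | 90 | 87 | 87 | 85 | 84 | 74 | 87.6%
GPT-4o, May 13th (temp=1) | 97 | 94 | 92 | 92 | 90 | 87 | 83 | 80 | 80 | 65 | 86.1%
DeepSeek V3 (2025-03-24) | 97 | 97 | 92 | 92 | 84 | 79 | 77 | 75 | 75 | 74 | 84.3%
Qwen 2.5 72B | 100 | 96 | 94 | 92 | 91 | 88 | 85 | 83 | 65 | 37 | 83.1%
Grok 4 | 100 | 95 | 84 | 84 | 84 | 84 | 82 | 77 | 74 | 56 | 82.2%
Mistral Medium 3.1 | 95 | 92 | 90 | 85 | 79 | 79 | 78 | 77 | 72 | 68 | 81.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 95 | 95 | 94 | 90 | 83 | 77 | 76 | 3 | 81.2%
Claude Haiku 4.5 | 89 | 89 | 86 | 86 | 82 | 77 | 74 | 74 | 74 | 67 | 79.8%
DeepSeek V3 (2024-12-26) | 92 | 90 | 89 | 87 | 87 | 82 | 77 | 68 | 65 | 59 | 79.6%
ByteDance Seed 1.6 | 100 | 100 | 95 | 94 | 86 | 83 | 79 | 76 | 42 | 41 | 79.5%
Mistral Large 3 | 90 | 90 | 90 | 87 | 84 | 84 | 82 | 66 | 59 | 59 | 79.1%
Claude Sonnet 4.6 | 100 | 97 | 74 | 74 | 74 | 74 | 74 | 74 | 74 | 74 | 78.9%
Mistral Small Creative | 90 | 90 | 90 | 87 | 80 | 79 | 75 | 70 | 65 | 63 | 78.9%
Grok 4 Fast | 93 | 92 | 90 | 90 | 90 | 90 | 72 | 68 | 48 | 35 | 76.7%
Ministral 3 14B | 92 | 88 | 85 | 80 | 76 | 75 | 73 | 72 | 69 | 52 | 76.2%
Gemma 3 4B | 88 | 88 | 87 | 80 | 77 | 75 | 72 | 65 | 63 | 52 | 74.8%
Writer: Palmyra X5 | 90 | 89 | 87 | 79 | 78 | 77 | 73 | 73 | 54 | 48 | 74.8%
GPT-4.1 Nano | 83 | 83 | 82 | 80 | 79 | 77 | 72 | 68 | 66 | 55 | 74.5%
Claude 3 Haiku | 97 | 84 | 84 | 83 | 83 | 75 | 74 | 63 | 61 | 28 | 73.3%
Ministral 3 8B | 92 | 87 | 87 | 85 | 78 | 75 | 72 | 63 | 50 | 42 | 73.2%
Qwen 3.5 Plus (2026-02-15) | 100 | 84 | 84 | 74 | 72 | 72 | 70 | 65 | 52 | 35 | 70.9%
DeepSeek V3.1 | 85 | 84 | 82 | 81 | 79 | 75 | 73 | 46 | 37 | 31 | 67.3%
Z.AI GLM 4.6 | 100 | 96 | 90 | 80 | 77 | 70 | 55 | 42 | 35 | 18 | 66.1%
Mistral Small 3.2 24B | 80 | 79 | 77 | 70 | 64 | 59 | 57 | 52 | 51 | 48 | 63.8%
Cohere Command R+ (Aug. 2024) | 95 | 84 | 81 | 70 | 70 | 63 | 56 | 56 | 18 | 10 | 60.3%
Gemini 2.5 Flash Lite | 88 | 78 | 76 | 61 | 61 | 60 | 50 | 39 | 37 | 22 | 57.2%
WizardLM 2 8x22b | 77 | 74 | 66 | 66 | 59 | 57 | 54 | 52 | 46 | 19 | 57.0%
Ministral 3 3B | 82 | 78 | 65 | 64 | 58 | 56 | 54 | 53 | 15 | 7 | 52.9%
Arcee AI: Trinity Large (Preview) | 71 | 67 | 65 | 59 | 58 | 50 | 45 | 41 | 39 | 32 | 52.5%
Gemini 2.5 Flash | 69 | 62 | 62 | 54 | 52 | 50 | 45 | 41 | 27 | 14 | 47.6%
Mistral Large 2 | 81 | 70 | 57 | 46 | 43 | 28 | 19 | 19 | 17 | 14 | 39.3%
Hermes 3 70B | 81 | 57 | 42 | 41 | 40 | 33 | 24 | 8 | 7 | 6 | 33.8%
Mistral Large | 65 | 51 | 51 | 42 | 38 | 26 | 21 | 21 | 16 | 4 | 33.5%
Z.AI GLM 4.5 | 56 | 54 | 48 | 39 | 39 | 39 | 25 | 11 | 10 | 5 | 32.7%
Arcee AI: Trinity Mini | 64 | 52 | 48 | 38 | 34 | 31 | 29 | 15 | 7 | 7 | 32.5%
Mistral NeMO | 57 | 48 | 45 | 24 | 24 | 15 | 14 | 11 | 10 | 9 | 25.7%
Ministral 3B | 39 | 32 | 31 | 30 | 30 | 27 | 26 | 21 | 7 | 6 | 24.8%
Rocinante 12B | 56 | 43 | 35 | 32 | 26 | 21 | 14 | 3 | 1 | 0 | 23.1%
Ministral 8B | 43 | 36 | 34 | 28 | 28 | 16 | 15 | 10 | 0 | 0 | 21.1%
DeepSeek-V2 Chat | 25 | 23 | 18 | 18 | 15 | 13 | 10 | 8 | 7 | 4 | 14.2%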
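
Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 95 | 99.5%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 96 | 99.2%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 94 | 87 | 98.1%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 80 | 98.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 90 | 88 | 97.8%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 75 | 97.5%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 93 | 80 | 97.4%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 95 | 95 | 95 | 94 | 90 | 97.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 75 | 96.7%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 95 | 93 | 90 | 83 | 96.1%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 95 | 92 | 87 | 83 | 95.7%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 92 | 84 | 83 | 95.1%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 51 | 95.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 90 | 75 | 74 | 74 | 59 | 87.2%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 94 | 93 | 90 | 84 | 74 | 74 | 60 | 87.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 80 | 80 | 74 | 14 | 84.8%
Claude Opus 4.5 | 96 | 92 | 88 | 88 | 84 | 80 | 80 | 75 | 74 | 60 | 81.8%
GPT-4.1 | 93 | 92 | 92 | 89 | 87 | 85 | 84 | 60 | 59 | 57 | 79.8%
GPT-4o, May 13th (temp=0) | 100 | 100 | 90 | 88 | 84 | 80 | 69 | 64 | 44 | 18 | 73.6%
GPT-4o Mini (temp=0) | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 68 | 71.3%
Gemini 2.5 Pro | 89 | 87 | 80 | 74 | 74 | 72 | 69 | 64 | 40 | 40 | 68.9%
Grok 4 | 80 | 78 | 76 | 72 | 69 | 68 | 64 | 64 | 55 | 39 | 66.4%
Claude Opus 4 | 83 | 75 | 70 | 70 | 66 | 64 | 61 | 61 | 61 | 53 | 66.4%
GPT-4o Mini (temp=1) | 100 | 80 | 80 | 72 | 70 | 65 | 59 | 55 | 34 | 29 | 64.3%
GPT-4o, May 13th (temp=1) | 92 | 87 | 80 | 74 | 72 | 57 | 51 | 48 | 39 | 13 | 61.2%
Mistral Medium 3.1 | 76 | 68 | 67 | 65 | 64 | 64 | 63 | 58 | 53 | 34 | 61.0%
Qwen 2.5 72B | 100 | 61 | 61 | 61 | 61 | 61 | 61 | 61 | 40 | 35 | 60.0%
GPT-4.1 Nano | 76 | 76 | 76 | 75 | 64 | 59 | 59 | 57 | 30 | 26 | 59.9%
Claude Opus 4.6 | 100 | 93 | 80 | 61 | 55 | 47 | 47 | 37 | 37 | 32 | 58.9%
Llama 3.1 70B | 74 | 72 | 65 | 61 | 60 | 59 | 51 | 46 | 43 | 38 | 57.0%
Claude Sonnet 4.5 | 84 | 69 | 69 | 62 | 60 | 58 | 44 | 42 | 17 | 2 | 50.8%
GPT-4.1 Mini | 87 | 82 | 67 | 59 | 59 | 42 | 38 | 34 | 32 | 4 | 50.4%
Claude Sonnet 4 | 83 | 77 | 72 | 72 | 66 | 60 | 23 | 15 | 12 | 3 | 48.3%
Claude 3.5 Haiku | 84 | 83 | 76 | 54 | 49 | 47 | 27 | 16 | 15 | 3 | 45.4%
Llama 3.1 8B | 87 | 84 | 74 | 48 | 45 | 33 | 30 | 25 | 21 | 5 | 45.2%
Claude 3.7 Sonnet | 59 | 53 | 52 | 52 | 50 | 47 | 45 | 27 | 24 | 4 | 41.2%
Llama 3.1 Nemotron 70B | 66 | 55 | 52 | 45 | 43 | 37 | 30 | 29 | 27 | 27 | 41.1%
Ministral 3 14B | 67 | 51 | 49 | 49 | 44 | 42 | 35 | 32 | 5 | 3 | 37.7%
Writer: Palmyra X5 | 67 | 62 | 53 | 45 | 42 | 37 | 22 | 18 | 8 | 0 | 35.4%
GPT-4o, Aug. 6th (temp=1) | 66 | 52 | 50 | 41 | 36 | 34 | 29 | 22 | 14 | 11 | 35.4%
Grok 4.1 Fast | 100 | 90 | 85 | 22 | 19 | 15 | 3 | 1 | 0 | 0 | 33.5%
Claude Sonnet 4.6 | 87 | 77 | 37 | 34 | 28 | 23 | 22 | 16 | 1 | 0 | 32.5%
DeepSeek V3 (2025-03-24) | 74 | 58 | 51 | 39 | 35 | 28 | 18 | 11 | 8 | 0 | 32.3%
Gemma 3 27B | 60 | 56 | 53 | 51 | 48 | 28 | 21 | 2 | 2 | 2 | 32.3%
Claude 3.5 Sonnet | 62 | 54 | 53 | 44 | 19 | 18 | 16 | 13 | 6 | 1 | 28.4%
Claude Haiku 4.5 | 71 | 66 | 52 | 50 | 33 | 10 | 0 | 0 | 0 | 0 | 28.3%
DeepSeek V3 (2024-12-26) | 53 | 49 | 32 | 21 | 21 | 20 | 18 | 10 | 4 | 0 | 22.8%
Mistral Small Creative | 44 | 40 | 30 | 28 | 22 | 21 | 16 | 13 | 3 | 0 | 21.7%
Mistral Small 3.2 24B | 51 | 47 | 27 | 19 | 7 | 5 | 1 | 0 | 0 | 0 | 15.7%
GPT-4o, Aug. 6th (temp=0) | 25 | 21 | 17 | 15 | 13 | 13 | 12 | 12 | 12 | 11 | 15.1%
Z.AI GLM 4.6 | 59 | 36 | 13 | 6 | 5 | 5 | 2 | 1 | 0 | 0 | 12.7%
Gemini 2.5 Flash Lite | 57 | 34 | 4 | 4 | 2 | 0 | 0 | 0 | 0 | 0 | 10.1%
WizardLM 2 8x22b | 41 | 21 | 19 | 5 | 5 | 1 | 1 | 0 | 0 | 0 | 9.4%
Grok 4 Fast | 36 | 23 | 16 | 8 | 8 | 1 | 0 | 0 | 0 | 0 | 9.2%
Ministral 3B | 32 | 19 | 15 | 11 | 4 | 0 | 0 | 0 | 0 | 0 | 8.2%
Hermes 3 70B | 28 | 16 | 15 | 8 | 7 | 7 | 0 | 0 | 0 | 0 | 8.1%
Claude 3 Haiku | 27 | 15 | 13 | 10 | 4 | 0 | 0 | 0 | 0 | 0 | 6.9%
Ministral 3 3B | 23 | 19 | 10 | 4 | 2 | 2 | 0 | 0 | 0 | 0 | 5.9%
Gemma 3 12B | 14 | 13 | 10 | 9 | 3 | 2 | 2 | 1 | 0 | 0 | 5.5%
Cohere Command R+ (Aug. 2024) | 25 | 19 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.7%
Qwen 3.5 Plus (2026-02-15) | 23 | 12 | 3 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 4.4%
Hermes 3 405B | 20 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.6%
Gemma 3 4B | 16 | 11 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3.0%
Ministral 8B | 13 | 7 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 2.5%
Arcee AI: Trinity Mini | 8 | 8 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.2%
Ministral 3 8B | 10 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.4%
DeepSeek V3.1 | 7 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.3%
Mistral Large 3 | 2 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.6%
Mistral Large | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.6%
Mistral NeMO | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2%
DeepSeek V3.2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Arcee AI: Trinity Large (Preview) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
DeepSeek-V2 Chat | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%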
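
How the averages relate: the page does not state how the Avg and Score columns are computed, but the numbers appear consistent with plain arithmetic means. Each scenario Avg looks like the mean of the ten run scores in that row, and each model's overall Score looks like the mean of its three scenario averages. The sketch below checks that reading on two rows from the tables above. The helper name `average_score` is hypothetical, and the mean-based formula is an assumption rather than the site's documented method. Some published averages differ from this calculation by a few tenths, presumably because the displayed run scores are already rounded.

```python
from statistics import mean


def average_score(scores):
    """Mean of a list of percentage scores, rounded to one decimal place.

    Assumption: the leaderboard uses a simple arithmetic mean.
    """
    return round(mean(scores), 1)


# Ten run scores for Z.AI GLM 4.7 in the first scenario table (listed Avg: 99.9%).
glm_47_runs = [100] * 9 + [99]
scenario_avg = average_score(glm_47_runs)

# Three scenario averages for Gemini 3 Flash (Preview) (listed overall Score: 98.9%).
gemini_3_flash_scenarios = [100.0, 99.8, 97.0]
overall_score = average_score(gemini_3_flash_scenarios)

print(f"Scenario average: {scenario_avg}%")
print(f"Overall score: {overall_score}%")
```

Both values match the listed figures for these two rows, which supports the mean-based reading without confirming it for every model.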