Count dialogue tags

Test: Dialogue tags

Avg. Score: 82.8%
Scenarios: 1

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Mistral Small 4 | 100.0% | $0.0002 | 2.9s | 100%
2 | GPT-4o Mini (temp=0) | 100.0% | $0.0002 | 3.4s | 100%
3 | Mistral Small 3.2 24B | 100.0% | $0.0001 | 4.1s | 100%
4 | Gemini 3.1 Flash Lite (Preview) | 100.0% | $0.0004 | 2.0s | 100%
5 | Grok 4 Fast | 100.0% | $0.0003 | 4.2s | 100%
6 | Claude 3 Haiku | 100.0% | $0.0004 | 3.4s | 100%
7 | Gemma 3 12B | 100.0% | $0.0000 | 7.4s | 100%
8 | Ministral 3 14B | 100.0% | $0.0001 | 8.2s | 100%
9 | Qwen3 235B A22B Instruct 2507 | 100.0% | $0.0001 | 8.3s | 100%
10 | Gemini 3 Flash (Preview) | 100.0% | $0.0009 | 3.2s | 100%
11 | Llama 3.1 Nemotron 70B | 100.0% | $0.0001 | 9.6s | 100%
12 | Mistral Medium 3.1 | 100.0% | $0.0006 | 6.1s | 100%
13 | Mistral Large 3 | 100.0% | $0.0005 | 7.2s | 100%
14 | Grok 4.1 Fast | 100.0% | $0.0003 | 10.2s | 100%
15 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0004 | 10.1s | 100%
16 | Gemma 3 27B | 100.0% | $0.0001 | 12.5s | 100%
17 | GPT-4o Mini (temp=1) | 100.0% | $0.0002 | 15.5s | 100%
18 | DeepSeek V3 (2025-03-24) | 100.0% | $0.0003 | 14.6s | 100%
19 | DeepSeek-V2 Chat | 100.0% | $0.0001 | 16.1s | 100%
20 | DeepSeek V3.1 | 100.0% | $0.0002 | 15.4s | 100%
21 | Stealth: Hunter Alpha | 100.0% | $0.0000 | 16.9s | 100%
22 | Mistral Small 4 (Reasoning) | 100.0% | $0.0010 | 11.0s | 100%
23 | Hermes 3 405B | 100.0% | $0.0000 | 17.7s | 100%
24 | GPT-4.1 | 100.0% | $0.0023 | 4.7s | 100%
25 | Mistral Large 2 | 100.0% | $0.0019 | 8.1s | 100%
26 | Writer: Palmyra X5 | 100.0% | $0.0021 | 8.8s | 100%
27 | Aion 2.0 | 100.0% | $0.0011 | 16.7s | 100%
28 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.0030 | 3.8s | 100%
29 | GPT-4o, Aug. 6th (temp=1) | 100.0% | $0.0030 | 3.7s | 100%
30 | Z.AI GLM 4.7 Flash | 100.0% | $0.0006 | 24.6s | 100%
31 | Z.AI GLM 5 Turbo | 100.0% | $0.0033 | 9.9s | 100%
32 | Claude Sonnet 4 | 100.0% | $0.0044 | 7.2s | 100%
33 | GPT-4o, May 13th (temp=0) | 100.0% | $0.0048 | 5.2s | 100%
34 | Claude Sonnet 4.5 | 100.0% | $0.0045 | 7.3s | 100%
35 | GPT-4o, May 13th (temp=1) | 100.0% | $0.0049 | 5.5s | 100%
36 | Claude 3.5 Sonnet | 100.0% | $0.0049 | 6.2s | 100%
37 | Claude 3.7 Sonnet | 100.0% | $0.0050 | 7.1s | 100%
38 | GPT-5.4 | 100.0% | $0.0048 | 9.6s | 100%
39 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.0048 | 10.1s | 100%
40 | GPT-5.1 | 100.0% | $0.0049 | 12.4s | 100%
41 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.0060 | 10.3s | 100%
42 | Gemini 2.5 Flash Lite | 96.1% | $0.0001 | 1.4s | 76%
43 | Llama 3.1 70B | 96.1% | $0.0003 | 3.2s | 76%
44 | Z.AI GLM 4.5 | 96.1% | $0.0005 | 5.8s | 76%
45 | Claude Opus 4.5 | 100.0% | $0.0077 | 8.7s | 100%
46 | Claude Opus 4.6 | 100.0% | $0.0079 | 9.5s | 100%
47 | Grok 4.20 (Beta) | 96.1% | $0.0016 | 1.9s | 76%
48 | Claude 3.5 Haiku | 96.1% | $0.0012 | 5.0s | 76%
49 | GPT-5.4 (Reasoning) | 100.0% | $0.0073 | 15.2s | 100%
50 | Z.AI GLM 4.7 | 100.0% | $0.0027 | 51.3s | 100%
51 | Mistral Large | 100.0% | $0.0091 | 8.4s | 100%
52 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.0091 | 10.3s | 100%
53 | Grok 4 | 100.0% | $0.0080 | 18.1s | 100%
54 | Z.AI GLM 4.6 | 100.0% | $0.0033 | 51.6s | 100%
55 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0053 | 40.7s | 100%
56 | Z.AI GLM 5 | 100.0% | $0.0049 | 50.1s | 100%
57 | Claude Haiku 4.5 | 92.1% | $0.0016 | 4.0s | 69%
58 | Mistral Small Creative | 88.2% | $0.0001 | 2.2s | 64%
59 | Claude Sonnet 4.6 (Reasoning) | 96.1% | $0.0054 | 8.4s | 76%
60 | Gemini 2.5 Flash Lite (Reasoning) | 92.1% | $0.0006 | 20.1s | 69%
61 | Claude Sonnet 4.6 | 92.1% | $0.0046 | 7.7s | 69%
62 | GPT-5.2 | 92.1% | $0.0046 | 9.8s | 69%
63 | GPT-5 Nano | 96.1% | $0.0023 | 58.8s | 76%
64 | Gemini 3 Pro (Preview) | 100.0% | $0.017 | 13.7s | 100%
65 | GPT-4.1 Mini | 87.4% | $0.0004 | 3.3s | 45%
66 | DeepSeek V3.2 | 90.0% | $0.0002 | 13.7s | 40%
67 | Inception Mercury 2 | 79.5% | $0.0009 | 1.9s | 44%
68 | Cohere Command R+ (Aug. 2024) | 87.4% | $0.0030 | 7.0s | 45%
69 | Gemini 2.5 Flash (Reasoning) | 87.4% | $0.0035 | 7.1s | 45%
70 | GPT-5 Mini | 90.0% | $0.0022 | 14.1s | 40%
71 | Gemini 2.5 Pro | 96.1% | $0.015 | 14.6s | 76%
72 | ByteDance Seed 1.6 | 90.0% | $0.0023 | 27.7s | 40%
73 | GPT-5.4 Mini (Reasoning) | 78.2% | $0.0020 | 3.7s | 37%
74 | Claude Opus 4 | 100.0% | $0.023 | 15.8s | 100%
75 | Stealth: Healer Alpha | 74.8% | $0.0000 | 6.3s | 32%
76 | o4 Mini High | 96.1% | $0.014 | 34.7s | 76%
77 | Gemini 2.5 Flash | 74.4% | $0.0009 | 2.5s | 31%
78 | Qwen 3.5 27B | 100.0% | $0.016 | 1.3m | 100%
79 | Gemini 3.1 Pro (Preview) | 100.0% | $0.024 | 23.5s | 100%
80 | Qwen 3.5 Plus (2026-02-15) | 81.5% | $0.0008 | 13.3s | 26%
81 | GPT-4.1 Nano | 69.7% | $0.0001 | 2.9s | 23%
82 | Qwen 3.5 122B | 100.0% | $0.022 | 56.5s | 100%
83 | GPT-5.4 Mini | 67.0% | $0.0014 | 2.2s | 22%
84 | Hermes 3 70B | 64.9% | $0.0001 | 12.7s | 17%
85 | Qwen 3 32B | 67.0% | $0.0004 | 23.9s | 22%
86 | GPT-5 | 100.0% | $0.027 | 51.7s | 100%
87 | MiniMax M2.5 | 86.2% | $0.0044 | 1.2m | 39%
88 | Mistral NeMO | 47.9% | $0.0000 | 4.0s | 24%
89 | Arcee AI: Trinity Mini | 51.9% | $0.0001 | 3.9s | 19%
90 | Qwen 2.5 72B | 57.5% | $0.0001 | 7.5s | 14%
91 | ByteDance Seed 1.6 Flash | 52.3% | $0.0003 | 6.9s | 15%
92 | o4 Mini | 72.1% | $0.0063 | 15.9s | 22%
93 | LFM2 24B | 48.9% | $0.0000 | 7.1s | 9%
94 | Grok 4.20 (Beta, Reasoning) | 90.0% | $0.019 | 14.1s | 40%
95 | Ministral 3 8B | 37.0% | $0.0001 | 2.0s | 12%
96 | Arcee AI: Trinity Large (Preview) | 47.6% | $0.0000 | 6.8s | 3%
97 | Nemotron 3 Nano | 66.1% | $0.0009 | 50.4s | 10%
98 | Rocinante 12B | 48.9% | $0.0002 | 14.5s | 4%
99 | Inception Mercury | 42.5% | $0.0002 | 2.2s | 4%
100 | Ministral 3B | 40.2% | $0.0000 | 1.8s | 2%
101 | Qwen 3.5 Flash | 70.0% | $0.0030 | 51.7s | 8%
102 | Gemma 3 4B | 33.8% | $0.0000 | 3.4s | 5%
103 | Ministral 3 3B | 29.8% | $0.0000 | 1.5s | 7%
104 | Llama 3.1 8B | 33.8% | $0.0001 | 1.3s | 1%
105 | Nemotron 3 Super | 56.1% | $0.0000 | 56.2s | 5%
106 | GPT-5.4 Mini (Reasoning, Low) | 31.1% | $0.0016 | 3.2s | 7%
107 | MiniMax M2.7 | 86.1% | $0.0081 | 2.1m | 38%
108 | ByteDance Seed 2.0 Mini | 60.0% | $0.0011 | 1.0m | 2%
109 | WizardLM 2 8x22b | 25.6% | $0.0005 | 12.6s | 6%
110 | Qwen 3.5 9B | 71.4% | $0.0012 | 1.8m | 12%
111 | Ministral 8B | 19.9% | $0.0000 | 2.1s | 1%
112 | Stealth: Aurora Alpha | 49.7% | | 3.3s | 11%
113 | GPT-5.4 Nano | 12.9% | $0.0004 | 2.1s | 0%
114 | GPT-5.4 Nano (Reasoning, Low) | 9.1% | $0.0004 | 2.0s | 1%
115 | GPT-5.4 Nano (Reasoning) | 8.9% | $0.0004 | 2.3s | 0%
116 | Qwen 3.5 397B A17B | 100.0% | $0.026 | 3.1m | 100%
117 | Qwen 3.5 35B | 70.0% | $0.016 | 53.3s | 8%
118 | ByteDance Seed 2.0 Lite | 0.0% | $0.0028 | 33.4s | 0%
Average score: 82.78%

Individual Scenarios

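Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 96.1%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 96.1%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 96.1%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 96.1%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 96.1%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 96.1%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 96.1%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 96.1%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 96.1%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 92.1%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 92.1%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 92.1%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 92.1%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 61 | 88.2%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 14 | 87.4%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 14 | 87.4%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 14 | 87.4%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 1 | 86.2%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 0 | 86.1%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 14 | 1 | 81.5%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 61 | 14 | 79.5%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 61 | 0 | 78.2%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 14 | 14 | 74.8%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 61 | 61 | 1 | 74.4%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 0 | 0 | 72.1%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 14 | 0 | 0 | 71.4%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 61 | 14 | 1 | 69.7%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 61 | 61 | 61 | 61 | 14 | 14 | 67.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 61 | 61 | 61 | 61 | 14 | 14 | 67.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 61 | 0 | 0 | 0 | 66.1%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 61 | 61 | 14 | 14 | 1 | 64.9%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 61 | 61 | 14 | 14 | 14 | 14 | 57.5%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 61 | 0 | 0 | 0 | 0 | 56.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 61 | 61 | 61 | 14 | 14 | 14 | 0 | 52.3%
Arcee AI: Trinity Mini | 100 | 100 | 61 | 61 | 61 | 61 | 61 | 14 | 1 | 1 | 51.9%
Stealth: Aurora Alpha | 100 | 100 | 100 | 61 | 61 | 61 | 14 | 1 | 0 | 0 | 49.7%
LFM2 24B | 100 | 100 | 100 | 61 | 61 | 14 | 14 | 14 | 14 | 14 | 48.9%
Rocinante 12B | 100 | 100 | 100 | 100 | 61 | 14 | 14 | 1 | 0 | 0 | 48.9%
Mistral NeMO | 100 | 61 | 61 | 61 | 61 | 61 | 61 | 14 | 1 | 0 | 47.9%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 61 | 14 | 1 | 1 | 0 | 0 | 47.6%
Inception Mercury | 100 | 100 | 100 | 61 | 61 | 1 | 1 | 1 | 0 | 0 | 42.5%
Ministral 3B | 100 | 100 | 100 | 61 | 14 | 14 | 14 | 1 | 0 | 0 | 40.2%
Ministral 3 8B | 100 | 61 | 61 | 61 | 61 | 14 | 14 | 0 | 0 | 0 | 37.0%
Gemma 3 4B | 100 | 61 | 61 | 61 | 14 | 14 | 14 | 14 | 1 | 1 | 33.8%
Llama 3.1 8B | 100 | 100 | 61 | 61 | 14 | 1 | 1 | 1 | 0 | 0 | 33.8%
GPT-5.4 Mini (Reasoning, Low) | 61 | 61 | 61 | 61 | 14 | 14 | 14 | 14 | 14 | 1 | 31.1%
Ministral 3 3B | 61 | 61 | 61 | 61 | 14 | 14 | 14 | 14 | 1 | 0 | 29.8%
WizardLM 2 8x22b | 100 | 61 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 1 | 25.6%
Ministral 8B | 61 | 61 | 61 | 14 | 1 | 1 | 1 | 0 | 0 | 0 | 19.9%
GPT-5.4 Nano | 100 | 14 | 14 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 12.9%
GPT-5.4 Nano (Reasoning, Low) | 61 | 14 | 14 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 9.1%
GPT-5.4 Nano (Reasoning) | 61 | 14 | 14 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 8.9%
ByteDance Seed 2.0 Lite | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%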
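
The Avg column in the Individual Scenarios table appears to be the arithmetic mean of the ten numbered columns. The page does not state this, so treat it as an assumption. Below is a minimal Python sketch that recomputes it for two rows taken from the table; the function name `average_score` is illustrative only. The per-column values are displayed as rounded integers, so recomputed averages for some other rows can differ from the listed Avg by about 0.1.

```python
def average_score(scores):
    """Return the arithmetic mean of the per-column scores, rounded to one decimal."""
    return round(sum(scores) / len(scores), 1)


# Rows copied from the table (values as displayed, i.e. rounded integers).
llama_31_70b = [100] * 9 + [61]   # listed Avg: 96.1%
gpt_5_mini = [100] * 9 + [0]      # listed Avg: 90.0%

print(average_score(llama_31_70b))  # 96.1
print(average_score(gpt_5_mini))    # 90.0
```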