Sentence Counting

Subcategory of Utility. 155 models scored.

Model Leaderboard

All models ranked by their Sentence Counting subcategory score.

#	Model	Sentence Counting	Utility	Overall
1	Qwen3.7 Max	100.00%	99.54%	95.75%
2	Qwen3.6 Max Preview	100.00%	98.34%	94.54%
3	Gemini 3.1 Pro (Preview)	100.00%	99.91%	94.37%
4	Z.AI GLM 5.1	100.00%	97.51%	94.37%
5	Gemini 3.5 Flash (Reasoning)	100.00%	98.86%	94.08%
6	Grok 4.3 (Reasoning)	100.00%	92.94%	93.60%
7	GPT-5.5 (Reasoning, Low)	100.00%	96.36%	92.59%
8	GPT-5	100.00%	93.53%	91.93%
9	Qwen 3.5 397B A17B	100.00%	97.50%	91.73%
10	Qwen 3.5 122B	100.00%	96.36%	91.53%
11	Qwen 3.5 Plus (2026-04-20)	100.00%	96.42%	91.51%
12	Gemma 4 26B (Reasoning)	100.00%	95.69%	91.49%
13	GPT-5.4 (Reasoning, Low)	100.00%	95.32%	91.41%
14	Z.AI GLM 5	100.00%	94.11%	91.23%
15	Qwen 3.6 Flash	100.00%	96.09%	90.65%
16	Qwen 3.6 27B	100.00%	94.32%	89.72%
17	Qwen 3.5 35B	100.00%	96.42%	88.00%
18	Qwen 3.5 Flash	100.00%	96.11%	86.38%
19	Qwen 3.5 9B	100.00%	94.02%	86.05%
20	Qwen 3.5 Plus (2026-02-15)	100.00%	86.65%	85.96%
21	GPT-5.4	100.00%	81.95%	84.32%
22	GPT-5.4 (Reasoning)	100.00%	96.89%	93.24%
23	GPT-5 Mini	100.00%	98.39%	92.62%
24	MoonshotAI: Kimi K2.6	100.00%	97.42%	92.31%
25	Grok 4.20 (Beta, Reasoning)	100.00%	95.41%	91.49%
26	MoonshotAI: Kimi K2.5	100.00%	96.63%	91.04%
27	ByteDance Seed 1.6	100.00%	90.83%	90.70%
28	GPT-5.5	100.00%	81.88%	89.09%
29	Z.AI GLM 5 Turbo	100.00%	96.36%	94.27%
30	Gemini 3 Flash (Preview, Reasoning)	100.00%	97.20%	90.50%
31	Claude Opus 4.6 (Reasoning)	100.00%	98.93%	95.02%
32	Grok 4.20 (Reasoning)	100.00%	92.61%	91.39%
33	GPT-5.4 Mini (Reasoning)	100.00%	94.44%	90.65%
34	Z.AI GLM 4.7 Flash	100.00%	88.98%	84.82%
35	Z.AI GLM 4.7	99.99%	94.31%	88.69%
36	Gemma 4 31B	99.99%	86.69%	86.91%
37	Gemma 4 31B (Reasoning)	99.99%	96.32%	91.71%
38	Gemini 3 Flash (Preview)	99.99%	86.39%	85.35%
39	Llama 3.1 Nemotron 70B	99.99%	88.31%	74.70%
40	Gemini 3 Pro (Preview)	99.97%	96.14%	88.79%
41	Mistral Small 3.2 24B	99.97%	73.17%	78.58%
42	ByteDance Seed 2.0 Lite	99.96%	92.23%	84.80%
43	Claude Sonnet 4.6 (Reasoning)	99.96%	97.88%	93.66%
44	ByteDance Seed 2.0 Mini	99.96%	91.88%	86.91%
45	Mistral Large 3	99.95%	84.91%	85.43%
46	GPT-5.5 (Reasoning)	99.93%	96.60%	92.98%
47	Claude Opus 4.6	99.93%	90.72%	92.35%
48	GPT-5.4 Mini (Reasoning, Low)	99.92%	88.49%	85.75%
49	Claude Opus 4.7 (Reasoning)	99.91%	97.87%	93.23%
50	o4 Mini High	99.90%	98.67%	90.29%
51	GPT-5.1	99.80%	95.33%	92.54%
52	GPT-5.2	99.77%	96.22%	90.26%
53	Claude Opus 4.7	99.57%	95.77%	89.93%
54	DeepSeek V4 Pro (Reasoning)	99.51%	93.24%	90.10%
55	Gemini 2.5 Pro	99.43%	92.18%	88.53%
56	Grok 4.20 (Beta)	99.42%	82.15%	83.85%
57	DeepSeek V4 Flash	99.32%	83.26%	82.02%
58	Grok 4.20	99.17%	84.11%	81.70%
59	Claude Opus 4.8 (Reasoning)	99.07%	99.26%	92.22%
60	Gemini 3.1 Flash Lite (Preview)	98.60%	94.00%	85.87%
61	Claude Opus 4.8 (Reasoning, Low)	98.48%	98.00%	92.14%
62	o4 Mini	98.35%	96.31%	88.35%
63	GPT-5.4 Mini	98.23%	79.37%	82.43%
64	Qwen 3.6 35B	98.00%	96.20%	89.05%
65	GPT-5 Nano	97.99%	93.91%	82.60%
66	Nemotron 3 Super	97.96%	95.29%	84.56%
67	DeepSeek V3 (2025-03-24)	97.96%	80.62%	81.99%
68	DeepSeek-V2 Chat	97.87%	83.82%	84.83%
69	Z.AI GLM 4.6	97.70%	88.58%	89.11%
70	Stealth: Aurora Alpha	97.37%	92.59%	83.79%
71	Mistral Medium 3.1	97.23%	80.13%	77.83%
72	Inception Mercury 2	97.02%	92.86%	83.85%
73	GPT-OSS 120B	96.88%	92.03%	86.44%
74	Mistral Small Creative	96.83%	76.28%	73.27%
75	Inception Mercury	96.78%	87.38%	79.50%
76	Gemini 2.5 Flash Lite (Reasoning)	96.54%	89.63%	85.75%
77	Mistral Small 4 (Reasoning)	96.06%	85.61%	82.39%
78	Claude Opus 4.5	95.88%	89.84%	89.69%
79	GPT-5.4 Nano (Reasoning)	95.83%	93.34%	81.36%
80	Qwen 3 32B	95.82%	81.66%	82.21%
81	Grok 4.1 Fast	95.61%	84.12%	89.55%
82	MiniMax M2.7	95.47%	95.50%	89.10%
83	Gemma 3 27B	95.44%	76.82%	77.85%
84	Gemma 3 4B	95.04%	60.30%	68.57%
85	Gemini 3.1 Flash Lite	94.68%	92.77%	85.75%
86	Claude Sonnet 4.6	94.66%	88.52%	91.15%
87	GPT-5.4 Nano	93.46%	78.57%	74.40%
88	Claude 3.5 Sonnet	93.24%	76.75%	84.24%
89	GPT-5.4 Nano (Reasoning, Low)	93.19%	91.42%	79.48%
90	DeepSeek V4 Flash (Reasoning)	93.07%	87.53%	89.01%
91	Nemotron 3 Nano	92.81%	86.00%	77.73%
92	MiniMax M3	92.60%	93.59%	90.88%
93	Qwen 3.5 27B	92.55%	95.67%	90.85%
94	Ministral 3 14B	92.10%	79.03%	72.54%
95	DeepSeek V3.2	92.09%	81.58%	82.25%
96	Claude Sonnet 4.5	91.98%	83.78%	88.03%
97	Stealth: Healer Alpha	90.46%	82.30%	85.93%
98	Gemini 3.1 Flash Lite (Reasoning)	90.14%	92.32%	86.41%
99	DeepSeek V4 Pro	90.01%	77.57%	82.63%
100	Grok 4 Fast	89.92%	76.76%	86.15%
101	Gemma 4 26B	89.58%	83.17%	85.84%
102	Grok 4.3	89.39%	66.41%	78.66%
103	Aion 2.0	89.10%	90.91%	89.21%
104	DeepSeek V3 (2024-12-26)	88.56%	81.87%	83.68%
105	Grok 4	88.00%	89.67%	88.12%
106	Stealth: Hunter Alpha	87.94%	84.63%	87.34%
107	Xiaomi MIMO v2.5	86.72%	81.15%	85.05%
108	GPT-4.1	86.01%	90.57%	88.68%
109	Mistral Small 4	85.82%	78.28%	76.46%
110	Llama 3.1 8B	85.64%	74.82%	63.35%
111	Cydonia 24B V4.1	85.28%	69.32%	75.09%
112	Gemini 2.5 Flash (Reasoning)	85.21%	82.25%	86.51%
113	Claude Sonnet 4	84.17%	84.02%	88.72%
114	Z.AI GLM 4.5 Air	83.68%	76.57%	83.12%
115	Xiaomi MIMO v2.5 Pro	83.55%	82.62%	87.36%
116	DeepSeek V3.1	83.32%	76.65%	82.39%
117	ByteDance Seed 1.6 Flash	82.27%	84.16%	73.27%
118	Qwen3 235B A22B Instruct 2507	80.58%	83.15%	80.10%
119	Llama 3.1 70B	80.53%	81.03%	78.40%
120	Claude Opus 4	80.19%	88.81%	87.69%
121	Gemma 3 12B	80.06%	79.28%	78.41%
122	Hermes 3 405B	79.99%	69.02%	82.86%
123	Writer: Palmyra X5	79.93%	79.71%	79.57%
124	GPT-4o Mini (temp=1)	79.62%	82.16%	79.08%
125	GPT-4o Mini (temp=0)	79.49%	81.43%	78.29%
126	GPT-4.1 Mini	79.41%	82.30%	83.20%
127	Gemini 3.5 Flash (Reasoning, Minimal)	79.24%	83.90%	86.47%
128	MiniMax M2.5	75.58%	90.42%	88.71%
129	GPT-4.1 Nano	75.17%	68.45%	71.94%
130	Ministral 3 3B	74.85%	72.38%	67.22%
131	Z.AI GLM 4.5	74.82%	79.19%	86.27%
132	GPT-4o, Aug. 6th (temp=1)	74.24%	82.44%	82.62%
133	Ministral 3 8B	74.12%	74.43%	71.76%
134	GPT-4o, May 13th (temp=1)	74.04%	80.69%	83.80%
135	GPT-4o, Aug. 6th (temp=0)	74.00%	82.11%	82.45%
136	Gemini 2.5 Flash Lite	73.32%	80.14%	81.08%
137	GPT-4o, May 13th (temp=0)	71.82%	83.13%	85.36%
138	WizardLM 2 8x22b	71.48%	67.14%	71.06%
139	Claude Haiku 4.5	70.28%	72.48%	85.14%
140	Qwen 2.5 72B	66.70%	76.43%	75.46%
141	Claude 3.7 Sonnet	66.35%	62.54%	83.39%
142	Gemini 2.5 Flash	64.79%	61.45%	80.60%
143	LFM2 24B	63.93%	69.48%	58.77%
144	Mistral Large	63.80%	73.04%	80.15%
145	Ministral 3B	62.17%	49.17%	61.29%
146	Claude 3 Haiku	61.88%	68.47%	71.19%
147	Cohere Command R+ (Aug. 2024)	60.43%	59.51%	69.03%
148	Ministral 8B	60.35%	46.82%	64.87%
149	Mistral Large 2	58.78%	69.19%	82.41%
150	Hermes 3 70B	58.14%	61.15%	72.57%
151	Arcee AI: Trinity Large (Preview)	57.16%	60.74%	73.33%
152	Rocinante 12B	53.12%	48.47%	54.54%
153	Skyfall 36B V2	52.83%	52.53%	65.76%
154	Mistral NeMO	47.57%	51.55%	65.04%
155	Arcee AI: Trinity Mini	43.97%	59.94%	70.90%