Word Counting

Subcategory of Utility. 155 models scored.

Model Leaderboard

All models ranked by their Word Counting subcategory score.

#	Model	Word Counting	Utility	Overall
1	Gemini 3.1 Pro (Preview)	99.56%	99.91%	94.37%
2	GPT-5 Mini	97.93%	98.39%	92.62%
3	Qwen3.7 Max	97.69%	99.54%	95.75%
4	Claude Opus 4.8 (Reasoning, Low)	97.54%	98.00%	92.14%
5	Claude Opus 4.8 (Reasoning)	97.26%	99.26%	92.22%
6	Gemini 3.5 Flash (Reasoning)	96.80%	98.86%	94.08%
7	GPT-5	96.13%	93.53%	91.93%
8	o4 Mini High	95.97%	98.67%	90.29%
9	Z.AI GLM 5 Turbo	95.82%	96.36%	94.27%
10	Claude Opus 4.6 (Reasoning)	94.65%	98.93%	95.02%
11	MiniMax M2.7	94.05%	95.50%	89.10%
12	Inception Mercury 2	93.76%	92.86%	83.85%
13	Nemotron 3 Super	93.51%	95.29%	84.56%
14	Z.AI GLM 5.1	92.55%	97.51%	94.37%
15	Qwen3.6 Max Preview	91.71%	98.34%	94.54%
16	Grok 4.3 (Reasoning)	91.18%	92.94%	93.60%
17	o4 Mini	90.72%	96.31%	88.35%
18	MoonshotAI: Kimi K2.6	90.45%	97.42%	92.31%
19	Claude Sonnet 4.6 (Reasoning)	90.42%	97.88%	93.66%
20	Gemini 3.1 Flash Lite (Reasoning)	89.98%	92.32%	86.41%
21	Claude Opus 4.7 (Reasoning)	89.46%	97.87%	93.23%
22	Gemini 3.1 Flash Lite (Preview)	89.38%	94.00%	85.87%
23	Stealth: Aurora Alpha	88.58%	92.59%	83.79%
24	GPT-OSS 120B	87.79%	92.03%	86.44%
25	Gemini 3.1 Flash Lite	87.68%	92.77%	85.75%
26	Qwen 3.5 397B A17B	87.52%	97.50%	91.73%
27	Claude Opus 4.7	87.30%	95.77%	89.93%
28	Nemotron 3 Nano	86.70%	86.00%	77.73%
29	GPT-5 Nano	86.54%	93.91%	82.60%
30	MiniMax M2.5	86.03%	90.42%	88.71%
31	Gemini 3 Flash (Preview, Reasoning)	86.00%	97.20%	90.50%
32	Qwen 3.5 27B	85.82%	95.67%	90.85%
33	GPT-5.1	85.32%	95.33%	92.54%
34	Inception Mercury	85.14%	87.38%	79.50%
35	GPT-5.4 Nano (Reasoning)	84.86%	93.34%	81.36%
36	Qwen 3.5 Plus (2026-04-20)	84.60%	96.42%	91.51%
37	GPT-5.4 (Reasoning)	84.46%	96.89%	93.24%
38	GPT-5.2	84.32%	96.22%	90.26%
39	Gemma 4 31B (Reasoning)	84.12%	96.32%	91.71%
40	Gemini 3 Flash (Preview)	83.97%	86.39%	85.35%
41	Claude Opus 4.6	83.65%	90.72%	92.35%
42	Gemma 4 26B (Reasoning)	83.45%	95.69%	91.49%
43	MoonshotAI: Kimi K2.5	83.17%	96.63%	91.04%
44	GPT-5.5 (Reasoning)	83.08%	96.60%	92.98%
45	Qwen 3.6 35B	83.01%	96.20%	89.05%
46	GPT-4o Mini (temp=0)	82.65%	81.43%	78.29%
47	Qwen 3.6 Flash	82.44%	96.09%	90.65%
48	MiniMax M3	82.35%	93.59%	90.88%
49	Qwen 3.5 35B	82.08%	96.42%	88.00%
50	Qwen 3.5 122B	81.81%	96.36%	91.53%
51	GPT-5.5 (Reasoning, Low)	81.80%	96.36%	92.59%
52	GPT-5.4 Mini (Reasoning)	81.72%	94.44%	90.65%
53	GPT-4.1	81.33%	90.57%	88.68%
54	Qwen 3.6 27B	81.10%	94.32%	89.72%
55	Claude Opus 4.5	80.83%	89.84%	89.69%
56	Gemini 3 Pro (Preview)	80.75%	96.14%	88.79%
57	Qwen 3.5 Flash	80.57%	96.11%	86.38%
58	Z.AI GLM 4.7	78.57%	94.31%	88.69%
59	GPT-4o Mini (temp=1)	78.18%	82.16%	79.08%
60	GPT-4o, Aug. 6th (temp=1)	77.45%	82.44%	82.62%
61	Grok 4.20 (Beta, Reasoning)	77.05%	95.41%	91.49%
62	GPT-4o, Aug. 6th (temp=0)	77.04%	82.11%	82.45%
63	GPT-5.4 (Reasoning, Low)	76.62%	95.32%	91.41%
64	Grok 4.20 (Reasoning)	75.57%	92.61%	91.39%
65	Z.AI GLM 5	75.55%	94.11%	91.23%
66	Grok 4	75.34%	89.67%	88.12%
67	DeepSeek V4 Pro (Reasoning)	75.21%	93.24%	90.10%
68	GPT-5.4 Mini (Reasoning, Low)	75.01%	88.49%	85.75%
69	Qwen 3.5 9B	74.59%	94.02%	86.05%
70	Claude Opus 4	74.37%	88.81%	87.69%
71	ByteDance Seed 1.6	74.14%	90.83%	90.70%
72	ByteDance Seed 2.0 Mini	73.92%	91.88%	86.91%
73	ByteDance Seed 2.0 Lite	73.68%	92.23%	84.80%
74	GPT-5.4 Nano (Reasoning, Low)	73.42%	91.42%	79.48%
75	Gemini 3.5 Flash (Reasoning, Minimal)	73.28%	83.90%	86.47%
76	Claude Sonnet 4.6	71.94%	88.52%	91.15%
77	GPT-5.5	71.88%	81.88%	89.09%
78	Z.AI GLM 4.7 Flash	68.93%	88.98%	84.82%
79	GPT-4.1 Mini	66.59%	82.30%	83.20%
80	Gemini 2.5 Pro	66.47%	92.18%	88.53%
81	Gemma 4 26B	66.29%	83.17%	85.84%
82	GPT-5.4 Mini	66.13%	79.37%	82.43%
83	Claude 3.7 Sonnet	66.00%	62.54%	83.39%
84	GPT-4o, May 13th (temp=1)	65.90%	80.69%	83.80%
85	GPT-4o, May 13th (temp=0)	65.85%	83.13%	85.36%
86	Aion 2.0	65.48%	90.91%	89.21%
87	GPT-4.1 Nano	64.57%	68.45%	71.94%
88	Gemma 4 31B	64.43%	86.69%	86.91%
89	Mistral Medium 3.1	64.40%	80.13%	77.83%
90	GPT-5.4	64.23%	81.95%	84.32%
91	Grok 4.20	62.89%	84.11%	81.70%
92	Stealth: Healer Alpha	61.71%	82.30%	85.93%
93	Grok 4 Fast	61.39%	76.76%	86.15%
94	Grok 4.1 Fast	60.51%	84.12%	89.55%
95	Grok 4.20 (Beta)	59.16%	82.15%	83.85%
96	Claude Sonnet 4.5	58.93%	83.78%	88.03%
97	DeepSeek V3 (2025-03-24)	58.32%	80.62%	81.99%
98	DeepSeek V3 (2024-12-26)	58.30%	81.87%	83.68%
99	Llama 3.1 8B	56.94%	74.82%	63.35%
100	Llama 3.1 70B	56.62%	81.03%	78.40%
101	Mistral Small 4 (Reasoning)	56.00%	85.61%	82.39%
102	Stealth: Hunter Alpha	55.55%	84.63%	87.34%
103	Gemini 2.5 Flash Lite (Reasoning)	55.13%	89.63%	85.75%
104	DeepSeek V4 Flash	54.98%	83.26%	82.02%
105	Xiaomi MIMO v2.5	54.18%	81.15%	85.05%
106	Claude Sonnet 4	53.92%	84.02%	88.72%
107	Claude 3.5 Sonnet	53.50%	76.75%	84.24%
108	Gemini 2.5 Flash Lite	53.37%	80.14%	81.08%
109	Mistral Large 3	52.60%	84.91%	85.43%
110	Xiaomi MIMO v2.5 Pro	52.55%	82.62%	87.36%
111	Claude Haiku 4.5	51.60%	72.48%	85.14%
112	Qwen 2.5 72B	51.43%	76.43%	75.46%
113	Hermes 3 405B	51.42%	69.02%	82.86%
114	DeepSeek V4 Flash (Reasoning)	49.57%	87.53%	89.01%
115	ByteDance Seed 1.6 Flash	48.55%	84.16%	73.27%
116	GPT-5.4 Nano	47.90%	78.57%	74.40%
117	Ministral 3 14B	47.57%	79.03%	72.54%
118	Z.AI GLM 4.6	47.21%	88.58%	89.11%
119	DeepSeek V3.1	47.10%	76.65%	82.39%
120	Llama 3.1 Nemotron 70B	47.07%	88.31%	74.70%
121	Gemma 3 12B	46.84%	79.28%	78.41%
122	Mistral Small 4	46.07%	78.28%	76.46%
123	Gemini 2.5 Flash (Reasoning)	46.02%	82.25%	86.51%
124	DeepSeek V3.2	45.81%	81.58%	82.25%
125	Gemma 3 27B	45.66%	76.82%	77.85%
126	DeepSeek V4 Pro	45.34%	77.57%	82.63%
127	LFM2 24B	44.98%	69.48%	58.77%
128	Qwen 3.5 Plus (2026-02-15)	44.24%	86.65%	85.96%
129	Arcee AI: Trinity Large (Preview)	43.69%	60.74%	73.33%
130	Mistral Small 3.2 24B	42.38%	73.17%	78.58%
131	DeepSeek-V2 Chat	41.72%	83.82%	84.83%
132	Qwen3 235B A22B Instruct 2507	40.15%	83.15%	80.10%
133	Claude 3 Haiku	39.32%	68.47%	71.19%
134	Gemini 2.5 Flash	38.47%	61.45%	80.60%
135	Z.AI GLM 4.5	38.11%	79.19%	86.27%
136	Ministral 3 8B	38.01%	74.43%	71.76%
137	Ministral 3 3B	37.04%	72.38%	67.22%
138	Cydonia 24B V4.1	36.47%	69.32%	75.09%
139	Writer: Palmyra X5	35.61%	79.71%	79.57%
140	Cohere Command R+ (Aug. 2024)	35.13%	59.51%	69.03%
141	Z.AI GLM 4.5 Air	34.68%	76.57%	83.12%
142	Mistral Small Creative	33.57%	76.28%	73.27%
143	Grok 4.3	33.50%	66.41%	78.66%
144	Qwen 3 32B	31.96%	81.66%	82.21%
145	Mistral Large	31.90%	73.04%	80.15%
146	Hermes 3 70B	28.43%	61.15%	72.57%
147	Arcee AI: Trinity Mini	26.25%	59.94%	70.90%
148	Gemma 3 4B	22.12%	60.30%	68.57%
149	Ministral 8B	21.93%	46.82%	64.87%
150	Mistral Large 2	21.18%	69.19%	82.41%
151	WizardLM 2 8x22b	20.73%	67.14%	71.06%
152	Skyfall 36B V2	18.30%	52.53%	65.76%
153	Ministral 3B	17.87%	49.17%	61.29%
154	Mistral NeMO	15.70%	51.55%	65.04%
155	Rocinante 12B	14.07%	48.47%	54.54%