Structural Counting

Subcategory of Utility. 155 models scored.

Model Leaderboard

All models, ranked from highest to lowest Structural Counting subcategory score.

#	Model	Structural Counting	Utility	Overall
1	Qwen3.7 Max	100.00%	99.54%	95.75%
2	Claude Opus 4.6 (Reasoning)	100.00%	98.93%	95.02%
3	Qwen3.6 Max Preview	100.00%	98.34%	94.54%
4	Gemini 3.1 Pro (Preview)	100.00%	99.91%	94.37%
5	GPT-5.4 (Reasoning)	100.00%	96.89%	93.24%
6	Claude Opus 4.7 (Reasoning)	100.00%	97.87%	93.23%
7	GPT-5.5 (Reasoning)	100.00%	96.60%	92.98%
8	GPT-5.5 (Reasoning, Low)	100.00%	96.36%	92.59%
9	MoonshotAI: Kimi K2.6	100.00%	97.42%	92.31%
10	Claude Opus 4.8 (Reasoning)	100.00%	99.26%	92.22%
11	Qwen 3.5 397B A17B	100.00%	97.50%	91.73%
12	Qwen 3.5 122B	100.00%	96.36%	91.53%
13	Grok 4.20 (Beta, Reasoning)	100.00%	95.41%	91.49%
14	GPT-5.4 (Reasoning, Low)	100.00%	95.32%	91.41%
15	MoonshotAI: Kimi K2.5	100.00%	96.63%	91.04%
16	Qwen 3.5 27B	100.00%	95.67%	90.85%
17	Gemini 3 Flash (Preview, Reasoning)	100.00%	97.20%	90.50%
18	Aion 2.0	100.00%	90.91%	89.21%
19	Qwen 3.6 35B	100.00%	96.20%	89.05%
20	Gemini 3 Pro (Preview)	100.00%	96.14%	88.79%
21	Gemini 2.5 Pro	100.00%	92.18%	88.53%
22	Qwen 3.5 35B	100.00%	96.42%	88.00%
23	Qwen 3.5 Flash	100.00%	96.11%	86.38%
24	Claude Sonnet 4.6 (Reasoning)	99.00%	97.88%	93.66%
25	Qwen 3.6 Flash	98.00%	96.09%	90.65%
26	Z.AI GLM 4.6	98.00%	88.58%	89.11%
27	Gemini 3.5 Flash (Reasoning)	97.50%	98.86%	94.08%
28	Gemma 4 31B (Reasoning)	97.50%	96.32%	91.71%
29	Qwen 3.5 Plus (2026-04-20)	97.50%	96.42%	91.51%
30	o4 Mini High	97.50%	98.67%	90.29%
31	GPT-5.2	97.00%	96.22%	90.26%
32	Gemini 2.5 Flash Lite (Reasoning)	96.50%	89.63%	85.75%
33	Z.AI GLM 5 Turbo	96.00%	96.36%	94.27%
34	Qwen 3.6 27B	95.50%	94.32%	89.72%
35	Qwen 3.5 9B	95.50%	94.02%	86.05%
36	Z.AI GLM 5.1	95.00%	97.51%	94.37%
37	Gemma 4 26B (Reasoning)	95.00%	95.69%	91.49%
38	Z.AI GLM 5	95.00%	94.11%	91.23%
39	DeepSeek V4 Flash (Reasoning)	95.00%	87.53%	89.01%
40	Qwen3 235B A22B Instruct 2507	95.00%	83.15%	80.10%
41	Llama 3.1 Nemotron 70B	94.50%	88.31%	74.70%
42	GPT-5 Mini	94.00%	98.39%	92.62%
43	Claude Opus 4.8 (Reasoning, Low)	94.00%	98.00%	92.14%
44	MiniMax M3	93.00%	93.59%	90.88%
45	Z.AI GLM 4.7	93.00%	94.31%	88.69%
46	o4 Mini	92.50%	96.31%	88.35%
47	Claude Opus 4.7	92.00%	95.77%	89.93%
48	GPT-5.1	91.50%	95.33%	92.54%
49	DeepSeek V4 Pro (Reasoning)	91.50%	93.24%	90.10%
50	GPT-5.4 Nano (Reasoning)	91.00%	93.34%	81.36%
51	GPT-5.4 Mini (Reasoning)	90.50%	94.44%	90.65%
52	MiniMax M2.5	90.50%	90.42%	88.71%
53	GPT-5.4 Nano (Reasoning, Low)	90.50%	91.42%	79.48%
54	Gemini 2.5 Flash (Reasoning)	90.00%	82.25%	86.51%
55	ByteDance Seed 1.6 Flash	90.00%	84.16%	73.27%
56	Claude Opus 4	89.50%	88.81%	87.69%
57	Qwen 3.5 Plus (2026-02-15)	89.00%	86.65%	85.96%
58	MiniMax M2.7	88.00%	95.50%	89.10%
59	Grok 4.20 (Reasoning)	87.50%	92.61%	91.39%
60	ByteDance Seed 2.0 Lite	87.50%	92.23%	84.80%
61	GPT-4.1	85.50%	90.57%	88.68%
62	ByteDance Seed 2.0 Mini	85.50%	91.88%	86.91%
63	Nemotron 3 Super	85.00%	95.29%	84.56%
64	Grok 4	85.00%	89.67%	88.12%
65	GPT-5 Nano	85.00%	93.91%	82.60%
66	Stealth: Hunter Alpha	83.00%	84.63%	87.34%
67	Z.AI GLM 4.5	83.00%	79.19%	86.27%
68	Writer: Palmyra X5	83.00%	79.71%	79.57%
69	Gemini 3.1 Flash Lite (Preview)	82.00%	94.00%	85.87%
70	Claude Sonnet 4	82.00%	84.02%	88.72%
71	Gemini 3.1 Flash Lite (Reasoning)	81.50%	92.32%	86.41%
72	Gemini 3.1 Flash Lite	81.50%	92.77%	85.75%
73	Z.AI GLM 4.7 Flash	81.00%	88.98%	84.82%
74	Qwen 3 32B	80.50%	81.66%	82.21%
75	ByteDance Seed 1.6	80.00%	90.83%	90.70%
76	DeepSeek-V2 Chat	79.50%	83.82%	84.83%
77	GPT-4o, May 13th (temp=0)	78.00%	83.13%	85.36%
78	Xiaomi MIMO v2.5 Pro	77.00%	82.62%	87.36%
79	Stealth: Aurora Alpha	77.00%	92.59%	83.79%
80	GPT-5	76.50%	93.53%	91.93%
81	Claude Sonnet 4.6	76.00%	88.52%	91.15%
82	Mistral Small 4 (Reasoning)	76.00%	85.61%	82.39%
83	GPT-OSS 120B	75.50%	92.03%	86.44%
84	Gemini 2.5 Flash Lite	74.00%	80.14%	81.08%
85	Inception Mercury 2	73.50%	92.86%	83.85%
86	Grok 4.3 (Reasoning)	73.50%	92.94%	93.60%
87	Claude Opus 4.5	72.50%	89.84%	89.69%
88	Mistral Large 3	72.00%	84.91%	85.43%
89	Xiaomi MIMO v2.5	71.50%	81.15%	85.05%
90	Stealth: Healer Alpha	71.00%	82.30%	85.93%
91	Claude Haiku 4.5	70.50%	72.48%	85.14%
92	Claude Opus 4.6	70.00%	90.72%	92.35%
93	DeepSeek V3.2	70.00%	81.58%	82.25%
94	Mistral Large	69.50%	73.04%	80.15%
95	Gemma 3 12B	69.50%	79.28%	78.41%
96	Gemma 4 31B	69.00%	86.69%	86.91%
97	Claude Sonnet 4.5	68.00%	83.78%	88.03%
98	Llama 3.1 70B	68.00%	81.03%	78.40%
99	GPT-5.4 Mini (Reasoning, Low)	67.50%	88.49%	85.75%
100	Gemini 3.5 Flash (Reasoning, Minimal)	67.00%	83.90%	86.47%
101	Mistral Large 2	66.00%	69.19%	82.41%
102	GPT-4.1 Mini	65.50%	82.30%	83.20%
103	Grok 4.1 Fast	64.50%	84.12%	89.55%
104	Z.AI GLM 4.5 Air	64.50%	76.57%	83.12%
105	Qwen 2.5 72B	64.00%	76.43%	75.46%
106	GPT-4o, May 13th (temp=1)	63.50%	80.69%	83.80%
107	DeepSeek V3 (2024-12-26)	62.50%	81.87%	83.68%
108	DeepSeek V4 Flash	62.00%	83.26%	82.02%
109	Grok 4.20 (Beta)	60.50%	82.15%	83.85%
110	GPT-4o, Aug. 6th (temp=1)	60.50%	82.44%	82.62%
111	Inception Mercury	60.00%	87.38%	79.50%
112	Gemma 4 26B	60.00%	83.17%	85.84%
113	Ministral 3 8B	60.00%	74.43%	71.76%
114	GPT-4o, Aug. 6th (temp=0)	59.50%	82.11%	82.45%
115	DeepSeek V3.1	59.50%	76.65%	82.39%
116	Mistral Small 4	59.50%	78.28%	76.46%
117	Grok 4.20	58.50%	84.11%	81.70%
118	Ministral 3 14B	55.50%	79.03%	72.54%
119	Mistral NeMO	54.50%	51.55%	65.04%
120	Gemini 2.5 Flash	54.00%	61.45%	80.60%
121	WizardLM 2 8x22b	53.50%	67.14%	71.06%
122	DeepSeek V3 (2025-03-24)	53.50%	80.62%	81.99%
123	GPT-4o Mini (temp=1)	53.00%	82.16%	79.08%
124	DeepSeek V4 Pro	52.50%	77.57%	82.63%
125	Hermes 3 70B	52.50%	61.15%	72.57%
126	GPT-4.1 Nano	52.50%	68.45%	71.94%
127	GPT-5.4 Nano	51.50%	78.57%	74.40%
128	Mistral Small Creative	51.00%	76.28%	73.27%
129	Nemotron 3 Nano	50.50%	86.00%	77.73%
130	Ministral 3 3B	50.00%	72.38%	67.22%
131	Arcee AI: Trinity Mini	49.50%	59.94%	70.90%
132	Gemini 3 Flash (Preview)	48.00%	86.39%	85.35%
133	Ministral 3B	47.50%	49.17%	61.29%
134	Claude 3.7 Sonnet	47.00%	62.54%	83.39%
135	Cydonia 24B V4.1	46.50%	69.32%	75.09%
136	Skyfall 36B V2	46.50%	52.53%	65.76%
137	Llama 3.1 8B	46.50%	74.82%	63.35%
138	GPT-5.4	45.50%	81.95%	84.32%
139	GPT-4o Mini (temp=0)	45.00%	81.43%	78.29%
140	Arcee AI: Trinity Large (Preview)	44.50%	60.74%	73.33%
141	Claude 3 Haiku	44.50%	68.47%	71.19%
142	Rocinante 12B	43.50%	48.47%	54.54%
143	Gemma 3 27B	43.00%	76.82%	77.85%
144	Cohere Command R+ (Aug. 2024)	42.00%	59.51%	69.03%
145	Mistral Medium 3.1	39.00%	80.13%	77.83%
146	LFM2 24B	38.50%	69.48%	58.77%
147	GPT-5.5	37.50%	81.88%	89.09%
148	Claude 3.5 Sonnet	37.00%	76.75%	84.24%
149	Hermes 3 405B	37.00%	69.02%	82.86%
150	Ministral 8B	33.50%	46.82%	64.87%
151	Grok 4 Fast	32.50%	76.76%	86.15%
152	GPT-5.4 Mini	32.50%	79.37%	82.43%
153	Gemma 3 4B	31.00%	60.30%	68.57%
154	Grok 4.3	27.50%	66.41%	78.66%
155	Mistral Small 3.2 24B	23.50%	73.17%	78.58%
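
For readers who want to work with the table programmatically, the following is a minimal sketch of parsing rows and checking that the ranking follows the Structural Counting column. It assumes the rows are tab-separated exactly as shown above. The names `Row`, `parse_row`, and `is_ranked_by_subcategory` are hypothetical helpers introduced for illustration and are not part of any leaderboard tooling.

```python
from typing import NamedTuple


class Row(NamedTuple):
    rank: int
    model: str
    structural_counting: float
    utility: float
    overall: float


def parse_row(line: str) -> Row:
    """Parse one tab-separated leaderboard row into typed fields."""
    rank, model, counting, utility, overall = line.rstrip("\n").split("\t")

    def pct(text: str) -> float:
        # "99.54%" -> 99.54
        return float(text.rstrip("%"))

    return Row(int(rank), model, pct(counting), pct(utility), pct(overall))


def is_ranked_by_subcategory(rows: list[Row]) -> bool:
    """True if the subcategory score never increases as rank increases."""
    ordered = sorted(rows, key=lambda r: r.rank)
    return all(
        a.structural_counting >= b.structural_counting
        for a, b in zip(ordered, ordered[1:])
    )


# Three rows copied from the table above (assumed tab-separated).
sample = [
    "1\tQwen3.7 Max\t100.00%\t99.54%\t95.75%",
    "24\tClaude Sonnet 4.6 (Reasoning)\t99.00%\t97.88%\t93.66%",
    "155\tMistral Small 3.2 24B\t23.50%\t73.17%\t78.58%",
]
rows = [parse_row(line) for line in sample]
print(is_ranked_by_subcategory(rows))
```

The check only verifies ordering by the Structural Counting column. It makes no assumption about how tied scores are ordered, since the table does not state a tie-breaking rule.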