Utility

32 scenarios across 5 subcategories. 155 models scored.

Subcategories

Subcategory | Avg Score | Best Model | Best Score
Word Counting | 65.78% | Gemini 3.1 Pro (Preview) | 99.56%
Sentence Counting | 89.58% | Qwen3.7 Max | 100.00%
Paragraph Counting | 96.04% | Qwen3.7 Max | 100.00%
Structural Counting | 74.58% | Qwen3.7 Max | 100.00%
Data Extraction | 98.06% | Qwen3.7 Max | 100.00%
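The Avg Score column is consistent with a plain unweighted mean over all 155 models, although the page does not state the formula. To check this, the Data Extraction column of the leaderboard can be tallied by score. The minimal Python sketch below reproduces the 98.06% average and shows how many models share the best score. The tally was read off the leaderboard, and the helper names `column_mean` and `best_score_ties` are illustrative, not part of the benchmark's tooling.

```python
from collections import Counter

# Data Extraction scores for all 155 models, tallied from the leaderboard:
# score (percent) -> number of models with that score.
DATA_EXTRACTION = Counter({100: 133, 95: 11, 90: 3, 85: 4, 75: 1, 70: 1, 50: 2})


def column_mean(tally):
    """Unweighted mean over models, given a score -> model-count tally."""
    total_models = sum(tally.values())
    return sum(score * n for score, n in tally.items()) / total_models


def best_score_ties(tally):
    """Return the best score and how many models share it."""
    best = max(tally)  # max over the keys, i.e. the highest score present
    return best, tally[best]


print(sum(DATA_EXTRACTION.values()))           # 155
print(f"{column_mean(DATA_EXTRACTION):.2f}%")  # 98.06%
print(best_score_ties(DATA_EXTRACTION))        # (100, 133)
```

The tie count also shows why the Best Model entry can look arbitrary. 133 of the 155 models score 100.00% on Data Extraction, and Qwen3.7 Max is only one of them. The page does not say how it breaks such ties.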

Model Leaderboard

All models ranked by their Utility category score.

Rank | Model | Utility | Word Counting | Sentence Counting | Paragraph Counting | Structural Counting | Data Extraction | Overall
1 | Gemini 3.1 Pro (Preview) | 99.91% | 99.56% | 100.00% | 100.00% | 100.00% | 100.00% | 94.37%
2 | Qwen3.7 Max | 99.54% | 97.69% | 100.00% | 100.00% | 100.00% | 100.00% | 95.75%
3 | Claude Opus 4.8 (Reasoning) | 99.26% | 97.26% | 99.07% | 100.00% | 100.00% | 100.00% | 92.22%
4 | Claude Opus 4.6 (Reasoning) | 98.93% | 94.65% | 100.00% | 100.00% | 100.00% | 100.00% | 95.02%
5 | Gemini 3.5 Flash (Reasoning) | 98.86% | 96.80% | 100.00% | 100.00% | 97.50% | 100.00% | 94.08%
6 | o4 Mini High | 98.67% | 95.97% | 99.90% | 100.00% | 97.50% | 100.00% | 90.29%
7 | GPT-5 Mini | 98.39% | 97.93% | 100.00% | 100.00% | 94.00% | 100.00% | 92.62%
8 | Qwen3.6 Max Preview | 98.34% | 91.71% | 100.00% | 100.00% | 100.00% | 100.00% | 94.54%
9 | Claude Opus 4.8 (Reasoning, Low) | 98.00% | 97.54% | 98.48% | 100.00% | 94.00% | 100.00% | 92.14%
10 | Claude Sonnet 4.6 (Reasoning) | 97.88% | 90.42% | 99.96% | 100.00% | 99.00% | 100.00% | 93.66%
11 | Claude Opus 4.7 (Reasoning) | 97.87% | 89.46% | 99.91% | 100.00% | 100.00% | 100.00% | 93.23%
12 | Z.AI GLM 5.1 | 97.51% | 92.55% | 100.00% | 100.00% | 95.00% | 100.00% | 94.37%
13 | Qwen 3.5 397B A17B | 97.50% | 87.52% | 100.00% | 100.00% | 100.00% | 100.00% | 91.73%
14 | MoonshotAI: Kimi K2.6 | 97.42% | 90.45% | 100.00% | 96.67% | 100.00% | 100.00% | 92.31%
15 | Gemini 3 Flash (Preview, Reasoning) | 97.20% | 86.00% | 100.00% | 100.00% | 100.00% | 100.00% | 90.50%
16 | GPT-5.4 (Reasoning) | 96.89% | 84.46% | 100.00% | 100.00% | 100.00% | 100.00% | 93.24%
17 | MoonshotAI: Kimi K2.5 | 96.63% | 83.17% | 100.00% | 100.00% | 100.00% | 100.00% | 91.04%
18 | GPT-5.5 (Reasoning) | 96.60% | 83.08% | 99.93% | 100.00% | 100.00% | 100.00% | 92.98%
19 | Qwen 3.5 Plus (2026-04-20) | 96.42% | 84.60% | 100.00% | 100.00% | 97.50% | 100.00% | 91.51%
20 | Qwen 3.5 35B | 96.42% | 82.08% | 100.00% | 100.00% | 100.00% | 100.00% | 88.00%
21 | Z.AI GLM 5 Turbo | 96.36% | 95.82% | 100.00% | 100.00% | 96.00% | 90.00% | 94.27%
22 | Qwen 3.5 122B | 96.36% | 81.81% | 100.00% | 100.00% | 100.00% | 100.00% | 91.53%
23 | GPT-5.5 (Reasoning, Low) | 96.36% | 81.80% | 100.00% | 100.00% | 100.00% | 100.00% | 92.59%
24 | Gemma 4 31B (Reasoning) | 96.32% | 84.12% | 99.99% | 100.00% | 97.50% | 100.00% | 91.71%
25 | o4 Mini | 96.31% | 90.72% | 98.35% | 100.00% | 92.50% | 100.00% | 88.35%
26 | GPT-5.2 | 96.22% | 84.32% | 99.77% | 100.00% | 97.00% | 100.00% | 90.26%
27 | Qwen 3.6 35B | 96.20% | 83.01% | 98.00% | 100.00% | 100.00% | 100.00% | 89.05%
28 | Gemini 3 Pro (Preview) | 96.14% | 80.75% | 99.97% | 100.00% | 100.00% | 100.00% | 88.79%
29 | Qwen 3.5 Flash | 96.11% | 80.57% | 100.00% | 100.00% | 100.00% | 100.00% | 86.38%
30 | Qwen 3.6 Flash | 96.09% | 82.44% | 100.00% | 100.00% | 98.00% | 100.00% | 90.65%
31 | Claude Opus 4.7 | 95.77% | 87.30% | 99.57% | 100.00% | 92.00% | 100.00% | 89.93%
32 | Gemma 4 26B (Reasoning) | 95.69% | 83.45% | 100.00% | 100.00% | 95.00% | 100.00% | 91.49%
33 | Qwen 3.5 27B | 95.67% | 85.82% | 92.55% | 100.00% | 100.00% | 100.00% | 90.85%
34 | MiniMax M2.7 | 95.50% | 94.05% | 95.47% | 100.00% | 88.00% | 100.00% | 89.10%
35 | Grok 4.20 (Beta, Reasoning) | 95.41% | 77.05% | 100.00% | 100.00% | 100.00% | 100.00% | 91.49%
36 | GPT-5.1 | 95.33% | 85.32% | 99.80% | 100.00% | 91.50% | 100.00% | 92.54%
37 | GPT-5.4 (Reasoning, Low) | 95.32% | 76.62% | 100.00% | 100.00% | 100.00% | 100.00% | 91.41%
38 | Nemotron 3 Super | 95.29% | 93.51% | 97.96% | 100.00% | 85.00% | 100.00% | 84.56%
39 | GPT-5.4 Mini (Reasoning) | 94.44% | 81.72% | 100.00% | 100.00% | 90.50% | 100.00% | 90.65%
40 | Qwen 3.6 27B | 94.32% | 81.10% | 100.00% | 100.00% | 95.50% | 95.00% | 89.72%
41 | Z.AI GLM 4.7 | 94.31% | 78.57% | 99.99% | 100.00% | 93.00% | 100.00% | 88.69%
42 | Z.AI GLM 5 | 94.11% | 75.55% | 100.00% | 100.00% | 95.00% | 100.00% | 91.23%
43 | Qwen 3.5 9B | 94.02% | 74.59% | 100.00% | 100.00% | 95.50% | 100.00% | 86.05%
44 | Gemini 3.1 Flash Lite (Preview) | 94.00% | 89.38% | 98.60% | 100.00% | 82.00% | 100.00% | 85.87%
45 | GPT-5 Nano | 93.91% | 86.54% | 97.99% | 100.00% | 85.00% | 100.00% | 82.60%
46 | MiniMax M3 | 93.59% | 82.35% | 92.60% | 100.00% | 93.00% | 100.00% | 90.88%
47 | GPT-5 | 93.53% | 96.13% | 100.00% | 100.00% | 76.50% | 95.00% | 91.93%
48 | GPT-5.4 Nano (Reasoning) | 93.34% | 84.86% | 95.83% | 100.00% | 91.00% | 95.00% | 81.36%
49 | DeepSeek V4 Pro (Reasoning) | 93.24% | 75.21% | 99.51% | 100.00% | 91.50% | 100.00% | 90.10%
50 | Grok 4.3 (Reasoning) | 92.94% | 91.18% | 100.00% | 100.00% | 73.50% | 100.00% | 93.60%
51 | Inception Mercury 2 | 92.86% | 93.76% | 97.02% | 100.00% | 73.50% | 100.00% | 83.85%
52 | Gemini 3.1 Flash Lite | 92.77% | 87.68% | 94.68% | 100.00% | 81.50% | 100.00% | 85.75%
53 | Grok 4.20 (Reasoning) | 92.61% | 75.57% | 100.00% | 100.00% | 87.50% | 100.00% | 91.39%
54 | Stealth: Aurora Alpha | 92.59% | 88.58% | 97.37% | 100.00% | 77.00% | 100.00% | 83.79%
55 | Gemini 3.1 Flash Lite (Reasoning) | 92.32% | 89.98% | 90.14% | 100.00% | 81.50% | 100.00% | 86.41%
56 | ByteDance Seed 2.0 Lite | 92.23% | 73.68% | 99.96% | 100.00% | 87.50% | 100.00% | 84.80%
57 | Gemini 2.5 Pro | 92.18% | 66.47% | 99.43% | 100.00% | 100.00% | 95.00% | 88.53%
58 | GPT-OSS 120B | 92.03% | 87.79% | 96.88% | 100.00% | 75.50% | 100.00% | 86.44%
59 | ByteDance Seed 2.0 Mini | 91.88% | 73.92% | 99.96% | 100.00% | 85.50% | 100.00% | 86.91%
60 | GPT-5.4 Nano (Reasoning, Low) | 91.42% | 73.42% | 93.19% | 100.00% | 90.50% | 100.00% | 79.48%
61 | Aion 2.0 | 90.91% | 65.48% | 89.10% | 100.00% | 100.00% | 100.00% | 89.21%
62 | ByteDance Seed 1.6 | 90.83% | 74.14% | 100.00% | 100.00% | 80.00% | 100.00% | 90.70%
63 | Claude Opus 4.6 | 90.72% | 83.65% | 99.93% | 100.00% | 70.00% | 100.00% | 92.35%
64 | GPT-4.1 | 90.57% | 81.33% | 86.01% | 100.00% | 85.50% | 100.00% | 88.68%
65 | MiniMax M2.5 | 90.42% | 86.03% | 75.58% | 100.00% | 90.50% | 100.00% | 88.71%
66 | Claude Opus 4.5 | 89.84% | 80.83% | 95.88% | 100.00% | 72.50% | 100.00% | 89.69%
67 | Grok 4 | 89.67% | 75.34% | 88.00% | 100.00% | 85.00% | 100.00% | 88.12%
68 | Gemini 2.5 Flash Lite (Reasoning) | 89.63% | 55.13% | 96.54% | 100.00% | 96.50% | 100.00% | 85.75%
69 | Z.AI GLM 4.7 Flash | 88.98% | 68.93% | 100.00% | 100.00% | 81.00% | 95.00% | 84.82%
70 | Claude Opus 4 | 88.81% | 74.37% | 80.19% | 100.00% | 89.50% | 100.00% | 87.69%
71 | Z.AI GLM 4.6 | 88.58% | 47.21% | 97.70% | 100.00% | 98.00% | 100.00% | 89.11%
72 | Claude Sonnet 4.6 | 88.52% | 71.94% | 94.66% | 100.00% | 76.00% | 100.00% | 91.15%
73 | GPT-5.4 Mini (Reasoning, Low) | 88.49% | 75.01% | 99.92% | 100.00% | 67.50% | 100.00% | 85.75%
74 | Llama 3.1 Nemotron 70B | 88.31% | 47.07% | 99.99% | 100.00% | 94.50% | 100.00% | 74.70%
75 | DeepSeek V4 Flash (Reasoning) | 87.53% | 49.57% | 93.07% | 100.00% | 95.00% | 100.00% | 89.01%
76 | Inception Mercury | 87.38% | 85.14% | 96.78% | 100.00% | 60.00% | 95.00% | 79.50%
77 | Gemma 4 31B | 86.69% | 64.43% | 99.99% | 100.00% | 69.00% | 100.00% | 86.91%
78 | Qwen 3.5 Plus (2026-02-15) | 86.65% | 44.24% | 100.00% | 100.00% | 89.00% | 100.00% | 85.96%
79 | Gemini 3 Flash (Preview) | 86.39% | 83.97% | 99.99% | 100.00% | 48.00% | 100.00% | 85.35%
80 | Nemotron 3 Nano | 86.00% | 86.70% | 92.81% | 100.00% | 50.50% | 100.00% | 77.73%
81 | Mistral Small 4 (Reasoning) | 85.61% | 56.00% | 96.06% | 100.00% | 76.00% | 100.00% | 82.39%
82 | Mistral Large 3 | 84.91% | 52.60% | 99.95% | 100.00% | 72.00% | 100.00% | 85.43%
83 | Stealth: Hunter Alpha | 84.63% | 55.55% | 87.94% | 96.67% | 83.00% | 100.00% | 87.34%
84 | ByteDance Seed 1.6 Flash | 84.16% | 48.55% | 82.27% | 100.00% | 90.00% | 100.00% | 73.27%
85 | Grok 4.1 Fast | 84.12% | 60.51% | 95.61% | 100.00% | 64.50% | 100.00% | 89.55%
86 | Grok 4.20 | 84.11% | 62.89% | 99.17% | 100.00% | 58.50% | 100.00% | 81.70%
87 | Claude Sonnet 4 | 84.02% | 53.92% | 84.17% | 100.00% | 82.00% | 100.00% | 88.72%
88 | Gemini 3.5 Flash (Reasoning, Minimal) | 83.90% | 73.28% | 79.24% | 100.00% | 67.00% | 100.00% | 86.47%
89 | DeepSeek-V2 Chat | 83.82% | 41.72% | 97.87% | 100.00% | 79.50% | 100.00% | 84.83%
90 | Claude Sonnet 4.5 | 83.78% | 58.93% | 91.98% | 100.00% | 68.00% | 100.00% | 88.03%
91 | DeepSeek V4 Flash | 83.26% | 54.98% | 99.32% | 100.00% | 62.00% | 100.00% | 82.02%
92 | Gemma 4 26B | 83.17% | 66.29% | 89.58% | 100.00% | 60.00% | 100.00% | 85.84%
93 | Qwen3 235B A22B Instruct 2507 | 83.15% | 40.15% | 80.58% | 100.00% | 95.00% | 100.00% | 80.10%
94 | GPT-4o, May 13th (temp=0) | 83.13% | 65.85% | 71.82% | 100.00% | 78.00% | 100.00% | 85.36%
95 | Xiaomi MIMO v2.5 Pro | 82.62% | 52.55% | 83.55% | 100.00% | 77.00% | 100.00% | 87.36%
96 | GPT-4o, Aug. 6th (temp=1) | 82.44% | 77.45% | 74.24% | 100.00% | 60.50% | 100.00% | 82.62%
97 | Stealth: Healer Alpha | 82.30% | 61.71% | 90.46% | 93.33% | 71.00% | 95.00% | 85.93%
98 | GPT-4.1 Mini | 82.30% | 66.59% | 79.41% | 100.00% | 65.50% | 100.00% | 83.20%
99 | Gemini 2.5 Flash (Reasoning) | 82.25% | 46.02% | 85.21% | 100.00% | 90.00% | 90.00% | 86.51%
100 | GPT-4o Mini (temp=1) | 82.16% | 78.18% | 79.62% | 100.00% | 53.00% | 100.00% | 79.08%
101 | Grok 4.20 (Beta) | 82.15% | 59.16% | 99.42% | 96.67% | 60.50% | 95.00% | 83.85%
102 | GPT-4o, Aug. 6th (temp=0) | 82.11% | 77.04% | 74.00% | 100.00% | 59.50% | 100.00% | 82.45%
103 | GPT-5.4 | 81.95% | 64.23% | 100.00% | 100.00% | 45.50% | 100.00% | 84.32%
104 | GPT-5.5 | 81.88% | 71.88% | 100.00% | 100.00% | 37.50% | 100.00% | 89.09%
105 | DeepSeek V3 (2024-12-26) | 81.87% | 58.30% | 88.56% | 100.00% | 62.50% | 100.00% | 83.68%
106 | Qwen 3 32B | 81.66% | 31.96% | 95.82% | 100.00% | 80.50% | 100.00% | 82.21%
107 | DeepSeek V3.2 | 81.58% | 45.81% | 92.09% | 100.00% | 70.00% | 100.00% | 82.25%
108 | GPT-4o Mini (temp=0) | 81.43% | 82.65% | 79.49% | 100.00% | 45.00% | 100.00% | 78.29%
109 | Xiaomi MIMO v2.5 | 81.15% | 54.18% | 86.72% | 93.33% | 71.50% | 100.00% | 85.05%
110 | Llama 3.1 70B | 81.03% | 56.62% | 80.53% | 100.00% | 68.00% | 100.00% | 78.40%
111 | GPT-4o, May 13th (temp=1) | 80.69% | 65.90% | 74.04% | 100.00% | 63.50% | 100.00% | 83.80%
112 | DeepSeek V3 (2025-03-24) | 80.62% | 58.32% | 97.96% | 93.33% | 53.50% | 100.00% | 81.99%
113 | Gemini 2.5 Flash Lite | 80.14% | 53.37% | 73.32% | 100.00% | 74.00% | 100.00% | 81.08%
114 | Mistral Medium 3.1 | 80.13% | 64.40% | 97.23% | 100.00% | 39.00% | 100.00% | 77.83%
115 | Writer: Palmyra X5 | 79.71% | 35.61% | 79.93% | 100.00% | 83.00% | 100.00% | 79.57%
116 | GPT-5.4 Mini | 79.37% | 66.13% | 98.23% | 100.00% | 32.50% | 100.00% | 82.43%
117 | Gemma 3 12B | 79.28% | 46.84% | 80.06% | 100.00% | 69.50% | 100.00% | 78.41%
118 | Z.AI GLM 4.5 | 79.19% | 38.11% | 74.82% | 100.00% | 83.00% | 100.00% | 86.27%
119 | Ministral 3 14B | 79.03% | 47.57% | 92.10% | 100.00% | 55.50% | 100.00% | 72.54%
120 | GPT-5.4 Nano | 78.57% | 47.90% | 93.46% | 100.00% | 51.50% | 100.00% | 74.40%
121 | Mistral Small 4 | 78.28% | 46.07% | 85.82% | 100.00% | 59.50% | 100.00% | 76.46%
122 | DeepSeek V4 Pro | 77.57% | 45.34% | 90.01% | 100.00% | 52.50% | 100.00% | 82.63%
123 | Gemma 3 27B | 76.82% | 45.66% | 95.44% | 100.00% | 43.00% | 100.00% | 77.85%
124 | Grok 4 Fast | 76.76% | 61.39% | 89.92% | 100.00% | 32.50% | 100.00% | 86.15%
125 | Claude 3.5 Sonnet | 76.75% | 53.50% | 93.24% | 100.00% | 37.00% | 100.00% | 84.24%
126 | DeepSeek V3.1 | 76.65% | 47.10% | 83.32% | 93.33% | 59.50% | 100.00% | 82.39%
127 | Z.AI GLM 4.5 Air | 76.57% | 34.68% | 83.68% | 100.00% | 64.50% | 100.00% | 83.12%
128 | Qwen 2.5 72B | 76.43% | 51.43% | 66.70% | 100.00% | 64.00% | 100.00% | 75.46%
129 | Mistral Small Creative | 76.28% | 33.57% | 96.83% | 100.00% | 51.00% | 100.00% | 73.27%
130 | Llama 3.1 8B | 74.82% | 56.94% | 85.64% | 90.00% | 46.50% | 95.00% | 63.35%
131 | Ministral 3 8B | 74.43% | 38.01% | 74.12% | 100.00% | 60.00% | 100.00% | 71.76%
132 | Mistral Small 3.2 24B | 73.17% | 42.38% | 99.97% | 100.00% | 23.50% | 100.00% | 78.58%
133 | Mistral Large | 73.04% | 31.90% | 63.80% | 100.00% | 69.50% | 100.00% | 80.15%
134 | Claude Haiku 4.5 | 72.48% | 51.60% | 70.28% | 70.00% | 70.50% | 100.00% | 85.14%
135 | Ministral 3 3B | 72.38% | 37.04% | 74.85% | 100.00% | 50.00% | 100.00% | 67.22%
136 | LFM2 24B | 69.48% | 44.98% | 63.93% | 100.00% | 38.50% | 100.00% | 58.77%
137 | Cydonia 24B V4.1 | 69.32% | 36.47% | 85.28% | 93.33% | 46.50% | 85.00% | 75.09%
138 | Mistral Large 2 | 69.19% | 21.18% | 58.78% | 100.00% | 66.00% | 100.00% | 82.41%
139 | Hermes 3 405B | 69.02% | 51.42% | 79.99% | 86.67% | 37.00% | 90.00% | 82.86%
140 | Claude 3 Haiku | 68.47% | 39.32% | 61.88% | 96.67% | 44.50% | 100.00% | 71.19%
141 | GPT-4.1 Nano | 68.45% | 64.57% | 75.17% | 100.00% | 52.50% | 50.00% | 71.94%
142 | WizardLM 2 8x22b | 67.14% | 20.73% | 71.48% | 90.00% | 53.50% | 100.00% | 71.06%
143 | Grok 4.3 | 66.41% | 33.50% | 89.39% | 86.67% | 27.50% | 95.00% | 78.66%
144 | Claude 3.7 Sonnet | 62.54% | 66.00% | 66.35% | 33.33% | 47.00% | 100.00% | 83.39%
145 | Gemini 2.5 Flash | 61.45% | 38.47% | 64.79% | 100.00% | 54.00% | 50.00% | 80.60%
146 | Hermes 3 70B | 61.15% | 28.43% | 58.14% | 66.67% | 52.50% | 100.00% | 72.57%
147 | Arcee AI: Trinity Large (Preview) | 60.74% | 43.69% | 57.16% | 73.33% | 44.50% | 85.00% | 73.33%
148 | Gemma 3 4B | 60.30% | 22.12% | 95.04% | 53.33% | 31.00% | 100.00% | 68.57%
149 | Arcee AI: Trinity Mini | 59.94% | 26.25% | 43.97% | 80.00% | 49.50% | 100.00% | 70.90%
150 | Cohere Command R+ (Aug. 2024) | 59.51% | 35.13% | 60.43% | 90.00% | 42.00% | 70.00% | 69.03%
151 | Skyfall 36B V2 | 52.53% | 18.30% | 52.83% | 70.00% | 46.50% | 75.00% | 65.76%
152 | Mistral NeMO | 51.55% | 15.70% | 47.57% | 40.00% | 54.50% | 100.00% | 65.04%
153 | Ministral 3B | 49.17% | 17.87% | 62.17% | 33.33% | 47.50% | 85.00% | 61.29%
154 | Rocinante 12B | 48.47% | 14.07% | 53.12% | 36.67% | 43.50% | 95.00% | 54.54%
155 | Ministral 8B | 46.82% | 21.93% | 60.35% | 33.33% | 33.50% | 85.00% | 64.87%
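In every row checked, the Utility column matches the unweighted mean of the five subcategory scores. Ranking by Utility therefore amounts to ranking by that mean. This is an inference from the table, not a documented rule. The minimal Python sketch below works under that assumption and uses five rows copied from the leaderboard. `utility` and `rank` are hypothetical helpers, not the benchmark's own code.

```python
# Rows copied from the leaderboard: model, the five subcategory scores in column order
# (Word, Sentence, Paragraph, Structural, Data Extraction), and the Utility score as shown.
SAMPLE = [
    ("Claude 3.7 Sonnet", [66.00, 66.35, 33.33, 47.00, 100.00], 62.54),
    ("GPT-5.5 (Reasoning, Low)", [81.80, 100.00, 100.00, 100.00, 100.00], 96.36),
    ("Qwen 3.5 122B", [81.81, 100.00, 100.00, 100.00, 100.00], 96.36),
    ("Z.AI GLM 5 Turbo", [95.82, 100.00, 100.00, 96.00, 90.00], 96.36),
    ("Gemini 3.1 Pro (Preview)", [99.56, 100.00, 100.00, 100.00, 100.00], 99.91),
]


def utility(subscores):
    """Assumed formula: the unweighted mean of the five subcategory scores."""
    return sum(subscores) / len(subscores)


def rank(rows):
    """Order rows by unrounded Utility, highest first."""
    return sorted(rows, key=lambda row: utility(row[1]), reverse=True)


for position, (model, subscores, shown) in enumerate(rank(SAMPLE), start=1):
    # Print three decimals so that models tied at two decimals can be told apart.
    print(f"{position}. {model}: {utility(subscores):.3f} (shown as {shown:.2f}%)")
```

Rows 21 to 23 all display 96.36%, and their order matches the unrounded means of 96.364, 96.362 and 96.360. This suggests that ties are broken on unrounded scores. The ranking also uses Utility rather than Overall. For example, Qwen3.7 Max has the higher Overall score (95.75% against 94.37%) but sits second, behind Gemini 3.1 Pro (Preview).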