Utility

32 scenarios across 5 subcategories. 155 models scored.

Model Leaderboard

All models ranked by their Utility category score.

| Rank | Model | Utility | Word Counting | Sentence Counting | Paragraph Counting | Structural Counting | Data Extraction | Overall |
|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro (Preview) | 99.91% | 99.56% | 100.00% | 100.00% | 100.00% | 100.00% | 94.08% |
| 2 | Qwen3.7 Max | 99.54% | 97.69% | 100.00% | 100.00% | 100.00% | 100.00% | 94.55% |
| 3 | Claude Opus 4.8 (Reasoning) | 99.26% | 97.26% | 99.07% | 100.00% | 100.00% | 100.00% | 92.33% |
| 4 | Claude Opus 4.6 (Reasoning) | 98.93% | 94.65% | 100.00% | 100.00% | 100.00% | 100.00% | 95.06% |
| 5 | Gemini 3.5 Flash (Reasoning) | 98.86% | 96.80% | 100.00% | 100.00% | 97.50% | 100.00% | 93.35% |
| 6 | o4 Mini High | 98.67% | 95.97% | 99.90% | 100.00% | 97.50% | 100.00% | 88.78% |
| 7 | GPT-5 Mini | 98.39% | 97.93% | 100.00% | 100.00% | 94.00% | 100.00% | 91.31% |
| 8 | Qwen3.6 Max Preview | 98.34% | 91.71% | 100.00% | 100.00% | 100.00% | 100.00% | 93.72% |
| 9 | Claude Opus 4.8 (Reasoning, Low) | 98.00% | 97.54% | 98.48% | 100.00% | 94.00% | 100.00% | 91.89% |
| 10 | Claude Sonnet 4.6 (Reasoning) | 97.88% | 90.42% | 99.96% | 100.00% | 99.00% | 100.00% | 93.64% |
| 11 | Claude Opus 4.7 (Reasoning) | 97.87% | 89.46% | 99.91% | 100.00% | 100.00% | 100.00% | 92.53% |
| 12 | Z.AI GLM 5.1 | 97.51% | 92.55% | 100.00% | 100.00% | 95.00% | 100.00% | 93.74% |
| 13 | Qwen 3.5 397B A17B | 97.50% | 87.52% | 100.00% | 100.00% | 100.00% | 100.00% | 91.09% |
| 14 | MoonshotAI: Kimi K2.6 | 97.42% | 90.45% | 100.00% | 96.67% | 100.00% | 100.00% | 92.57% |
| 15 | Gemini 3 Flash (Preview, Reasoning) | 97.20% | 86.00% | 100.00% | 100.00% | 100.00% | 100.00% | 89.93% |
| 16 | GPT-5.4 (Reasoning) | 96.89% | 84.46% | 100.00% | 100.00% | 100.00% | 100.00% | 93.85% |
| 17 | MoonshotAI: Kimi K2.5 | 96.63% | 83.17% | 100.00% | 100.00% | 100.00% | 100.00% | 90.86% |
| 18 | GPT-5.5 (Reasoning) | 96.60% | 83.08% | 99.93% | 100.00% | 100.00% | 100.00% | 93.72% |
| 19 | Qwen 3.5 Plus (2026-04-20) | 96.42% | 84.60% | 100.00% | 100.00% | 97.50% | 100.00% | 89.79% |
| 20 | Qwen 3.5 35B | 96.42% | 82.08% | 100.00% | 100.00% | 100.00% | 100.00% | 87.01% |
| 21 | Z.AI GLM 5 Turbo | 96.36% | 95.82% | 100.00% | 100.00% | 96.00% | 90.00% | 93.29% |
| 22 | Qwen 3.5 122B | 96.36% | 81.81% | 100.00% | 100.00% | 100.00% | 100.00% | 90.32% |
| 23 | GPT-5.5 (Reasoning, Low) | 96.36% | 81.80% | 100.00% | 100.00% | 100.00% | 100.00% | 92.51% |
| 24 | Gemma 4 31B (Reasoning) | 96.32% | 84.12% | 99.99% | 100.00% | 97.50% | 100.00% | 89.64% |
| 25 | o4 Mini | 96.31% | 90.72% | 98.35% | 100.00% | 92.50% | 100.00% | 86.56% |
| 26 | GPT-5.2 | 96.22% | 84.32% | 99.77% | 100.00% | 97.00% | 100.00% | 89.45% |
| 27 | Qwen 3.6 35B | 96.20% | 83.01% | 98.00% | 100.00% | 100.00% | 100.00% | 87.66% |
| 28 | Gemini 3 Pro (Preview) | 96.14% | 80.75% | 99.97% | 100.00% | 100.00% | 100.00% | 88.79% |
| 29 | Qwen 3.5 Flash | 96.11% | 80.57% | 100.00% | 100.00% | 100.00% | 100.00% | 85.66% |
| 30 | Qwen 3.6 Flash | 96.09% | 82.44% | 100.00% | 100.00% | 98.00% | 100.00% | 89.31% |
| 31 | Claude Opus 4.7 | 95.77% | 87.30% | 99.57% | 100.00% | 92.00% | 100.00% | 89.90% |
| 32 | Gemma 4 26B (Reasoning) | 95.69% | 83.45% | 100.00% | 100.00% | 95.00% | 100.00% | 89.02% |
| 33 | Qwen 3.5 27B | 95.67% | 85.82% | 92.55% | 100.00% | 100.00% | 100.00% | 90.05% |
| 34 | MiniMax M2.7 | 95.50% | 94.05% | 95.47% | 100.00% | 88.00% | 100.00% | 86.23% |
| 35 | Grok 4.20 (Beta, Reasoning) | 95.41% | 77.05% | 100.00% | 100.00% | 100.00% | 100.00% | 90.98% |
| 36 | GPT-5.1 | 95.33% | 85.32% | 99.80% | 100.00% | 91.50% | 100.00% | 90.73% |
| 37 | GPT-5.4 (Reasoning, Low) | 95.32% | 76.62% | 100.00% | 100.00% | 100.00% | 100.00% | 90.91% |
| 38 | Nemotron 3 Super | 95.29% | 93.51% | 97.96% | 100.00% | 85.00% | 100.00% | 78.99% |
| 39 | GPT-5.4 Mini (Reasoning) | 94.44% | 81.72% | 100.00% | 100.00% | 90.50% | 100.00% | 89.82% |
| 40 | Qwen 3.6 27B | 94.32% | 81.10% | 100.00% | 100.00% | 95.50% | 95.00% | 88.33% |
| 41 | Z.AI GLM 4.7 | 94.31% | 78.57% | 99.99% | 100.00% | 93.00% | 100.00% | 87.67% |
| 42 | Z.AI GLM 5 | 94.11% | 75.55% | 100.00% | 100.00% | 95.00% | 100.00% | 89.60% |
| 43 | Qwen 3.5 9B | 94.02% | 74.59% | 100.00% | 100.00% | 95.50% | 100.00% | 84.05% |
| 44 | Gemini 3.1 Flash Lite (Preview) | 94.00% | 89.38% | 98.60% | 100.00% | 82.00% | 100.00% | 85.41% |
| 45 | GPT-5 Nano | 93.91% | 86.54% | 97.99% | 100.00% | 85.00% | 100.00% | 80.16% |
| 46 | MiniMax M3 | 93.59% | 82.35% | 92.60% | 100.00% | 93.00% | 100.00% | 90.45% |
| 47 | GPT-5 | 93.53% | 96.13% | 100.00% | 100.00% | 76.50% | 95.00% | 91.48% |
| 48 | GPT-5.4 Nano (Reasoning) | 93.34% | 84.86% | 95.83% | 100.00% | 91.00% | 95.00% | 80.02% |
| 49 | DeepSeek V4 Pro (Reasoning) | 93.24% | 75.21% | 99.51% | 100.00% | 91.50% | 100.00% | 89.28% |
| 50 | Grok 4.3 (Reasoning) | 92.94% | 91.18% | 100.00% | 100.00% | 73.50% | 100.00% | 90.99% |
| 51 | Inception Mercury 2 | 92.86% | 93.76% | 97.02% | 100.00% | 73.50% | 100.00% | 81.99% |
| 52 | Gemini 3.1 Flash Lite | 92.77% | 87.68% | 94.68% | 100.00% | 81.50% | 100.00% | 85.09% |
| 53 | Grok 4.20 (Reasoning) | 92.61% | 75.57% | 100.00% | 100.00% | 87.50% | 100.00% | 90.87% |
| 54 | Stealth: Aurora Alpha | 92.59% | 88.58% | 97.37% | 100.00% | 77.00% | 100.00% | 83.79% |
| 55 | Gemini 3.1 Flash Lite (Reasoning) | 92.32% | 89.98% | 90.14% | 100.00% | 81.50% | 100.00% | 85.91% |
| 56 | ByteDance Seed 2.0 Lite | 92.23% | 73.68% | 99.96% | 100.00% | 87.50% | 100.00% | 84.27% |
| 57 | Gemini 2.5 Pro | 92.18% | 66.47% | 99.43% | 100.00% | 100.00% | 95.00% | 88.44% |
| 58 | GPT-OSS 120B | 92.03% | 87.79% | 96.88% | 100.00% | 75.50% | 100.00% | 84.81% |
| 59 | ByteDance Seed 2.0 Mini | 91.88% | 73.92% | 99.96% | 100.00% | 85.50% | 100.00% | 85.69% |
| 60 | GPT-5.4 Nano (Reasoning, Low) | 91.42% | 73.42% | 93.19% | 100.00% | 90.50% | 100.00% | 77.46% |
| 61 | Aion 2.0 | 90.91% | 65.48% | 89.10% | 100.00% | 100.00% | 100.00% | 86.66% |
| 62 | ByteDance Seed 1.6 | 90.83% | 74.14% | 100.00% | 100.00% | 80.00% | 100.00% | 89.59% |
| 63 | Claude Opus 4.6 | 90.72% | 83.65% | 99.93% | 100.00% | 70.00% | 100.00% | 92.31% |
| 64 | GPT-4.1 | 90.57% | 81.33% | 86.01% | 100.00% | 85.50% | 100.00% | 86.82% |
| 65 | MiniMax M2.5 | 90.42% | 86.03% | 75.58% | 100.00% | 90.50% | 100.00% | 86.71% |
| 66 | Claude Opus 4.5 | 89.84% | 80.83% | 95.88% | 100.00% | 72.50% | 100.00% | 89.60% |
| 67 | Grok 4 | 89.67% | 75.34% | 88.00% | 100.00% | 85.00% | 100.00% | 88.12% |
| 68 | Gemini 2.5 Flash Lite (Reasoning) | 89.63% | 55.13% | 96.54% | 100.00% | 96.50% | 100.00% | 83.10% |
| 69 | Z.AI GLM 4.7 Flash | 88.98% | 68.93% | 100.00% | 100.00% | 81.00% | 95.00% | 82.21% |
| 70 | Claude Opus 4 | 88.81% | 74.37% | 80.19% | 100.00% | 89.50% | 100.00% | 87.22% |
| 71 | Z.AI GLM 4.6 | 88.58% | 47.21% | 97.70% | 100.00% | 98.00% | 100.00% | 87.64% |
| 72 | Claude Sonnet 4.6 | 88.52% | 71.94% | 94.66% | 100.00% | 76.00% | 100.00% | 90.66% |
| 73 | GPT-5.4 Mini (Reasoning, Low) | 88.49% | 75.01% | 99.92% | 100.00% | 67.50% | 100.00% | 83.57% |
| 74 | Llama 3.1 Nemotron 70B | 88.31% | 47.07% | 99.99% | 100.00% | 94.50% | 100.00% | 74.70% |
| 75 | DeepSeek V4 Flash (Reasoning) | 87.53% | 49.57% | 93.07% | 100.00% | 95.00% | 100.00% | 88.06% |
| 76 | Inception Mercury | 87.38% | 85.14% | 96.78% | 100.00% | 60.00% | 95.00% | 79.50% |
| 77 | Gemma 4 31B | 86.69% | 64.43% | 99.99% | 100.00% | 69.00% | 100.00% | 85.23% |
| 78 | Qwen 3.5 Plus (2026-02-15) | 86.65% | 44.24% | 100.00% | 100.00% | 89.00% | 100.00% | 86.17% |
| 79 | Gemini 3 Flash (Preview) | 86.39% | 83.97% | 99.99% | 100.00% | 48.00% | 100.00% | 85.47% |
| 80 | Nemotron 3 Nano | 86.00% | 86.70% | 92.81% | 100.00% | 50.50% | 100.00% | 74.50% |
| 81 | Mistral Small 4 (Reasoning) | 85.61% | 56.00% | 96.06% | 100.00% | 76.00% | 100.00% | 79.48% |
| 82 | Mistral Large 3 | 84.91% | 52.60% | 99.95% | 100.00% | 72.00% | 100.00% | 84.29% |
| 83 | Stealth: Hunter Alpha | 84.63% | 55.55% | 87.94% | 96.67% | 83.00% | 100.00% | 87.34% |
| 84 | ByteDance Seed 1.6 Flash | 84.16% | 48.55% | 82.27% | 100.00% | 90.00% | 100.00% | 71.22% |
| 85 | Grok 4.1 Fast | 84.12% | 60.51% | 95.61% | 100.00% | 64.50% | 100.00% | 89.55% |
| 86 | Grok 4.20 | 84.11% | 62.89% | 99.17% | 100.00% | 58.50% | 100.00% | 81.21% |
| 87 | Claude Sonnet 4 | 84.02% | 53.92% | 84.17% | 100.00% | 82.00% | 100.00% | 87.64% |
| 88 | Gemini 3.5 Flash (Reasoning, Minimal) | 83.90% | 73.28% | 79.24% | 100.00% | 67.00% | 100.00% | 85.88% |
| 89 | DeepSeek-V2 Chat | 83.82% | 41.72% | 97.87% | 100.00% | 79.50% | 100.00% | 84.09% |
| 90 | Claude Sonnet 4.5 | 83.78% | 58.93% | 91.98% | 100.00% | 68.00% | 100.00% | 87.54% |
| 91 | DeepSeek V4 Flash | 83.26% | 54.98% | 99.32% | 100.00% | 62.00% | 100.00% | 82.02% |
| 92 | Gemma 4 26B | 83.17% | 66.29% | 89.58% | 100.00% | 60.00% | 100.00% | 84.89% |
| 93 | Qwen3 235B A22B Instruct 2507 | 83.15% | 40.15% | 80.58% | 100.00% | 95.00% | 100.00% | 78.07% |
| 94 | GPT-4o, May 13th (temp=0) | 83.13% | 65.85% | 71.82% | 100.00% | 78.00% | 100.00% | 84.73% |
| 95 | Xiaomi MIMO v2.5 Pro | 82.62% | 52.55% | 83.55% | 100.00% | 77.00% | 100.00% | 86.05% |
| 96 | GPT-4o, Aug. 6th (temp=1) | 82.44% | 77.45% | 74.24% | 100.00% | 60.50% | 100.00% | 81.28% |
| 97 | Stealth: Healer Alpha | 82.30% | 61.71% | 90.46% | 93.33% | 71.00% | 95.00% | 85.93% |
| 98 | GPT-4.1 Mini | 82.30% | 66.59% | 79.41% | 100.00% | 65.50% | 100.00% | 81.40% |
| 99 | Gemini 2.5 Flash (Reasoning) | 82.25% | 46.02% | 85.21% | 100.00% | 90.00% | 90.00% | 84.14% |
| 100 | GPT-4o Mini (temp=1) | 82.16% | 78.18% | 79.62% | 100.00% | 53.00% | 100.00% | 79.08% |
| 101 | Grok 4.20 (Beta) | 82.15% | 59.16% | 99.42% | 96.67% | 60.50% | 95.00% | 82.64% |
| 102 | GPT-4o, Aug. 6th (temp=0) | 82.11% | 77.04% | 74.00% | 100.00% | 59.50% | 100.00% | 82.18% |
| 103 | GPT-5.4 | 81.95% | 64.23% | 100.00% | 100.00% | 45.50% | 100.00% | 84.31% |
| 104 | GPT-5.5 | 81.88% | 71.88% | 100.00% | 100.00% | 37.50% | 100.00% | 89.37% |
| 105 | DeepSeek V3 (2024-12-26) | 81.87% | 58.30% | 88.56% | 100.00% | 62.50% | 100.00% | 82.62% |
| 106 | Qwen 3 32B | 81.66% | 31.96% | 95.82% | 100.00% | 80.50% | 100.00% | 79.37% |
| 107 | DeepSeek V3.2 | 81.58% | 45.81% | 92.09% | 100.00% | 70.00% | 100.00% | 82.22% |
| 108 | GPT-4o Mini (temp=0) | 81.43% | 82.65% | 79.49% | 100.00% | 45.00% | 100.00% | 78.29% |
| 109 | Xiaomi MIMO v2.5 | 81.15% | 54.18% | 86.72% | 93.33% | 71.50% | 100.00% | 83.95% |
| 110 | Llama 3.1 70B | 81.03% | 56.62% | 80.53% | 100.00% | 68.00% | 100.00% | 77.41% |
| 111 | GPT-4o, May 13th (temp=1) | 80.69% | 65.90% | 74.04% | 100.00% | 63.50% | 100.00% | 82.99% |
| 112 | DeepSeek V3 (2025-03-24) | 80.62% | 58.32% | 97.96% | 93.33% | 53.50% | 100.00% | 79.93% |
| 113 | Gemini 2.5 Flash Lite | 80.14% | 53.37% | 73.32% | 100.00% | 74.00% | 100.00% | 79.91% |
| 114 | Mistral Medium 3.1 | 80.13% | 64.40% | 97.23% | 100.00% | 39.00% | 100.00% | 76.08% |
| 115 | Writer: Palmyra X5 | 79.71% | 35.61% | 79.93% | 100.00% | 83.00% | 100.00% | 78.11% |
| 116 | GPT-5.4 Mini | 79.37% | 66.13% | 98.23% | 100.00% | 32.50% | 100.00% | 80.45% |
| 117 | Gemma 3 12B | 79.28% | 46.84% | 80.06% | 100.00% | 69.50% | 100.00% | 76.07% |
| 118 | Z.AI GLM 4.5 | 79.19% | 38.11% | 74.82% | 100.00% | 83.00% | 100.00% | 84.95% |
| 119 | Ministral 3 14B | 79.03% | 47.57% | 92.10% | 100.00% | 55.50% | 100.00% | 70.45% |
| 120 | GPT-5.4 Nano | 78.57% | 47.90% | 93.46% | 100.00% | 51.50% | 100.00% | 72.16% |
| 121 | Mistral Small 4 | 78.28% | 46.07% | 85.82% | 100.00% | 59.50% | 100.00% | 75.23% |
| 122 | DeepSeek V4 Pro | 77.57% | 45.34% | 90.01% | 100.00% | 52.50% | 100.00% | 82.05% |
| 123 | Gemma 3 27B | 76.82% | 45.66% | 95.44% | 100.00% | 43.00% | 100.00% | 75.70% |
| 124 | Grok 4 Fast | 76.76% | 61.39% | 89.92% | 100.00% | 32.50% | 100.00% | 86.15% |
| 125 | Claude 3.5 Sonnet | 76.75% | 53.50% | 93.24% | 100.00% | 37.00% | 100.00% | 84.24% |
| 126 | DeepSeek V3.1 | 76.65% | 47.10% | 83.32% | 93.33% | 59.50% | 100.00% | 82.35% |
| 127 | Z.AI GLM 4.5 Air | 76.57% | 34.68% | 83.68% | 100.00% | 64.50% | 100.00% | 80.74% |
| 128 | Qwen 2.5 72B | 76.43% | 51.43% | 66.70% | 100.00% | 64.00% | 100.00% | 73.17% |
| 129 | Mistral Small Creative | 76.28% | 33.57% | 96.83% | 100.00% | 51.00% | 100.00% | 73.27% |
| 130 | Llama 3.1 8B | 74.82% | 56.94% | 85.64% | 90.00% | 46.50% | 95.00% | 61.44% |
| 131 | Ministral 3 8B | 74.43% | 38.01% | 74.12% | 100.00% | 60.00% | 100.00% | 69.98% |
| 132 | Mistral Small 3.2 24B | 73.17% | 42.38% | 99.97% | 100.00% | 23.50% | 100.00% | 77.36% |
| 133 | Mistral Large | 73.04% | 31.90% | 63.80% | 100.00% | 69.50% | 100.00% | 79.91% |
| 134 | Claude Haiku 4.5 | 72.48% | 51.60% | 70.28% | 70.00% | 70.50% | 100.00% | 83.36% |
| 135 | Ministral 3 3B | 72.38% | 37.04% | 74.85% | 100.00% | 50.00% | 100.00% | 65.02% |
| 136 | LFM2 24B | 69.48% | 44.98% | 63.93% | 100.00% | 38.50% | 100.00% | 57.93% |
| 137 | Cydonia 24B V4.1 | 69.32% | 36.47% | 85.28% | 93.33% | 46.50% | 85.00% | 72.68% |
| 138 | Mistral Large 2 | 69.19% | 21.18% | 58.78% | 100.00% | 66.00% | 100.00% | 81.50% |
| 139 | Hermes 3 405B | 69.02% | 51.42% | 79.99% | 86.67% | 37.00% | 90.00% | 80.80% |
| 140 | Claude 3 Haiku | 68.47% | 39.32% | 61.88% | 96.67% | 44.50% | 100.00% | 70.13% |
| 141 | GPT-4.1 Nano | 68.45% | 64.57% | 75.17% | 100.00% | 52.50% | 50.00% | 69.90% |
| 142 | WizardLM 2 8x22b | 67.14% | 20.73% | 71.48% | 90.00% | 53.50% | 100.00% | 71.45% |
| 143 | Grok 4.3 | 66.41% | 33.50% | 89.39% | 86.67% | 27.50% | 95.00% | 78.00% |
| 144 | Claude 3.7 Sonnet | 62.54% | 66.00% | 66.35% | 33.33% | 47.00% | 100.00% | 83.39% |
| 145 | Gemini 2.5 Flash | 61.45% | 38.47% | 64.79% | 100.00% | 54.00% | 50.00% | 80.61% |
| 146 | Hermes 3 70B | 61.15% | 28.43% | 58.14% | 66.67% | 52.50% | 100.00% | 69.74% |
| 147 | Arcee AI: Trinity Large (Preview) | 60.74% | 43.69% | 57.16% | 73.33% | 44.50% | 85.00% | 73.33% |
| 148 | Gemma 3 4B | 60.30% | 22.12% | 95.04% | 53.33% | 31.00% | 100.00% | 66.33% |
| 149 | Arcee AI: Trinity Mini | 59.94% | 26.25% | 43.97% | 80.00% | 49.50% | 100.00% | 67.68% |
| 150 | Cohere Command R+ (Aug. 2024) | 59.51% | 35.13% | 60.43% | 90.00% | 42.00% | 70.00% | 67.04% |
| 151 | Skyfall 36B V2 | 52.53% | 18.30% | 52.83% | 70.00% | 46.50% | 75.00% | 63.65% |
| 152 | Mistral NeMO | 51.55% | 15.70% | 47.57% | 40.00% | 54.50% | 100.00% | 63.80% |
| 153 | Ministral 3B | 49.17% | 17.87% | 62.17% | 33.33% | 47.50% | 85.00% | 59.25% |
| 154 | Rocinante 12B | 48.47% | 14.07% | 53.12% | 36.67% | 43.50% | 95.00% | 54.02% |
| 155 | Ministral 8B | 46.82% | 21.93% | 60.35% | 33.33% | 33.50% | 85.00% | 63.77% |
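The page does not say how the Utility score is computed. However, the published figures agree, within rounding, with an unweighted mean of the five subcategory scores. The table's order also matches a sort on the unrounded mean. For example, ranks 21 to 23 all show 96.36%, but their means are 96.364, 96.362 and 96.360.

The sketch below checks that inference against a few rows copied from the table. The formula is an assumption, not the leaderboard's documented method. The names `ROWS`, `utility` and `rank` are hypothetical.

```python
from statistics import mean

# A few rows copied from the leaderboard above, deliberately out of order:
# (model, published Utility, [Word, Sentence, Paragraph, Structural, Data Extraction])
ROWS = [
    ("Claude Haiku 4.5", 72.48, [51.60, 70.28, 70.00, 70.50, 100.00]),
    ("GPT-5.5 (Reasoning, Low)", 96.36, [81.80, 100.00, 100.00, 100.00, 100.00]),
    ("Claude Opus 4.8 (Reasoning)", 99.26, [97.26, 99.07, 100.00, 100.00, 100.00]),
    ("Qwen 3.5 122B", 96.36, [81.81, 100.00, 100.00, 100.00, 100.00]),
    ("Z.AI GLM 5 Turbo", 96.36, [95.82, 100.00, 100.00, 96.00, 90.00]),
]


def utility(scores):
    """Assumed Utility score: unweighted mean of the five subcategory scores."""
    return mean(scores)


def rank(rows):
    """Sort rows by the unrounded mean, highest first (stable for exact ties)."""
    return sorted(rows, key=lambda row: utility(row[2]), reverse=True)


for model, published, scores in rank(ROWS):
    computed = utility(scores)
    # The subcategory inputs are already rounded to two decimals, so the
    # recomputed mean can differ from the published value by about 0.01.
    # (Opus 4.8 gives 99.266, which rounds to 99.27 against a published 99.26.)
    assert abs(round(computed, 2) - published) <= 0.015, model
    print(f"{model}: computed {computed:.3f}, published {published:.2f}")
```

Because the displayed scores are rounded, a model's position can depend on digits the table does not show. Ranks 97 and 98 are the exception: their rounded subcategory scores produce exactly the same mean (82.30). Their relative order therefore cannot be recovered from the table alone.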