Utility

32 scenarios across 5 subcategories. 118 models scored.

Model Leaderboard

All models ranked by their Utility category score.

| # | Model | Utility | Word Counting | Sentence Counting | Paragraph Counting | Structural Counting | Data Extraction | Overall |
|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro (Preview) | 99.91% | 99.56% | 100.00% | 100.00% | 100.00% | 100.00% | 94.37% |
| 2 | Claude Opus 4.6 (Reasoning) | 98.93% | 94.65% | 100.00% | 100.00% | 100.00% | 100.00% | 95.02% |
| 3 | o4 Mini High | 98.67% | 95.97% | 99.90% | 100.00% | 97.50% | 100.00% | 90.29% |
| 4 | GPT-5 Mini | 98.39% | 97.93% | 100.00% | 100.00% | 94.00% | 100.00% | 92.62% |
| 5 | Claude Sonnet 4.6 (Reasoning) | 97.88% | 90.42% | 99.96% | 100.00% | 99.00% | 100.00% | 93.66% |
| 6 | Qwen 3.5 397B A17B | 97.50% | 87.52% | 100.00% | 100.00% | 100.00% | 100.00% | 91.73% |
| 7 | Gemini 3 Flash (Preview, Reasoning) | 97.20% | 86.00% | 100.00% | 100.00% | 100.00% | 100.00% | 90.50% |
| 8 | GPT-5.4 (Reasoning) | 96.89% | 84.46% | 100.00% | 100.00% | 100.00% | 100.00% | 93.24% |
| 9 | MoonshotAI: Kimi K2.5 | 96.63% | 83.17% | 100.00% | 100.00% | 100.00% | 100.00% | 91.04% |
| 10 | Qwen 3.5 35B | 96.42% | 82.08% | 100.00% | 100.00% | 100.00% | 100.00% | 88.00% |
| 11 | Z.AI GLM 5 Turbo | 96.36% | 95.82% | 100.00% | 100.00% | 96.00% | 90.00% | 94.27% |
| 12 | Qwen 3.5 122B | 96.36% | 81.81% | 100.00% | 100.00% | 100.00% | 100.00% | 91.53% |
| 13 | o4 Mini | 96.31% | 90.72% | 98.35% | 100.00% | 92.50% | 100.00% | 88.35% |
| 14 | GPT-5.2 | 96.22% | 84.32% | 99.77% | 100.00% | 97.00% | 100.00% | 90.26% |
| 15 | Gemini 3 Pro (Preview) | 96.14% | 80.75% | 99.97% | 100.00% | 100.00% | 100.00% | 88.79% |
| 16 | Qwen 3.5 Flash | 96.11% | 80.57% | 100.00% | 100.00% | 100.00% | 100.00% | 86.38% |
| 17 | Qwen 3.5 27B | 95.67% | 85.82% | 92.55% | 100.00% | 100.00% | 100.00% | 90.85% |
| 18 | MiniMax M2.7 | 95.50% | 94.05% | 95.47% | 100.00% | 88.00% | 100.00% | 89.10% |
| 19 | Grok 4.20 (Beta, Reasoning) | 95.41% | 77.05% | 100.00% | 100.00% | 100.00% | 100.00% | 91.49% |
| 20 | GPT-5.1 | 95.33% | 85.32% | 99.80% | 100.00% | 91.50% | 100.00% | 92.54% |
| 21 | GPT-5.4 (Reasoning, Low) | 95.32% | 76.62% | 100.00% | 100.00% | 100.00% | 100.00% | 91.41% |
| 22 | Nemotron 3 Super | 95.29% | 93.51% | 97.96% | 100.00% | 85.00% | 100.00% | 84.56% |
| 23 | GPT-5.4 Mini (Reasoning) | 94.44% | 81.72% | 100.00% | 100.00% | 90.50% | 100.00% | 90.65% |
| 24 | Z.AI GLM 4.7 | 94.31% | 78.57% | 99.99% | 100.00% | 93.00% | 100.00% | 88.69% |
| 25 | Z.AI GLM 5 | 94.11% | 75.55% | 100.00% | 100.00% | 95.00% | 100.00% | 91.23% |
| 26 | Qwen 3.5 9B | 94.02% | 74.59% | 100.00% | 100.00% | 95.50% | 100.00% | 86.05% |
| 27 | Gemini 3.1 Flash Lite (Preview) | 94.00% | 89.38% | 98.60% | 100.00% | 82.00% | 100.00% | 85.87% |
| 28 | GPT-5 Nano | 93.91% | 86.54% | 97.99% | 100.00% | 85.00% | 100.00% | 82.60% |
| 29 | GPT-5 | 93.53% | 96.13% | 100.00% | 100.00% | 76.50% | 95.00% | 91.93% |
| 30 | GPT-5.4 Nano (Reasoning) | 93.34% | 84.86% | 95.83% | 100.00% | 91.00% | 95.00% | 81.36% |
| 31 | Inception Mercury 2 | 92.86% | 93.76% | 97.02% | 100.00% | 73.50% | 100.00% | 83.85% |
| 32 | Stealth: Aurora Alpha | 92.59% | 88.58% | 97.37% | 100.00% | 77.00% | 100.00% | 83.79% |
| 33 | ByteDance Seed 2.0 Lite | 92.23% | 73.68% | 99.96% | 100.00% | 87.50% | 100.00% | 84.80% |
| 34 | Gemini 2.5 Pro | 92.18% | 66.47% | 99.43% | 100.00% | 100.00% | 95.00% | 88.53% |
| 35 | ByteDance Seed 2.0 Mini | 91.88% | 73.92% | 99.96% | 100.00% | 85.50% | 100.00% | 86.91% |
| 36 | GPT-5.4 Nano (Reasoning, Low) | 91.42% | 73.42% | 93.19% | 100.00% | 90.50% | 100.00% | 79.48% |
| 37 | Aion 2.0 | 90.91% | 65.48% | 89.10% | 100.00% | 100.00% | 100.00% | 89.21% |
| 38 | ByteDance Seed 1.6 | 90.83% | 74.14% | 100.00% | 100.00% | 80.00% | 100.00% | 90.70% |
| 39 | Claude Opus 4.6 | 90.72% | 83.65% | 99.93% | 100.00% | 70.00% | 100.00% | 92.35% |
| 40 | GPT-4.1 | 90.57% | 81.33% | 86.01% | 100.00% | 85.50% | 100.00% | 88.68% |
| 41 | MiniMax M2.5 | 90.42% | 86.03% | 75.58% | 100.00% | 90.50% | 100.00% | 88.71% |
| 42 | Claude Opus 4.5 | 89.84% | 80.83% | 95.88% | 100.00% | 72.50% | 100.00% | 89.69% |
| 43 | Grok 4 | 89.67% | 75.34% | 88.00% | 100.00% | 85.00% | 100.00% | 88.12% |
| 44 | Gemini 2.5 Flash Lite (Reasoning) | 89.63% | 55.13% | 96.54% | 100.00% | 96.50% | 100.00% | 85.75% |
| 45 | Z.AI GLM 4.7 Flash | 88.98% | 68.93% | 100.00% | 100.00% | 81.00% | 95.00% | 84.82% |
| 46 | Claude Opus 4 | 88.81% | 74.37% | 80.19% | 100.00% | 89.50% | 100.00% | 87.69% |
| 47 | Z.AI GLM 4.6 | 88.58% | 47.21% | 97.70% | 100.00% | 98.00% | 100.00% | 89.11% |
| 48 | Claude Sonnet 4.6 | 88.52% | 71.94% | 94.66% | 100.00% | 76.00% | 100.00% | 91.15% |
| 49 | GPT-5.4 Mini (Reasoning, Low) | 88.49% | 75.01% | 99.92% | 100.00% | 67.50% | 100.00% | 85.75% |
| 50 | Llama 3.1 Nemotron 70B | 88.31% | 47.07% | 99.99% | 100.00% | 94.50% | 100.00% | 74.70% |
| 51 | Inception Mercury | 87.38% | 85.14% | 96.78% | 100.00% | 60.00% | 95.00% | 79.50% |
| 52 | Qwen 3.5 Plus (2026-02-15) | 86.65% | 44.24% | 100.00% | 100.00% | 89.00% | 100.00% | 85.96% |
| 53 | Gemini 3 Flash (Preview) | 86.39% | 83.97% | 99.99% | 100.00% | 48.00% | 100.00% | 85.35% |
| 54 | Nemotron 3 Nano | 86.00% | 86.70% | 92.81% | 100.00% | 50.50% | 100.00% | 77.73% |
| 55 | Mistral Small 4 (Reasoning) | 85.61% | 56.00% | 96.06% | 100.00% | 76.00% | 100.00% | 82.39% |
| 56 | Mistral Large 3 | 84.91% | 52.60% | 99.95% | 100.00% | 72.00% | 100.00% | 85.43% |
| 57 | Stealth: Hunter Alpha | 84.63% | 55.55% | 87.94% | 96.67% | 83.00% | 100.00% | 87.34% |
| 58 | ByteDance Seed 1.6 Flash | 84.16% | 48.55% | 82.27% | 100.00% | 90.00% | 100.00% | 73.27% |
| 59 | Grok 4.1 Fast | 84.12% | 60.51% | 95.61% | 100.00% | 64.50% | 100.00% | 89.55% |
| 60 | Claude Sonnet 4 | 84.02% | 53.92% | 84.17% | 100.00% | 82.00% | 100.00% | 88.72% |
| 61 | DeepSeek-V2 Chat | 83.82% | 41.72% | 97.87% | 100.00% | 79.50% | 100.00% | 84.83% |
| 62 | Claude Sonnet 4.5 | 83.78% | 58.93% | 91.98% | 100.00% | 68.00% | 100.00% | 88.03% |
| 63 | Qwen3 235B A22B Instruct 2507 | 83.15% | 40.15% | 80.58% | 100.00% | 95.00% | 100.00% | 80.10% |
| 64 | GPT-4o, May 13th (temp=0) | 83.13% | 65.85% | 71.82% | 100.00% | 78.00% | 100.00% | 85.36% |
| 65 | Claude 3.5 Haiku | 82.57% | 46.03% | 76.84% | 100.00% | 90.00% | 100.00% | 83.73% |
| 66 | GPT-4o, Aug. 6th (temp=1) | 82.44% | 77.45% | 74.24% | 100.00% | 60.50% | 100.00% | 82.62% |
| 67 | Stealth: Healer Alpha | 82.30% | 61.71% | 90.46% | 93.33% | 71.00% | 95.00% | 85.93% |
| 68 | GPT-4.1 Mini | 82.30% | 66.59% | 79.41% | 100.00% | 65.50% | 100.00% | 83.20% |
| 69 | Gemini 2.5 Flash (Reasoning) | 82.25% | 46.02% | 85.21% | 100.00% | 90.00% | 90.00% | 86.51% |
| 70 | GPT-4o Mini (temp=1) | 82.16% | 78.18% | 79.62% | 100.00% | 53.00% | 100.00% | 79.08% |
| 71 | Grok 4.20 (Beta) | 82.15% | 59.16% | 99.42% | 96.67% | 60.50% | 95.00% | 83.85% |
| 72 | GPT-4o, Aug. 6th (temp=0) | 82.11% | 77.04% | 74.00% | 100.00% | 59.50% | 100.00% | 82.45% |
| 73 | GPT-5.4 | 81.95% | 64.23% | 100.00% | 100.00% | 45.50% | 100.00% | 84.32% |
| 74 | DeepSeek V3 (2024-12-26) | 81.87% | 58.30% | 88.56% | 100.00% | 62.50% | 100.00% | 83.68% |
| 75 | Qwen 3 32B | 81.66% | 31.96% | 95.82% | 100.00% | 80.50% | 100.00% | 82.21% |
| 76 | DeepSeek V3.2 | 81.58% | 45.81% | 92.09% | 100.00% | 70.00% | 100.00% | 82.25% |
| 77 | GPT-4o Mini (temp=0) | 81.43% | 82.65% | 79.49% | 100.00% | 45.00% | 100.00% | 78.29% |
| 78 | Llama 3.1 70B | 81.03% | 56.62% | 80.53% | 100.00% | 68.00% | 100.00% | 78.40% |
| 79 | GPT-4o, May 13th (temp=1) | 80.69% | 65.90% | 74.04% | 100.00% | 63.50% | 100.00% | 83.80% |
| 80 | DeepSeek V3 (2025-03-24) | 80.62% | 58.32% | 97.96% | 93.33% | 53.50% | 100.00% | 81.99% |
| 81 | Gemini 2.5 Flash Lite | 80.14% | 53.37% | 73.32% | 100.00% | 74.00% | 100.00% | 81.08% |
| 82 | Mistral Medium 3.1 | 80.13% | 64.40% | 97.23% | 100.00% | 39.00% | 100.00% | 77.83% |
| 83 | Writer: Palmyra X5 | 79.71% | 35.61% | 79.93% | 100.00% | 83.00% | 100.00% | 79.57% |
| 84 | GPT-5.4 Mini | 79.37% | 66.13% | 98.23% | 100.00% | 32.50% | 100.00% | 82.43% |
| 85 | Gemma 3 12B | 79.28% | 46.84% | 80.06% | 100.00% | 69.50% | 100.00% | 78.41% |
| 86 | Z.AI GLM 4.5 | 79.19% | 38.11% | 74.82% | 100.00% | 83.00% | 100.00% | 86.27% |
| 87 | Ministral 3 14B | 79.03% | 47.57% | 92.10% | 100.00% | 55.50% | 100.00% | 72.54% |
| 88 | GPT-5.4 Nano | 78.57% | 47.90% | 93.46% | 100.00% | 51.50% | 100.00% | 74.40% |
| 89 | Mistral Small 4 | 78.28% | 46.07% | 85.82% | 100.00% | 59.50% | 100.00% | 76.46% |
| 90 | Gemma 3 27B | 76.82% | 45.66% | 95.44% | 100.00% | 43.00% | 100.00% | 77.85% |
| 91 | Grok 4 Fast | 76.76% | 61.39% | 89.92% | 100.00% | 32.50% | 100.00% | 86.15% |
| 92 | Claude 3.5 Sonnet | 76.75% | 53.50% | 93.24% | 100.00% | 37.00% | 100.00% | 84.24% |
| 93 | DeepSeek V3.1 | 76.65% | 47.10% | 83.32% | 93.33% | 59.50% | 100.00% | 82.39% |
| 94 | Qwen 2.5 72B | 76.43% | 51.43% | 66.70% | 100.00% | 64.00% | 100.00% | 75.46% |
| 95 | Mistral Small Creative | 76.28% | 33.57% | 96.83% | 100.00% | 51.00% | 100.00% | 73.27% |
| 96 | Llama 3.1 8B | 74.82% | 56.94% | 85.64% | 90.00% | 46.50% | 95.00% | 63.37% |
| 97 | Ministral 3 8B | 74.43% | 38.01% | 74.12% | 100.00% | 60.00% | 100.00% | 71.76% |
| 98 | Mistral Small 3.2 24B | 73.17% | 42.38% | 99.97% | 100.00% | 23.50% | 100.00% | 78.60% |
| 99 | Mistral Large | 73.04% | 31.90% | 63.80% | 100.00% | 69.50% | 100.00% | 80.15% |
| 100 | Claude Haiku 4.5 | 72.48% | 51.60% | 70.28% | 70.00% | 70.50% | 100.00% | 85.14% |
| 101 | Ministral 3 3B | 72.38% | 37.04% | 74.85% | 100.00% | 50.00% | 100.00% | 67.22% |
| 102 | LFM2 24B | 69.48% | 44.98% | 63.93% | 100.00% | 38.50% | 100.00% | 58.77% |
| 103 | Mistral Large 2 | 69.19% | 21.18% | 58.78% | 100.00% | 66.00% | 100.00% | 82.41% |
| 104 | Hermes 3 405B | 69.02% | 51.42% | 79.99% | 86.67% | 37.00% | 90.00% | 82.86% |
| 105 | Claude 3 Haiku | 68.47% | 39.32% | 61.88% | 96.67% | 44.50% | 100.00% | 71.19% |
| 106 | GPT-4.1 Nano | 68.45% | 64.57% | 75.17% | 100.00% | 52.50% | 50.00% | 71.94% |
| 107 | WizardLM 2 8x22b | 67.14% | 20.73% | 71.48% | 90.00% | 53.50% | 100.00% | 71.07% |
| 108 | Claude 3.7 Sonnet | 62.54% | 66.00% | 66.35% | 33.33% | 47.00% | 100.00% | 83.39% |
| 109 | Gemini 2.5 Flash | 61.45% | 38.47% | 64.79% | 100.00% | 54.00% | 50.00% | 80.60% |
| 110 | Hermes 3 70B | 61.15% | 28.43% | 58.14% | 66.67% | 52.50% | 100.00% | 72.57% |
| 111 | Arcee AI: Trinity Large (Preview) | 60.74% | 43.69% | 57.16% | 73.33% | 44.50% | 85.00% | 73.33% |
| 112 | Gemma 3 4B | 60.30% | 22.12% | 95.04% | 53.33% | 31.00% | 100.00% | 68.57% |
| 113 | Arcee AI: Trinity Mini | 59.94% | 26.25% | 43.97% | 80.00% | 49.50% | 100.00% | 70.90% |
| 114 | Cohere Command R+ (Aug. 2024) | 59.51% | 35.13% | 60.43% | 90.00% | 42.00% | 70.00% | 69.03% |
| 115 | Mistral NeMO | 51.55% | 15.70% | 47.57% | 40.00% | 54.50% | 100.00% | 65.04% |
| 116 | Ministral 3B | 49.17% | 17.87% | 62.17% | 33.33% | 47.50% | 85.00% | 61.29% |
| 117 | Rocinante 12B | 48.47% | 14.07% | 53.12% | 36.67% | 43.50% | 95.00% | 54.55% |
| 118 | Ministral 8B | 46.82% | 21.93% | 60.35% | 33.33% | 33.50% | 85.00% | 64.87% |
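
The page does not say how the Utility score is derived from the five subcategory columns. In the rows spot-checked, it matches their unweighted mean to within rounding. That is an inference from the published numbers, not a documented formula. The Python sketch below shows one way to read a leaderboard row and test that assumption. The helper names `parse_row` and `utility_matches_mean` are hypothetical, and the parser assumes that every row ends with exactly seven percentage values (Utility, the five subcategories, Overall). It accepts rows separated by spaces or by pipes.

```python
from statistics import mean

# Subcategory columns in the order they appear in the leaderboard header.
SUBCATEGORIES = (
    "Word Counting",
    "Sentence Counting",
    "Paragraph Counting",
    "Structural Counting",
    "Data Extraction",
)


def parse_row(line):
    """Split one leaderboard row into rank, model name, and scores.

    Model names contain spaces, so the row is read from both ends:
    the first token is the rank and the last seven tokens are percentages.
    """
    tokens = line.replace("|", " ").split()
    scores = [float(token.rstrip("%")) for token in tokens[-7:]]
    return {
        "rank": int(tokens[0]),
        "model": " ".join(tokens[1:-7]),
        "utility": scores[0],
        "subscores": dict(zip(SUBCATEGORIES, scores[1:6])),
        "overall": scores[6],
    }


def utility_matches_mean(row, tolerance=0.01):
    """Check the ASSUMED relationship: Utility == mean of the five subscores.

    The tolerance absorbs the two-decimal rounding of the published values.
    """
    expected = mean(list(row["subscores"].values()))
    return abs(expected - row["utility"]) <= tolerance


# Three rows copied from the leaderboard (top, middle, and lower ranks).
rows = [
    parse_row(line)
    for line in (
        "1 Gemini 3.1 Pro (Preview) 99.91% 99.56% 100.00% 100.00% 100.00% 100.00% 94.37%",
        "11 Z.AI GLM 5 Turbo 96.36% 95.82% 100.00% 100.00% 96.00% 90.00% 94.27%",
        "108 Claude 3.7 Sonnet 62.54% 66.00% 66.35% 33.33% 47.00% 100.00% 83.39%",
    )
]

for row in rows:
    print(row["rank"], row["model"], utility_matches_mean(row))
```

If the assumption holds, the Overall column is not an input to the Utility score. It is reported alongside it, and the leaderboard is sorted by Utility alone.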