Utility

32 scenarios across 5 subcategories. 91 models scored.

Model Leaderboard

All models ranked by their Utility category score.

Rank | Model | Utility | Word Counting | Sentence Counting | Paragraph Counting | Structural Counting | Data Extraction | Overall
1 Gemini 3.1 Pro (Preview) 99.91% 99.56% 100.00% 100.00% 100.00% 100.00% 94.37%
2 Claude Opus 4.6 (Reasoning) 98.93% 94.65% 100.00% 100.00% 100.00% 100.00% 95.02%
3 o4 Mini High 98.67% 95.97% 99.90% 100.00% 97.50% 100.00% 90.29%
4 GPT-5 Mini 98.39% 97.93% 100.00% 100.00% 94.00% 100.00% 92.62%
5 Claude Sonnet 4.6 (Reasoning) 97.88% 90.42% 99.96% 100.00% 99.00% 100.00% 93.66%
6 Qwen 3.5 397B A17B 97.50% 87.52% 100.00% 100.00% 100.00% 100.00% 91.73%
7 Gemini 3 Flash (Preview, Reasoning) 97.20% 86.00% 100.00% 100.00% 100.00% 100.00% 90.50%
8 MoonshotAI: Kimi K2.5 96.63% 83.17% 100.00% 100.00% 100.00% 100.00% 91.04%
9 Qwen 3.5 35B 96.42% 82.08% 100.00% 100.00% 100.00% 100.00% 88.00%
10 Qwen 3.5 122B 96.36% 81.81% 100.00% 100.00% 100.00% 100.00% 91.53%
11 o4 Mini 96.31% 90.72% 98.35% 100.00% 92.50% 100.00% 88.35%
12 GPT-5.2 96.22% 84.32% 99.77% 100.00% 97.00% 100.00% 90.26%
13 Gemini 3 Pro (Preview) 96.14% 80.75% 99.97% 100.00% 100.00% 100.00% 88.79%
14 Qwen 3.5 Flash 96.11% 80.57% 100.00% 100.00% 100.00% 100.00% 86.38%
15 Qwen 3.5 27B 95.67% 85.82% 92.55% 100.00% 100.00% 100.00% 90.85%
16 GPT-5.1 95.33% 85.32% 99.80% 100.00% 91.50% 100.00% 92.54%
17 Z.AI GLM 4.7 94.31% 78.57% 99.99% 100.00% 93.00% 100.00% 88.69%
18 Z.AI GLM 5 94.11% 75.55% 100.00% 100.00% 95.00% 100.00% 91.23%
19 GPT-5 Nano 93.91% 86.54% 97.99% 100.00% 85.00% 100.00% 82.60%
20 GPT-5 93.53% 96.13% 100.00% 100.00% 76.50% 95.00% 91.93%
21 Stealth: Aurora Alpha 92.59% 88.58% 97.37% 100.00% 77.00% 100.00% 83.79%
22 Gemini 2.5 Pro 92.18% 66.47% 99.43% 100.00% 100.00% 95.00% 88.53%
23 Aion 2.0 90.91% 65.48% 89.10% 100.00% 100.00% 100.00% 89.21%
24 ByteDance Seed 1.6 90.83% 74.14% 100.00% 100.00% 80.00% 100.00% 90.70%
25 Claude Opus 4.6 90.72% 83.65% 99.93% 100.00% 70.00% 100.00% 92.35%
26 GPT-4.1 90.57% 81.33% 86.01% 100.00% 85.50% 100.00% 88.68%
27 Minimax M2.5 90.42% 86.03% 75.58% 100.00% 90.50% 100.00% 88.71%
28 Claude Opus 4.5 89.84% 80.83% 95.88% 100.00% 72.50% 100.00% 89.69%
29 Grok 4 89.67% 75.34% 88.00% 100.00% 85.00% 100.00% 88.12%
30 Gemini 2.5 Flash Lite (Reasoning) 89.63% 55.13% 96.54% 100.00% 96.50% 100.00% 85.75%
31 Z.AI GLM 4.7 Flash 88.98% 68.93% 100.00% 100.00% 81.00% 95.00% 84.82%
32 Claude Opus 4 88.81% 74.37% 80.19% 100.00% 89.50% 100.00% 87.69%
33 Z.AI GLM 4.6 88.58% 47.21% 97.70% 100.00% 98.00% 100.00% 89.11%
34 Claude Sonnet 4.6 88.52% 71.94% 94.66% 100.00% 76.00% 100.00% 91.15%
35 Llama 3.1 Nemotron 70B 88.31% 47.07% 99.99% 100.00% 94.50% 100.00% 74.70%
36 Qwen 3.5 Plus (2026-02-15) 86.65% 44.24% 100.00% 100.00% 89.00% 100.00% 85.96%
37 Gemini 3 Flash (Preview) 86.39% 83.97% 99.99% 100.00% 48.00% 100.00% 85.35%
38 Mistral Large 3 84.91% 52.60% 99.95% 100.00% 72.00% 100.00% 85.43%
39 ByteDance Seed 1.6 Flash 84.16% 48.55% 82.27% 100.00% 90.00% 100.00% 73.27%
40 Grok 4.1 Fast 84.12% 60.51% 95.61% 100.00% 64.50% 100.00% 89.55%
41 Claude Sonnet 4 84.02% 53.92% 84.17% 100.00% 82.00% 100.00% 88.72%
42 DeepSeek-V2 Chat 83.82% 41.72% 97.87% 100.00% 79.50% 100.00% 84.83%
43 Claude Sonnet 4.5 83.78% 58.93% 91.98% 100.00% 68.00% 100.00% 88.03%
44 GPT-4o, May 13th (temp=0) 83.13% 65.85% 71.82% 100.00% 78.00% 100.00% 85.36%
45 Claude 3.5 Haiku 82.57% 46.03% 76.84% 100.00% 90.00% 100.00% 83.73%
46 GPT-4o, Aug. 6th (temp=1) 82.44% 77.45% 74.24% 100.00% 60.50% 100.00% 82.62%
47 GPT-4.1 Mini 82.30% 66.59% 79.41% 100.00% 65.50% 100.00% 83.20%
48 Gemini 2.5 Flash (Reasoning) 82.25% 46.02% 85.21% 100.00% 90.00% 90.00% 86.51%
49 GPT-4o Mini (temp=1) 82.16% 78.18% 79.62% 100.00% 53.00% 100.00% 79.08%
50 GPT-4o, Aug. 6th (temp=0) 82.11% 77.04% 74.00% 100.00% 59.50% 100.00% 82.45%
51 DeepSeek V3 (2024-12-26) 81.87% 58.30% 88.56% 100.00% 62.50% 100.00% 83.68%
52 DeepSeek V3.2 81.58% 45.81% 92.09% 100.00% 70.00% 100.00% 82.25%
53 GPT-4o Mini (temp=0) 81.43% 82.65% 79.49% 100.00% 45.00% 100.00% 78.29%
54 Llama 3.1 70B 81.03% 56.62% 80.53% 100.00% 68.00% 100.00% 78.40%
55 GPT-4o, May 13th (temp=1) 80.69% 65.90% 74.04% 100.00% 63.50% 100.00% 83.80%
56 DeepSeek V3 (2025-03-24) 80.62% 58.32% 97.96% 93.33% 53.50% 100.00% 81.99%
57 Gemini 2.5 Flash Lite 80.14% 53.37% 73.32% 100.00% 74.00% 100.00% 81.08%
58 Mistral Medium 3.1 80.13% 64.40% 97.23% 100.00% 39.00% 100.00% 77.83%
59 Writer: Palmyra X5 79.71% 35.61% 79.93% 100.00% 83.00% 100.00% 79.57%
60 Gemma 3 12B 79.28% 46.84% 80.06% 100.00% 69.50% 100.00% 78.41%
61 Z.AI GLM 4.5 79.19% 38.11% 74.82% 100.00% 83.00% 100.00% 86.27%
62 Ministral 3 14B 79.03% 47.57% 92.10% 100.00% 55.50% 100.00% 72.54%
63 Gemma 3 27B 76.82% 45.66% 95.44% 100.00% 43.00% 100.00% 77.85%
64 Grok 4 Fast 76.76% 61.39% 89.92% 100.00% 32.50% 100.00% 86.15%
65 Claude 3.5 Sonnet 76.75% 53.50% 93.24% 100.00% 37.00% 100.00% 84.24%
66 DeepSeek V3.1 76.65% 47.10% 83.32% 93.33% 59.50% 100.00% 82.39%
67 Qwen 2.5 72B 76.43% 51.43% 66.70% 100.00% 64.00% 100.00% 75.46%
68 Mistral Small Creative 76.28% 33.57% 96.83% 100.00% 51.00% 100.00% 73.27%
69 Llama 3.1 8B 74.82% 56.94% 85.64% 90.00% 46.50% 95.00% 63.37%
70 Ministral 3 8B 74.43% 38.01% 74.12% 100.00% 60.00% 100.00% 71.76%
71 Mistral Small 3.2 24B 73.17% 42.38% 99.97% 100.00% 23.50% 100.00% 78.60%
72 Mistral Large 73.04% 31.90% 63.80% 100.00% 69.50% 100.00% 80.15%
73 Claude Haiku 4.5 72.48% 51.60% 70.28% 70.00% 70.50% 100.00% 85.14%
74 Ministral 3 3B 72.38% 37.04% 74.85% 100.00% 50.00% 100.00% 67.22%
75 LFM2 24B 69.48% 44.98% 63.93% 100.00% 38.50% 100.00% 58.77%
76 Mistral Large 2 69.19% 21.18% 58.78% 100.00% 66.00% 100.00% 82.41%
77 Hermes 3 405B 69.02% 51.42% 79.99% 86.67% 37.00% 90.00% 82.86%
78 Claude 3 Haiku 68.47% 39.32% 61.88% 96.67% 44.50% 100.00% 71.19%
79 GPT-4.1 Nano 68.45% 64.57% 75.17% 100.00% 52.50% 50.00% 71.94%
80 WizardLM 2 8x22b 67.14% 20.73% 71.48% 90.00% 53.50% 100.00% 71.07%
81 Claude 3.7 Sonnet 62.54% 66.00% 66.35% 33.33% 47.00% 100.00% 83.39%
82 Gemini 2.5 Flash 61.45% 38.47% 64.79% 100.00% 54.00% 50.00% 80.60%
83 Hermes 3 70B 61.15% 28.43% 58.14% 66.67% 52.50% 100.00% 72.57%
84 Arcee AI: Trinity Large (Preview) 60.74% 43.69% 57.16% 73.33% 44.50% 85.00% 73.33%
85 Gemma 3 4B 60.30% 22.12% 95.04% 53.33% 31.00% 100.00% 68.57%
86 Arcee AI: Trinity Mini 59.94% 26.25% 43.97% 80.00% 49.50% 100.00% 70.90%
87 Cohere Command R+ (Aug. 2024) 59.51% 35.13% 60.43% 90.00% 42.00% 70.00% 69.03%
88 Mistral NeMO 51.55% 15.70% 47.57% 40.00% 54.50% 100.00% 65.04%
89 Ministral 3B 49.17% 17.87% 62.17% 33.33% 47.50% 85.00% 61.29%
90 Rocinante 12B 48.47% 14.07% 53.12% 36.67% 43.50% 95.00% 54.55%
91 Ministral 8B 46.82% 21.93% 60.35% 33.33% 33.50% 85.00% 64.87%
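
The page does not say how the Utility score is derived from the five subcategory scores. In the rows spot-checked here, Utility matches the unweighted mean of Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, and Data Extraction to within rounding. The sketch below is a minimal way to read a row and repeat that check. The helper names `parse_row` and `subcategory_mean` are hypothetical, and the equal-weight mean is an inference from the data, not a documented formula.

```python
def parse_row(line):
    """Split one leaderboard row into (rank, model, scores).

    Model names contain spaces and commas, so the row is read from the
    right: the last seven tokens are the percentage columns, the first
    token is the rank, and everything in between is the model name.
    """
    tokens = line.split()
    rank = int(tokens[0])
    scores = [float(t.rstrip("%")) for t in tokens[-7:]]
    model = " ".join(tokens[1:-7])
    return rank, model, scores


def subcategory_mean(scores):
    """Unweighted mean of the five subcategory columns.

    Assumed column order: scores[0] is Utility, scores[1:6] are the five
    subcategories, scores[6] is Overall.
    """
    return sum(scores[1:6]) / 5


# Three rows copied verbatim from the table above.
rows = [
    "1 Gemini 3.1 Pro (Preview) 99.91% 99.56% 100.00% 100.00% 100.00% 100.00% 94.37%",
    "20 GPT-5 93.53% 96.13% 100.00% 100.00% 76.50% 95.00% 91.93%",
    "44 GPT-4o, May 13th (temp=0) 83.13% 65.85% 71.82% 100.00% 78.00% 100.00% 85.36%",
]

for line in rows:
    rank, model, scores = parse_row(line)
    reported = scores[0]
    recomputed = subcategory_mean(scores)
    # Published values are rounded to two decimals, so compare with a tolerance.
    matches = abs(reported - recomputed) < 0.011
    print(f"{rank:>2} {model}: reported={reported:.2f} recomputed={recomputed:.2f} match={matches}")
```

The Overall column is not covered by this check. It appears to span more than the Utility category, and the page gives no formula for it.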