Attention

Subcategory of Reasoning. 155 models scored.

Model Leaderboard

All models are ranked by their Attention subcategory score, highest first. The Reasoning and Overall columns are shown alongside for comparison.

#	Model	Attention	Reasoning	Overall
1	Claude Opus 4.5	98.41%	93.93%	89.69%
2	Claude Opus 4.6 (Reasoning)	98.10%	93.77%	95.02%
3	Claude Opus 4.6	97.78%	93.33%	92.35%
4	Claude Opus 4.8 (Reasoning)	97.61%	93.53%	92.22%
5	Gemini 3.5 Flash (Reasoning)	97.41%	98.45%	94.08%
6	Claude Opus 4.8 (Reasoning, Low)	97.30%	93.37%	92.14%
7	Z.AI GLM 5.1	97.16%	96.83%	94.37%
8	Grok 4.20 (Reasoning)	97.12%	79.11%	91.39%
9	Grok 4	97.02%	96.01%	88.12%
10	Qwen3.6 Max Preview	97.02%	96.51%	94.54%
11	Grok 4.20 (Beta, Reasoning)	96.94%	82.64%	91.49%
12	Gemini 2.5 Pro	96.82%	96.91%	88.53%
13	Qwen3.7 Max	96.68%	95.84%	95.75%
14	Gemini 3 Flash (Preview, Reasoning)	96.60%	98.05%	90.50%
15	Z.AI GLM 5 Turbo	96.34%	95.67%	94.27%
16	GPT-5	96.33%	95.67%	91.93%
17	Claude Sonnet 4.6 (Reasoning)	96.08%	92.76%	93.66%
18	Grok 4.1 Fast	96.05%	93.58%	89.55%
19	Gemini 3.1 Pro (Preview)	96.03%	96.01%	94.37%
20	Grok 4.3 (Reasoning)	95.92%	95.46%	93.60%
21	MiniMax M3	95.91%	91.84%	90.88%
22	Gemma 4 31B (Reasoning)	95.88%	97.19%	91.71%
23	Claude Opus 4.7 (Reasoning)	95.87%	95.44%	93.23%
24	Z.AI GLM 5	95.78%	95.89%	91.23%
25	DeepSeek V4 Flash (Reasoning)	95.71%	95.08%	89.01%
26	Claude Sonnet 4.5	95.56%	92.50%	88.03%
27	Qwen 3.5 Plus (2026-04-20)	95.53%	94.99%	91.51%
28	Aion 2.0	95.49%	94.13%	89.21%
29	Gemini 3 Pro (Preview)	95.47%	95.24%	88.79%
30	MoonshotAI: Kimi K2.6	95.44%	95.22%	92.31%
31	Gemini 3.5 Flash (Reasoning, Minimal)	95.40%	94.12%	86.47%
32	GPT-5.5	95.33%	95.16%	89.09%
33	GPT-5.1	95.29%	95.14%	92.54%
34	Qwen 3.5 397B A17B	95.13%	95.06%	91.73%
35	o4 Mini High	95.05%	95.02%	90.29%
36	Gemma 4 31B	95.04%	96.52%	86.91%
37	GPT-5.5 (Reasoning, Low)	95.03%	95.01%	92.59%
38	Z.AI GLM 4.7	94.98%	94.99%	88.69%
39	Qwen 3.5 122B	94.87%	94.93%	91.53%
40	MoonshotAI: Kimi K2.5	94.83%	95.41%	91.04%
41	Grok 4 Fast	94.78%	94.89%	86.15%
42	GPT-5.5 (Reasoning)	94.78%	94.89%	92.98%
43	DeepSeek V4 Pro (Reasoning)	94.77%	94.61%	90.10%
44	Qwen 3.5 35B	94.75%	94.88%	88.00%
45	Xiaomi MIMO v2.5 Pro	94.70%	92.07%	87.36%
46	Gemini 3 Flash (Preview)	94.58%	94.79%	85.35%
47	GPT-5.4 (Reasoning)	94.55%	94.78%	93.24%
48	Claude Sonnet 4	94.52%	94.48%	88.72%
49	ByteDance Seed 2.0 Lite	94.49%	95.50%	84.80%
50	Qwen 3.5 Flash	94.31%	94.66%	86.38%
51	Xiaomi MIMO v2.5	94.31%	92.43%	85.05%
52	GPT-5.4 Mini (Reasoning)	94.13%	94.56%	90.65%
53	Claude Opus 4.7	93.95%	95.72%	89.93%
54	ByteDance Seed 2.0 Mini	93.68%	92.40%	86.91%
55	GPT-5.4 (Reasoning, Low)	93.67%	94.34%	91.41%
56	GPT-5.2	93.58%	94.54%	90.26%
57	Qwen 3.6 Flash	93.41%	94.20%	90.65%
58	o4 Mini	93.40%	94.45%	88.35%
59	Stealth: Hunter Alpha	93.39%	91.67%	87.34%
60	Stealth: Healer Alpha	93.35%	91.67%	85.93%
61	Claude Opus 4	93.24%	92.59%	87.69%
62	GPT-5 Mini	93.22%	94.36%	92.62%
63	Qwen 3.6 35B	93.16%	94.08%	89.05%
64	Claude Sonnet 4.6	93.07%	88.48%	91.15%
65	ByteDance Seed 1.6	92.98%	91.49%	90.70%
66	Qwen 3.5 Plus (2026-02-15)	92.96%	93.45%	85.96%
67	GPT-5.4	92.85%	93.92%	84.32%
68	Z.AI GLM 4.6	92.74%	95.12%	89.11%
69	Qwen 3.6 27B	92.21%	93.08%	89.72%
70	Gemini 2.5 Flash (Reasoning)	91.73%	93.81%	86.51%
71	MiniMax M2.7	91.61%	93.28%	89.10%
72	Gemma 4 26B (Reasoning)	91.42%	95.21%	91.49%
73	Gemini 2.5 Flash Lite (Reasoning)	91.22%	93.86%	85.75%
74	Nemotron 3 Super	91.22%	93.11%	84.56%
75	Claude 3.5 Sonnet	91.16%	90.30%	84.24%
76	Qwen 3.5 9B	90.87%	92.93%	86.05%
77	Qwen 3.5 27B	90.46%	92.73%	90.85%
78	Claude 3.7 Sonnet	90.43%	89.94%	83.39%
79	Gemini 2.5 Flash	90.21%	92.60%	80.60%
80	DeepSeek V4 Pro	89.98%	87.07%	82.63%
81	Gemini 3.1 Flash Lite (Preview)	89.85%	92.15%	85.87%
82	GPT-OSS 120B	89.84%	92.42%	86.44%
83	Z.AI GLM 4.5	89.84%	91.03%	86.27%
84	GPT-5.4 Mini (Reasoning, Low)	89.57%	92.28%	85.75%
85	DeepSeek V4 Flash	89.38%	79.16%	82.02%
86	MiniMax M2.5	89.33%	92.42%	88.71%
87	Mistral Medium 3.1	89.19%	89.32%	77.83%
88	Gemini 3.1 Flash Lite	89.19%	92.10%	85.75%
89	Inception Mercury 2	89.05%	92.03%	83.85%
90	Gemini 3.1 Flash Lite (Reasoning)	89.00%	91.72%	86.41%
91	DeepSeek V3.2	88.87%	89.46%	82.25%
92	Mistral Large 2	88.62%	88.20%	82.41%
93	Mistral Large 3	88.45%	88.95%	85.43%
94	Grok 4.20	88.11%	84.81%	81.70%
95	DeepSeek-V2 Chat	87.95%	88.70%	84.83%
96	GPT-4o, May 13th (temp=0)	87.71%	88.58%	85.36%
97	Grok 4.20 (Beta)	87.37%	87.05%	83.85%
98	DeepSeek V3 (2025-03-24)	87.29%	88.45%	81.99%
99	DeepSeek V3 (2024-12-26)	87.02%	88.71%	83.68%
100	Z.AI GLM 4.5 Air	86.99%	87.91%	83.12%
101	GPT-4.1	86.98%	88.46%	88.68%
102	Mistral Large	86.57%	76.31%	80.15%
103	Claude Haiku 4.5	86.08%	87.76%	85.14%
104	GPT-4o, Aug. 6th (temp=0)	85.73%	87.59%	82.45%
105	Stealth: Aurora Alpha	85.21%	90.11%	83.79%
106	Mistral Small 4 (Reasoning)	85.12%	87.78%	82.39%
107	Nemotron 3 Nano	84.82%	89.91%	77.73%
108	GPT-4o, Aug. 6th (temp=1)	84.37%	86.91%	82.62%
109	GPT-5 Nano	84.22%	89.61%	82.60%
110	Z.AI GLM 4.7 Flash	83.99%	89.50%	84.82%
111	Qwen 3 32B	83.81%	86.35%	82.21%
112	Writer: Palmyra X5	83.69%	86.57%	79.57%
113	Grok 4.3	83.35%	84.62%	78.66%
114	Qwen3 235B A22B Instruct 2507	83.25%	85.82%	80.10%
115	GPT-5.4 Nano (Reasoning)	83.07%	88.48%	81.36%
116	Mistral Small Creative	83.04%	87.99%	73.27%
117	DeepSeek V3.1	82.96%	83.95%	82.39%
118	Hermes 3 405B	82.88%	85.58%	82.86%
119	GPT-4o, May 13th (temp=1)	82.52%	85.98%	83.80%
120	GPT-4.1 Mini	82.22%	85.83%	83.20%
121	ByteDance Seed 1.6 Flash	81.38%	86.52%	73.27%
122	GPT-5.4 Mini	80.08%	88.04%	82.43%
123	Mistral Small 3.2 24B	79.53%	81.71%	78.58%
124	Gemma 4 26B	79.04%	88.02%	85.84%
125	Gemma 3 27B	78.48%	86.74%	77.85%
126	Qwen 2.5 72B	77.42%	83.43%	75.46%
127	Llama 3.1 Nemotron 70B	77.15%	82.19%	74.70%
128	Ministral 3 14B	77.03%	83.24%	72.54%
129	Inception Mercury	76.92%	85.96%	79.50%
130	Gemini 2.5 Flash Lite	76.60%	85.80%	81.08%
131	Ministral 3 8B	74.95%	71.64%	71.76%
132	Llama 3.1 70B	73.63%	79.31%	78.40%
133	Arcee AI: Trinity Mini	71.60%	76.94%	70.90%
134	Ministral 8B	71.56%	73.78%	64.87%
135	Arcee AI: Trinity Large (Preview)	70.59%	77.24%	73.33%
136	Cydonia 24B V4.1	69.33%	76.47%	75.09%
137	Hermes 3 70B	68.76%	79.08%	72.57%
138	GPT-4o Mini (temp=0)	68.07%	81.26%	78.29%
139	Mistral Small 4	67.95%	78.72%	76.46%
140	GPT-4o Mini (temp=1)	67.12%	80.28%	79.08%
141	Claude 3 Haiku	66.98%	77.94%	71.19%
142	Ministral 3B	66.56%	69.70%	61.29%
143	Cohere Command R+ (Aug. 2024)	66.37%	65.10%	69.03%
144	Ministral 3 3B	65.98%	71.88%	67.22%
145	WizardLM 2 8x22b	64.77%	67.36%	71.06%
146	Gemma 3 12B	63.84%	79.42%	78.41%
147	GPT-5.4 Nano (Reasoning, Low)	62.85%	78.93%	79.48%
148	GPT-5.4 Nano	58.27%	75.66%	74.40%
149	Skyfall 36B V2	53.52%	62.23%	65.76%
150	Gemma 3 4B	52.28%	73.64%	68.57%
151	Llama 3.1 8B	51.46%	69.12%	63.35%
152	GPT-4.1 Nano	46.04%	70.24%	71.94%
153	Rocinante 12B	33.73%	54.31%	54.54%
154	LFM2 24B	30.88%	54.88%	58.77%
155	Mistral NeMO	27.84%	57.59%	65.04%
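
The ranking above can be reproduced or re-sorted by another column with a short script. The following is a minimal sketch, assuming the table is available as tab-separated text with the header row shown above. The helper names (`parse_percent`, `load_rows`, `rank_by`) are hypothetical, and keeping tied scores in their original order is an assumption, since the page does not say how ties are broken.

```python
import csv
import io

# Three rows copied from the leaderboard above (tab-separated).
SAMPLE = (
    "#\tModel\tAttention\tReasoning\tOverall\n"
    "1\tClaude Opus 4.5\t98.41%\t93.93%\t89.69%\n"
    "9\tGrok 4\t97.02%\t96.01%\t88.12%\n"
    "10\tQwen3.6 Max Preview\t97.02%\t96.51%\t94.54%\n"
)


def parse_percent(text):
    """Convert a string such as '98.41%' to the float 98.41."""
    return float(text.strip().rstrip("%"))


def load_rows(tsv_text):
    """Parse the tab-separated table into a list of dicts with numeric scores."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [
        {
            "model": record["Model"],
            "attention": parse_percent(record["Attention"]),
            "reasoning": parse_percent(record["Reasoning"]),
            "overall": parse_percent(record["Overall"]),
        }
        for record in reader
    ]


def rank_by(rows, key):
    """Return rows sorted by the given score, highest first.

    sorted() is stable, so rows with equal scores keep their input order
    (an assumed tie-breaking rule, not one stated by the leaderboard).
    """
    return sorted(rows, key=lambda row: row[key], reverse=True)


rows = load_rows(SAMPLE)
for position, row in enumerate(rank_by(rows, "overall"), start=1):
    print(f"{position}\t{row['model']}\t{row['overall']:.2f}%")
```

Passing `"attention"` as the key reproduces the order shown on this page for these sample rows; passing `"reasoning"` or `"overall"` re-ranks the same models by the other columns.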