Comprehension

A subcategory of Language. 155 models scored.

Model Leaderboard

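All models ranked by their Comprehension subcategory score. The Language and Overall columns give each model's score for the parent Language category and its overall score.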

#	Model	Comprehension	Language	Overall
1	Qwen3.7 Max	100.00%	97.05%	95.75%
2	Claude Opus 4.6 (Reasoning)	100.00%	96.12%	95.02%
3	Qwen3.6 Max Preview	100.00%	100.00%	94.54%
4	Z.AI GLM 5 Turbo	100.00%	99.90%	94.27%
5	Claude Sonnet 4.6 (Reasoning)	100.00%	97.58%	93.66%
6	Claude Opus 4.7 (Reasoning)	100.00%	98.77%	93.23%
7	GPT-5.5 (Reasoning)	100.00%	99.69%	92.98%
8	GPT-5.5 (Reasoning, Low)	100.00%	99.24%	92.59%
9	Claude Opus 4.6	100.00%	96.13%	92.35%
10	MoonshotAI: Kimi K2.6	100.00%	96.77%	92.31%
11	Qwen 3.5 397B A17B	100.00%	95.01%	91.73%
12	Qwen 3.5 122B	100.00%	95.01%	91.53%
13	Qwen 3.5 Plus (2026-04-20)	100.00%	97.14%	91.51%
14	Grok 4.20 (Beta, Reasoning)	100.00%	99.08%	91.49%
15	Claude Sonnet 4.6	100.00%	100.00%	91.15%
16	MoonshotAI: Kimi K2.5	100.00%	97.10%	91.04%
17	MiniMax M3	100.00%	94.71%	90.88%
18	Qwen 3.5 27B	100.00%	95.53%	90.85%
19	ByteDance Seed 1.6	100.00%	95.63%	90.70%
20	GPT-5.4 Mini (Reasoning)	100.00%	98.12%	90.65%
21	Claude Opus 4.5	100.00%	99.66%	89.69%
22	Aion 2.0	100.00%	96.17%	89.21%
23	Z.AI GLM 4.6	100.00%	96.60%	89.11%
24	MiniMax M2.7	100.00%	84.80%	89.10%
25	DeepSeek V4 Flash (Reasoning)	100.00%	94.76%	89.01%
26	Qwen 3.5 35B	100.00%	91.95%	88.00%
27	Claude Opus 4	100.00%	93.01%	87.69%
28	Qwen 3.5 Plus (2026-02-15)	100.00%	95.10%	85.96%
29	Mistral Large 3	100.00%	92.02%	85.43%
30	GPT-4o, May 13th (temp=0)	100.00%	98.72%	85.36%
31	DeepSeek-V2 Chat	100.00%	100.00%	84.83%
32	ByteDance Seed 2.0 Lite	100.00%	96.80%	84.80%
33	DeepSeek V3 (2024-12-26)	100.00%	87.88%	83.68%
34	Hermes 3 405B	100.00%	99.57%	82.86%
35	Mistral Large 2	100.00%	85.22%	82.41%
36	DeepSeek V3 (2025-03-24)	100.00%	86.42%	81.99%
37	Ministral 3 3B	100.00%	68.10%	67.22%
38	Gemini 3.1 Pro (Preview)	95.00%	94.90%	94.37%
39	Gemini 3.5 Flash (Reasoning)	95.00%	94.41%	94.08%
40	Grok 4.3 (Reasoning)	95.00%	97.50%	93.60%
41	GPT-5.4 (Reasoning)	95.00%	94.90%	93.24%
42	GPT-5 Mini	95.00%	96.49%	92.62%
43	Claude Opus 4.8 (Reasoning)	95.00%	96.38%	92.22%
44	Claude Opus 4.8 (Reasoning, Low)	95.00%	96.31%	92.14%
45	Gemma 4 26B (Reasoning)	95.00%	95.20%	91.49%
46	Grok 4.20 (Reasoning)	95.00%	96.61%	91.39%
47	Z.AI GLM 5	95.00%	92.06%	91.23%
48	Gemini 3 Flash (Preview, Reasoning)	95.00%	94.93%	90.50%
49	Gemini 3 Pro (Preview)	95.00%	89.64%	88.79%
50	MiniMax M2.5	95.00%	96.05%	88.71%
51	Claude Sonnet 4.5	95.00%	92.39%	88.03%
52	Stealth: Hunter Alpha	95.00%	93.35%	87.34%
53	Gemini 3.5 Flash (Reasoning, Minimal)	95.00%	97.50%	86.47%
54	GPT-OSS 120B	95.00%	97.18%	86.44%
55	Gemini 3.1 Flash Lite (Reasoning)	95.00%	96.70%	86.41%
56	Qwen 3.5 Flash	95.00%	91.94%	86.38%
57	Z.AI GLM 4.5	95.00%	97.33%	86.27%
58	Gemini 3.1 Flash Lite (Preview)	95.00%	94.98%	85.87%
59	Claude Haiku 4.5	95.00%	91.84%	85.14%
60	Z.AI GLM 4.7 Flash	95.00%	87.67%	84.82%
61	GPT-4o, May 13th (temp=1)	95.00%	92.52%	83.80%
62	Z.AI GLM 4.5 Air	95.00%	95.05%	83.12%
63	DeepSeek V3.1	95.00%	96.87%	82.39%
64	Cydonia 24B V4.1	95.00%	72.49%	75.09%
65	WizardLM 2 8x22b	95.00%	78.05%	71.06%
66	Mistral NeMO	95.00%	80.80%	65.04%
67	GPT-5.1	90.00%	93.64%	92.54%
68	Qwen 3.6 27B	90.00%	89.01%	89.72%
69	Grok 4.1 Fast	90.00%	88.76%	89.55%
70	GPT-5.5	90.00%	94.15%	89.09%
71	Qwen 3.6 35B	90.00%	93.56%	89.05%
72	Claude Sonnet 4	90.00%	91.31%	88.72%
73	GPT-4.1	90.00%	93.91%	88.68%
74	Gemini 2.5 Pro	90.00%	92.57%	88.53%
75	ByteDance Seed 2.0 Mini	90.00%	90.12%	86.91%
76	Qwen 3.5 9B	90.00%	88.18%	86.05%
77	Stealth: Healer Alpha	90.00%	88.45%	85.93%
78	Gemma 4 26B	90.00%	92.02%	85.84%
79	Gemini 3 Flash (Preview)	90.00%	95.00%	85.35%
80	Claude 3.7 Sonnet	90.00%	92.95%	83.39%
81	Qwen 3 32B	90.00%	84.61%	82.21%
82	DeepSeek V4 Flash	90.00%	88.50%	82.02%
83	Mistral Large	90.00%	88.64%	80.15%
84	Z.AI GLM 5.1	85.00%	91.57%	94.37%
85	GPT-5	85.00%	91.50%	91.93%
86	GPT-5.2	85.00%	91.19%	90.26%
87	Xiaomi MIMO v2.5 Pro	85.00%	87.69%	87.36%
88	Gemini 3.1 Flash Lite	85.00%	90.90%	85.75%
89	Grok 4.20 (Beta)	85.00%	91.17%	83.85%
90	GPT-5.4 (Reasoning, Low)	85.00%	90.79%	91.41%
91	Qwen 3.6 Flash	85.00%	89.33%	90.65%
92	DeepSeek V4 Pro (Reasoning)	85.00%	88.51%	90.10%
93	Claude Opus 4.7	85.00%	92.32%	89.93%
94	Grok 4	85.00%	90.61%	88.12%
95	Grok 4 Fast	85.00%	84.61%	86.15%
96	GPT-5.4 Mini (Reasoning, Low)	85.00%	92.45%	85.75%
97	Stealth: Aurora Alpha	85.00%	92.50%	83.79%
98	Z.AI GLM 4.7	80.00%	85.46%	88.69%
99	GPT-4.1 Mini	80.00%	89.64%	83.20%
100	GPT-5.4 Mini	80.00%	88.75%	82.43%
101	DeepSeek V3.2	80.00%	85.01%	82.25%
102	Gemini 2.5 Flash Lite	80.00%	82.75%	81.08%
103	Nemotron 3 Nano	80.00%	87.63%	77.73%
104	Arcee AI: Trinity Large (Preview)	80.00%	78.38%	73.33%
105	Skyfall 36B V2	80.00%	73.94%	65.76%
106	Gemma 4 31B (Reasoning)	75.00%	83.82%	91.71%
107	Gemini 2.5 Flash (Reasoning)	75.00%	86.06%	86.51%
108	Claude 3.5 Sonnet	75.00%	85.62%	84.24%
109	Inception Mercury 2	75.00%	87.32%	83.85%
110	Gemini 2.5 Flash	75.00%	86.23%	80.60%
111	Qwen3 235B A22B Instruct 2507	75.00%	60.83%	80.10%
112	Grok 4.3	75.00%	84.74%	78.66%
113	Mistral Small 3.2 24B	75.00%	72.77%	78.58%
114	Llama 3.1 70B	75.00%	80.18%	78.40%
115	Hermes 3 70B	75.00%	81.66%	72.57%
116	Rocinante 12B	75.00%	63.45%	54.54%
117	DeepSeek V4 Pro	70.00%	72.80%	82.63%
118	GPT-5.4 Nano (Reasoning)	70.00%	83.99%	81.36%
119	Writer: Palmyra X5	70.00%	56.58%	79.57%
120	GPT-5.4 Nano (Reasoning, Low)	70.00%	81.87%	79.48%
121	Gemma 3 12B	70.00%	80.10%	78.41%
122	Qwen 2.5 72B	70.00%	68.95%	75.46%
123	GPT-5.4 Nano	70.00%	80.82%	74.40%
124	Gemma 3 4B	70.00%	72.28%	68.57%
125	Gemini 2.5 Flash Lite (Reasoning)	65.00%	74.36%	85.75%
126	Nemotron 3 Super	65.00%	81.41%	84.56%
127	GPT-5.4	65.00%	81.49%	84.32%
128	GPT-4o, Aug. 6th (temp=1)	65.00%	82.21%	82.62%
129	Grok 4.20	65.00%	78.86%	81.70%
130	Inception Mercury	65.00%	80.37%	79.50%
131	Gemma 3 27B	65.00%	77.21%	77.85%
132	GPT-4.1 Nano	65.00%	78.95%	71.94%
133	LFM2 24B	65.00%	64.64%	58.77%
134	Xiaomi MIMO v2.5	65.00%	76.33%	85.05%
135	Claude 3 Haiku	65.00%	72.76%	71.19%
136	Llama 3.1 Nemotron 70B	60.00%	46.80%	74.70%
137	Arcee AI: Trinity Mini	60.00%	70.59%	70.90%
138	o4 Mini High	60.00%	79.76%	90.29%
139	o4 Mini	60.00%	80.00%	88.35%
140	Cohere Command R+ (Aug. 2024)	60.00%	66.58%	69.03%
141	GPT-5 Nano	55.00%	77.18%	82.60%
142	GPT-4o Mini (temp=1)	55.00%	77.50%	79.08%
143	Mistral Small 4	55.00%	51.96%	76.46%
144	Ministral 8B	55.00%	53.91%	64.87%
145	Llama 3.1 8B	55.00%	64.06%	63.35%
146	Gemma 4 31B	50.00%	75.00%	86.91%
147	GPT-4o, Aug. 6th (temp=0)	50.00%	75.00%	82.45%
148	Mistral Small 4 (Reasoning)	50.00%	60.53%	82.39%
149	GPT-4o Mini (temp=0)	50.00%	75.00%	78.29%
150	Mistral Medium 3.1	50.00%	49.50%	77.83%
151	Mistral Small Creative	50.00%	41.85%	73.27%
152	Ministral 3 14B	50.00%	30.00%	72.54%
153	Ministral 3 8B	50.00%	48.96%	71.76%
154	ByteDance Seed 1.6 Flash	40.00%	61.23%	73.27%
155	Ministral 3B	25.00%	42.25%	61.29%