Generation

Subcategory of Language. 155 models scored.

Model Leaderboard

All models ranked by their Generation subcategory score.

#	Model	Generation	Language	Overall
1	Qwen3.6 Max Preview	100.00%	100.00%	94.54%
2	Grok 4.3 (Reasoning)	100.00%	97.50%	93.60%
3	Claude Sonnet 4.6	100.00%	100.00%	91.15%
4	o4 Mini	100.00%	80.00%	88.35%
5	Gemma 4 31B	100.00%	75.00%	86.91%
6	Gemini 3.5 Flash (Reasoning, Minimal)	100.00%	97.50%	86.47%
7	Gemini 3 Flash (Preview)	100.00%	95.00%	85.35%
8	DeepSeek-V2 Chat	100.00%	100.00%	84.83%
9	Stealth: Aurora Alpha	100.00%	92.50%	83.79%
10	GPT-4o, Aug. 6th (temp=0)	100.00%	75.00%	82.45%
11	GPT-4o Mini (temp=1)	100.00%	77.50%	79.08%
12	GPT-4o Mini (temp=0)	100.00%	75.00%	78.29%
13	GPT-5.4 Mini (Reasoning, Low)	99.89%	92.45%	85.75%
14	Z.AI GLM 5 Turbo	99.80%	99.90%	94.27%
15	Z.AI GLM 4.5	99.67%	97.33%	86.27%
16	Claude Opus 4.7	99.65%	92.32%	89.93%
17	Inception Mercury 2	99.64%	87.32%	83.85%
18	o4 Mini High	99.51%	79.76%	90.29%
19	GPT-4o, Aug. 6th (temp=1)	99.43%	82.21%	82.62%
20	GPT-5.5 (Reasoning)	99.37%	99.69%	92.98%
21	GPT-OSS 120B	99.36%	97.18%	86.44%
22	GPT-5 Nano	99.35%	77.18%	82.60%
23	Claude Opus 4.5	99.31%	99.66%	89.69%
24	GPT-4.1 Mini	99.27%	89.64%	83.20%
25	Hermes 3 405B	99.14%	99.57%	82.86%
26	DeepSeek V3.1	98.74%	96.87%	82.39%
27	GPT-5.5 (Reasoning, Low)	98.49%	99.24%	92.59%
28	Gemini 3.1 Flash Lite (Reasoning)	98.40%	96.70%	86.41%
29	GPT-5.5	98.29%	94.15%	89.09%
30	Grok 4.20 (Reasoning)	98.22%	96.61%	91.39%
31	Grok 4.20 (Beta, Reasoning)	98.15%	99.08%	91.49%
32	Z.AI GLM 5.1	98.14%	91.57%	94.37%
33	GPT-5	98.01%	91.50%	91.93%
34	GPT-5.4 Nano (Reasoning)	97.99%	83.99%	81.36%
35	GPT-5.4	97.98%	81.49%	84.32%
36	GPT-5 Mini	97.98%	96.49%	92.62%
37	GPT-4.1	97.82%	93.91%	88.68%
38	Nemotron 3 Super	97.82%	81.41%	84.56%
39	Claude Opus 4.8 (Reasoning)	97.76%	96.38%	92.22%
40	Claude Opus 4.8 (Reasoning, Low)	97.63%	96.31%	92.14%
41	Claude Opus 4.7 (Reasoning)	97.53%	98.77%	93.23%
42	GPT-5.4 Mini	97.50%	88.75%	82.43%
43	Gemini 2.5 Flash	97.46%	86.23%	80.60%
44	GPT-4o, May 13th (temp=0)	97.44%	98.72%	85.36%
45	GPT-5.2	97.39%	91.19%	90.26%
46	Grok 4.20 (Beta)	97.34%	91.17%	83.85%
47	GPT-5.1	97.28%	93.64%	92.54%
48	Gemini 2.5 Flash (Reasoning)	97.13%	86.06%	86.51%
49	Qwen 3.6 35B	97.12%	93.56%	89.05%
50	MiniMax M2.5	97.10%	96.05%	88.71%
51	Gemini 3.1 Flash Lite	96.80%	90.90%	85.75%
52	GPT-5.4 (Reasoning, Low)	96.58%	90.79%	91.41%
53	GPT-5.4 Mini (Reasoning)	96.25%	98.12%	90.65%
54	Claude 3.5 Sonnet	96.24%	85.62%	84.24%
55	Grok 4	96.22%	90.61%	88.12%
56	Claude 3.7 Sonnet	95.89%	92.95%	83.39%
57	Inception Mercury	95.74%	80.37%	79.50%
58	Gemma 4 26B (Reasoning)	95.40%	95.20%	91.49%
59	Nemotron 3 Nano	95.26%	87.63%	77.73%
60	Claude Sonnet 4.6 (Reasoning)	95.15%	97.58%	93.66%
61	Gemini 2.5 Pro	95.14%	92.57%	88.53%
62	Z.AI GLM 4.5 Air	95.10%	95.05%	83.12%
63	Gemini 3.1 Flash Lite (Preview)	94.95%	94.98%	85.87%
64	Gemini 3 Flash (Preview, Reasoning)	94.87%	94.93%	90.50%
65	GPT-5.4 (Reasoning)	94.80%	94.90%	93.24%
66	Gemini 3.1 Pro (Preview)	94.80%	94.90%	94.37%
67	Grok 4.3	94.47%	84.74%	78.66%
68	Qwen 3.5 Plus (2026-04-20)	94.29%	97.14%	91.51%
69	MoonshotAI: Kimi K2.5	94.19%	97.10%	91.04%
70	Qwen3.7 Max	94.10%	97.05%	95.75%
71	Gemma 4 26B	94.03%	92.02%	85.84%
72	Gemini 3.5 Flash (Reasoning)	93.83%	94.41%	94.08%
73	GPT-5.4 Nano (Reasoning, Low)	93.74%	81.87%	79.48%
74	Qwen 3.6 Flash	93.66%	89.33%	90.65%
75	ByteDance Seed 2.0 Lite	93.60%	96.80%	84.80%
76	MoonshotAI: Kimi K2.6	93.53%	96.77%	92.31%
77	Z.AI GLM 4.6	93.20%	96.60%	89.11%
78	GPT-4.1 Nano	92.91%	78.95%	71.94%
79	Grok 4.20	92.72%	78.86%	81.70%
80	Gemma 4 31B (Reasoning)	92.64%	83.82%	91.71%
81	Claude Sonnet 4	92.61%	91.31%	88.72%
82	Aion 2.0	92.34%	96.17%	89.21%
83	Claude Opus 4.6	92.27%	96.13%	92.35%
84	Claude Opus 4.6 (Reasoning)	92.23%	96.12%	95.02%
85	DeepSeek V4 Pro (Reasoning)	92.02%	88.51%	90.10%
86	Stealth: Hunter Alpha	91.70%	93.35%	87.34%
87	GPT-5.4 Nano	91.64%	80.82%	74.40%
88	ByteDance Seed 1.6	91.26%	95.63%	90.70%
89	Qwen 3.5 27B	91.05%	95.53%	90.85%
90	Z.AI GLM 4.7	90.93%	85.46%	88.69%
91	Xiaomi MIMO v2.5 Pro	90.37%	87.69%	87.36%
92	ByteDance Seed 2.0 Mini	90.24%	90.12%	86.91%
93	Qwen 3.5 Plus (2026-02-15)	90.21%	95.10%	85.96%
94	Gemma 3 12B	90.19%	80.10%	78.41%
95	GPT-4o, May 13th (temp=1)	90.04%	92.52%	83.80%
96	Qwen 3.5 122B	90.03%	95.01%	91.53%
97	Qwen 3.5 397B A17B	90.03%	95.01%	91.73%
98	DeepSeek V3.2	90.03%	85.01%	82.25%
99	Claude Sonnet 4.5	89.77%	92.39%	88.03%
100	DeepSeek V4 Flash (Reasoning)	89.51%	94.76%	89.01%
101	MiniMax M3	89.43%	94.71%	90.88%
102	Gemma 3 27B	89.42%	77.21%	77.85%
103	Z.AI GLM 5	89.12%	92.06%	91.23%
104	Qwen 3.5 Flash	88.89%	91.94%	86.38%
105	Claude Haiku 4.5	88.67%	91.84%	85.14%
106	Hermes 3 70B	88.33%	81.66%	72.57%
107	Qwen 3.6 27B	88.01%	89.01%	89.72%
108	Xiaomi MIMO v2.5	87.66%	76.33%	85.05%
109	Grok 4.1 Fast	87.51%	88.76%	89.55%
110	Mistral Large	87.27%	88.64%	80.15%
111	DeepSeek V4 Flash	87.00%	88.50%	82.02%
112	Stealth: Healer Alpha	86.90%	88.45%	85.93%
113	Qwen 3.5 9B	86.36%	88.18%	86.05%
114	Claude Opus 4	86.02%	93.01%	87.69%
115	Gemini 2.5 Flash Lite	85.50%	82.75%	81.08%
116	Llama 3.1 70B	85.36%	80.18%	78.40%
117	Gemini 3 Pro (Preview)	84.29%	89.64%	88.79%
118	Grok 4 Fast	84.22%	84.61%	86.15%
119	Mistral Large 3	84.05%	92.02%	85.43%
120	Qwen 3.5 35B	83.90%	91.95%	88.00%
121	Gemini 2.5 Flash Lite (Reasoning)	83.71%	74.36%	85.75%
122	ByteDance Seed 1.6 Flash	82.45%	61.23%	73.27%
123	Arcee AI: Trinity Mini	81.18%	70.59%	70.90%
124	Claude 3 Haiku	80.53%	72.76%	71.19%
125	Z.AI GLM 4.7 Flash	80.35%	87.67%	84.82%
126	Qwen 3 32B	79.21%	84.61%	82.21%
127	Arcee AI: Trinity Large (Preview)	76.76%	78.38%	73.33%
128	DeepSeek V3 (2024-12-26)	75.76%	87.88%	83.68%
129	DeepSeek V4 Pro	75.59%	72.80%	82.63%
130	Gemma 3 4B	74.56%	72.28%	68.57%
131	Cohere Command R+ (Aug. 2024)	73.15%	66.58%	69.03%
132	Llama 3.1 8B	73.13%	64.06%	63.35%
133	DeepSeek V3 (2025-03-24)	72.83%	86.42%	81.99%
134	Mistral Small 4 (Reasoning)	71.06%	60.53%	82.39%
135	Mistral Small 3.2 24B	70.53%	72.77%	78.58%
136	Mistral Large 2	70.43%	85.22%	82.41%
137	MiniMax M2.7	69.59%	84.80%	89.10%
138	Qwen 2.5 72B	67.91%	68.95%	75.46%
139	Skyfall 36B V2	67.88%	73.94%	65.76%
140	Mistral NeMO	66.60%	80.80%	65.04%
141	LFM2 24B	64.28%	64.64%	58.77%
142	WizardLM 2 8x22b	61.10%	78.05%	71.06%
143	Ministral 3B	59.49%	42.25%	61.29%
144	Ministral 8B	52.81%	53.91%	64.87%
145	Rocinante 12B	51.90%	63.45%	54.54%
146	Cydonia 24B V4.1	49.98%	72.49%	75.09%
147	Mistral Medium 3.1	49.00%	49.50%	77.83%
148	Mistral Small 4	48.93%	51.96%	76.46%
149	Ministral 3 8B	47.92%	48.96%	71.76%
150	Qwen3 235B A22B Instruct 2507	46.67%	60.83%	80.10%
151	Writer: Palmyra X5	43.16%	56.58%	79.57%
152	Ministral 3 3B	36.20%	68.10%	67.22%
153	Mistral Small Creative	33.69%	41.85%	73.27%
154	Llama 3.1 Nemotron 70B	33.61%	46.80%	74.70%
155	Ministral 3 14B	10.00%	30.00%	72.54%