Prose Variety

Subcategory of Creative Writing. 155 models scored.

Model Leaderboard

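All 155 models, ranked from highest to lowest Prose Variety subcategory score. The Creative Writing and Overall columns are shown for comparison and do not affect the ordering.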

#	Model	Prose Variety	Creative Writing	Overall
1	Writer: Palmyra X5	83.01%	83.95%	79.57%
2	Qwen3 235B A22B Instruct 2507	81.37%	84.81%	80.10%
3	GPT-5.4	81.26%	90.94%	84.32%
4	GPT-5.4 (Reasoning, Low)	80.88%	90.51%	91.41%
5	Skyfall 36B V2	80.34%	83.32%	65.76%
6	GPT-5.4 (Reasoning)	79.74%	91.17%	93.24%
7	Rocinante 12B	79.52%	81.94%	54.54%
8	Cydonia 24B V4.1	78.35%	74.19%	75.09%
9	Claude Sonnet 4.5	78.05%	84.19%	88.03%
10	Llama 3.1 8B	77.72%	76.54%	63.35%
11	Claude Opus 4	77.58%	83.79%	87.69%
12	GPT-5.5 (Reasoning, Low)	77.34%	90.24%	92.59%
13	Z.AI GLM 5	77.33%	83.63%	91.23%
14	Mistral Small 4 (Reasoning)	77.22%	81.67%	82.39%
15	Claude Sonnet 4	76.88%	79.21%	88.72%
16	GPT-5.5	76.57%	90.39%	89.09%
17	Claude Opus 4.5	76.11%	81.71%	89.69%
18	GPT-5.4 Mini	75.74%	88.10%	82.43%
19	GPT-5.5 (Reasoning)	75.69%	90.26%	92.98%
20	Z.AI GLM 5.1	75.67%	84.05%	94.37%
21	Mistral Small Creative	75.44%	80.29%	73.27%
22	Z.AI GLM 5 Turbo	75.38%	84.66%	94.27%
23	Mistral Small 4	75.37%	81.12%	76.46%
24	Claude Haiku 4.5	75.25%	78.96%	85.14%
25	Claude Opus 4.7	75.05%	84.74%	89.93%
26	Claude Opus 4.6 (Reasoning)	75.04%	84.55%	95.02%
27	GPT-5.4 Mini (Reasoning, Low)	74.97%	87.72%	85.75%
28	GPT-5.4 Mini (Reasoning)	74.97%	88.66%	90.65%
29	Ministral 3 14B	74.94%	79.11%	72.54%
30	Mistral Medium 3.1	74.30%	81.70%	77.83%
31	Grok 4.20 (Reasoning)	74.25%	86.25%	91.39%
32	Claude Opus 4.6	74.14%	83.59%	92.35%
33	Grok 4.20	74.09%	83.44%	81.70%
34	Claude Opus 4.8 (Reasoning, Low)	74.07%	85.86%	92.14%
35	DeepSeek V3 (2025-03-24)	73.86%	82.34%	81.99%
36	GPT-5.1	73.76%	87.20%	92.54%
37	Hermes 3 70B	73.70%	77.41%	72.57%
38	MiniMax M2.5	73.63%	81.21%	88.71%
39	Mistral Large	73.53%	82.02%	80.15%
40	DeepSeek V4 Pro (Reasoning)	73.38%	82.99%	90.10%
41	Grok 4.20 (Beta)	73.10%	82.80%	83.85%
42	DeepSeek V4 Pro	73.07%	83.70%	82.63%
43	GPT-4o, Aug. 6th (temp=1)	73.05%	75.50%	82.62%
44	Mistral Large 2	72.75%	81.86%	82.41%
45	Claude Opus 4.8 (Reasoning)	72.75%	85.25%	92.22%
46	Claude Opus 4.7 (Reasoning)	72.71%	84.73%	93.23%
47	Claude Sonnet 4.6	72.52%	83.31%	91.15%
48	Mistral Large 3	72.38%	81.21%	85.43%
49	MiniMax M2.7	72.16%	81.70%	89.10%
50	Claude Sonnet 4.6 (Reasoning)	71.90%	83.09%	93.66%
51	GPT-5.4 Nano	71.90%	80.50%	74.40%
52	Grok 4.1 Fast	71.82%	82.14%	89.55%
53	Qwen 3 32B	71.76%	81.30%	82.21%
54	Gemma 3 27B	71.72%	78.79%	77.85%
55	GPT-5.4 Nano (Reasoning, Low)	71.57%	80.93%	79.48%
56	Gemma 3 12B	71.52%	75.38%	78.41%
57	GPT-4o Mini (temp=1)	71.50%	74.37%	79.08%
58	Cohere Command R+ (Aug. 2024)	71.05%	77.70%	69.03%
59	Llama 3.1 Nemotron 70B	70.84%	71.71%	74.70%
60	MoonshotAI: Kimi K2.5	70.84%	81.35%	91.04%
61	Claude 3.7 Sonnet	70.66%	76.31%	83.39%
62	Ministral 3 8B	70.61%	77.26%	71.76%
63	DeepSeek V4 Flash (Reasoning)	70.46%	83.03%	89.01%
64	ByteDance Seed 1.6 Flash	70.46%	81.51%	73.27%
65	MiniMax M3	70.33%	84.57%	90.88%
66	Grok 4.20 (Beta, Reasoning)	70.03%	84.50%	91.49%
67	Hermes 3 405B	69.94%	80.92%	82.86%
68	LFM2 24B	69.90%	78.10%	58.77%
69	GPT-4o, May 13th (temp=1)	69.83%	75.88%	83.80%
70	DeepSeek V4 Flash	69.82%	83.42%	82.02%
71	GPT-4.1	69.77%	81.24%	88.68%
72	Ministral 8B	69.62%	76.87%	64.87%
73	GPT-5.4 Nano (Reasoning)	69.53%	80.97%	81.36%
74	Claude 3 Haiku	69.39%	74.53%	71.19%
75	Z.AI GLM 4.5	68.98%	76.56%	86.27%
76	Gemma 3 4B	68.92%	72.10%	68.57%
77	Stealth: Hunter Alpha	68.88%	79.18%	87.34%
78	Gemini 3.5 Flash (Reasoning, Minimal)	68.64%	78.55%	86.47%
79	Qwen 3.5 397B A17B	68.63%	86.93%	91.73%
80	Arcee AI: Trinity Large (Preview)	68.56%	75.26%	73.33%
81	Xiaomi MIMO v2.5 Pro	68.44%	81.08%	87.36%
82	Grok 4 Fast	68.43%	77.03%	86.15%
83	Claude 3.5 Sonnet	68.25%	78.69%	84.24%
84	MoonshotAI: Kimi K2.6	68.21%	85.47%	92.31%
85	GPT-4.1 Mini	68.20%	74.52%	83.20%
86	GPT-5.2	67.99%	80.36%	90.26%
87	Qwen3.6 Max Preview	67.96%	88.42%	94.54%
88	Z.AI GLM 4.5 Air	67.71%	74.61%	83.12%
89	Qwen 3.5 Plus (2026-02-15)	67.63%	77.07%	85.96%
90	Grok 4	67.50%	77.34%	88.12%
91	Ministral 3B	67.17%	75.49%	61.29%
92	Qwen 3.6 Flash	67.13%	86.02%	90.65%
93	GPT-4.1 Nano	67.03%	71.81%	71.94%
94	DeepSeek-V2 Chat	66.87%	77.20%	84.83%
95	Gemini 2.5 Flash (Reasoning)	66.66%	76.30%	86.51%
96	Gemini 3.5 Flash (Reasoning)	66.61%	79.87%	94.08%
97	Gemini 2.5 Flash Lite	66.56%	75.05%	81.08%
98	Xiaomi MIMO v2.5	66.42%	79.16%	85.05%
99	Stealth: Healer Alpha	66.33%	78.28%	85.93%
100	DeepSeek V3 (2024-12-26)	66.31%	77.88%	83.68%
101	DeepSeek V3.2	66.11%	79.95%	82.25%
102	Gemini 2.5 Flash Lite (Reasoning)	65.69%	71.64%	85.75%
103	Llama 3.1 70B	65.53%	72.78%	78.40%
104	Gemini 2.5 Flash	65.37%	77.57%	80.60%
105	WizardLM 2 8x22b	65.28%	79.06%	71.06%
106	Ministral 3 3B	65.17%	75.45%	67.22%
107	Qwen 3.6 35B	65.03%	85.97%	89.05%
108	Aion 2.0	64.90%	80.24%	89.21%
109	Qwen 3.5 Plus (2026-04-20)	64.62%	85.18%	91.51%
110	o4 Mini	63.95%	82.04%	88.35%
111	Z.AI GLM 4.6	63.85%	78.86%	89.11%
112	o4 Mini High	63.78%	82.72%	90.29%
113	Gemini 3 Pro (Preview)	63.57%	77.77%	88.79%
114	Gemini 3.1 Pro (Preview)	63.42%	85.44%	94.37%
115	Qwen 3.6 27B	63.42%	82.81%	89.72%
116	DeepSeek V3.1	62.92%	77.45%	82.39%
117	Gemini 2.5 Pro	62.91%	81.03%	88.53%
118	Grok 4.3	62.78%	84.51%	78.66%
119	GPT-5	62.58%	86.87%	91.93%
120	Gemma 4 31B	62.09%	75.59%	86.91%
121	Gemini 3 Flash (Preview, Reasoning)	61.94%	75.87%	90.50%
122	Qwen3.7 Max	61.91%	85.39%	95.75%
123	Z.AI GLM 4.7 Flash	61.69%	77.36%	84.82%
124	Z.AI GLM 4.7	61.49%	78.89%	88.69%
125	Gemini 3 Flash (Preview)	61.35%	75.04%	85.35%
126	GPT-5 Mini	60.91%	80.48%	92.62%
127	GPT-5 Nano	60.85%	67.04%	82.60%
128	Gemma 4 31B (Reasoning)	60.53%	78.13%	91.71%
129	Qwen 2.5 72B	60.17%	75.16%	75.46%
130	GPT-4o, May 13th (temp=0)	60.13%	74.89%	85.36%
131	Mistral NeMo	59.98%	76.72%	65.04%
132	Gemini 3.1 Flash Lite (Preview)	59.92%	75.78%	85.87%
133	Gemma 4 26B (Reasoning)	59.82%	76.38%	91.49%
134	Gemma 4 26B	59.55%	75.17%	85.84%
135	GPT-4o Mini (temp=0)	59.43%	73.10%	78.29%
136	Gemini 3.1 Flash Lite (Reasoning)	59.35%	76.31%	86.41%
137	ByteDance Seed 2.0 Mini	59.35%	80.11%	86.91%
138	Grok 4.3 (Reasoning)	58.99%	85.11%	93.60%
139	GPT-4o, Aug. 6th (temp=0)	58.95%	73.65%	82.45%
140	Qwen 3.5 Flash	58.83%	83.81%	86.38%
141	Gemini 3.1 Flash Lite	58.53%	76.01%	85.75%
142	ByteDance Seed 2.0 Lite	58.28%	82.35%	84.80%
143	Qwen 3.5 35B	58.12%	83.51%	88.00%
144	Nemotron 3 Super	57.81%	69.75%	84.56%
145	ByteDance Seed 1.6	57.70%	78.43%	90.70%
146	Arcee AI: Trinity Mini	56.69%	74.01%	70.90%
147	Qwen 3.5 122B	56.46%	83.02%	91.53%
148	Qwen 3.5 27B	54.98%	82.54%	90.85%
149	Qwen 3.5 9B	53.95%	84.35%	86.05%
150	Nemotron 3 Nano	52.17%	65.87%	77.73%
151	GPT-OSS 120B	50.96%	67.85%	86.44%
152	Inception Mercury 2	50.37%	68.31%	83.85%
153	Stealth: Aurora Alpha	49.43%	67.54%	83.79%
154	Mistral Small 3.2 24B	47.89%	71.87%	78.58%
155	Inception Mercury	40.36%	69.99%	79.50%
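
The ordering in the table above can be checked or re-sorted by another column with a short script. The following is a minimal sketch, assuming the rows are available as tab-separated text with the five columns in the order shown. The names `SAMPLE`, `load_rows`, and `rank_by` are illustrative and not part of the leaderboard itself; only four rows are copied in to keep the example short.

```python
import csv
import io

# Four rows copied verbatim from the table (rank, model, Prose Variety,
# Creative Writing, Overall). The full table has the same shape.
SAMPLE = (
    "1\tWriter: Palmyra X5\t83.01%\t83.95%\t79.57%\n"
    "2\tQwen3 235B A22B Instruct 2507\t81.37%\t84.81%\t80.10%\n"
    "3\tGPT-5.4\t81.26%\t90.94%\t84.32%\n"
    "155\tInception Mercury\t40.36%\t69.99%\t79.50%\n"
)


def parse_percent(text):
    """Convert a string such as '83.01%' to the float 83.01."""
    return float(text.strip().rstrip("%"))


def load_rows(tsv_text):
    """Parse tab-separated leaderboard rows into a list of dicts."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = []
    for rank, model, prose, creative, overall in reader:
        rows.append({
            "rank": int(rank),
            "model": model,
            "prose_variety": parse_percent(prose),
            "creative_writing": parse_percent(creative),
            "overall": parse_percent(overall),
        })
    return rows


def rank_by(rows, key):
    """Return rows sorted from highest to lowest on the given column."""
    return sorted(rows, key=lambda r: r[key], reverse=True)


rows = load_rows(SAMPLE)
by_prose = rank_by(rows, "prose_variety")
by_creative = rank_by(rows, "creative_writing")

# The published rank column should agree with a sort on Prose Variety.
ranks_in_order = [r["rank"] for r in by_prose]
print(ranks_in_order == sorted(ranks_in_order))
print(by_creative[0]["model"])
```

Sorting by `creative_writing` or `overall` instead shows how far a model's Prose Variety rank can differ from its position on the broader scores.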