Purple Prose

Subcategory of Creative Writing. 155 models scored.

Model Leaderboard

All 155 models, ranked in descending order of Purple Prose subcategory score. The Creative Writing and Overall scores are listed alongside for comparison.

#	Model	Purple Prose	Creative Writing	Overall
1	GPT-5.5	98.39%	90.39%	89.09%
2	GPT-5.5 (Reasoning, Low)	98.18%	90.24%	92.59%
3	GPT-5.5 (Reasoning)	98.18%	90.26%	92.98%
4	Qwen3.6 Max Preview	98.07%	88.42%	94.54%
5	Grok 4.1 Fast	97.96%	82.14%	89.55%
6	Qwen 3.5 Plus (2026-04-20)	97.91%	85.18%	91.51%
7	o4 Mini High	97.56%	82.72%	90.29%
8	GPT-5.4 (Reasoning)	97.55%	91.17%	93.24%
9	o4 Mini	97.30%	82.04%	88.35%
10	Qwen 3.6 Flash	96.80%	86.02%	90.65%
11	Qwen 3.6 27B	96.75%	82.81%	89.72%
12	GPT-5.4	96.74%	90.94%	84.32%
13	GPT-5.4 (Reasoning, Low)	96.68%	90.51%	91.41%
14	Qwen 3.5 9B	96.66%	84.35%	86.05%
15	Qwen 3.6 35B	96.65%	85.97%	89.05%
16	Qwen3 235B A22B Instruct 2507	95.70%	84.81%	80.10%
17	MoonshotAI: Kimi K2.6	95.55%	85.47%	92.31%
18	Grok 4.3	95.49%	84.51%	78.66%
19	Qwen3.7 Max	95.46%	85.39%	95.75%
20	DeepSeek V3 (2025-03-24)	95.33%	82.34%	81.99%
21	GPT-5.4 Mini (Reasoning, Low)	95.16%	87.72%	85.75%
22	ByteDance Seed 1.6 Flash	95.11%	81.51%	73.27%
23	Gemini 3.1 Pro (Preview)	95.06%	85.44%	94.37%
24	Writer: Palmyra X5	94.98%	83.95%	79.57%
25	Qwen 3.5 397B A17B	94.84%	86.93%	91.73%
26	Qwen 3.5 Flash	94.71%	83.81%	86.38%
27	GPT-5.4 Mini	94.65%	88.10%	82.43%
28	Mistral Medium 3.1	94.59%	81.70%	77.83%
29	Qwen 3.5 27B	94.46%	82.54%	90.85%
30	Mistral Large	94.33%	82.02%	80.15%
31	Qwen 3.5 122B	94.27%	83.02%	91.53%
32	GPT-5.4 Mini (Reasoning)	94.24%	88.66%	90.65%
33	Mistral Small Creative	94.23%	80.29%	73.27%
34	Grok 4.20 (Reasoning)	94.23%	86.25%	91.39%
35	Claude Opus 4.8 (Reasoning, Low)	94.06%	85.86%	92.14%
36	Grok 4.3 (Reasoning)	94.05%	85.11%	93.60%
37	Claude Opus 4.8 (Reasoning)	93.97%	85.25%	92.22%
38	Mistral Small 4	93.71%	81.12%	76.46%
39	Qwen 3 32B	93.62%	81.30%	82.21%
40	Hermes 3 405B	93.61%	80.92%	82.86%
41	Qwen 3.5 35B	93.58%	83.51%	88.00%
42	GPT-5.1	93.57%	87.20%	92.54%
43	GPT-4.1	93.57%	81.24%	88.68%
44	MoonshotAI: Kimi K2.5	93.06%	81.35%	91.04%
45	Claude Opus 4.7 (Reasoning)	92.86%	84.73%	93.23%
46	Grok 4 Fast	92.79%	77.03%	86.15%
47	Ministral 3 14B	92.74%	79.11%	72.54%
48	Mistral Large 3	92.61%	81.21%	85.43%
49	Xiaomi MIMO v2.5 Pro	92.48%	81.08%	87.36%
50	Rocinante 12B	92.33%	81.94%	54.54%
51	DeepSeek V4 Flash (Reasoning)	92.27%	83.03%	89.01%
52	GPT-5	92.06%	86.87%	91.93%
53	Mistral Large 2	91.94%	81.86%	82.41%
54	Claude Opus 4.6 (Reasoning)	91.88%	84.55%	95.02%
55	DeepSeek V4 Pro	91.83%	83.70%	82.63%
56	Gemini 3 Pro (Preview)	91.81%	77.77%	88.79%
57	Gemini 3.5 Flash (Reasoning)	91.71%	79.87%	94.08%
58	DeepSeek V4 Pro (Reasoning)	91.70%	82.99%	90.10%
59	Stealth: Hunter Alpha	91.65%	79.18%	87.34%
60	DeepSeek-V2 Chat	91.52%	77.20%	84.83%
61	LFM2 24B	91.50%	78.10%	58.77%
62	Claude Opus 4.7	91.45%	84.74%	89.93%
63	Grok 4	91.44%	77.34%	88.12%
64	DeepSeek V3 (2024-12-26)	91.42%	77.88%	83.68%
65	Z.AI GLM 4.7	91.38%	78.89%	88.69%
66	GPT-4o Mini (temp=1)	91.21%	74.37%	79.08%
67	Claude Opus 4.6	91.07%	83.59%	92.35%
68	MiniMax M3	90.97%	84.57%	90.88%
69	Grok 4.20 (Beta, Reasoning)	90.81%	84.50%	91.49%
70	DeepSeek V4 Flash	90.79%	83.42%	82.02%
71	ByteDance Seed 2.0 Lite	90.78%	82.35%	84.80%
72	Ministral 3 8B	90.74%	77.26%	71.76%
73	DeepSeek V3.2	90.65%	79.95%	82.25%
74	Mistral Small 4 (Reasoning)	90.49%	81.67%	82.39%
75	Claude Opus 4	90.42%	83.79%	87.69%
76	Grok 4.20	90.27%	83.44%	81.70%
77	Ministral 3B	90.17%	75.49%	61.29%
78	Claude Sonnet 4.6	90.14%	83.31%	91.15%
79	Gemini 2.5 Pro	90.00%	81.03%	88.53%
80	Skyfall 36B V2	89.93%	83.32%	65.76%
81	Grok 4.20 (Beta)	89.84%	82.80%	83.85%
82	Mistral NeMO	89.84%	76.72%	65.04%
83	Aion 2.0	89.80%	80.24%	89.21%
84	Z.AI GLM 5.1	89.65%	84.05%	94.37%
85	Z.AI GLM 4.7 Flash	89.60%	77.36%	84.82%
86	Ministral 3 3B	89.60%	75.45%	67.22%
87	Claude Sonnet 4.5	89.39%	84.19%	88.03%
88	Ministral 8B	89.29%	76.87%	64.87%
89	Claude Sonnet 4.6 (Reasoning)	89.19%	83.09%	93.66%
90	Gemini 3.5 Flash (Reasoning, Minimal)	88.91%	78.55%	86.47%
91	GPT-4o Mini (temp=0)	88.83%	73.10%	78.29%
92	Xiaomi MIMO v2.5	88.79%	79.16%	85.05%
93	Claude 3 Haiku	88.64%	74.53%	71.19%
94	Z.AI GLM 5	88.49%	83.63%	91.23%
95	Gemma 3 27B	88.37%	78.79%	77.85%
96	Stealth: Healer Alpha	88.08%	78.28%	85.93%
97	GPT-4o, Aug. 6th (temp=1)	87.96%	75.50%	82.62%
98	GPT-4.1 Mini	87.91%	74.52%	83.20%
99	Z.AI GLM 5 Turbo	87.80%	84.66%	94.27%
100	ByteDance Seed 2.0 Mini	87.57%	80.11%	86.91%
101	GPT-5.2	87.55%	80.36%	90.26%
102	Qwen 3.5 Plus (2026-02-15)	87.42%	77.07%	85.96%
103	Cohere Command R+ (Aug. 2024)	87.40%	77.70%	69.03%
104	GPT-4o, May 13th (temp=1)	87.39%	75.88%	83.80%
105	Hermes 3 70B	87.04%	77.41%	72.57%
106	ByteDance Seed 1.6	87.01%	78.43%	90.70%
107	Gemma 3 12B	86.76%	75.38%	78.41%
108	DeepSeek V3.1	86.19%	77.45%	82.39%
109	GPT-4o, May 13th (temp=0)	86.01%	74.89%	85.36%
110	Arcee AI: Trinity Mini	85.69%	74.01%	70.90%
111	WizardLM 2 8x22b	85.19%	79.06%	71.06%
112	Gemma 4 31B (Reasoning)	85.05%	78.13%	91.71%
113	GPT-4o, Aug. 6th (temp=0)	84.90%	73.65%	82.45%
114	GPT-5.4 Nano (Reasoning, Low)	84.72%	80.93%	79.48%
115	MiniMax M2.5	84.71%	81.21%	88.71%
116	Mistral Small 3.2 24B	84.70%	71.87%	78.58%
117	Qwen 2.5 72B	84.37%	75.16%	75.46%
118	Z.AI GLM 4.6	84.37%	78.86%	89.11%
119	Gemini 3 Flash (Preview)	84.15%	75.04%	85.35%
120	GPT-5.4 Nano (Reasoning)	84.09%	80.97%	81.36%
121	MiniMax M2.7	83.74%	81.70%	89.10%
122	GPT-5.4 Nano	83.70%	80.50%	74.40%
123	Gemini 2.5 Flash	83.37%	77.57%	80.60%
124	Gemma 4 31B	83.35%	75.59%	86.91%
125	Claude Opus 4.5	83.27%	81.71%	89.69%
126	GPT-5 Mini	83.16%	80.48%	92.62%
127	Gemma 3 4B	83.05%	72.10%	68.57%
128	GPT-4.1 Nano	82.65%	71.81%	71.94%
129	Gemini 3.1 Flash Lite	82.35%	76.01%	85.75%
130	Claude 3.5 Sonnet	82.20%	78.69%	84.24%
131	Gemini 3.1 Flash Lite (Preview)	82.09%	75.78%	85.87%
132	Claude 3.7 Sonnet	81.90%	76.31%	83.39%
133	Claude Haiku 4.5	81.79%	78.96%	85.14%
134	Z.AI GLM 4.5 Air	81.79%	74.61%	83.12%
135	Z.AI GLM 4.5	81.73%	76.56%	86.27%
136	Gemini 2.5 Flash (Reasoning)	81.71%	76.30%	86.51%
137	Arcee AI: Trinity Large (Preview)	81.59%	75.26%	73.33%
138	Gemini 3.1 Flash Lite (Reasoning)	80.96%	76.31%	86.41%
139	Gemini 2.5 Flash Lite	79.16%	75.05%	81.08%
140	Gemini 3 Flash (Preview, Reasoning)	78.95%	75.87%	90.50%
141	Claude Sonnet 4	78.57%	79.21%	88.72%
142	Gemma 4 26B (Reasoning)	78.14%	76.38%	91.49%
143	Gemma 4 26B	77.99%	75.17%	85.84%
144	Cydonia 24B V4.1	77.12%	74.19%	75.09%
145	Gemini 2.5 Flash Lite (Reasoning)	72.59%	71.64%	85.75%
146	Inception Mercury	72.27%	69.99%	79.50%
147	Llama 3.1 70B	71.74%	72.78%	78.40%
148	Nemotron 3 Super	70.69%	69.75%	84.56%
149	GPT-OSS 120B	70.06%	67.85%	86.44%
150	Llama 3.1 Nemotron 70B	69.97%	71.71%	74.70%
151	Llama 3.1 8B	69.60%	76.54%	63.35%
152	Inception Mercury 2	67.97%	68.31%	83.85%
153	GPT-5 Nano	66.77%	67.04%	82.60%
154	Stealth: Aurora Alpha	65.58%	67.54%	83.79%
155	Nemotron 3 Nano	59.84%	65.87%	77.73%
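
The table is sorted by the Purple Prose column only, so a model's position here can differ considerably from its position by Overall score. As a minimal sketch, assuming the table has been saved as tab-separated text with the header row shown above, the following Python re-ranks rows by any of the three score columns. The names `SAMPLE`, `parse_percent`, `load_rows`, and `rank_by` are hypothetical helpers introduced for illustration; the sample holds only four rows copied from the table.

```python
import csv
import io

# Four rows copied from the leaderboard (tab-separated, same column order).
# SAMPLE is an illustrative subset, not the full table.
SAMPLE = (
    "#\tModel\tPurple Prose\tCreative Writing\tOverall\n"
    "1\tGPT-5.5\t98.39%\t90.39%\t89.09%\n"
    "4\tQwen3.6 Max Preview\t98.07%\t88.42%\t94.54%\n"
    "19\tQwen3.7 Max\t95.46%\t85.39%\t95.75%\n"
    "155\tNemotron 3 Nano\t59.84%\t65.87%\t77.73%\n"
)


def parse_percent(text):
    """Convert a string such as '98.39%' to the float 98.39."""
    return float(text.strip().rstrip("%"))


def load_rows(tsv_text):
    """Parse the tab-separated table into a list of dicts with numeric scores."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for record in reader:
        rows.append({
            "model": record["Model"],
            "purple_prose": parse_percent(record["Purple Prose"]),
            "creative_writing": parse_percent(record["Creative Writing"]),
            "overall": parse_percent(record["Overall"]),
        })
    return rows


def rank_by(rows, key):
    """Return rows sorted by the given score column, highest first.

    sorted() is stable, so rows with equal scores keep their input order.
    """
    return sorted(rows, key=lambda row: row[key], reverse=True)


rows = load_rows(SAMPLE)
by_purple = [row["model"] for row in rank_by(rows, "purple_prose")]
by_overall = [row["model"] for row in rank_by(rows, "overall")]

print("By Purple Prose:", by_purple)
print("By Overall:     ", by_overall)
```

In this four-row sample, GPT-5.5 leads when sorted by Purple Prose, while Qwen3.7 Max leads when sorted by Overall. Because the sort is stable, rows that tie at two decimal places keep whatever order the input file gave them.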