Clichés

Subcategory of Creative Writing. 155 models scored.

Model Leaderboard

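All 155 models, ranked from highest to lowest Clichés subcategory score. The Creative Writing and Overall columns show each model's parent-category and overall scores for comparison.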

Rank	Model	Clichés	Creative Writing	Overall
1	o4 Mini High	90.38%	82.72%	90.29%
2	Qwen3.6 Max Preview	89.87%	88.42%	94.54%
3	GPT-5	89.59%	86.87%	91.93%
4	Gemini 3.1 Pro (Preview)	88.95%	85.44%	94.37%
5	Grok 4.20 (Reasoning)	87.99%	86.25%	91.39%
6	Grok 4.3 (Reasoning)	87.57%	85.11%	93.60%
7	Qwen3.7 Max	87.36%	85.39%	95.75%
8	Grok 4.20 (Beta, Reasoning)	87.27%	84.50%	91.49%
9	Gemini 2.5 Pro	87.21%	81.03%	88.53%
10	o4 Mini	86.60%	82.04%	88.35%
11	GPT-5.4 (Reasoning)	86.52%	91.17%	93.24%
12	Grok 4.20	86.50%	83.44%	81.70%
13	GPT-5.1	86.39%	87.20%	92.54%
14	Qwen 3.6 35B	86.36%	85.97%	89.05%
15	Grok 4.3	85.60%	84.51%	78.66%
16	Qwen 3.6 Flash	85.50%	86.02%	90.65%
17	Grok 4.20 (Beta)	85.46%	82.80%	83.85%
18	GPT-5.4 (Reasoning, Low)	84.51%	90.51%	91.41%
19	Qwen 3.5 Plus (2026-04-20)	84.50%	85.18%	91.51%
20	GPT-5.4	84.30%	90.94%	84.32%
21	DeepSeek V3.1	83.91%	77.45%	82.39%
22	Z.AI GLM 4.6	83.84%	78.86%	89.11%
23	Gemma 3 27B	83.82%	78.79%	77.85%
24	GPT-4.1	83.80%	81.24%	88.68%
25	Claude Opus 4.7 (Reasoning)	83.59%	84.73%	93.23%
26	Claude Opus 4.8 (Reasoning)	83.58%	85.25%	92.22%
27	Skyfall 36B V2	83.52%	83.32%	65.76%
28	GPT-5.4 Mini (Reasoning)	83.22%	88.66%	90.65%
29	Qwen 3.5 397B A17B	83.12%	86.93%	91.73%
30	Claude Opus 4.8 (Reasoning, Low)	83.05%	85.86%	92.14%
31	Gemma 3 12B	82.85%	75.38%	78.41%
32	Rocinante 12B	82.68%	81.94%	54.54%
33	Hermes 3 405B	82.60%	80.92%	82.86%
34	Claude Opus 4.7	82.11%	84.74%	89.93%
35	Qwen 3.5 Flash	82.05%	83.81%	86.38%
36	Qwen3 235B A22B Instruct 2507	82.05%	84.81%	80.10%
37	ByteDance Seed 2.0 Mini	81.90%	80.11%	86.91%
38	Grok 4.1 Fast	81.78%	82.14%	89.55%
39	Gemini 3.5 Flash (Reasoning)	81.76%	79.87%	94.08%
40	Qwen 3 32B	81.72%	81.30%	82.21%
41	Z.AI GLM 4.7	81.62%	78.89%	88.69%
42	Aion 2.0	81.53%	80.24%	89.21%
43	GPT-5.5	81.40%	90.39%	89.09%
44	Mistral Large 2	81.36%	81.86%	82.41%
45	GPT-5.5 (Reasoning)	81.35%	90.26%	92.98%
46	DeepSeek V3.2	81.32%	79.95%	82.25%
47	Gemini 3 Flash (Preview, Reasoning)	81.18%	75.87%	90.50%
48	Qwen 3.5 35B	81.11%	83.51%	88.00%
49	Z.AI GLM 4.7 Flash	81.09%	77.36%	84.82%
50	Mistral Large 3	80.74%	81.21%	85.43%
51	GPT-4.1 Nano	80.71%	71.81%	71.94%
52	Qwen 3.6 27B	80.66%	82.81%	89.72%
53	Gemini 3 Pro (Preview)	80.50%	77.77%	88.79%
54	DeepSeek V3 (2025-03-24)	80.49%	82.34%	81.99%
55	GPT-4o, May 13th (temp=1)	80.44%	75.88%	83.80%
56	GPT-4.1 Mini	80.38%	74.52%	83.20%
57	Gemini 3.5 Flash (Reasoning, Minimal)	80.34%	78.55%	86.47%
58	Gemma 4 26B	80.33%	75.17%	85.84%
59	Gemini 3 Flash (Preview)	80.32%	75.04%	85.35%
60	Qwen 3.5 122B	80.31%	83.02%	91.53%
61	GPT-4o, Aug. 6th (temp=1)	80.25%	75.50%	82.62%
62	Gemma 3 4B	80.17%	72.10%	68.57%
63	DeepSeek V4 Flash (Reasoning)	80.09%	83.03%	89.01%
64	Mistral Large	80.04%	82.02%	80.15%
65	Gemini 3.1 Flash Lite	79.97%	76.01%	85.75%
66	Mistral Medium 3.1	79.96%	81.70%	77.83%
67	GPT-5.4 Mini	79.92%	88.10%	82.43%
68	Qwen 3.5 9B	79.82%	84.35%	86.05%
69	DeepSeek V3 (2024-12-26)	79.81%	77.88%	83.68%
70	Arcee AI: Trinity Mini	79.76%	74.01%	70.90%
71	Gemma 4 26B (Reasoning)	79.74%	76.38%	91.49%
72	Gemini 3.1 Flash Lite (Preview)	79.66%	75.78%	85.87%
73	Gemma 4 31B (Reasoning)	79.53%	78.13%	91.71%
74	DeepSeek V4 Flash	79.51%	83.42%	82.02%
75	Gemini 3.1 Flash Lite (Reasoning)	79.50%	76.31%	86.41%
76	WizardLM 2 8x22b	79.40%	79.06%	71.06%
77	Llama 3.1 8B	79.32%	76.54%	63.35%
78	DeepSeek-V2 Chat	79.10%	77.20%	84.83%
79	GPT-5.5 (Reasoning, Low)	78.89%	90.24%	92.59%
80	Z.AI GLM 5	78.86%	83.63%	91.23%
81	GPT-5.4 Mini (Reasoning, Low)	78.74%	87.72%	85.75%
82	Ministral 3 14B	78.70%	79.11%	72.54%
83	MiniMax M3	78.49%	84.57%	90.88%
84	GPT-5 Mini	78.48%	80.48%	92.62%
85	Qwen 3.5 27B	78.44%	82.54%	90.85%
86	Hermes 3 70B	78.42%	77.41%	72.57%
87	Writer: Palmyra X5	78.38%	83.95%	79.57%
88	Z.AI GLM 5 Turbo	78.37%	84.66%	94.27%
89	Claude Opus 4	78.30%	83.79%	87.69%
90	GPT-4o Mini (temp=1)	78.20%	74.37%	79.08%
91	DeepSeek V4 Pro (Reasoning)	78.10%	82.99%	90.10%
92	Gemini 2.5 Flash	78.05%	77.57%	80.60%
93	MoonshotAI: Kimi K2.6	77.97%	85.47%	92.31%
94	Cohere Command R+ (Aug. 2024)	77.80%	77.70%	69.03%
95	ByteDance Seed 1.6	77.64%	78.43%	90.70%
96	Xiaomi MIMO v2.5	77.49%	79.16%	85.05%
97	Grok 4	77.48%	77.34%	88.12%
98	Z.AI GLM 5.1	77.47%	84.05%	94.37%
99	Gemini 2.5 Flash (Reasoning)	77.43%	76.30%	86.51%
100	DeepSeek V4 Pro	77.40%	83.70%	82.63%
101	Claude 3.5 Sonnet	77.15%	78.69%	84.24%
102	Gemma 4 31B	76.74%	75.59%	86.91%
103	Claude Sonnet 4.6	76.43%	83.31%	91.15%
104	Claude Sonnet 4.5	76.41%	84.19%	88.03%
105	Gemini 2.5 Flash Lite	76.25%	75.05%	81.08%
106	Xiaomi MIMO v2.5 Pro	76.24%	81.08%	87.36%
107	Ministral 3 8B	76.24%	77.26%	71.76%
108	Mistral Small 4	76.23%	81.12%	76.46%
109	ByteDance Seed 2.0 Lite	76.22%	82.35%	84.80%
110	Mistral Small Creative	76.20%	80.29%	73.27%
111	Mistral Small 4 (Reasoning)	75.97%	81.67%	82.39%
112	Mistral NeMO	75.80%	76.72%	65.04%
113	Claude Opus 4.6 (Reasoning)	75.75%	84.55%	95.02%
114	ByteDance Seed 1.6 Flash	75.52%	81.51%	73.27%
115	MiniMax M2.5	75.41%	81.21%	88.71%
116	Claude Sonnet 4.6 (Reasoning)	75.13%	83.09%	93.66%
117	Stealth: Hunter Alpha	74.96%	79.18%	87.34%
118	LFM2 24B	74.86%	78.10%	58.77%
119	Grok 4 Fast	74.82%	77.03%	86.15%
120	MiniMax M2.7	74.72%	81.70%	89.10%
121	Stealth: Healer Alpha	74.72%	78.28%	85.93%
122	Claude Opus 4.5	73.89%	81.71%	89.69%
123	Arcee AI: Trinity Large (Preview)	73.85%	75.26%	73.33%
124	Ministral 8B	73.81%	76.87%	64.87%
125	Claude Sonnet 4	73.80%	79.21%	88.72%
126	Qwen 2.5 72B	72.84%	75.16%	75.46%
127	GPT-5.4 Nano (Reasoning)	72.62%	80.97%	81.36%
128	Claude Opus 4.6	72.59%	83.59%	92.35%
129	Z.AI GLM 4.5	72.34%	76.56%	86.27%
130	Claude Haiku 4.5	71.93%	78.96%	85.14%
131	Qwen 3.5 Plus (2026-02-15)	71.92%	77.07%	85.96%
132	Cydonia 24B V4.1	71.78%	74.19%	75.09%
133	Llama 3.1 Nemotron 70B	71.70%	71.71%	74.70%
134	Llama 3.1 70B	70.83%	72.78%	78.40%
135	Ministral 3B	70.80%	75.49%	61.29%
136	Ministral 3 3B	70.63%	75.45%	67.22%
137	GPT-4o Mini (temp=0)	70.56%	73.10%	78.29%
138	GPT-OSS 120B	70.48%	67.85%	86.44%
139	GPT-5.4 Nano (Reasoning, Low)	70.47%	80.93%	79.48%
140	GPT-4o, May 13th (temp=0)	69.79%	74.89%	85.36%
141	GPT-5.4 Nano	69.68%	80.50%	74.40%
142	GPT-4o, Aug. 6th (temp=0)	69.33%	73.65%	82.45%
143	Claude 3.7 Sonnet	69.12%	76.31%	83.39%
144	Inception Mercury	68.89%	69.99%	79.50%
145	Claude 3 Haiku	68.71%	74.53%	71.19%
146	Mistral Small 3.2 24B	68.55%	71.87%	78.58%
147	Z.AI GLM 4.5 Air	68.55%	74.61%	83.12%
148	Stealth: Aurora Alpha	68.10%	67.54%	83.79%
149	Inception Mercury 2	67.53%	68.31%	83.85%
150	MoonshotAI: Kimi K2.5	67.36%	81.35%	91.04%
151	Gemini 2.5 Flash Lite (Reasoning)	66.75%	71.64%	85.75%
152	GPT-5.2	66.36%	80.36%	90.26%
153	Nemotron 3 Super	64.41%	69.75%	84.56%
154	Nemotron 3 Nano	64.30%	65.87%	77.73%
155	GPT-5 Nano	50.07%	67.04%	82.60%