Literary fiction: old friends reunite

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, and AI-ism words, adverbs, and names.
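To make the idea of a pattern detector concrete, here is a minimal sketch of how two of the listed habits could be counted with regular expressions. It is not the benchmark's implementation, which the page does not show: the patterns, the filter-word list, and the `habit_rates` helper are all assumptions made for illustration, and the past progressive pattern is crude (it also matches phrases such as "was king").

```python
import re

# Hypothetical patterns; the benchmark's real detectors are not shown in the source.
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)
# Illustrative filter-word list, chosen for this sketch only.
FILTER_WORDS = re.compile(
    r"\b(?:saw|heard|felt|noticed|realized|seemed)\b", re.IGNORECASE
)


def habit_rates(text: str) -> dict[str, float]:
    """Return matches per 100 words for each illustrative pattern."""
    words = re.findall(r"[A-Za-z']+", text)
    total = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "past_progressive": 100 * len(PAST_PROGRESSIVE.findall(text)) / total,
        "filter_words": 100 * len(FILTER_WORDS.findall(text)) / total,
    }


sample = "She was walking home when she noticed the rain. They were laughing."
print(habit_rates(sample))
```

A real evaluator would presumably map such rates onto the 0 to 100 scale used in the tables; that mapping is not described on the page, so it is left out here.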

Creative Writing Hallucination

Performance Score Distribution (Top 20)

Model	Score
Grok 4.20 (Reasoning)	94%
Mistral Large 2	88%
GPT-5.4	88%
GPT-5.5 (Reasoning)	88%
GPT-5.5 (Reasoning, Low)	87%
Grok 4.5 (Reasoning, High)	87%
GPT-5.5	87%
GPT-5.4 Mini	87%
o4 Mini High	87%
Grok 4.5 (Reasoning, Low)	87%
GPT-5.4 (Reasoning, Low)	86%
Qwen3 235B A22B Instruct 2507	86%
Grok 4.3 (Reasoning)	86%
GPT-5.4 (Reasoning)	86%
Mistral Small 4	86%
Claude Opus 4.8 (Reasoning)	86%
Ministral 3 14B	86%
DeepSeek V4 Pro (Reasoning)	85%
MoonshotAI: Kimi K2.6	85%
Mistral Small 4 (Reasoning)	85%

Model	Score	Cost	Time
Grok 4.20 (Reasoning)	94%	$0.020	1.9m
Mistral Large 2	88%	$0.0099	26.9s
Mistral Small 4	86%	$0.0012	20.4s
Ministral 3 14B	86%	$0.0005	11.5s
GPT-5.4 Mini	87%	$0.015	16.0s
Mistral Small 4 (Reasoning)	85%	$0.0017	27.6s
Mistral Medium 3.1	85%	$0.0039	31.4s
Mistral Large 3	85%	$0.0019	19.0s
Grok 4.3	85%	$0.0061	27.5s
Qwen3 235B A22B Instruct 2507	86%	$0.0012	1.2m
Writer: Palmyra X5	85%	$0.011	22.5s
DeepSeek V4 Pro (Reasoning)	85%	$0.021	4.5m
Aion 3.0 Mini	83%	$0.0057	1.6m
GPT-5.4 Mini (Reasoning, Low)	84%	$0.015	17.3s
Hermes 3 405B	82%	$0.0023	35.3s
Ministral 3 3B	78%	$0.0003	7.7s
Gemini 3.1 Flash Lite (Reasoning)	82%	$0.0026	8.1s
Grok 4.5 (Reasoning, Low)	87%	$0.018	1.5m
Z.AI GLM 5.2 (Reasoning, High)	81%	$0.012	1.1m
GPT-5.4 Mini (Reasoning)	84%	$0.025	28.5s

Model	Score	Consistency	Stability
Grok 4.20 (Reasoning)	94%	95%	91%
GPT-5.5 (Reasoning)	88%	98%	86%
GPT-5.4 Mini	87%	98%	85%
GPT-5.4	88%	97%	85%
GPT-5.5 (Reasoning, Low)	87%	98%	85%
Mistral Large 2	88%	95%	85%
GPT-5.4 (Reasoning, Low)	86%	97%	84%
GPT-5.4 (Reasoning)	86%	97%	83%
Mistral Small 4	86%	96%	83%
MoonshotAI: Kimi K2.6	85%	98%	83%
Gemini 3.1 Pro (Preview)	83%	99%	83%
GPT-5.5	87%	96%	83%
Grok 4.3	85%	97%	83%
Writer: Palmyra X5	85%	97%	82%
Qwen 3.5 397B A17B	85%	97%	82%
Mistral Medium 3.1	85%	96%	82%
Mistral Large 3	85%	97%	82%
Claude Opus 4.8 (Reasoning, Low)	84%	97%	82%
Grok 4.3 (Reasoning)	86%	96%	82%
Qwen3.6 Max Preview	84%	96%	82%

Model	Score	Cost	Time	Stability
Grok 4.20 (Reasoning)	94%	$0.020	1.9m	91%
Mistral Large 2	88%	$0.0099	26.9s	85%
GPT-5.4 Mini	87%	$0.015	16.0s	85%
Mistral Small 4	86%	$0.0012	20.4s	83%
Ministral 3 14B	86%	$0.0005	11.5s	81%
Mistral Large 3	85%	$0.0019	19.0s	82%
Grok 4.3	85%	$0.0061	27.5s	83%
Mistral Medium 3.1	85%	$0.0039	31.4s	82%
Writer: Palmyra X5	85%	$0.011	22.5s	82%
Mistral Small 4 (Reasoning)	85%	$0.0017	27.6s	80%
Qwen3 235B A22B Instruct 2507	86%	$0.0012	1.2m	80%
GPT-5.4 Mini (Reasoning, Low)	84%	$0.015	17.3s	80%
Gemini 3.1 Flash Lite (Reasoning)	82%	$0.0026	8.1s	79%
GPT-5.4 Mini (Reasoning)	84%	$0.025	28.5s	81%
GPT-5.4 Nano (Reasoning, Low)	83%	$0.0062	22.3s	79%
Grok 4.5 (Reasoning, Low)	87%	$0.018	1.5m	80%
Grok 4.5 (Reasoning, High)	87%	$0.030	1.5m	82%
Qwen 3.5 Flash	82%	$0.0026	1.1m	79%
GPT-5.4 Nano (Reasoning)	82%	$0.0069	27.2s	79%
Z.AI GLM 5 Turbo	83%	$0.0093	42.0s	78%

Rank	Model	Avg. Cost	Avg. Time	Stability	# 1	# 2	# 3	# 4	# 5	Total
1	Grok 4.20 (Reasoning)	$0.020	1.9m	91%	96	96	95	94	90	94%
2	Mistral Large 2	$0.0099	26.9s	85%	91	89	89	87	83	88%
28	GPT-5.4	$0.061	1.8m	85%	89	89	88	88	85	88%
114	GPT-5.5 (Reasoning)	$0.152	2.0m	86%	89	88	87	87	86	88%
122	GPT-5.5 (Reasoning, Low)	$0.155	2.0m	85%	89	87	87	86	85	87%
17	Grok 4.5 (Reasoning, High)	$0.030	1.5m	82%	90	89	86	85	84	87%
127	GPT-5.5	$0.157	2.1m	83%	89	89	86	85	85	87%
3	GPT-5.4 Mini	$0.015	16.0s	85%	88	88	87	86	84	87%
26	o4 Mini High	$0.027	51.3s	77%	94	89	85	83	82	87%
16	Grok 4.5 (Reasoning, Low)	$0.018	1.5m	80%	91	90	86	85	82	87%
45	GPT-5.4 (Reasoning, Low)	$0.066	1.7m	84%	89	87	86	86	85	86%
11	Qwen3 235B A22B Instruct 2507	$0.0012	1.2m	80%	91	89	87	84	81	86%
56	Grok 4.3 (Reasoning)	$0.029	3.1m	82%	90	86	85	85	85	86%
111	GPT-5.4 (Reasoning)	$0.097	3.2m	83%	88	86	86	86	84	86%
4	Mistral Small 4	$0.0012	20.4s	83%	88	87	86	85	83	86%
47	Claude Opus 4.8 (Reasoning)	$0.063	42.2s	80%	91	86	85	84	83	86%
5	Ministral 3 14B	$0.0005	11.5s	81%	89	88	85	84	83	86%
104	DeepSeek V4 Pro (Reasoning)	$0.021	4.5m	78%	91	88	86	82	78	85%
137	MoonshotAI: Kimi K2.6	$0.054	7.5m	83%	87	86	85	84	84	85%
10	Mistral Small 4 (Reasoning)	$0.0017	27.6s	80%	90	87	85	83	80	85%
7	Grok 4.3	$0.0061	27.5s	83%	87	86	85	84	83	85%
9	Writer: Palmyra X5	$0.011	22.5s	82%	87	85	85	84	83	85%
21	Qwen 3.5 397B A17B	$0.011	2.2m	82%	87	86	85	83	83	85%
6	Mistral Large 3	$0.0019	19.0s	82%	87	86	85	84	82	85%
8	Mistral Medium 3.1	$0.0039	31.4s	82%	87	86	86	83	82	85%
14	GPT-5.4 Mini (Reasoning)	$0.025	28.5s	81%	87	85	84	84	82	84%
39	Claude Opus 4.8 (Reasoning, Low)	$0.061	39.2s	82%	86	85	84	84	82	84%
84	GPT-5.1	$0.057	2.5m	81%	87	86	86	83	79	84%
51	DeepSeek V4 Pro	$0.0040	1.2m	74%	91	89	82	81	79	84%
105	Qwen3.6 Max Preview	$0.056	4.0m	82%	86	86	85	83	81	84%
12	GPT-5.4 Mini (Reasoning, Low)	$0.015	17.3s	80%	87	86	85	84	79	84%
107	GPT-5	$0.068	3.2m	81%	86	84	84	82	82	84%
32	Claude Sonnet 4.5	$0.032	35.6s	79%	88	85	84	82	79	84%
101	Claude Opus 4.7	$0.064	30.3s	73%	95	83	82	79	78	83%
102	Gemini 3.1 Pro (Preview)	$0.098	1.8m	83%	84	84	83	83	83	83%
20	Z.AI GLM 5 Turbo	$0.0093	42.0s	78%	88	83	82	82	81	83%
41	Z.AI GLM 5	$0.0095	1.5m	78%	87	86	83	80	79	83%
34	Z.AI GLM 4.6	$0.0067	42.4s	75%	90	86	82	80	78	83%
135	Claude Opus 4	$0.162	1.1m	81%	85	84	84	82	80	83%
15	GPT-5.4 Nano (Reasoning, Low)	$0.0062	22.3s	79%	86	84	82	81	81	83%
67	ByteDance Seed 2.0 Lite	$0.013	2.3m	78%	87	83	82	81	81	83%
49	Aion 3.0 Mini	$0.0057	1.6m	77%	88	86	85	78	76	83%
30	Qwen 3.6 35B	$0.010	1.2m	79%	86	84	82	81	80	83%
76	Qwen 3.5 Plus (2026-04-20)	$0.017	2.1m	76%	87	86	81	80	79	83%
128	Qwen3.7 Max	$0.081	3.0m	77%	87	86	83	79	79	83%
23	o4 Mini	$0.012	22.7s	77%	87	84	81	81	80	83%
74	Qwen 3.5 122B	$0.038	1.5m	78%	87	85	84	80	77	82%
18	Qwen 3.5 Flash	$0.0026	1.1m	79%	85	82	82	82	80	82%
59	Grok 4.20	$0.0080	46.6s	73%	88	87	80	79	78	82%
50	Aion 3.0	$0.023	1.2m	79%	85	83	82	81	80	82%
31	Qwen 3.5 9B	$0.0009	1.2m	77%	87	82	81	81	80	82%
110	MiniMax M3	$0.0099	4.7m	78%	86	84	82	80	79	82%
48	Qwen 3.5 27B	$0.020	1.4m	79%	84	84	82	80	80	82%
118	MoonshotAI: Kimi K2.5	$0.022	5.0m	79%	86	83	83	80	79	82%
37	Gemini 3.5 Flash (Reasoning, Minimal)	$0.018	12.1s	76%	88	84	81	79	79	82%
29	Qwen 3.5 35B	$0.017	1.1m	80%	84	82	82	82	81	82%
24	Qwen 3.5 Plus (2026-02-15)	$0.0051	37.6s	77%	86	82	82	80	79	82%
103	Claude Opus 4.5	$0.083	1.2m	79%	84	84	83	80	78	82%
13	Gemini 3.1 Flash Lite (Reasoning)	$0.0026	8.1s	79%	83	83	82	81	80	82%
82	Claude Opus 4.7 (Reasoning)	$0.072	33.1s	79%	84	83	83	80	78	82%
42	Hermes 3 405B	$0.0023	35.3s	75%	88	83	83	81	73	82%
27	Xiaomi MIMO v2.5 Pro	$0.0074	48.9s	79%	84	82	82	82	78	82%
19	GPT-5.4 Nano (Reasoning)	$0.0069	27.2s	79%	84	83	82	80	79	82%
22	DeepSeek V4 Flash	$0.0006	28.2s	77%	86	83	81	79	78	82%
60	Z.AI GLM 4.5	$0.0052	50.4s	73%	90	81	80	80	77	81%
68	Z.AI GLM 5.1	$0.015	1.4m	76%	86	83	83	81	74	81%
46	Claude Sonnet 5	$0.024	32.4s	78%	84	84	83	79	76	81%
96	Gemma 4 31B (Reasoning)	$0.0015	4.2m	78%	84	83	82	81	77	81%
70	Z.AI GLM 5.2 (Reasoning, High)	$0.012	1.1m	74%	87	86	84	79	71	81%
77	MiniMax M2.7	$0.0056	57.4s	71%	89	84	80	79	74	81%
92	Qwen 3.6 27B	$0.025	2.5m	77%	84	84	82	81	75	81%
40	ByteDance Seed 1.6 Flash	$0.0015	29.4s	75%	86	85	81	79	75	81%
44	DeepSeek V3 (2025-03-24)	$0.0010	1.3m	77%	84	83	81	81	77	81%
100	Claude Sonnet 4	$0.032	48.1s	71%	88	86	80	79	72	81%
62	Qwen 3.6 Flash	$0.0100	44.5s	74%	86	83	81	80	74	81%
57	Gemma 4 26B	$0.0008	1.1m	75%	86	83	80	80	76	81%
52	Qwen 3 32B	$0.0019	1.0m	76%	84	84	80	78	77	81%
43	Gemma 3 27B	$0.0004	51.2s	76%	84	83	81	79	76	81%
132	Claude Opus 4.6 (Reasoning)	$0.096	1.7m	75%	85	83	80	78	77	81%
88	Gemma 4 26B (Reasoning)	$0.0010	2.8m	75%	86	82	80	79	76	81%
36	Ministral 3 8B	$0.0004	9.3s	74%	87	82	81	79	75	80%
130	Claude Opus 4.6	$0.083	1.4m	74%	87	81	80	77	77	80%
66	MiniMax M2.5	$0.0032	1.6m	76%	85	83	82	76	75	80%
25	DeepSeek V4 Flash (Reasoning)	$0.0008	36.6s	78%	82	81	81	80	77	80%
109	Gemini 3.5 Flash (Reasoning)	$0.080	40.9s	77%	82	82	81	80	76	80%
71	WizardLM 2 8x22b	$0.0018	2.8m	79%	81	80	80	79	79	80%
80	Cohere Command R+ (Aug. 2024)	$0.014	27.6s	71%	89	84	82	73	73	80%
65	Claude Sonnet 4.6	$0.027	38.6s	77%	82	81	79	78	78	80%
89	DeepSeek-V2 Chat	$0.0018	1.2m	70%	89	79	78	77	76	80%
133	Claude Sonnet 4.6 (Reasoning)	$0.086	1.6m	74%	85	82	80	77	75	80%
124	GPT-5.2	$0.058	1.5m	74%	84	83	79	77	75	80%
83	Claude Sonnet 5 (Reasoning)	$0.027	37.2s	74%	83	83	78	78	75	80%
94	GPT-4.1	$0.017	51.8s	71%	85	83	77	77	75	80%
38	GPT-5.4 Nano	$0.0066	24.3s	77%	82	81	80	77	77	80%
55	Gemini 3 Flash (Preview, Reasoning)	$0.013	30.8s	77%	82	82	82	77	75	79%
54	Xiaomi MIMO v2.5	$0.0060	34.9s	76%	83	80	80	79	76	79%
97	Hermes 3 70B	$0.0006	1.1m	69%	89	83	79	73	73	79%
78	DeepSeek V3.2	$0.0010	2.3m	77%	81	80	80	79	76	79%
63	GPT-5 Mini	$0.0100	1.0m	77%	81	80	80	79	75	79%
35	Gemini 3.1 Flash Lite	$0.0027	8.8s	76%	82	80	80	78	75	79%
33	Mistral NeMO	$0.0004	8.8s	76%	81	81	80	77	76	79%
136	ByteDance Seed 2.0 Mini	$0.0048	5.4m	73%	85	81	80	76	73	79%
81	Z.AI GLM 4.7	$0.013	1.1m	74%	83	81	79	77	74	79%
53	Gemini 3 Flash (Preview)	$0.0080	20.0s	76%	81	80	78	78	77	79%
64	Gemini 3.1 Flash Lite (Preview)	$0.0027	8.0s	73%	84	81	78	76	76	79%
61	Ministral 3 3B	$0.0003	7.7s	73%	82	82	82	78	68	78%
85	Z.AI GLM 4.7 Flash	$0.0015	1.1m	72%	84	82	78	76	72	78%
72	Gemma 4 31B	$0.0008	1.1m	75%	82	79	78	76	75	78%
79	Aion 2.0	$0.0059	1.3m	75%	80	80	79	75	75	78%
95	Z.AI GLM 4.5 Air	$0.0045	1.3m	72%	83	81	80	75	69	78%
58	Qwen 2.5 72B	$0.0006	33.4s	76%	79	79	78	77	76	78%
93	DeepSeek V3.1	$0.0017	1.7m	74%	81	80	78	76	74	78%
86	Gemini 2.5 Pro	$0.036	36.3s	76%	79	78	78	77	76	78%
75	Ministral 3B	$0.0001	5.4s	71%	82	81	78	76	70	77%
90	DeepSeek V3 (2024-12-26)	$0.0016	41.6s	71%	82	78	78	73	71	76%
73	Cydonia 24B V4.1	$0.0010	36.8s	74%	79	77	76	75	75	76%
121	GPT-5 Nano	$0.0042	1.5m	69%	83	81	79	71	66	76%
69	Ministral 8B	$0.0003	12.8s	74%	77	77	76	76	75	76%
91	GPT-4o, Aug. 6th (temp=1)	$0.017	28.8s	74%	77	77	76	75	73	76%
98	Claude Haiku 4.5	$0.010	22.0s	71%	79	77	75	73	73	76%
112	Claude Sonnet 5 (Reasoning, Low)	$0.028	37.5s	73%	78	77	75	74	73	75%
99	Gemini 2.5 Flash Lite	$0.0008	9.3s	69%	82	75	74	74	71	75%
120	GPT-4o Mini (temp=1)	$0.0011	38.5s	66%	85	75	74	72	71	75%
138	ByteDance Seed 1.6	$0.015	3.0m	70%	78	78	75	72	70	75%
87	Inception Mercury 2	$0.0022	5.5s	72%	77	75	74	74	73	74%
116	GPT-4.1 Nano	$0.0007	13.6s	66%	82	74	73	72	69	74%
106	Gemini 2.5 Flash	$0.0048	10.7s	70%	78	75	74	74	68	74%
113	Gemma 3 12B	$0.0003	43.8s	69%	78	75	72	72	72	74%
115	GPT-4.1 Mini	$0.0025	19.8s	68%	81	75	75	71	67	74%
117	GPT-4o, Aug. 6th (temp=0)	$0.016	24.9s	70%	77	74	74	74	68	73%
119	Gemini 2.5 Flash Lite (Reasoning)	$0.0026	38.5s	68%	79	73	72	71	71	73%
125	Gemini 2.5 Flash (Reasoning)	$0.012	24.5s	68%	78	75	73	70	69	73%
134	Mistral Small 3.2 24B	$0.0023	1.9m	68%	77	76	74	72	65	73%
108	Gemma 3 4B	$0.0002	21.2s	70%	75	74	74	71	69	72%
131	Nemotron 3 Super	$0.0000	37.2s	65%	79	70	70	70	69	72%
129	Llama 3.1 70B	$0.0009	25.7s	65%	79	72	72	71	64	72%
126	GPT-4o Mini (temp=0)	$0.0011	32.5s	68%	74	72	71	70	69	71%
123	Arcee AI: Trinity Mini	$0.0003	10.6s	68%	73	71	71	71	67	70%
139	GPT-OSS 120B	$0.0016	1.8m	66%	73	72	70	67	66	70%
140	Nemotron 3 Nano	$0.0011	1.3m	64%	69	69	68	64	62	66%
80.61%
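The Total column appears to be the mean of the five run scores (# 1 to # 5). The page does not state this, so treat it as an assumption. The sketch below recomputes it for four rows copied from the table and also prints the spread between the best and worst run; `rounded_mean` is a helper introduced here for illustration. Because the displayed run scores are themselves rounded, the recomputed value can differ from the reported Total by a point on other rows (GPT-5.5 (Reasoning) gives 87.4 against a reported 88%).

```python
from statistics import mean

# Run scores (# 1 to # 5) and reported Total, copied from the table above.
rows = {
    "Grok 4.20 (Reasoning)": ([96, 96, 95, 94, 90], 94),
    "Mistral Large 2": ([91, 89, 89, 87, 83], 88),
    "o4 Mini High": ([94, 89, 85, 83, 82], 87),
    "Claude Opus 4.7": ([95, 83, 82, 79, 78], 83),
}


def rounded_mean(runs: list[int]) -> int:
    """Mean of the runs, rounded half up (assumed rounding rule)."""
    # int(x + 0.5) avoids the round-half-to-even behaviour of round().
    return int(mean(runs) + 0.5)


for model, (runs, reported_total) in rows.items():
    spread = max(runs) - min(runs)  # gap between best and worst run
    print(f"{model}: mean={rounded_mean(runs)} reported={reported_total} spread={spread}")
```

In these four rows a wider spread coincides with a lower Stability value (91%, 85%, 77%, 73%), but the page does not give the Stability formula, so the spread is only a rough proxy.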

Median	Evaluator	Top 3	Flop 3
100.0%	"Not X but Y" pattern overuse	100 Gemini 2.5 Flash (Reasoning); 100 Qwen 3.5 Plus (2026-02-15); 100 Claude Sonnet 5 (Reasoning)	15 GPT-5 Nano; 40 Nemotron 3 Super; 65 Gemini 2.5 Flash Lite (Reasoning)
41.8%	Adverb-first sentence starts	98 Qwen3 235B A22B Instruct 2507; 96 Ministral 3 8B; 96 Ministral 3 14B	0 Qwen 3.5 Plus (2026-02-15); 0 Nemotron 3 Nano; 0 GPT-4o, Aug. 6th (temp=0)
100.0%	Adverbs in dialogue tags	100 GPT-5.4 Nano (Reasoning, Low); 100 Ministral 3 3B; 100 Aion 2.0	28 GPT-4.1 Nano; 30 Cydonia 24B V4.1; 37 GPT-4.1 Mini
90.1%	AI-ism adverb frequency	99 GPT-5.5 (Reasoning); 99 GPT-5.2; 99 GPT-5	60 GPT-4.1 Nano; 63 Cydonia 24B V4.1; 64 GPT-4.1 Mini
100.0%	AI-ism character names	100 GPT-5.4 Nano (Reasoning); 100 Nemotron 3 Nano; 100 Gemini 3.1 Pro (Preview)	76 Z.AI GLM 5; 84 MiniMax M2.7; 88 DeepSeek V4 Flash (Reasoning)
100.0%	AI-ism location names	100 Gemma 4 31B (Reasoning); 100 DeepSeek-V2 Chat; 100 Grok 4.20 (Reasoning)	96 Claude Sonnet 4; 96 Claude Opus 4; 96 Claude Opus 4.6 (Reasoning)
52.6%	AI-ism word frequency	92 Claude Sonnet 5; 91 Claude Opus 4.7; 90 Claude Opus 4.7 (Reasoning)	0 GPT-4o Mini (temp=0); 1 GPT-4o Mini (temp=1); 3 Inception Mercury 2
100.0%	Cliché density	100 Claude Opus 4.6; 100 o4 Mini; 100 Claude Sonnet 4.6	13 GPT-4o Mini (temp=0); 40 Mistral Small 3.2 24B; 53 Gemini 2.5 Flash (Reasoning)
100.0%	Dialogue tag variety (said vs. fancy)	100 Claude Opus 4.5; 100 Z.AI GLM 5 Turbo; 100 Qwen 3.6 Flash	21 Gemma 3 4B; 37 GPT-4o, Aug. 6th (temp=1); 39 Gemma 3 12B
67.9%	Em-dash & semicolon overuse	100 Qwen 3.5 Flash; 100 Qwen 3.5 9B; 100 Qwen 3.6 Flash	0 Z.AI GLM 5; 0 ByteDance Seed 1.6; 1 MoonshotAI: Kimi K2.5
100.0%	Emotion telling (show vs. tell)	100 Ministral 3 14B; 100 Qwen 3.6 35B; 100 Grok 4.20 (Reasoning)	75 Mistral Small 3.2 24B; 94 Cohere Command R+ (Aug. 2024); 95 GPT-4o, Aug. 6th (temp=1)
100.0%	Filter word density	100 MiniMax M2.5; 100 Gemini 3 Flash (Preview); 100 Claude Sonnet 4.6 (Reasoning)	35 Nemotron 3 Nano; 46 Gemini 3.1 Flash Lite (Reasoning); 66 GPT-5 Nano
100.0%	Gibberish response detection	100 o4 Mini High; 100 Gemma 3 27B; 100 GPT-5.4 Nano (Reasoning)	80 Llama 3.1 70B; 99 Cydonia 24B V4.1; 99 DeepSeek V3 (2025-03-24)
100.0%	Markdown formatting overuse	100 Claude Sonnet 5 (Reasoning); 100 Gemini 2.5 Pro; 100 GPT-4.1	80 Ministral 3B; 94 ByteDance Seed 1.6 Flash
100.0%	Missing dialogue indicators (quotation marks)	100 Mistral Large 2; 100 GPT-4.1 Nano; 100 Claude Haiku 4.5	80 Qwen 3.5 Flash; 80 Qwen 3.5 Plus (2026-04-20); 83 Qwen 3.5 35B
53.0%	Name drop frequency	100 DeepSeek V3.1; 100 Gemma 3 4B; 100 Z.AI GLM 4.6	0 GPT-5.2; 0 GPT-5.5 (Reasoning); 0 GPT-5.5 (Reasoning, Low)
81.2%	Narrator intent-glossing	100 Grok 4.3 (Reasoning); 100 ByteDance Seed 1.6 Flash; 100 Grok 4.3	7 Nemotron 3 Nano; 18 GPT-5.4 Nano; 19 Gemini 2.5 Flash Lite
100.0%	Overuse of "that" (subordinate clause padding)	100 Mistral NeMO; 100 Nemotron 3 Super; 100 o4 Mini	65 ByteDance Seed 2.0 Mini; 70 ByteDance Seed 1.6; 80 Mistral Small 3.2 24B
100.0%	Paragraph length variance	100 GPT-5.4 Mini; 100 Cydonia 24B V4.1; 100 Ministral 3B	53 Nemotron 3 Nano; 61 Arcee AI: Trinity Mini; 62 GPT-OSS 120B
98.9%	Passive voice overuse	100 GPT-5.4 Mini (Reasoning, Low); 100 GPT-5.4 (Reasoning); 100 GPT-5.5	81 Claude Sonnet 5 (Reasoning, Low); 90 Claude Sonnet 4.6 (Reasoning); 91 ByteDance Seed 2.0 Lite
100.0%	Past progressive (was/were + -ing) overuse	100 GPT-5.4; 100 GPT-5.4 (Reasoning, Low); 100 GPT-5 Nano	31 Claude Sonnet 4.6; 48 Claude Opus 4.7 (Reasoning); 51 Z.AI GLM 4.6
58.4%	Pronoun-first sentence starts	100 GPT-5.1; 100 GPT-5.5 (Reasoning, Low); 100 Claude Opus 4.5	0 GPT-4.1 Nano; 0 DeepSeek V3.1; 0 Gemma 3 4B
97.6%	Purple prose (modifier overload)	100 GPT-5; 100 Xiaomi MIMO v2.5; 100 Grok 4.5 (Reasoning, Low)	82 GPT-4.1 Nano; 88 Gemini 2.5 Flash Lite; 88 Gemma 4 26B (Reasoning)
100.0%	Repeated phrase echo	100 MoonshotAI: Kimi K2.5; 100 WizardLM 2 8x22b; 100 Grok 4.20	—
100.0%	Sentence length variance	100 Hermes 3 70B; 100 Nemotron 3 Super; 100 GPT-5.2	93 Llama 3.1 70B; 98 GPT-OSS 120B; 98 Inception Mercury 2
48.9%	Sentence opener variety	86 GPT-4o Mini (temp=1); 83 GPT-4o, Aug. 6th (temp=1); 82 Cohere Command R+ (Aug. 2024)	30 Qwen 3.6 35B; 33 Qwen 3.5 35B; 33 Qwen3.6 Max Preview
10.5%	Subject-first sentence starts	87 Grok 4.20 (Reasoning); 64 Writer: Palmyra X5; 63 Qwen3 235B A22B Instruct 2507	0 Gemini 3.1 Pro (Preview); 0 GPT-5.4 Nano; 0 Gemma 4 31B (Reasoning)
20.3%	Subordinate conjunction sentence starts	83 Grok 4.20 (Reasoning); 80 Ministral 3 14B; 73 GPT-5 Nano	0 Ministral 8B; 0 Gemini 2.5 Flash (Reasoning); 0 Aion 2.0
83.0%	Technical jargon density	100 Gemma 3 27B; 100 Qwen 3.5 9B; 100 Qwen 3.5 397B A17B	17 Claude Sonnet 5 (Reasoning, Low); 17 ByteDance Seed 1.6; 22 Claude Sonnet 5 (Reasoning)
77.6%	Useless dialogue additions	100 GPT-5.4; 100 Grok 4.3 (Reasoning); 100 Grok 4.20 (Reasoning)	0 GPT-OSS 120B; 0 Mistral NeMO; 0 Gemma 3 4B

Performance Score Distribution (Top 20)

Price-Performance Score Distribution (Top 20)

Most Stable Models (Top 20)

Top Overall Models (Top 20)