Mystery: examining a crime scene

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, and overused "AI-ism" words, adverbs, and names, among others.
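The sketch below shows how two of these habits (filter words and past progressive) might be measured as simple rates per 1,000 words. It assumes naive regex heuristics and a hypothetical word list. This page does not publish the benchmark's actual detectors, word lists, or scoring.

```python
import re

# Hypothetical filter-word list for illustration only; the benchmark's
# actual word lists and thresholds are not published on this page.
FILTER_WORDS = {
    "saw", "heard", "felt", "noticed", "realized",
    "watched", "seemed", "wondered", "knew", "thought",
}

# Naive past-progressive match: "was"/"were" followed by a word ending in
# "-ing" (e.g. "was kneeling"). It also flags false positives such as
# "was nothing".
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z']+")


def per_thousand_words(count: int, total_words: int) -> float:
    """Normalize a raw count to occurrences per 1,000 words."""
    return 1000.0 * count / total_words if total_words else 0.0


def prose_habits(text: str) -> dict:
    """Return filter-word and past-progressive rates per 1,000 words."""
    words = WORD.findall(text)
    filter_hits = sum(1 for w in words if w.lower() in FILTER_WORDS)
    progressive_hits = len(PAST_PROGRESSIVE.findall(text))
    return {
        "filter_words_per_1k": per_thousand_words(filter_hits, len(words)),
        "past_progressive_per_1k": per_thousand_words(progressive_hits, len(words)),
    }


sample = ("The detective was kneeling by the body. She noticed the window "
          "was open and felt a draft. Rain was falling outside.")
print(prose_habits(sample))
```

In the 21-word sample, "noticed" and "felt" count as filter words, and "was kneeling" and "was falling" count as past progressive. "was open" is not matched.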

Creative Writing Hallucination

Performance Score Distribution (Top 20)


Model	Score
GPT-5.4 (Reasoning, Low)	93%
Claude Opus 4.7 (Reasoning)	92%
Claude Opus 4.6 (Reasoning)	92%
GPT-5.4	92%
GPT-5.4 (Reasoning)	91%
Claude Opus 4.8 (Reasoning)	91%
Claude Sonnet 5 (Reasoning, Low)	91%
Grok 4.5 (Reasoning, High)	91%
GPT-5.4 Mini (Reasoning)	90%
GPT-5.1	90%
Claude Opus 4.8 (Reasoning, Low)	90%
Claude Sonnet 4.5	90%
Grok 4.5 (Reasoning, Low)	90%
Z.AI GLM 5.2 (Reasoning, High)	90%
GPT-5.5	90%
GPT-5.4 Mini	89%
GPT-5.5 (Reasoning, Low)	89%
Writer: Palmyra X5	89%
Claude Sonnet 5 (Reasoning)	89%
Claude Sonnet 4.6	89%

Price-Performance Score Distribution (Top 20)

Model	Score	Cost	Time
Z.AI GLM 5.2 (Reasoning, High)	90%	$0.011	57.3s
DeepSeek V4 Flash (Reasoning)	89%	$0.0008	29.5s
GPT-5.4 Mini	89%	$0.014	15.4s
GPT-5.4 Mini (Reasoning)	90%	$0.026	33.4s
DeepSeek V4 Flash	88%	$0.0006	24.0s
GPT-5.4 Mini (Reasoning, Low)	88%	$0.014	15.7s
Writer: Palmyra X5	89%	$0.013	21.6s
Grok 4.3	87%	$0.0080	21.7s
Qwen 3.6 Flash	88%	$0.012	45.2s
Grok 4.5 (Reasoning, Low)	90%	$0.017	1.5m
Z.AI GLM 5 Turbo	88%	$0.0088	33.3s
Qwen3 235B A22B Instruct 2507	87%	$0.0011	47.2s
GPT-5.4 (Reasoning, Low)	93%	$0.054	1.2m
Claude Haiku 4.5	87%	$0.015	23.7s
Xiaomi MIMO v2.5 Pro	88%	$0.0093	54.2s
Claude Sonnet 5 (Reasoning, Low)	91%	$0.042	45.5s
Z.AI GLM 5.1	88%	$0.017	1.1m
MiniMax M2.7	85%	$0.0043	1.3m
Claude Sonnet 4.5	90%	$0.042	39.8s
GPT-5.4	92%	$0.044	1.3m

Most Stable Models (Top 20)

Model	Score	Consistency	Stability
GPT-5.4 (Reasoning, Low)	93%	98%	91%
GPT-5.4 Mini (Reasoning)	90%	99%	90%
GPT-5.1	90%	99%	89%
Claude Opus 4.7 (Reasoning)	92%	95%	89%
Claude Opus 4.6 (Reasoning)	92%	96%	88%
Claude Opus 4.8 (Reasoning)	91%	97%	88%
Claude Opus 4.8 (Reasoning, Low)	90%	97%	87%
GPT-5.5 (Reasoning)	89%	98%	87%
GPT-5.4 Mini	89%	98%	87%
Grok 4.5 (Reasoning, High)	91%	94%	87%
GPT-5.4	92%	96%	87%
Z.AI GLM 5.2 (Reasoning, High)	90%	96%	87%
GPT-5.4 (Reasoning)	91%	94%	86%
Claude Sonnet 5 (Reasoning, Low)	91%	96%	86%
GPT-5	87%	98%	86%
GPT-5.5	90%	95%	85%
Claude Sonnet 4.5	90%	96%	85%
Writer: Palmyra X5	89%	96%	85%
GPT-5.4 Mini (Reasoning, Low)	88%	95%	84%
Qwen3.6 Max Preview	87%	96%	84%

Top Overall Models (Top 20)

Model	Score	Cost	Time	Stability
GPT-5.4 Mini (Reasoning)	90%	$0.026	33.4s	90%
GPT-5.4 (Reasoning, Low)	93%	$0.054	1.2m	91%
GPT-5.4 Mini	89%	$0.014	15.4s	87%
Z.AI GLM 5.2 (Reasoning, High)	90%	$0.011	57.3s	87%
Writer: Palmyra X5	89%	$0.013	21.6s	85%
DeepSeek V4 Flash (Reasoning)	89%	$0.0008	29.5s	84%
DeepSeek V4 Flash	88%	$0.0006	24.0s	84%
Claude Sonnet 5 (Reasoning, Low)	91%	$0.042	45.5s	86%
GPT-5.4 Mini (Reasoning, Low)	88%	$0.014	15.7s	84%
GPT-5.4	92%	$0.044	1.3m	87%
GPT-5.1	90%	$0.052	1.6m	89%
Qwen 3.6 Flash	88%	$0.012	45.2s	84%
Claude Sonnet 4.5	90%	$0.042	39.8s	85%
Grok 4.3	87%	$0.0080	21.7s	84%
Claude Opus 4.7 (Reasoning)	92%	$0.095	35.4s	89%
Qwen3 235B A22B Instruct 2507	87%	$0.0011	47.2s	83%
Z.AI GLM 5 Turbo	88%	$0.0088	33.3s	82%
Grok 4.5 (Reasoning, Low)	90%	$0.017	1.5m	83%
Grok 4.5 (Reasoning, High)	91%	$0.042	2.2m	87%
Claude Haiku 4.5	87%	$0.015	23.7s	83%

Rank	Model	Avg. Cost	Avg. Time	Stability	# 1	# 2	# 3	# 4	# 5	Total
2	GPT-5.4 (Reasoning, Low)	$0.054	1.2m	91%	95	94	93	93	92	93%
15	Claude Opus 4.7 (Reasoning)	$0.095	35.4s	89%	95	95	94	89	89	92%
23	Claude Opus 4.6 (Reasoning)	$0.091	1.2m	88%	94	94	92	89	89	92%
10	GPT-5.4	$0.044	1.3m	87%	94	94	90	90	89	92%
44	GPT-5.4 (Reasoning)	$0.090	2.6m	86%	96	92	92	90	86	91%
22	Claude Opus 4.8 (Reasoning)	$0.089	43.9s	88%	93	92	91	90	88	91%
8	Claude Sonnet 5 (Reasoning, Low)	$0.042	45.5s	86%	94	93	90	90	88	91%
19	Grok 4.5 (Reasoning, High)	$0.042	2.2m	87%	95	92	92	88	86	91%
1	GPT-5.4 Mini (Reasoning)	$0.026	33.4s	90%	91	91	91	90	89	90%
11	GPT-5.1	$0.052	1.6m	89%	91	91	91	90	89	90%
24	Claude Opus 4.8 (Reasoning, Low)	$0.086	42.9s	87%	92	91	90	88	88	90%
13	Claude Sonnet 4.5	$0.042	39.8s	85%	94	91	89	88	88	90%
18	Grok 4.5 (Reasoning, Low)	$0.017	1.5m	83%	94	93	88	88	87	90%
4	Z.AI GLM 5.2 (Reasoning, High)	$0.011	57.3s	87%	92	91	90	89	87	90%
59	GPT-5.5	$0.114	1.5m	85%	93	91	89	89	86	90%
3	GPT-5.4 Mini	$0.014	15.4s	87%	91	90	89	89	88	89%
80	GPT-5.5 (Reasoning, Low)	$0.113	1.5m	82%	94	94	90	85	83	89%
5	Writer: Palmyra X5	$0.013	21.6s	85%	92	90	88	88	87	89%
31	Claude Sonnet 5 (Reasoning)	$0.045	48.1s	82%	94	91	87	87	85	89%
25	Claude Sonnet 4.6	$0.037	40.6s	82%	93	91	87	87	86	89%
77	GPT-5.5 (Reasoning)	$0.139	1.8m	87%	90	90	89	88	87	89%
6	DeepSeek V4 Flash (Reasoning)	$0.0008	29.5s	84%	91	91	88	87	86	89%
12	Qwen 3.6 Flash	$0.012	45.2s	84%	92	89	88	88	85	88%
131	Claude Opus 4	$0.232	2.4m	84%	92	91	89	86	85	88%
29	Z.AI GLM 5	$0.013	2.1m	82%	93	89	87	87	85	88%
39	Claude Sonnet 5	$0.034	34.7s	79%	95	91	87	86	82	88%
17	Z.AI GLM 5 Turbo	$0.0088	33.3s	82%	93	88	87	87	85	88%
28	Aion 3.0	$0.033	1.2m	83%	92	90	88	86	84	88%
35	Qwen 3.5 397B A17B	$0.013	2.4m	82%	92	91	87	85	85	88%
9	GPT-5.4 Mini (Reasoning, Low)	$0.014	15.7s	84%	91	89	88	87	84	88%
27	Z.AI GLM 5.1	$0.017	1.1m	81%	94	89	87	85	84	88%
7	DeepSeek V4 Flash	$0.0006	24.0s	84%	90	90	87	86	86	88%
26	Qwen 3.5 Plus (2026-04-20)	$0.019	1.8m	84%	91	89	87	87	85	88%
21	Xiaomi MIMO v2.5 Pro	$0.0093	54.2s	83%	92	88	87	86	85	88%
66	MiniMax M3	$0.0100	4.2m	81%	94	88	87	85	83	87%
61	Qwen3.6 Max Preview	$0.051	3.2m	84%	90	89	88	86	84	87%
48	GPT-5	$0.063	2.5m	86%	89	87	87	87	86	87%
50	Aion 3.0 Mini	$0.0074	1.2m	75%	95	92	85	84	80	87%
52	Claude Opus 4.6	$0.077	1.2m	83%	91	89	88	85	84	87%
20	Claude Haiku 4.5	$0.015	23.7s	83%	89	89	87	86	84	87%
16	Qwen3 235B A22B Instruct 2507	$0.0011	47.2s	83%	89	89	86	86	84	87%
78	Claude Sonnet 4.6 (Reasoning)	$0.075	1.2m	79%	93	89	86	84	83	87%
14	Grok 4.3	$0.0080	21.7s	84%	90	88	87	84	84	87%
36	DeepSeek V4 Pro (Reasoning)	$0.011	2.8m	84%	89	87	86	86	84	86%
137	MoonshotAI: Kimi K2.6	$0.091	11.0m	83%	89	86	86	85	84	86%
37	Grok 4.20	$0.010	37.8s	78%	91	89	84	84	83	86%
62	Claude Opus 4.7	$0.091	34.2s	84%	88	87	86	85	83	86%
41	Grok 4.20 (Reasoning)	$0.017	1.3m	80%	92	87	87	84	80	86%
47	Qwen 3.6 35B	$0.016	2.4m	82%	89	89	87	83	82	86%
34	DeepSeek V4 Pro	$0.011	1.3m	81%	90	87	87	84	80	86%
32	MiniMax M2.7	$0.0043	1.3m	81%	89	87	86	83	82	85%
42	WizardLM 2 8x22b	$0.0039	1.9m	80%	90	87	85	83	82	85%
53	Claude Sonnet 4	$0.035	39.0s	79%	90	88	86	81	80	85%
90	Hermes 3 70B	$0.0018	1.3m	70%	95	91	82	80	75	85%
70	Claude Opus 4.5	$0.067	40.4s	81%	88	87	85	82	81	85%
60	MoonshotAI: Kimi K2.5	$0.021	2.4m	81%	88	86	85	83	82	85%
40	MiniMax M2.5	$0.0038	1.5m	81%	88	86	84	83	82	85%
55	Hermes 3 405B	$0.0049	25.9s	75%	91	88	83	82	79	84%
98	Qwen3.7 Max	$0.081	2.6m	81%	87	87	85	83	80	84%
76	o4 Mini High	$0.037	1.3m	78%	89	85	83	83	81	84%
33	o4 Mini	$0.017	30.1s	81%	87	85	84	83	82	84%
68	Gemma 3 27B	$0.0007	59.9s	75%	91	86	82	81	79	84%
69	ByteDance Seed 2.0 Lite	$0.011	1.9m	78%	89	84	83	81	81	84%
38	Qwen 3.5 122B	$0.015	31.5s	81%	86	84	83	83	82	84%
30	GPT-5.4 Nano	$0.0053	18.9s	82%	85	84	83	83	82	83%
45	Mistral Small 4 (Reasoning)	$0.0026	32.2s	78%	88	83	82	82	81	83%
64	Grok 4.3 (Reasoning)	$0.017	1.6m	80%	86	85	83	82	82	83%
127	Gemini 3.1 Pro (Preview)	$0.143	2.1m	80%	86	84	82	82	82	83%
84	Gemini 3.5 Flash (Reasoning)	$0.056	30.9s	78%	89	83	83	82	81	83%
51	Mistral Large 3	$0.0046	32.8s	77%	90	84	83	81	79	83%
43	GPT-5.4 Nano (Reasoning)	$0.0059	22.2s	79%	85	85	82	82	81	83%
91	GPT-5.2	$0.062	1.5m	79%	87	85	84	81	79	83%
46	Mistral Small 4	$0.0017	17.4s	78%	86	84	83	82	77	83%
83	DeepSeek V3 (2025-03-24)	$0.0022	16.1s	72%	89	89	81	77	77	82%
49	Qwen 3.5 35B	$0.013	41.6s	80%	84	83	82	81	80	82%
54	Qwen 3.5 9B	$0.0013	1.3m	79%	85	84	84	81	78	82%
57	Aion 2.0	$0.0078	1.1m	80%	84	84	82	81	81	82%
67	GPT-5.4 Nano (Reasoning, Low)	$0.0050	22.8s	75%	87	84	83	83	73	82%
58	Qwen 3 32B	$0.0014	44.4s	78%	86	84	83	82	77	82%
56	Cydonia 24B V4.1	$0.0021	48.7s	78%	84	84	83	82	77	82%
87	Mistral Large 2	$0.019	35.7s	74%	89	84	83	81	73	82%
92	ByteDance Seed 1.6 Flash	$0.0012	23.4s	70%	93	82	79	78	77	82%
73	DeepSeek V3.2	$0.0020	1.6m	78%	85	82	81	80	79	82%
88	Xiaomi MIMO v2.5	$0.0061	34.8s	73%	88	84	80	79	76	81%
72	GPT-5 Mini	$0.0100	1.1m	78%	84	83	83	79	77	81%
82	Z.AI GLM 4.5	$0.0054	34.0s	75%	84	84	79	79	78	81%
71	Z.AI GLM 4.5 Air	$0.0020	35.0s	77%	84	83	81	79	77	81%
79	Mistral Medium 3.1	$0.0056	39.4s	76%	86	80	80	79	78	81%
110	ByteDance Seed 2.0 Mini	$0.0041	4.1m	75%	86	82	80	78	77	81%
65	Qwen 3.5 Flash	$0.0028	41.1s	78%	83	82	81	79	78	80%
63	Mistral NeMO	$0.0009	9.6s	77%	82	82	81	80	76	80%
74	Z.AI GLM 4.6	$0.0068	52.0s	78%	82	81	80	79	79	80%
95	DeepSeek V3.1	$0.0020	2.0m	75%	85	82	80	80	75	80%
86	Z.AI GLM 4.7	$0.011	1.4m	78%	82	82	81	80	76	80%
81	Qwen 3.5 Plus (2026-02-15)	$0.0075	33.3s	76%	82	82	81	80	75	80%
96	GPT-4.1	$0.021	38.6s	74%	84	82	79	78	76	80%
113	Qwen 3.5 27B	$0.041	2.9m	77%	82	81	81	77	77	79%
75	Ministral 3 14B	$0.0010	7.2s	76%	82	81	80	80	74	79%
103	Gemini 2.5 Pro	$0.037	33.9s	74%	84	82	80	75	75	79%
85	DeepSeek-V2 Chat	$0.0028	56.9s	76%	81	81	80	78	76	79%
89	Gemini 2.5 Flash	$0.0057	10.0s	74%	83	81	78	78	75	79%
107	Gemma 4 31B (Reasoning)	$0.0019	4.1m	77%	80	80	79	78	77	79%
122	Qwen 3.6 27B	$0.024	2.2m	72%	85	81	78	77	74	79%
117	ByteDance Seed 1.6	$0.011	2.0m	72%	84	83	80	78	69	79%
104	Cohere Command R+ (Aug. 2024)	$0.024	32.9s	73%	85	80	79	76	73	79%
97	Gemini 3.5 Flash (Reasoning, Minimal)	$0.020	13.4s	74%	83	80	78	78	74	79%
93	Z.AI GLM 4.7 Flash	$0.0018	1.5m	76%	81	80	78	77	76	78%
101	Gemma 3 12B	$0.0004	52.6s	72%	85	82	81	72	71	78%
99	GPT-4o, Aug. 6th (temp=1)	$0.023	19.8s	74%	80	79	78	78	73	78%
105	Gemini 2.5 Flash (Reasoning)	$0.014	24.8s	72%	84	80	78	76	71	78%
94	GPT-4o, Aug. 6th (temp=0)	$0.019	20.5s	76%	80	79	78	76	75	78%
106	Gemini 2.5 Flash Lite	$0.0011	8.9s	69%	83	81	76	74	73	77%
100	Gemma 4 26B	$0.0010	1.0m	73%	82	78	78	75	73	77%
102	GPT-4o Mini (temp=1)	$0.0014	38.4s	72%	80	80	76	76	73	77%
119	Gemma 4 31B	$0.0013	2.0m	73%	80	78	78	72	72	76%
112	Gemini 3 Flash (Preview, Reasoning)	$0.011	26.6s	71%	80	78	75	75	72	76%
129	Gemma 4 26B (Reasoning)	$0.0019	1.8m	70%	81	77	74	74	73	76%
111	Ministral 3 3B	$0.0005	3.1s	69%	80	77	74	72	72	75%
108	Ministral 3 8B	$0.0008	7.7s	70%	80	75	75	73	72	75%
115	GPT-4.1 Mini	$0.0030	16.8s	71%	80	75	74	74	72	75%
109	Qwen 2.5 72B	$0.0010	31.4s	72%	78	76	75	73	71	75%
114	Gemini 3.1 Flash Lite (Preview)	$0.0036	9.0s	70%	78	76	74	73	72	75%
124	DeepSeek V3 (2024-12-26)	$0.0028	1.1m	71%	78	77	77	73	68	75%
118	Ministral 8B	$0.0006	6.9s	70%	79	78	77	71	67	74%
125	Llama 3.1 70B	$0.0020	17.5s	68%	80	78	76	73	65	74%
140	Mistral Small 3.2 24B	$0.0071	6.4m	70%	77	77	74	73	68	74%
123	Nemotron 3 Super	$0.0000	38.9s	71%	77	75	74	72	71	74%
130	Arcee AI: Trinity Mini	$0.0004	8.2s	66%	81	76	74	71	66	74%
126	Gemini 3 Flash (Preview)	$0.0079	19.4s	70%	78	75	74	71	69	73%
120	Gemma 3 4B	$0.0003	21.3s	71%	76	74	74	73	70	73%
116	Ministral 3B	$0.0002	2.6s	71%	76	74	74	72	70	73%
121	GPT-4o Mini (temp=0)	$0.0013	31.3s	72%	75	74	74	72	70	73%
128	GPT-4.1 Nano	$0.0009	16.6s	69%	75	74	74	71	66	72%
133	Gemini 3.1 Flash Lite (Reasoning)	$0.0033	9.5s	66%	78	71	70	70	69	72%
132	Inception Mercury 2	$0.0031	6.3s	67%	75	74	70	70	68	71%
134	Gemini 3.1 Flash Lite	$0.0035	8.5s	63%	80	72	70	66	66	71%
135	Nemotron 3 Nano	$0.0011	55.7s	64%	75	71	68	68	68	70%
138	GPT-OSS 120B	$0.0011	1.7m	64%	75	71	69	67	65	69%
136	Gemini 2.5 Flash Lite (Reasoning)	$0.0031	24.6s	64%	72	71	68	66	65	68%
139	GPT-5 Nano	$0.0046	1.5m	65%	68	66	66	66	65	66%
82.42%
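In the full table, each Total appears to be the mean of the five columns # 1 to # 5 (presumably individual run scores), rounded to a whole percent. For example, GPT-5.4 (Reasoning, Low) gives (95 + 94 + 93 + 93 + 92) / 5 = 93.4, which the table shows as 93%. The page does not state this formula. The check below only recomputes a few rows copied from the table, under that assumption.

```python
# Columns "# 1" to "# 5" and "Total" copied from a few rows of the table above.
ROWS = {
    "GPT-5.4 (Reasoning, Low)": ([95, 94, 93, 93, 92], 93),
    "Claude Opus 4.7 (Reasoning)": ([95, 95, 94, 89, 89], 92),
    "Hermes 3 70B": ([95, 91, 82, 80, 75], 85),
    "ByteDance Seed 1.6": ([84, 83, 80, 78, 69], 79),
    "GPT-5 Nano": ([68, 66, 66, 66, 65], 66),
}


def total(scores: list) -> int:
    """Mean of the five scores, rounded to a whole percent.

    The sum of five integers divided by 5 always ends in .0, .2, .4, .6
    or .8, so Python's round-half-to-even behaviour never comes into play.
    """
    return round(sum(scores) / len(scores))


for model, (scores, shown_total) in ROWS.items():
    print(f"{model}: computed {total(scores)}%, shown {shown_total}%")
```

Stability does not follow from these five numbers in any obvious way. GPT-5 Nano's scores span only 3 points yet show 65% stability, so that column is left out of this check.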

Median score	Evaluator	Top 3 (score)	Bottom 3 (score)
99.0%	"Not X but Y" pattern overuse	Z.AI GLM 5 (100); Ministral 3 8B (100); o4 Mini High (100)	GPT-5 Nano (14); MoonshotAI: Kimi K2.5 (35); GPT-5.2 (40)
48.2%	Adverb-first sentence starts	Writer: Palmyra X5 (100); Z.AI GLM 5.2 (Reasoning, High) (100); Qwen3 235B A22B Instruct 2507 (98)	Llama 3.1 70B (0); GPT-OSS 120B (0); Nemotron 3 Nano (0)
100.0%	Adverbs in dialogue tags	Qwen 3.5 27B (100); ByteDance Seed 2.0 Mini (100); Qwen 3.5 9B (100)	GPT-4.1 Nano (28); GPT-4.1 Mini (64); Hermes 3 405B (66)
92.3%	AI-ism adverb frequency	Qwen3.7 Max (100); GPT-5.4 (Reasoning) (99); Grok 4.3 (Reasoning) (99)	GPT-4.1 Nano (64); Gemini 2.5 Flash Lite (Reasoning) (73); Z.AI GLM 4.5 (74)
100.0%	AI-ism character names	Gemma 4 31B (100); Mistral Large 2 (100); Qwen3 235B A22B Instruct 2507 (100)	Z.AI GLM 5 (80); Grok 4.20 (80); Xiaomi MIMO v2.5 Pro (84)
100.0%	AI-ism location names	Gemma 4 26B (Reasoning) (100); Z.AI GLM 5 Turbo (100); GPT-4o, Aug. 6th (temp=0) (100)	n/a
51.0%	AI-ism word frequency	Claude Opus 4.7 (Reasoning) (90); Claude Opus 4.7 (87); GPT-5.4 Mini (Reasoning) (85)	GPT-4o, Aug. 6th (temp=0) (3); GPT-4o Mini (temp=1) (3); GPT-4o, Aug. 6th (temp=1) (5)
100.0%	Cliché density	Ministral 3 8B (100); Mistral Small 4 (100); Qwen 3 32B (100)	Mistral Small 3.2 24B (27); Qwen 2.5 72B (40); GPT-4o Mini (temp=0) (40)
91.7%	Dialogue tag variety (said vs. fancy)	Llama 3.1 70B (100); Mistral Large 2 (100); Z.AI GLM 5.2 (Reasoning, High) (100)	Inception Mercury 2 (0); Nemotron 3 Nano (0); Gemma 3 4B (0)
98.8%	Em-dash & semicolon overuse	Qwen 3.5 Plus (2026-02-15) (100); GPT-5.1 (100); MiniMax M3 (100)	Mistral Small 4 (0); Mistral Small 4 (Reasoning) (0); GPT-4o Mini (temp=1) (0)
100.0%	Emotion telling (show vs. tell)	Gemini 3.1 Flash Lite (Reasoning) (100); Claude Opus 4.5 (100); Claude Opus 4.6 (100)	Ministral 3B (93); Cohere Command R+ (Aug. 2024) (94); Mistral Small 3.2 24B (95)
98.1%	Filter word density	Qwen 3.5 35B (100); Claude Opus 4.7 (100); Gemini 3.1 Pro (Preview) (100)	Llama 3.1 70B (48); Gemini 3.1 Flash Lite (49); Gemma 4 26B (Reasoning) (53)
100.0%	Gibberish response detection	GPT-4.1 (100); Z.AI GLM 5.1 (100); Aion 2.0 (100)	DeepSeek V4 Pro (Reasoning) (80); DeepSeek V3 (2025-03-24) (93); MiniMax M2.7 (98)
100.0%	Markdown formatting overuse	Inception Mercury 2 (100); Claude Sonnet 5 (100); Gemini 3 Flash (Preview) (100)	Ministral 3B (80); Ministral 3 3B (80); ByteDance Seed 1.6 Flash (86)
100.0%	Missing dialogue indicators (quotation marks)	Grok 4.20 (Reasoning) (100); DeepSeek V3 (2024-12-26) (100); Ministral 3 14B (100)	GPT-5 (60); Qwen3.6 Max Preview (60); Qwen 3.5 27B (80)
51.0%	Name drop frequency	Claude Opus 4.7 (97); Claude Opus 4.7 (Reasoning) (97); Claude Sonnet 5 (96)	Z.AI GLM 4.5 (3); Mistral Small 3.2 24B (3); Qwen 3.5 122B (6)
87.8%	Narrator intent-glossing	GPT-5.5 (Reasoning, Low) (100); GPT-5.4 (Reasoning, Low) (100); GPT-5.5 (Reasoning) (100)	GPT-5 Nano (2); Gemma 4 26B (Reasoning) (36); Claude Sonnet 4 (40)
100.0%	Overuse of "that" (subordinate clause padding)	MoonshotAI: Kimi K2.6 (100); Qwen3.7 Max (100); Gemini 3 Flash (Preview) (100)	Llama 3.1 70B (67); ByteDance Seed 2.0 Mini (69); Claude Sonnet 5 (85)
100.0%	Paragraph length variance	Mistral Large 2 (100); GPT-5.4 Nano (Reasoning) (100); Gemini 3.1 Flash Lite (100)	Nemotron 3 Nano (75); Inception Mercury 2 (76); Gemini 2.5 Flash Lite (Reasoning) (77)
96.3%	Passive voice overuse	Grok 4.5 (Reasoning, Low) (100); Grok 4.5 (Reasoning, High) (100); Qwen 3.5 Plus (2026-04-20) (100)	ByteDance Seed 2.0 Lite (77); Qwen 2.5 72B (84); Gemma 4 26B (86)
98.0%	Past progressive (was/were + -ing) overuse	GPT-5.4 (Reasoning) (100); MoonshotAI: Kimi K2.6 (100); GPT-5 Nano (100)	Z.AI GLM 4.7 Flash (17); Xiaomi MIMO v2.5 (23); Z.AI GLM 4.7 (29)
97.3%	Pronoun-first sentence starts	Ministral 3B (100); DeepSeek V4 Flash (100); Z.AI GLM 4.5 Air (100)	Gemini 3.1 Flash Lite (Reasoning) (40); GPT-4.1 Nano (44); Qwen 3.5 27B (49)
97.6%	Purple prose (modifier overload)	Claude Opus 4.7 (Reasoning) (100); Qwen 3.6 Flash (100); GPT-5.2 (100)	Gemini 3 Flash (Preview, Reasoning) (79); Gemini 2.5 Flash Lite (Reasoning) (81); Gemma 3 4B (82)
100.0%	Repeated phrase echo	Qwen 3.6 35B (100); Qwen 2.5 72B (100); Xiaomi MIMO v2.5 (100)	n/a
100.0%	Sentence length variance	Z.AI GLM 5.1 (100); Grok 4.3 (100); GPT-5 (100)	Qwen 3.5 27B (80); Nemotron 3 Nano (93); Qwen 3.5 9B (95)
65.4%	Sentence opener variety	GPT-4o, Aug. 6th (temp=1) (98); Hermes 3 405B (96); Claude Sonnet 5 (Reasoning, Low) (95)	GPT-5 Nano (34); Qwen 3.5 9B (35); Qwen 3.5 35B (38)
41.2%	Subject-first sentence starts	Writer: Palmyra X5 (100); Qwen3 235B A22B Instruct 2507 (97); GPT-5.4 (Reasoning, Low) (93)	Inception Mercury 2 (0); Nemotron 3 Nano (0); Qwen 3.5 Flash (0)
20.7%	Subordinate conjunction sentence starts	GPT-4o Mini (temp=1) (93); Mistral NeMO (83); Claude Haiku 4.5 (73)	Gemini 3.1 Pro (Preview) (0); Ministral 8B (0); Grok 4.3 (Reasoning) (0)
79.3%	Technical jargon density	GPT-5.4 (Reasoning) (100); Qwen3.7 Max (100); Aion 3.0 (100)	GPT-5 Nano (1); Gemini 2.5 Flash Lite (Reasoning) (12); MiniMax M2.7 (25)
73.3%	Useless dialogue additions	DeepSeek V4 Pro (100); Claude Opus 4 (100); Claude Opus 4.6 (Reasoning) (100)	Ministral 3B (0); Mistral Small 3.2 24B (0); Gemini 2.5 Flash Lite (Reasoning) (0)
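Two of the evaluators above, "Not X but Y" pattern overuse and sentence length variance, can be approximated with plain text statistics. The sketch below assumes a rough regex sentence splitter, a hypothetical "not X but Y" pattern, and the coefficient of variation (population standard deviation divided by mean) as the variance measure. It computes raw statistics only. The page does not describe how the benchmark turns raw counts into the 0 to 100 scores in the table.

```python
import re
from statistics import mean, pstdev

# Split after ., ! or ? followed by whitespace (a rough sentence splitter).
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")

# Hypothetical pattern for "not X but Y" / "not just X, but Y" constructions.
NOT_BUT = re.compile(
    r"\bnot\s+(?:just\s+|only\s+|merely\s+)?[^.!?]{1,40}?,?\s+but\b",
    re.IGNORECASE,
)


def sentence_lengths(text: str) -> list:
    """Word count of each sentence."""
    sentences = [s for s in SENTENCE_SPLIT.split(text.strip()) if s]
    return [len(s.split()) for s in sentences]


def length_variation(text: str) -> float:
    """Coefficient of variation of sentence lengths (0.0 = all equal)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def not_x_but_y_count(text: str) -> int:
    """Number of "not X but Y" style contrasts in the text."""
    return len(NOT_BUT.findall(text))


sample = ("It was not a robbery, but a message. The lock was intact. "
          "Nothing else was taken from the apartment that night.")
print(sentence_lengths(sample), round(length_variation(sample), 3),
      not_x_but_y_count(sample))
```

A higher variation value means more varied sentence lengths. A text whose sentences all have the same length scores 0.0.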
