Mystery: examining a crime scene

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, and AI-ism words, adverbs, and names, among others.
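
As a rough illustration of what two of these checks might look like, the sketch below counts filter words and past progressive constructions with simple heuristics. The word list, the function names, and the per-100-words scaling are assumptions made for illustration only; the benchmark's actual evaluators and scoring are not described on this page.

```python
import re

# Hypothetical filter-word list (assumption); the benchmark's real list is not shown here.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed", "watched", "wondered"}


def filter_word_density(text: str) -> float:
    """Return the number of filter words per 100 words (0.0 for empty text)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FILTER_WORDS)
    return 100.0 * hits / len(words)


def past_progressive_count(text: str) -> int:
    """Count 'was/were + word ending in -ing' with a regex heuristic.

    This is deliberately naive: it will also match phrases such as
    'was king', so treat the count as an upper-bound estimate.
    """
    return len(re.findall(r"\b(?:was|were)\s+\w+ing\b", text, flags=re.IGNORECASE))


sample = "She noticed the window was rattling. He felt cold."
print(filter_word_density(sample))
print(past_progressive_count(sample))
```

A real evaluator would likely need part-of-speech tagging to separate true progressive verbs from nouns and adjectives ending in "-ing"; the regex here only shows the general shape of such a check.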

Creative Writing Hallucination

Performance Score Distribution (Top 20)

Model	Score
GPT-5.4 (Reasoning)	93%
Grok 4.20 (Reasoning)	92%
Qwen3.6 Max Preview	92%
GPT-5.4	91%
GPT-5.1	90%
GPT-5.5	89%
GPT-5.4 (Reasoning, Low)	89%
Cohere Command R+ (Aug. 2024)	89%
GPT-5.4 Mini (Reasoning)	89%
Grok 4.5 (Reasoning, Low)	88%
GPT-5	88%
GPT-5.5 (Reasoning)	88%
ByteDance Seed 2.0 Lite	87%
Claude Sonnet 5	87%
Grok 4.5 (Reasoning, High)	87%
Qwen 3.6 27B	87%
Qwen 3.6 35B	87%
MoonshotAI: Kimi K2.6	87%
GPT-5.5 (Reasoning, Low)	87%
GPT-5.4 Mini (Reasoning, Low)	87%

Price-Performance Score Distribution (Top 20)

Model	Score	Cost	Time
Grok 4.20 (Reasoning)	92%	$0.020	2.0m
Z.AI GLM 5 Turbo	86%	$0.0062	27.2s
Grok 4.5 (Reasoning, Low)	88%	$0.016	50.8s
Cohere Command R+ (Aug. 2024)	89%	$0.017	52.5s
GPT-5.4 Mini	87%	$0.015	17.5s
GPT-5.4 Mini (Reasoning)	89%	$0.025	33.6s
GPT-5.4 Mini (Reasoning, Low)	87%	$0.014	16.6s
Mistral Medium 3.1	84%	$0.0039	28.3s
GPT-5.4	91%	$0.047	1.4m
Mistral Small 4 (Reasoning)	84%	$0.0021	28.4s
Z.AI GLM 5.1	87%	$0.012	1.8m
Qwen 3.6 35B	87%	$0.0068	59.4s
Qwen 3.6 Flash	85%	$0.0096	40.8s
Grok 4.20	83%	$0.0078	42.9s
Xiaomi MIMO v2.5	82%	$0.0045	28.2s
Claude Sonnet 4.6	85%	$0.026	39.4s
Hermes 3 405B	82%	$0.0019	37.5s
Grok 4.3	84%	$0.0058	29.1s
Mistral Large 2	84%	$0.0090	23.4s
Qwen 3.5 Flash	84%	$0.0033	55.3s

Top Overall Models (Top 20)

Model	Score	Cost	Time	Stability
Grok 4.20 (Reasoning)	92%	$0.020	2.0m	87%
Grok 4.5 (Reasoning, Low)	88%	$0.016	50.8s	87%
GPT-5.4 Mini (Reasoning)	89%	$0.025	33.6s	86%
GPT-5.4	91%	$0.047	1.4m	89%
Z.AI GLM 5 Turbo	86%	$0.0062	27.2s	84%
GPT-5.4 Mini	87%	$0.015	17.5s	84%
GPT-5.4 Mini (Reasoning, Low)	87%	$0.014	16.6s	83%
Cohere Command R+ (Aug. 2024)	89%	$0.017	52.5s	81%
Mistral Large 2	84%	$0.0090	23.4s	83%
Mistral Medium 3.1	84%	$0.0039	28.3s	81%
Qwen3.6 Max Preview	92%	$0.048	3.4m	89%
Claude Sonnet 5	87%	$0.024	33.6s	81%
GPT-5.4 (Reasoning, Low)	89%	$0.055	1.4m	87%
Grok 4.3	84%	$0.0058	29.1s	81%
Qwen3 235B A22B Instruct 2507	85%	$0.0008	1.0m	80%
GPT-5.1	90%	$0.053	2.2m	87%
Z.AI GLM 5.1	87%	$0.012	1.8m	81%
GPT-5.4 Nano (Reasoning, Low)	84%	$0.0068	24.2s	79%
Qwen 3.6 35B	87%	$0.0068	59.4s	78%
GPT-5.4 Nano (Reasoning)	84%	$0.0079	32.4s	80%

Rank	Model	Avg. Cost	Avg. Time	Stability	# 1	# 2	# 3	# 4	# 5	Total
26	GPT-5.4 (Reasoning)	$0.087	2.5m	90%	95	94	93	91	90	93%
1	Grok 4.20 (Reasoning)	$0.020	2.0m	87%	98	92	92	90	89	92%
11	Qwen3.6 Max Preview	$0.048	3.4m	89%	95	93	92	91	90	92%
4	GPT-5.4	$0.047	1.4m	89%	93	92	92	90	89	91%
16	GPT-5.1	$0.053	2.2m	87%	93	92	92	89	85	90%
92	GPT-5.5	$0.125	1.7m	87%	92	90	90	88	88	89%
13	GPT-5.4 (Reasoning, Low)	$0.055	1.4m	87%	92	90	89	88	87	89%
8	Cohere Command R+ (Aug. 2024)	$0.017	52.5s	81%	94	92	88	86	83	89%
3	GPT-5.4 Mini (Reasoning)	$0.025	33.6s	86%	90	89	89	87	87	89%
2	Grok 4.5 (Reasoning, Low)	$0.016	50.8s	87%	90	89	89	87	86	88%
76	GPT-5	$0.067	2.7m	83%	93	89	88	87	84	88%
115	GPT-5.5 (Reasoning)	$0.138	1.8m	86%	89	88	87	87	87	88%
21	ByteDance Seed 2.0 Lite	$0.013	2.4m	82%	91	90	87	85	84	87%
12	Claude Sonnet 5	$0.024	33.6s	81%	92	88	86	85	85	87%
27	Grok 4.5 (Reasoning, High)	$0.034	1.8m	83%	90	89	87	87	83	87%
24	Qwen 3.6 27B	$0.025	2.2m	84%	90	88	88	86	83	87%
19	Qwen 3.6 35B	$0.0068	59.4s	78%	94	91	87	83	79	87%
123	MoonshotAI: Kimi K2.6	$0.057	6.7m	84%	89	88	86	86	85	87%
128	GPT-5.5 (Reasoning, Low)	$0.139	1.7m	84%	89	88	88	86	83	87%
7	GPT-5.4 Mini (Reasoning, Low)	$0.014	16.6s	83%	89	88	86	86	84	87%
6	GPT-5.4 Mini	$0.015	17.5s	84%	88	87	87	86	84	87%
17	Z.AI GLM 5.1	$0.012	1.8m	81%	91	90	87	83	82	87%
60	Claude Opus 4.8 (Reasoning, Low)	$0.065	43.5s	81%	90	89	86	84	83	86%
34	Aion 3.0	$0.017	55.5s	77%	94	88	84	84	83	86%
50	Claude Opus 4.7 (Reasoning)	$0.066	32.0s	83%	89	88	86	85	84	86%
5	Z.AI GLM 5 Turbo	$0.0062	27.2s	84%	87	87	86	86	84	86%
47	MiniMax M3	$0.0056	2.8m	78%	93	85	85	84	81	86%
54	Claude Opus 4.7	$0.067	34.8s	82%	88	87	86	85	83	86%
89	Claude Opus 4.6 (Reasoning)	$0.078	1.3m	81%	89	86	85	84	84	86%
51	Grok 4.3 (Reasoning)	$0.026	2.8m	81%	89	87	86	85	82	86%
22	Qwen 3.6 Flash	$0.0096	40.8s	78%	90	89	85	82	81	85%
32	Qwen 3.5 Plus (2026-04-20)	$0.015	1.6m	80%	89	89	85	83	81	85%
82	Qwen 3.5 397B A17B	$0.011	5.2m	81%	89	86	86	83	82	85%
35	Claude Sonnet 4.6	$0.026	39.4s	79%	90	88	87	83	78	85%
93	Claude Opus 4.6	$0.073	1.2m	80%	90	85	85	83	82	85%
15	Qwen3 235B A22B Instruct 2507	$0.0008	1.0m	80%	89	85	84	83	82	85%
129	ByteDance Seed 2.0 Mini	$0.0049	5.7m	71%	94	91	83	81	74	85%
28	Qwen 3.5 Flash	$0.0033	55.3s	78%	89	87	83	81	80	84%
29	o4 Mini	$0.014	24.5s	78%	90	84	84	81	81	84%
9	Mistral Large 2	$0.0090	23.4s	83%	85	84	84	84	83	84%
86	Claude Opus 4.8 (Reasoning)	$0.065	43.1s	80%	87	86	84	83	80	84%
10	Mistral Medium 3.1	$0.0039	28.3s	81%	86	85	85	84	80	84%
41	Aion 3.0 Mini	$0.0046	1.1m	75%	93	85	84	81	77	84%
20	GPT-5.4 Nano (Reasoning)	$0.0079	32.4s	80%	88	84	83	82	81	84%
25	Mistral Small 4 (Reasoning)	$0.0021	28.4s	78%	89	85	84	82	78	84%
18	GPT-5.4 Nano (Reasoning, Low)	$0.0068	24.2s	79%	87	85	83	81	81	84%
14	Grok 4.3	$0.0058	29.1s	81%	87	84	84	82	81	84%
127	Qwen 3.5 122B	$0.067	3.8m	80%	86	85	82	82	82	83%
38	MiniMax M2.7	$0.0044	1.2m	77%	90	86	86	79	76	83%
58	DeepSeek V4 Pro (Reasoning)	$0.0086	2.2m	78%	88	84	82	82	81	83%
36	Grok 4.20	$0.0078	42.9s	77%	89	86	85	80	77	83%
52	Claude Sonnet 5 (Reasoning)	$0.029	39.4s	78%	88	84	82	82	81	83%
40	DeepSeek V4 Pro	$0.0018	1.1m	76%	89	86	82	79	79	83%
65	Writer: Palmyra X5	$0.010	20.7s	72%	90	88	83	82	72	83%
53	Claude Sonnet 4.6 (Reasoning)	$0.037	54.6s	81%	85	85	84	81	80	83%
37	Z.AI GLM 5.2 (Reasoning, High)	$0.0091	52.5s	78%	87	84	82	81	80	83%
42	ByteDance Seed 1.6 Flash	$0.0011	23.5s	74%	92	84	81	79	79	83%
31	DeepSeek V4 Flash (Reasoning)	$0.0007	36.0s	77%	87	85	83	81	77	83%
39	Qwen 3.5 9B	$0.0018	2.7m	81%	84	84	84	83	80	83%
69	Claude Sonnet 4.5	$0.031	38.6s	76%	89	82	81	81	81	83%
43	Z.AI GLM 5	$0.0068	1.0m	77%	87	85	82	82	78	83%
66	Claude Sonnet 5 (Reasoning, Low)	$0.030	40.9s	77%	87	85	83	80	77	82%
62	o4 Mini High	$0.023	50.3s	77%	87	84	81	80	80	82%
30	GPT-5.4 Nano	$0.0077	28.8s	79%	85	84	82	81	81	82%
45	Xiaomi MIMO v2.5	$0.0045	28.2s	75%	88	84	83	83	73	82%
55	Qwen 3.5 27B	$0.022	1.8m	81%	83	83	83	81	81	82%
136	Claude Opus 4	$0.137	59.1s	77%	85	85	81	81	80	82%
71	Gemini 2.5 Pro	$0.031	32.1s	76%	86	84	81	81	78	82%
131	Gemini 3.1 Pro (Preview)	$0.098	1.6m	79%	84	83	81	81	81	82%
113	Gemini 3.5 Flash (Reasoning)	$0.078	42.0s	78%	84	84	81	81	81	82%
57	MoonshotAI: Kimi K2.5	$0.014	1.1m	77%	86	85	83	80	76	82%
46	Hermes 3 405B	$0.0019	37.5s	75%	88	85	84	79	74	82%
97	GPT-5.2	$0.054	1.5m	80%	83	82	81	81	81	82%
44	DeepSeek V3 (2025-03-24)	$0.0009	50.3s	76%	87	83	83	80	75	82%
120	Qwen3.7 Max	$0.067	2.4m	80%	84	82	82	80	80	82%
23	DeepSeek V4 Flash	$0.0006	27.1s	80%	83	82	82	81	80	81%
80	Qwen 3 32B	$0.0013	1.3m	73%	89	83	80	78	76	81%
33	Mistral Large 3	$0.0024	24.5s	78%	84	83	82	79	78	81%
49	Z.AI GLM 4.6	$0.0056	1.0m	78%	85	83	81	79	79	81%
134	Qwen 3.5 35B	$0.067	3.4m	78%	85	81	81	80	79	81%
75	WizardLM 2 8x22b	$0.0019	2.2m	77%	85	82	81	81	76	81%
48	Gemma 3 27B	$0.0004	52.4s	77%	84	82	80	80	77	81%
59	GPT-5 Mini	$0.010	1.3m	79%	82	82	81	79	78	81%
79	GPT-4.1	$0.018	50.5s	76%	85	82	82	79	73	80%
83	Claude Haiku 4.5	$0.0091	19.3s	72%	87	85	83	78	69	80%
56	Mistral Small 4	$0.0010	14.7s	74%	86	84	81	77	74	80%
107	Claude Sonnet 4	$0.026	40.3s	72%	87	80	78	77	77	80%
67	DeepSeek V3.1	$0.0014	1.5m	77%	83	82	81	77	76	80%
122	ByteDance Seed 1.6	$0.015	3.0m	73%	85	84	81	79	71	80%
81	Aion 2.0	$0.0052	1.2m	75%	84	81	80	80	74	80%
130	Claude Opus 4.5	$0.071	59.5s	75%	84	81	79	79	76	80%
72	Gemini 2.5 Flash (Reasoning)	$0.011	21.7s	75%	85	81	80	77	76	80%
105	DeepSeek V3.2	$0.0011	2.2m	72%	86	81	78	76	76	79%
103	Gemma 4 31B (Reasoning)	$0.0012	2.5m	74%	83	81	78	78	75	79%
95	Gemini 3.5 Flash (Reasoning, Minimal)	$0.017	11.5s	73%	84	81	78	78	75	79%
61	Gemma 4 26B (Reasoning)	$0.0014	49.2s	77%	81	80	80	79	75	79%
85	DeepSeek V3 (2024-12-26)	$0.0017	1.1m	74%	83	81	80	79	72	79%
68	Gemma 4 26B	$0.0007	52.4s	76%	81	81	79	77	76	79%
63	Gemini 2.5 Flash	$0.0051	11.5s	76%	82	81	81	77	73	79%
74	Ministral 3 8B	$0.0003	5.1s	73%	83	81	78	76	75	79%
110	Xiaomi MIMO v2.5 Pro	$0.0066	44.1s	70%	87	80	77	75	75	79%
73	MiniMax M2.5	$0.0028	1.2m	77%	80	80	80	77	76	79%
64	Gemini 3.1 Flash Lite (Preview)	$0.0029	8.7s	75%	82	79	78	77	76	78%
88	GPT-4o, Aug. 6th (temp=0)	$0.016	17.3s	74%	81	81	79	76	74	78%
100	Gemini 3.1 Flash Lite (Reasoning)	$0.0028	17.0s	70%	86	84	80	72	69	78%
91	Qwen 3.5 Plus (2026-02-15)	$0.0053	33.7s	73%	84	78	78	77	74	78%
90	Ministral 8B	$0.0002	6.3s	71%	85	79	78	78	70	78%
77	Hermes 3 70B	$0.0006	34.2s	75%	81	80	79	75	75	78%
99	Z.AI GLM 4.7	$0.0097	1.3m	74%	81	79	78	78	74	78%
70	Ministral 3 14B	$0.0004	8.0s	74%	81	79	78	77	74	78%
104	Z.AI GLM 4.5	$0.0038	43.1s	71%	84	82	80	72	70	78%
111	Cydonia 24B V4.1	$0.0009	31.4s	68%	84	82	76	75	70	77%
132	Nemotron 3 Super	$0.0000	3.6m	71%	84	77	76	75	75	77%
78	Gemini 2.5 Flash Lite (Reasoning)	$0.0024	25.2s	75%	79	78	77	77	75	77%
98	Gemma 4 31B	$0.0008	1.4m	74%	80	79	78	76	73	77%
108	Gemini 3.1 Flash Lite	$0.0028	9.5s	69%	84	80	75	73	73	77%
124	GPT-4o, Aug. 6th (temp=1)	$0.016	19.8s	69%	86	80	78	71	70	77%
84	GPT-4.1 Mini	$0.0027	22.0s	74%	79	78	77	76	75	77%
94	Gemini 3 Flash (Preview)	$0.0087	23.0s	74%	80	79	78	74	73	77%
96	GPT-4.1 Nano	$0.0007	15.4s	72%	81	77	76	75	74	77%
102	Gemini 2.5 Flash Lite	$0.0008	8.1s	70%	83	77	77	74	71	76%
121	Llama 3.1 70B	$0.0015	27.1s	68%	83	80	79	75	62	76%
87	Ministral 3B	$0.0001	3.0s	74%	78	77	77	74	73	76%
109	Arcee AI: Trinity Mini	$0.0002	8.5s	70%	82	78	76	74	69	76%
101	Mistral NeMO	$0.0004	7.5s	71%	79	78	75	73	72	76%
118	Gemma 3 12B	$0.0002	39.9s	69%	82	78	74	72	72	76%
106	GPT-4o Mini (temp=1)	$0.0010	32.6s	72%	79	79	77	73	70	76%
114	Z.AI GLM 4.7 Flash	$0.0017	1.2m	72%	78	77	75	74	72	75%
117	DeepSeek-V2 Chat	$0.0017	1.4m	71%	80	76	76	73	71	75%
116	Inception Mercury 2	$0.0023	5.5s	68%	80	80	74	71	71	75%
133	Z.AI GLM 4.5 Air	$0.0023	1.3m	67%	81	79	74	72	67	74%
119	Gemma 3 4B	$0.0002	21.2s	69%	77	77	75	74	67	74%
125	Qwen 2.5 72B	$0.0007	26.7s	69%	79	75	74	72	69	74%
112	Ministral 3 3B	$0.0002	2.7s	71%	77	75	74	73	70	74%
140	Mistral Small 3.2 24B	$0.010	9.7m	65%	83	72	72	72	68	73%
126	GPT-4o Mini (temp=0)	$0.0010	36.5s	70%	77	73	73	71	71	73%
137	Gemini 3 Flash (Preview, Reasoning)	$0.022	51.3s	65%	80	75	71	70	69	73%
135	GPT-OSS 120B	$0.0013	2.2m	70%	74	73	72	70	69	72%
138	GPT-5 Nano	$0.0039	1.2m	65%	74	70	69	67	64	69%
139	Nemotron 3 Nano	$0.0019	1.5m	63%	72	71	67	66	63	68%
81.54%

Median	Evaluator	Top 3 (score)	Flop 3 (score)
100.0%	"Not X but Y" pattern overuse	Qwen 3.5 122B (100); Claude Sonnet 4.5 (100); Gemini 3.1 Pro (Preview) (100)	GPT-5 Nano (1); Aion 2.0 (51); Claude Sonnet 4.6 (60)
39.3%	Adverb-first sentence starts	Qwen3 235B A22B Instruct 2507 (100); Mistral Small 4 (Reasoning) (98); Mistral Medium 3.1 (98)	Inception Mercury 2 (0); Gemini 3.1 Pro (Preview) (0); Nemotron 3 Super (0)
100.0%	Adverbs in dialogue tags	Gemma 3 12B (100); Qwen 3.5 9B (100); MoonshotAI: Kimi K2.5 (100)	Hermes 3 70B (40); Claude Haiku 4.5 (59); Ministral 3B (60)
91.9%	AI-ism adverb frequency	ByteDance Seed 1.6 Flash (100); MoonshotAI: Kimi K2.5 (100); Qwen3.7 Max (100)	GPT-4.1 Nano (71); Claude Haiku 4.5 (76); Gemma 3 12B (76)
96.0%	AI-ism character names	ByteDance Seed 2.0 Lite (100); Grok 4.3 (100); Qwen3.7 Max (100)	Claude Opus 4 (68); Claude Opus 4.5 (72); GPT-5.5 (Reasoning, Low) (76)
100.0%	AI-ism location names	Ministral 8B (100); DeepSeek V3.1 (100); DeepSeek V4 Flash (100)	-
50.5%	AI-ism word frequency	Claude Sonnet 5 (Reasoning) (91); Claude Opus 4.7 (Reasoning) (91); Claude Sonnet 5 (89)	GPT-4o, Aug. 6th (temp=1) (0); GPT-4o Mini (temp=0) (1); Mistral NeMO (5)
100.0%	Cliché density	Claude Opus 4.6 (Reasoning) (100); Claude Sonnet 5 (Reasoning) (100); DeepSeek V4 Flash (100)	DeepSeek-V2 Chat (60); Inception Mercury 2 (67); Llama 3.1 70B (67)
95.2%	Dialogue tag variety (said vs. fancy)	Z.AI GLM 5.1 (100); Qwen 3.5 27B (100); GPT-5.4 Mini (Reasoning) (100)	GPT-4o, Aug. 6th (temp=1) (1); Gemma 3 4B (15); Gemini 2.5 Flash Lite (23)
55.7%	Em-dash & semicolon overuse	Qwen 3.5 122B (100); Qwen 3.5 397B A17B (100); Gemini 3.1 Pro (Preview) (100)	Qwen3 235B A22B Instruct 2507 (0); Claude Sonnet 4.6 (0); Claude Sonnet 4 (0)
100.0%	Emotion telling (show vs. tell)	Qwen 3.5 Flash (100); Gemini 3.5 Flash (Reasoning, Minimal) (100); Mistral Small 4 (100)	Mistral Small 3.2 24B (80); Hermes 3 70B (87); Cohere Command R+ (Aug. 2024) (97)
97.7%	Filter word density	GPT-5.4 (100); MoonshotAI: Kimi K2.6 (100); Claude Sonnet 4.6 (100)	Nemotron 3 Nano (20); Llama 3.1 70B (48); Claude Sonnet 4 (48)
100.0%	Gibberish response detection	Hermes 3 405B (100); Claude Opus 4.8 (Reasoning) (100); DeepSeek V3.1 (100)	Llama 3.1 70B (60); Qwen 3.6 35B (80); DeepSeek V3 (2025-03-24) (80)
100.0%	Markdown formatting overuse	Gemini 3 Flash (Preview, Reasoning) (100); Gemini 3.1 Flash Lite (100); Ministral 3B (100)	Ministral 3 3B (80)
100.0%	Missing dialogue indicators (quotation marks)	Gemma 4 26B (Reasoning) (100); Qwen 3.5 Plus (2026-04-20) (100); DeepSeek V4 Flash (100)	Qwen 3.6 35B (40); Qwen 3.5 122B (60); Qwen 3.6 Flash (80)
49.6%	Name drop frequency	Grok 4.20 (Reasoning) (100); Qwen3.6 Max Preview (99); Claude Sonnet 4.6 (97)	GPT-5.4 Nano (Reasoning) (0); GPT-4o Mini (temp=0) (0); GPT-5.4 Nano (0)
86.6%	Narrator intent-glossing	GPT-5.4 (Reasoning, Low) (100); GPT-5.4 (Reasoning) (100); Qwen 3.5 Flash (100)	GPT-5 Nano (10); Nemotron 3 Nano (22); Claude Opus 4 (36)
100.0%	Overuse of "that" (subordinate clause padding)	GPT-5.4 (100); Qwen 3.5 Plus (2026-04-20) (100); GPT-5.4 Mini (100)	ByteDance Seed 2.0 Lite (73); Nemotron 3 Super (81); Claude Sonnet 5 (86)
100.0%	Paragraph length variance	GPT-5.4 (100); Gemma 3 27B (100); Xiaomi MIMO v2.5 (100)	Mistral Small 3.2 24B (62); GPT-5 Nano (70); Nemotron 3 Nano (77)
95.0%	Passive voice overuse	Qwen3.6 Max Preview (100); Gemini 3.1 Pro (Preview) (100); GPT-5.1 (100)	Aion 2.0 (81); Claude Sonnet 5 (Reasoning) (82); Claude Sonnet 5 (Reasoning, Low) (83)
100.0%	Past progressive (was/were + -ing) overuse	Mistral Medium 3.1 (100); Claude Sonnet 4 (100); Nemotron 3 Nano (100)	Z.AI GLM 4.7 Flash (0); Z.AI GLM 4.6 (41); DeepSeek V3.2 (47)
92.7%	Pronoun-first sentence starts	GPT-5.5 (Reasoning, Low) (100); Grok 4.20 (Reasoning) (100); Claude Opus 4.6 (Reasoning) (100)	Mistral Small 3.2 24B (28); Gemini 3.1 Flash Lite (35); Mistral NeMO (37)
98.5%	Purple prose (modifier overload)	DeepSeek V4 Flash (100); WizardLM 2 8x22b (100); Ministral 3B (100)	Mistral Small 3.2 24B (87); Gemini 3.5 Flash (Reasoning, Minimal) (89); Gemma 3 12B (90)
100.0%	Repeated phrase echo	Qwen 3 32B (100); Gemini 2.5 Flash Lite (100); Qwen3 235B A22B Instruct 2507 (100)	-
100.0%	Sentence length variance	Gemma 4 26B (Reasoning) (100); Aion 2.0 (100); Gemini 3.5 Flash (Reasoning, Minimal) (100)	Mistral Small 3.2 24B (80); Nemotron 3 Nano (88); GPT-OSS 120B (96)
60.6%	Sentence opener variety	Claude Sonnet 5 (93); Cydonia 24B V4.1 (92); Claude Sonnet 5 (Reasoning) (89)	GPT-5 Nano (34); Nemotron 3 Nano (35); Qwen 3.5 35B (36)
30.4%	Subject-first sentence starts	Qwen3 235B A22B Instruct 2507 (88); Writer: Palmyra X5 (88); GPT-5.4 (Reasoning) (76)	Inception Mercury 2 (0); Qwen 3.5 27B (0); GPT-OSS 120B (0)
20.0%	Subordinate conjunction sentence starts	ByteDance Seed 2.0 Lite (94); ByteDance Seed 1.6 (77); Writer: Palmyra X5 (66)	Nemotron 3 Nano (0); GPT-4.1 Mini (0); Grok 4.5 (Reasoning, High) (0)
84.1%	Technical jargon density	GPT-5.4 Mini (Reasoning, Low) (100); Hermes 3 405B (100); Qwen 3.6 27B (100)	GPT-5 Nano (0); Claude Sonnet 5 (Reasoning, Low) (11); Nemotron 3 Nano (13)
68.4%	Useless dialogue additions	GPT-5.4 (Reasoning) (100); GPT-5.5 (Reasoning) (100); Qwen3.6 Max Preview (100)	Mistral NeMO (0); Gemma 3 4B (0); Mistral Small 3.2 24B (0)
