Mystery: examining a crime scene

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, and AI-typical words, adverbs, and names (AI-isms).
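
As a rough illustration of what one such check could look like, the sketch below counts past-progressive constructions (was/were + -ing) per 100 words. The function name, the regular expression, and the per-100-words normalization are assumptions made for illustration; this page does not describe how the benchmark's evaluators are actually implemented or scored.

```python
import re

# Hypothetical pattern: "was"/"were" followed by a word ending in "-ing".
# Crude heuristic: it also matches non-verbs such as "was nothing".
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)


def past_progressive_per_100_words(text: str) -> float:
    """Return the number of was/were + -ing matches per 100 words."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    hits = PAST_PROGRESSIVE.findall(text)
    return 100.0 * len(hits) / len(words)


sample = (
    "The detective was kneeling by the desk. "
    "Rain was falling. She opened the drawer."
)
print(round(past_progressive_per_100_words(sample), 1))
```

A real evaluator would likely need part-of-speech information to avoid false positives; the regex is only meant to show the general shape of a density-style metric.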

Creative Writing Hallucination

Performance Score Distribution (Top 20)

Model	Score
GPT-5.4 (Reasoning)	92%
GPT-5.4	92%
GPT-5.5	91%
GPT-5.4 (Reasoning, Low)	91%
GPT-5.5 (Reasoning, Low)	90%
GPT-5.5 (Reasoning)	90%
GPT-5.4 Mini	89%
GPT-5.4 Mini (Reasoning)	89%
GPT-5.4 Mini (Reasoning, Low)	87%
Qwen3.6 Max Preview	87%
Qwen 3.6 35B	86%
Qwen 3.6 Flash	86%
Claude Opus 4.7 (Reasoning)	86%
Qwen3.7 Max	85%
GPT-5	85%
GPT-5.1	85%
MoonshotAI: Kimi K2.6	85%
Claude Sonnet 4.5	85%
Claude Opus 4.8 (Reasoning, Low)	84%
Grok 4.3	84%

Model	Score	Cost	Time
Qwen 3.6 35B	86%	$0.0090	58.9s
GPT-5.4 Mini (Reasoning, Low)	87%	$0.016	18.8s
GPT-5.4 Mini	89%	$0.017	19.5s
Qwen 3.6 Flash	86%	$0.0090	36.8s
Grok 4.3	84%	$0.0053	27.4s
Mistral Small 4 (Reasoning)	83%	$0.0022	30.7s
Qwen 3 32B	83%	$0.0012	39.7s
DeepSeek V4 Flash	83%	$0.0005	26.7s
GPT-5.4 Mini (Reasoning)	89%	$0.031	37.4s
GPT-5.4 Nano	80%	$0.0057	20.4s
Mistral Large 3	82%	$0.0026	28.4s
Mistral NeMO	81%	$0.0003	11.1s
Qwen3 235B A22B Instruct 2507	81%	$0.0011	1.1m
Mistral Small 4	79%	$0.0011	18.0s
GPT-5.4 Nano (Reasoning, Low)	81%	$0.0055	20.4s
Hermes 3 405B	82%	$0.0020	59.2s
DeepSeek V3 (2025-03-24)	80%	$0.0016	33.7s
Ministral 3 14B	79%	$0.0005	12.5s
MiniMax M2.5	82%	$0.0030	1.0m
Qwen 3.5 Flash	81%	$0.0019	36.5s

Model	Score	Cost	Time	Stability
GPT-5.4 Mini (Reasoning, Low)	87%	$0.016	18.8s	84%
GPT-5.4 Mini	89%	$0.017	19.5s	80%
GPT-5.4 Mini (Reasoning)	89%	$0.031	37.4s	84%
GPT-5.4	92%	$0.057	1.6m	88%
Qwen 3.6 Flash	86%	$0.0090	36.8s	80%
Qwen 3.6 35B	86%	$0.0090	58.9s	81%
Grok 4.3	84%	$0.0053	27.4s	79%
Qwen 3 32B	83%	$0.0012	39.7s	80%
GPT-5.4 (Reasoning, Low)	91%	$0.060	1.6m	87%
Mistral Small 4 (Reasoning)	83%	$0.0022	30.7s	78%
Claude Sonnet 4.5	85%	$0.029	35.7s	81%
Mistral Large 3	82%	$0.0026	28.4s	76%
DeepSeek V4 Flash	83%	$0.0005	26.7s	75%
Claude Sonnet 5	82%	$0.024	36.0s	81%
GPT-5.4 Nano (Reasoning, Low)	81%	$0.0055	20.4s	77%
GPT-5.4 Nano	80%	$0.0057	20.4s	77%
Mistral NeMO	81%	$0.0003	11.1s	75%
Grok 4.20	82%	$0.0078	40.7s	77%
Writer: Palmyra X5	81%	$0.011	24.2s	77%
MiniMax M2.5	82%	$0.0030	1.0m	76%

Rank	Model	Avg. Cost	Avg. Time	Stability	# 1	# 2	# 3	# 4	# 5	Total
24	GPT-5.4 (Reasoning)	$0.100	3.1m	91%	94	93	93	91	91	92%
4	GPT-5.4	$0.057	1.6m	88%	95	92	92	90	89	92%
93	GPT-5.5	$0.166	1.8m	85%	95	90	90	90	88	91%
9	GPT-5.4 (Reasoning, Low)	$0.060	1.6m	87%	93	93	92	89	85	91%
87	GPT-5.5 (Reasoning, Low)	$0.156	2.1m	86%	93	92	90	89	88	90%
97	GPT-5.5 (Reasoning)	$0.164	2.1m	87%	91	91	89	89	88	90%
2	GPT-5.4 Mini	$0.017	19.5s	80%	95	90	87	86	85	89%
3	GPT-5.4 Mini (Reasoning)	$0.031	37.4s	84%	93	89	88	87	86	89%
1	GPT-5.4 Mini (Reasoning, Low)	$0.016	18.8s	84%	89	88	87	87	84	87%
33	Qwen3.6 Max Preview	$0.046	3.4m	85%	89	88	87	86	85	87%
6	Qwen 3.6 35B	$0.0090	58.9s	81%	90	88	88	87	78	86%
5	Qwen 3.6 Flash	$0.0090	36.8s	80%	89	89	86	84	80	86%
28	Claude Opus 4.7 (Reasoning)	$0.071	34.5s	82%	89	86	86	84	83	86%
70	Qwen3.7 Max	$0.059	2.3m	79%	89	89	84	82	82	85%
75	GPT-5	$0.073	3.0m	83%	87	87	86	83	83	85%
35	GPT-5.1	$0.063	1.5m	83%	86	86	85	85	83	85%
134	MoonshotAI: Kimi K2.6	$0.043	8.2m	79%	90	86	85	82	81	85%
11	Claude Sonnet 4.5	$0.029	35.7s	81%	87	85	84	84	82	85%
46	Claude Opus 4.8 (Reasoning, Low)	$0.068	47.0s	80%	88	86	84	82	82	84%
7	Grok 4.3	$0.0053	27.4s	79%	88	87	85	83	79	84%
37	Qwen 3.5 397B A17B	$0.030	2.3m	80%	86	85	85	85	79	84%
10	Mistral Small 4 (Reasoning)	$0.0022	30.7s	78%	87	86	83	82	78	83%
8	Qwen 3 32B	$0.0012	39.7s	80%	85	84	83	83	80	83%
47	Qwen 3.6 27B	$0.016	1.5m	74%	91	82	81	81	80	83%
126	Claude Opus 4	$0.147	1.0m	79%	85	85	84	83	77	83%
48	Claude Opus 4.7	$0.064	34.5s	80%	85	84	84	83	79	83%
40	MiniMax M3	$0.0031	2.0m	75%	91	84	83	79	77	83%
13	DeepSeek V4 Flash	$0.0005	26.7s	75%	90	83	82	80	78	83%
12	Mistral Large 3	$0.0026	28.4s	76%	88	82	81	81	80	82%
120	Gemini 3.1 Pro (Preview)	$0.083	1.5m	71%	91	88	80	78	75	82%
25	Hermes 3 405B	$0.0020	59.2s	74%	88	86	82	78	76	82%
50	Grok 4.20 (Reasoning)	$0.016	1.4m	75%	88	86	81	79	76	82%
20	MiniMax M2.5	$0.0030	1.0m	76%	87	83	83	81	75	82%
14	Claude Sonnet 5	$0.024	36.0s	81%	83	83	83	81	81	82%
18	Grok 4.20	$0.0078	40.7s	77%	85	84	81	81	78	82%
43	Claude Sonnet 4.6	$0.031	48.5s	77%	87	83	83	77	77	82%
77	Grok 4.3 (Reasoning)	$0.016	2.6m	73%	87	85	81	78	76	81%
22	Qwen3 235B A22B Instruct 2507	$0.0011	1.1m	77%	85	84	81	79	78	81%
56	Z.AI GLM 5.1	$0.012	1.3m	73%	89	85	82	77	73	81%
23	Qwen 3.5 Flash	$0.0019	36.5s	75%	88	81	80	79	78	81%
42	Mistral Large 2	$0.011	28.7s	72%	89	84	79	78	76	81%
19	Writer: Palmyra X5	$0.011	24.2s	77%	83	83	80	80	79	81%
73	Claude Opus 4.8 (Reasoning)	$0.067	45.2s	79%	83	82	81	81	78	81%
21	o4 Mini	$0.014	22.3s	77%	83	83	82	79	76	81%
17	Mistral NeMO	$0.0003	11.1s	75%	87	80	80	80	77	81%
116	Claude Opus 4.6 (Reasoning)	$0.086	1.4m	76%	84	84	81	79	76	81%
41	Claude Sonnet 5 (Reasoning, Low)	$0.025	35.6s	76%	85	82	81	80	75	81%
15	GPT-5.4 Nano (Reasoning, Low)	$0.0055	20.4s	77%	84	81	81	79	77	81%
16	GPT-5.4 Nano	$0.0057	20.4s	77%	83	83	82	79	75	80%
52	o4 Mini High	$0.023	38.2s	74%	86	82	80	78	76	80%
66	Qwen 3.5 Plus (2026-04-20)	$0.015	1.5m	74%	85	83	80	79	74	80%
49	Z.AI GLM 5 Turbo	$0.0081	37.9s	72%	87	83	80	77	74	80%
30	GPT-5 Mini	$0.0099	40.3s	76%	84	79	79	79	79	80%
45	Gemini 3.5 Flash (Reasoning, Minimal)	$0.019	12.7s	74%	84	83	80	78	73	80%
101	GPT-5.2	$0.060	1.6m	75%	84	80	79	78	78	80%
32	DeepSeek V3 (2025-03-24)	$0.0016	33.7s	74%	83	83	80	80	72	80%
62	GPT-4.1	$0.018	40.9s	73%	84	83	79	76	75	79%
39	Gemma 3 27B	$0.0007	45.4s	73%	85	83	80	77	73	79%
27	Mistral Small 4	$0.0011	18.0s	75%	84	81	80	78	74	79%
57	DeepSeek V4 Pro (Reasoning)	$0.0035	1.8m	76%	83	81	81	76	75	79%
63	Ministral 3 8B	$0.0023	1.5m	73%	86	81	80	75	75	79%
44	Gemini 3.1 Flash Lite (Preview)	$0.0026	8.1s	71%	86	82	77	76	76	79%
26	Ministral 3 14B	$0.0005	12.5s	75%	82	81	80	79	74	79%
83	Gemini 2.5 Pro	$0.040	41.1s	72%	84	82	78	77	74	79%
31	Mistral Medium 3.1	$0.0045	34.0s	76%	82	81	81	77	74	79%
61	Qwen 3.5 9B	$0.0008	1.2m	72%	85	80	78	75	75	79%
64	Z.AI GLM 5.2 (Reasoning, High)	$0.011	1.1m	73%	84	80	79	77	75	79%
34	DeepSeek V4 Flash (Reasoning)	$0.0008	38.0s	75%	82	82	81	77	73	79%
38	ByteDance Seed 1.6 Flash	$0.0014	30.1s	73%	84	80	78	77	75	79%
51	DeepSeek-V2 Chat	$0.0016	36.9s	72%	84	81	80	79	70	79%
29	Gemma 3 12B	$0.0004	37.1s	76%	81	81	80	76	75	79%
79	Cohere Command R+ (Aug. 2024)	$0.016	1.3m	71%	84	84	79	75	71	79%
36	GPT-5.4 Nano (Reasoning)	$0.0060	22.2s	75%	82	80	79	77	74	78%
59	Gemini 2.5 Flash (Reasoning)	$0.013	24.5s	72%	83	81	80	79	69	78%
58	Qwen 3.5 122B	$0.013	36.8s	74%	83	81	81	74	72	78%
114	Claude Opus 4.5	$0.067	56.1s	73%	82	80	77	77	75	78%
107	ByteDance Seed 2.0 Lite	$0.012	2.1m	68%	89	80	79	72	70	78%
65	MiniMax M2.7	$0.0033	1.3m	74%	82	80	78	76	75	78%
119	Claude Opus 4.6	$0.077	1.3m	75%	81	80	78	76	74	78%
69	Z.AI GLM 5	$0.0079	1.1m	73%	83	77	77	77	77	78%
60	Gemini 3.1 Flash Lite	$0.0026	9.6s	70%	82	82	77	76	70	78%
53	Gemini 2.5 Flash	$0.0057	11.6s	73%	81	78	76	76	74	77%
81	DeepSeek V4 Pro	$0.0044	1.4m	71%	83	79	77	76	71	77%
55	Xiaomi MIMO v2.5	$0.0057	34.4s	74%	79	78	77	77	73	77%
54	Gemini 2.5 Flash Lite	$0.0009	11.8s	72%	80	79	77	76	72	77%
91	Qwen 3.5 27B	$0.0097	57.2s	69%	84	77	75	73	72	76%
96	Qwen 3.5 35B	$0.011	37.9s	66%	86	77	74	73	72	76%
74	Llama 3.1 Nemotron 70B	$0.0023	32.3s	70%	83	78	76	72	72	76%
131	ByteDance Seed 2.0 Mini	$0.0049	5.4m	75%	77	77	76	76	75	76%
71	Llama 3.1 70B	$0.0007	33.1s	71%	80	78	75	74	74	76%
72	GPT-4o Mini (temp=1)	$0.0011	25.2s	71%	81	75	75	75	74	76%
127	DeepSeek V3.2	$0.0011	3.3m	67%	84	78	75	72	70	76%
68	Gemini 2.5 Flash Lite (Reasoning)	$0.0025	26.5s	72%	79	78	75	74	74	76%
110	MoonshotAI: Kimi K2.5	$0.012	1.0m	66%	86	77	76	70	68	75%
121	Claude Sonnet 4.6 (Reasoning)	$0.043	1.1m	70%	81	77	77	74	67	75%
112	Hermes 3 70B	$0.0006	31.1s	62%	84	82	71	71	69	75%
115	Z.AI GLM 4.7	$0.0098	1.9m	69%	80	77	74	73	72	75%
125	ByteDance Seed 1.6	$0.016	3.1m	72%	78	75	74	74	74	75%
90	Claude Sonnet 5 (Reasoning)	$0.023	34.5s	72%	78	78	77	74	68	75%
103	Aion 2.0	$0.0051	1.2m	68%	81	79	74	72	69	75%
67	GPT-4o Mini (temp=0)	$0.0010	24.7s	73%	77	76	75	74	73	75%
76	DeepSeek V3 (2024-12-26)	$0.0017	40.7s	71%	78	77	75	74	70	75%
108	Gemini 3.5 Flash (Reasoning)	$0.041	24.4s	70%	79	76	74	74	71	75%
95	Ministral 3B	$0.0001	1.1m	68%	82	76	73	72	71	75%
86	Ministral 3 3B	$0.0015	55.5s	70%	80	77	75	73	70	75%
78	Qwen 2.5 72B	$0.0006	27.7s	70%	79	79	75	71	71	75%
84	Xiaomi MIMO v2.5 Pro	$0.0080	52.9s	72%	77	76	75	74	71	75%
102	Cydonia 24B V4.1	$0.0009	34.2s	65%	82	80	73	72	66	75%
113	Z.AI GLM 4.7 Flash	$0.0020	1.3m	66%	84	73	73	72	70	75%
89	Z.AI GLM 4.5 Air	$0.0018	34.5s	69%	78	78	75	73	67	74%
128	Gemini 3 Pro (Preview)	$0.049	49.7s	69%	79	78	74	71	69	74%
122	Claude Sonnet 4	$0.028	43.4s	66%	79	79	74	73	65	74%
104	DeepSeek V3.1	$0.0014	1.7m	70%	78	76	75	71	70	74%
82	GPT-4o, Aug. 6th (temp=0)	$0.016	29.9s	73%	75	74	74	74	73	74%
117	Z.AI GLM 4.6	$0.0062	1.4m	67%	78	77	72	72	70	74%
98	GPT-4o, Aug. 6th (temp=1)	$0.016	36.9s	70%	79	75	75	71	69	74%
85	GPT-4.1 Mini	$0.0027	16.6s	69%	78	76	74	70	70	74%
106	Claude Haiku 4.5	$0.0093	20.2s	66%	80	79	73	69	67	74%
80	Gemma 3 4B	$0.0002	18.9s	70%	76	75	74	72	70	73%
92	WizardLM 2 8x22b	$0.0016	24.6s	69%	78	77	76	69	66	73%
105	Qwen 3.5 Plus (2026-02-15)	$0.0051	25.8s	67%	79	74	72	71	70	73%
94	Ministral 8B	$0.0002	9.9s	67%	80	77	76	68	63	73%
109	Gemini 3.1 Flash Lite (Reasoning)	$0.0026	22.1s	65%	83	77	77	68	60	73%
111	Z.AI GLM 4.5	$0.0045	37.2s	67%	79	73	73	72	66	73%
88	GPT-4.1 Nano	$0.0006	10.5s	70%	75	74	74	71	68	72%
99	Gemini 3 Flash (Preview)	$0.0067	17.5s	69%	74	72	72	71	68	72%
100	Arcee AI: Trinity Mini	$0.0003	9.0s	67%	77	75	73	68	66	72%
124	Gemma 4 31B (Reasoning)	$0.0012	1.7m	68%	74	74	70	70	70	72%
118	Gemini 3 Flash (Preview, Reasoning)	$0.010	25.1s	66%	78	73	72	68	66	72%
123	Gemma 4 31B	$0.0008	1.5m	67%	74	73	71	69	68	71%
130	Nemotron 3 Super	$0.0000	1.3m	65%	74	70	68	68	67	69%
133	Gemma 4 26B (Reasoning)	$0.0013	2.2m	66%	72	71	69	67	66	69%
132	Gemma 4 26B	$0.0007	41.6s	63%	77	72	71	66	60	69%
136	Mistral Small 3.2 24B	$0.0034	1.8m	62%	77	70	69	68	60	69%
129	Inception Mercury 2	$0.0031	7.9s	65%	68	68	67	66	65	67%
137	GPT-5 Nano	$0.0042	1.4m	63%	71	67	66	65	64	67%
135	GPT-OSS 120B	$0.0026	54.7s	63%	70	67	66	65	64	67%
138	Nemotron 3 Nano	$0.0007	42.1s	56%	77	69	63	63	61	67%
78.53%

Median	Evaluator	Top 3 (score model)	Flop 3 (score model)
91.4%	"Not X but Y" pattern overuse	100 Ministral 3 14B; 100 Qwen3.7 Max; 100 GPT-4o, Aug. 6th (temp=1)	2 GPT-5 Nano; 28 Mistral Small 4; 33 Writer: Palmyra X5
48.2%	Adverb-first sentence starts	95 Writer: Palmyra X5; 93 Ministral 3 14B; 91 GPT-5.5 (Reasoning)	0 GPT-4o Mini (temp=0); 0 WizardLM 2 8x22b; 0 Qwen 2.5 72B
95.6%	Adverbs in dialogue tags	100 Qwen 3.5 122B; 100 GPT-5.4 (Reasoning); 100 GPT-OSS 120B	20 GPT-4.1 Nano; 31 GPT-4.1 Mini; 38 Claude Haiku 4.5
87.8%	AI-ism adverb frequency	98 ByteDance Seed 1.6 Flash; 98 ByteDance Seed 2.0 Lite; 98 DeepSeek V4 Flash	60 Gemma 3 4B; 68 GPT-4.1 Nano; 68 GPT-4.1 Mini
100.0%	AI-ism character names	100 Mistral Small 4; 100 Claude Opus 4.7 (Reasoning); 100 MoonshotAI: Kimi K2.5	80 Mistral NeMO; 80 Z.AI GLM 5 Turbo; 80 DeepSeek V4 Pro
100.0%	AI-ism location names	100 Grok 4.20; 100 MiniMax M2.5; 100 Qwen 3.5 Flash	96 Claude Haiku 4.5
52.6%	AI-ism word frequency	91 Claude Sonnet 5 (Reasoning, Low); 85 Claude Sonnet 5; 85 Claude Opus 4.7 (Reasoning)	3 GPT-4o Mini (temp=0); 8 GPT-4o, Aug. 6th (temp=1); 11 GPT-4o, Aug. 6th (temp=0)
100.0%	Cliché density	100 Xiaomi MIMO v2.5 Pro; 100 MiniMax M2.5; 100 GPT-5.1	33 Mistral Small 3.2 24B; 53 GPT-OSS 120B; 53 Inception Mercury 2
80.5%	Dialogue tag variety (said vs. fancy)	100 DeepSeek V4 Flash; 100 Claude Opus 4.5; 100 GPT-5.4	0 Qwen 3.5 Plus (2026-02-15); 3 Gemma 4 31B; 8 Gemini 3 Pro (Preview)
13.2%	Em-dash & semicolon overuse	100 Qwen3.7 Max; 100 Qwen 3.5 397B A17B; 100 Qwen3.6 Max Preview	0 Claude Opus 4.7; 0 DeepSeek-V2 Chat; 0 Nemotron 3 Super
100.0%	Emotion telling (show vs. tell)	100 GPT-5.4 Nano (Reasoning, Low); 100 Gemini 3 Flash (Preview, Reasoning); 100 Qwen 3.5 122B	88 Mistral Small 3.2 24B; 90 GPT-4o Mini (temp=0); 93 Hermes 3 70B
93.4%	Filter word density	100 Claude Sonnet 4.6 (Reasoning); 100 GPT-5.4 Mini; 100 GPT-5.5 (Reasoning)	8 Inception Mercury 2; 9 Nemotron 3 Nano; 27 GPT-OSS 120B
100.0%	Gibberish response detection	100 Llama 3.1 70B; 100 GPT-4o Mini (temp=0); 100 Qwen 3.5 Flash	98 Qwen 2.5 72B; 99 MiniMax M2.5; 99 Qwen 3 32B
100.0%	Markdown formatting overuse	100 Qwen 3.5 Plus (2026-04-20); 100 Qwen3.6 Max Preview; 100 Gemini 3.1 Pro (Preview)	93 ByteDance Seed 1.6 Flash
100.0%	Missing dialogue indicators (quotation marks)	100 Gemma 3 4B; 100 GPT-5.4 Mini; 100 GPT-4.1	0 Qwen3.6 Max Preview; 20 Qwen 3.5 397B A17B; 39 Gemini 3.1 Flash Lite (Reasoning)
49.6%	Name drop frequency	96 Claude Opus 4.8 (Reasoning); 95 Claude Sonnet 5 (Reasoning); 94 GPT-4.1 Nano	0 GPT-5.5 (Reasoning); 1 Z.AI GLM 4.5; 3 GPT-5.5 (Reasoning, Low)
77.8%	Narrator intent-glossing	100 Grok 4.3 (Reasoning); 100 Qwen 3.5 9B; 100 Qwen 3.5 397B A17B	11 GPT-5 Nano; 17 Nemotron 3 Nano; 20 GPT-4o, Aug. 6th (temp=1)
100.0%	Overuse of "that" (subordinate clause padding)	100 Qwen 3.5 Plus (2026-02-15); 100 Mistral Medium 3.1; 100 GPT-OSS 120B	70 Claude Sonnet 5 (Reasoning); 74 Mistral Small 3.2 24B; 74 Hermes 3 70B
100.0%	Paragraph length variance	100 GPT-5.4; 100 GPT-5.4 Nano; 100 DeepSeek V4 Flash (Reasoning)	53 GPT-5 Nano; 65 ByteDance Seed 2.0 Lite; 67 Nemotron 3 Nano
89.1%	Passive voice overuse	99 Hermes 3 405B; 99 DeepSeek V3 (2025-03-24); 98 o4 Mini High	65 ByteDance Seed 2.0 Mini; 70 Nemotron 3 Super; 76 Nemotron 3 Nano
77.4%	Past progressive (was/were + -ing) overuse	100 GPT-5.5 (Reasoning); 100 Hermes 3 405B; 100 GPT-5.2	0 Gemini 3 Flash (Preview); 10 Gemma 4 31B (Reasoning); 12 Z.AI GLM 4.6
93.1%	Pronoun-first sentence starts	100 GPT-5.4 Mini; 100 Grok 4.20 (Reasoning); 100 Mistral Large 2	35 Gemma 4 26B; 36 Gemini 3.1 Flash Lite; 40 Qwen 3.5 27B
97.6%	Purple prose (modifier overload)	100 Qwen 3.5 Plus (2026-04-20); 100 Mistral Large 3; 100 MoonshotAI: Kimi K2.5	84 GPT-4.1 Nano; 86 Gemma 3 4B; 88 ByteDance Seed 2.0 Mini
100.0%	Repeated phrase echo	100 ByteDance Seed 2.0 Lite; 100 Claude Sonnet 5 (Reasoning, Low); 100 GPT-4o Mini (temp=1)	(none)
100.0%	Sentence length variance	100 Qwen 3.6 27B; 100 Gemma 4 26B (Reasoning); 100 MoonshotAI: Kimi K2.5	99 Cohere Command R+ (Aug. 2024); 99 Hermes 3 70B; 100 Mistral Small 3.2 24B
56.9%	Sentence opener variety	97 Claude Sonnet 5 (Reasoning); 92 Claude Sonnet 5; 91 GPT-4o Mini (temp=1)	34 GPT-5 Nano; 37 Inception Mercury 2; 39 Qwen 3.5 35B
45.4%	Subject-first sentence starts	93 GPT-5.4; 92 Claude Sonnet 4.5; 92 Writer: Palmyra X5	1 Inception Mercury 2; 3 GPT-OSS 120B; 7 Arcee AI: Trinity Mini
40.0%	Subordinate conjunction sentence starts	100 Gemini 3.1 Flash Lite (Preview); 91 Gemini 3.1 Flash Lite; 91 Qwen 3.6 35B	0 Claude Sonnet 4; 0 GPT-4.1 Mini; 0 Claude Opus 4.7
64.7%	Technical jargon density	100 GPT-5.5 (Reasoning, Low); 100 Qwen 2.5 72B; 100 o4 Mini High	0 GPT-5 Nano; 0 ByteDance Seed 2.0 Lite; 0 Nemotron 3 Nano
49.4%	Useless dialogue additions	100 GPT-5.5; 100 Gemini 3.1 Flash Lite (Preview); 100 GPT-5.4 (Reasoning, Low)	0 DeepSeek V3.1; 0 Gemma 4 26B; 0 Gemini 2.5 Flash Lite
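
Several evaluators in the table look at how sentences begin. A minimal sketch of one such measurement, the share of sentences that open with a pronoun, might look like the following. The pronoun list, the naive sentence splitter, and the function names are all hypothetical; how the benchmark turns a raw measurement like this into a 0 to 100 score is not stated on this page.

```python
import re

# Assumed pronoun list; the benchmark's own list is not given on this page.
PRONOUNS = {"he", "she", "it", "they", "i", "we", "you"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def pronoun_first_share(text: str) -> float:
    """Fraction of sentences whose first word is a pronoun (0.0 to 1.0)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    count = 0
    for sentence in sentences:
        # Skip leading quotes or punctuation, then grab the first word.
        match = re.match(r"[^A-Za-z]*([A-Za-z']+)", sentence)
        if match and match.group(1).lower() in PRONOUNS:
            count += 1
    return count / len(sentences)


sample = "She knelt by the body. The window was open. He frowned. It made no sense."
print(pronoun_first_share(sample))
```

The splitter will mis-handle abbreviations and dialogue punctuation, so a production check would probably rely on a proper sentence tokenizer instead.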