Thriller: chase through city streets

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, and AI-ism words, adverbs, and names.
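To make two of the named checks concrete, here is a minimal sketch of how past progressive constructions and filter words could be counted. The page does not publish the evaluators' actual rules, word lists, or scoring, so the regular expression, the `FILTER_WORDS` set, and both function names are illustrative assumptions.

```python
import re

# Hypothetical filter-word list; the benchmark's real list is not given on the page.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed", "watched", "wondered"}

# Naive pattern for "was/were + -ing". It also matches non-verbs such as
# "was nothing", so treat the count as a rough signal only.
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z']+")


def past_progressive_count(text: str) -> int:
    """Count 'was/were <word>ing' occurrences in the text."""
    return len(PAST_PROGRESSIVE.findall(text))


def filter_word_density(text: str) -> float:
    """Return the share of words that appear in FILTER_WORDS (0.0 for empty text)."""
    words = [w.lower() for w in WORD.findall(text)]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FILTER_WORDS)
    return hits / len(words)


sample = "She was running down the alley. He felt the wall and saw a door. They were closing in."
print(past_progressive_count(sample))
print(round(filter_word_density(sample), 3))
```

How such raw counts would map to the 0 to 100 evaluator scores reported further down is not stated on the page, so that step is left out here.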


Performance Score Distribution (Top 20)


Model	Score
GPT-5.4	92%
GPT-5.4 (Reasoning)	91%
GPT-5.5	91%
GPT-5.4 (Reasoning, Low)	91%
GPT-5.4 Mini (Reasoning)	90%
Qwen3.6 Max Preview	90%
GPT-5.5 (Reasoning, Low)	90%
GPT-5.5 (Reasoning)	90%
GPT-5.4 Mini	89%
Qwen 3.6 27B	87%
GPT-5.4 Mini (Reasoning, Low)	87%
Qwen 3.5 397B A17B	87%
Gemini 3.1 Pro (Preview)	86%
Writer: Palmyra X5	86%
Qwen3 235B A22B Instruct 2507	86%
Claude Sonnet 4.5	85%
Qwen 3.6 Flash	85%
Z.AI GLM 5 Turbo	85%
Qwen 3.5 35B	84%
Claude Opus 4.8 (Reasoning, Low)	84%

Model	Score	Cost	Time
GPT-5.4 Mini (Reasoning)	90%	$0.022	26.3s
GPT-5.4 Mini	89%	$0.015	18.3s
Qwen 3.6 Flash	85%	$0.0085	36.0s
GPT-5.4 Mini (Reasoning, Low)	87%	$0.013	16.0s
Writer: Palmyra X5	86%	$0.011	22.5s
Gemini 2.5 Flash	81%	$0.0039	8.1s
Qwen3 235B A22B Instruct 2507	86%	$0.0011	1.1m
Qwen 3.6 27B	87%	$0.017	1.7m
GPT-5.4	92%	$0.050	1.4m
DeepSeek V4 Flash	81%	$0.0006	27.0s
GPT-5.4 (Reasoning, Low)	91%	$0.051	1.4m
Mistral Small 4 (Reasoning)	83%	$0.0023	32.8s
Z.AI GLM 5 Turbo	85%	$0.0068	30.7s
Qwen 3.5 397B A17B	87%	$0.016	1.5m
Grok 4.3	82%	$0.0051	25.2s
GPT-4.1	83%	$0.018	46.3s
Qwen 3.5 Plus (2026-04-20)	83%	$0.013	1.4m
Qwen 3.6 35B	83%	$0.0053	48.7s
Mistral Small 4	79%	$0.0011	17.2s
Qwen 3.5 35B	84%	$0.0068	26.9s

Model	Score	Cost	Time	Stability
GPT-5.4 Mini	89%	$0.015	18.3s	86%
GPT-5.4 Mini (Reasoning)	90%	$0.022	26.3s	85%
GPT-5.4	92%	$0.050	1.4m	90%
GPT-5.4 Mini (Reasoning, Low)	87%	$0.013	16.0s	83%
Writer: Palmyra X5	86%	$0.011	22.5s	81%
GPT-5.4 (Reasoning, Low)	91%	$0.051	1.4m	86%
Qwen 3.6 Flash	85%	$0.0085	36.0s	80%
Qwen3 235B A22B Instruct 2507	86%	$0.0011	1.1m	79%
Z.AI GLM 5 Turbo	85%	$0.0068	30.7s	78%
Qwen 3.5 397B A17B	87%	$0.016	1.5m	82%
Qwen 3.6 35B	83%	$0.0053	48.7s	80%
Qwen 3.6 27B	87%	$0.017	1.7m	80%
Qwen 3.5 35B	84%	$0.0068	26.9s	76%
Claude Sonnet 4.5	85%	$0.027	34.3s	78%
DeepSeek V4 Flash	81%	$0.0006	27.0s	77%
MiniMax M2.5	81%	$0.0026	35.8s	78%
GPT-4.1 Mini	81%	$0.0025	14.4s	77%
Gemini 2.5 Flash Lite (Reasoning)	81%	$0.0027	28.7s	78%
GPT-5.4 Nano (Reasoning, Low)	80%	$0.0049	17.8s	78%
Mistral Small 4 (Reasoning)	83%	$0.0023	32.8s	75%

Rank	Model	Avg. Cost	Avg. Time	Stability	#1	#2	#3	#4	#5	Total
3	GPT-5.4	$0.050	1.4m	90%	94	92	92	91	91	92%
28	GPT-5.4 (Reasoning)	$0.086	2.5m	88%	94	93	92	89	87	91%
91	GPT-5.5	$0.159	1.7m	87%	94	91	90	90	89	91%
6	GPT-5.4 (Reasoning, Low)	$0.051	1.4m	86%	94	94	92	88	86	91%
2	GPT-5.4 Mini (Reasoning)	$0.022	26.3s	85%	94	92	92	87	85	90%
39	Qwen3.6 Max Preview	$0.045	3.3m	83%	93	93	88	87	87	90%
86	GPT-5.5 (Reasoning, Low)	$0.148	1.9m	88%	92	90	89	89	89	90%
62	GPT-5.5 (Reasoning)	$0.130	1.7m	88%	91	90	90	89	88	90%
1	GPT-5.4 Mini	$0.015	18.3s	86%	92	90	89	88	86	89%
12	Qwen 3.6 27B	$0.017	1.7m	80%	94	90	90	84	79	87%
4	GPT-5.4 Mini (Reasoning, Low)	$0.013	16.0s	83%	91	88	88	88	82	87%
10	Qwen 3.5 397B A17B	$0.016	1.5m	82%	90	89	87	85	82	87%
65	Gemini 3.1 Pro (Preview)	$0.075	1.4m	80%	92	89	88	85	78	86%
5	Writer: Palmyra X5	$0.011	22.5s	81%	90	87	86	86	81	86%
8	Qwen3 235B A22B Instruct 2507	$0.0011	1.1m	79%	92	87	85	83	81	86%
14	Claude Sonnet 4.5	$0.027	34.3s	78%	91	87	84	84	81	85%
7	Qwen 3.6 Flash	$0.0085	36.0s	80%	90	89	89	82	76	85%
9	Z.AI GLM 5 Turbo	$0.0068	30.7s	78%	91	86	83	83	82	85%
13	Qwen 3.5 35B	$0.0068	26.9s	76%	90	86	82	81	81	84%
66	Claude Opus 4.8 (Reasoning, Low)	$0.061	42.1s	77%	90	86	83	81	80	84%
11	Qwen 3.6 35B	$0.0053	48.7s	80%	87	84	83	82	81	83%
46	Qwen 3.5 Plus (2026-04-20)	$0.013	1.4m	74%	92	87	85	78	75	83%
79	GPT-5.1	$0.060	1.5m	78%	87	86	84	80	78	83%
70	Qwen3.7 Max	$0.059	1.8m	82%	84	84	83	83	82	83%
21	GPT-4.1	$0.018	46.3s	78%	87	86	85	81	76	83%
68	Claude Opus 4.7	$0.056	28.3s	75%	90	88	86	78	74	83%
29	Z.AI GLM 5.2 (Reasoning, High)	$0.0088	51.7s	75%	90	81	81	81	80	83%
42	Gemini 2.5 Pro	$0.038	38.0s	77%	89	83	82	81	79	83%
20	Mistral Small 4 (Reasoning)	$0.0023	32.8s	75%	89	84	83	83	74	83%
118	GPT-5	$0.073	3.0m	80%	85	83	83	83	79	82%
104	MoonshotAI: Kimi K2.6	$0.035	3.1m	78%	87	86	85	79	76	82%
24	Grok 4.3	$0.0051	25.2s	75%	88	85	83	79	74	82%
55	Claude Sonnet 5	$0.024	34.9s	73%	87	86	80	79	76	82%
47	MiniMax M3	$0.0039	2.3m	78%	85	83	83	80	78	82%
134	Claude Opus 4	$0.181	1.4m	73%	87	85	80	78	76	81%
26	DeepSeek V4 Pro	$0.0043	1.1m	78%	84	84	82	78	78	81%
31	Gemini 2.5 Flash	$0.0039	8.1s	72%	88	86	85	79	68	81%
15	DeepSeek V4 Flash	$0.0006	27.0s	77%	84	83	83	81	75	81%
16	MiniMax M2.5	$0.0026	35.8s	78%	84	83	82	80	77	81%
49	Claude Sonnet 5 (Reasoning)	$0.026	37.9s	75%	86	84	82	80	74	81%
110	Claude Opus 4.6 (Reasoning)	$0.080	1.3m	77%	85	82	81	79	78	81%
25	o4 Mini High	$0.022	37.6s	79%	82	82	81	81	79	81%
41	Gemini 3.5 Flash (Reasoning)	$0.039	24.1s	78%	85	83	83	79	76	81%
56	Hermes 3 405B	$0.0022	1.5m	75%	88	83	83	76	75	81%
64	DeepSeek V3 (2025-03-24)	$0.0012	30.8s	69%	94	81	80	76	73	81%
22	Gemini 2.5 Flash (Reasoning)	$0.0100	19.8s	76%	85	83	81	79	77	81%
17	GPT-4.1 Mini	$0.0025	14.4s	77%	86	81	80	79	79	81%
38	Grok 4.20	$0.0068	40.2s	74%	87	81	80	79	77	81%
85	Claude Opus 4.7 (Reasoning)	$0.063	31.2s	76%	85	84	83	76	75	81%
18	Gemini 2.5 Flash Lite (Reasoning)	$0.0027	28.7s	78%	83	82	81	80	77	81%
34	Mistral Medium 3.1	$0.0053	49.4s	76%	84	81	80	79	77	80%
53	DeepSeek-V2 Chat	$0.0015	33.0s	71%	88	85	79	76	75	80%
97	Claude Opus 4.8 (Reasoning)	$0.063	42.3s	75%	84	84	84	78	72	80%
102	MoonshotAI: Kimi K2.5	$0.016	2.5m	74%	85	83	80	76	75	80%
44	Claude Haiku 4.5	$0.0093	19.7s	73%	86	82	80	80	72	80%
58	DeepSeek V3 (2024-12-26)	$0.0016	36.1s	71%	90	80	80	77	73	80%
54	Z.AI GLM 5.1	$0.012	1.0m	75%	84	82	81	80	73	80%
19	GPT-5.4 Nano (Reasoning, Low)	$0.0049	17.8s	78%	82	82	81	77	77	80%
51	MiniMax M2.7	$0.0034	59.8s	74%	85	80	78	78	78	80%
69	Qwen 3.5 9B	$0.0007	1.0m	71%	86	82	77	77	76	80%
112	GPT-5.2	$0.054	1.5m	73%	86	81	80	79	73	80%
32	o4 Mini	$0.014	22.1s	77%	82	82	80	78	77	80%
27	Qwen 3.5 Flash	$0.0018	44.1s	77%	82	82	81	77	77	80%
33	ByteDance Seed 1.6 Flash	$0.0014	28.9s	75%	83	82	79	78	77	80%
43	Aion 2.0	$0.0049	1.1m	76%	83	81	80	79	75	80%
71	Hermes 3 70B	$0.0006	45.2s	70%	86	85	79	76	72	80%
81	Gemma 3 27B	$0.0004	50.6s	68%	88	85	77	74	74	80%
30	Ministral 3 8B	$0.0003	8.5s	74%	85	79	78	78	77	79%
48	Mistral Small 4	$0.0011	17.2s	71%	87	83	81	78	69	79%
63	Gemini 3.5 Flash (Reasoning, Minimal)	$0.017	11.8s	72%	86	80	78	78	75	79%
23	Ministral 8B	$0.0002	8.4s	75%	83	81	80	77	75	79%
59	Xiaomi MIMO v2.5 Pro	$0.0074	48.8s	74%	84	80	79	77	75	79%
35	Gemini 2.5 Flash Lite	$0.0008	9.4s	74%	84	81	80	77	73	79%
37	GPT-5.4 Nano (Reasoning)	$0.0063	25.1s	75%	82	82	81	77	74	79%
94	Z.AI GLM 5	$0.0067	48.5s	68%	90	80	76	75	74	79%
61	Mistral Large 3	$0.0022	24.1s	71%	84	84	79	78	70	79%
83	Qwen 3.5 27B	$0.0087	47.6s	70%	85	84	77	75	74	79%
96	Claude Sonnet 4.6 (Reasoning)	$0.034	54.3s	73%	83	82	80	79	70	79%
50	DeepSeek V4 Flash (Reasoning)	$0.0006	32.3s	73%	84	82	79	74	74	79%
40	Ministral 3 14B	$0.0005	13.5s	73%	83	82	77	76	75	79%
78	Claude Sonnet 5 (Reasoning, Low)	$0.027	38.4s	74%	84	79	79	76	75	79%
106	Claude Opus 4.5	$0.065	57.5s	77%	80	80	79	79	75	79%
73	GPT-5 Mini	$0.011	42.4s	72%	83	82	77	76	75	79%
80	Mistral Large 2	$0.011	28.8s	70%	85	84	79	75	69	79%
113	Claude Sonnet 4.6	$0.028	37.7s	66%	90	79	76	74	73	78%
105	Claude Sonnet 4	$0.025	38.1s	68%	86	82	75	75	75	78%
52	Gemma 4 31B (Reasoning)	$0.0010	59.4s	75%	82	79	79	76	75	78%
75	GPT-4o, Aug. 6th (temp=1)	$0.015	30.5s	72%	84	82	78	76	73	78%
87	DeepSeek V3.1	$0.0016	1.7m	74%	82	82	81	75	70	78%
67	Z.AI GLM 4.5	$0.0046	39.1s	73%	83	79	78	78	72	78%
60	Xiaomi MIMO v2.5	$0.0045	28.8s	73%	82	82	81	72	72	78%
101	WizardLM 2 8x22b	$0.0018	24.7s	65%	90	82	80	76	62	78%
36	Gemma 3 4B	$0.0002	19.0s	75%	80	80	79	76	74	78%
107	DeepSeek V3.2	$0.0011	2.8m	75%	81	80	79	75	74	78%
72	GPT-5.4 Nano	$0.0048	18.5s	71%	84	79	77	76	72	78%
121	Claude Opus 4.6	$0.069	1.2m	73%	81	80	79	75	72	77%
111	Cohere Command R+ (Aug. 2024)	$0.021	2.1m	73%	81	78	77	76	75	77%
57	Ministral 3B	$0.0001	5.6s	72%	84	79	78	74	72	77%
45	Llama 3.1 Nemotron 70B	$0.0021	29.8s	75%	79	79	78	76	75	77%
89	DeepSeek V4 Pro (Reasoning)	$0.0025	1.3m	72%	81	80	79	76	69	77%
92	Z.AI GLM 4.6	$0.0076	35.5s	70%	82	81	75	74	73	77%
93	Gemma 4 26B (Reasoning)	$0.0008	1.7m	73%	80	78	76	75	74	77%
126	Grok 4.3 (Reasoning)	$0.017	2.3m	65%	85	83	75	74	66	77%
77	Mistral NeMO	$0.0005	20.5s	70%	83	77	75	75	73	77%
74	Gemma 3 12B	$0.0002	37.8s	72%	80	78	75	75	75	77%
98	GPT-4.1 Nano	$0.0006	10.0s	66%	87	81	76	70	68	76%
90	Qwen 3.5 122B	$0.012	45.0s	72%	80	78	77	75	71	76%
122	ByteDance Seed 1.6	$0.013	2.6m	71%	80	79	77	75	69	76%
95	Qwen 2.5 72B	$0.0008	39.1s	69%	82	78	75	72	72	76%
84	GPT-4o Mini (temp=1)	$0.0011	23.6s	71%	79	79	76	72	71	76%
76	Qwen 3.5 Plus (2026-02-15)	$0.0049	25.3s	73%	78	77	75	74	74	75%
120	Grok 4.20 (Reasoning)	$0.012	56.5s	65%	84	83	78	69	63	75%
88	Qwen 3 32B	$0.0012	50.5s	73%	77	77	75	74	73	75%
103	Z.AI GLM 4.7 Flash	$0.0015	1.1m	70%	80	77	76	71	71	75%
99	Llama 3.1 70B	$0.0007	23.3s	68%	81	79	79	74	62	75%
82	Arcee AI: Trinity Mini	$0.0003	10.0s	71%	78	76	74	73	71	74%
109	Ministral 3 3B	$0.0003	6.5s	65%	84	75	72	71	70	74%
114	GPT-4o, Aug. 6th (temp=0)	$0.016	21.9s	67%	81	76	73	71	71	74%
124	Cydonia 24B V4.1	$0.0011	41.5s	60%	85	82	72	69	64	74%
128	Gemma 4 31B	$0.0009	2.9m	68%	79	77	75	71	66	74%
100	Gemini 3.1 Flash Lite (Reasoning)	$0.0024	7.6s	69%	78	75	73	71	70	73%
108	Gemini 3.1 Flash Lite	$0.0024	22.6s	69%	76	74	73	72	68	72%
127	Z.AI GLM 4.7	$0.0098	2.0m	68%	77	73	71	71	70	72%
115	GPT-4o Mini (temp=0)	$0.0011	26.3s	68%	76	72	71	70	69	72%
117	Gemini 3 Flash (Preview, Reasoning)	$0.010	24.3s	69%	73	72	71	71	70	71%
116	Gemini 3 Flash (Preview)	$0.0066	17.3s	68%	75	72	72	68	68	71%
135	ByteDance Seed 2.0 Lite	$0.012	2.7m	57%	80	78	66	66	65	71%
130	Gemini 3 Pro (Preview)	$0.049	49.2s	66%	75	72	70	69	67	71%
119	Nemotron 3 Super	$0.0000	56.8s	69%	72	72	71	69	68	71%
123	Z.AI GLM 4.5 Air	$0.0020	34.3s	65%	75	73	70	68	66	70%
125	Gemini 3.1 Flash Lite (Preview)	$0.0025	7.6s	62%	80	72	70	65	64	70%
136	ByteDance Seed 2.0 Mini	$0.0047	5.3m	63%	75	74	68	67	66	70%
138	Mistral Small 3.2 24B	$0.011	7.1m	67%	71	69	69	68	66	69%
129	Gemma 4 26B	$0.0007	53.0s	64%	71	71	68	65	64	68%
133	GPT-5 Nano	$0.0044	1.6m	64%	68	68	68	66	61	66%
132	Nemotron 3 Nano	$0.0006	1.1m	62%	69	67	65	64	62	66%
131	Inception Mercury 2	$0.0030	8.2s	62%	65	64	63	63	62	63%
137	GPT-OSS 120B	$0.0020	3.3m	59%	67	65	63	63	57	63%
Average Total across all 138 models: 79.09%
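The page does not say how the Total column or the 79.09% figure is derived. The numbers are consistent with two simple readings: each Total is the rounded mean of that row's five numbered scores, and 79.09% is the unweighted mean of the 138 Total values. The sketch below checks both under those assumptions. `TOTAL_COUNTS` is a hand tally of the rounded Total column, and all names are hypothetical.

```python
# Number of models at each rounded Total value, tallied from the 138 table rows.
TOTAL_COUNTS = {
    92: 1, 91: 3, 90: 4, 89: 1, 87: 3, 86: 3, 85: 3, 84: 2, 83: 9,
    82: 5, 81: 16, 80: 17, 79: 17, 78: 11, 77: 10, 76: 5, 75: 5,
    74: 5, 73: 1, 72: 3, 71: 5, 70: 3, 69: 1, 68: 1, 66: 2, 63: 2,
}

# A few rows copied from the table: five numbered scores and the reported Total.
SAMPLE_ROWS = {
    "GPT-5.4": ([94, 92, 92, 91, 91], 92),
    "Qwen 3.6 27B": ([94, 90, 90, 84, 79], 87),
    "DeepSeek V3 (2025-03-24)": ([94, 81, 80, 76, 73], 81),
}


def mean_of_runs(scores: list[int]) -> float:
    """Unweighted mean of one row's numbered scores."""
    return sum(scores) / len(scores)


def overall_mean(counts: dict[int, int]) -> float:
    """Unweighted mean of all Total values, computed from a value-to-count tally."""
    n = sum(counts.values())
    return sum(value * count for value, count in counts.items()) / n


for model, (scores, total) in SAMPLE_ROWS.items():
    print(model, round(mean_of_runs(scores)), total)

print(sum(TOTAL_COUNTS.values()), round(overall_mean(TOTAL_COUNTS), 2))
```

Because the tally uses Totals already rounded to whole percentages, agreement to two decimals supports the reading but does not prove it. The site may average unrounded values.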

Median	Evaluator	Top 3 (score, model)	Flop 3 (score, model)
100.0%	"Not X but Y" pattern overuse	100 MiniMax M2.5; 100 Gemini 3.1 Flash Lite (Reasoning); 100 DeepSeek V4 Flash	7 GPT-5 Nano; 32 MoonshotAI: Kimi K2.6; 36 Mistral Small 4
58.7%	Adverb-first sentence starts	100 Qwen3 235B A22B Instruct 2507; 100 GPT-5.4 (Reasoning, Low); 100 Claude Opus 4.7	0 Inception Mercury 2; 2 Mistral Small 3.2 24B; 5 Qwen 3 32B
94.6%	Adverbs in dialogue tags	100 Gemma 3 27B; 100 Grok 4.3 (Reasoning); 100 Inception Mercury 2	20 GPT-5 Nano; 31 Claude Opus 4; 40 DeepSeek V3 (2025-03-24)
92.7%	AI-ism adverb frequency	99 ByteDance Seed 2.0 Lite; 99 o4 Mini; 98 DeepSeek V4 Pro (Reasoning)	81 Claude Sonnet 4.6; 82 Gemma 3 4B; 82 Hermes 3 405B
100.0%	AI-ism character names	100 MiniMax M2.7; 100 GPT-OSS 120B; 100 MiniMax M3	92 Claude Opus 4; 96 ByteDance Seed 2.0 Lite; 96 DeepSeek V4 Pro
100.0%	AI-ism location names	100 Z.AI GLM 4.7 Flash; 100 Gemini 3.1 Flash Lite; 100 MiniMax M2.7	—
48.1%	AI-ism word frequency	88 Claude Opus 4.7; 83 Claude Sonnet 5 (Reasoning); 83 Claude Sonnet 4.6	0 Mistral Small 3.2 24B; 0 GPT-4o Mini (temp=0); 1 GPT-4o Mini (temp=1)
100.0%	Cliché density	100 Gemini 3 Flash (Preview); 100 GPT-5.4 Mini (Reasoning); 100 Z.AI GLM 5.2 (Reasoning, High)	33 Qwen 2.5 72B; 47 GPT-4o, Aug. 6th (temp=0); 47 Mistral Small 3.2 24B
60.7%	Dialogue tag variety (said vs. fancy)	100 Z.AI GLM 5 Turbo; 100 Qwen3.6 Max Preview; 100 Gemini 2.5 Flash	0 Gemini 3 Flash (Preview, Reasoning); 0 Gemma 4 26B; 0 Gemini 3 Pro (Preview)
0.4%	Em-dash & semicolon overuse	100 Qwen 3.5 397B A17B; 100 Mistral NeMO; 100 Qwen3.6 Max Preview	0 Claude Opus 4; 0 Claude Opus 4.5; 0 DeepSeek V3 (2025-03-24)
100.0%	Emotion telling (show vs. tell)	100 GPT-5 Mini; 100 GPT-5.5 (Reasoning, Low); 100 Claude Opus 4.8 (Reasoning, Low)	80 WizardLM 2 8x22b; 95 GPT-4o Mini (temp=1); 96 Gemma 4 31B
92.7%	Filter word density	100 GPT-5.4 (Reasoning, Low); 100 GPT-4.1 Mini; 100 GPT-5.5	0 Nemotron 3 Nano; 1 Inception Mercury 2; 4 Nemotron 3 Super
100.0%	Gibberish response detection	100 Claude Opus 4.8 (Reasoning, Low); 100 Claude Sonnet 4.5; 100 Qwen 3.5 122B	32 WizardLM 2 8x22b; 98 MiniMax M2.5; 99 Qwen 3.5 9B
100.0%	Markdown formatting overuse	100 DeepSeek V4 Flash; 100 GPT-4o Mini (temp=1); 100 Qwen 3.6 27B	80 Ministral 3 3B; 80 Ministral 3 14B; 95 ByteDance Seed 1.6 Flash
100.0%	Missing dialogue indicators (quotation marks)	100 Arcee AI: Trinity Mini; 100 Llama 3.1 Nemotron 70B; 100 DeepSeek V3 (2025-03-24)	60 Qwen 3.5 397B A17B; 64 Gemini 3.1 Flash Lite (Reasoning); 75 Qwen 3.6 Flash
88.7%	Name drop frequency	100 Claude Opus 4.7; 100 Claude Opus 4.7 (Reasoning); 100 Gemini 3.1 Flash Lite (Reasoning)	27 ByteDance Seed 1.6 Flash; 37 Hermes 3 405B; 45 Qwen 2.5 72B
74.9%	Narrator intent-glossing	100 GPT-5.4; 100 Ministral 3B; 100 o4 Mini	0 Nemotron 3 Nano; 1 Inception Mercury 2; 9 GPT-5 Nano
100.0%	Overuse of "that" (subordinate clause padding)	100 Z.AI GLM 5 Turbo; 100 Mistral Medium 3.1; 100 GPT-OSS 120B	48 Claude Sonnet 5 (Reasoning, Low); 50 ByteDance Seed 2.0 Lite; 62 Claude Sonnet 4.6
98.1%	Paragraph length variance	100 Qwen 3.5 9B; 100 GPT-5.2; 100 Claude Sonnet 5	37 Grok 4.3 (Reasoning); 38 GPT-5 Nano; 43 ByteDance Seed 2.0 Lite
93.6%	Passive voice overuse	100 GPT-5.5 (Reasoning, Low); 100 GPT-5.5; 100 o4 Mini High	83 ByteDance Seed 2.0 Lite; 84 DeepSeek V4 Pro (Reasoning); 85 ByteDance Seed 2.0 Mini
78.7%	Past progressive (was/were + -ing) overuse	100 GPT-5.5 (Reasoning, Low); 100 Grok 4.3 (Reasoning); 100 GPT-5 Mini	9 Qwen3.7 Max; 17 Gemma 3 27B; 19 Z.AI GLM 4.7
80.3%	Pronoun-first sentence starts	100 GPT-5.5; 100 GPT-5.4 Mini; 100 GPT-5.4 (Reasoning)	11 Mistral Small 3.2 24B; 17 Gemini 3.1 Flash Lite; 22 Gemini 3.1 Flash Lite (Reasoning)
97.6%	Purple prose (modifier overload)	100 ByteDance Seed 1.6 Flash; 100 Ministral 3 3B; 100 Claude Sonnet 4	83 Gemini 3.5 Flash (Reasoning); 90 GPT-5 Nano; 90 Gemma 4 26B
100.0%	Repeated phrase echo	100 Mistral Large 2; 100 Mistral Medium 3.1; 100 GPT-5.4 Mini (Reasoning)	—
100.0%	Sentence length variance	100 Gemma 4 31B; 100 Ministral 3B; 100 Z.AI GLM 4.7	82 Llama 3.1 70B; 92 GPT-4o, Aug. 6th (temp=0); 93 Grok 4.3 (Reasoning)
49.0%	Sentence opener variety	83 WizardLM 2 8x22b; 81 Llama 3.1 Nemotron 70B; 81 Cydonia 24B V4.1	27 GPT-5 Nano; 27 Mistral Small 3.2 24B; 31 Qwen 3.5 122B
48.9%	Subject-first sentence starts	96 GPT-5.4 (Reasoning); 94 GPT-5.4; 93 Writer: Palmyra X5	0 Inception Mercury 2; 3 GPT-OSS 120B; 11 Z.AI GLM 4.7
53.9%	Subordinate conjunction sentence starts	100 Qwen 3.5 397B A17B; 100 Qwen 3.5 Plus (2026-04-20); 100 Qwen3.7 Max	0 WizardLM 2 8x22b; 0 Mistral Large 3; 0 Mistral Small 3.2 24B
59.3%	Technical jargon density	100 Aion 2.0; 100 GPT-5.5; 98 GPT-4o Mini (temp=0)	0 Claude Sonnet 5 (Reasoning); 0 Nemotron 3 Super; 0 MiniMax M2.5
60.0%	Useless dialogue additions	100 WizardLM 2 8x22b; 100 Qwen3.6 Max Preview; 100 Z.AI GLM 5.1	0 ByteDance Seed 2.0 Lite; 0 GPT-OSS 120B; 0 Qwen 2.5 72B

Charts on this page: Performance Score Distribution (Top 20); Price-Performance Score Distribution (Top 20); Most Stable Models (Top 20); Top Overall Models (Top 20).