Literary fiction: old friends reunite

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, and AI-ism words, adverbs, and names, among others.
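
As a rough illustration of what a rule-based check for two of these habits could look like, here is a minimal Python sketch. It is not the benchmark's actual evaluator: the function name `habit_counts`, the filter-word list, and the per-100-words normalization are all assumptions made for the example, and the regular expression is deliberately naive.

```python
import re

# Hypothetical filter-word list; the benchmark's real list is not published in this page.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized",
                "seemed", "watched", "wondered"}

# Naive past progressive matcher: "was"/"were" followed by a word ending in "-ing".
# It will also flag false positives such as "was king"; a real detector would need
# part-of-speech tagging.
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z']+")


def habit_counts(text: str) -> dict:
    """Count two anti-patterns and normalize them per 100 words (assumed metric)."""
    words = [w.lower() for w in WORD.findall(text)]
    total = len(words)
    past_prog = len(PAST_PROGRESSIVE.findall(text))
    filters = sum(1 for w in words if w in FILTER_WORDS)

    def per_100(n: int) -> float:
        # Guard against empty input to avoid division by zero.
        return round(100 * n / total, 2) if total else 0.0

    return {
        "words": total,
        "past_progressive": past_prog,
        "filter_words": filters,
        "past_progressive_per_100": per_100(past_prog),
        "filter_words_per_100": per_100(filters),
    }


if __name__ == "__main__":
    sample = ("She was walking home when she noticed the rain. "
              "They were laughing. He felt tired.")
    print(habit_counts(sample))
```

How such raw counts would map onto the 0 to 100 evaluator scores reported below is not described in the source, so the sketch stops at the counts.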

Creative Writing Hallucination

Performance Score Distribution (Top 20)

Model	Score
GPT-5.5	90%
GPT-5.4	89%
GPT-5.5 (Reasoning, Low)	89%
GPT-5.4 (Reasoning, Low)	89%
GPT-5.5 (Reasoning)	88%
GPT-5.4 (Reasoning)	88%
GPT-5.4 Mini	88%
GPT-5.4 Mini (Reasoning, Low)	87%
GPT-5.4 Mini (Reasoning)	86%
Grok 4.20 (Reasoning)	85%
DeepSeek V3 (2025-03-24)	85%
Qwen3.6 Max Preview	84%
GPT-5.1	84%
Mistral Medium 3.1	83%
Claude Opus 4.8 (Reasoning, Low)	83%
Grok 4.20	83%
Qwen3 235B A22B Instruct 2507	83%
Writer: Palmyra X5	83%
Ministral 8B	83%
Qwen3.7 Max	83%

Model	Score	Cost	Time
GPT-5.4 Mini	88%	$0.018	20.0s
DeepSeek V3 (2025-03-24)	85%	$0.0012	39.8s
GPT-5.4 Mini (Reasoning, Low)	87%	$0.018	21.3s
Ministral 8B	83%	$0.0003	12.7s
GPT-5.4 Mini (Reasoning)	86%	$0.030	33.3s
Qwen 3.6 35B	82%	$0.0062	54.2s
Ministral 3 14B	82%	$0.0006	17.7s
Writer: Palmyra X5	83%	$0.011	23.1s
ByteDance Seed 1.6 Flash	83%	$0.0013	30.7s
Mistral Medium 3.1	83%	$0.0047	41.0s
Grok 4.20	83%	$0.0083	50.1s
DeepSeek V4 Flash (Reasoning)	81%	$0.0008	39.8s
o4 Mini	81%	$0.017	27.0s
Qwen3 235B A22B Instruct 2507	83%	$0.0011	1.2m
Mistral Small 4	81%	$0.0015	23.5s
Qwen 3.6 Flash	81%	$0.0096	38.9s
Ministral 3B	78%	$0.0001	7.0s
Mistral Large 3	79%	$0.0028	31.2s
Grok 4.3	80%	$0.0059	23.2s
Z.AI GLM 5 Turbo	81%	$0.0093	40.2s

Model	Score	Cost	Speed	Stability
GPT-5.4 Mini	88%	$0.018	20.0s	86%
GPT-5.4 Mini (Reasoning, Low)	87%	$0.018	21.3s	85%
GPT-5.4 Mini (Reasoning)	86%	$0.030	33.3s	84%
Ministral 3 14B	82%	$0.0006	17.7s	79%
DeepSeek V3 (2025-03-24)	85%	$0.0012	39.8s	79%
Ministral 8B	83%	$0.0003	12.7s	78%
Mistral Medium 3.1	83%	$0.0047	41.0s	79%
Grok 4.20	83%	$0.0083	50.1s	80%
Writer: Palmyra X5	83%	$0.011	23.1s	77%
ByteDance Seed 1.6 Flash	83%	$0.0013	30.7s	77%
DeepSeek V4 Flash (Reasoning)	81%	$0.0008	39.8s	78%
Mistral Small 4	81%	$0.0015	23.5s	75%
o4 Mini	81%	$0.017	27.0s	78%
Qwen3 235B A22B Instruct 2507	83%	$0.0011	1.2m	78%
Grok 4.3	80%	$0.0059	23.2s	77%
GPT-5.4	89%	$0.070	2.0m	85%
GPT-4.1	81%	$0.017	34.0s	77%
Mistral Small 4 (Reasoning)	81%	$0.0024	37.2s	75%
Grok 4.20 (Reasoning)	85%	$0.019	1.9m	81%
GPT-5.4 (Reasoning, Low)	89%	$0.070	1.9m	84%

Rank	Model	Avg. Cost	Avg. Time	Stability	# 1	# 2	# 3	# 4	# 5	Total
76	GPT-5.5	$0.179	2.0m	87%	92	91	89	88	88	90%
16	GPT-5.4	$0.070	2.0m	85%	92	91	89	88	87	89%
65	GPT-5.5 (Reasoning, Low)	$0.160	2.1m	88%	90	89	89	89	88	89%
20	GPT-5.4 (Reasoning, Low)	$0.070	1.9m	84%	92	91	88	87	86	89%
80	GPT-5.5 (Reasoning)	$0.164	2.0m	86%	90	90	89	86	86	88%
95	GPT-5.4 (Reasoning)	$0.125	3.6m	86%	89	89	88	88	87	88%
1	GPT-5.4 Mini	$0.018	20.0s	86%	90	89	89	86	85	88%
2	GPT-5.4 Mini (Reasoning, Low)	$0.018	21.3s	85%	90	88	87	86	86	87%
3	GPT-5.4 Mini (Reasoning)	$0.030	33.3s	84%	89	87	87	85	83	86%
19	Grok 4.20 (Reasoning)	$0.019	1.9m	81%	88	86	85	83	80	85%
5	DeepSeek V3 (2025-03-24)	$0.0012	39.8s	79%	91	87	86	80	78	85%
93	Qwen3.6 Max Preview	$0.054	3.9m	82%	85	85	84	83	82	84%
39	GPT-5.1	$0.066	1.5m	83%	85	85	85	83	82	84%
7	Mistral Medium 3.1	$0.0047	41.0s	79%	88	83	83	82	81	83%
54	Claude Opus 4.8 (Reasoning, Low)	$0.073	48.6s	75%	90	86	81	81	79	83%
8	Grok 4.20	$0.0083	50.1s	80%	86	84	84	83	80	83%
14	Qwen3 235B A22B Instruct 2507	$0.0011	1.2m	78%	88	85	83	82	79	83%
9	Writer: Palmyra X5	$0.011	23.1s	77%	89	85	84	79	79	83%
6	Ministral 8B	$0.0003	12.7s	78%	87	84	83	81	77	83%
61	Qwen3.7 Max	$0.061	2.1m	81%	83	83	83	83	80	83%
10	ByteDance Seed 1.6 Flash	$0.0013	30.7s	77%	88	83	82	81	78	83%
4	Ministral 3 14B	$0.0006	17.7s	79%	85	84	83	82	79	82%
127	MoonshotAI: Kimi K2.6	$0.050	4.6m	76%	86	85	81	80	80	82%
22	Mistral Large 2	$0.013	34.9s	74%	89	85	80	79	78	82%
129	Claude Opus 4	$0.190	1.4m	77%	86	84	83	78	77	82%
37	Qwen 3.6 35B	$0.0062	54.2s	72%	87	86	85	83	67	82%
17	GPT-4.1	$0.017	34.0s	77%	86	81	81	81	78	81%
114	GPT-5	$0.074	2.9m	78%	84	84	82	79	78	81%
12	Mistral Small 4	$0.0015	23.5s	75%	88	82	81	79	78	81%
23	Z.AI GLM 5 Turbo	$0.0093	40.2s	75%	87	85	82	79	75	81%
87	Grok 4.3 (Reasoning)	$0.021	2.4m	74%	87	85	80	79	76	81%
13	o4 Mini	$0.017	27.0s	78%	84	83	83	81	76	81%
11	DeepSeek V4 Flash (Reasoning)	$0.0008	39.8s	78%	84	83	82	79	78	81%
25	Qwen 3.6 Flash	$0.0096	38.9s	75%	89	82	82	77	75	81%
18	Mistral Small 4 (Reasoning)	$0.0024	37.2s	75%	87	82	80	79	77	81%
83	MoonshotAI: Kimi K2.5	$0.016	2.6m	75%	85	84	82	81	74	81%
38	Claude Sonnet 4.5	$0.035	41.2s	77%	86	82	82	78	77	81%
59	Claude Opus 4.8 (Reasoning)	$0.070	46.2s	77%	85	82	82	79	75	81%
57	Claude Opus 4.7	$0.062	29.0s	74%	86	83	79	78	77	81%
28	Qwen 3 32B	$0.0015	1.4m	78%	83	81	81	80	78	81%
31	o4 Mini High	$0.027	43.2s	77%	84	81	80	80	78	81%
48	Claude Sonnet 4	$0.035	53.9s	75%	85	81	79	78	78	80%
74	Z.AI GLM 5.1	$0.015	1.6m	70%	87	86	79	77	73	80%
15	Grok 4.3	$0.0059	23.2s	77%	84	80	80	79	77	80%
58	Qwen 3.5 Plus (2026-04-20)	$0.014	1.4m	73%	88	82	82	78	71	80%
27	Ministral 3 8B	$0.0004	13.8s	71%	87	83	78	77	76	80%
107	Gemini 3.1 Pro (Preview)	$0.073	1.3m	72%	89	82	80	76	74	80%
64	Qwen 3.6 27B	$0.017	1.6m	74%	86	82	80	78	74	80%
32	Qwen 3.5 9B	$0.0010	1.2m	77%	82	81	80	79	77	80%
36	MiniMax M2.7	$0.0036	1.4m	77%	81	81	80	79	77	80%
26	GPT-5.4 Nano (Reasoning)	$0.0063	24.9s	74%	85	81	79	78	76	80%
42	Z.AI GLM 5.2 (Reasoning, High)	$0.015	1.3m	77%	82	80	80	80	77	80%
24	DeepSeek V4 Flash	$0.0007	31.4s	75%	85	80	79	79	76	80%
35	Qwen 3.5 Flash	$0.0028	1.2m	76%	82	80	79	79	78	80%
41	MiniMax M2.5	$0.0041	1.1m	75%	84	81	80	78	75	80%
50	DeepSeek V4 Pro	$0.0063	1.4m	74%	85	81	79	77	75	79%
40	Mistral Large 3	$0.0028	31.2s	72%	86	84	81	73	72	79%
43	Qwen 3.5 27B	$0.012	1.1m	75%	83	79	78	77	77	79%
115	Claude Opus 4.6	$0.081	1.3m	72%	83	83	80	80	69	79%
46	DeepSeek V3 (2024-12-26)	$0.0019	58.2s	73%	82	82	79	76	72	78%
21	GPT-5.4 Nano (Reasoning, Low)	$0.0054	19.1s	77%	80	79	79	76	76	78%
30	Claude Haiku 4.5	$0.010	21.0s	75%	82	79	79	78	74	78%
126	Claude Opus 4.6 (Reasoning)	$0.112	1.8m	74%	81	80	78	78	74	78%
86	MiniMax M3	$0.0043	2.6m	76%	80	79	79	78	76	78%
89	Qwen 3.5 397B A17B	$0.020	2.0m	74%	82	79	79	78	73	78%
63	Gemini 2.5 Pro	$0.043	44.1s	75%	81	80	79	77	73	78%
66	Hermes 3 405B	$0.0020	1.6m	73%	83	80	78	76	73	78%
119	GPT-5.2	$0.076	1.9m	76%	80	79	79	75	75	78%
71	Claude Sonnet 4.6	$0.035	45.3s	72%	82	80	77	77	73	78%
33	Mistral NeMO	$0.0004	19.7s	73%	81	79	78	77	73	78%
44	Qwen 2.5 72B	$0.0008	33.9s	72%	83	80	78	73	73	78%
29	Ministral 3B	$0.0001	7.0s	73%	80	79	79	79	70	78%
101	Claude Opus 4.7 (Reasoning)	$0.078	35.8s	73%	82	80	78	76	72	77%
117	DeepSeek V4 Pro (Reasoning)	$0.012	3.1m	73%	81	77	76	76	76	77%
34	Mistral Small 3.2 24B	$0.0006	22.9s	74%	80	79	78	78	72	77%
45	Xiaomi MIMO v2.5	$0.0062	37.4s	73%	82	78	78	75	74	77%
137	ByteDance Seed 2.0 Mini	$0.0044	4.9m	70%	84	81	78	74	70	77%
68	Aion 2.0	$0.0063	1.5m	74%	81	78	78	75	74	77%
51	Llama 3.1 Nemotron 70B	$0.0026	41.8s	72%	81	79	76	75	73	77%
53	Gemini 2.5 Flash Lite	$0.0009	11.6s	68%	83	82	77	75	67	77%
118	ByteDance Seed 1.6	$0.015	2.9m	74%	79	77	76	75	75	77%
99	Z.AI GLM 4.7	$0.010	1.7m	71%	81	78	75	75	72	76%
105	Gemini 3.5 Flash (Reasoning)	$0.047	28.9s	68%	81	80	74	73	73	76%
47	Qwen 3.5 122B	$0.015	36.9s	75%	77	77	77	75	74	76%
92	DeepSeek V3.2	$0.0013	1.1m	67%	82	82	74	73	70	76%
55	GPT-5 Mini	$0.011	43.2s	73%	78	78	76	76	73	76%
116	Claude Opus 4.5	$0.077	1.0m	73%	80	79	78	72	72	76%
70	Gemma 3 27B	$0.0004	1.1m	71%	80	79	76	74	72	76%
60	Z.AI GLM 4.5 Air	$0.0028	35.2s	70%	81	78	75	74	73	76%
85	Claude Sonnet 5 (Reasoning, Low)	$0.029	39.3s	71%	80	79	75	75	72	76%
113	Z.AI GLM 5	$0.0081	1.5m	66%	86	76	74	72	72	76%
52	GPT-5.4 Nano	$0.0057	21.3s	72%	81	76	76	75	71	76%
73	Cohere Command R+ (Aug. 2024)	$0.020	52.5s	72%	80	77	77	74	72	76%
132	Claude Sonnet 4.6 (Reasoning)	$0.072	1.7m	66%	85	78	74	71	71	76%
124	ByteDance Seed 2.0 Lite	$0.013	2.4m	68%	83	77	74	73	72	76%
84	Qwen 3.5 35B	$0.016	52.3s	71%	80	79	76	74	70	76%
49	Ministral 3 3B	$0.0003	10.1s	70%	80	78	75	72	72	76%
78	Z.AI GLM 4.5	$0.0066	55.8s	71%	79	78	75	74	71	75%
79	Hermes 3 70B	$0.0007	55.2s	70%	79	79	75	73	71	75%
77	Xiaomi MIMO v2.5 Pro	$0.0088	58.7s	71%	78	77	77	76	69	75%
82	WizardLM 2 8x22b	$0.0023	42.7s	69%	82	77	76	70	69	75%
112	Llama 3.1 70B	$0.0023	54.6s	63%	88	74	74	72	67	75%
67	Gemini 3.1 Flash Lite (Reasoning)	$0.0027	20.9s	69%	79	79	75	73	69	75%
62	Gemini 3.1 Flash Lite	$0.0027	17.0s	70%	78	77	73	73	72	75%
56	Gemini 2.5 Flash	$0.0065	13.6s	71%	78	77	76	74	69	75%
94	Z.AI GLM 4.7 Flash	$0.0016	1.0m	68%	82	75	75	70	70	74%
69	GPT-4o, Aug. 6th (temp=0)	$0.018	24.5s	72%	77	75	75	73	72	74%
100	DeepSeek-V2 Chat	$0.0019	47.3s	66%	80	78	73	72	67	74%
91	Gemini 3.5 Flash (Reasoning, Minimal)	$0.024	16.1s	68%	81	76	74	70	69	74%
106	Claude Sonnet 5 (Reasoning)	$0.029	39.8s	69%	78	76	76	74	65	74%
96	Gemma 3 12B	$0.0005	47.6s	67%	78	78	74	74	64	74%
88	Qwen 3.5 Plus (2026-02-15)	$0.0052	32.0s	69%	79	76	75	72	67	74%
109	Gemini 2.5 Flash (Reasoning)	$0.012	24.8s	65%	82	75	72	70	69	74%
90	Gemini 3 Flash (Preview)	$0.0069	17.7s	66%	81	77	73	70	68	74%
98	Cydonia 24B V4.1	$0.0014	55.1s	68%	79	77	75	70	67	73%
108	GPT-4o, Aug. 6th (temp=1)	$0.018	34.1s	67%	80	73	72	72	71	73%
72	Gemini 3 Flash (Preview, Reasoning)	$0.010	24.1s	71%	76	75	74	71	71	73%
81	GPT-4o Mini (temp=0)	$0.0012	27.7s	69%	77	74	73	72	71	73%
104	Claude Sonnet 5	$0.025	35.3s	69%	76	76	73	71	70	73%
75	Arcee AI: Trinity Mini	$0.0003	10.5s	68%	77	75	73	73	67	73%
120	Gemini 2.5 Flash Lite (Reasoning)	$0.0032	43.9s	62%	81	79	71	71	64	73%
122	Gemini 3 Pro (Preview)	$0.051	49.5s	70%	75	75	74	72	69	73%
102	Z.AI GLM 4.6	$0.0077	57.0s	69%	75	75	72	72	70	73%
97	Gemma 3 4B	$0.0002	19.2s	66%	76	75	71	70	68	72%
133	Gemma 4 26B	$0.0007	2.6m	66%	78	75	73	70	64	72%
103	Gemini 3.1 Flash Lite (Preview)	$0.0027	7.9s	65%	78	74	72	67	66	71%
110	GPT-4o Mini (temp=1)	$0.0012	31.4s	67%	74	74	71	69	67	71%
131	DeepSeek V3.1	$0.0020	2.4m	67%	75	72	72	69	65	70%
125	Gemma 4 31B	$0.0008	1.2m	65%	74	72	69	68	67	70%
111	GPT-4.1 Mini	$0.0024	18.3s	66%	73	72	70	67	67	70%
130	Gemma 4 31B (Reasoning)	$0.0012	1.4m	63%	75	74	69	65	65	69%
128	Gemma 4 26B (Reasoning)	$0.0018	1.7m	66%	70	70	68	68	67	69%
123	GPT-4.1 Nano	$0.0007	12.1s	63%	71	70	67	67	64	68%
121	Inception Mercury 2	$0.0028	7.6s	66%	70	68	68	66	65	67%
136	Nemotron 3 Super	$0.0000	53.0s	57%	77	69	64	64	61	67%
138	GPT-OSS 120B	$0.0019	2.5m	62%	72	68	66	65	63	67%
135	Nemotron 3 Nano	$0.0008	57.0s	59%	72	71	64	63	63	66%
134	GPT-5 Nano	$0.0037	1.2m	63%	70	67	66	65	63	66%
77.86%

Median	Evaluator	Top 3 (score, model)	Flop 3 (score, model)
98.3%	"Not X but Y" pattern overuse	100 Qwen 3.5 9B; 100 Z.AI GLM 4.5 Air; 100 Qwen 2.5 72B	2 GPT-5 Nano; 32 Nemotron 3 Super; 45 Xiaomi MIMO v2.5 Pro
54.3%	Adverb-first sentence starts	100 GPT-5.4 Mini; 100 GPT-5.4 Mini (Reasoning, Low); 100 Mistral Small 4	0 Claude Sonnet 5 (Reasoning); 0 ByteDance Seed 2.0 Lite; 0 ByteDance Seed 1.6
83.7%	Adverbs in dialogue tags	100 GPT-5.5; 100 Qwen3.6 Max Preview; 100 Llama 3.1 70B	1 GPT-4.1 Nano; 7 GPT-4.1 Mini; 19 DeepSeek V3.1
83.8%	AI-ism adverb frequency	97 ByteDance Seed 2.0 Lite; 97 ByteDance Seed 1.6; 96 ByteDance Seed 1.6 Flash	52 GPT-4.1 Nano; 60 Cydonia 24B V4.1; 67 Gemma 3 4B
100.0%	AI-ism character names	100 Gemma 4 31B (Reasoning); 100 Gemini 2.5 Flash; 100 Nemotron 3 Nano	68 Claude Opus 4; 76 Claude Sonnet 4; 80 Claude Opus 4.5
100.0%	AI-ism location names	100 Gemma 4 26B; 100 DeepSeek V4 Flash; 100 Grok 4.20	—
51.3%	AI-ism word frequency	88 Claude Opus 4.7 (Reasoning); 86 Claude Opus 4.7; 86 Claude Opus 4.8 (Reasoning)	0 GPT-4o Mini (temp=0); 0 GPT-4o, Aug. 6th (temp=0); 0 GPT-4o, Aug. 6th (temp=1)
100.0%	Cliché density	100 Ministral 3 8B; 100 Mistral Small 4 (Reasoning); 100 GPT-5	40 Gemini 2.5 Flash; 60 Mistral Small 3.2 24B; 67 GPT-4o Mini (temp=0)
93.9%	Dialogue tag variety (said vs. fancy)	100 GPT-5.5 (Reasoning, Low); 100 GPT-5.4; 100 Ministral 3 14B	28 GPT-4o, Aug. 6th (temp=1); 32 Qwen 3.5 Plus (2026-02-15); 34 GPT-4o Mini (temp=1)
23.8%	Em-dash & semicolon overuse	100 Qwen3.7 Max; 100 Mistral NeMO; 100 Gemini 3.1 Pro (Preview)	0 Qwen3 235B A22B Instruct 2507; 0 Gemma 4 31B (Reasoning); 0 Nemotron 3 Nano
100.0%	Emotion telling (show vs. tell)	100 Gemini 3 Pro (Preview); 100 GPT-5.5 (Reasoning, Low); 100 GPT-4.1	86 Llama 3.1 70B; 90 Mistral NeMO; 91 Hermes 3 405B
97.8%	Filter word density	100 Aion 2.0; 100 Grok 4.20 (Reasoning); 100 DeepSeek V3.1	22 Nemotron 3 Nano; 34 Inception Mercury 2; 38 Llama 3.1 Nemotron 70B
100.0%	Gibberish response detection	100 GPT-5.4 (Reasoning); 100 GPT-4o Mini (temp=0); 100 Qwen3 235B A22B Instruct 2507	52 Llama 3.1 70B; 80 Z.AI GLM 4.5; 80 WizardLM 2 8x22b
100.0%	Markdown formatting overuse	100 DeepSeek V3 (2024-12-26); 100 GPT-5.1; 100 GPT-5.5	80 Mistral Medium 3.1; 98 ByteDance Seed 1.6 Flash
100.0%	Missing dialogue indicators (quotation marks)	100 Mistral Large 3; 100 GPT-4.1; 100 GPT-5.5	0 Qwen3.7 Max; 2 Qwen3.6 Max Preview; 19 Qwen 3.5 397B A17B
67.4%	Name drop frequency	100 Z.AI GLM 4.6; 100 Grok 4.20; 100 Grok 4.20 (Reasoning)	0 GPT-5.1; 0 GPT-5.5 (Reasoning, Low); 0 GPT-5.5 (Reasoning)
69.4%	Narrator intent-glossing	100 Cohere Command R+ (Aug. 2024); 99 ByteDance Seed 2.0 Lite; 98 Writer: Palmyra X5	0 Nemotron 3 Nano; 2 GPT-5 Nano; 8 GPT-5 Mini
100.0%	Overuse of "that" (subordinate clause padding)	100 Grok 4.20 (Reasoning); 100 ByteDance Seed 1.6 Flash; 100 DeepSeek V3 (2025-03-24)	72 ByteDance Seed 2.0 Mini; 78 Claude Haiku 4.5; 79 Llama 3.1 70B
100.0%	Paragraph length variance	100 Z.AI GLM 5.2 (Reasoning, High); 100 GPT-4o, Aug. 6th (temp=0); 100 Gemma 3 4B	65 GPT-5 Nano; 66 Arcee AI: Trinity Mini; 73 Cohere Command R+ (Aug. 2024)
96.7%	Passive voice overuse	100 GPT-4o, Aug. 6th (temp=0); 100 o4 Mini High; 100 GPT-4o Mini (temp=0)	81 Claude Sonnet 5 (Reasoning, Low); 86 Gemma 4 31B; 86 Gemma 4 26B (Reasoning)
93.2%	Past progressive (was/were + -ing) overuse	100 DeepSeek V3 (2025-03-24); 100 Cydonia 24B V4.1; 100 Gemini 3.1 Flash Lite (Preview)	25 Z.AI GLM 4.6; 26 Claude Sonnet 4.6; 40 Claude Opus 4.7 (Reasoning)
47.8%	Pronoun-first sentence starts	100 GPT-5.5 (Reasoning, Low); 100 GPT-5.4 Mini (Reasoning); 100 GPT-5.5 (Reasoning)	0 Gemma 3 4B; 0 o4 Mini High; 0 DeepSeek V3.1
94.9%	Purple prose (modifier overload)	100 MoonshotAI: Kimi K2.5; 100 Nemotron 3 Super; 100 DeepSeek V4 Flash	76 GPT-4.1 Nano; 79 GPT-5.4 Mini (Reasoning, Low); 82 Cydonia 24B V4.1
100.0%	Repeated phrase echo	100 Claude Sonnet 4.6 (Reasoning); 100 Claude Sonnet 5 (Reasoning, Low); 100 DeepSeek V4 Flash (Reasoning)	—
100.0%	Sentence length variance	100 Claude Haiku 4.5; 100 Hermes 3 70B; 100 Claude Opus 4.6	96 GPT-4o, Aug. 6th (temp=1); 96 Llama 3.1 70B; 99 Cohere Command R+ (Aug. 2024)
48.3%	Sentence opener variety	86 GPT-4o, Aug. 6th (temp=1); 85 DeepSeek V3 (2025-03-24); 84 GPT-4o Mini (temp=1)	32 Qwen 3.5 9B; 33 Gemini 3.1 Flash Lite; 33 Qwen 3.5 397B A17B
26.8%	Subject-first sentence starts	79 Qwen3 235B A22B Instruct 2507; 73 Mistral Small 4; 70 GPT-5.5 (Reasoning, Low)	0 Qwen 3.5 9B; 0 GPT-OSS 120B; 0 ByteDance Seed 1.6
33.3%	Subordinate conjunction sentence starts	82 GPT-5.5; 80 Ministral 3 8B; 78 Llama 3.1 Nemotron 70B	0 Claude Opus 4.8 (Reasoning); 0 Claude Opus 4.7 (Reasoning); 0 GPT-4o Mini (temp=0)
63.4%	Technical jargon density	100 Qwen 2.5 72B; 99 o4 Mini High; 97 Ministral 3 14B	0 Claude Sonnet 5 (Reasoning); 0 GPT-5 Nano; 0 Nemotron 3 Nano
64.8%	Useless dialogue additions	100 Qwen3.7 Max; 100 Gemini 3.1 Flash Lite (Reasoning); 100 Qwen 3.5 397B A17B	0 Qwen 2.5 72B; 0 Gemma 3 4B; 0 Gemini 2.5 Flash Lite

Price-Performance Score Distribution (Top 20)

Most Stable Models (Top 20)

Top Overall Models (Top 20)