Romance: separated couple reunites

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, overuse of the past progressive, weak dialogue tags, filter words, purple prose, clichés, and AI-ism words, adverbs, and names, among others.
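As a rough illustration of how a density-style detector can work, the sketch below counts filter words and past-progressive constructions and normalizes each count per 1,000 words. The word list, the regular expression, and the per-1,000-words normalization are all assumptions. This is not the benchmark's scoring code, and this page does not describe how a raw density maps to the percentage scores in the tables. The regex is deliberately crude: it will also match non-progressive phrases such as "was nothing".

```python
import re

# Hypothetical filter-word list (assumption; the benchmark's list is not published here).
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "watched", "seemed", "wondered"}

# Crude past-progressive pattern: "was"/"were" followed by a word ending in "-ing".
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)


def words(text: str) -> list[str]:
    """Lowercased word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def per_thousand_words(count: int, total_words: int) -> float:
    """Normalize a raw hit count to hits per 1,000 words (0.0 for empty text)."""
    return 1000.0 * count / total_words if total_words else 0.0


def filter_word_density(text: str) -> float:
    """Filter words per 1,000 words."""
    tokens = words(text)
    hits = sum(1 for token in tokens if token in FILTER_WORDS)
    return per_thousand_words(hits, len(tokens))


def past_progressive_density(text: str) -> float:
    """'was/were + -ing' matches per 1,000 words."""
    hits = len(PAST_PROGRESSIVE.findall(text))
    return per_thousand_words(hits, len(words(text)))


if __name__ == "__main__":
    sample = "She felt the rain. He was waiting by the door, and she noticed him."
    print(filter_word_density(sample), past_progressive_density(sample))
```

A real evaluator would likely use part-of-speech tagging rather than a regex to avoid false positives; the regex keeps the sketch dependency-free.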

Creative Writing Hallucination

Performance Score Distribution (Top 20)

Model	Score
GPT-5.4	92%
GPT-5.4 (Reasoning)	91%
GPT-5.4 (Reasoning, Low)	90%
GPT-5.5 (Reasoning, Low)	90%
GPT-5.5	90%
GPT-5.4 Mini (Reasoning, Low)	90%
GPT-5.4 Mini (Reasoning)	89%
GPT-5.5 (Reasoning)	89%
GPT-5.4 Mini	88%
Qwen3.6 Max Preview	85%
DeepSeek V3 (2025-03-24)	85%
Claude Opus 4	85%
Z.AI GLM 5 Turbo	84%
Qwen 3.6 35B	83%
GPT-5.1	83%
MiniMax M2.7	83%
Qwen3 235B A22B Instruct 2507	83%
Qwen 3.5 397B A17B	83%
DeepSeek V4 Flash	83%
Z.AI GLM 5.1	82%

Price-Performance Score Distribution (Top 20)

Model	Score	Avg. Cost	Avg. Time
GPT-5.4 Mini (Reasoning)	89%	$0.019	21.8s
GPT-5.4 Mini (Reasoning, Low)	90%	$0.017	19.1s
GPT-5.4 Mini	88%	$0.018	19.0s
DeepSeek V3 (2025-03-24)	85%	$0.0011	38.6s
DeepSeek V4 Flash	83%	$0.0006	29.5s
Qwen 3.6 35B	83%	$0.0099	1.0m
Qwen 3.6 Flash	81%	$0.0094	40.4s
Mistral Small 4	81%	$0.0016	23.0s
MiniMax M2.7	83%	$0.0039	1.1m
DeepSeek V4 Flash (Reasoning)	81%	$0.0006	31.5s
Mistral Small 4 (Reasoning)	82%	$0.0025	37.5s
Qwen3 235B A22B Instruct 2507	83%	$0.0006	52.1s
Writer: Palmyra X5	80%	$0.011	23.9s
GPT-5.4	92%	$0.065	1.9m
Z.AI GLM 5 Turbo	84%	$0.0082	37.6s
ByteDance Seed 1.6 Flash	81%	$0.0012	26.6s
Z.AI GLM 5.1	82%	$0.012	57.8s
Ministral 3 14B	79%	$0.0005	12.6s
Ministral 3 3B	79%	$0.0002	4.1s
Qwen 3.5 35B	82%	$0.0096	39.3s

Top Overall Models (Top 20)

Model	Score	Avg. Cost	Avg. Time	Stability
GPT-5.4 Mini (Reasoning, Low)	90%	$0.017	19.1s	88%
GPT-5.4 Mini (Reasoning)	89%	$0.019	21.8s	86%
GPT-5.4 Mini	88%	$0.018	19.0s	85%
GPT-5.4	92%	$0.065	1.9m	90%
DeepSeek V3 (2025-03-24)	85%	$0.0011	38.6s	81%
GPT-5.4 (Reasoning, Low)	90%	$0.067	1.7m	88%
Mistral Small 4	81%	$0.0016	23.0s	79%
Mistral Small 4 (Reasoning)	82%	$0.0025	37.5s	79%
Qwen3 235B A22B Instruct 2507	83%	$0.0006	52.1s	78%
DeepSeek V4 Flash	83%	$0.0006	29.5s	77%
ByteDance Seed 1.6 Flash	81%	$0.0012	26.6s	77%
Qwen 3.6 35B	83%	$0.0099	1.0m	79%
Qwen 3.5 35B	82%	$0.0096	39.3s	79%
Z.AI GLM 5 Turbo	84%	$0.0082	37.6s	76%
MiniMax M2.7	83%	$0.0039	1.1m	78%
DeepSeek V4 Flash (Reasoning)	81%	$0.0006	31.5s	76%
Z.AI GLM 5.1	82%	$0.012	57.8s	78%
Mistral Large 3	81%	$0.0028	32.4s	76%
Mistral Medium 3.1	81%	$0.0054	44.5s	78%
Qwen 3.5 Flash	80%	$0.0020	36.1s	77%

Rank	Model	Avg. Cost	Avg. Time	Stability	# 1	# 2	# 3	# 4	# 5	Total
4	GPT-5.4	$0.065	1.9m	90%	94	93	93	91	90	92%
22	GPT-5.4 (Reasoning)	$0.092	2.6m	87%	93	91	90	90	88	91%
6	GPT-5.4 (Reasoning, Low)	$0.067	1.7m	88%	92	91	91	90	88	90%
50	GPT-5.5 (Reasoning, Low)	$0.167	2.2m	89%	91	90	90	90	89	90%
61	GPT-5.5	$0.180	1.9m	87%	91	91	89	89	89	90%
1	GPT-5.4 Mini (Reasoning, Low)	$0.017	19.1s	88%	91	90	89	89	89	90%
2	GPT-5.4 Mini (Reasoning)	$0.019	21.8s	86%	91	91	90	89	86	89%
57	GPT-5.5 (Reasoning)	$0.169	2.1m	87%	91	90	89	88	88	89%
3	GPT-5.4 Mini	$0.018	19.0s	85%	91	89	88	86	85	88%
53	Qwen3.6 Max Preview	$0.049	4.2m	84%	86	86	86	85	83	85%
5	DeepSeek V3 (2025-03-24)	$0.0011	38.6s	81%	87	86	84	84	82	85%
130	Claude Opus 4	$0.218	1.5m	80%	88	86	84	82	82	85%
14	Z.AI GLM 5 Turbo	$0.0082	37.6s	76%	89	89	83	82	78	84%
12	Qwen 3.6 35B	$0.0099	1.0m	79%	86	85	84	83	77	83%
46	GPT-5.1	$0.058	1.3m	77%	89	84	83	82	79	83%
15	MiniMax M2.7	$0.0039	1.1m	78%	87	86	84	80	77	83%
9	Qwen3 235B A22B Instruct 2507	$0.0006	52.1s	78%	87	84	84	81	77	83%
31	Qwen 3.5 397B A17B	$0.013	2.1m	79%	86	83	82	82	81	83%
10	DeepSeek V4 Flash	$0.0006	29.5s	77%	88	85	83	80	76	83%
17	Z.AI GLM 5.1	$0.012	57.8s	78%	86	85	84	82	76	82%
8	Mistral Small 4 (Reasoning)	$0.0025	37.5s	79%	85	82	82	81	79	82%
30	DeepSeek V4 Pro	$0.0026	1.7m	77%	86	84	84	78	76	82%
13	Qwen 3.5 35B	$0.0096	39.3s	79%	84	83	82	81	79	82%
25	Qwen 3.6 Flash	$0.0094	40.4s	75%	85	84	83	82	72	81%
7	Mistral Small 4	$0.0016	23.0s	79%	83	82	82	82	78	81%
11	ByteDance Seed 1.6 Flash	$0.0012	26.6s	77%	85	82	80	80	80	81%
23	Grok 4.3	$0.0064	38.2s	75%	86	83	80	80	77	81%
66	Claude Opus 4.6	$0.074	1.2m	76%	85	83	81	80	77	81%
16	DeepSeek V4 Flash (Reasoning)	$0.0006	31.5s	76%	86	83	82	77	77	81%
18	Mistral Large 3	$0.0028	32.4s	76%	85	82	80	80	77	81%
47	Claude Opus 4.8 (Reasoning)	$0.066	44.3s	78%	85	83	83	78	77	81%
34	Grok 4.20 (Reasoning)	$0.014	1.2m	77%	85	83	82	78	76	81%
35	Qwen 3 32B	$0.0014	52.4s	74%	85	84	79	79	76	81%
19	Mistral Medium 3.1	$0.0054	44.5s	78%	83	82	81	80	77	81%
56	Claude Sonnet 4.5	$0.034	43.0s	71%	89	82	80	78	72	80%
33	GPT-4.1	$0.017	35.0s	76%	84	83	80	78	77	80%
122	DeepSeek V4 Pro (Reasoning)	$0.027	4.1m	72%	88	81	78	78	76	80%
28	Writer: Palmyra X5	$0.011	23.9s	75%	84	84	82	76	74	80%
42	Z.AI GLM 5	$0.0083	1.2m	74%	84	83	78	78	77	80%
39	Aion 2.0	$0.0055	1.3m	76%	84	81	80	79	76	80%
70	MiniMax M3	$0.0043	2.5m	72%	86	82	78	77	76	80%
120	GPT-5	$0.064	2.5m	72%	85	85	80	79	71	80%
20	Qwen 3.5 Flash	$0.0020	36.1s	77%	82	81	79	79	78	80%
38	Hermes 3 405B	$0.0022	1.6m	77%	82	81	80	79	77	80%
32	Mistral Large 2	$0.012	33.6s	76%	83	82	81	80	74	80%
24	Qwen 3.5 122B	$0.013	38.8s	78%	82	80	79	79	78	80%
71	Claude Opus 4.5	$0.067	53.3s	75%	83	81	80	80	74	80%
65	Claude Opus 4.7	$0.060	28.8s	73%	85	80	79	79	73	79%
63	Claude Opus 4.8 (Reasoning, Low)	$0.059	40.2s	75%	84	81	81	78	72	79%
137	MoonshotAI: Kimi K2.6	$0.050	7.5m	75%	82	81	78	78	76	79%
41	Ministral 3 14B	$0.0005	12.6s	70%	89	81	80	76	71	79%
21	Ministral 3 3B	$0.0002	4.1s	75%	82	80	79	79	75	79%
40	Grok 4.20	$0.0092	45.3s	75%	82	82	80	77	74	79%
26	Ministral 3 8B	$0.0007	21.1s	75%	83	80	78	77	77	79%
43	Claude Sonnet 4	$0.034	50.9s	77%	80	80	79	78	77	79%
118	Qwen3.7 Max	$0.063	2.2m	73%	84	81	79	76	74	79%
44	Cohere Command R+ (Aug. 2024)	$0.020	43.0s	74%	81	81	78	78	75	79%
69	Claude Opus 4.7 (Reasoning)	$0.063	28.9s	74%	82	82	80	77	72	79%
37	Gemma 3 27B	$0.0005	55.2s	75%	82	79	78	78	76	79%
121	Gemini 3.1 Pro (Preview)	$0.076	1.4m	69%	85	84	77	74	73	79%
102	Claude Opus 4.6 (Reasoning)	$0.081	1.3m	75%	82	80	79	75	75	78%
29	GPT-5.4 Nano (Reasoning, Low)	$0.0056	20.6s	75%	81	79	79	78	74	78%
27	GPT-5.4 Nano	$0.0056	20.2s	77%	79	78	78	77	76	78%
94	DeepSeek V3.2	$0.0013	3.1m	73%	83	80	80	74	72	78%
60	Gemini 2.5 Pro	$0.039	40.6s	74%	82	79	78	76	73	78%
36	GPT-5.4 Nano (Reasoning)	$0.0059	22.2s	75%	80	78	77	76	75	78%
48	Z.AI GLM 5.2 (Reasoning, High)	$0.011	59.1s	74%	80	80	79	75	73	77%
111	GPT-5.2	$0.071	1.8m	75%	79	78	77	76	75	77%
83	Claude Sonnet 5 (Reasoning, Low)	$0.027	37.0s	69%	83	80	75	74	73	77%
51	Xiaomi MIMO v2.5	$0.0060	36.5s	72%	82	78	76	74	74	77%
79	Qwen 3.5 Plus (2026-04-20)	$0.014	1.5m	71%	83	77	77	74	71	77%
55	o4 Mini High	$0.020	37.2s	73%	80	78	78	77	71	77%
73	Claude Sonnet 4.6 (Reasoning)	$0.037	50.1s	73%	80	77	76	76	74	77%
67	GPT-5 Mini	$0.0097	41.1s	69%	82	80	75	74	73	77%
59	o4 Mini	$0.014	23.5s	70%	82	81	77	75	69	77%
62	Qwen 3.5 27B	$0.011	58.5s	72%	80	78	75	75	74	77%
52	Qwen 3.5 9B	$0.0009	1.2m	73%	79	78	77	75	73	76%
58	MiniMax M2.5	$0.0047	52.2s	71%	81	77	76	74	73	76%
64	Hermes 3 70B	$0.0006	49.8s	70%	81	80	80	74	65	76%
115	Grok 4.3 (Reasoning)	$0.017	2.6m	72%	81	78	77	73	71	76%
45	Qwen 2.5 72B	$0.0009	37.0s	74%	77	77	75	75	74	76%
98	ByteDance Seed 2.0 Lite	$0.011	2.2m	72%	78	78	76	73	73	76%
126	MoonshotAI: Kimi K2.5	$0.021	3.1m	70%	82	78	78	70	69	75%
54	Llama 3.1 70B	$0.0008	39.0s	72%	77	77	74	73	73	75%
99	Gemini 3.5 Flash (Reasoning)	$0.049	28.2s	71%	79	79	77	70	70	75%
133	ByteDance Seed 2.0 Mini	$0.0047	5.2m	71%	78	74	74	74	74	75%
49	Gemini 3.1 Flash Lite (Reasoning)	$0.0028	9.1s	72%	78	75	74	74	73	75%
116	Qwen 3.6 27B	$0.024	2.2m	71%	78	77	75	73	71	75%
80	Gemini 2.5 Flash	$0.0062	12.8s	67%	80	80	73	72	69	75%
85	DeepSeek V3.1	$0.0022	1.8m	72%	77	76	74	73	73	75%
101	DeepSeek V3 (2024-12-26)	$0.0019	1.0m	66%	83	75	73	72	71	75%
68	Claude Haiku 4.5	$0.011	23.8s	70%	78	78	76	72	68	74%
76	Cydonia 24B V4.1	$0.0011	44.9s	69%	78	76	75	74	68	74%
89	Claude Sonnet 5 (Reasoning)	$0.026	36.5s	70%	77	76	74	73	70	74%
84	Z.AI GLM 4.7	$0.0092	58.9s	70%	78	74	73	73	72	74%
88	Xiaomi MIMO v2.5 Pro	$0.0090	1.0m	70%	78	76	74	73	69	74%
75	Gemma 3 12B	$0.0005	34.2s	69%	78	76	75	74	66	74%
72	Claude Sonnet 5	$0.022	31.9s	73%	75	74	74	73	73	74%
95	Z.AI GLM 4.6	$0.0069	46.0s	68%	78	77	75	73	66	74%
81	Ministral 3B	$0.0001	4.9s	66%	82	78	76	67	65	74%
93	Mistral NeMO	$0.0003	9.8s	65%	82	75	73	71	67	74%
103	Gemini 3.5 Flash (Reasoning, Minimal)	$0.018	12.9s	66%	82	76	74	70	66	73%
82	Gemini 2.5 Flash (Reasoning)	$0.012	25.2s	70%	77	75	73	71	70	73%
96	Claude Sonnet 4.6	$0.030	40.1s	71%	75	75	74	73	69	73%
87	Gemini 2.5 Flash Lite (Reasoning)	$0.0030	37.4s	68%	78	77	75	70	67	73%
131	ByteDance Seed 1.6	$0.016	3.0m	69%	78	76	74	70	69	73%
91	GPT-4o, Aug. 6th (temp=0)	$0.015	23.6s	69%	78	76	74	70	68	73%
112	GPT-4o, Aug. 6th (temp=1)	$0.017	27.8s	66%	79	76	71	70	69	73%
106	Ministral 8B	$0.0004	17.2s	63%	82	75	71	70	67	73%
77	Gemini 3 Flash (Preview)	$0.0072	19.2s	70%	76	74	73	71	69	72%
109	GPT-4o Mini (temp=0)	$0.0012	32.7s	65%	81	74	71	68	68	72%
78	Gemini 3.1 Flash Lite	$0.0027	8.8s	69%	75	74	71	71	70	72%
86	Arcee AI: Trinity Mini	$0.0003	10.1s	67%	78	75	75	69	64	72%
125	Gemini 3 Pro (Preview)	$0.053	53.4s	69%	75	74	71	71	70	72%
114	Z.AI GLM 4.5	$0.0050	45.5s	66%	78	74	71	70	67	72%
74	Gemini 2.5 Flash Lite	$0.0009	12.4s	70%	74	73	72	71	70	72%
104	Qwen 3.5 Plus (2026-02-15)	$0.0052	29.5s	66%	78	74	72	70	66	72%
100	DeepSeek-V2 Chat	$0.0018	50.7s	68%	75	73	73	73	66	72%
97	Gemma 4 31B	$0.0008	47.1s	69%	74	73	73	73	67	72%
123	Gemma 4 26B (Reasoning)	$0.0010	1.6m	66%	76	75	71	70	67	72%
119	Z.AI GLM 4.5 Air	$0.0029	41.5s	65%	76	75	70	68	67	71%
92	Llama 3.1 Nemotron 70B	$0.0023	35.4s	70%	73	72	72	71	68	71%
108	Mistral Small 3.2 24B	$0.0006	19.3s	65%	78	76	76	68	59	71%
105	Gemini 3 Flash (Preview, Reasoning)	$0.011	26.9s	68%	74	73	72	71	65	71%
90	GPT-4.1 Mini	$0.0023	14.4s	68%	73	72	71	70	69	71%
110	GPT-4o Mini (temp=1)	$0.0012	29.9s	66%	74	73	69	69	68	71%
113	Gemma 4 26B	$0.0007	38.2s	66%	75	72	70	69	66	71%
107	Gemini 3.1 Flash Lite (Preview)	$0.0027	8.7s	65%	76	73	72	67	64	71%
128	Gemma 4 31B (Reasoning)	$0.0012	2.0m	68%	73	71	70	68	67	70%
129	Z.AI GLM 4.7 Flash	$0.0018	1.5m	65%	73	71	69	67	67	69%
117	Gemma 3 4B	$0.0002	16.7s	65%	72	72	70	70	63	69%
127	GPT-4.1 Nano	$0.0006	11.1s	63%	70	69	66	66	64	67%
134	GPT-5 Nano	$0.0039	1.2m	62%	72	68	66	65	64	67%
124	WizardLM 2 8x22b	$0.0016	24.8s	65%	69	68	67	66	65	67%
135	Nemotron 3 Super	$0.0000	1.2m	61%	69	69	66	63	60	66%
132	GPT-OSS 120B	$0.0013	54.4s	63%	68	68	67	65	60	65%
136	Inception Mercury 2	$0.0029	7.6s	59%	65	62	62	61	58	62%
138	Nemotron 3 Nano	$0.0009	1.2m	58%	64	62	60	60	58	61%
77.13%

Median	Evaluator	Top 3	Bottom 3
100.0%	"Not X but Y" pattern overuse	ByteDance Seed 1.6 Flash (100); Claude Sonnet 4 (100); Gemma 4 31B (Reasoning) (100)	GPT-5 Nano (32); Xiaomi MIMO v2.5 Pro (49); Nemotron 3 Nano (51)
57.1%	Adverb-first sentence starts	GPT-5.4 Mini (Reasoning) (100); Grok 4.20 (100); GPT-5.4 Mini (Reasoning, Low) (100)	Inception Mercury 2 (6); Nemotron 3 Nano (8); Mistral Small 3.2 24B (9)
92.8%	Adverbs in dialogue tags	GPT-5.5 (Reasoning, Low) (100); MiniMax M3 (100); Claude Opus 4.7 (Reasoning) (100)	GPT-4.1 Nano (0); GPT-5 Nano (32); Claude Haiku 4.5 (35)
83.1%	AI-ism adverb frequency	ByteDance Seed 2.0 Lite (99); ByteDance Seed 1.6 (97); ByteDance Seed 1.6 Flash (94)	Mistral Small 3.2 24B (53); GPT-4.1 Nano (53); Gemma 3 4B (64)
100.0%	AI-ism character names	Inception Mercury 2 (100); Hermes 3 70B (100); GPT-5.4 Mini (Reasoning) (100)	Claude Sonnet 4 (92); Qwen 3.6 35B (92); Gemma 3 27B (96)
100.0%	AI-ism location names	Writer: Palmyra X5 (100); Claude Sonnet 5 (Reasoning) (100); Qwen 3.5 Flash (100)	DeepSeek V3.2 (96); o4 Mini (96)
48.4%	AI-ism word frequency	Claude Opus 4.7 (86); Claude Sonnet 4.6 (85); ByteDance Seed 1.6 Flash (84)	GPT-4o Mini (temp=0) (0); Llama 3.1 Nemotron 70B (0); GPT-4o Mini (temp=1) (0)
100.0%	Cliché density	o4 Mini High (100); DeepSeek V4 Flash (Reasoning) (100); Gemini 3 Flash (Preview) (100)	Qwen 2.5 72B (33); Inception Mercury 2 (47); Mistral NeMO (53)
80.6%	Dialogue tag variety (said vs. fancy)	GPT-5.5 (Reasoning) (100); Qwen 3.5 27B (100); Z.AI GLM 5.1 (100)	Gemma 4 31B (0); Gemma 4 31B (Reasoning) (0); Gemini 3.5 Flash (Reasoning, Minimal) (0)
16.5%	Em-dash & semicolon overuse	Qwen3.6 Max Preview (100); GPT-5.4 Mini (100); Qwen 3.5 397B A17B (100)	GPT-4.1 (0); Claude Opus 4.8 (Reasoning, Low) (0); Gemini 3 Flash (Preview, Reasoning) (0)
100.0%	Emotion telling (show vs. tell)	GPT-5.2 (100); GPT-5.4 Mini (Reasoning) (100); GPT-5 (100)	Llama 3.1 70B (83); Mistral Small 3.2 24B (90); Nemotron 3 Nano (95)
95.2%	Filter word density	Grok 4.3 (100); o4 Mini High (100); Mistral Large 2 (100)	Nemotron 3 Nano (2); Inception Mercury 2 (9); Gemini 3.1 Flash Lite (26)
100.0%	Gibberish response detection	Gemini 3 Flash (Preview) (100); Grok 4.20 (100); Xiaomi MIMO v2.5 Pro (100)	Llama 3.1 70B (80); DeepSeek V3 (2025-03-24) (87); ByteDance Seed 2.0 Mini (98)
100.0%	Markdown formatting overuse	Grok 4.20 (100); GPT-5 Mini (100); DeepSeek V3 (2024-12-26) (100)	Ministral 3B (80)
100.0%	Missing dialogue indicators (quotation marks)	MiniMax M2.5 (100); Z.AI GLM 4.7 (100); DeepSeek V3 (2025-03-24) (100)	Qwen3.6 Max Preview (0); Qwen 3.5 35B (22); Qwen 3.5 397B A17B (26)
90.6%	Name drop frequency	Gemma 4 26B (Reasoning) (100); GPT-5 Mini (100); GPT-4.1 Nano (100)	Mistral Large 2 (29); GPT-5.4 Nano (30); Ministral 8B (41)
70.5%	Narrator intent-glossing	Gemini 3.5 Flash (Reasoning) (100); Gemini 3.1 Pro (Preview) (100); Mistral NeMO (100)	Nemotron 3 Nano (0); GPT-5 Nano (0); Inception Mercury 2 (0)
100.0%	Overuse of "that" (subordinate clause padding)	GPT-4.1 Mini (100); GPT-5.2 (100); Gemma 4 31B (Reasoning) (100)	Mistral Small 3.2 24B (49); ByteDance Seed 2.0 Mini (50); ByteDance Seed 2.0 Lite (58)
100.0%	Paragraph length variance	Ministral 3 14B (100); Claude Opus 4.7 (100); DeepSeek V4 Flash (100)	WizardLM 2 8x22b (47); GPT-5 Nano (51); Nemotron 3 Nano (59)
97.6%	Passive voice overuse	Gemma 3 12B (100); DeepSeek V3 (2025-03-24) (100); Claude Opus 4.5 (100)	Claude Sonnet 5 (Reasoning) (88); ByteDance Seed 2.0 Mini (91); Gemini 3.1 Flash Lite (Reasoning) (91)
93.3%	Past progressive (was/were + -ing) overuse	GPT-5.5 (100); WizardLM 2 8x22b (100); Ministral 3 3B (100)	MiniMax M2.5 (31); Claude Sonnet 4.6 (33); Claude Sonnet 5 (Reasoning, Low) (37)
25.6%	Pronoun-first sentence starts	Llama 3.1 Nemotron 70B (100); GPT-5.4 Mini (96); Llama 3.1 70B (96)	Gemini 3.1 Flash Lite (Preview) (0); Z.AI GLM 4.7 Flash (0); ByteDance Seed 2.0 Mini (0)
94.2%	Purple prose (modifier overload)	DeepSeek V4 Flash (Reasoning) (100); ByteDance Seed 2.0 Mini (100); Nemotron 3 Nano (100)	GPT-4.1 Nano (79); Gemma 3 4B (79); GPT-5.4 (Reasoning, Low) (82)
100.0%	Repeated phrase echo	DeepSeek V4 Flash (100); Ministral 8B (100); GPT-5.5 (Reasoning, Low) (100)	—
100.0%	Sentence length variance	Nemotron 3 Super (100); Gemini 3.5 Flash (Reasoning) (100); Nemotron 3 Nano (100)	WizardLM 2 8x22b (91); GPT-4o, Aug. 6th (temp=1) (96); Mistral NeMO (99)
48.1%	Sentence opener variety	GPT-4o, Aug. 6th (temp=1) (87); Claude Sonnet 5 (Reasoning, Low) (83); GPT-4.1 Mini (78)	Qwen 3.5 35B (33); Qwen 3.5 9B (34); Qwen 3.6 35B (35)
28.5%	Subject-first sentence starts	GPT-5.4 (87); GPT-5.4 Mini (Reasoning, Low) (79); GPT-5.4 (Reasoning) (77)	GPT-OSS 120B (0); Qwen 3.5 Plus (2026-02-15) (2); Inception Mercury 2 (2)
32.0%	Subordinate conjunction sentence starts	Qwen3.6 Max Preview (98); Qwen 3.6 Flash (85); Gemini 3.1 Flash Lite (Reasoning) (84)	MoonshotAI: Kimi K2.6 (0); GPT-4o, Aug. 6th (temp=0) (0); Claude Sonnet 4.6 (0)
60.9%	Technical jargon density	DeepSeek V3 (2025-03-24) (100); GPT-5.4 (Reasoning) (100); GPT-5.1 (96)	GPT-5 Nano (0); Inception Mercury 2 (0); Nemotron 3 Nano (0)
45.1%	Useless dialogue additions	Qwen3.6 Max Preview (100); Gemini 3.1 Flash Lite (100); Gemini 3.1 Flash Lite (Reasoning) (100)	GPT-4o, Aug. 6th (temp=0) (0); Arcee AI: Trinity Mini (0); Inception Mercury 2 (0)
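Several evaluators in the table above (sentence length variance, pronoun-first sentence starts, sentence opener variety) work on individual sentences. The following is a minimal sketch of two such measurements. It assumes a naive punctuation-based sentence splitter and a hypothetical pronoun list; the benchmark's own tokenization, thresholds, and score scaling are not described on this page.

```python
import re
import statistics

# Hypothetical set of personal pronouns used to classify sentence openers (assumption).
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def sentences(text: str) -> list[str]:
    """Naive split after '.', '!' or '?' followed by whitespace; ignores abbreviations."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def sentence_length_stdev(text: str) -> float:
    """Population standard deviation of sentence lengths, measured in words."""
    lengths = [len(s.split()) for s in sentences(text)]
    return statistics.pstdev(lengths) if lengths else 0.0


def pronoun_first_share(text: str) -> float:
    """Fraction of sentences whose first word (punctuation stripped) is a pronoun."""
    sents = sentences(text)
    if not sents:
        return 0.0
    first_words = [re.sub(r"[^a-z']", "", s.split()[0].lower()) for s in sents]
    return sum(word in PRONOUNS for word in first_words) / len(sents)


if __name__ == "__main__":
    text = "She ran. The door slammed shut behind her. He waited."
    print(sentence_length_stdev(text), pronoun_first_share(text))
```

Here a higher standard deviation means more varied sentence lengths, while a high pronoun-first share would indicate the repetitive openers that the "Pronoun-first sentence starts" evaluator penalizes.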