Fantasy: entering an ancient ruin

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, and AI-isms (overused words, adverbs, and names), among other checks.
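As a rough illustration of how two of these checks might work, here is a minimal sketch of a filter-word density counter and a past progressive counter. It is not the benchmark's actual evaluator: the `FILTER_WORDS` list, the regular expression, and the function names are all assumptions made for this example, and the source does not say how the real checks are scored.

```python
import re

# Hypothetical word list; the benchmark's real list is not given in the source.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized",
                "seemed", "watched", "wondered"}

# "was"/"were" followed by a word ending in "-ing". This is a rough heuristic
# and will also match non-verbs such as "was nothing".
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[A-Za-z']+", text.lower())


def filter_word_density(text):
    """Return filter words per 100 words (0.0 for empty text)."""
    words = tokenize(text)
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FILTER_WORDS)
    return 100.0 * hits / len(words)


def past_progressive_count(text):
    """Count was/were + -ing constructions matched by the heuristic."""
    return len(PAST_PROGRESSIVE.findall(text))


sample = ("She felt the cold air as she was walking into the ruin. "
          "Torches were burning.")
print(filter_word_density(sample))
print(past_progressive_count(sample))
```

A production check would likely need part-of-speech tagging to avoid false positives; the regular expression here only shows the general shape of the idea.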

Performance Score Distribution (Top 20)

Model	Score
GPT-5.5 (Reasoning, Low)	90%
GPT-5.4 (Reasoning)	89%
GPT-5.5 (Reasoning)	89%
GPT-5.5	87%
GPT-5.4 (Reasoning, Low)	86%
GPT-5.4 Mini (Reasoning)	84%
GPT-5.4	84%
GPT-5.4 Mini	83%
Qwen3.6 Max Preview	83%
GPT-5.4 Mini (Reasoning, Low)	83%
Qwen3.7 Max	83%
Qwen 3.6 Flash	81%
Claude Opus 4.7 (Reasoning)	80%
Claude Opus 4.8 (Reasoning)	80%
Qwen 3.5 397B A17B	80%
Grok 4.20 (Reasoning)	80%
Hermes 3 405B	80%
Gemini 3.1 Pro (Preview)	80%
GPT-5.1	80%
Claude Opus 4.8 (Reasoning, Low)	80%

Price-Performance Score Distribution (Top 20)

Model	Score	Cost	Time
GPT-5.4 Mini (Reasoning, Low)	83%	$0.016	18.6s
GPT-5.4 Mini	83%	$0.016	18.8s
GPT-5.4 Mini (Reasoning)	84%	$0.024	29.9s
Qwen 3.6 Flash	81%	$0.0087	35.9s
ByteDance Seed 1.6 Flash	76%	$0.0014	29.6s
Qwen 3.6 35B	78%	$0.0064	51.6s
Mistral Small 4	76%	$0.0015	19.7s
Qwen 3.5 9B	79%	$0.0008	56.9s
GPT-5.4 Nano	77%	$0.0050	17.8s
Qwen3 235B A22B Instruct 2507	78%	$0.0010	56.7s
Grok 4.20	80%	$0.0097	52.7s
DeepSeek V4 Flash (Reasoning)	76%	$0.0007	36.4s
Z.AI GLM 5	79%	$0.0078	45.3s
Gemini 3.1 Flash Lite (Preview)	76%	$0.0029	8.8s
Hermes 3 405B	80%	$0.0030	1.6m
DeepSeek V4 Flash	76%	$0.0007	28.1s
GPT-4o Mini (temp=1)	77%	$0.0012	28.4s
Gemini 3.1 Flash Lite (Reasoning)	75%	$0.0027	8.7s
o4 Mini	77%	$0.015	23.0s
Z.AI GLM 5 Turbo	77%	$0.0074	34.2s

Most Stable Models (Top 20)

Model	Score	Consistency	Stability
GPT-5.5 (Reasoning, Low)	90%	95%	87%
GPT-5.5 (Reasoning)	89%	96%	86%
GPT-5.4 (Reasoning)	89%	95%	85%
GPT-5.4 (Reasoning, Low)	86%	98%	85%
GPT-5.5	87%	98%	84%
GPT-5.4	84%	96%	81%
Qwen3.6 Max Preview	83%	97%	80%
GPT-5.4 Mini (Reasoning, Low)	83%	97%	80%
GPT-5.4 Mini	83%	96%	80%
Qwen3.7 Max	83%	94%	77%
GPT-5.4 Mini (Reasoning)	84%	92%	76%
Claude Opus 4.7 (Reasoning)	80%	95%	76%
Grok 4.20	80%	96%	76%
Claude Opus 4.8 (Reasoning)	80%	95%	76%
Qwen 3.5 9B	79%	96%	75%
DeepSeek V4 Flash	76%	98%	75%
DeepSeek V4 Pro	76%	98%	75%
GPT-5.4 Nano	77%	96%	75%
Claude Opus 4.7	79%	95%	75%
Qwen3 235B A22B Instruct 2507	78%	95%	75%

Top Overall Models (Top 20)

Model	Score	Cost	Speed	Stability
GPT-5.4 Mini	83%	$0.016	18.8s	80%
GPT-5.4 Mini (Reasoning, Low)	83%	$0.016	18.6s	80%
GPT-5.4 (Reasoning, Low)	86%	$0.056	1.4m	85%
GPT-5.4 Mini (Reasoning)	84%	$0.024	29.9s	76%
GPT-5.4 (Reasoning)	89%	$0.079	2.2m	85%
GPT-5.4	84%	$0.048	1.4m	81%
Qwen 3.6 Flash	81%	$0.0087	35.9s	74%
Grok 4.20	80%	$0.0097	52.7s	76%
Qwen 3.5 9B	79%	$0.0008	56.9s	75%
GPT-5.4 Nano	77%	$0.0050	17.8s	75%
DeepSeek V4 Flash	76%	$0.0007	28.1s	75%
Qwen3 235B A22B Instruct 2507	78%	$0.0010	56.7s	75%
GPT-5.5 (Reasoning, Low)	90%	$0.148	1.9m	87%
Qwen 3.6 35B	78%	$0.0064	51.6s	73%
Hermes 3 405B	80%	$0.0030	1.6m	73%
Gemini 3.1 Flash Lite (Preview)	76%	$0.0029	8.8s	72%
Z.AI GLM 5	79%	$0.0078	45.3s	72%
o4 Mini	77%	$0.015	23.0s	74%
GPT-5 Mini	77%	$0.010	41.6s	73%
DeepSeek-V2 Chat	76%	$0.0023	46.2s	73%

Rank	Model	Avg. Cost	Avg. Time	Stability	# 1	# 2	# 3	# 4	# 5	Total
13	GPT-5.5 (Reasoning, Low)	$0.148	1.9m	87%	93	92	92	88	87	90%
5	GPT-5.4 (Reasoning)	$0.079	2.2m	85%	93	89	89	89	86	89%
26	GPT-5.5 (Reasoning)	$0.152	1.9m	86%	91	91	90	88	85	89%
57	GPT-5.5	$0.168	1.8m	84%	89	88	86	86	86	87%
3	GPT-5.4 (Reasoning, Low)	$0.056	1.4m	85%	88	87	86	85	85	86%
4	GPT-5.4 Mini (Reasoning)	$0.024	29.9s	76%	91	87	83	81	80	84%
6	GPT-5.4	$0.048	1.4m	81%	87	86	84	82	81	84%
1	GPT-5.4 Mini	$0.016	18.8s	80%	86	85	83	83	80	83%
62	Qwen3.6 Max Preview	$0.060	4.4m	80%	85	84	83	82	81	83%
2	GPT-5.4 Mini (Reasoning, Low)	$0.016	18.6s	80%	85	84	83	82	80	83%
35	Qwen3.7 Max	$0.056	1.9m	77%	88	84	82	81	79	83%
7	Qwen 3.6 Flash	$0.0087	35.9s	74%	86	84	80	79	74	81%
42	Claude Opus 4.7 (Reasoning)	$0.080	36.0s	76%	83	83	80	80	76	80%
36	Claude Opus 4.8 (Reasoning)	$0.070	44.6s	76%	84	82	80	79	77	80%
53	Qwen 3.5 397B A17B	$0.031	2.7m	74%	85	83	81	80	73	80%
27	Grok 4.20 (Reasoning)	$0.017	1.4m	72%	86	83	79	78	74	80%
15	Hermes 3 405B	$0.0030	1.6m	73%	85	85	80	78	73	80%
84	Gemini 3.1 Pro (Preview)	$0.079	1.6m	73%	87	81	78	78	76	80%
47	GPT-5.1	$0.057	1.4m	74%	83	83	79	78	76	80%
61	Claude Opus 4.8 (Reasoning, Low)	$0.071	44.7s	72%	87	83	79	76	75	80%
8	Grok 4.20	$0.0097	52.7s	76%	83	81	80	78	76	80%
30	Claude Sonnet 4.6	$0.034	47.3s	73%	84	80	77	77	77	79%
40	Claude Opus 4.7	$0.062	28.5s	75%	84	79	79	77	76	79%
9	Qwen 3.5 9B	$0.0008	56.9s	75%	81	81	79	79	75	79%
28	GPT-4.1	$0.018	45.5s	71%	86	77	77	77	76	79%
101	GPT-5	$0.068	2.6m	73%	83	81	78	76	76	79%
17	Z.AI GLM 5	$0.0078	45.3s	72%	83	82	78	78	72	79%
14	Qwen 3.6 35B	$0.0064	51.6s	73%	82	81	81	80	69	78%
12	Qwen3 235B A22B Instruct 2507	$0.0010	56.7s	75%	82	79	78	77	75	78%
32	Qwen 3.5 Plus (2026-04-20)	$0.016	1.6m	74%	81	81	81	78	71	78%
81	Claude Opus 4.6 (Reasoning)	$0.081	1.2m	74%	82	80	78	76	74	78%
33	Mistral Large 2	$0.013	31.1s	70%	83	81	76	75	74	78%
117	MoonshotAI: Kimi K2.6	$0.043	3.8m	71%	84	77	76	76	74	77%
19	GPT-5 Mini	$0.010	41.6s	73%	80	78	76	76	75	77%
41	Z.AI GLM 5 Turbo	$0.0074	34.2s	69%	84	81	76	75	69	77%
51	Claude Sonnet 4.5	$0.032	37.0s	70%	83	81	79	75	68	77%
38	Claude Sonnet 5 (Reasoning)	$0.025	33.1s	72%	81	80	76	76	72	77%
10	GPT-5.4 Nano	$0.0050	17.8s	75%	79	78	78	76	74	77%
37	GPT-4o Mini (temp=1)	$0.0012	28.4s	68%	83	82	76	76	67	77%
43	Z.AI GLM 5.1	$0.013	1.8m	74%	78	78	77	76	74	77%
109	Claude Opus 4.6	$0.075	1.2m	70%	82	80	77	74	70	77%
75	GPT-5.2	$0.058	1.5m	74%	79	78	77	76	74	77%
18	o4 Mini	$0.015	23.0s	74%	79	79	78	75	73	77%
59	Claude Sonnet 4.6 (Reasoning)	$0.047	1.1m	73%	80	78	77	74	74	76%
25	DeepSeek V4 Flash (Reasoning)	$0.0007	36.4s	71%	83	78	77	72	72	76%
67	Grok 4.3 (Reasoning)	$0.017	1.9m	70%	82	81	79	73	67	76%
95	ByteDance Seed 1.6	$0.013	2.6m	68%	84	78	74	73	73	76%
11	DeepSeek V4 Flash	$0.0007	28.1s	75%	78	76	76	76	75	76%
48	Gemini 2.5 Pro	$0.038	38.4s	73%	80	77	76	76	73	76%
21	ByteDance Seed 1.6 Flash	$0.0014	29.6s	72%	80	80	79	72	69	76%
24	Mistral Small 4	$0.0015	19.7s	71%	81	79	78	76	67	76%
20	DeepSeek-V2 Chat	$0.0023	46.2s	73%	79	76	76	75	75	76%
23	DeepSeek V4 Pro	$0.0042	1.2m	75%	77	76	76	76	75	76%
74	Qwen 3.5 35B	$0.0083	34.6s	64%	83	83	72	71	71	76%
73	DeepSeek V4 Pro (Reasoning)	$0.0037	1.8m	68%	82	80	75	73	70	76%
16	Gemini 3.1 Flash Lite (Preview)	$0.0029	8.8s	72%	80	77	76	73	73	76%
104	Qwen 3.6 27B	$0.021	2.2m	67%	81	80	73	73	72	76%
71	ByteDance Seed 2.0 Lite	$0.012	2.2m	71%	80	79	78	73	67	76%
29	Gemini 3.1 Flash Lite (Reasoning)	$0.0027	8.7s	70%	81	77	75	72	71	75%
44	Writer: Palmyra X5	$0.012	23.7s	70%	80	79	76	73	68	75%
49	Xiaomi MIMO v2.5	$0.0069	38.4s	69%	82	75	74	73	73	75%
31	GPT-5.4 Nano (Reasoning)	$0.0059	21.4s	72%	79	75	75	73	73	75%
77	MiniMax M2.5	$0.0033	1.3m	66%	84	75	73	71	71	75%
45	DeepSeek V3 (2025-03-24)	$0.0015	30.6s	69%	81	77	74	72	72	75%
22	Gemini 3.1 Flash Lite	$0.0026	7.9s	72%	77	76	74	74	73	75%
92	Claude Sonnet 5 (Reasoning, Low)	$0.025	32.4s	64%	83	80	73	73	66	75%
76	Claude Sonnet 5	$0.026	34.8s	67%	83	78	76	70	67	75%
52	Gemini 3.5 Flash (Reasoning, Minimal)	$0.017	11.3s	68%	82	77	75	71	69	75%
85	MiniMax M3	$0.0045	2.7m	71%	78	76	74	73	72	75%
110	MoonshotAI: Kimi K2.5	$0.018	2.8m	69%	78	78	75	74	68	75%
39	Mistral Medium 3.1	$0.0048	33.3s	71%	76	76	76	75	68	74%
34	GPT-5.4 Nano (Reasoning, Low)	$0.0056	19.7s	72%	77	75	74	73	72	74%
78	Qwen 3 32B	$0.0015	56.9s	65%	83	78	73	69	68	74%
60	Z.AI GLM 5.2 (Reasoning, High)	$0.012	59.5s	70%	80	76	75	70	70	74%
137	Claude Opus 4	$0.195	1.4m	69%	78	77	73	73	69	74%
54	Grok 4.3	$0.0070	44.9s	70%	78	74	73	73	72	74%
79	o4 Mini High	$0.035	59.1s	71%	78	75	75	72	69	74%
80	Gemini 3.5 Flash (Reasoning)	$0.042	25.0s	70%	77	76	75	73	67	74%
46	Claude Haiku 4.5	$0.011	21.1s	70%	77	75	75	73	68	74%
86	DeepSeek V3.2	$0.0014	1.6m	67%	78	76	71	71	71	73%
56	Qwen 3.5 122B	$0.011	29.7s	70%	78	76	75	69	69	73%
50	Ministral 3 14B	$0.0007	18.0s	68%	79	74	73	73	68	73%
100	Mistral Large 3	$0.0033	32.5s	62%	86	74	71	69	66	73%
98	Xiaomi MIMO v2.5 Pro	$0.0091	58.9s	65%	81	76	71	70	69	73%
99	Hermes 3 70B	$0.0011	1.5m	65%	81	73	72	71	68	73%
65	Mistral NeMO	$0.0005	7.9s	65%	80	79	74	67	65	73%
63	Gemma 3 12B	$0.0003	51.8s	68%	79	73	73	72	68	73%
58	GPT-4o Mini (temp=0)	$0.0011	28.9s	68%	77	76	72	70	69	73%
64	Mistral Small 4 (Reasoning)	$0.0026	37.5s	68%	78	73	72	71	70	73%
90	Qwen 3.5 Flash	$0.0021	46.0s	64%	81	74	71	70	69	73%
55	Gemini 2.5 Flash Lite	$0.0010	11.1s	68%	78	75	73	70	68	73%
66	GPT-4o, Aug. 6th (temp=1)	$0.017	16.5s	68%	77	75	73	69	69	73%
70	Gemini 2.5 Flash	$0.0061	12.2s	66%	79	73	71	70	70	73%
105	Gemini 3 Pro (Preview)	$0.050	47.5s	70%	74	74	72	72	70	72%
97	Cohere Command R+ (Aug. 2024)	$0.021	51.5s	67%	79	73	72	70	68	72%
68	Qwen 3.5 Plus (2026-02-15)	$0.0059	34.2s	67%	77	75	71	70	70	72%
130	Claude Opus 4.5	$0.067	53.7s	61%	83	75	70	68	66	72%
126	ByteDance Seed 2.0 Mini	$0.0048	5.3m	70%	74	74	73	70	68	72%
106	Llama 3.1 Nemotron 70B	$0.0031	32.6s	62%	83	71	70	69	66	72%
69	Ministral 8B	$0.0004	11.7s	66%	78	75	71	70	66	72%
72	Z.AI GLM 4.7 Flash	$0.0016	1.2m	69%	73	73	71	71	70	72%
89	Gemma 3 27B	$0.0005	48.8s	65%	79	71	71	71	67	72%
82	Aion 2.0	$0.0064	1.5m	70%	73	72	72	71	70	72%
96	Qwen 3.5 27B	$0.011	1.0m	67%	76	74	73	68	65	71%
91	GPT-4.1 Mini	$0.0031	19.2s	64%	78	72	71	68	66	71%
115	Z.AI GLM 4.6	$0.0063	1.1m	64%	78	73	70	68	65	71%
94	Qwen 2.5 72B	$0.0010	39.4s	66%	74	73	70	70	64	70%
111	DeepSeek V3 (2024-12-26)	$0.0022	45.8s	63%	77	72	69	68	64	70%
87	Ministral 3 8B	$0.0008	19.6s	66%	74	72	69	69	67	70%
83	Gemini 3 Flash (Preview, Reasoning)	$0.011	25.8s	69%	71	71	70	69	68	70%
113	MiniMax M2.7	$0.0041	1.0m	64%	76	74	72	65	61	70%
112	Gemma 3 4B	$0.0002	16.6s	62%	78	69	67	67	67	70%
107	Gemini 2.5 Flash (Reasoning)	$0.0099	20.4s	65%	75	71	68	68	66	70%
118	DeepSeek V3.1	$0.0022	1.4m	63%	77	73	71	66	61	70%
93	Gemini 3 Flash (Preview)	$0.0065	16.3s	66%	72	72	69	68	66	70%
122	Claude Sonnet 4	$0.030	43.6s	63%	74	74	68	67	64	69%
114	Cydonia 24B V4.1	$0.0014	47.4s	63%	76	71	69	67	64	69%
103	Z.AI GLM 4.5	$0.0061	49.7s	67%	72	69	69	68	68	69%
129	Gemma 4 31B (Reasoning)	$0.0012	2.3m	60%	78	71	67	64	64	69%
102	Z.AI GLM 4.5 Air	$0.0026	49.0s	67%	71	70	69	67	66	69%
88	Ministral 3B	$0.0001	3.5s	66%	71	70	69	67	66	68%
121	Z.AI GLM 4.7	$0.0085	1.1m	63%	74	72	68	65	63	68%
108	Ministral 3 3B	$0.0005	12.1s	64%	73	69	68	65	65	68%
116	Gemma 4 31B	$0.0008	1.2m	65%	70	69	67	67	66	68%
128	GPT-4o, Aug. 6th (temp=0)	$0.017	17.4s	57%	77	71	64	63	63	68%
123	WizardLM 2 8x22b	$0.0021	23.8s	60%	72	70	66	62	60	66%
138	Mistral Small 3.2 24B	$0.012	8.0m	61%	72	68	66	65	60	66%
127	Gemma 4 26B	$0.0008	1.1m	59%	72	69	64	62	62	66%
119	GPT-4.1 Nano	$0.0007	12.3s	62%	70	67	66	65	61	66%
120	Arcee AI: Trinity Mini	$0.0003	10.1s	63%	68	66	65	64	62	65%
132	Gemma 4 26B (Reasoning)	$0.0011	1.3m	59%	72	66	65	61	61	65%
124	Gemini 2.5 Flash Lite (Reasoning)	$0.0030	27.6s	61%	69	67	66	62	60	65%
125	Llama 3.1 70B	$0.0010	37.9s	59%	71	66	65	63	58	65%
135	GPT-5 Nano	$0.0042	1.4m	58%	71	65	63	63	59	64%
136	GPT-OSS 120B	$0.0014	3.9m	60%	66	65	63	63	60	63%
134	Nemotron 3 Super	$0.0000	1.2m	59%	66	66	62	62	57	63%
133	Nemotron 3 Nano	$0.0009	1.1m	60%	63	63	62	61	59	62%
131	Inception Mercury 2	$0.0030	7.5s	59%	65	63	63	61	56	62%
74.40%
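
For the rows checked below, the Total column matches the rounded mean of the five numbered columns (# 1 to # 5). The source does not document how Total is computed, so this is an observation about the data rather than a stated formula; the `rows` dictionary and the `mean_total` helper are names introduced only for this sketch.

```python
# Five numbered-column scores and the reported Total, copied from a few rows
# of the table above. The selection of rows is arbitrary.
rows = {
    "GPT-5.5 (Reasoning, Low)": ([93, 92, 92, 88, 87], 90),
    "GPT-5.4 (Reasoning)": ([93, 89, 89, 89, 86], 89),
    "GPT-5.4 Mini (Reasoning)": ([91, 87, 83, 81, 80], 84),
    "Qwen 3.6 Flash": ([86, 84, 80, 79, 74], 81),
    "GPT-4.1": ([86, 77, 77, 77, 76], 79),
}


def mean_total(scores):
    """Rounded arithmetic mean of the numbered-column scores (assumed formula)."""
    return round(sum(scores) / len(scores))


for model, (scores, reported) in rows.items():
    # Print the recomputed mean next to the reported Total for comparison.
    print(model, mean_total(scores), reported)
```

Because the column values are themselves rounded, a recomputed mean could differ from the reported Total by a point on other rows.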

Median	Evaluator	Top 3 [score]	Flop 3 [score]
67.6%	"Not X but Y" pattern overuse	Llama 3.1 70B [100]; Gemini 3.1 Pro (Preview) [100]; Mistral Small 3.2 24B [100]	Z.AI GLM 5.2 (Reasoning, High) [0]; Writer: Palmyra X5 [0]; GPT-5 Nano [0]
55.2%	Adverb-first sentence starts	GPT-5.5 (Reasoning, Low) [100]; GPT-5.4 Mini (Reasoning, Low) [100]; GPT-5.4 [100]	Arcee AI: Trinity Mini [0]; Inception Mercury 2 [0]; Qwen 3.5 27B [7]
97.1%	Adverbs in dialogue tags	Inception Mercury 2 [100]; Ministral 3 3B [100]; Qwen 3.5 397B A17B [100]	Claude Sonnet 4.6 (Reasoning) [40]; Claude Opus 4.7 [49]; GPT-4.1 Nano [52]
85.5%	AI-ism adverb frequency	ByteDance Seed 1.6 [100]; o4 Mini High [97]; GPT-5 Mini [97]	GPT-4.1 Nano [55]; GPT-4o Mini (temp=0) [66]; Claude Opus 4.7 [68]
100.0%	AI-ism character names	Z.AI GLM 4.7 Flash [100]; Gemma 3 4B [100]; Gemini 3 Flash (Preview, Reasoning) [100]	GPT-4.1 [92]; Grok 4.3 [96]; Gemini 2.5 Flash Lite (Reasoning) [96]
100.0%	AI-ism location names	Claude Opus 4.8 (Reasoning, Low) [100]; Gemini 2.5 Flash Lite (Reasoning) [100]; Grok 4.3 [100]	n/a
27.8%	AI-ism word frequency	Claude Opus 4.7 (Reasoning) [77]; Claude Opus 4.7 [72]; Claude Sonnet 5 (Reasoning) [71]	WizardLM 2 8x22b [0]; GPT-4o Mini (temp=0) [0]; Gemini 2.5 Flash Lite (Reasoning) [0]
100.0%	Cliché density	Qwen 3.6 Flash [100]; Xiaomi MIMO v2.5 [100]; o4 Mini High [100]	Ministral 3 8B [73]; Nemotron 3 Nano [73]; Inception Mercury 2 [73]
35.1%	Dialogue tag variety (said vs. fancy)	Gemini 3.1 Flash Lite (Reasoning) [100]; Gemini 3.1 Flash Lite (Preview) [100]; Gemini 3.1 Flash Lite [100]	Gemini 3 Flash (Preview) [0]; Z.AI GLM 4.7 [0]; Gemini 3.5 Flash (Reasoning, Minimal) [0]
1.6%	Em-dash & semicolon overuse	Mistral Small 3.2 24B [100]; Qwen3.6 Max Preview [100]; Qwen 3.6 Flash [99]	Claude Sonnet 5 (Reasoning) [0]; Claude Sonnet 5 [0]; Nemotron 3 Super [0]
100.0%	Emotion telling (show vs. tell)	Xiaomi MIMO v2.5 [100]; Gemini 2.5 Pro [100]; Claude Sonnet 4.6 [100]	Llama 3.1 70B [36]; GPT-4o Mini (temp=0) [65]; GPT-4o, Aug. 6th (temp=0) [68]
87.2%	Filter word density	Claude Opus 4.6 [100]; Ministral 3 8B [100]; GPT-5.4 [100]	Nemotron 3 Nano [0]; Gemma 4 26B (Reasoning) [0]; GPT-OSS 120B [0]
100.0%	Gibberish response detection	Claude Sonnet 4.5 [100]; Mistral Small 4 [100]; Qwen3.6 Max Preview [100]	Z.AI GLM 4.6 [80]; Hermes 3 70B [87]; DeepSeek V3 (2025-03-24) [91]
100.0%	Markdown formatting overuse	Claude Sonnet 4.6 (Reasoning) [100]; Inception Mercury 2 [100]; Llama 3.1 Nemotron 70B [100]	Mistral Large 3 [40]; Ministral 8B [55]; Ministral 3 8B [60]
100.0%	Missing dialogue indicators (quotation marks)	Ministral 3 8B [100]; Claude Sonnet 4.6 [100]; GPT-5.4 Mini (Reasoning, Low) [100]	Gemini 3.1 Flash Lite (Preview) [0]; Qwen3.6 Max Preview [0]; Gemini 3.1 Flash Lite [0]
69.4%	Name drop frequency	GPT-5 Mini [99]; Z.AI GLM 4.6 [97]; Nemotron 3 Super [97]	GPT-5.2 [10]; Qwen3 235B A22B Instruct 2507 [27]; Z.AI GLM 4.5 Air [27]
36.8%	Narrator intent-glossing	Qwen3.6 Max Preview [97]; o4 Mini [96]; GPT-5.5 [94]	Gemini 2.5 Flash Lite (Reasoning) [0]; Claude Sonnet 4 [0]; Nemotron 3 Super [0]
100.0%	Overuse of "that" (subordinate clause padding)	Ministral 3B [100]; Gemini 2.5 Flash [100]; Ministral 3 8B [100]	Mistral Small 3.2 24B [40]; Llama 3.1 70B [61]; GPT-4o, Aug. 6th (temp=0) [66]
98.2%	Paragraph length variance	GPT-5.4 Nano [100]; Z.AI GLM 5 [100]; GPT-5.4 (Reasoning, Low) [100]	Mistral Small 3.2 24B [11]; WizardLM 2 8x22b [35]; Llama 3.1 Nemotron 70B [40]
97.3%	Passive voice overuse	DeepSeek V3 (2025-03-24) [100]; Qwen3 235B A22B Instruct 2507 [100]; Claude Opus 4 [100]	ByteDance Seed 2.0 Mini [83]; Llama 3.1 70B [83]; Claude Sonnet 5 [85]
96.9%	Past progressive (was/were + -ing) overuse	Gemini 3.5 Flash (Reasoning, Minimal) [100]; Ministral 3 3B [100]; GPT-5 [100]	Z.AI GLM 4.7 [14]; Gemma 4 26B [37]; Gemma 4 26B (Reasoning) [38]
96.7%	Pronoun-first sentence starts	Gemma 4 31B [100]; GPT-5.5 (Reasoning) [100]; Claude Opus 4.5 [100]	Mistral Small 3.2 24B [30]; Mistral NeMO [34]; Z.AI GLM 4.7 Flash [64]
95.3%	Purple prose (modifier overload)	Nemotron 3 Super [100]; Mistral Small 3.2 24B [100]; GPT-OSS 120B [100]	Gemini 3.5 Flash (Reasoning) [76]; Gemini 3.1 Pro (Preview) [79]; Gemini 2.5 Flash (Reasoning) [80]
100.0%	Repeated phrase echo	GPT-5.4 Mini (Reasoning) [100]; Nemotron 3 Super [100]; Mistral Medium 3.1 [100]	n/a
100.0%	Sentence length variance	GPT-5 Mini [100]; Nemotron 3 Super [100]; Gemini 2.5 Flash (Reasoning) [100]	Mistral Small 3.2 24B [57]; WizardLM 2 8x22b [74]; Llama 3.1 70B [82]
55.0%	Sentence opener variety	Claude Sonnet 5 (Reasoning, Low) [86]; GPT-4o Mini (temp=1) [85]; Claude Sonnet 5 (Reasoning) [84]	GPT-5 Nano [33]; Mistral Small 3.2 24B [34]; Qwen 3.5 9B [37]
43.5%	Subject-first sentence starts	GPT-5.5 (Reasoning, Low) [90]; GPT-5.4 (Reasoning, Low) [89]; GPT-5.4 (Reasoning) [88]	Arcee AI: Trinity Mini [2]; GPT-5 Nano [3]; Ministral 3 3B [3]
42.6%	Subordinate conjunction sentence starts	Gemini 3.1 Flash Lite [100]; Gemini 3.1 Flash Lite (Preview) [100]; Qwen 3.5 397B A17B [92]	Inception Mercury 2 [0]; Qwen 2.5 72B [0]; MiniMax M3 [0]
31.3%	Technical jargon density	GPT-5.5 (Reasoning) [93]; GPT-5.4 (Reasoning) [90]; GPT-5.5 (Reasoning, Low) [89]	ByteDance Seed 2.0 Lite [0]; Nemotron 3 Super [0]; Claude Sonnet 5 (Reasoning, Low) [0]
43.3%	Useless dialogue additions	Gemini 3.1 Flash Lite (Reasoning) [100]; GPT-5.4 (Reasoning) [100]; Gemini 3.1 Flash Lite [100]	Gemini 2.5 Flash Lite [0]; DeepSeek V3.1 [0]; GPT-4.1 Nano [0]
