Fantasy: entering an ancient ruin

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, and AI-ism words, adverbs, and names.
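
As a rough illustration of what two of these checks might look like, here is a minimal sketch of a filter-word density measure and a past progressive counter. It is not the benchmark's actual implementation: the `FILTER_WORDS` list, the regular expression, and the function names are all assumptions made for this example.

```python
import re

# Hypothetical word list; the benchmark's real list is not given in the source.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed", "watched", "wondered"}

# "was"/"were" followed by a word ending in "-ing" (a rough past progressive cue).
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)


def tokenize(text: str) -> list[str]:
    """Return lowercase word tokens; apostrophes are kept inside words."""
    return re.findall(r"[a-z']+", text.lower())


def filter_word_density(text: str) -> float:
    """Fraction of tokens that are filter words (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in FILTER_WORDS for t in tokens) / len(tokens)


def past_progressive_count(text: str) -> int:
    """Number of 'was/were + -ing' matches in the text."""
    return len(PAST_PROGRESSIVE.findall(text))


sample = "She was walking into the ruin. She felt the cold and noticed the dust."
print(filter_word_density(sample))     # 2 filter words out of 14 tokens
print(past_progressive_count(sample))  # one match: "was walking"
```

A pattern this simple over-counts (for example, "was nothing" also matches), so a real evaluator would likely need part-of-speech information and a way to turn raw counts into the 0 to 100 scores shown below.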

Creative Writing Hallucination

Performance Score Distribution (Top 20)

Model	Score
GPT-5.4 (Reasoning)	92%
GPT-5.5	90%
GPT-5.4	89%
GPT-5.4 (Reasoning, Low)	89%
Claude Opus 4.7 (Reasoning)	89%
GPT-5.5 (Reasoning, Low)	88%
Claude Opus 4.7	88%
Z.AI GLM 5.2 (Reasoning, High)	88%
GPT-5.5 (Reasoning)	88%
GPT-5.4 Mini (Reasoning, Low)	87%
GPT-5.4 Mini (Reasoning)	87%
Claude Opus 4.8 (Reasoning)	87%
Claude Opus 4.8 (Reasoning, Low)	87%
Claude Sonnet 4.5	87%
Grok 4.5 (Reasoning, Low)	87%
Claude Sonnet 4.6	87%
Grok 4.5 (Reasoning, High)	87%
Claude Sonnet 4.6 (Reasoning)	87%
Claude Opus 4.6	86%
MiniMax M3	86%

Model	Score	Cost	Time
GPT-5.4 Mini (Reasoning)	87%	$0.018	22.3s
GPT-5.4 Mini (Reasoning, Low)	87%	$0.014	16.2s
Z.AI GLM 5.2 (Reasoning, High)	88%	$0.013	1.1m
GPT-5.4 Mini	86%	$0.016	17.1s
GPT-5.4	89%	$0.039	1.2m
Z.AI GLM 5 Turbo	85%	$0.0090	25.9s
Aion 3.0 Mini	83%	$0.0076	1.2m
GPT-5.4 (Reasoning, Low)	89%	$0.056	1.3m
Grok 4.5 (Reasoning, Low)	87%	$0.022	53.9s
Qwen 3.6 Flash	83%	$0.010	39.2s
Claude Sonnet 4.6	87%	$0.040	37.2s
Writer: Palmyra X5	84%	$0.013	22.1s
Grok 4.5 (Reasoning, High)	87%	$0.032	1.6m
Claude Sonnet 4.5	87%	$0.045	39.5s
Claude Sonnet 5 (Reasoning)	84%	$0.042	39.2s
Grok 4.20 (Reasoning)	84%	$0.018	1.2m
Qwen 3.6 35B	83%	$0.0087	51.9s
Claude Sonnet 5 (Reasoning, Low)	86%	$0.043	42.0s
Qwen 3.5 35B	81%	$0.043	2.3m
Claude Opus 4.7	88%	$0.092	32.6s

Model	Score	Cost	Time	Stability
GPT-5.4 Mini (Reasoning, Low)	87%	$0.014	16.2s	84%
GPT-5.4	89%	$0.039	1.2m	88%
Z.AI GLM 5.2 (Reasoning, High)	88%	$0.013	1.1m	85%
GPT-5.4 Mini (Reasoning)	87%	$0.018	22.3s	81%
GPT-5.4 (Reasoning)	92%	$0.081	2.3m	90%
GPT-5.4 Mini	86%	$0.016	17.1s	81%
Claude Sonnet 4.6	87%	$0.040	37.2s	84%
Grok 4.5 (Reasoning, Low)	87%	$0.022	53.9s	82%
Claude Sonnet 4.5	87%	$0.045	39.5s	83%
Z.AI GLM 5 Turbo	85%	$0.0090	25.9s	80%
GPT-5.4 (Reasoning, Low)	89%	$0.056	1.3m	84%
Claude Sonnet 5 (Reasoning, Low)	86%	$0.043	42.0s	82%
GPT-5.5	90%	$0.114	1.5m	88%
Claude Opus 4.7 (Reasoning)	89%	$0.100	34.4s	82%
GPT-5.1	86%	$0.053	1.8m	85%
Claude Opus 4.7	88%	$0.092	32.6s	82%
Grok 4.5 (Reasoning, High)	87%	$0.032	1.6m	81%
Claude Opus 4.8 (Reasoning, Low)	87%	$0.094	42.8s	83%
Grok 4.20 (Reasoning)	84%	$0.018	1.2m	80%
Claude Sonnet 5 (Reasoning)	84%	$0.042	39.2s	80%

Rank	Model	Avg. Cost	Avg. Time	Stability	# 1	# 2	# 3	# 4	# 5	Total
5	GPT-5.4 (Reasoning)	$0.081	2.3m	90%	94	93	92	92	90	92%
13	GPT-5.5	$0.114	1.5m	88%	91	90	90	89	87	90%
2	GPT-5.4	$0.039	1.2m	88%	91	91	90	88	87	89%
11	GPT-5.4 (Reasoning, Low)	$0.056	1.3m	84%	93	92	91	86	83	89%
14	Claude Opus 4.7 (Reasoning)	$0.100	34.4s	82%	93	93	89	85	84	89%
27	GPT-5.5 (Reasoning, Low)	$0.133	1.6m	87%	89	89	89	87	87	88%
16	Claude Opus 4.7	$0.092	32.6s	82%	95	90	90	84	82	88%
3	Z.AI GLM 5.2 (Reasoning, High)	$0.013	1.1m	85%	90	89	88	87	86	88%
28	GPT-5.5 (Reasoning)	$0.136	1.4m	86%	89	89	88	87	85	88%
1	GPT-5.4 Mini (Reasoning, Low)	$0.014	16.2s	84%	90	89	87	86	85	87%
4	GPT-5.4 Mini (Reasoning)	$0.018	22.3s	81%	92	90	88	84	82	87%
24	Claude Opus 4.8 (Reasoning)	$0.091	41.3s	81%	91	90	86	86	82	87%
18	Claude Opus 4.8 (Reasoning, Low)	$0.094	42.8s	83%	91	87	86	86	85	87%
9	Claude Sonnet 4.5	$0.045	39.5s	83%	91	88	87	87	82	87%
8	Grok 4.5 (Reasoning, Low)	$0.022	53.9s	82%	91	88	86	85	85	87%
7	Claude Sonnet 4.6	$0.040	37.2s	84%	89	89	87	86	84	87%
17	Grok 4.5 (Reasoning, High)	$0.032	1.6m	81%	91	89	86	86	82	87%
66	Claude Sonnet 4.6 (Reasoning)	$0.139	2.3m	82%	91	87	86	85	84	87%
26	Claude Opus 4.6	$0.087	1.1m	82%	89	88	86	85	84	86%
38	MiniMax M3	$0.0075	3.6m	82%	89	89	88	84	81	86%
15	GPT-5.1	$0.053	1.8m	85%	87	87	86	85	85	86%
6	GPT-5.4 Mini	$0.016	17.1s	81%	90	89	87	84	81	86%
12	Claude Sonnet 5 (Reasoning, Low)	$0.043	42.0s	82%	90	86	85	84	84	86%
62	GPT-5	$0.064	3.0m	80%	88	87	84	83	82	85%
10	Z.AI GLM 5 Turbo	$0.0090	25.9s	80%	88	87	86	83	79	85%
46	Claude Opus 4.6 (Reasoning)	$0.103	1.4m	82%	87	85	84	84	83	85%
19	Grok 4.20 (Reasoning)	$0.018	1.2m	80%	88	86	84	84	80	84%
43	Qwen 3.5 397B A17B	$0.0048	3.3m	81%	87	86	86	83	79	84%
20	Claude Sonnet 5 (Reasoning)	$0.042	39.2s	80%	88	87	86	84	78	84%
67	Qwen3.6 Max Preview	$0.053	3.4m	81%	86	86	83	83	83	84%
37	Qwen3 235B A22B Instruct 2507	$0.0017	1.1m	73%	90	90	82	81	77	84%
138	Claude Opus 4	$0.305	1.9m	76%	91	87	85	79	77	84%
29	DeepSeek V4 Pro	$0.012	2.0m	79%	87	85	83	82	82	84%
126	MoonshotAI: Kimi K2.6	$0.066	6.4m	79%	88	84	83	82	82	84%
59	Qwen3.7 Max	$0.079	2.5m	81%	86	86	84	82	81	84%
22	Writer: Palmyra X5	$0.013	22.1s	76%	90	87	83	80	79	84%
25	Qwen 3.6 35B	$0.0087	51.9s	76%	90	84	82	82	79	83%
61	DeepSeek V4 Pro (Reasoning)	$0.015	3.5m	79%	87	86	85	81	78	83%
98	Gemini 3.1 Pro (Preview)	$0.136	2.2m	78%	88	85	84	80	79	83%
32	Aion 3.0	$0.031	55.1s	77%	87	86	82	80	79	83%
50	Claude Opus 4.5	$0.079	48.4s	77%	86	85	85	85	74	83%
35	Qwen 3.6 Flash	$0.010	39.2s	74%	90	88	84	80	72	83%
33	Aion 3.0 Mini	$0.0076	1.2m	76%	89	86	85	78	75	83%
49	Z.AI GLM 5	$0.0095	1.5m	74%	89	87	84	80	73	83%
31	Z.AI GLM 5.1	$0.017	1.3m	78%	86	84	83	81	77	82%
21	Claude Sonnet 5	$0.035	34.2s	81%	84	82	82	82	81	82%
39	Grok 4.3 (Reasoning)	$0.018	1.6m	79%	85	83	81	81	81	82%
41	MiniMax M2.7	$0.0032	59.3s	74%	90	85	81	78	77	82%
23	Grok 4.20	$0.012	42.6s	79%	84	82	81	80	80	81%
65	Qwen 3.5 35B	$0.043	2.3m	78%	84	82	82	81	77	81%
63	Qwen 3.5 Plus (2026-04-20)	$0.020	2.0m	75%	87	84	82	78	75	81%
30	Qwen 3.5 Flash	$0.0033	45.4s	76%	86	81	81	79	78	81%
36	Qwen 3.5 9B	$0.0011	43.6s	75%	85	82	79	79	79	81%
71	ByteDance Seed 2.0 Lite	$0.011	2.0m	74%	87	83	82	77	73	80%
34	Qwen 3.5 122B	$0.016	39.5s	77%	83	83	81	78	77	80%
105	Qwen 3.6 27B	$0.031	2.5m	70%	89	82	78	76	76	80%
57	DeepSeek V4 Flash	$0.0005	24.4s	69%	89	81	77	76	75	80%
47	GPT-4.1	$0.021	40.8s	76%	82	82	82	78	73	79%
51	DeepSeek V3.2	$0.0022	1.1m	74%	85	79	79	79	76	79%
77	Hermes 3 70B	$0.0021	2.0m	72%	85	82	79	78	73	79%
52	Grok 4.3	$0.0098	28.6s	73%	85	83	80	76	72	79%
44	Cydonia 24B V4.1	$0.0024	48.4s	75%	83	80	79	77	76	79%
48	Qwen 3.5 27B	$0.012	1.1m	77%	81	80	79	79	77	79%
42	o4 Mini	$0.019	34.0s	77%	81	80	79	78	76	79%
40	GPT-5.4 Nano (Reasoning)	$0.0060	26.5s	76%	82	79	79	77	76	79%
87	GPT-5.2	$0.057	1.4m	75%	81	80	78	78	74	78%
82	o4 Mini High	$0.028	53.0s	71%	84	81	77	75	74	78%
80	Gemma 4 31B (Reasoning)	$0.0019	2.2m	74%	81	80	79	77	73	78%
60	DeepSeek V3 (2025-03-24)	$0.0017	30.2s	71%	85	77	76	76	75	78%
79	DeepSeek-V2 Chat	$0.0029	42.0s	68%	87	79	79	78	66	78%
73	Mistral Large 2	$0.020	32.2s	71%	86	77	77	75	74	78%
120	ByteDance Seed 2.0 Mini	$0.0046	4.6m	73%	82	80	78	77	73	78%
78	Xiaomi MIMO v2.5 Pro	$0.0092	50.2s	70%	85	79	77	77	70	78%
117	MoonshotAI: Kimi K2.5	$0.021	3.3m	72%	83	78	78	77	72	78%
72	Aion 2.0	$0.0088	1.4m	74%	80	80	77	76	75	78%
74	Gemini 3 Flash (Preview, Reasoning)	$0.012	26.2s	69%	83	82	77	76	69	78%
45	Gemini 3.5 Flash (Reasoning, Minimal)	$0.017	10.7s	76%	79	79	79	76	75	78%
54	DeepSeek V3 (2024-12-26)	$0.0029	35.6s	73%	82	81	80	74	71	78%
81	Gemini 3.5 Flash (Reasoning)	$0.072	36.3s	75%	79	78	77	77	76	77%
58	GPT-5.4 Nano (Reasoning, Low)	$0.0051	19.3s	72%	81	81	77	76	72	77%
104	ByteDance Seed 1.6	$0.011	2.0m	70%	83	79	75	75	73	77%
101	Z.AI GLM 4.7	$0.011	1.3m	67%	84	82	79	77	63	77%
55	Gemma 3 27B	$0.0010	45.9s	74%	80	78	77	76	75	77%
92	Hermes 3 405B	$0.0062	49.2s	67%	87	77	76	73	72	77%
107	WizardLM 2 8x22b	$0.0045	3.3m	73%	80	79	76	75	75	77%
56	Gemini 2.5 Flash	$0.0057	10.6s	72%	82	78	78	74	72	77%
83	GPT-5 Mini	$0.010	1.0m	71%	82	77	76	74	74	77%
53	GPT-5.4 Nano	$0.0050	18.4s	74%	78	78	76	76	75	77%
94	Gemini 2.5 Pro	$0.039	37.7s	70%	82	80	79	76	65	76%
70	DeepSeek V4 Flash (Reasoning)	$0.0010	30.0s	71%	81	78	76	74	72	76%
89	Qwen 3 32B	$0.0020	45.5s	68%	83	83	78	69	68	76%
69	Ministral 3 14B	$0.0013	9.5s	70%	83	79	77	71	70	76%
64	ByteDance Seed 1.6 Flash	$0.0015	30.1s	72%	79	78	75	74	74	76%
68	Mistral Small 4 (Reasoning)	$0.0027	32.6s	72%	80	76	76	75	72	76%
96	Mistral Large 3	$0.0064	51.4s	68%	83	78	75	71	70	75%
90	DeepSeek V3.1	$0.0027	1.6m	72%	79	77	76	74	71	75%
85	Qwen 3.5 Plus (2026-02-15)	$0.0073	37.0s	70%	79	78	75	75	70	75%
75	Gemma 3 12B	$0.0004	43.8s	72%	77	76	76	75	70	75%
76	Mistral Small 4	$0.0019	19.9s	71%	79	77	75	73	70	75%
91	GPT-4o, Aug. 6th (temp=1)	$0.022	18.5s	69%	80	78	77	72	66	75%
84	Qwen 2.5 72B	$0.0017	49.7s	72%	76	76	76	75	69	74%
106	Claude Haiku 4.5	$0.015	23.3s	66%	80	77	72	71	69	74%
114	Z.AI GLM 4.7 Flash	$0.0019	1.2m	66%	82	76	73	69	69	74%
103	MiniMax M2.5	$0.0035	52.2s	68%	79	75	74	72	68	74%
93	Z.AI GLM 4.6	$0.0073	31.3s	70%	77	74	72	72	71	73%
99	Gemma 3 4B	$0.0003	22.4s	67%	79	74	71	71	71	73%
102	GPT-4o, Aug. 6th (temp=0)	$0.020	17.0s	68%	78	75	73	73	67	73%
97	GPT-4o Mini (temp=1)	$0.0016	50.4s	70%	76	74	73	72	71	73%
88	GPT-4.1 Mini	$0.0037	26.1s	70%	76	75	74	70	70	73%
111	Mistral Medium 3.1	$0.0061	44.7s	67%	78	77	72	69	68	73%
127	Gemma 4 26B (Reasoning)	$0.0015	2.5m	67%	78	77	74	70	66	73%
109	Cohere Command R+ (Aug. 2024)	$0.029	41.0s	69%	76	75	74	71	67	73%
121	Claude Sonnet 4	$0.040	40.5s	66%	79	76	71	71	67	73%
95	Gemini 2.5 Flash (Reasoning)	$0.013	22.2s	70%	76	72	72	72	72	73%
86	Mistral NeMO	$0.0010	10.4s	70%	74	73	73	73	68	72%
100	Gemini 2.5 Flash Lite	$0.0012	8.8s	67%	77	75	72	70	67	72%
113	Ministral 3 8B	$0.0010	8.2s	64%	77	77	70	68	68	72%
118	GPT-4o Mini (temp=0)	$0.0015	40.6s	65%	76	75	72	69	63	71%
116	Z.AI GLM 4.5	$0.0065	40.9s	67%	74	73	72	72	64	71%
115	Ministral 3B	$0.0002	3.4s	64%	77	74	72	71	60	71%
108	Gemini 3 Flash (Preview)	$0.0083	19.2s	68%	73	72	70	69	69	71%
110	Gemma 4 26B	$0.0011	44.1s	69%	73	72	72	69	68	71%
124	Z.AI GLM 4.5 Air	$0.0040	1.2m	65%	76	70	69	69	68	70%
132	Gemma 4 31B	$0.0015	2.4m	66%	75	72	70	68	68	70%
112	Xiaomi MIMO v2.5	$0.0053	28.6s	68%	73	73	72	68	66	70%
122	Ministral 8B	$0.0007	13.2s	62%	79	72	68	66	66	70%
119	Gemini 3.1 Flash Lite (Preview)	$0.0039	8.4s	65%	75	69	68	68	67	69%
125	Gemini 2.5 Flash Lite (Reasoning)	$0.0033	29.8s	65%	72	70	69	66	64	68%
131	Llama 3.1 70B	$0.0038	28.0s	61%	75	71	69	67	58	68%
128	GPT-4.1 Nano	$0.0010	15.1s	62%	75	68	68	67	63	68%
130	Nemotron 3 Super	$0.0000	43.3s	63%	73	68	68	67	62	68%
134	GPT-OSS 120B	$0.0014	53.5s	62%	74	72	68	63	62	68%
135	Nemotron 3 Nano	$0.0011	54.1s	62%	74	71	68	63	62	68%
129	Arcee AI: Trinity Mini	$0.0005	8.0s	62%	72	71	68	65	59	67%
137	Inception Mercury 2	$0.0099	13.7s	60%	71	71	66	63	62	67%
123	Gemini 3.1 Flash Lite (Reasoning)	$0.0038	9.2s	65%	68	67	66	66	66	66%
139	GPT-5 Nano	$0.0042	1.3m	62%	69	65	65	64	63	65%
133	Gemini 3.1 Flash Lite	$0.0039	9.4s	63%	67	66	66	64	62	65%
136	Ministral 3 3B	$0.0006	3.5s	61%	69	68	67	62	57	65%
140	Mistral Small 3.2 24B	$0.0083	7.6m	58%	67	66	63	62	56	63%
78.30%

Median	Evaluator	Top 3	Flop 3
80.0%	"Not X but Y" pattern overuse	100 Mistral Small 3.2 24B; 100 Gemini 3.5 Flash (Reasoning); 100 Claude Opus 4.7 (Reasoning)	0 GPT-5 Nano; 0 Gemini 2.5 Flash Lite (Reasoning); 0 Xiaomi MIMO v2.5
50.5%	Adverb-first sentence starts	100 Writer: Palmyra X5; 99 Claude Sonnet 4.5; 97 Claude Sonnet 5 (Reasoning, Low)	0 Inception Mercury 2; 0 Gemini 3.1 Flash Lite (Reasoning); 3 Qwen 3.5 9B
100.0%	Adverbs in dialogue tags	100 Gemini 3.1 Flash Lite (Reasoning); 100 Qwen3 235B A22B Instruct 2507; 100 Ministral 3 14B	47 Hermes 3 70B; 47 GPT-4.1 Nano; 58 Hermes 3 405B
91.0%	AI-ism adverb frequency	100 Qwen 3.5 122B; 99 Nemotron 3 Super; 99 Qwen 3.6 Flash	70 Mistral Small 3.2 24B; 72 Gemma 3 4B; 73 Gemini 2.5 Flash (Reasoning)
100.0%	AI-ism character names	100 Cohere Command R+ (Aug. 2024); 100 GPT-5.5 (Reasoning, Low); 100 GPT-5 Nano	92 GPT-5.5; 96 Claude Sonnet 4.5; 96 Grok 4.20 (Reasoning)
100.0%	AI-ism location names	100 MiniMax M2.5; 100 Qwen 2.5 72B; 100 Z.AI GLM 4.5 Air	96 Gemma 3 27B
28.7%	AI-ism word frequency	78 ByteDance Seed 2.0 Lite; 71 Claude Opus 4.7; 71 GPT-5.5	0 Qwen 2.5 72B; 0 GPT-4o, Aug. 6th (temp=1); 0 GPT-4.1 Nano
100.0%	Cliché density	100 DeepSeek-V2 Chat; 100 MiniMax M3; 100 Ministral 3 8B	7 Mistral Small 3.2 24B; 53 GPT-OSS 120B; 60 Inception Mercury 2
53.6%	Dialogue tag variety (said vs. fancy)	100 Claude Sonnet 5 (Reasoning); 100 MiniMax M3; 100 GPT-5.4 (Reasoning)	0 Gemini 3.1 Flash Lite (Reasoning); 0 Qwen 3.5 Plus (2026-02-15); 0 Nemotron 3 Nano
73.9%	Em-dash & semicolon overuse	100 Qwen3.6 Max Preview; 100 Qwen 3.6 35B; 100 GPT-5.5	0 Qwen3 235B A22B Instruct 2507; 0 GPT-5 Nano; 0 Gemma 3 4B
100.0%	Emotion telling (show vs. tell)	100 DeepSeek V3.1; 100 Gemma 4 31B (Reasoning); 100 Arcee AI: Trinity Mini	65 Mistral Small 3.2 24B; 65 GPT-4o, Aug. 6th (temp=0); 68 Llama 3.1 70B
95.5%	Filter word density	100 Qwen 3.5 27B; 100 GPT-5.4 (Reasoning); 100 DeepSeek V4 Pro (Reasoning)	14 Cohere Command R+ (Aug. 2024); 18 Mistral Small 3.2 24B; 21 Gemini 2.5 Flash Lite (Reasoning)
100.0%	Gibberish response detection	100 Claude Opus 4.6 (Reasoning); 100 Gemma 3 27B; 100 Cohere Command R+ (Aug. 2024)	80 Llama 3.1 70B; 80 Hermes 3 70B; 80 ByteDance Seed 2.0 Lite
100.0%	Markdown formatting overuse	100 Hermes 3 405B; 100 Qwen 3.5 Plus (2026-04-20); 100 GPT-5 Mini	69 Ministral 3 8B; 75 Ministral 8B; 80 Hermes 3 70B
100.0%	Missing dialogue indicators (quotation marks)	100 Mistral Small 4 (Reasoning); 100 DeepSeek V3.1; 100 DeepSeek V3.2	40 Qwen 3.5 35B; 60 Qwen 3.6 35B; 80 Qwen 3.5 27B
67.1%	Name drop frequency	100 Gemini 3.1 Flash Lite (Reasoning); 99 Gemini 3.1 Flash Lite; 98 GPT-4.1 Nano	0 Qwen 3.5 9B; 6 Qwen 3.5 122B; 26 Qwen 3.5 27B
60.1%	Narrator intent-glossing	100 Qwen 3.6 Flash; 100 o4 Mini High; 100 Qwen 3.5 Plus (2026-04-20)	0 Claude Sonnet 4; 0 Mistral Small 3.2 24B; 0 GPT-5 Nano
100.0%	Overuse of "that" (subordinate clause padding)	100 Gemini 3 Flash (Preview, Reasoning); 100 Z.AI GLM 5; 100 Qwen 3.5 397B A17B	50 Mistral Small 3.2 24B; 80 Llama 3.1 70B; 80 Mistral NeMO
100.0%	Paragraph length variance	100 MiniMax M2.5; 100 Mistral Large 3; 100 Grok 4.5 (Reasoning, High)	35 Arcee AI: Trinity Mini; 63 GPT-OSS 120B; 65 Inception Mercury 2
99.3%	Passive voice overuse	100 DeepSeek V3 (2024-12-26); 100 Gemini 3.1 Pro (Preview); 100 Qwen3 235B A22B Instruct 2507	84 ByteDance Seed 2.0 Lite; 89 Llama 3.1 70B; 93 ByteDance Seed 2.0 Mini
100.0%	Past progressive (was/were + -ing) overuse	100 Z.AI GLM 5; 100 DeepSeek V4 Flash; 100 GPT-4o, Aug. 6th (temp=1)	34 Ministral 8B; 38 Ministral 3 3B; 44 Gemini 3 Flash (Preview)
100.0%	Pronoun-first sentence starts	100 Llama 3.1 70B; 100 Claude Opus 4.7 (Reasoning); 100 GPT-4o, Aug. 6th (temp=1)	19 Gemini 3.1 Flash Lite; 44 ByteDance Seed 2.0 Mini; 46 Z.AI GLM 4.7
96.4%	Purple prose (modifier overload)	100 Z.AI GLM 5.1; 100 DeepSeek V4 Pro; 100 Z.AI GLM 4.5 Air	80 Gemini 3.5 Flash (Reasoning, Minimal); 82 Gemini 3.1 Flash Lite; 82 Cydonia 24B V4.1
100.0%	Repeated phrase echo	100 GPT-4.1 Mini; 100 GPT-4.1 Nano; 100 MoonshotAI: Kimi K2.6	-
100.0%	Sentence length variance	100 ByteDance Seed 1.6 Flash; 100 Qwen 3.6 27B; 100 Mistral Large 3	86 GPT-4o, Aug. 6th (temp=0); 88 Mistral Small 3.2 24B; 93 GPT-4o, Aug. 6th (temp=1)
59.6%	Sentence opener variety	97 Cydonia 24B V4.1; 93 DeepSeek V3 (2025-03-24); 92 Claude Sonnet 5 (Reasoning, Low)	31 GPT-5 Nano; 37 Qwen 3.5 35B; 38 Mistral Small 3.2 24B
35.3%	Subject-first sentence starts	93 Writer: Palmyra X5; 92 Cydonia 24B V4.1; 89 Claude Sonnet 5 (Reasoning, Low)	0 GPT-OSS 120B; 0 Qwen 3.5 122B; 1 Inception Mercury 2
20.0%	Subordinate conjunction sentence starts	93 Gemma 3 27B; 81 Cydonia 24B V4.1; 80 Cohere Command R+ (Aug. 2024)	0 Mistral Small 3.2 24B; 0 Gemma 4 26B (Reasoning); 0 Qwen 3.5 27B
51.7%	Technical jargon density	100 Gemini 3.1 Pro (Preview); 99 Hermes 3 405B; 99 Qwen 3.5 35B	0 GPT-5 Nano; 0 Ministral 3B; 5 ByteDance Seed 2.0 Lite
57.2%	Useless dialogue additions	100 GPT-5.5 (Reasoning); 100 GPT-5.4 Mini (Reasoning); 100 Qwen 3.5 397B A17B	0 GPT-OSS 120B; 0 Gemini 2.5 Pro; 0 Inception Mercury 2

Price-Performance Score Distribution (Top 20)

Most Stable Models (Top 20)

Top Overall Models (Top 20)