Location rename: market square, outer ring, bridge, northern mines

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
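The "each expected change independently" scoring can be pictured with a small sketch. This is an illustration only, not the benchmark's actual scorer: the function name `score_changes`, the (old, new) pair format, and the rule that a change passes when the new form is present and the old form is gone are all assumptions.

```python
import re

def score_changes(output: str, expected_changes: list[tuple[str, str]]) -> float:
    """Return the percentage of expected changes satisfied by `output`.

    Hypothetical rule: a change passes when the new form appears and the
    old form no longer does. Each pair is judged on its own, so one miss
    does not affect the others.
    """
    if not expected_changes:
        return 100.0
    passed = 0
    for old, new in expected_changes:
        has_new = re.search(rf"\b{re.escape(new)}\b", output, re.IGNORECASE) is not None
        has_old = re.search(rf"\b{re.escape(old)}\b", output, re.IGNORECASE) is not None
        if has_new and not has_old:
            passed += 1
    return 100.0 * passed / len(expected_changes)

# Contraction expansion: one of two expected changes was made.
output = "She did not go, but he can't stay."
changes = [("didn't", "did not"), ("can't", "cannot")]
print(score_changes(output, changes))  # 50.0
```

Scoring per change rather than all-or-nothing would explain why partial results such as 96% or 88% show up in the tables.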

Text Editing

Performance Score Distribution (Top 20)

Model	Score
Qwen3.7 Max	100%
Claude Opus 4.6 (Reasoning)	100%
Qwen3.6 Max Preview	100%
Gemini 3.1 Pro (Preview)	100%
Z.AI GLM 5.1	100%
Z.AI GLM 5 Turbo	100%
Gemini 3.5 Flash (Reasoning)	100%
Claude Sonnet 4.6 (Reasoning)	100%
Grok 4.3 (Reasoning)	100%
GPT-5.4 (Reasoning)	100%
GPT-5 Mini	100%
GPT-5.5 (Reasoning, Low)	100%
GPT-5.1	100%
Claude Opus 4.6	100%
MoonshotAI: Kimi K2.6	100%
GPT-5	100%
Qwen 3.5 397B A17B	100%
Gemma 4 31B (Reasoning)	100%
Qwen 3.5 122B	100%
Qwen 3.5 Plus (2026-04-20)	100%

Price-Performance Score Distribution (Top 20)

Model	Score	Cost	Time
Gemini 2.5 Flash Lite	100%	$0.0003	1.5s
GPT-4.1 Nano	98%	$0.0003	3.5s
Mistral Small 4	99%	$0.0004	2.9s
Stealth: Healer Alpha	100%	$0.0000	6.9s
Mistral Small 3.2 24B	100%	$0.0002	4.6s
Inception Mercury	100%	$0.0004	4.0s
Grok 4 Fast	100%	$0.0005	3.8s
Gemini 3.1 Flash Lite (Preview)	100%	$0.0009	1.7s
Gemini 3.1 Flash Lite	100%	$0.0009	10.3s
Gemini 3.1 Flash Lite (Reasoning)	100%	$0.0009	1.8s
ByteDance Seed 1.6 Flash	99%	$0.0003	5.3s
DeepSeek V4 Flash	100%	$0.0002	8.1s
Inception Mercury 2	100%	$0.0010	1.4s
Grok 4.1 Fast	100%	$0.0006	5.2s
Llama 3.1 70B	100%	$0.0005	17.8s
Gemma 3 12B	100%	$0.0001	8.6s
Gemini 2.5 Flash	100%	$0.0014	2.1s
Gemini 2.5 Flash Lite (Reasoning)	100%	$0.0009	7.3s
Gemma 4 26B	100%	$0.0002	21.8s
GPT-4.1 Mini	100%	$0.0010	5.8s

Top Overall Models (Top 20)

Model	Score	Cost	Speed	Stability
Gemini 2.5 Flash Lite	100%	$0.0003	1.5s	100%
Inception Mercury 2	100%	$0.0010	1.4s	100%
Gemini 3.1 Flash Lite (Preview)	100%	$0.0009	1.7s	100%
Gemini 3.1 Flash Lite (Reasoning)	100%	$0.0009	1.8s	100%
Gemini 2.5 Flash	100%	$0.0014	2.1s	100%
Grok 4 Fast	100%	$0.0005	3.8s	100%
Inception Mercury	100%	$0.0004	4.0s	100%
Mistral Small 3.2 24B	100%	$0.0002	4.6s	100%
Grok 4.1 Fast	100%	$0.0006	5.2s	100%
Stealth: Healer Alpha	100%	$0.0000	6.9s	100%
GPT-5.4 Mini	100%	$0.0027	2.2s	100%
Grok 4.3	100%	$0.0019	4.2s	100%
Grok 4.20	100%	$0.0020	4.1s	100%
GPT-4.1 Mini	100%	$0.0010	5.8s	100%
Mistral Small 4	99%	$0.0004	2.9s	96%
DeepSeek V4 Flash	100%	$0.0002	8.1s	100%
Gemma 3 12B	100%	$0.0001	8.6s	100%
Gemini 2.5 Flash Lite (Reasoning)	100%	$0.0009	7.3s	100%
Claude Haiku 4.5	100%	$0.0034	2.8s	100%
ByteDance Seed 1.6 Flash	99%	$0.0003	5.3s	97%

Median	Evaluator	Top 3	Flop 3
100.0%	Location replacement accuracy	100 Qwen 3.6 35B; 100 GPT-5.1; 100 DeepSeek V3 (2024-12-26)	0 Claude 3 Haiku; 25 LFM2 24B; 30 Ministral 8B
100.0%	No remaining old location names	100 Z.AI GLM 4.5; 100 DeepSeek V3 (2024-12-26); 100 GPT-4o, May 13th (temp=1)	0 Claude 3 Haiku; 0 LFM2 24B; 86 Rocinante 12B
100.0%	Non-location text preserved	100 DeepSeek-V2 Chat; 100 Stealth: Hunter Alpha; 100 Mistral Small Creative	0 Claude 3 Haiku; 14 LFM2 24B; 45 Rocinante 12B
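The three evaluators listed above could be approximated as follows. This is a rough sketch under stated assumptions, not the site's implementation: the old location names come from the task title, but the new names in `RENAMES`, the function names, and the use of `difflib` for the preservation check are all invented for illustration.

```python
import difflib
import re

# Old names are from the task title; the new names are hypothetical.
RENAMES = {
    "market square": "Lantern Court",
    "outer ring": "Stone Belt",
    "bridge": "Causeway",
    "northern mines": "Frost Pits",
}

def _count(text: str, phrase: str) -> int:
    """Count whole-phrase, case-insensitive occurrences."""
    return len(re.findall(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE))

def replacement_accuracy(source: str, output: str) -> float:
    """Percent of old-name occurrences in the source that have a new-name occurrence in the output."""
    expected = hit = 0
    for old, new in RENAMES.items():
        n = _count(source, old)
        expected += n
        hit += min(n, _count(output, new))
    return 100.0 * hit / expected if expected else 100.0

def no_old_names(output: str) -> float:
    """Percent of old names that no longer appear in the output."""
    clean = sum(1 for old in RENAMES if _count(output, old) == 0)
    return 100.0 * clean / len(RENAMES)

def text_preserved(source: str, output: str) -> float:
    """Mask every location name (old or new) and compare what is left."""
    names = list(RENAMES) + list(RENAMES.values())
    def mask(text: str) -> str:
        for name in names:
            text = re.sub(rf"\b{re.escape(name)}\b", "<LOC>", text, flags=re.IGNORECASE)
        return text
    return 100.0 * difflib.SequenceMatcher(None, mask(source), mask(output)).ratio()

source = "Meet me at the market square, then cross the bridge to the northern mines."
good = "Meet me at the Lantern Court, then cross the Causeway to the Frost Pits."
partial = "Meet me at the Lantern Court, then cross the bridge to the mines."

print(replacement_accuracy(source, good), no_old_names(good), text_preserved(source, good))
# 100.0 100.0 100.0
print(no_old_names(partial))  # 75.0
```

Keeping the three checks separate means a model can rename everything correctly and still lose points for rewording the surrounding text, which matches the different bottom-three scores per evaluator.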



Rank	Model	Avg. Cost	Avg. Time	Stability	# 1	# 2	# 3	# 4	# 5	# 6	# 7	Total
141	Qwen3.7 Max	$0.027	44.2s	100%	100	100	100	100	100	100	100	100%
107	Claude Opus 4.6 (Reasoning)	$0.019	6.7s	100%	100	100	100	100	100	100	100	100%
147	Qwen3.6 Max Preview	$0.022	1.1m	100%	100	100	100	100	100	100	100	100%
134	Gemini 3.1 Pro (Preview)	$0.027	25.2s	100%	100	100	100	100	100	100	100	100%
133	Z.AI GLM 5.1	$0.0081	59.5s	100%	100	100	100	100	100	100	100	100%
80	Z.AI GLM 5 Turbo	$0.0065	15.2s	100%	100	100	100	100	100	100	100	100%
108	Gemini 3.5 Flash (Reasoning)	$0.019	7.9s	100%	100	100	100	100	100	100	100	100%
89	Claude Sonnet 4.6 (Reasoning)	$0.013	6.4s	100%	100	100	100	100	100	100	100	100%
91	Grok 4.3 (Reasoning)	$0.0056	21.4s	100%	100	100	100	100	100	100	100	100%
85	GPT-5.4 (Reasoning)	$0.011	8.4s	100%	100	100	100	100	100	100	100	100%
92	GPT-5 Mini	$0.0036	26.3s	100%	100	100	100	100	100	100	100	100%
101	GPT-5.5 (Reasoning, Low)	$0.018	4.6s	100%	100	100	100	100	100	100	100	100%
76	GPT-5.1	$0.0087	9.2s	100%	100	100	100	100	100	100	100	100%
98	Claude Opus 4.6	$0.017	5.6s	100%	100	100	100	100	100	100	100	100%
142	MoonshotAI: Kimi K2.6	$0.011	1.3m	100%	100	100	100	100	100	100	100	100%
130	GPT-5	$0.021	32.5s	100%	100	100	100	100	100	100	100	100%
145	Qwen 3.5 397B A17B	$0.0083	1.4m	100%	100	100	100	100	100	100	100	100%
120	Gemma 4 31B (Reasoning)	$0.0006	53.5s	100%	100	100	100	100	100	100	100	100%
103	Qwen 3.5 122B	$0.0097	20.4s	100%	100	100	100	100	100	100	100	100%
137	Qwen 3.5 Plus (2026-04-20)	$0.0090	1.0m	100%	100	100	100	100	100	100	100	100%
138	Gemma 4 26B (Reasoning)	$0.0006	1.3m	100%	100	100	100	100	100	100	100	100%
86	Grok 4.20 (Beta, Reasoning)	$0.012	6.5s	100%	100	100	100	100	100	100	100	100%
67	GPT-5.4 (Reasoning, Low)	$0.0092	5.5s	100%	100	100	100	100	100	100	100	100%
79	Grok 4.20 (Reasoning)	$0.0054	16.9s	100%	100	100	100	100	100	100	100	100%
96	Z.AI GLM 5	$0.0052	26.5s	100%	100	100	100	100	100	100	100	100%
136	MoonshotAI: Kimi K2.5	$0.0051	1.1m	100%	100	100	100	100	100	100	100	100%
123	Qwen 3.5 27B	$0.0095	41.6s	100%	100	100	100	100	100	100	100	100%
104	ByteDance Seed 1.6	$0.0030	32.7s	100%	100	100	100	100	100	100	100	100%
69	Qwen 3.6 Flash	$0.0048	13.7s	100%	100	100	100	100	100	100	100	100%
24	GPT-5.4 Mini (Reasoning)	$0.0034	3.7s	100%	100	100	100	100	100	100	100	100%
63	Gemini 3 Flash (Preview, Reasoning)	$0.0063	10.0s	100%	100	100	100	100	100	100	100	100%
106	o4 Mini High	$0.012	19.3s	100%	100	100	100	100	100	100	100	100%
59	GPT-5.2	$0.0080	5.4s	100%	100	100	100	100	100	100	100	100%
110	DeepSeek V4 Pro (Reasoning)	$0.0033	39.4s	100%	100	100	100	100	100	100	100	100%
114	Claude Opus 4.7	$0.024	4.3s	100%	100	100	100	100	100	100	100	100%
124	Qwen 3.6 27B	$0.010	42.0s	100%	100	100	100	100	100	100	100	100%
97	Claude Opus 4.5	$0.017	5.4s	100%	100	100	100	100	100	100	100	100%
9	Grok 4.1 Fast	$0.0006	5.2s	100%	100	100	100	100	100	100	100	100%
105	Aion 2.0	$0.0029	33.2s	100%	100	100	100	100	100	100	100	100%
78	Z.AI GLM 4.6	$0.0029	21.3s	100%	100	100	100	100	100	100	100	100%
102	GPT-5.5	$0.018	4.6s	100%	100	100	100	100	100	100	100	100%
77	Qwen 3.6 35B	$0.0036	19.2s	100%	100	100	100	100	100	100	100	100%
121	DeepSeek V4 Flash (Reasoning)	$0.0003	54.7s	100%	100	100	100	100	100	100	100	100%
132	Gemini 3 Pro (Preview)	$0.029	20.5s	100%	100	100	100	100	100	100	100	100%
75	Claude Sonnet 4	$0.010	5.7s	100%	100	100	100	100	100	100	100	100%
128	MiniMax M2.5	$0.0012	1.0m	100%	100	100	100	100	100	100	100	100%
30	GPT-4.1	$0.0052	4.1s	100%	100	100	100	100	100	100	100	100%
127	Gemini 2.5 Pro	$0.024	17.6s	100%	100	100	100	100	100	100	100	100%
93	o4 Mini	$0.0099	15.0s	100%	100	100	100	100	100	100	100	100%
122	Grok 4	$0.020	22.4s	100%	100	100	100	100	100	100	100	100%
73	Claude Sonnet 4.5	$0.010	4.6s	100%	100	100	100	100	100	100	100	100%
116	Qwen 3.5 35B	$0.0098	33.7s	100%	100	100	100	100	100	100	100	100%
144	Claude Opus 4	$0.050	7.9s	100%	100	100	100	100	100	100	100	100%
43	Xiaomi MIMO v2.5 Pro	$0.0025	10.9s	100%	100	100	100	100	100	100	100	100%
47	Stealth: Hunter Alpha	$0.0000	16.3s	100%	100	100	100	100	100	100	100	100%
84	Gemma 4 31B	$0.0003	27.7s	100%	100	100	100	100	100	100	100	100%
38	Gemini 2.5 Flash (Reasoning)	$0.0044	6.7s	100%	100	100	100	100	100	100	100	100%
88	GPT-OSS 120B	$0.0005	28.9s	100%	100	100	100	100	100	100	100	100%
4	Gemini 3.1 Flash Lite (Reasoning)	$0.0009	1.8s	100%	100	100	100	100	100	100	100	100%
117	Qwen 3.5 Flash	$0.0019	48.2s	100%	100	100	100	100	100	100	100	100%
65	Z.AI GLM 4.5	$0.0023	18.1s	100%	100	100	100	100	100	100	100	100%
6	Grok 4 Fast	$0.0005	3.8s	100%	100	100	100	100	100	100	100	100%
143	Qwen 3.5 9B	$0.0011	1.6m	100%	100	100	100	100	100	100	100	100%
22	Qwen 3.5 Plus (2026-02-15)	$0.0015	6.8s	100%	100	100	100	100	100	100	100	100%
10	Stealth: Healer Alpha	$0.0000	6.9s	100%	100	100	100	100	100	100	100	100%
3	Gemini 3.1 Flash Lite (Preview)	$0.0009	1.7s	100%	100	100	100	100	100	100	100	100%
66	Gemma 4 26B	$0.0002	21.8s	100%	100	100	100	100	100	100	100	100%
27	Gemini 3.1 Flash Lite	$0.0009	10.3s	100%	100	100	100	100	100	100	100	100%
23	GPT-5.4 Mini (Reasoning, Low)	$0.0030	4.2s	100%	100	100	100	100	100	100	100	100%
18	Gemini 2.5 Flash Lite (Reasoning)	$0.0009	7.3s	100%	100	100	100	100	100	100	100	100%
21	Mistral Large 3	$0.0010	7.3s	100%	100	100	100	100	100	100	100	100%
68	GPT-4o, May 13th (temp=0)	$0.010	3.1s	100%	100	100	100	100	100	100	100	100%
19	Claude Haiku 4.5	$0.0034	2.8s	100%	100	100	100	100	100	100	100	100%
34	Xiaomi MIMO v2.5	$0.0024	9.9s	100%	100	100	100	100	100	100	100	100%
99	ByteDance Seed 2.0 Lite	$0.0029	31.4s	100%	100	100	100	100	100	100	100	100%
87	Nemotron 3 Super	$0.0000	29.8s	100%	100	100	100	100	100	100	100	100%
64	GPT-5.4	$0.0090	5.6s	100%	100	100	100	100	100	100	100	100%
111	Claude 3.5 Sonnet	$0.020	9.6s	100%	100	100	100	100	100	100	100	100%
2	Inception Mercury 2	$0.0010	1.4s	100%	100	100	100	100	100	100	100	100%
44	DeepSeek V3 (2024-12-26)	$0.0007	14.5s	100%	100	100	100	100	100	100	100	100%
74	Claude 3.7 Sonnet	$0.010	5.7s	100%	100	100	100	100	100	100	100	100%
14	GPT-4.1 Mini	$0.0010	5.8s	100%	100	100	100	100	100	100	100	100%
56	Z.AI GLM 4.5 Air	$0.0009	17.2s	100%	100	100	100	100	100	100	100	100%
51	DeepSeek V4 Pro	$0.0010	16.0s	100%	100	100	100	100	100	100	100	100%
37	GPT-4o, Aug. 6th (temp=1)	$0.0065	2.8s	100%	100	100	100	100	100	100	100	100%
95	GPT-5 Nano	$0.0013	31.2s	100%	100	100	100	100	100	100	100	100%
42	GPT-4o, Aug. 6th (temp=0)	$0.0065	3.3s	100%	100	100	100	100	100	100	100	100%
11	GPT-5.4 Mini	$0.0027	2.2s	100%	100	100	100	100	100	100	100	100%
39	Mistral Large 2	$0.0042	7.2s	100%	100	100	100	100	100	100	100	100%
125	DeepSeek V3.2	$0.0004	1.0m	100%	100	100	100	100	100	100	100	100%
16	DeepSeek V4 Flash	$0.0002	8.1s	100%	100	100	100	100	100	100	100	100%
13	Grok 4.20	$0.0020	4.1s	100%	100	100	100	100	100	100	100	100%
1	Gemini 2.5 Flash Lite	$0.0003	1.5s	100%	100	100	100	100	100	100	100	100%
5	Gemini 2.5 Flash	$0.0014	2.1s	100%	100	100	100	100	100	100	100	100%
40	Mistral Large	$0.0042	7.2s	100%	100	100	100	100	100	100	100	100%
57	Qwen3 235B A22B Instruct 2507	$0.0003	18.4s	100%	100	100	100	100	100	100	100	100%
41	Writer: Palmyra X5	$0.0034	8.9s	100%	100	100	100	100	100	100	100	100%
7	Inception Mercury	$0.0004	4.0s	100%	100	100	100	100	100	100	100	100%
12	Grok 4.3	$0.0019	4.2s	100%	100	100	100	100	100	100	100	100%
8	Mistral Small 3.2 24B	$0.0002	4.6s	100%	100	100	100	100	100	100	100	100%
17	Gemma 3 12B	$0.0001	8.6s	100%	100	100	100	100	100	100	100	100%
55	Llama 3.1 70B	$0.0005	17.8s	100%	100	100	100	100	100	100	100	100%
48	Llama 3.1 Nemotron 70B	$0.0014	13.9s	100%	100	100	100	100	100	100	100	100%
100	WizardLM 2 8x22b	$0.0007	36.2s	100%	100	100	100	100	100	100	100	100%
115	GPT-5.5 (Reasoning)	$0.021	7.5s	97%	100	100	100	100	100	100	96	99%
58	MiniMax M2.7	$0.0014	14.0s	97%	100	100	100	100	100	100	96	99%
146	Z.AI GLM 4.7	$0.0050	1.5m	97%	100	100	100	100	100	100	96	99%
45	Gemini 3.5 Flash (Reasoning, Minimal)	$0.0054	2.6s	97%	100	100	100	100	100	100	96	99%
54	DeepSeek-V2 Chat	$0.0007	14.2s	97%	100	100	100	100	100	100	96	99%
25	Grok 4.20 (Beta)	$0.0032	1.7s	97%	100	100	100	100	100	100	96	99%
109	DeepSeek V3.1	$0.0006	39.4s	97%	100	100	100	100	100	100	96	99%
20	ByteDance Seed 1.6 Flash	$0.0003	5.3s	97%	100	100	100	100	100	100	96	99%
81	GPT-4o, May 13th (temp=1)	$0.010	3.0s	95%	100	100	100	100	100	100	93	99%
53	Llama 3.1 8B	$0.0000	11.9s	94%	100	100	100	100	100	100	92	99%
82	Claude Sonnet 4.6	$0.010	4.5s	96%	100	100	100	100	100	96	96	99%
112	Z.AI GLM 4.7 Flash	$0.0009	40.7s	96%	100	100	100	100	100	96	96	99%
83	Hermes 3 405B	$0.0011	21.4s	96%	100	100	100	100	100	96	96	99%
29	Mistral Small 4 (Reasoning)	$0.0008	7.0s	96%	100	100	100	100	100	96	96	99%
72	Qwen 3 32B	$0.0005	17.2s	96%	100	100	100	100	100	96	96	99%
118	Nemotron 3 Nano	$0.0008	45.9s	96%	100	100	100	100	100	96	96	99%
15	Mistral Small 4	$0.0004	2.9s	96%	100	100	100	100	100	96	96	99%
119	Claude Opus 4.7 (Reasoning)	$0.024	4.5s	96%	100	100	100	100	96	96	96	98%
131	ByteDance Seed 2.0 Mini	$0.0011	1.1m	96%	100	100	100	100	96	96	96	98%
52	Gemma 3 27B	$0.0002	12.0s	96%	100	100	100	100	96	96	96	98%
90	Arcee AI: Trinity Large (Preview)	$0.0000	25.2s	96%	100	100	100	100	96	96	96	98%
70	Hermes 3 70B	$0.0003	16.5s	96%	100	100	100	100	96	96	96	98%
32	GPT-5.4 Nano (Reasoning, Low)	$0.0007	3.1s	92%	100	100	100	96	96	96	96	98%
71	Cydonia 24B V4.1	$0.0004	12.1s	92%	100	100	100	96	96	96	96	98%
33	GPT-4.1 Nano	$0.0003	3.5s	91%	100	100	100	100	100	96	88	98%
49	Gemini 3 Flash (Preview)	$0.0018	3.3s	92%	100	100	96	96	96	96	96	97%
62	GPT-4o Mini (temp=1)	$0.0004	9.9s	92%	100	100	96	96	96	96	96	97%
60	Qwen 2.5 72B	$0.0003	9.9s	92%	100	100	96	96	96	96	96	97%
35	GPT-5.4 Nano	$0.0007	3.1s	92%	100	100	96	96	96	96	96	97%
31	GPT-5.4 Nano (Reasoning)	$0.0007	2.7s	93%	100	96	96	96	96	96	96	96%
50	GPT-4o Mini (temp=0)	$0.0004	8.4s	96%	96	96	96	96	96	96	96	96%
36	Mistral Medium 3.1	$0.0012	4.6s	96%	96	96	96	96	96	96	96	96%
26	Mistral Small Creative	$0.0002	2.9s	96%	96	96	96	96	96	96	96	96%
28	Ministral 3 14B	$0.0002	4.4s	96%	96	96	96	96	96	96	96	96%
46	Gemma 3 4B	$0.0001	8.4s	96%	96	96	96	96	96	96	96	96%
94	Skyfall 36B V2	$0.0007	9.1s	82%	96	96	96	96	96	96	76	93%
61	Arcee AI: Trinity Mini	$0.0002	5.0s	91%	93	93	93	93	91	91	91	92%
113	DeepSeek V3 (2025-03-24)	$0.0006	12.2s	71%	100	100	100	100	95	93	58	92%
129	Ministral 3B	$0.0000	1.8s	54%	96	76	71	70	67	61	61	72%
126	Ministral 3 3B	$0.0001	2.1s	66%	78	71	71	71	67	67	67	71%
140	Mistral NeMO	$0.0002	3.0s	47%	89	67	63	60	60	56	54	64%
149	Cohere Command R+ (Aug. 2024)	$0.0067	38.4s	38%	92	79	71	58	49	47	42	63%
135	Ministral 3 8B	$0.0002	3.0s	60%	65	63	63	63	63	63	58	62%
139	Ministral 8B	$0.0001	3.6s	54%	65	65	63	63	63	56	45	60%
148	Rocinante 12B	$0.0003	8.5s	18%	96	83	74	45	42	42	2	55%
150	LFM2 24B	$0.0001	13.5s	13%	13	13	13	13	13	13	13	13%
151	Claude 3 Haiku	$0.0009	4.5s	0%	0	0	0	0	0	0	0	0%
96.48%
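
The page does not say how the Total column is derived. For the rows spot-checked below, it is consistent with the rounded mean of columns # 1 to # 7, which appear to be individual scores sorted from highest to lowest. The sketch below only reproduces that observation; the helper name `total_from_runs` is hypothetical and the formula is an inference from the table, not documented behavior.

```python
from statistics import mean

def total_from_runs(runs: list[int]) -> int:
    """Rounded mean of the per-column scores (assumed formula)."""
    return round(mean(runs))

# (scores from columns # 1..# 7, Total as listed in the table)
rows = {
    "Ministral 3B": ([96, 76, 71, 70, 67, 61, 61], 72),
    "Mistral NeMO": ([89, 67, 63, 60, 60, 56, 54], 64),
    "Rocinante 12B": ([96, 83, 74, 45, 42, 42, 2], 55),
}

for model, (runs, listed) in rows.items():
    print(f"{model}: computed {total_from_runs(runs)}%, listed {listed}%")
# Ministral 3B: computed 72%, listed 72%
# Mistral NeMO: computed 64%, listed 64%
# Rocinante 12B: computed 55%, listed 55%
```

Because the displayed column values are themselves rounded, a few rows can come out one point off under this formula (for example, GPT-5.4 Nano (Reasoning) computes to 97% against a listed 96%).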