Tense rewriting: past to present

Text Replacement

Tests deterministic text transformations: renaming characters or locations, expanding contractions, rewriting tense, shifting POV, swapping genders, combining several transformations, and avoiding specific words. Scoring checks each expected change independently.
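As a rough illustration of what checking each expected change independently could look like, here is a minimal Python sketch. It is not the benchmark's actual scorer: the `ExpectedChange` type, the `score_output` function, the substring matching rule, and the sample sentences are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedChange:
    """Hypothetical record of one expected edit (not from the benchmark)."""
    old: str  # past-tense phrase that should no longer appear
    new: str  # present-tense phrase that should appear instead


def score_output(output: str, changes: list[ExpectedChange]) -> float:
    """Return the percentage of expected changes found in the output.

    Each change is judged on its own: it passes only if the new phrase
    is present and the old phrase is gone (case-insensitive substring
    match, which is an assumption of this sketch).
    """
    if not changes:
        return 100.0
    text = output.lower()
    passed = sum(
        1
        for change in changes
        if change.new.lower() in text and change.old.lower() not in text
    )
    return 100.0 * passed / len(changes)


# Invented sample data: two of the three expected changes were applied.
changes = [
    ExpectedChange("walked into", "walks into"),
    ExpectedChange("was cold", "is cold"),
    ExpectedChange("she said", "she says"),
]
output = 'Mara walks into the hall. The air is cold. "Stay," she said.'

print(round(score_output(output, changes)))  # 67
```

Because every change is scored separately, one missed verb lowers the score by a fixed share rather than failing the whole sample, which would fit the finely graded percentages in the tables below.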

Text Editing

Performance Score Distribution (Top 20)


Model	Score
Claude Sonnet 4.5	100%
Claude Opus 4.6 (Reasoning)	100%
Qwen3 235B A22B Instruct 2507	100%
Claude Sonnet 4	100%
DeepSeek V4 Pro	100%
Writer: Palmyra X5	100%
Claude Opus 4.8 (Reasoning, Low)	99%
Claude Opus 4.5	99%
GPT-5.6 Terra	99%
Qwen 3.5 Plus (2026-02-15)	99%
Z.AI GLM 5.2 (Reasoning, High)	99%
Claude Opus 4.8 (Reasoning)	99%
Claude Opus 4.6	99%
Gemma 4 31B (Reasoning)	99%
Gemma 4 26B	99%
Mistral Large 3	99%
Mistral Large 2	99%
Mistral Small 3.2 24B	99%
Grok 4.5 (Reasoning, High)	99%
Gemini 3.5 Flash (Reasoning)	99%

Price-Performance Score Distribution (Top 20)

Model	Score	Cost	Time
Qwen3 235B A22B Instruct 2507	100%	$0.0003	10.6s
Gemini 2.5 Flash Lite	98%	$0.0003	1.5s
Mistral Small 3.2 24B	99%	$0.0002	5.9s
Mistral Small 4	99%	$0.0004	2.8s
Ministral 3 14B	99%	$0.0002	3.8s
Mistral NeMO	98%	$0.0002	3.0s
DeepSeek V4 Flash	98%	$0.0002	6.8s
Ministral 3 8B	98%	$0.0002	3.7s
Gemma 4 26B	99%	$0.0002	12.9s
Ministral 8B	97%	$0.0001	3.6s
Qwen 2.5 72B	99%	$0.0003	10.4s
Mistral Large 3	99%	$0.0010	7.4s
DeepSeek V4 Pro	100%	$0.0009	11.9s
Qwen 3.5 Plus (2026-02-15)	99%	$0.0015	6.6s
Cydonia 24B V4.1	98%	$0.0004	11.2s
Xiaomi MIMO v2.5	99%	$0.0021	9.3s
Gemini 3.1 Flash Lite (Reasoning)	96%	$0.0009	1.7s
Gemini 3.1 Flash Lite	96%	$0.0009	1.8s
Gemini 3.1 Flash Lite (Preview)	96%	$0.0009	1.7s
GPT-5.4 Nano (Reasoning)	95%	$0.0008	3.6s

Top Overall Models (Top 20)

Model	Score	Cost	Time	Stability
Mistral Small 3.2 24B	99%	$0.0002	5.9s	99%
Qwen3 235B A22B Instruct 2507	100%	$0.0003	10.6s	99%
Mistral Large 3	99%	$0.0010	7.4s	99%
Ministral 3 14B	99%	$0.0002	3.8s	99%
Gemma 4 26B	99%	$0.0002	12.9s	99%
Qwen 3.5 Plus (2026-02-15)	99%	$0.0015	6.6s	99%
DeepSeek V4 Pro	100%	$0.0009	11.9s	99%
Ministral 3 8B	98%	$0.0002	3.7s	98%
Z.AI GLM 5.2 (Reasoning, High)	99%	$0.0027	9.6s	99%
Mistral Small 4	99%	$0.0004	2.8s	97%
Writer: Palmyra X5	100%	$0.0033	9.2s	99%
Mistral Large 2	99%	$0.0041	7.2s	99%
Mistral NeMO	98%	$0.0002	3.0s	97%
Qwen 2.5 72B	99%	$0.0003	10.4s	96%
Xiaomi MIMO v2.5	99%	$0.0021	9.3s	97%
Gemini 2.5 Flash Lite	98%	$0.0003	1.5s	96%
DeepSeek V4 Flash	98%	$0.0002	6.8s	96%
Cydonia 24B V4.1	98%	$0.0004	11.2s	94%
GPT-5.6 Terra	99%	$0.0088	2.9s	99%
Claude Sonnet 4.5	100%	$0.0099	4.5s	100%

Rank	Model	Avg. Cost	Avg. Time	Stability	# 1	# 2	# 3	# 4	# 5	# 6	# 7	Total
20	Claude Sonnet 4.5	$0.0099	4.5s	100%	100	100	100	100	100	100	99	100%
42	Claude Opus 4.6 (Reasoning)	$0.017	5.9s	99%	100	100	100	100	100	99	99	100%
2	Qwen3 235B A22B Instruct 2507	$0.0003	10.6s	99%	100	100	100	100	100	99	99	100%
24	Claude Sonnet 4	$0.0099	5.7s	99%	100	100	100	100	99	99	99	100%
7	DeepSeek V4 Pro	$0.0009	11.9s	99%	100	100	100	99	99	99	99	100%
11	Writer: Palmyra X5	$0.0033	9.2s	99%	100	100	99	99	99	99	99	100%
58	Claude Opus 4.8 (Reasoning, Low)	$0.023	7.5s	99%	100	99	99	99	99	99	99	99%
45	Claude Opus 4.5	$0.017	5.1s	99%	100	99	99	99	99	99	99	99%
19	GPT-5.6 Terra	$0.0088	2.9s	99%	100	99	99	99	99	99	99	99%
6	Qwen 3.5 Plus (2026-02-15)	$0.0015	6.6s	99%	100	99	99	99	99	99	99	99%
9	Z.AI GLM 5.2 (Reasoning, High)	$0.0027	9.6s	99%	99	99	99	99	99	99	99	99%
56	Claude Opus 4.8 (Reasoning)	$0.023	7.2s	99%	99	99	99	99	99	99	99	99%
43	Claude Opus 4.6	$0.017	5.7s	99%	99	99	99	99	99	99	99	99%
91	Gemma 4 31B (Reasoning)	$0.0012	2.8m	99%	99	99	99	99	99	99	99	99%
5	Gemma 4 26B	$0.0002	12.9s	99%	99	99	99	99	99	99	99	99%
3	Mistral Large 3	$0.0010	7.4s	99%	99	99	99	99	99	99	99	99%
12	Mistral Large 2	$0.0041	7.2s	99%	99	99	99	99	99	99	99	99%
1	Mistral Small 3.2 24B	$0.0002	5.9s	99%	99	99	99	99	99	99	99	99%
76	Grok 4.5 (Reasoning, High)	$0.021	37.5s	97%	100	100	100	100	100	99	96	99%
83	Gemini 3.5 Flash (Reasoning)	$0.030	13.1s	99%	99	99	99	99	99	99	99	99%
47	Gemma 4 26B (Reasoning)	$0.0017	1.3m	99%	99	99	99	99	99	99	99	99%
15	Xiaomi MIMO v2.5	$0.0021	9.3s	97%	100	100	100	99	99	99	97	99%
35	Claude Sonnet 4.6 (Reasoning)	$0.011	5.4s	97%	100	100	100	100	99	97	97	99%
33	Claude Sonnet 5 (Reasoning, Low)	$0.0094	11.3s	98%	99	99	99	99	99	99	99	99%
133	Gemini 3.1 Pro (Preview)	$0.059	56.1s	99%	99	99	99	99	99	99	99	99%
63	Claude Opus 4.7 (Reasoning)	$0.023	5.9s	99%	99	99	99	99	99	99	99	99%
32	Claude Sonnet 5 (Reasoning)	$0.0094	9.0s	99%	99	99	99	99	99	99	99	99%
61	Claude Opus 4.7	$0.023	4.6s	99%	99	99	99	99	99	99	99	99%
29	Claude Sonnet 5	$0.0092	8.6s	99%	99	99	99	99	99	99	99	99%
4	Ministral 3 14B	$0.0002	3.8s	99%	99	99	99	99	99	99	99	99%
14	Qwen 2.5 72B	$0.0003	10.4s	96%	100	99	99	99	99	97	96	99%
10	Mistral Small 4	$0.0004	2.8s	97%	100	99	99	99	99	99	96	99%
18	Cydonia 24B V4.1	$0.0004	11.2s	94%	99	99	99	99	99	99	93	98%
131	MoonshotAI: Kimi K2.6	$0.032	2.1m	95%	100	100	100	99	97	96	95	98%
31	Xiaomi MIMO v2.5 Pro	$0.0032	15.5s	95%	100	99	99	99	99	97	93	98%
17	DeepSeek V4 Flash	$0.0002	6.8s	96%	100	99	99	99	97	96	96	98%
8	Ministral 3 8B	$0.0002	3.7s	98%	98	98	98	98	98	98	98	98%
53	Z.AI GLM 5	$0.0064	49.9s	96%	100	99	99	99	97	96	96	98%
122	Qwen3.7 Max	$0.036	1.2m	96%	99	99	99	99	96	96	96	98%
139	Qwen3.6 Max Preview	$0.045	2.5m	96%	99	99	99	99	96	96	96	98%
39	Gemma 4 31B	$0.0003	50.7s	96%	99	99	99	99	96	96	96	98%
13	Mistral NeMO	$0.0002	3.0s	97%	99	99	98	98	98	97	96	98%
16	Gemini 2.5 Flash Lite	$0.0003	1.5s	96%	99	99	99	99	96	96	96	98%
64	Gemini 3 Flash (Preview, Reasoning)	$0.014	24.3s	94%	99	99	99	99	99	96	92	98%
92	Z.AI GLM 5.1	$0.015	1.1m	94%	99	99	99	99	96	96	92	97%
138	Z.AI GLM 4.7	$0.012	4.7m	92%	100	100	100	99	97	97	89	97%
38	MiniMax M2.7	$0.0022	22.5s	93%	100	99	99	99	95	95	92	97%
25	Ministral 8B	$0.0001	3.6s	93%	99	98	98	98	97	97	91	97%
52	DeepSeek V4 Flash (Reasoning)	$0.0005	23.3s	87%	100	100	99	99	99	99	81	97%
94	Qwen 3.5 27B	$0.014	1.1m	94%	99	97	97	96	96	96	96	97%
46	Claude Sonnet 4.6	$0.0099	4.4s	97%	97	97	97	97	97	97	97	97%
34	GPT-4o, Aug. 6th (temp=0)	$0.0063	2.6s	97%	97	97	97	97	97	97	97	97%
48	Z.AI GLM 4.5 Air	$0.0015	36.1s	93%	100	97	97	97	97	97	93	97%
26	Claude Haiku 4.5	$0.0033	3.3s	96%	97	97	97	97	97	97	96	97%
70	Grok 4.20 (Reasoning)	$0.0091	30.9s	90%	100	100	99	96	96	92	92	97%
114	Gemini 2.5 Pro	$0.032	24.8s	94%	99	96	96	96	96	96	95	96%
68	GPT-5.5	$0.018	5.1s	96%	97	97	97	97	96	96	96	96%
41	GPT-4o, Aug. 6th (temp=1)	$0.0063	2.8s	93%	99	97	97	97	97	96	93	96%
37	Llama 3.1 70B	$0.0007	26.0s	95%	97	97	97	96	96	96	95	96%
111	Aion 3.0	$0.020	1.1m	91%	100	99	97	97	96	93	92	96%
27	Gemini 3 Flash (Preview)	$0.0018	3.2s	95%	97	97	97	96	96	95	95	96%
21	Gemini 3.1 Flash Lite (Reasoning)	$0.0009	1.7s	96%	96	96	96	96	96	96	96	96%
22	Gemini 3.1 Flash Lite (Preview)	$0.0009	1.7s	96%	96	96	96	96	96	96	96	96%
23	Gemini 3.1 Flash Lite	$0.0009	1.8s	96%	96	96	96	96	96	96	96	96%
126	Claude Opus 4	$0.050	7.7s	95%	97	96	96	96	96	96	95	96%
81	Grok 4.5 (Reasoning, Low)	$0.011	36.1s	90%	100	99	99	99	97	89	89	96%
124	Qwen 3.5 397B A17B	$0.011	2.7m	90%	99	99	96	96	96	96	89	96%
129	GPT-5	$0.036	59.1s	90%	100	100	100	100	93	89	89	96%
82	ByteDance Seed 2.0 Lite	$0.0047	52.6s	87%	100	100	99	93	93	93	93	96%
112	Qwen 3.5 122B	$0.022	49.6s	90%	99	99	99	96	92	92	92	96%
30	Grok 4.20	$0.0019	4.4s	95%	96	96	96	96	95	95	95	96%
28	Mistral Medium 3.1	$0.0012	6.5s	95%	96	96	96	96	95	95	95	96%
66	GPT-5.4 (Reasoning, Low)	$0.0090	4.9s	88%	100	99	99	96	96	91	88	96%
78	MiniMax M2.5	$0.0015	1.2m	89%	100	99	96	96	95	95	88	96%
115	DeepSeek V4 Pro (Reasoning)	$0.0058	1.9m	87%	100	100	100	99	97	90	82	95%
50	Mistral Small 4 (Reasoning)	$0.0014	12.6s	89%	100	99	97	96	93	93	90	95%
40	GPT-5.4 Mini (Reasoning, Low)	$0.0027	2.9s	93%	96	96	96	96	96	95	91	95%
36	GPT-5.4 Nano (Reasoning)	$0.0008	3.6s	92%	96	96	96	95	95	95	91	95%
72	Z.AI GLM 5 Turbo	$0.0087	19.4s	89%	100	99	96	96	93	93	89	95%
102	Grok 4.3 (Reasoning)	$0.011	1.2m	88%	100	100	100	99	89	88	88	95%
101	GPT-5.1	$0.016	19.9s	84%	100	100	100	93	93	89	89	95%
55	GPT-5.4 Mini (Reasoning)	$0.0039	5.2s	88%	100	99	97	96	92	91	88	95%
125	MoonshotAI: Kimi K2.5	$0.020	1.8m	89%	99	97	96	96	96	89	89	95%
44	Hermes 3 70B	$0.0003	13.9s	92%	97	96	96	96	93	93	92	95%
51	Gemini 3.5 Flash (Reasoning, Minimal)	$0.0053	2.5s	93%	95	95	95	95	95	92	92	94%
90	GPT-5.5 (Reasoning, Low)	$0.018	5.3s	91%	97	97	96	96	93	93	89	94%
80	Qwen 3.6 35B	$0.0083	37.0s	91%	97	96	96	96	93	92	90	94%
121	ByteDance Seed 2.0 Mini	$0.0024	2.6m	88%	98	97	97	93	93	93	89	94%
49	Inception Mercury 2	$0.0014	1.7s	90%	97	97	95	95	95	92	87	94%
87	Qwen 3.5 Flash	$0.0033	1.0m	90%	97	97	97	97	93	89	88	94%
62	GPT-5.4 Mini	$0.0026	2.1s	85%	99	99	97	96	96	88	82	94%
84	GPT-5.6 Terra (Reasoning)	$0.010	4.5s	86%	99	99	99	99	89	89	81	94%
69	GPT-OSS 120B	$0.0006	36.9s	89%	97	97	95	95	95	89	88	94%
108	Qwen 3.5 35B	$0.015	47.1s	87%	99	97	96	93	93	89	89	94%
54	Grok 4.3	$0.0018	4.8s	88%	97	96	96	95	93	93	85	94%
95	GPT-5.6 Sol	$0.018	3.7s	89%	97	97	97	97	89	89	89	93%
98	Z.AI GLM 4.6	$0.0063	32.5s	81%	100	100	99	92	89	86	86	93%
106	Aion 3.0 Mini	$0.0049	1.3m	85%	99	97	97	96	93	89	81	93%
135	MiniMax M3	$0.0089	3.4m	86%	100	97	93	93	89	89	89	93%
107	Aion 2.0	$0.0050	1.1m	82%	100	97	97	93	92	89	81	93%
67	DeepSeek-V2 Chat	$0.0007	21.4s	88%	96	96	96	95	89	88	88	93%
59	GPT-4.1 Nano	$0.0003	3.5s	86%	97	96	96	93	91	89	86	93%
57	GPT-4.1 Mini	$0.0010	5.9s	88%	95	95	92	92	92	92	88	92%
60	GPT-5.4 Nano (Reasoning, Low)	$0.0007	2.9s	86%	96	95	95	91	91	89	89	92%
119	o4 Mini High	$0.020	37.0s	87%	96	96	93	92	92	89	89	92%
65	Gemini 2.5 Flash	$0.0014	2.1s	86%	96	96	93	92	92	92	85	92%
100	GPT-5 Mini	$0.0069	45.4s	86%	97	95	93	91	90	89	89	92%
105	ByteDance Seed 1.6	$0.0063	1.2m	88%	97	93	93	93	92	89	88	92%
85	Z.AI GLM 4.5	$0.0036	28.8s	88%	97	93	93	93	89	89	89	92%
127	Qwen 3.5 Plus (2026-04-20)	$0.014	1.5m	83%	97	96	93	89	89	89	89	91%
132	Qwen 3.6 27B	$0.019	1.3m	79%	99	96	96	89	89	89	81	91%
96	GPT-5.4	$0.0088	6.3s	82%	100	93	91	89	89	89	89	91%
74	Hermes 3 405B	$0.0011	21.1s	88%	93	93	93	93	93	89	85	91%
116	o4 Mini	$0.014	23.0s	82%	97	97	92	91	88	88	82	91%
130	GPT-5.5 (Reasoning)	$0.033	13.6s	85%	93	93	93	93	93	89	81	91%
97	DeepSeek V3.2	$0.0003	39.6s	81%	96	96	93	89	88	87	85	91%
109	Z.AI GLM 4.7 Flash	$0.0016	1.3m	84%	95	95	92	91	91	85	84	90%
71	GPT-5.6 Luna	$0.0035	2.3s	89%	91	91	91	91	91	88	88	90%
73	ByteDance Seed 1.6 Flash	$0.0008	14.5s	88%	92	92	92	92	91	88	85	90%
89	GPT-4.1	$0.0051	4.2s	84%	97	93	89	89	89	89	88	90%
86	DeepSeek V3 (2024-12-26)	$0.0006	15.4s	83%	96	95	95	95	88	81	81	90%
117	Qwen 3 32B	$0.0012	52.4s	77%	96	96	96	93	89	88	70	90%
77	GPT-5.4 Nano	$0.0007	2.8s	85%	91	91	91	88	88	88	88	90%
110	Gemini 2.5 Flash (Reasoning)	$0.0084	14.0s	79%	96	96	93	89	89	81	81	89%
123	GPT-5.6 Sol (Reasoning)	$0.025	8.0s	83%	93	93	93	89	88	88	81	89%
79	GPT-4o Mini (temp=1)	$0.0004	9.7s	86%	93	89	89	89	89	89	89	89%
118	GPT-5.4 (Reasoning)	$0.012	10.8s	79%	99	92	89	88	88	88	81	89%
104	WizardLM 2 8x22b	$0.0007	35.0s	80%	96	95	92	89	85	85	82	89%
93	GPT-5.2	$0.0079	5.7s	89%	89	89	89	89	89	89	89	89%
75	GPT-4o Mini (temp=0)	$0.0004	8.6s	89%	89	89	89	89	89	89	89	89%
99	GPT-5.6 Luna (Reasoning)	$0.0051	6.0s	82%	93	93	89	89	89	88	81	89%
88	Gemma 3 12B	$0.0001	8.8s	84%	92	92	92	92	89	81	81	89%
103	GPT-5 Nano	$0.0018	38.0s	84%	93	89	89	89	89	85	82	88%
113	Gemma 3 27B	$0.0002	19.4s	75%	96	91	91	85	84	81	80	87%
120	Gemini 2.5 Flash Lite (Reasoning)	$0.0033	28.4s	76%	97	93	89	85	82	81	81	87%
128	Cohere Command R+ (Aug. 2024)	$0.0066	11.3s	66%	99	97	96	93	85	68	60	85%
136	Qwen 3.5 9B	$0.0014	2.0m	72%	96	89	88	87	81	81	67	84%
137	DeepSeek V3 (2025-03-24)	$0.0005	22.6s	46%	96	96	96	95	95	89	20	84%
143	Arcee AI: Trinity Mini	$0.0002	7.4s	31%	99	99	93	93	92	86	0	80%
145	Qwen 3.6 Flash	$0.013	35.0s	32%	96	96	93	91	90	89	0	79%
141	Nemotron 3 Super	$0.0000	31.9s	47%	89	89	89	89	89	88	22	79%
144	Nemotron 3 Nano	$0.0004	29.5s	45%	93	92	89	82	82	75	22	76%
134	Ministral 3 3B	$0.0001	2.1s	66%	85	79	79	75	72	71	66	75%
142	Ministral 3B	$0.0000	2.2s	53%	97	78	78	72	64	61	54	72%
146	DeepSeek V3.1	$0.0005	57.1s	36%	88	87	81	81	81	43	9	67%
140	Gemma 3 4B	$0.0001	6.0s	67%	67	67	67	67	67	67	67	67%
93.85%

Median	Evaluator	Top 3	Flop 3
90.0%	Dialogue content preserved	100 Claude Sonnet 4; 100 Qwen3 235B A22B Instruct 2507; 100 Claude Opus 4.8 (Reasoning)	49 DeepSeek V3.1; 61 Nemotron 3 Super; 61 Nemotron 3 Nano
100.0%	Setting descriptions preserved	100 GPT-5.1; 100 Gemini 2.5 Flash; 100 GPT-5.5 (Reasoning)	71 Arcee AI: Trinity Mini; 71 Ministral 3B; 77 Ministral 3 3B
97.8%	Tense transformation accuracy	100 GPT-4o, Aug. 6th (temp=0); 100 Claude Sonnet 4.6; 100 Claude Haiku 4.5	0 Gemma 3 4B; 54 Ministral 3 3B; 55 Ministral 3B
