Preservation

Subcategory of Text Editing. 154 models scored.

Model Leaderboard

All models ranked by their Preservation subcategory score, highest first.

#	Model	Preservation	Text Editing	Overall
1	Claude Opus 4.6	100.00%	98.35%	92.35%
2	Claude Opus 4.8 (Reasoning)	100.00%	98.78%	92.22%
3	Claude Opus 4.5	100.00%	97.69%	89.69%
4	Claude Opus 4.8 (Reasoning, Low)	99.87%	98.71%	92.14%
5	Claude Opus 4.7 (Reasoning)	99.85%	97.58%	93.23%
6	Claude Opus 4.7	99.85%	97.55%	89.93%
7	Claude Sonnet 4.6	99.82%	96.37%	91.15%
8	Gemma 4 31B	99.80%	98.56%	86.91%
9	Claude Sonnet 4	99.69%	99.13%	88.72%
10	Gemma 4 26B	99.69%	97.04%	85.84%
11	Claude Haiku 4.5	99.64%	96.81%	85.14%
12	Qwen 3.5 Plus (2026-02-15)	99.38%	98.10%	85.96%
13	Claude Sonnet 4.5	99.31%	99.02%	88.03%
14	Grok 4	99.23%	98.76%	88.12%
15	Claude Sonnet 4.6 (Reasoning)	99.06%	98.30%	93.66%
16	Gemini 3.1 Pro (Preview)	99.03%	98.51%	94.37%
17	Gemma 4 26B (Reasoning)	99.01%	98.26%	91.49%
18	Gemma 4 31B (Reasoning)	98.98%	98.83%	91.71%
19	GPT-5.5 (Reasoning, Low)	98.93%	98.59%	92.59%
20	Grok 4.20 (Reasoning)	98.90%	98.83%	91.39%
21	Gemini 3 Flash (Preview)	98.90%	97.54%	85.35%
22	Z.AI GLM 5.1	98.88%	98.90%	94.37%
23	MoonshotAI: Kimi K2.6	98.83%	98.00%	92.31%
24	Grok 4.20 (Beta, Reasoning)	98.80%	98.69%	91.49%
25	Gemini 2.5 Pro	98.80%	98.58%	88.53%
26	Qwen3.6 Max Preview	98.78%	98.58%	94.54%
27	Claude Opus 4.6 (Reasoning)	98.72%	98.86%	95.02%
28	Z.AI GLM 4.7	98.67%	98.22%	88.69%
29	Gemini 2.5 Flash	98.67%	97.83%	80.60%
30	Z.AI GLM 5	98.62%	98.59%	91.23%
31	DeepSeek V4 Pro	98.62%	97.98%	82.63%
32	DeepSeek V4 Pro (Reasoning)	98.60%	98.56%	90.10%
33	Grok 4 Fast	98.60%	97.26%	86.15%
34	GPT-5.5	98.59%	98.20%	89.09%
35	GPT-5.5 (Reasoning)	98.57%	98.79%	92.98%
36	GPT-5	98.57%	98.90%	91.93%
37	MiniMax M3	98.55%	97.51%	90.88%
38	Qwen 3.5 27B	98.55%	98.69%	90.85%
39	Gemini 3.5 Flash (Reasoning)	98.47%	97.78%	94.08%
40	Gemini 3 Pro (Preview)	98.47%	98.86%	88.79%
41	Claude Opus 4	98.34%	97.25%	87.69%
42	Z.AI GLM 4.6	98.31%	97.78%	89.11%
43	Qwen 3.5 122B	98.28%	96.31%	91.53%
44	Qwen3.7 Max	98.27%	98.08%	95.75%
45	DeepSeek V4 Flash (Reasoning)	98.26%	96.15%	89.01%
46	MoonshotAI: Kimi K2.5	98.19%	97.79%	91.04%
47	Gemini 3 Flash (Preview, Reasoning)	98.19%	98.12%	90.50%
48	GPT-5.1	98.18%	98.54%	92.54%
49	Gemini 3.5 Flash (Reasoning, Minimal)	98.17%	98.26%	86.47%
50	GPT-4o, May 13th (temp=0)	98.16%	95.35%	85.36%
51	Gemini 3.1 Flash Lite (Preview)	98.12%	96.46%	85.87%
52	Gemini 3.1 Flash Lite	98.11%	97.35%	85.75%
53	Xiaomi MIMO v2.5	98.04%	95.34%	85.05%
54	MiniMax M2.5	98.01%	96.02%	88.71%
55	Gemini 3.1 Flash Lite (Reasoning)	97.98%	96.90%	86.41%
56	Stealth: Healer Alpha	97.86%	96.04%	85.93%
57	Z.AI GLM 5 Turbo	97.83%	98.17%	94.27%
58	Claude 3.7 Sonnet	97.73%	97.12%	83.39%
59	Grok 4.3 (Reasoning)	97.63%	97.64%	93.60%
60	GPT-4.1 Mini	97.60%	95.62%	83.20%
61	ByteDance Seed 2.0 Lite	97.58%	95.03%	84.80%
62	ByteDance Seed 1.6	97.58%	98.40%	90.70%
63	GPT-4o, May 13th (temp=1)	97.54%	92.41%	83.80%
64	Qwen 3.6 Flash	97.52%	96.09%	90.65%
65	GPT-5 Mini	97.51%	97.13%	92.62%
66	Hermes 3 405B	97.48%	89.14%	82.86%
67	GPT-5.4 (Reasoning)	97.42%	98.42%	93.24%
68	Gemini 2.5 Flash (Reasoning)	97.39%	98.12%	86.51%
69	Qwen 3.6 35B	97.30%	95.10%	89.05%
70	Grok 4.1 Fast	97.29%	97.87%	89.55%
71	GPT-5.4 (Reasoning, Low)	97.21%	98.01%	91.41%
72	Aion 2.0	97.21%	95.34%	89.21%
73	Grok 4.20	97.12%	95.63%	81.70%
74	o4 Mini High	97.09%	94.36%	90.29%
75	Z.AI GLM 4.5 Air	97.05%	94.38%	83.12%
76	Mistral Small Creative	96.98%	90.31%	73.27%
77	Claude 3.5 Sonnet	96.96%	96.57%	84.24%
78	Qwen 3.5 Plus (2026-04-20)	96.81%	97.70%	91.51%
79	GPT-4o, Aug. 6th (temp=0)	96.79%	93.77%	82.45%
80	Xiaomi MIMO v2.5 Pro	96.73%	95.96%	87.36%
81	GPT-4.1	96.66%	94.40%	88.68%
82	Mistral Medium 3.1	96.56%	93.77%	77.83%
83	Stealth: Hunter Alpha	96.45%	95.53%	87.34%
84	GPT-5.2	96.38%	97.54%	90.26%
85	Qwen 3.5 397B A17B	96.38%	98.05%	91.73%
86	o4 Mini	96.24%	90.61%	88.35%
87	GPT-5.4	96.10%	96.73%	84.32%
88	MiniMax M2.7	95.84%	92.14%	89.10%
89	Qwen 2.5 72B	95.78%	89.18%	75.46%
90	Qwen 3.6 27B	95.77%	93.97%	89.72%
91	Qwen3 235B A22B Instruct 2507	95.60%	91.75%	80.10%
92	GPT-5.4 Mini (Reasoning)	95.50%	95.78%	90.65%
93	Gemma 3 27B	95.49%	86.63%	77.85%
94	ByteDance Seed 1.6 Flash	95.32%	91.64%	73.27%
95	GPT-4o, Aug. 6th (temp=1)	95.01%	86.72%	82.62%
96	Llama 3.1 70B	94.67%	92.10%	78.40%
97	Mistral Large	94.67%	95.14%	80.15%
98	Grok 4.20 (Beta)	94.41%	95.49%	83.85%
99	Gemma 3 4B	94.35%	78.38%	68.57%
100	GPT-OSS 120B	94.34%	91.73%	86.44%
101	Writer: Palmyra X5	94.28%	91.20%	79.57%
102	Qwen 3.5 35B	94.21%	94.95%	88.00%
103	Qwen 3 32B	93.83%	89.95%	82.21%
104	Mistral Small 4 (Reasoning)	93.47%	90.58%	82.39%
105	Llama 3.1 Nemotron 70B	93.45%	87.26%	74.70%
106	Qwen 3.5 Flash	93.04%	92.80%	86.38%
107	Cydonia 24B V4.1	92.65%	86.15%	75.09%
108	DeepSeek V3 (2024-12-26)	92.60%	93.58%	83.68%
109	Qwen 3.5 9B	92.48%	85.35%	86.05%
110	Mistral Small 3.2 24B	92.38%	89.48%	78.58%
111	Gemini 2.5 Flash Lite	92.17%	92.13%	81.08%
112	GPT-5.4 Mini (Reasoning, Low)	91.94%	92.63%	85.75%
113	GPT-5 Nano	91.91%	82.74%	82.60%
114	DeepSeek V3.2	91.71%	95.78%	82.25%
115	Z.AI GLM 4.5	91.69%	95.32%	86.27%
116	Mistral Large 3	91.61%	94.09%	85.43%
117	Mistral Large 2	91.61%	94.16%	82.41%
118	DeepSeek-V2 Chat	91.53%	90.90%	84.83%
119	Mistral Small 4	91.45%	91.00%	76.46%
120	Inception Mercury 2	91.31%	85.26%	83.85%
121	DeepSeek V3 (2025-03-24)	90.56%	89.57%	81.99%
122	Z.AI GLM 4.7 Flash	90.40%	85.82%	84.82%
123	GPT-5.4 Nano (Reasoning)	90.10%	83.32%	81.36%
124	ByteDance Seed 2.0 Mini	90.10%	91.08%	86.91%
125	DeepSeek V4 Flash	90.06%	93.25%	82.02%
126	WizardLM 2 8x22b	89.67%	88.13%	71.06%
127	Nemotron 3 Super	89.49%	86.34%	84.56%
128	Ministral 3 14B	89.12%	86.20%	72.54%
129	Gemini 2.5 Flash Lite (Reasoning)	89.04%	94.54%	85.75%
130	Grok 4.3	88.90%	90.19%	78.66%
131	GPT-5.4 Mini	88.47%	90.60%	82.43%
132	Gemma 3 12B	87.66%	85.18%	78.41%
133	Inception Mercury	87.15%	79.53%	79.50%
134	GPT-4o Mini (temp=1)	86.94%	85.78%	79.08%
135	GPT-4o Mini (temp=0)	86.87%	84.62%	78.29%
136	Arcee AI: Trinity Mini	85.07%	73.88%	70.90%
137	GPT-5.4 Nano (Reasoning, Low)	85.00%	82.23%	79.48%
138	DeepSeek V3.1	84.70%	87.27%	82.39%
139	GPT-5.4 Nano	84.23%	79.22%	74.40%
140	Ministral 8B	83.69%	77.52%	64.87%
141	Ministral 3 8B	83.53%	78.52%	71.76%
142	Arcee AI: Trinity Large (Preview)	83.17%	86.62%	73.33%
143	Skyfall 36B V2	82.94%	76.69%	65.76%
144	Mistral NeMO	82.35%	73.69%	65.04%
145	Llama 3.1 8B	80.76%	75.45%	63.35%
146	Nemotron 3 Nano	78.40%	75.81%	77.73%
147	GPT-4.1 Nano	78.02%	76.06%	71.94%
148	Ministral 3B	74.55%	70.91%	61.29%
149	Ministral 3 3B	73.12%	69.80%	67.22%
150	Cohere Command R+ (Aug. 2024)	70.67%	68.40%	69.03%
151	Hermes 3 70B	66.87%	63.34%	72.57%
152	Rocinante 12B	63.96%	56.31%	54.54%
153	Claude 3 Haiku	63.45%	64.36%	71.19%
154	LFM2 24B	57.43%	71.56%	58.77%