Text Replacement
Tests deterministic text transformations: renaming characters and locations, expanding contractions, rewriting tense, shifting POV, swapping gender, converting passive voice to active, combining transformations, and avoiding specified words. Each expected change is checked independently to produce the score.
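As a rough illustration of checking each expected change independently, the sketch below scores a rename task with three separate checks named after the evaluators listed for the rename tests on this page. The function `score_rename` and its all-or-nothing handling of the second and third checks are assumptions made for illustration, not the benchmark's actual implementation.

```python
import re


def score_rename(source: str, output: str, renames: dict[str, str]) -> dict[str, float]:
    """Hypothetical scorer: three independent checks for a rename task."""

    def count(word: str, text: str) -> int:
        # Whole-word, case-sensitive occurrences of `word` in `text`.
        return len(re.findall(rf"\b{re.escape(word)}\b", text))

    # Check 1: how many old-name occurrences show up as the new name.
    expected = sum(count(old, source) for old in renames)
    found = sum(min(count(new, output), count(old, source)) for old, new in renames.items())
    accuracy = found / expected if expected else 1.0

    # Check 2: no old name may survive anywhere in the output.
    leftovers = sum(count(old, output) for old in renames)
    no_old_names = 1.0 if leftovers == 0 else 0.0

    # Check 3: undo the rename in the output; everything else must match the source.
    restored = output
    for old, new in renames.items():
        restored = re.sub(rf"\b{re.escape(new)}\b", old, restored)
    preserved = 1.0 if restored == source else 0.0

    return {
        "name_replacement_accuracy": accuracy,
        "no_remaining_old_names": no_old_names,
        "non_name_text_preserved": preserved,
    }


source = "Elena met Gregor. Elena smiled."
renames = {"Elena": "Mirabel", "Gregor": "Aldric"}
full = score_rename(source, "Mirabel met Aldric. Mirabel smiled.", renames)
partial = score_rename(source, "Mirabel met Gregor. Elena smiled.", renames)
```

Because the checks are separate, a partially renamed output loses points for accuracy and leftover names while still getting credit for leaving the rest of the text untouched.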
[Chart: Performance Score Distribution (Top 20)]
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 97% | $0.0003 | 1.7s | |
| Gemini 3.1 Flash Lite (Preview) | 99% | $0.0010 | 1.8s | |
| Gemini 3.1 Flash Lite (Reasoning) | 99% | $0.0010 | 3.1s | |
| Mistral Small 4 | 96% | $0.0004 | 3.3s | |
| Gemini 3.1 Flash Lite | 99% | $0.0010 | 2.8s | |
| Mistral Small 3.2 24B | 97% | $0.0002 | 5.0s | |
| Gemini 2.5 Flash | 99% | $0.0015 | 2.2s | |
| DeepSeek V4 Flash | 97% | $0.0002 | 8.1s | |
| Grok 4 Fast | 99% | $0.0008 | 6.5s | |
| Gemini 3 Flash (Preview) | 99% | $0.0019 | 3.4s | |
| GPT-4.1 Mini | 98% | $0.0011 | 7.0s | |
| Mistral Large 3 | 98% | $0.0011 | 7.7s | |
| Gemma 3 12B | 95% | $0.0001 | 9.0s | |
| Inception Mercury 2 | 95% | $0.0017 | 2.3s | |
| Grok 4.20 | 98% | $0.0020 | 4.4s | |
| Qwen 2.5 72B | 98% | $0.0003 | 10.9s | |
| Qwen 3.5 Plus (2026-02-15) | 99% | $0.0015 | 7.2s | |
| GPT-4o Mini (temp=1) | 95% | $0.0004 | 9.5s | |
| Grok 4.3 | 95% | $0.0021 | 4.7s | |
| Stealth: Hunter Alpha | 98% | $0.0000 | 19.5s | |
Cost vs Performance
Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.
16 low-scoring outliers hidden: DeepSeek V3.1 (89.5%), GPT-4.1 Nano (89.3%), Gemma 3 4B (89.3%), Skyfall 36B V2 (87.3%), Ministral 3 8B (87.0%), Ministral 8B (86.7%), Mistral NeMO (86.6%), Arcee AI: Trinity Mini (85.7%), Nemotron 3 Nano (83.3%), Ministral 3 3B (81.2%), Ministral 3B (80.9%), Cohere Command R+ (Aug. 2024) (73.7%), LFM2 24B (71.7%), Hermes 3 70B (69.5%), Rocinante 12B (66.3%), Claude 3 Haiku (61.1%).
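The quadrant split described above can be sketched as follows. The function name `quadrants` and the tie-breaking rule (values equal to the median count as low cost or high score) are assumptions; the four sample rows are taken from the price-performance table on this page.

```python
from statistics import median


def quadrants(models: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Label each model by its side of the median cost and median score lines."""
    cost_mid = median(cost for cost, _ in models.values())
    score_mid = median(score for _, score in models.values())
    labels = {}
    for name, (cost, score) in models.items():
        # Assumption: ties with the median fall on the favourable side.
        cost_side = "low cost" if cost <= cost_mid else "high cost"
        score_side = "high score" if score >= score_mid else "low score"
        labels[name] = f"{cost_side}, {score_side}"
    return labels


sample = {
    "Gemini 2.5 Flash Lite": (0.0003, 97),
    "Gemini 2.5 Flash": (0.0015, 99),
    "Grok 4.3": (0.0021, 95),
    "Gemma 3 12B": (0.0001, 95),
}
labels = quadrants(sample)
```

With only these four rows the medians differ from those of the full chart, so the labels illustrate the method rather than reproduce the published quadrants.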
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 | 100% | 99% | 99% | |
| Gemma 4 31B | 100% | 98% | 98% | |
| Claude Opus 4.5 | 100% | 98% | 98% | |
| Gemma 4 31B (Reasoning) | 100% | 98% | 98% | |
| Claude Sonnet 4 | 100% | 98% | 98% | |
| Gemini 3 Pro (Preview) | 100% | 98% | 98% | |
| Claude Opus 4.6 (Reasoning) | 100% | 98% | 98% | |
| Claude Sonnet 4.5 | 100% | 98% | 98% | |
| Grok 4 | 100% | 98% | 98% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 98% | 98% | |
| Qwen3.6 Max Preview | 100% | 98% | 98% | |
| Z.AI GLM 5.1 | 100% | 98% | 98% | |
| Qwen 3.5 27B | 99% | 98% | 98% | |
| Z.AI GLM 5 | 99% | 98% | 98% | |
| Claude Opus 4.7 (Reasoning) | 99% | 98% | 98% | |
| Gemma 4 26B (Reasoning) | 99% | 97% | 97% | |
| Claude Opus 4.7 | 99% | 97% | 97% | |
| Grok 4.20 (Reasoning) | 99% | 97% | 97% | |
| Gemini 2.5 Pro | 99% | 97% | 97% | |
| Qwen3.7 Max | 99% | 97% | 97% | |
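The stability figure is the product of the median score and the consistency value. The sketch below assumes consistency is supplied as a number between 0 and 1, since this page does not say how it is computed, and the helper name `stability` is hypothetical.

```python
from statistics import median


def stability(run_scores: list[float], consistency: float) -> float:
    """Stability = median score x consistency (consistency is taken as given)."""
    return median(run_scores) * consistency


# A model scoring 100% on every run with 99% consistency.
top = stability([1.0, 1.0, 1.0], 0.99)

# Hypothetical unrounded inputs that would both display as 99% and 98%.
rounded_case = stability([0.994, 0.994, 0.994], 0.984)
```

Rounding each column for display explains why a row can show 99% and 98% yet list a stability of 98%: the product is taken before rounding.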
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 3.1 Flash Lite (Preview) | 99% | $0.0010 | 1.8s | 96% | |
| Gemini 3.1 Flash Lite | 99% | $0.0010 | 2.8s | 96% | |
| Gemini 3.1 Flash Lite (Reasoning) | 99% | $0.0010 | 3.1s | 96% | |
| Gemini 3 Flash (Preview) | 99% | $0.0019 | 3.4s | 97% | |
| Qwen 3.5 Plus (2026-02-15) | 99% | $0.0015 | 7.2s | 97% | |
| Claude Haiku 4.5 | 99% | $0.0036 | 3.2s | 97% | |
| Grok 4 Fast | 99% | $0.0008 | 6.5s | 95% | |
| Gemini 2.5 Flash | 99% | $0.0015 | 2.2s | 92% | |
| Gemma 4 26B | 99% | $0.0003 | 17.1s | 97% | |
| GPT-4.1 Mini | 98% | $0.0011 | 7.0s | 94% | |
| Grok 4.20 | 98% | $0.0020 | 4.4s | 92% | |
| Stealth: Healer Alpha | 99% | $0.0000 | 14.3s | 94% | |
| DeepSeek V4 Pro | 99% | $0.0013 | 21.1s | 97% | |
| Grok 4.1 Fast | 99% | $0.0010 | 12.4s | 92% | |
| Gemma 4 31B | 100% | $0.0003 | 30.2s | 98% | |
| Mistral Medium 3.1 | 97% | $0.0013 | 5.9s | 91% | |
| Gemini 2.5 Flash Lite | 97% | $0.0003 | 1.7s | 86% | |
| Claude Sonnet 4.5 | 100% | $0.011 | 4.9s | 98% | |
| Claude Sonnet 4 | 100% | $0.011 | 6.1s | 98% | |
| GPT-4.1 | 98% | $0.0054 | 4.4s | 92% | |
| Model | Total | Character rename: Elena->Mirabel, Gregor->Aldric (Specific Prompt) | Character rename: Elena->Mirabel, Gregor->Aldric (Generic Prompt) | Location rename: market square, outer ring, bridge, northern mines (Specific Prompt) | Location rename: market square, outer ring, bridge, northern mines (Generic Prompt) | Expand all contractions (Specific Prompt) | Expand all contractions (Generic Prompt) | Tense rewriting: past to present (Specific Prompt) | Tense rewriting: past to present (Generic Prompt) | POV shift: 3rd person to 1st person (Elena's perspective) (Specific Prompt) | POV shift: 3rd person to 1st person (Elena's perspective) (Generic Prompt) | Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged (Specific Prompt) | Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged (Generic Prompt) | Combined: 3rd person past → 1st person present (Specific Prompt) | Combined: 3rd person past → 1st person present (Generic Prompt) | Passive voice → active voice (Specific Prompt) | Passive voice → active voice (Generic Prompt) | Avoid said/asked/replied/answered (Specific Prompt) | Avoid said/asked/replied/answered (Generic Prompt) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Sonnet 4 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 97% | 100% | 100% |
| Gemma 4 31B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 98% | 100% | 100% |
| Claude Opus 4.6 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 99% | 100% | 100% | 100% | 99% | 98% | 98% | 100% | 100% |
| Claude Sonnet 4.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 99% | 96% | 100% | 100% |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 100% | 100% | 99% | 99% | 96% | 100% | 100% |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 99% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 97% | 100% | 100% |
| Grok 4 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 96% | 100% | 100% |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 100% | 100% | 100% | 99% | 99% | 98% | 96% | 100% | 100% |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 99% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 96% | 100% | 100% |
| Claude Opus 4.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 99% | 100% | 100% | 100% | 99% | 97% | 97% | 100% | 100% |
| Qwen3.6 Max Preview | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 96% | 100% | 100% |
| Z.AI GLM 5.1 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 97% | 100% | 100% |
| Gemma 4 26B (Reasoning) | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 100% | 100% | 100% | 99% | 99% | 98% | 95% | 100% | 100% |
| Z.AI GLM 5 | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 96% | 100% | 100% |
| Grok 4.20 (Reasoning) | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 97% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 97% | 100% | 100% |
Generic Prompt
Character rename: Elena->Mirabel, Gregor->Aldric
[Chart: Performance Score Distribution (Top 20)]
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Inception Mercury | 100% | $0.0004 | 815ms | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | |
| Inception Mercury 2 | 100% | $0.0007 | 971ms | |
| Ministral 8B | 100% | $0.0001 | 3.2s | |
| Ministral 3 8B | 100% | $0.0002 | 2.9s | |
| Mistral Small Creative | 100% | $0.0002 | 2.9s | |
| Mistral Small 4 | 100% | $0.0004 | 2.8s | |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.7s | |
| GPT-4.1 Nano | 100% | $0.0003 | 3.7s | |
| Ministral 3 14B | 100% | $0.0002 | 4.0s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.6s | |
| Grok 4 Fast | 100% | $0.0005 | 3.3s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 4.3s | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 1.9s | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0007 | 2.8s | |
| GPT-5.4 Nano | 100% | $0.0007 | 2.7s | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0007 | 3.2s | |
| Gemma 3 4B | 100% | $0.0001 | 6.1s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 5.7s | |
| Mistral NeMO | 93% | $0.0002 | 2.3s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Inception Mercury | 100% | $0.0004 | 815ms | 100% | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | 100% | |
| Inception Mercury 2 | 100% | $0.0007 | 971ms | 100% | |
| Ministral 3 8B | 100% | $0.0002 | 2.9s | 100% | |
| Mistral Small Creative | 100% | $0.0002 | 2.9s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.6s | 100% | |
| Ministral 8B | 100% | $0.0001 | 3.2s | 100% | |
| Mistral Small 4 | 100% | $0.0004 | 2.8s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 1.9s | 100% | |
| GPT-5.4 Nano | 100% | $0.0007 | 2.7s | 100% | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0007 | 2.8s | 100% | |
| GPT-4.1 Nano | 100% | $0.0003 | 3.7s | 100% | |
| Grok 4 Fast | 100% | $0.0005 | 3.3s | 100% | |
| Ministral 3 14B | 100% | $0.0002 | 4.0s | 100% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0007 | 3.2s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0014 | 2.1s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 4.3s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0007 | 4.7s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 5.7s | 100% | |
| Gemma 3 4B | 100% | $0.0001 | 6.1s | 100% | |
| Median | Evaluator |
|---|---|
| 100.0% | Name replacement accuracy |
| 100.0% | No remaining old names |
| 100.0% | Non-name text preserved |
Location rename: market square, outer ring, bridge, northern mines
[Chart: Performance Score Distribution (Top 20)]
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.5s | |
| GPT-4.1 Nano | 98% | $0.0003 | 3.5s | |
| Mistral Small 4 | 99% | $0.0004 | 2.9s | |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.9s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.6s | |
| Inception Mercury | 100% | $0.0004 | 4.0s | |
| Grok 4 Fast | 100% | $0.0005 | 3.8s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 10.3s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.8s | |
| ByteDance Seed 1.6 Flash | 99% | $0.0003 | 5.3s | |
| DeepSeek V4 Flash | 100% | $0.0002 | 8.1s | |
| Inception Mercury 2 | 100% | $0.0010 | 1.4s | |
| Grok 4.1 Fast | 100% | $0.0006 | 5.2s | |
| Llama 3.1 70B | 100% | $0.0005 | 17.8s | |
| Gemma 3 12B | 100% | $0.0001 | 8.6s | |
| Gemini 2.5 Flash | 100% | $0.0014 | 2.1s | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0009 | 7.3s | |
| Gemma 4 26B | 100% | $0.0002 | 21.8s | |
| GPT-4.1 Mini | 100% | $0.0010 | 5.8s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.5s | 100% | |
| Inception Mercury 2 | 100% | $0.0010 | 1.4s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.8s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0014 | 2.1s | 100% | |
| Grok 4 Fast | 100% | $0.0005 | 3.8s | 100% | |
| Inception Mercury | 100% | $0.0004 | 4.0s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.6s | 100% | |
| Grok 4.1 Fast | 100% | $0.0006 | 5.2s | 100% | |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.9s | 100% | |
| GPT-5.4 Mini | 100% | $0.0027 | 2.2s | 100% | |
| Grok 4.3 | 100% | $0.0019 | 4.2s | 100% | |
| Grok 4.20 | 100% | $0.0020 | 4.1s | 100% | |
| GPT-4.1 Mini | 100% | $0.0010 | 5.8s | 100% | |
| Mistral Small 4 | 99% | $0.0004 | 2.9s | 96% | |
| DeepSeek V4 Flash | 100% | $0.0002 | 8.1s | 100% | |
| Gemma 3 12B | 100% | $0.0001 | 8.6s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0009 | 7.3s | 100% | |
| Claude Haiku 4.5 | 100% | $0.0034 | 2.8s | 100% | |
| ByteDance Seed 1.6 Flash | 99% | $0.0003 | 5.3s | 97% | |
| Median | Evaluator |
|---|---|
| 100.0% | Name replacement accuracy |
| 100.0% | No remaining old names |
| 100.0% | Non-name text preserved |
Expand all contractions
[Chart: Performance Score Distribution (Top 20)]
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.3s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 3.7s | |
| Gemini 3.1 Flash Lite | 100% | $0.0007 | 1.5s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0007 | 1.5s | |
| Mistral Small 4 | 99% | $0.0003 | 2.4s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0007 | 1.5s | |
| DeepSeek V4 Flash | 100% | $0.0001 | 5.6s | |
| Stealth: Healer Alpha | 99% | $0.0000 | 7.8s | |
| Mistral NeMO | 98% | $0.0001 | 2.6s | |
| Grok 4 Fast | 100% | $0.0007 | 4.9s | |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 7.0s | |
| Ministral 8B | 95% | $0.0001 | 2.4s | |
| Gemini 2.5 Flash | 100% | $0.0012 | 1.8s | |
| Stealth: Hunter Alpha | 100% | $0.0000 | 15.8s | |
| Inception Mercury | 98% | $0.0003 | 4.9s | |
| Llama 3.1 8B | 98% | $0.0000 | 6.1s | |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 8.4s | |
| GPT-5.4 Nano (Reasoning, Low) | 98% | $0.0006 | 2.4s | |
| Ministral 3 14B | 98% | $0.0002 | 3.2s | |
| ByteDance Seed 1.6 Flash | 100% | $0.0004 | 8.4s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| Grok 4.20 (Reasoning) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.3s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0007 | 1.5s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0007 | 1.5s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0007 | 1.5s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 3.7s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0012 | 1.8s | 100% | |
| Mistral Small 4 | 99% | $0.0003 | 2.4s | 99% | |
| DeepSeek V4 Flash | 100% | $0.0001 | 5.6s | 100% | |
| Gemini 3 Flash (Preview) | 100% | $0.0015 | 2.7s | 100% | |
| Grok 4 Fast | 100% | $0.0007 | 4.9s | 99% | |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 7.0s | 100% | |
| Mistral Large 3 | 100% | $0.0008 | 5.8s | 100% | |
| Mistral NeMO | 98% | $0.0001 | 2.6s | 97% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0012 | 5.5s | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 8.4s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 99% | $0.0008 | 4.6s | 99% | |
| GPT-5.4 Nano (Reasoning) | 98% | $0.0006 | 2.5s | 97% | |
| Ministral 3 14B | 98% | $0.0002 | 3.2s | 97% | |
| GPT-5.4 Nano (Reasoning, Low) | 98% | $0.0006 | 2.4s | 97% | |
| ByteDance Seed 1.6 Flash | 100% | $0.0004 | 8.4s | 99% | |
| Median | Evaluator |
|---|---|
| 99.7% | Name replacement accuracy |
| 100.0% | Non-name text preserved |
| 100.0% | Possessive traps preserved |
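To show what a possessive trap looks like in the contraction test, here is a minimal sketch of a table-driven expander. The `CONTRACTIONS` table and the `expand_contractions` helper are hypothetical, cover only a handful of forms, and assume straight apostrophes; the benchmark's own reference data is not shown on this page.

```python
import re

# Hypothetical, deliberately small lookup table of contractions.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "i'm": "I am",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Expand only the listed contractions; possessives such as "Elena's" stay as-is."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep sentence-initial capitalisation ("It's" -> "It is").
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(replace, text)


result = expand_contractions("It's late, but Elena's lantern won't light.")
```

A lookup table sidesteps the trap because only known contractions are rewritten, whereas a rule that expands every `'s` to "is" would corrupt the possessive.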
Tense rewriting: past to present
[Chart: Performance Score Distribution (Top 20)]
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Qwen3 235B A22B Instruct 2507 | 100% | $0.0003 | 10.6s | |
| Gemini 2.5 Flash Lite | 98% | $0.0003 | 1.5s | |
| Mistral Small 3.2 24B | 99% | $0.0002 | 5.9s | |
| Mistral Small 4 | 99% | $0.0004 | 2.8s | |
| Ministral 3 14B | 99% | $0.0002 | 3.8s | |
| Mistral NeMO | 98% | $0.0002 | 3.0s | |
| DeepSeek V4 Flash | 98% | $0.0002 | 6.8s | |
| Ministral 3 8B | 98% | $0.0002 | 3.7s | |
| Gemma 4 26B | 99% | $0.0002 | 12.9s | |
| Ministral 8B | 97% | $0.0001 | 3.6s | |
| Qwen 2.5 72B | 99% | $0.0003 | 10.4s | |
| Stealth: Hunter Alpha | 97% | $0.0000 | 11.2s | |
| Mistral Large 3 | 99% | $0.0010 | 7.4s | |
| DeepSeek V4 Pro | 100% | $0.0009 | 11.9s | |
| Qwen 3.5 Plus (2026-02-15) | 99% | $0.0015 | 6.6s | |
| Cydonia 24B V4.1 | 98% | $0.0004 | 11.2s | |
| Skyfall 36B V2 | 84% | $0.0006 | 9.1s | |
| Stealth: Healer Alpha | 96% | $0.0000 | 13.0s | |
| Xiaomi MIMO v2.5 | 99% | $0.0021 | 9.3s | |
| Rocinante 12B | 69% | $0.0004 | 8.1s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 99% | 99% | |
| Qwen3 235B A22B Instruct 2507 | 100% | 99% | 99% | |
| Claude Sonnet 4 | 100% | 99% | 99% | |
| Claude Opus 4.6 | 99% | 100% | 99% | |
| Gemma 4 31B (Reasoning) | 99% | 100% | 99% | |
| Gemini 3 Pro (Preview) | 99% | 100% | 99% | |
| Gemma 4 26B | 99% | 100% | 99% | |
| Mistral Large 3 | 99% | 100% | 99% | |
| Claude 3.7 Sonnet | 99% | 100% | 99% | |
| Mistral Large 2 | 99% | 100% | 99% | |
| Mistral Large | 99% | 100% | 99% | |
| Mistral Small 3.2 24B | 99% | 100% | 99% | |
| Arcee AI: Trinity Large (Preview) | 99% | 100% | 99% | |
| Gemini 3.5 Flash (Reasoning) | 99% | 100% | 99% | |
| Gemma 4 26B (Reasoning) | 99% | 100% | 99% | |
| Claude Opus 4.5 | 99% | 100% | 99% | |
| Qwen 3.5 Plus (2026-02-15) | 99% | 100% | 99% | |
| Writer: Palmyra X5 | 100% | 99% | 99% | |
| Claude 3.5 Sonnet | 99% | 99% | 99% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Mistral Small 3.2 24B | 99% | $0.0002 | 5.9s | 99% | |
| Ministral 3 14B | 99% | $0.0002 | 3.8s | 99% | |
| Qwen3 235B A22B Instruct 2507 | 100% | $0.0003 | 10.6s | 99% | |
| Ministral 3 8B | 98% | $0.0002 | 3.7s | 98% | |
| Mistral Large 3 | 99% | $0.0010 | 7.4s | 99% | |
| Gemma 4 26B | 99% | $0.0002 | 12.9s | 99% | |
| Qwen 3.5 Plus (2026-02-15) | 99% | $0.0015 | 6.6s | 99% | |
| Mistral Small 4 | 99% | $0.0004 | 2.8s | 97% | |
| Mistral NeMO | 98% | $0.0002 | 3.0s | 97% | |
| DeepSeek V4 Pro | 100% | $0.0009 | 11.9s | 99% | |
| Gemini 2.5 Flash Lite | 98% | $0.0003 | 1.5s | 96% | |
| DeepSeek V4 Flash | 98% | $0.0002 | 6.8s | 96% | |
| Arcee AI: Trinity Large (Preview) | 99% | $0.0000 | 21.2s | 99% | |
| Qwen 2.5 72B | 99% | $0.0003 | 10.4s | 96% | |
| Gemini 3.1 Flash Lite (Reasoning) | 96% | $0.0009 | 1.7s | 96% | |
| Gemini 3.1 Flash Lite (Preview) | 96% | $0.0009 | 1.7s | 96% | |
| Gemini 3.1 Flash Lite | 96% | $0.0009 | 1.8s | 96% | |
| Writer: Palmyra X5 | 100% | $0.0033 | 9.2s | 99% | |
| Xiaomi MIMO v2.5 | 99% | $0.0021 | 9.3s | 97% | |
| Mistral Large | 99% | $0.0041 | 7.1s | 99% | |
| Median | Evaluator |
|---|---|
| 90.0% | Dialogue content preserved |
| 97.8% | Name replacement accuracy |
| 100.0% | Non-name text preserved |
POV shift: 3rd person to 1st person (Elena's perspective)
[Chart: Performance Score Distribution (Top 20)]
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Stealth: Healer Alpha | 100% | $0.0000 | 7.4s | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.7s | |
| Inception Mercury | 100% | $0.0004 | 1.8s | |
| GPT-4.1 Nano | 98% | $0.0003 | 3.5s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.2s | |
| Grok 4 Fast | 100% | $0.0004 | 2.6s | |
| Gemma 3 12B | 100% | $0.0001 | 7.7s | |
| DeepSeek V4 Flash | 100% | $0.0002 | 6.8s | |
| Inception Mercury 2 | 100% | $0.0007 | 1.0s | |
| Mistral Small 4 | 87% | $0.0004 | 3.3s | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0007 | 2.9s | |
| Llama 3.1 8B | 99% | $0.0000 | 5.8s | |
| GPT-5.4 Nano | 100% | $0.0007 | 3.4s | |
| Stealth: Hunter Alpha | 100% | $0.0000 | 23.6s | |
| Qwen 2.5 72B | 100% | $0.0003 | 10.7s | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0008 | 3.2s | |
| GPT-4o Mini (temp=1) | 100% | $0.0004 | 9.2s | |
| Gemma 4 26B | 100% | $0.0003 | 14.2s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | |
| GPT-4o Mini (temp=0) | 100% | $0.0004 | 9.0s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.7s | 100% | |
| Inception Mercury 2 | 100% | $0.0007 | 1.0s | 100% | |
| Grok 4 Fast | 100% | $0.0004 | 2.6s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 1.7s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.2s | 100% | |
| Inception Mercury | 100% | $0.0004 | 1.8s | 99% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0008 | 3.2s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0014 | 2.1s | 100% | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0007 | 2.9s | 99% | |
| GPT-5.4 Nano | 100% | $0.0007 | 3.4s | 99% | |
| DeepSeek V4 Flash | 100% | $0.0002 | 6.8s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 3.3s | 99% | |
| Stealth: Healer Alpha | 100% | $0.0000 | 7.4s | 100% | |
| Gemma 3 12B | 100% | $0.0001 | 7.7s | 100% | |
| Ministral 3 14B | 99% | $0.0002 | 3.7s | 99% | |
| Gemini 3 Flash (Preview) | 100% | $0.0018 | 3.2s | 100% | |
| Grok 4.1 Fast | 100% | $0.0006 | 7.3s | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0004 | 9.0s | 100% | |
| Mistral Large 3 | 100% | $0.0010 | 7.3s | 100% | |
| Median | Evaluator |
|---|---|
| 100.0% | Name replacement accuracy |
| 100.0% | No remaining old names |
| 100.0% | Non-name text preserved |
Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged
[Chart: Performance Score Distribution (Top 20)]
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Mistral Small Creative | 100% | $0.0002 | 3.2s | |
| Gemini 3.1 Flash Lite | 100% | $0.0010 | 5.9s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0010 | 1.9s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0010 | 2.0s | |
| Grok 4 Fast | 100% | $0.0007 | 5.0s | |
| GPT-5.4 Nano (Reasoning, Low) | 94% | $0.0008 | 3.3s | |
| Gemma 4 26B | 100% | $0.0003 | 12.6s | |
| Stealth: Hunter Alpha | 100% | $0.0000 | 12.6s | |
| Gemini 2.5 Flash | 100% | $0.0017 | 2.4s | |
| Z.AI GLM 4.5 Air | 99% | $0.0016 | 43.4s | |
| Grok 4.1 Fast | 100% | $0.0007 | 10.2s | |
| Mistral Medium 3.1 | 100% | $0.0014 | 6.9s | |
| GPT-4.1 Mini | 100% | $0.0012 | 6.5s | |
| Inception Mercury 2 | 97% | $0.0014 | 2.0s | |
| Qwen 2.5 72B | 89% | $0.0003 | 11.6s | |
| Gemini 3 Flash (Preview) | 100% | $0.0021 | 3.5s | |
| Stealth: Healer Alpha | 99% | $0.0000 | 13.9s | |
| ByteDance Seed 1.6 Flash | 95% | $0.0007 | 12.0s | |
| DeepSeek V4 Flash (Reasoning) | 100% | $0.0004 | 16.6s | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0017 | 7.6s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Grok 4.20 (Reasoning) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| ByteDance Seed 1.6 | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| DeepSeek V4 Pro (Reasoning) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Mistral Small Creative | 100% | $0.0002 | 3.2s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0010 | 1.9s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0010 | 2.0s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0017 | 2.4s | 100% | |
| Grok 4 Fast | 100% | $0.0007 | 5.0s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0010 | 5.9s | 100% | |
| Gemini 3 Flash (Preview) | 100% | $0.0021 | 3.5s | 100% | |
| GPT-4.1 Mini | 100% | $0.0012 | 6.5s | 100% | |
| Mistral Medium 3.1 | 100% | $0.0014 | 6.9s | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0017 | 7.6s | 100% | |
| Grok 4.1 Fast | 100% | $0.0007 | 10.2s | 100% | |
| Stealth: Hunter Alpha | 100% | $0.0000 | 12.6s | 100% | |
| Claude Haiku 4.5 | 100% | $0.0041 | 3.1s | 100% | |
| Gemma 4 26B | 100% | $0.0003 | 12.6s | 100% | |
| DeepSeek V4 Flash (Reasoning) | 100% | $0.0004 | 16.6s | 100% | |
| Gemini 3.5 Flash (Reasoning, Minimal) | 100% | $0.0063 | 2.7s | 100% | |
| Stealth: Healer Alpha | 99% | $0.0000 | 13.9s | 99% | |
| GPT-4.1 | 100% | $0.0059 | 4.6s | 100% | |
| Gemma 4 31B | 100% | $0.0004 | 19.1s | 100% | |
| Llama 3.1 Nemotron 70B | 100% | $0.0015 | 16.8s | 100% | |
| Median | Evaluator |
|---|---|
| 100.0% | Dialogue content preserved |
| 100.0% | Mara pronouns preserved (coreference test) |
| 99.2% | Name replacement accuracy |
| 100.0% | Non-name text preserved |
Combined: 3rd person past → 1st person present
[Chart: Performance Score Distribution (Top 20)]
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 99% | $0.0003 | 1.8s | |
| Grok 4 Fast | 98% | $0.0006 | 5.0s | |
| DeepSeek V4 Flash | 98% | $0.0002 | 8.1s | |
| GPT-5.4 Nano | 99% | $0.0007 | 3.2s | |
| Gemini 3.1 Flash Lite | 98% | $0.0009 | 1.8s | |
| Gemini 3.1 Flash Lite (Reasoning) | 99% | $0.0009 | 1.8s | |
| Stealth: Hunter Alpha | 99% | $0.0000 | 22.8s | |
| Gemini 2.5 Flash | 94% | $0.0015 | 2.1s | |
| Gemma 3 12B | 94% | $0.0001 | 9.3s | |
| GPT-4.1 Nano | 98% | $0.0003 | 3.6s | |
| GPT-5.4 Nano (Reasoning) | 97% | $0.0007 | 3.1s | |
| Stealth: Healer Alpha | 97% | $0.0000 | 11.0s | |
| Gemini 3.1 Flash Lite (Preview) | 98% | $0.0009 | 1.7s | |
| Mistral Small 3.2 24B | 98% | $0.0002 | 4.4s | |
| Gemma 4 26B | 99% | $0.0002 | 11.9s | |
| Mistral Small 4 | 85% | $0.0004 | 4.8s | |
| Gemma 4 31B | 99% | $0.0003 | 26.8s | |
| GPT-4.1 Mini | 99% | $0.0010 | 12.0s | |
| Qwen 2.5 72B | 98% | $0.0003 | 9.5s | |
| Gemma 3 27B | 98% | $0.0002 | 13.1s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Z.AI GLM 5 Turbo | 100% | 99% | 99% | |
| GPT-5.5 | 100% | 99% | 99% | |
| Claude Opus 4.6 (Reasoning) | 99% | 100% | 99% | |
| Qwen3.6 Max Preview | 99% | 100% | 99% | |
| Claude Opus 4.7 (Reasoning) | 99% | 100% | 99% | |
| Claude Opus 4.6 | 99% | 100% | 99% | |
| Gemma 4 31B (Reasoning) | 99% | 100% | 99% | |
| GPT-5.4 Mini (Reasoning) | 99% | 100% | 99% | |
| Claude Sonnet 4 | 99% | 100% | 99% | |
| Gemma 4 31B | 99% | 100% | 99% | |
| Grok 4.20 (Beta, Reasoning) | 99% | 100% | 99% | |
| Grok 4.20 (Reasoning) | 99% | 100% | 99% | |
| Aion 2.0 | 99% | 99% | 99% | |
| Gemini 2.5 Pro | 99% | 99% | 99% | |
| GPT-5.4 | 99% | 99% | 99% | |
| Gemini 3.1 Pro (Preview) | 99% | 100% | 99% | |
| Z.AI GLM 5.1 | 99% | 100% | 99% | |
| Qwen 3.5 397B A17B | 99% | 100% | 99% | |
| Claude Sonnet 4.6 | 99% | 100% | 99% | |
| Claude Opus 4.5 | 99% | 100% | 99% | |
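The stability ranking above is described as median × consistency. A minimal sketch of that product follows, assuming both inputs are fractions in [0, 1]. The function names and sample rows are hypothetical, and because the page shows rounded percentages, recomputing from the displayed values may differ from the listed stability by about a point.

```python
def stability(median_score: float, consistency: float) -> float:
    """Stability as the product of median score and consistency (both in [0, 1])."""
    if not (0.0 <= median_score <= 1.0 and 0.0 <= consistency <= 1.0):
        raise ValueError("inputs must be fractions between 0 and 1")
    return median_score * consistency


def rank_by_stability(rows):
    """Sort (name, median, consistency) tuples by descending stability."""
    return sorted(rows, key=lambda r: stability(r[1], r[2]), reverse=True)


# Illustrative values only, not taken from the benchmark's raw data.
rows = [("model-a", 0.99, 1.00), ("model-b", 1.00, 0.95), ("model-c", 0.97, 0.99)]
for name, med, cons in rank_by_stability(rows):
    print(f"{name}: {stability(med, cons):.1%}")
```

A model with a perfect median but lower consistency (model-b here) ranks below a slightly weaker but steadier one, which is the behaviour the product is meant to reward.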
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 99% | $0.0003 | 1.8s | 98% | |
| Gemini 3.1 Flash Lite (Reasoning) | 99% | $0.0009 | 1.8s | 98% | |
| GPT-5.4 Nano | 99% | $0.0007 | 3.2s | 98% | |
| Gemini 3.1 Flash Lite | 98% | $0.0009 | 1.8s | 98% | |
| Gemini 3.1 Flash Lite (Preview) | 98% | $0.0009 | 1.7s | 98% | |
| Mistral Small 3.2 24B | 98% | $0.0002 | 4.4s | 97% | |
| Qwen 3.5 Plus (2026-02-15) | 99% | $0.0014 | 6.7s | 99% | |
| GPT-4.1 Nano | 98% | $0.0003 | 3.6s | 96% | |
| Grok 4 Fast | 98% | $0.0006 | 5.0s | 96% | |
| Gemma 4 26B | 99% | $0.0002 | 11.9s | 98% | |
| Qwen 2.5 72B | 98% | $0.0003 | 9.5s | 98% | |
| Gemini 3 Flash (Preview) | 98% | $0.0018 | 3.1s | 97% | |
| Claude Haiku 4.5 | 99% | $0.0033 | 2.8s | 99% | |
| Hermes 3 70B | 99% | $0.0003 | 13.7s | 98% | |
| Mistral Large 3 | 98% | $0.0010 | 7.1s | 98% | |
| Cydonia 24B V4.1 | 98% | $0.0004 | 11.5s | 98% | |
| GPT-4.1 Mini | 99% | $0.0010 | 12.0s | 98% | |
| Xiaomi MIMO v2.5 | 99% | $0.0021 | 9.1s | 97% | |
| GPT-5.4 Mini (Reasoning, Low) | 98% | $0.0027 | 3.9s | 96% | |
| DeepSeek V4 Pro | 99% | $0.0013 | 16.3s | 98% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Dialogue content preserved | ||
| 96.5% | Name replacement accuracy | ||
| 100.0% | Non-name text preserved |
Passive voice → active voice
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemma 4 31B | 98% | $0.0005 | 39.7s | |
| Grok 4.1 Fast | 96% | $0.0014 | 16.8s | |
| Qwen 3.5 Plus (2026-02-15) | 96% | $0.0021 | 10.9s | |
| DeepSeek V4 Flash | 89% | $0.0002 | 9.5s | |
| Gemma 4 26B | 95% | $0.0003 | 23.2s | |
| Gemini 2.5 Flash | 95% | $0.0021 | 2.9s | |
| Gemini 3.1 Flash Lite | 94% | $0.0013 | 2.5s | |
| Gemini 3.1 Flash Lite (Preview) | 93% | $0.0013 | 2.4s | |
| Gemini 3 Flash (Preview) | 94% | $0.0026 | 4.3s | |
| Grok 4 Fast | 93% | $0.0013 | 11.1s | |
| Gemini 2.5 Flash Lite | 90% | $0.0004 | 2.5s | |
| DeepSeek V3.1 | 89% | $0.0008 | 37.0s | |
| Grok 4.20 (Beta) | 95% | $0.0047 | 2.6s | |
| Gemini 3.1 Flash Lite (Reasoning) | 93% | $0.0013 | 2.5s | |
| DeepSeek V3.2 | 96% | $0.0008 | 52.0s | |
| Gemini 3.5 Flash (Reasoning, Minimal) | 96% | $0.0080 | 3.5s | |
| Claude Haiku 4.5 | 95% | $0.0050 | 5.3s | |
| DeepSeek V4 Flash (Reasoning) | 94% | $0.0009 | 3.2m | |
| Stealth: Hunter Alpha | 87% | $0.0000 | 33.4s | |
| DeepSeek V4 Pro | 93% | $0.0019 | 19.3s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.5 | 97% | 99% | 97% | |
| Claude Opus 4.6 | 98% | 99% | 96% | |
| GPT-5 | 97% | 99% | 96% | |
| Gemma 4 31B | 98% | 98% | 96% | |
| Claude Sonnet 4 | 97% | 99% | 96% | |
| GPT-5.4 (Reasoning) | 97% | 99% | 96% | |
| GPT-5.5 (Reasoning) | 97% | 99% | 96% | |
| Qwen 3.5 27B | 97% | 99% | 96% | |
| Grok 4.20 (Reasoning) | 97% | 98% | 96% | |
| Z.AI GLM 5 Turbo | 96% | 99% | 96% | |
| GPT-5.4 (Reasoning, Low) | 97% | 99% | 95% | |
| Z.AI GLM 5.1 | 97% | 99% | 95% | |
| Gemma 4 31B (Reasoning) | 97% | 98% | 95% | |
| Qwen 3.5 Plus (2026-04-20) | 97% | 98% | 95% | |
| GPT-5.5 (Reasoning, Low) | 97% | 98% | 95% | |
| Gemini 3.5 Flash (Reasoning) | 97% | 98% | 95% | |
| Qwen 3.5 397B A17B | 97% | 98% | 95% | |
| Gemini 3 Pro (Preview) | 96% | 99% | 95% | |
| Gemini 3.1 Pro (Preview) | 96% | 99% | 95% | |
| Z.AI GLM 4.7 | 96% | 99% | 95% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Qwen 3.5 Plus (2026-02-15) | 96% | $0.0021 | 10.9s | 94% | |
| Grok 4.1 Fast | 96% | $0.0014 | 16.8s | 94% | |
| Gemini 3.5 Flash (Reasoning, Minimal) | 96% | $0.0080 | 3.5s | 94% | |
| Gemini 2.5 Flash | 95% | $0.0021 | 2.9s | 92% | |
| Gemma 4 31B | 98% | $0.0005 | 39.7s | 96% | |
| Gemini 3.1 Flash Lite | 94% | $0.0013 | 2.5s | 92% | |
| Gemma 4 26B | 95% | $0.0003 | 23.2s | 94% | |
| GPT-5.4 (Reasoning, Low) | 97% | $0.013 | 8.3s | 95% | |
| Gemini 3 Flash (Preview) | 94% | $0.0026 | 4.3s | 92% | |
| Gemini 3.1 Flash Lite (Preview) | 93% | $0.0013 | 2.4s | 91% | |
| Claude Sonnet 4 | 97% | $0.015 | 9.5s | 96% | |
| Grok 4.20 (Beta) | 95% | $0.0047 | 2.6s | 91% | |
| Gemini 3.1 Flash Lite (Reasoning) | 93% | $0.0013 | 2.5s | 90% | |
| Claude Haiku 4.5 | 95% | $0.0050 | 5.3s | 90% | |
| Claude Sonnet 4.5 | 96% | $0.015 | 7.4s | 93% | |
| Claude Opus 4.5 | 97% | $0.025 | 8.2s | 97% | |
| Grok 4.20 | 93% | $0.0029 | 6.1s | 90% | |
| Claude Opus 4.6 | 98% | $0.025 | 8.7s | 96% | |
| DeepSeek V4 Pro | 93% | $0.0019 | 19.3s | 91% | |
| Mistral Large 3 | 91% | $0.0015 | 10.5s | 90% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 92.9% | Dialogue content preserved | ||
| 100.0% | No hallucinated or fabricated content | ||
| 87.5% | Non-passive narration preserved | ||
| 78.0% | Passive → active voice transformations | ||
| 100.0% | Structural similarity to original |
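The page does not show how the "Passive → active voice transformations" evaluator finds passive constructions. As a rough illustration only, a crude regex heuristic can flag likely passives left in an output: a form of "to be" followed by a word ending in -ed or -en. The pattern and the function name are assumptions. It misses irregular participles and can produce false positives (for example "was seven").

```python
import re

# Crude heuristic: a form of "to be", an optional -ly adverb, then a word
# ending in -ed or -en. Irregular participles ("was sung") are not caught.
_PASSIVE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?\w+(?:ed|en)\b",
    re.IGNORECASE,
)


def likely_passives(text: str) -> list[str]:
    """Return the substrings that look like passive-voice verb phrases."""
    return [m.group(0) for m in _PASSIVE.finditer(text)]


print(likely_passives("The door was opened by Mara."))
print(likely_passives("Mara opened the door."))
```

A real evaluator would more plausibly compare against a list of expected rewrites per sentence, which fits the description of each expected change being scored independently.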
Avoid said/asked/replied/answered
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | |
| Mistral Small Creative | 98% | $0.0002 | 3.0s | |
| Stealth: Healer Alpha | 98% | $0.0000 | 5.3s | |
| Mistral Small 4 | 100% | $0.0004 | 2.7s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.8s | |
| DeepSeek V4 Flash | 95% | $0.0002 | 8.4s | |
| Inception Mercury 2 | 100% | $0.0008 | 1.1s | |
| Gemma 3 12B | 100% | $0.0001 | 8.8s | |
| Stealth: Hunter Alpha | 100% | $0.0000 | 39.4s | |
| Gemma 4 26B | 100% | $0.0002 | 24.5s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.9s | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 2.9s | |
| Grok 4 Fast | 100% | $0.0006 | 5.3s | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0006 | 4.9s | |
| GPT-4o Mini (temp=1) | 100% | $0.0004 | 8.5s | |
| GPT-4o Mini (temp=0) | 100% | $0.0004 | 9.0s | |
| Qwen 2.5 72B | 98% | $0.0003 | 9.8s | |
| Inception Mercury | 92% | $0.0004 | 4.0s | |
| Gemma 3 27B | 100% | $0.0002 | 14.1s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | 100% | |
| Inception Mercury 2 | 100% | $0.0008 | 1.1s | 100% | |
| Mistral Small 4 | 100% | $0.0004 | 2.7s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.9s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.8s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 2.9s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0006 | 4.9s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0014 | 2.1s | 100% | |
| Grok 4 Fast | 100% | $0.0006 | 5.3s | 100% | |
| Mistral Medium 3.1 | 100% | $0.0012 | 4.7s | 100% | |
| Gemma 3 12B | 100% | $0.0001 | 8.8s | 100% | |
| GPT-4.1 Mini | 100% | $0.0010 | 6.2s | 100% | |
| GPT-4o Mini (temp=1) | 100% | $0.0004 | 8.5s | 100% | |
| Gemini 3 Flash (Preview) | 100% | $0.0018 | 3.8s | 100% | |
| Grok 4.20 | 100% | $0.0018 | 4.0s | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0004 | 9.0s | 100% | |
| Mistral Large 3 | 100% | $0.0010 | 7.2s | 100% | |
| GPT-5.4 Mini | 100% | $0.0027 | 1.8s | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0015 | 7.2s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Forbidden words eliminated | ||
| 100.0% | Non-name text preserved | ||
| 100.0% | Structural similarity to original |
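The "Forbidden words eliminated" evaluator presumably verifies that none of said/asked/replied/answered survives in the output. The implementation is not shown on the page. A hypothetical whole-word check, with an assumed scoring rule (fraction of the original's forbidden words that were removed), could look like this:

```python
import re

FORBIDDEN = ("said", "asked", "replied", "answered")  # taken from the test title

# \b restricts matches to whole words, so "unasked" is not flagged.
_PATTERN = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)


def forbidden_hits(text: str) -> list[str]:
    """Return every forbidden word found in text, lowercased, in order."""
    return [m.group(1).lower() for m in _PATTERN.finditer(text)]


def forbidden_score(output: str, original: str) -> float:
    """Assumed metric: share of the original's forbidden words that are gone."""
    before = len(forbidden_hits(original))
    if before == 0:
        return 1.0
    after = len(forbidden_hits(output))
    return max(0.0, 1.0 - after / before)


print(forbidden_hits('"Go," she said. He Asked why.'))
```

Case-insensitive matching matters here because a dialogue tag can open a sentence ("Said the old man...").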
Specific Prompt
Character rename: Elena->Mirabel, Gregor->Aldric
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Ministral 3B | 100% | $0.0000 | 2.0s | |
| Mistral NeMO | 100% | $0.0002 | 1.6s | |
| Ministral 3 3B | 100% | $0.0001 | 2.3s | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | |
| Ministral 8B | 100% | $0.0001 | 3.4s | |
| Ministral 3 8B | 100% | $0.0002 | 3.2s | |
| Mistral Small Creative | 100% | $0.0002 | 3.0s | |
| Gemma 3 4B | 100% | $0.0001 | 5.2s | |
| Llama 3.1 8B | 100% | $0.0000 | 9.0s | |
| GPT-4.1 Nano | 100% | $0.0003 | 3.6s | |
| Mistral Small 4 | 100% | $0.0004 | 2.9s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.6s | |
| Ministral 3 14B | 95% | $0.0002 | 4.3s | |
| Inception Mercury | 100% | $0.0004 | 4.1s | |
| DeepSeek V4 Flash | 100% | $0.0002 | 6.8s | |
| Llama 3.1 70B | 100% | $0.0004 | 11.0s | |
| Gemma 3 12B | 100% | $0.0001 | 8.4s | |
| DeepSeek V4 Flash (Reasoning) | 100% | $0.0002 | 9.7s | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0007 | 2.9s | |
| GPT-5.4 Nano | 100% | $0.0007 | 2.9s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Mistral NeMO | 100% | $0.0002 | 1.6s | 100% | |
| Ministral 3B | 100% | $0.0000 | 2.0s | 100% | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | 100% | |
| Ministral 3 3B | 100% | $0.0001 | 2.3s | 100% | |
| Mistral Small Creative | 100% | $0.0002 | 3.0s | 100% | |
| Ministral 3 8B | 100% | $0.0002 | 3.2s | 100% | |
| Ministral 8B | 100% | $0.0001 | 3.4s | 100% | |
| Mistral Small 4 | 100% | $0.0004 | 2.9s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.8s | 100% | |
| GPT-4.1 Nano | 100% | $0.0003 | 3.6s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 1.8s | 100% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0007 | 2.7s | 100% | |
| GPT-5.4 Nano | 100% | $0.0007 | 2.9s | 100% | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0007 | 2.9s | 100% | |
| Inception Mercury | 100% | $0.0004 | 4.1s | 100% | |
| Inception Mercury 2 | 100% | $0.0013 | 1.5s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.6s | 100% | |
| Gemma 3 4B | 100% | $0.0001 | 5.2s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 2.9s | 100% | |
| Grok 4 Fast | 100% | $0.0005 | 4.3s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Name replacement accuracy | ||
| 100.0% | No remaining old names | ||
| 100.0% | Non-name text preserved |
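For the rename test, the "No remaining old names" and "Name replacement accuracy" evaluators can be approximated with whole-word counts. The sketch below is an assumption about how such checks might work, not the benchmark's actual code. The mapping comes from the test title, while the function names and the accuracy formula are hypothetical.

```python
import re

RENAMES = {"Elena": "Mirabel", "Gregor": "Aldric"}  # taken from the test title


def _count(name: str, text: str) -> int:
    """Whole-word occurrences of name; possessives like "Elena's" also count."""
    return len(re.findall(rf"\b{re.escape(name)}\b", text))


def remaining_old_names(output: str) -> dict[str, int]:
    """Count occurrences of each old name left in the output."""
    return {old: _count(old, output) for old in RENAMES}


def replacement_accuracy(source: str, output: str) -> float:
    """Assumed metric: new-name count in output vs. old-name count in source."""
    expected = found = 0
    for old, new in RENAMES.items():
        n_old = _count(old, source)
        expected += n_old
        found += min(n_old, _count(new, output))
    return found / expected if expected else 1.0


src = "Elena met Gregor. Elena's hand shook."
out = "Mirabel met Aldric. Elena's hand shook."
print(remaining_old_names(out), replacement_accuracy(src, out))
```

The example output misses the possessive form, which is the kind of partial failure that a per-occurrence score exposes and an all-or-nothing check would hide.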
Location rename: market square, outer ring, bridge, northern mines
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Ministral 3B | 100% | $0.0000 | 1.9s | |
| Ministral 3 3B | 100% | $0.0001 | 1.9s | |
| Ministral 8B | 100% | $0.0001 | 3.2s | |
| Ministral 3 8B | 100% | $0.0002 | 3.1s | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.5s | |
| Gemma 3 4B | 100% | $0.0001 | 5.8s | |
| Mistral Small Creative | 100% | $0.0002 | 3.0s | |
| Llama 3.1 8B | 100% | $0.0001 | 9.7s | |
| Ministral 3 14B | 100% | $0.0002 | 3.8s | |
| Stealth: Healer Alpha | 100% | $0.0000 | 12.4s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.5s | |
| Gemma 3 12B | 100% | $0.0001 | 8.5s | |
| DeepSeek V4 Flash | 100% | $0.0002 | 8.2s | |
| Stealth: Hunter Alpha | 100% | $0.0000 | 17.9s | |
| Mistral Small 4 | 100% | $0.0004 | 2.8s | |
| Gemma 4 26B | 100% | $0.0003 | 15.4s | |
| Rocinante 12B | 99% | $0.0003 | 8.4s | |
| Grok 4 Fast | 100% | $0.0006 | 3.6s | |
| Inception Mercury | 100% | $0.0004 | 5.5s | |
| Qwen 2.5 72B | 100% | $0.0003 | 10.4s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Ministral 3B | 100% | $0.0000 | 1.9s | 100% | |
| Ministral 3 3B | 100% | $0.0001 | 1.9s | 100% | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.5s | 100% | |
| Ministral 8B | 100% | $0.0001 | 3.2s | 100% | |
| Ministral 3 8B | 100% | $0.0002 | 3.1s | 100% | |
| Mistral Small Creative | 100% | $0.0002 | 3.0s | 100% | |
| Mistral Small 4 | 100% | $0.0004 | 2.8s | 100% | |
| Ministral 3 14B | 100% | $0.0002 | 3.8s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.5s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.6s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.7s | 100% | |
| Gemma 3 4B | 100% | $0.0001 | 5.8s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 1.9s | 100% | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0007 | 2.8s | 100% | |
| Grok 4 Fast | 100% | $0.0006 | 3.6s | 100% | |
| GPT-5.4 Nano | 100% | $0.0007 | 3.0s | 100% | |
| Inception Mercury | 100% | $0.0004 | 5.5s | 100% | |
| Grok 4.1 Fast | 100% | $0.0005 | 5.3s | 100% | |
| Claude 3 Haiku | 100% | $0.0009 | 4.0s | 100% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0009 | 4.3s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Name replacement accuracy | ||
| 100.0% | No remaining old names | ||
| 100.0% | Non-name text preserved |
Expand all contractions
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Ministral 8B | 100% | $0.0001 | 2.4s | |
| Ministral 3 8B | 100% | $0.0001 | 3.0s | |
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.3s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 3.5s | |
| Mistral Small 4 | 100% | $0.0003 | 2.6s | |
| Gemma 3 12B | 100% | $0.0001 | 5.9s | |
| DeepSeek V4 Flash | 100% | $0.0001 | 7.1s | |
| Mistral NeMO | 98% | $0.0001 | 1.8s | |
| LFM2 24B | 100% | $0.0001 | 8.1s | |
| Qwen 2.5 72B | 100% | $0.0002 | 8.5s | |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 7.1s | |
| Gemini 3.1 Flash Lite | 100% | $0.0008 | 1.9s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0008 | 1.6s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0008 | 1.7s | |
| Llama 3.1 8B | 99% | $0.0000 | 7.8s | |
| Arcee AI: Trinity Mini | 99% | $0.0002 | 4.6s | |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 7.6s | |
| Z.AI GLM 4.5 Air | 100% | $0.0012 | 28.2s | |
| GPT-4.1 Nano | 99% | $0.0002 | 3.1s | |
| Qwen3 235B A22B Instruct 2507 | 100% | $0.0003 | 9.9s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Ministral 8B | 100% | $0.0001 | 2.4s | 100% | |
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.3s | 100% | |
| Ministral 3 8B | 100% | $0.0001 | 3.0s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 3.5s | 100% | |
| Mistral Small 4 | 100% | $0.0003 | 2.6s | 100% | |
| DeepSeek V4 Flash | 100% | $0.0001 | 7.1s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0008 | 1.7s | 100% | |
| LFM2 24B | 100% | $0.0001 | 8.1s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0008 | 1.9s | 100% | |
| Gemma 3 12B | 100% | $0.0001 | 5.9s | 99% | |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 7.1s | 100% | |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 7.6s | 100% | |
| Qwen 2.5 72B | 100% | $0.0002 | 8.5s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0008 | 1.6s | 99% | |
| Qwen3 235B A22B Instruct 2507 | 100% | $0.0003 | 9.9s | 100% | |
| GPT-4.1 Mini | 100% | $0.0008 | 4.7s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0012 | 1.7s | 100% | |
| Mistral Medium 3.1 | 100% | $0.0010 | 4.3s | 100% | |
| Claude 3 Haiku | 100% | $0.0007 | 3.7s | 99% | |
| Mistral Large 3 | 100% | $0.0009 | 5.9s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Name replacement accuracy | ||
| 100.0% | Non-name text preserved | ||
| 100.0% | Possessive traps preserved |
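The "Possessive traps preserved" evaluator suggests the test text contains possessives ("'s") that must not be expanded as if they were contractions. A minimal sketch of such a check follows. The trap phrases are invented for illustration, and the function names and regex are assumptions rather than the benchmark's implementation.

```python
import re

# Hypothetical trap list: possessives that must survive contraction expansion.
POSSESSIVE_TRAPS = ["Elena's lantern", "the baker's stall"]

# Unambiguous contraction suffixes only (n't, 're, 've, 'll, 'd, 'm).
# "'s" is deliberately excluded because it may be a possessive.
_CONTRACTION = re.compile(r"\b\w+['’](?:t|re|ve|ll|d|m)\b", re.IGNORECASE)


def leftover_contractions(output: str) -> list[str]:
    """Contractions with unambiguous suffixes that are still in the output."""
    return _CONTRACTION.findall(output)


def traps_preserved(output: str) -> float:
    """Fraction of possessive traps that still appear verbatim in the output."""
    kept = sum(1 for trap in POSSESSIVE_TRAPS if trap in output)
    return kept / len(POSSESSIVE_TRAPS)


out = "She did not see Elena's lantern, but they're near the baker's stall."
print(leftover_contractions(out), traps_preserved(out))
```

Telling "it's" (it is) apart from a possessive such as "Elena's" needs grammatical context, which is why a fixed list of expected phrases is the simpler design for a deterministic test.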
Tense rewriting: past to present
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Ministral 3B | 99% | $0.0000 | 1.9s | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | |
| Mistral Small Creative | 100% | $0.0002 | 2.9s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.6s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | |
| Llama 3.1 8B | 100% | $0.0001 | 10.6s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.8s | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 2.0s | |
| Gemma 4 26B | 100% | $0.0003 | 18.6s | |
| DeepSeek V4 Flash | 100% | $0.0002 | 9.8s | |
| Claude 3 Haiku | 100% | $0.0009 | 4.5s | |
| Gemini 2.5 Flash | 100% | $0.0015 | 2.1s | |
| Gemma 3 12B | 100% | $0.0001 | 8.4s | |
| Mistral Medium 3.1 | 100% | $0.0013 | 5.3s | |
| GPT-4.1 Mini | 100% | $0.0011 | 6.5s | |
| Qwen 2.5 72B | 100% | $0.0003 | 10.7s | |
| Mistral Small 4 | 99% | $0.0004 | 5.1s | |
| Ministral 3 14B | 99% | $0.0002 | 3.9s | |
| Mistral Large 3 | 100% | $0.0011 | 7.8s | |
| Grok 4.20 | 100% | $0.0018 | 4.3s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.6s | 100% | |
| Mistral Small Creative | 100% | $0.0002 | 2.9s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.8s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 2.0s | 100% | |
| Claude 3 Haiku | 100% | $0.0009 | 4.5s | 100% | |
| Gemma 3 12B | 100% | $0.0001 | 8.4s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0015 | 2.1s | 100% | |
| DeepSeek V4 Flash | 100% | $0.0002 | 9.8s | 100% | |
| Ministral 3 14B | 99% | $0.0002 | 3.9s | 99% | |
| Llama 3.1 8B | 100% | $0.0001 | 10.6s | 100% | |
| Mistral Medium 3.1 | 100% | $0.0013 | 5.3s | 100% | |
| GPT-4.1 Mini | 100% | $0.0011 | 6.5s | 100% | |
| Qwen 2.5 72B | 100% | $0.0003 | 10.7s | 100% | |
| Mistral Large 3 | 100% | $0.0011 | 7.8s | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0015 | 6.4s | 100% | |
| Grok 4.20 | 100% | $0.0018 | 4.3s | 100% | |
| Mistral NeMO | 99% | $0.0002 | 2.7s | 99% | |
| GPT-5.4 Mini | 100% | $0.0027 | 2.3s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Dialogue content preserved | ||
| 100.0% | Name replacement accuracy | ||
| 100.0% | Non-name text preserved |
POV shift: 3rd person to 1st person (Elena's perspective)
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Ministral 3B | 99% | $0.0000 | 2.2s | |
| Mistral NeMO | 100% | $0.0002 | 2.0s | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | |
| Ministral 3 3B | 100% | $0.0001 | 2.4s | |
| Ministral 8B | 100% | $0.0001 | 3.5s | |
| Mistral Small Creative | 100% | $0.0002 | 3.2s | |
| Ministral 3 8B | 100% | $0.0002 | 3.0s | |
| Mistral Small 4 | 100% | $0.0004 | 3.3s | |
| Ministral 3 14B | 100% | $0.0002 | 3.8s | |
| GPT-4.1 Nano | 100% | $0.0003 | 3.5s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.8s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.7s | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 3.1s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.8s | |
| Gemma 3 4B | 100% | $0.0001 | 6.4s | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0009 | 3.5s | |
| Inception Mercury | 100% | $0.0005 | 3.9s | |
| Gemini 2.5 Flash | 100% | $0.0015 | 2.1s | |
| DeepSeek V4 Flash | 100% | $0.0002 | 6.5s | |
| Inception Mercury 2 | 100% | $0.0018 | 2.3s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | 100% | |
| Mistral NeMO | 100% | $0.0002 | 2.0s | 100% | |
| Ministral 3 8B | 100% | $0.0002 | 3.0s | 100% | |
| Mistral Small Creative | 100% | $0.0002 | 3.2s | 100% | |
| Ministral 8B | 100% | $0.0001 | 3.5s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.7s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.8s | 100% | |
| Mistral Small 4 | 100% | $0.0004 | 3.3s | 100% | |
| Ministral 3 14B | 100% | $0.0002 | 3.8s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.8s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 3.1s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0015 | 2.1s | 100% | |
| Ministral 3 3B | 100% | $0.0001 | 2.4s | 99% | |
| Inception Mercury 2 | 100% | $0.0018 | 2.3s | 100% | |
| Gemma 3 4B | 100% | $0.0001 | 6.4s | 100% | |
| DeepSeek V4 Flash | 100% | $0.0002 | 6.5s | 100% | |
| Inception Mercury | 100% | $0.0005 | 3.9s | 99% | |
| Gemini 3 Flash (Preview) | 100% | $0.0018 | 3.1s | 100% | |
| GPT-5.4 Mini | 100% | $0.0027 | 1.9s | 100% | |
| Llama 3.1 8B | 100% | $0.0000 | 8.6s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Name replacement accuracy | ||
| 100.0% | No remaining old names | ||
| 100.0% | Non-name text preserved |
Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Ministral 3B | 99% | $0.0001 | 2.5s | |
| Ministral 3 3B | 100% | $0.0001 | 2.2s | |
| Mistral NeMO | 100% | $0.0002 | 3.6s | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.9s | |
| Ministral 3 8B | 100% | $0.0002 | 3.9s | |
| Ministral 8B | 100% | $0.0001 | 3.8s | |
| Mistral Small Creative | 100% | $0.0003 | 3.2s | |
| Ministral 3 14B | 100% | $0.0003 | 4.6s | |
| GPT-4.1 Nano | 95% | $0.0003 | 3.9s | |
| Mistral Small 4 | 100% | $0.0005 | 3.3s | |
| Mistral Small 3.2 24B | 86% | $0.0003 | 4.9s | |
| Gemma 3 4B | 100% | $0.0001 | 6.6s | |
| DeepSeek V4 Flash | 100% | $0.0002 | 9.5s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0011 | 1.9s | |
| Gemini 3.1 Flash Lite | 100% | $0.0011 | 1.9s | |
| GPT-5.4 Nano | 100% | $0.0009 | 3.3s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0011 | 2.0s | |
| Llama 3.1 8B | 100% | $0.0000 | 13.0s | |
| Qwen 2.5 72B | 100% | $0.0003 | 11.4s | |
| Claude 3 Haiku | 100% | $0.0011 | 5.2s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% | |
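The stability ranking above is described as median × consistency. A minimal sketch of that arithmetic follows, assuming both inputs are percentages on a 0 to 100 scale. The function name `stability` and the rounding to whole percent are illustrative assumptions, not taken from the benchmark's own code.

```python
def stability(median_pct: float, consistency_pct: float) -> float:
    """Return median × consistency as a percentage (both inputs are 0-100)."""
    return median_pct * consistency_pct / 100.0


# The tables show rounded values, so recomputing from them can be off by a
# point compared with a result computed from the unrounded inputs.
print(round(stability(100.0, 100.0)))  # 100
print(round(stability(99.0, 99.0)))    # 98
```

Because the product can only be as high as its smaller factor, a model with a perfect median still drops in this ranking if its runs disagree with each other.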
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Ministral 3 3B | 100% | $0.0001 | 2.2s | 100% | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.9s | 100% | |
| Mistral Small Creative | 100% | $0.0003 | 3.2s | 100% | |
| Ministral 8B | 100% | $0.0001 | 3.8s | 100% | |
| Mistral NeMO | 100% | $0.0002 | 3.6s | 100% | |
| Ministral 3 8B | 100% | $0.0002 | 3.9s | 100% | |
| Mistral Small 4 | 100% | $0.0005 | 3.3s | 100% | |
| Ministral 3 14B | 100% | $0.0003 | 4.6s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0011 | 1.9s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0011 | 2.0s | 100% | |
| Gemma 3 4B | 100% | $0.0001 | 6.6s | 100% | |
| GPT-5.4 Nano | 100% | $0.0009 | 3.3s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0017 | 2.4s | 100% | |
| Claude 3 Haiku | 100% | $0.0011 | 5.2s | 100% | |
| DeepSeek V4 Flash | 100% | $0.0002 | 9.5s | 100% | |
| Grok 4 Fast | 100% | $0.0010 | 7.0s | 100% | |
| Qwen 2.5 72B | 100% | $0.0003 | 11.4s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0011 | 1.9s | 98% | |
| Gemini 3 Flash (Preview) | 100% | $0.0022 | 3.6s | 100% | |
| Grok 4.20 | 100% | $0.0024 | 5.0s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Dialogue content preserved | ||
| 100.0% | Mara pronouns preserved (coreference test) | ||
| 100.0% | Name replacement accuracy | ||
| 100.0% | Non-name text preserved | ||
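The name-related evaluators above could be approximated with whole-word counts. The sketch below is an assumption about how such a check might look, not the benchmark's actual evaluator: `count_word` and `check_name_swap` are hypothetical helpers, and pronoun coreference (the harder part of this test) is deliberately left out.

```python
import re


def count_word(text: str, word: str) -> int:
    """Count whole-word occurrences of `word` (possessives like "Priya's" count)."""
    return len(re.findall(rf"\b{re.escape(word)}\b", text))


def check_name_swap(original: str, rewritten: str,
                    old: str = "Priya", new: str = "Rohan",
                    untouched: str = "Mara") -> dict:
    """Run three independent checks on a rewritten passage."""
    return {
        # The old name must be gone entirely.
        "old_name_removed": count_word(rewritten, old) == 0,
        # Every former mention of the old name should now be the new name.
        "new_name_count_matches":
            count_word(rewritten, new) == count_word(original, old),
        # The character who is not swapped keeps the same number of mentions.
        "untouched_name_preserved":
            count_word(rewritten, untouched) == count_word(original, untouched),
    }


result = check_name_swap(
    "Priya waved at Mara. Priya's coat was red.",
    "Rohan waved at Mara. Rohan's coat was red.",
)
print(result)
```

Each check is reported separately, which mirrors the page's description of scoring every expected change independently.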
Combined: 3rd person past → 1st person present
Performance Score Distribution (Top 20)
| Model | Score | |
|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | |
| GPT-5.4 (Reasoning) | 100% | |
| GPT-5 Mini | 100% | |
| Claude Opus 4.6 | 100% | |
| Claude Opus 4.5 | 100% | |
| GPT-4.1 | 100% | |
| Claude Opus 4 | 100% | |
| Gemma 4 31B | 100% | |
| GPT-4o, Aug. 6th (temp=0) | 100% | |
| Gemini 2.5 Flash | 100% | |
| Qwen3.7 Max | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | |
| GPT-5 | 100% | |
| Aion 2.0 | 100% | |
| Z.AI GLM 4.6 | 100% | |
| Claude Sonnet 4 | 100% | |
| Z.AI GLM 4.5 | 100% | |
| Gemma 4 26B | 100% | |
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 99% | $0.0003 | 1.6s | |
| Gemini 2.5 Flash | 100% | $0.0015 | 2.0s | |
| Gemma 4 26B | 100% | $0.0002 | 13.3s | |
| Gemini 3.1 Flash Lite (Preview) | 99% | $0.0009 | 1.8s | |
| Gemini 3.1 Flash Lite | 99% | $0.0009 | 1.8s | |
| Gemini 3.1 Flash Lite (Reasoning) | 99% | $0.0009 | 8.0s | |
| Qwen3 235B A22B Instruct 2507 | 99% | $0.0003 | 8.6s | |
| Mistral NeMO | 98% | $0.0002 | 2.6s | |
| Ministral 8B | 97% | $0.0001 | 2.9s | |
| Mistral Small Creative | 99% | $0.0002 | 2.8s | |
| Ministral 3 8B | 99% | $0.0002 | 3.1s | |
| Ministral 3 14B | 99% | $0.0003 | 3.8s | |
| Mistral Small 3.2 24B | 99% | $0.0002 | 4.3s | |
| Mistral Medium 3.1 | 99% | $0.0013 | 7.1s | |
| Mistral Small 4 | 97% | $0.0004 | 2.8s | |
| GPT-4.1 Mini | 99% | $0.0011 | 6.4s | |
| Stealth: Hunter Alpha | 99% | $0.0000 | 13.6s | |
| Llama 3.1 8B | 98% | $0.0001 | 9.4s | |
| Mistral Large 3 | 99% | $0.0011 | 7.8s | |
| DeepSeek V4 Flash | 98% | $0.0002 | 6.5s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.5 Flash (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| Claude Opus 4.5 | 100% | 100% | 100% | |
| GPT-4.1 | 100% | 100% | 100% | |
| Claude Opus 4 | 100% | 100% | 100% | |
| Gemma 4 31B | 100% | 100% | 100% | |
| GPT-4o, Aug. 6th (temp=0) | 100% | 100% | 100% | |
| Gemini 2.5 Flash | 100% | 100% | 100% | |
| Qwen3.7 Max | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Aion 2.0 | 100% | 100% | 100% | |
| Z.AI GLM 4.6 | 100% | 100% | 100% | |
| Claude Sonnet 4 | 100% | 100% | 100% | |
| Z.AI GLM 4.5 | 100% | 100% | 100% | |
| Gemma 4 26B | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash | 100% | $0.0015 | 2.0s | 100% | |
| Gemini 2.5 Flash Lite | 99% | $0.0003 | 1.6s | 99% | |
| Gemini 3.1 Flash Lite | 99% | $0.0009 | 1.8s | 99% | |
| Gemini 3.1 Flash Lite (Preview) | 99% | $0.0009 | 1.8s | 99% | |
| Qwen3 235B A22B Instruct 2507 | 99% | $0.0003 | 8.6s | 99% | |
| Mistral Small Creative | 99% | $0.0002 | 2.8s | 99% | |
| Ministral 3 8B | 99% | $0.0002 | 3.1s | 99% | |
| Gemma 4 26B | 100% | $0.0002 | 13.3s | 100% | |
| Ministral 3 14B | 99% | $0.0003 | 3.8s | 99% | |
| Mistral Small 3.2 24B | 99% | $0.0002 | 4.3s | 99% | |
| Gemini 3.1 Flash Lite (Reasoning) | 99% | $0.0009 | 8.0s | 99% | |
| Mistral Large 3 | 99% | $0.0011 | 7.8s | 99% | |
| Stealth: Hunter Alpha | 99% | $0.0000 | 13.6s | 99% | |
| Mistral Medium 3.1 | 99% | $0.0013 | 7.1s | 99% | |
| Qwen 2.5 72B | 99% | $0.0003 | 10.1s | 99% | |
| DeepSeek-V2 Chat | 99% | $0.0008 | 14.6s | 99% | |
| Cydonia 24B V4.1 | 99% | $0.0005 | 10.4s | 98% | |
| Grok 4.3 | 99% | $0.0022 | 4.8s | 98% | |
| GPT-4.1 | 100% | $0.0055 | 3.8s | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 99% | $0.0015 | 6.6s | 98% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Dialogue content preserved | ||
| 98.2% | Name replacement accuracy | ||
| 100.0% | Non-name text preserved | ||
Passive voice → active voice
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 3 Flash (Preview) | 99% | $0.0027 | 4.4s | |
| DeepSeek V4 Flash | 98% | $0.0003 | 11.7s | |
| Grok 4 Fast | 98% | $0.0015 | 12.2s | |
| Gemini 3.5 Flash (Reasoning, Minimal) | 99% | $0.0082 | 3.4s | |
| Grok 4.20 | 97% | $0.0029 | 6.9s | |
| Grok 4.20 (Beta) | 98% | $0.0048 | 2.4s | |
| Stealth: Healer Alpha | 98% | $0.0000 | 31.8s | |
| Gemini 2.5 Flash | 97% | $0.0021 | 3.1s | |
| Qwen 3.5 Plus (2026-02-15) | 97% | $0.0022 | 9.5s | |
| Grok 4.1 Fast | 98% | $0.0020 | 28.2s | |
| Gemini 3.1 Flash Lite (Preview) | 95% | $0.0014 | 2.4s | |
| Gemma 4 31B | 99% | $0.0004 | 48.7s | |
| Gemini 2.5 Flash Lite (Reasoning) | 95% | $0.0031 | 25.5s | |
| Mistral Large 3 | 96% | $0.0016 | 10.4s | |
| DeepSeek V4 Pro | 96% | $0.0021 | 25.4s | |
| Gemini 3.1 Flash Lite | 95% | $0.0014 | 2.4s | |
| Claude Haiku 4.5 | 96% | $0.0052 | 6.2s | |
| DeepSeek V3.2 | 99% | $0.0006 | 54.4s | |
| Gemini 3.1 Flash Lite (Reasoning) | 95% | $0.0014 | 13.9s | |
| DeepSeek V3.1 | 90% | $0.0009 | 33.9s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 99% | 100% | 99% | |
| Gemini 3.5 Flash (Reasoning, Minimal) | 99% | 99% | 99% | |
| Claude Opus 4.6 | 98% | 100% | 98% | |
| GPT-5.5 (Reasoning) | 99% | 99% | 98% | |
| Grok 4.20 (Beta, Reasoning) | 99% | 99% | 98% | |
| Grok 4 | 99% | 99% | 98% | |
| Claude Sonnet 4.6 | 98% | 99% | 98% | |
| Claude Opus 4.7 | 99% | 99% | 98% | |
| Gemma 4 31B | 99% | 99% | 98% | |
| Gemini 2.5 Pro | 99% | 98% | 97% | |
| Claude Sonnet 4.5 | 99% | 98% | 97% | |
| DeepSeek V3.2 | 99% | 98% | 97% | |
| Grok 4.20 (Reasoning) | 99% | 99% | 97% | |
| Gemini 3 Flash (Preview) | 99% | 99% | 97% | |
| Z.AI GLM 5 Turbo | 98% | 99% | 97% | |
| GPT-5.5 | 98% | 99% | 97% | |
| Z.AI GLM 5.1 | 99% | 98% | 97% | |
| Gemma 4 31B (Reasoning) | 99% | 98% | 97% | |
| GPT-5.5 (Reasoning, Low) | 99% | 99% | 97% | |
| DeepSeek V4 Flash | 98% | 99% | 97% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 3 Flash (Preview) | 99% | $0.0027 | 4.4s | 97% | |
| Gemini 3.5 Flash (Reasoning, Minimal) | 99% | $0.0082 | 3.4s | 99% | |
| DeepSeek V4 Flash | 98% | $0.0003 | 11.7s | 97% | |
| Grok 4 Fast | 98% | $0.0015 | 12.2s | 97% | |
| Grok 4.20 (Beta) | 98% | $0.0048 | 2.4s | 97% | |
| Grok 4.20 | 97% | $0.0029 | 6.9s | 97% | |
| Gemini 2.5 Flash | 97% | $0.0021 | 3.1s | 95% | |
| Qwen 3.5 Plus (2026-02-15) | 97% | $0.0022 | 9.5s | 95% | |
| Grok 4.1 Fast | 98% | $0.0020 | 28.2s | 97% | |
| Gemma 4 31B | 99% | $0.0004 | 48.7s | 98% | |
| Claude Sonnet 4.5 | 99% | $0.016 | 7.0s | 97% | |
| Mistral Large 3 | 96% | $0.0016 | 10.4s | 95% | |
| Stealth: Healer Alpha | 98% | $0.0000 | 31.8s | 95% | |
| Claude Sonnet 4.6 | 98% | $0.016 | 7.4s | 98% | |
| DeepSeek V3.2 | 99% | $0.0006 | 54.4s | 97% | |
| Gemini 3.1 Flash Lite | 95% | $0.0014 | 2.4s | 93% | |
| Gemini 3.1 Flash Lite (Preview) | 95% | $0.0014 | 2.4s | 92% | |
| DeepSeek V4 Pro | 96% | $0.0021 | 25.4s | 94% | |
| Mistral Large 2 | 96% | $0.0064 | 10.4s | 95% | |
| Claude Sonnet 4 | 98% | $0.016 | 9.2s | 96% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 97.1% | Dialogue content preserved | ||
| 100.0% | No hallucinated or fabricated content | ||
| 94.6% | Non-passive narration preserved | ||
| 87.9% | Passive → active voice transformations | ||
| 100.0% | Structural similarity to original | ||
Avoid said/asked/replied/answered
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.5s | |
| Gemma 3 4B | 98% | $0.0001 | 5.5s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 6.2s | |
| Mistral Small 4 | 99% | $0.0004 | 3.1s | |
| Gemma 3 12B | 98% | $0.0001 | 7.4s | |
| DeepSeek V4 Flash | 98% | $0.0002 | 9.5s | |
| Inception Mercury | 95% | $0.0004 | 3.4s | |
| Stealth: Hunter Alpha | 95% | $0.0000 | 11.2s | |
| Grok 4 Fast | 100% | $0.0008 | 6.0s | |
| Qwen 2.5 72B | 100% | $0.0003 | 10.4s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | |
| GPT-4o Mini (temp=1) | 100% | $0.0004 | 9.5s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.8s | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 2.6s | |
| GPT-4o Mini (temp=0) | 100% | $0.0004 | 9.7s | |
| Inception Mercury 2 | 100% | $0.0012 | 1.7s | |
| Claude 3 Haiku | 85% | $0.0008 | 4.3s | |
| Gemma 4 26B | 100% | $0.0003 | 14.4s | |
| Gemma 3 27B | 100% | $0.0002 | 17.0s | |
| Stealth: Healer Alpha | 100% | $0.0000 | 16.4s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Qwen3.7 Max | 100% | 100% | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.5s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.8s | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0009 | 2.6s | 100% | |
| Inception Mercury 2 | 100% | $0.0012 | 1.7s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 6.2s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0014 | 2.0s | 100% | |
| Grok 4 Fast | 100% | $0.0008 | 6.0s | 100% | |
| Mistral Medium 3.1 | 100% | $0.0012 | 5.1s | 100% | |
| Gemini 3 Flash (Preview) | 100% | $0.0018 | 3.2s | 100% | |
| GPT-4.1 Mini | 100% | $0.0010 | 6.3s | 100% | |
| GPT-4o Mini (temp=1) | 100% | $0.0004 | 9.5s | 100% | |
| Mistral Large 3 | 100% | $0.0011 | 7.2s | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0004 | 9.7s | 100% | |
| Grok 4.20 | 100% | $0.0019 | 4.4s | 100% | |
| Qwen 2.5 72B | 100% | $0.0003 | 10.4s | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0015 | 6.9s | 100% | |
| Grok 4.20 (Beta) | 100% | $0.0032 | 1.7s | 100% | |
| Claude Haiku 4.5 | 100% | $0.0034 | 2.7s | 100% | |
| Gemma 4 26B | 100% | $0.0003 | 14.4s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Forbidden words eliminated | ||
| 100.0% | Non-name text preserved | ||
| 100.0% | Structural similarity to original | ||
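A check like "Forbidden words eliminated" can be sketched as a whole-word scan for the four verbs named in this test. The scoring formula below (share of original occurrences removed) is an assumption made for illustration, as are the names `forbidden_hits` and `elimination_score`; the page does not state how the evaluator computes its percentage.

```python
import re

# The four dialogue tags this test asks models to avoid.
FORBIDDEN = ("said", "asked", "replied", "answered")


def forbidden_hits(text: str, forbidden=FORBIDDEN) -> list:
    """Return every forbidden word found as a whole word, case-insensitively."""
    alternatives = "|".join(re.escape(word) for word in forbidden)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return [match.group(0).lower() for match in pattern.finditer(text)]


def elimination_score(original: str, rewritten: str) -> float:
    """Fraction of the original's forbidden words that no longer appear (0.0-1.0)."""
    before = len(forbidden_hits(original))
    if before == 0:
        return 1.0  # nothing to eliminate
    after = len(forbidden_hits(rewritten))
    return max(0.0, 1.0 - after / before)


original = '"Hi," she said. "Why?" he asked.'
rewritten = '"Hi," she murmured. "Why?" he asked.'
print(forbidden_hits(rewritten))               # ['asked']
print(elimination_score(original, rewritten))  # 0.5
```

The word-boundary match keeps words such as "unasked" from counting as hits, which matters when the surrounding narration has to stay unchanged.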