Text Replacement
This test evaluates deterministic text transformations: renaming characters and locations, expanding contractions, rewriting tense, shifting point of view, swapping character genders, combining several transformations at once, and avoiding specific words. Each output is scored by checking every expected change independently.
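The independent-check scoring described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual grader: the helper name and the present/absent check lists are assumptions.

```python
def score_output(output: str, must_contain: list[str], must_not_contain: list[str]) -> float:
    """Score one transformed text as the fraction of independent checks that pass.

    Hypothetical sketch: each expected change contributes one boolean check,
    e.g. the new name must appear and the old name must be gone.
    """
    checks = [frag in output for frag in must_contain]
    checks += [frag not in output for frag in must_not_contain]
    return sum(checks) / len(checks) if checks else 0.0
```

For the character-rename task, `score_output("Mirabel met Aldric.", ["Mirabel", "Aldric"], ["Elena", "Gregor"])` would score 1.0, while an output that missed one rename would lose only that check rather than failing outright.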
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| Gemini 2.5 Flash Lite | 97% | $0.0003 | 1.7s | |
| Gemini 3.1 Flash Lite (Preview) | 99% | $0.0010 | 1.8s | |
| Mistral Small 4 | 96% | $0.0004 | 3.3s | |
| Mistral Small 3.2 24B | 97% | $0.0002 | 5.0s | |
| Gemini 2.5 Flash | 99% | $0.0015 | 2.2s | |
| Grok 4 Fast | 99% | $0.0008 | 6.5s | |
| Gemini 3 Flash (Preview) | 99% | $0.0019 | 3.4s | |
| GPT-4.1 Mini | 98% | $0.0011 | 7.0s | |
| Inception Mercury 2 | 95% | $0.0017 | 2.3s | |
| Mistral Large 3 | 98% | $0.0011 | 7.7s | |
| Gemma 3 12B | 95% | $0.0001 | 9.0s | |
| Qwen 3.5 Plus (2026-02-15) | 99% | $0.0015 | 7.2s | |
| Qwen 2.5 72B | 98% | $0.0003 | 10.9s | |
| GPT-4o Mini (temp=1) | 95% | $0.0004 | 9.5s | |
| Grok 4.20 (Beta) | 98% | $0.0034 | 1.8s | |
| Claude Haiku 4.5 | 99% | $0.0036 | 3.2s | |
| Stealth: Hunter Alpha | 98% | $0.0000 | 19.5s | |
| Mistral Medium 3.1 | 97% | $0.0013 | 5.9s | |
| Grok 4.1 Fast | 99% | $0.0010 | 12.4s | |
| Mistral Small Creative | 96% | $0.0002 | 3.1s | |
Cost vs Performance
Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.
12 low-scoring outliers hidden: Ministral 3 8B (87.0%), Ministral 8B (86.7%), Mistral NeMO (86.6%), Arcee AI: Trinity Mini (85.7%), Nemotron 3 Nano (83.3%), Ministral 3 3B (81.2%), Ministral 3B (80.9%), Cohere Command R+ (Aug. 2024) (73.7%), LFM2 24B (71.7%), Hermes 3 70B (69.5%), Rocinante 12B (66.3%), Claude 3 Haiku (61.1%).
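The median-based quadrant split described above can be sketched in a few lines. The field names and the tie-breaking direction (median values counted as "cheap"/"high") are assumptions for illustration only.

```python
from statistics import median

def quadrants(models: list[dict]) -> dict[str, tuple[str, str]]:
    """Assign each model to a quadrant at the median cost and median score,
    mirroring the cost-vs-performance chart. Field names are illustrative."""
    cost_med = median(m["cost"] for m in models)
    score_med = median(m["score"] for m in models)
    return {
        m["name"]: (
            "cheap" if m["cost"] <= cost_med else "expensive",
            "high" if m["score"] >= score_med else "low",
        )
        for m in models
    }
```

With three models priced at $0.0003/$0.0015/$0.0036 and scoring 97/99/99, the medians fall at $0.0015 and 99%, so the cheapest-but-lowest-scoring model lands in the cheap/low quadrant.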
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| Claude Opus 4.6 | 100% | 99% | 99% | |
| Claude Opus 4.5 | 100% | 98% | 98% | |
| Claude Sonnet 4 | 100% | 98% | 98% | |
| Gemini 3 Pro (Preview) | 100% | 98% | 98% | |
| Claude Opus 4.6 (Reasoning) | 100% | 98% | 98% | |
| Claude Sonnet 4.5 | 100% | 98% | 98% | |
| Grok 4 | 100% | 98% | 98% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 98% | 98% | |
| Qwen 3.5 27B | 99% | 98% | 98% | |
| Z.AI GLM 5 | 99% | 98% | 98% | |
| Gemini 2.5 Pro | 99% | 97% | 97% | |
| Qwen 3.5 Plus (2026-02-15) | 99% | 97% | 97% | |
| Gemini 3 Flash (Preview, Reasoning) | 99% | 97% | 97% | |
| Z.AI GLM 4.7 | 99% | 97% | 97% | |
| Claude Sonnet 4.6 | 99% | 97% | 97% | |
| Grok 4.20 (Beta, Reasoning) | 99% | 97% | 97% | |
| Claude Haiku 4.5 | 99% | 97% | 97% | |
| Gemini 3 Flash (Preview) | 99% | 97% | 97% | |
| GPT-5 | 99% | 96% | 96% | |
| GPT-5.1 | 99% | 96% | 96% | |
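The stability ranking above multiplies a model's median score by its consistency. The page does not define consistency, so the sketch below assumes a simple proxy: one minus the spread of per-run scores. Only the median × consistency product is taken from the source.

```python
from statistics import median

def stability(run_scores: list[float]) -> float:
    """Stability = median score x consistency, for scores in [0, 1].

    Assumption: consistency is modeled here as 1 minus the score spread;
    the benchmark's actual consistency metric is not published on this page.
    """
    med = median(run_scores)
    consistency = 1.0 - (max(run_scores) - min(run_scores))
    return med * consistency
```

A model scoring 100% on every run gets stability 1.0; one run dipping to 90% drops the product to 0.9 under this proxy.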
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| Gemini 3.1 Flash Lite (Preview) | 99% | $0.0010 | 1.8s | 96% | |
| Gemini 3 Flash (Preview) | 99% | $0.0019 | 3.4s | 97% | |
| Qwen 3.5 Plus (2026-02-15) | 99% | $0.0015 | 7.2s | 97% | |
| Claude Haiku 4.5 | 99% | $0.0036 | 3.2s | 97% | |
| Grok 4 Fast | 99% | $0.0008 | 6.5s | 95% | |
| Gemini 2.5 Flash | 99% | $0.0015 | 2.2s | 92% | |
| GPT-4.1 Mini | 98% | $0.0011 | 7.0s | 94% | |
| Stealth: Healer Alpha | 99% | $0.0000 | 14.3s | 94% | |
| Grok 4.1 Fast | 99% | $0.0010 | 12.4s | 92% | |
| Mistral Medium 3.1 | 97% | $0.0013 | 5.9s | 91% | |
| Gemini 2.5 Flash Lite | 97% | $0.0003 | 1.7s | 86% | |
| Claude Sonnet 4.5 | 100% | $0.011 | 4.9s | 98% | |
| Claude Sonnet 4 | 100% | $0.011 | 6.1s | 98% | |
| GPT-4.1 | 98% | $0.0054 | 4.4s | 92% | |
| Grok 4.20 (Beta) | 98% | $0.0034 | 1.8s | 89% | |
| Qwen 2.5 72B | 98% | $0.0003 | 10.9s | 89% | |
| Mistral Large 3 | 98% | $0.0011 | 7.7s | 88% | |
| Claude Sonnet 4.6 | 99% | $0.011 | 4.7s | 97% | |
| Mistral Large | 98% | $0.0044 | 7.6s | 90% | |
| Stealth: Hunter Alpha | 98% | $0.0000 | 19.5s | 89% | |
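The composite ranking combines the four factors named above, but the page gives no weights or normalization. The sketch below is one plausible scheme, with the weight vector and the cost/time normalization caps chosen purely for illustration.

```python
def composite(score: float, cost_usd: float, seconds: float, stab: float,
              weights: tuple[float, float, float, float] = (0.5, 0.2, 0.1, 0.2)) -> float:
    """One hypothetical composite of performance, cost, speed, and stability.

    Assumptions: cheaper and faster map linearly onto [0, 1] below the caps
    ($0.01 and 20s), and the weights sum to 1. The benchmark's real formula
    is not published on this page.
    """
    cost_term = max(0.0, 1.0 - cost_usd / 0.01)
    speed_term = max(0.0, 1.0 - seconds / 20.0)
    w_score, w_cost, w_speed, w_stab = weights
    return w_score * score + w_cost * cost_term + w_speed * speed_term + w_stab * stab
```

Under these assumptions a free, instant, perfectly stable model with a 100% score would reach the maximum composite of 1.0.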
| Model | Total ▼ | Character rename: Elena->Mirabel, Gregor->Aldric (specific prompt) | Character rename: Elena->Mirabel, Gregor->Aldric (generic prompt) | Location rename: market square, outer ring, bridge, northern mines (specific prompt) | Location rename: market square, outer ring, bridge, northern mines (generic prompt) | Expand all contractions (specific prompt) | Expand all contractions (generic prompt) | Tense rewriting: past to present (specific prompt) | Tense rewriting: past to present (generic prompt) | POV shift: 3rd person to 1st person (Elena's perspective) (specific prompt) | POV shift: 3rd person to 1st person (Elena's perspective) (generic prompt) | Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged (specific prompt) | Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged (generic prompt) | Combined: 3rd person past → 1st person present (specific prompt) | Combined: 3rd person past → 1st person present (generic prompt) | Passive voice → active voice (specific prompt) | Passive voice → active voice (generic prompt) | Avoid said/asked/replied/answered (specific prompt) | Avoid said/asked/replied/answered (generic prompt) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Sonnet 4 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 97% | 100% | 100% |
| Claude Opus 4.6 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 99% | 100% | 100% | 100% | 99% | 98% | 98% | 100% | 100% |
| Claude Sonnet 4.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 99% | 96% | 100% | 100% |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 100% | 100% | 99% | 99% | 96% | 100% | 100% |
| Grok 4 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 96% | 100% | 100% |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 100% | 100% | 100% | 99% | 99% | 98% | 96% | 100% | 100% |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 99% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 96% | 100% | 100% |
| Claude Opus 4.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 99% | 100% | 100% | 100% | 99% | 97% | 97% | 100% | 100% |
| Z.AI GLM 5 | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 96% | 100% | 100% |
| Gemini 2.5 Pro | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 95% | 100% | 100% |
| GPT-5 | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 100% | 100% | 100% | 100% | 100% | 98% | 98% | 97% | 100% | 100% |
| Qwen 3.5 27B | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 100% | 100% | 100% | 99% | 100% | 99% | 98% | 97% | 100% | 100% |
| Z.AI GLM 4.7 | 99% | 100% | 100% | 100% | 99% | 100% | 100% | 100% | 97% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 96% | 100% | 100% |
| Qwen 3.5 Plus (2026-02-15) | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100% | 100% | 99% | 100% | 99% | 99% | 97% | 96% | 100% | 100% |
| Grok 4.20 (Beta, Reasoning) | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 94% | 100% | 100% | 100% | 100% | 99% | 99% | 99% | 96% | 100% | 100% |
Generic Prompt
Character rename: Elena->Mirabel, Gregor->Aldric
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| Inception Mercury | 100% | $0.0004 | 815ms | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | |
| Inception Mercury 2 | 100% | $0.0007 | 971ms | |
| Ministral 8B | 100% | $0.0001 | 3.2s | |
| Ministral 3 8B | 100% | $0.0002 | 2.9s | |
| Mistral Small Creative | 100% | $0.0002 | 2.9s | |
| Mistral Small 4 | 100% | $0.0004 | 2.8s | |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.7s | |
| GPT-4.1 Nano | 100% | $0.0003 | 3.7s | |
| Ministral 3 14B | 100% | $0.0002 | 4.0s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.6s | |
| Grok 4 Fast | 100% | $0.0005 | 3.3s | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0007 | 2.8s | |
| GPT-5.4 Nano | 100% | $0.0007 | 2.7s | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0007 | 3.2s | |
| Gemma 3 4B | 100% | $0.0001 | 6.1s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 5.7s | |
| Mistral NeMO | 93% | $0.0002 | 2.3s | |
| Llama 3.1 8B | 100% | $0.0000 | 9.9s | |
| Gemini 2.5 Flash | 100% | $0.0014 | 2.1s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| ByteDance Seed 1.6 | 100% | 100% | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| Inception Mercury | 100% | $0.0004 | 815ms | 100% | |
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.6s | 100% | |
| Inception Mercury 2 | 100% | $0.0007 | 971ms | 100% | |
| Ministral 3 8B | 100% | $0.0002 | 2.9s | 100% | |
| Mistral Small Creative | 100% | $0.0002 | 2.9s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.6s | 100% | |
| Ministral 8B | 100% | $0.0001 | 3.2s | 100% | |
| Mistral Small 4 | 100% | $0.0004 | 2.8s | 100% | |
| GPT-5.4 Nano | 100% | $0.0007 | 2.7s | 100% | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0007 | 2.8s | 100% | |
| GPT-4.1 Nano | 100% | $0.0003 | 3.7s | 100% | |
| Grok 4 Fast | 100% | $0.0005 | 3.3s | 100% | |
| Ministral 3 14B | 100% | $0.0002 | 4.0s | 100% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0007 | 3.2s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0014 | 2.1s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0007 | 4.7s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 5.7s | 100% | |
| Gemma 3 4B | 100% | $0.0001 | 6.1s | 100% | |
| Gemini 3 Flash (Preview) | 100% | $0.0018 | 3.2s | 100% | |
| GPT-5.4 Mini | 100% | $0.0027 | 1.9s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Name replacement accuracy | ||
| 100.0% | No remaining old names | ||
| 100.0% | Non-name text preserved |
Location rename: market square, outer ring, bridge, northern mines
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.5s | |
| GPT-4.1 Nano | 98% | $0.0003 | 3.5s | |
| Mistral Small 4 | 99% | $0.0004 | 2.9s | |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.9s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | |
| Inception Mercury | 100% | $0.0004 | 4.0s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.6s | |
| Grok 4 Fast | 100% | $0.0005 | 3.8s | |
| Inception Mercury 2 | 100% | $0.0010 | 1.4s | |
| Llama 3.1 70B | 100% | $0.0005 | 17.8s | |
| ByteDance Seed 1.6 Flash | 99% | $0.0003 | 5.3s | |
| Grok 4.1 Fast | 100% | $0.0006 | 5.2s | |
| Gemini 2.5 Flash | 100% | $0.0014 | 2.1s | |
| Gemma 3 12B | 100% | $0.0001 | 8.6s | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0009 | 7.3s | |
| GPT-4.1 Mini | 100% | $0.0010 | 5.8s | |
| Mistral Large 3 | 100% | $0.0010 | 7.3s | |
| Mistral Small Creative | 96% | $0.0002 | 2.9s | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0015 | 6.8s | |
| Stealth: Hunter Alpha | 100% | $0.0000 | 16.3s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| ByteDance Seed 1.6 | 100% | 100% | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| o4 Mini High | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.5s | 100% | |
| Inception Mercury 2 | 100% | $0.0010 | 1.4s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0014 | 2.1s | 100% | |
| Grok 4 Fast | 100% | $0.0005 | 3.8s | 100% | |
| Inception Mercury | 100% | $0.0004 | 4.0s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 4.6s | 100% | |
| Grok 4.1 Fast | 100% | $0.0006 | 5.2s | 100% | |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.9s | 100% | |
| GPT-5.4 Mini | 100% | $0.0027 | 2.2s | 100% | |
| GPT-4.1 Mini | 100% | $0.0010 | 5.8s | 100% | |
| Mistral Small 4 | 99% | $0.0004 | 2.9s | 96% | |
| Gemma 3 12B | 100% | $0.0001 | 8.6s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0009 | 7.3s | 100% | |
| Claude Haiku 4.5 | 100% | $0.0034 | 2.8s | 100% | |
| ByteDance Seed 1.6 Flash | 99% | $0.0003 | 5.3s | 97% | |
| Mistral Large 3 | 100% | $0.0010 | 7.3s | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0015 | 6.8s | 100% | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0030 | 4.2s | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | $0.0034 | 3.7s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Name replacement accuracy | ||
| 100.0% | No remaining old names | ||
| 100.0% | Non-name text preserved |
Expand all contractions
Performance Score Distribution (Top 20)
| Model | Score |
|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | |
| Z.AI GLM 5 Turbo | 100% | |
| GPT-5.1 | 100% | |
| Claude Opus 4.6 | 100% | |
| GPT-5 | 100% | |
| Qwen 3.5 397B A17B | 100% | |
| Qwen 3.5 122B | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | |
| Z.AI GLM 5 | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | |
| Qwen 3.5 27B | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | |
| GPT-5.2 | 100% | |
| Claude Opus 4.5 | 100% | |
| Z.AI GLM 4.6 | 100% | |
| Gemini 3 Pro (Preview) | 100% | |
| Claude Sonnet 4 | 100% | |
| Z.AI GLM 4.7 | 100% | |
| Gemini 2.5 Pro | 100% | |
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.3s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 3.7s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0007 | 1.5s | |
| Mistral Small 4 | 99% | $0.0003 | 2.4s | |
| Stealth: Healer Alpha | 99% | $0.0000 | 7.8s | |
| Mistral NeMO | 98% | $0.0001 | 2.6s | |
| Grok 4 Fast | 100% | $0.0007 | 4.9s | |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 7.0s | |
| Gemini 2.5 Flash | 100% | $0.0012 | 1.8s | |
| Ministral 8B | 95% | $0.0001 | 2.4s | |
| Inception Mercury | 98% | $0.0003 | 4.9s | |
| GPT-5.4 Nano (Reasoning, Low) | 98% | $0.0006 | 2.4s | |
| Ministral 3 14B | 98% | $0.0002 | 3.2s | |
| Stealth: Hunter Alpha | 100% | $0.0000 | 15.8s | |
| Llama 3.1 8B | 98% | $0.0000 | 6.1s | |
| GPT-5.4 Nano (Reasoning) | 98% | $0.0006 | 2.5s | |
| Inception Mercury 2 | 97% | $0.0012 | 1.6s | |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 8.4s | |
| ByteDance Seed 1.6 Flash | 100% | $0.0004 | 8.4s | |
| Qwen 2.5 72B | 100% | $0.0002 | 7.7s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| GPT-5.2 | 100% | 100% | 100% | |
| Claude Opus 4.5 | 100% | 100% | 100% | |
| Z.AI GLM 4.6 | 100% | 100% | 100% | |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | |
| Claude Sonnet 4 | 100% | 100% | 100% | |
| Z.AI GLM 4.7 | 100% | 100% | 100% | |
| Gemini 2.5 Pro | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.3s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0007 | 1.5s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 3.7s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0012 | 1.8s | 100% | |
| Mistral Small 4 | 99% | $0.0003 | 2.4s | 99% | |
| Gemini 3 Flash (Preview) | 100% | $0.0015 | 2.7s | 100% | |
| Grok 4 Fast | 100% | $0.0007 | 4.9s | 99% | |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 7.0s | 100% | |
| Mistral Large 3 | 100% | $0.0008 | 5.8s | 100% | |
| Mistral NeMO | 98% | $0.0001 | 2.6s | 97% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0012 | 5.5s | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 8.4s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 99% | $0.0008 | 4.6s | 99% | |
| GPT-5.4 Nano (Reasoning) | 98% | $0.0006 | 2.5s | 97% | |
| Ministral 3 14B | 98% | $0.0002 | 3.2s | 97% | |
| GPT-5.4 Nano (Reasoning, Low) | 98% | $0.0006 | 2.4s | 97% | |
| ByteDance Seed 1.6 Flash | 100% | $0.0004 | 8.4s | 99% | |
| Claude Haiku 4.5 | 100% | $0.0027 | 2.2s | 100% | |
| GPT-4.1 Mini | 99% | $0.0008 | 5.1s | 99% | |
| Mistral Small Creative | 97% | $0.0002 | 2.3s | 97% | |