Bad Writing Habits
Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, overuse of the past progressive, weak dialogue tags, filter words, purple prose, clichés, and AI-isms (overused words, adverbs, and names), among others.
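To make one of these checks concrete, here is a minimal sketch of a filter-word counter. It is an illustration only: this page does not describe how the benchmark's detectors or scoring work, so the word list `FILTER_WORDS` and the function `filter_word_rate` are hypothetical names and choices, not the benchmark's implementation.

```python
import re

# Hypothetical word list for illustration; the benchmark's actual lists are not given here.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed", "watched"}


def filter_word_rate(text: str) -> float:
    """Return the share of words in `text` that appear in FILTER_WORDS."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FILTER_WORDS)
    return hits / len(words)


sample = "She felt the cold and noticed a light. The door opened."
print(f"{filter_word_rate(sample):.2f}")
```

A real detector would presumably need more than a word list (for example, part-of-speech information to tell passive voice from adjectives), so treat this as a starting point rather than a faithful reproduction.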
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| GPT-5.4 Mini | 87% | $0.015 | 16.8s | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.015 | 16.8s | |
| GPT-5.4 Mini (Reasoning) | 88% | $0.022 | 28.1s | |
| GPT-5.4 | 90% | $0.049 | 1.4m | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.055 | 1.4m | |
| Writer: Palmyra X5 | 84% | $0.011 | 22.0s | |
| Z.AI GLM 5 Turbo | 84% | $0.0081 | 33.2s | |
| Qwen3 235B A22B Instruct 2507 | 85% | $0.0011 | 59.2s | |
| Grok 4.20 (Beta) | 82% | $0.018 | 15.8s | |
| Mistral Small 4 (Reasoning) | 82% | $0.0022 | 30.2s | |
| Claude Sonnet 4.5 | 84% | $0.035 | 38.1s | |
| Z.AI GLM 5 | 83% | $0.0084 | 1.2m | |
| Mistral Small 4 | 81% | $0.0014 | 18.2s | |
| Rocinante 12B | 82% | $0.0014 | 38.4s | |
| Mistral Medium 3.1 | 81% | $0.0048 | 36.5s | |
| Grok 4.20 (Beta, Reasoning) | 83% | $0.039 | 34.0s | |
| DeepSeek V3 (2025-03-24) | 82% | $0.0014 | 39.4s | |
| Grok 4.1 Fast | 81% | $0.0018 | 37.8s | |
| Qwen 3.5 Flash | 81% | $0.0025 | 47.5s | |
| GPT-5.4 Nano (Reasoning, Low) | 81% | $0.0055 | 20.6s | |
Cost vs Performance
Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.
5 low-scoring outliers hidden: Inception Mercury 2 (67.7%), GPT-5 Nano (67.7%), Inception Mercury (67.5%), Stealth: Aurora Alpha (66.9%), Nemotron 3 Nano (65.7%).
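The quadrant split described above can be sketched as follows. This is a hedged illustration: the five rows are copied from the price-performance table on this page, but the medians here are computed over this small sample only (not the full model set the chart uses), and the tie-handling (`>=` for score, `<=` for cost) and the function name `quadrants` are assumptions.

```python
from statistics import median

# (model, score %, cost USD) rows copied from the price-performance table on this page.
rows = [
    ("GPT-5.4 Mini", 87, 0.015),
    ("GPT-5.4", 90, 0.049),
    ("Qwen3 235B A22B Instruct 2507", 85, 0.0011),
    ("Claude Sonnet 4.5", 84, 0.035),
    ("Mistral Small 4", 81, 0.0014),
]


def quadrants(rows):
    """Label each model by its position relative to the median score and median cost."""
    score_mid = median(score for _, score, _ in rows)
    cost_mid = median(cost for _, _, cost in rows)
    labels = {}
    for name, score, cost in rows:
        perf = "high score" if score >= score_mid else "low score"  # tie rule is an assumption
        price = "low cost" if cost <= cost_mid else "high cost"     # tie rule is an assumption
        labels[name] = f"{perf}, {price}"
    return labels


for name, label in quadrants(rows).items():
    print(f"{name}: {label}")
```

Models in the "high score, low cost" quadrant are the ones a price-sensitive reader would look at first.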
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 | 90% | 94% | 85% | |
| GPT-5.4 (Reasoning, Low) | 90% | 94% | 85% | |
| GPT-5.4 (Reasoning) | 90% | 94% | 85% | |
| GPT-5.4 Mini | 87% | 95% | 83% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | 95% | 82% | |
| GPT-5.4 Mini (Reasoning) | 88% | 94% | 82% | |
| GPT-5.1 | 86% | 93% | 80% | |
| Qwen 3.5 397B A17B | 85% | 92% | 79% | |
| GPT-5 | 84% | 93% | 78% | |
| Gemini 3.1 Pro (Preview) | 83% | 92% | 77% | |
| Qwen3 235B A22B Instruct 2507 | 85% | 91% | 77% | |
| Z.AI GLM 5 Turbo | 84% | 90% | 76% | |
| Claude Opus 4.6 (Reasoning) | 84% | 91% | 76% | |
| Writer: Palmyra X5 | 84% | 89% | 76% | |
| Qwen 3.5 Flash | 81% | 93% | 76% | |
| Claude Opus 4 | 84% | 90% | 76% | |
| Qwen 3.5 9B | 81% | 93% | 75% | |
| Grok 4.20 (Beta, Reasoning) | 83% | 89% | 75% | |
| Grok 4.1 Fast | 81% | 92% | 75% | |
| Qwen 3.5 35B | 81% | 92% | 75% | |
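The stability figure in the table above is stated as median × consistency. A minimal sketch of that arithmetic, assuming both inputs are expressed as fractions between 0 and 1 (the function name `stability` is illustrative):

```python
def stability(median_score: float, consistency: float) -> float:
    """Stability as the product of median score and consistency, both as fractions."""
    return median_score * consistency


# Rounded inputs taken from the table above.
print(round(stability(0.90, 0.94) * 100))  # GPT-5.4
print(round(stability(0.87, 0.95) * 100))  # GPT-5.4 Mini
```

Because the table shows rounded percentages, recomputing from the displayed values can differ from the listed stability by a point. For example, 87% × 95% rounds to 83%, while GPT-5.4 Mini (Reasoning, Low) is listed at 82%, presumably because the underlying unrounded values are slightly lower.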
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini | 87% | $0.015 | 16.8s | 83% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.015 | 16.8s | 82% | |
| GPT-5.4 Mini (Reasoning) | 88% | $0.022 | 28.1s | 82% | |
| GPT-5.4 | 90% | $0.049 | 1.4m | 85% | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.055 | 1.4m | 85% | |
| Qwen3 235B A22B Instruct 2507 | 85% | $0.0011 | 59.2s | 77% | |
| Writer: Palmyra X5 | 84% | $0.011 | 22.0s | 76% | |
| Z.AI GLM 5 Turbo | 84% | $0.0081 | 33.2s | 76% | |
| Mistral Small 4 (Reasoning) | 82% | $0.0022 | 30.2s | 75% | |
| Mistral Small 4 | 81% | $0.0014 | 18.2s | 74% | |
| Grok 4.20 (Beta) | 82% | $0.018 | 15.8s | 74% | |
| Grok 4.1 Fast | 81% | $0.0018 | 37.8s | 75% | |
| Qwen 3.5 Flash | 81% | $0.0025 | 47.5s | 76% | |
| GPT-5.4 Nano (Reasoning, Low) | 81% | $0.0055 | 20.6s | 74% | |
| Mistral Medium 3.1 | 81% | $0.0048 | 36.5s | 75% | |
| DeepSeek V3 (2025-03-24) | 82% | $0.0014 | 39.4s | 74% | |
| GPT-5.4 Nano | 80% | $0.0057 | 26.3s | 75% | |
| GPT-5.4 Nano (Reasoning) | 80% | $0.0061 | 24.5s | 75% | |
| Claude Sonnet 4.5 | 84% | $0.035 | 38.1s | 75% | |
| GPT-5.4 (Reasoning) | 90% | $0.089 | 2.6m | 85% | |
| | | genre | | | | | | Novelcrafter Default Prompt | | | | | | Detailed Writing Rules | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Total | Literary fiction: old friends reunite | Thriller: chase through city streets | Romance: separated couple reunites | Fantasy: entering an ancient ruin | Mystery: examining a crime scene | Horror: alone in an eerie place at night | Literary fiction: old friends reunite | Thriller: chase through city streets | Romance: separated couple reunites | Fantasy: entering an ancient ruin | Mystery: examining a crime scene | Horror: alone in an eerie place at night | Literary fiction: old friends reunite | Thriller: chase through city streets | Romance: separated couple reunites | Fantasy: entering an ancient ruin | Mystery: examining a crime scene | Horror: alone in an eerie place at night |
| GPT-5.4 | 90% | 89% | 92% | 92% | 84% | 92% | 88% | 88% | 92% | 89% | 90% | 91% | 90% | 91% | 93% | 92% | 89% | 92% | 90% |
| GPT-5.4 (Reasoning) | 90% | 88% | 91% | 91% | 89% | 92% | 88% | 86% | 93% | 89% | 90% | 93% | 92% | 87% | 91% | 88% | 92% | 91% | 93% |
| GPT-5.4 (Reasoning, Low) | 90% | 89% | 91% | 90% | 86% | 91% | 86% | 86% | 93% | 90% | 90% | 89% | 90% | 88% | 91% | 92% | 89% | 93% | 90% |
| GPT-5.4 Mini (Reasoning) | 88% | 86% | 90% | 89% | 84% | 89% | 85% | 84% | 90% | 87% | 85% | 89% | 86% | 87% | 90% | 87% | 87% | 90% | 90% |
| GPT-5.4 Mini | 87% | 88% | 89% | 88% | 83% | 89% | 86% | 87% | 87% | 90% | 86% | 87% | 85% | 88% | 88% | 89% | 86% | 89% | 86% |
| GPT-5.4 Mini (Reasoning, Low) | 87% | 87% | 87% | 90% | 83% | 87% | 86% | 84% | 88% | 86% | 84% | 87% | 86% | 86% | 89% | 88% | 87% | 88% | 88% |
| GPT-5.1 | 86% | 84% | 83% | 83% | 80% | 85% | 83% | 84% | 88% | 87% | 87% | 90% | 88% | 85% | 90% | 86% | 86% | 90% | 89% |
| Qwen 3.5 397B A17B | 85% | 78% | 87% | 83% | 80% | 84% | 86% | 85% | 86% | 85% | 86% | 85% | 86% | 87% | 91% | 87% | 84% | 88% | 87% |
| Qwen3 235B A22B Instruct 2507 | 85% | 83% | 86% | 83% | 78% | 81% | 81% | 86% | 87% | 83% | 80% | 85% | 85% | 89% | 91% | 87% | 84% | 87% | 87% |
| GPT-5 | 84% | 81% | 82% | 80% | 79% | 85% | 83% | 84% | 85% | 85% | 82% | 88% | 86% | 86% | 86% | 84% | 85% | 87% | 89% |
| Writer: Palmyra X5 | 84% | 83% | 86% | 80% | 75% | 81% | 81% | 85% | 88% | 83% | 78% | 83% | 84% | 90% | 90% | 90% | 84% | 89% | 85% |
| Claude Sonnet 4.5 | 84% | 81% | 85% | 80% | 77% | 85% | 79% | 84% | 85% | 83% | 74% | 83% | 86% | 86% | 90% | 88% | 87% | 90% | 89% |
| Z.AI GLM 5 Turbo | 84% | 81% | 85% | 84% | 77% | 80% | 83% | 83% | 87% | 84% | 77% | 86% | 84% | 84% | 90% | 85% | 85% | 88% | 88% |
| Claude Opus 4.6 (Reasoning) | 84% | 78% | 81% | 78% | 78% | 81% | 83% | 81% | 82% | 85% | 84% | 86% | 82% | 85% | 90% | 89% | 85% | 92% | 88% |
| Claude Opus 4 | 84% | 82% | 81% | 85% | 74% | 83% | 79% | 83% | 85% | 86% | 77% | 82% | 83% | 92% | 86% | 88% | 84% | 88% | 87% |
Detailed Writing Rules
Literary fiction: old friends reunite
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Grok 4.20 (Beta) | 88% | $0.017 | 14.1s | |
| Writer: Palmyra X5 | 90% | $0.014 | 23.9s | |
| Qwen3 235B A22B Instruct 2507 | 89% | $0.0013 | 1.2m | |
| Mistral Medium 3.1 | 87% | $0.0052 | 38.3s | |
| Mistral Small 4 (Reasoning) | 85% | $0.0025 | 29.1s | |
| Mistral Large | 86% | $0.018 | 33.2s | |
| DeepSeek V3 (2025-03-24) | 85% | $0.0018 | 38.5s | |
| Qwen 3 32B | 86% | $0.0019 | 1.7m | |
| GPT-5.4 Mini | 88% | $0.015 | 16.1s | |
| Z.AI GLM 5 | 88% | $0.011 | 1.1m | |
| Hermes 3 405B | 85% | $0.0054 | 35.9s | |
| MiniMax M2.7 | 86% | $0.0040 | 1.3m | |
| ByteDance Seed 1.6 Flash | 84% | $0.0014 | 27.7s | |
| Mistral Small 4 | 84% | $0.0022 | 26.2s | |
| GPT-5.4 Nano (Reasoning, Low) | 84% | $0.0044 | 18.2s | |
| Grok 4.1 Fast | 85% | $0.0026 | 47.0s | |
| GPT-5.4 Nano | 84% | $0.0051 | 18.4s | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.030 | 32.6s | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | $0.015 | 15.7s | |
| Mistral Large 2 | 86% | $0.018 | 32.8s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4 | 92% | 96% | 89% | |
| Qwen3 235B A22B Instruct 2507 | 89% | 96% | 86% | |
| Writer: Palmyra X5 | 90% | 96% | 86% | |
| GPT-5.4 | 91% | 94% | 86% | |
| GPT-5.4 (Reasoning, Low) | 88% | 97% | 85% | |
| GPT-5.4 Mini (Reasoning) | 87% | 97% | 85% | |
| Z.AI GLM 5 | 88% | 97% | 85% | |
| GPT-5.4 (Reasoning) | 87% | 96% | 84% | |
| Qwen 3.5 397B A17B | 87% | 95% | 84% | |
| Claude Sonnet 4.6 (Reasoning) | 85% | 97% | 83% | |
| GPT-5.4 Mini | 88% | 96% | 83% | |
| o4 Mini | 84% | 99% | 83% | |
| MiniMax M2.7 | 86% | 96% | 83% | |
| GPT-5.2 | 84% | 98% | 82% | |
| WizardLM 2 8x22b | 84% | 97% | 82% | |
| MiniMax M2.5 | 85% | 95% | 82% | |
| Grok 4.1 Fast | 85% | 96% | 82% | |
| Mistral Large 2 | 86% | 96% | 82% | |
| GPT-5 | 86% | 96% | 82% | |
| Qwen 3 32B | 86% | 94% | 82% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Writer: Palmyra X5 | 90% | $0.014 | 23.9s | 86% | |
| Qwen3 235B A22B Instruct 2507 | 89% | $0.0013 | 1.2m | 86% | |
| GPT-5.4 Mini | 88% | $0.015 | 16.1s | 83% | |
| Grok 4.20 (Beta) | 88% | $0.017 | 14.1s | 82% | |
| Z.AI GLM 5 | 88% | $0.011 | 1.1m | 85% | |
| GPT-5.4 | 91% | $0.049 | 1.4m | 86% | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.030 | 32.6s | 85% | |
| Mistral Medium 3.1 | 87% | $0.0052 | 38.3s | 81% | |
| Mistral Small 4 (Reasoning) | 85% | $0.0025 | 29.1s | 82% | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | $0.015 | 15.7s | 81% | |
| Mistral Large 2 | 86% | $0.018 | 32.8s | 82% | |
| Grok 4.1 Fast | 85% | $0.0026 | 47.0s | 82% | |
| GPT-5.4 Nano | 84% | $0.0051 | 18.4s | 81% | |
| DeepSeek V3 (2025-03-24) | 85% | $0.0018 | 38.5s | 81% | |
| o4 Mini | 84% | $0.014 | 25.2s | 83% | |
| MiniMax M2.7 | 86% | $0.0040 | 1.3m | 83% | |
| Mistral Large | 86% | $0.018 | 33.2s | 81% | |
| GPT-5.4 Nano (Reasoning, Low) | 84% | $0.0044 | 18.2s | 80% | |
| Claude Sonnet 4 | 88% | $0.043 | 51.6s | 81% | |
| GPT-5.4 Nano (Reasoning) | 84% | $0.0059 | 25.8s | 81% | |
Thriller: chase through city streets
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Z.AI GLM 5 Turbo | 90% | $0.0078 | 26.6s | |
| Writer: Palmyra X5 | 90% | $0.011 | 18.7s | |
| Qwen3 235B A22B Instruct 2507 | 91% | $0.0014 | 59.9s | |
| Z.AI GLM 5 | 91% | $0.0075 | 44.3s | |
| GPT-5.4 Mini (Reasoning) | 90% | $0.023 | 28.1s | |
| GPT-5.4 Mini (Reasoning, Low) | 89% | $0.014 | 16.8s | |
| GPT-5.4 (Reasoning, Low) | 91% | $0.050 | 1.2m | |
| Rocinante 12B | 87% | $0.0015 | 36.4s | |
| GPT-5.4 | 93% | $0.046 | 1.4m | |
| GPT-5.4 Mini | 88% | $0.015 | 16.7s | |
| Claude Sonnet 4.6 | 90% | $0.036 | 40.3s | |
| Claude Sonnet 4.5 | 90% | $0.041 | 37.3s | |
| DeepSeek V3 (2025-03-24) | 84% | $0.0016 | 14.5s | |
| ByteDance Seed 1.6 Flash | 84% | $0.0012 | 24.5s | |
| Claude Sonnet 4 | 86% | $0.038 | 44.7s | |
| MiniMax M2.5 | 86% | $0.0034 | 1.6m | |
| Qwen 3.5 397B A17B | 91% | $0.0049 | 3.3m | |
| Hermes 3 70B | 82% | $0.0015 | 21.9s | |
| Hermes 3 405B | 84% | $0.0054 | 37.5s | |
| Claude Sonnet 4.6 (Reasoning) | 90% | $0.065 | 1.1m | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 | 93% | 99% | 91% | |
| Qwen3 235B A22B Instruct 2507 | 91% | 98% | 90% | |
| GPT-5.4 (Reasoning, Low) | 91% | 97% | 89% | |
| GPT-5.4 (Reasoning) | 91% | 98% | 89% | |
| Qwen 3.5 397B A17B | 91% | 96% | 87% | |
| Writer: Palmyra X5 | 90% | 96% | 87% | |
| GPT-5.4 Mini | 88% | 99% | 87% | |
| GPT-5.4 Mini (Reasoning) | 90% | 97% | 87% | |
| Claude Sonnet 4.6 (Reasoning) | 90% | 96% | 86% | |
| Z.AI GLM 5 Turbo | 90% | 93% | 86% | |
| GPT-5.1 | 90% | 96% | 86% | |
| Z.AI GLM 5 | 91% | 95% | 86% | |
| GPT-5.4 Mini (Reasoning, Low) | 89% | 97% | 86% | |
| Claude Sonnet 4.5 | 90% | 95% | 85% | |
| Claude Opus 4.6 (Reasoning) | 90% | 95% | 85% | |
| Claude Opus 4.5 | 87% | 96% | 85% | |
| Claude Opus 4.6 | 89% | 95% | 84% | |
| GPT-5 | 86% | 97% | 83% | |
| MiniMax M2.5 | 86% | 95% | 83% | |
| Claude Sonnet 4.6 | 90% | 92% | 82% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Qwen3 235B A22B Instruct 2507 | 91% | $0.0014 | 59.9s | 90% | |
| GPT-5.4 | 93% | $0.046 | 1.4m | 91% | |
| Writer: Palmyra X5 | 90% | $0.011 | 18.7s | 87% | |
| Z.AI GLM 5 Turbo | 90% | $0.0078 | 26.6s | 86% | |
| Z.AI GLM 5 | 91% | $0.0075 | 44.3s | 86% | |
| GPT-5.4 Mini (Reasoning) | 90% | $0.023 | 28.1s | 87% | |
| GPT-5.4 (Reasoning, Low) | 91% | $0.050 | 1.2m | 89% | |
| GPT-5.4 Mini | 88% | $0.015 | 16.7s | 87% | |
| GPT-5.4 Mini (Reasoning, Low) | 89% | $0.014 | 16.8s | 86% | |
| Qwen 3.5 397B A17B | 91% | $0.0049 | 3.3m | 87% | |
| Claude Sonnet 4.5 | 90% | $0.041 | 37.3s | 85% | |
| Claude Sonnet 4.6 (Reasoning) | 90% | $0.065 | 1.1m | 86% | |
| Claude Sonnet 4.6 | 90% | $0.036 | 40.3s | 82% | |
| Rocinante 12B | 87% | $0.0015 | 36.4s | 80% | |
| Mistral Medium 3.1 | 86% | $0.0059 | 40.6s | 82% | |
| GPT-5.1 | 90% | $0.052 | 2.2m | 86% | |
| MiniMax M2.5 | 86% | $0.0034 | 1.6m | 83% | |
| MiniMax M2.7 | 87% | $0.0035 | 1.1m | 80% | |
| Claude Opus 4.6 (Reasoning) | 90% | $0.091 | 1.2m | 85% | |
| Claude Opus 4.5 | 87% | $0.069 | 43.9s | 85% | |
Romance: separated couple reunites
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Writer: Palmyra X5 | 90% | $0.013 | 22.8s | |
| GPT-5.4 (Reasoning, Low) | 92% | $0.056 | 1.3m | |
| GPT-5.4 Mini | 89% | $0.014 | 15.7s | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | $0.014 | 16.1s | |
| GPT-5.4 | 92% | $0.051 | 1.3m | |
| Grok 4.20 (Beta) | 87% | $0.019 | 15.1s | |
| Claude Sonnet 4 | 88% | $0.045 | 54.2s | |
| Hermes 3 405B | 84% | $0.0054 | 49.2s | |
| Qwen 3.5 Flash | 85% | $0.0024 | 35.9s | |
| Qwen3 235B A22B Instruct 2507 | 87% | $0.0011 | 49.4s | |
| MiniMax M2.5 | 85% | $0.0043 | 1.8m | |
| Z.AI GLM 5 | 88% | $0.012 | 1.8m | |
| Claude Sonnet 4.5 | 88% | $0.045 | 41.1s | |
| GPT-5.4 Nano (Reasoning) | 83% | $0.0055 | 23.1s | |
| Grok 4.1 Fast | 86% | $0.0021 | 39.6s | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.022 | 26.8s | |
| Z.AI GLM 5 Turbo | 85% | $0.013 | 40.7s | |
| Stealth: Hunter Alpha | 84% | $0.0000 | 48.4s | |
| Hermes 3 70B | 83% | $0.0015 | 38.9s | |
| Mistral Small 4 (Reasoning) | 83% | $0.0027 | 32.6s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 (Reasoning, Low) | 92% | 97% | 89% | |
| GPT-5.4 | 92% | 95% | 88% | |
| GPT-5.4 Mini | 89% | 97% | 86% | |
| Writer: Palmyra X5 | 90% | 95% | 85% | |
| Qwen 3.5 397B A17B | 87% | 98% | 85% | |
| Claude Opus 4.6 (Reasoning) | 89% | 95% | 85% | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | 96% | 85% | |
| GPT-5.4 Mini (Reasoning) | 87% | 98% | 85% | |
| Grok 4.20 (Beta, Reasoning) | 85% | 99% | 84% | |
| Grok 4.20 (Beta) | 87% | 96% | 84% | |
| GPT-5.4 (Reasoning) | 88% | 95% | 83% | |
| Claude Opus 4.6 | 87% | 96% | 83% | |
| Z.AI GLM 5 | 88% | 93% | 83% | |
| Qwen3 235B A22B Instruct 2507 | 87% | 96% | 83% | |
| MoonshotAI: Kimi K2.5 | 86% | 93% | 82% | |
| Claude Sonnet 4.5 | 88% | 93% | 82% | |
| Claude Sonnet 4.6 (Reasoning) | 87% | 93% | 82% | |
| GPT-5.1 | 86% | 95% | 82% | |
| GPT-5.4 Nano (Reasoning) | 83% | 96% | 81% | |
| Stealth: Hunter Alpha | 84% | 96% | 81% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Writer: Palmyra X5 | 90% | $0.013 | 22.8s | 85% | |
| GPT-5.4 Mini | 89% | $0.014 | 15.7s | 86% | |
| GPT-5.4 (Reasoning, Low) | 92% | $0.056 | 1.3m | 89% | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | $0.014 | 16.1s | 85% | |
| GPT-5.4 | 92% | $0.051 | 1.3m | 88% | |
| Grok 4.20 (Beta) | 87% | $0.019 | 15.1s | 84% | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.022 | 26.8s | 85% | |
| Qwen3 235B A22B Instruct 2507 | 87% | $0.0011 | 49.4s | 83% | |
| Claude Sonnet 4.5 | 88% | $0.045 | 41.1s | 82% | |
| Grok 4.20 (Beta, Reasoning) | 85% | $0.034 | 25.6s | 84% | |
| Grok 4.1 Fast | 86% | $0.0021 | 39.6s | 79% | |
| Z.AI GLM 5 | 88% | $0.012 | 1.8m | 83% | |
| Qwen 3.5 Flash | 85% | $0.0024 | 35.9s | 80% | |
| GPT-5.4 Nano (Reasoning) | 83% | $0.0055 | 23.1s | 81% | |
| Claude Opus 4.6 (Reasoning) | 89% | $0.098 | 1.3m | 85% | |
| GPT-5.4 Nano | 83% | $0.0050 | 19.6s | 81% | |
| Stealth: Hunter Alpha | 84% | $0.0000 | 48.4s | 81% | |
| Claude Sonnet 4 | 88% | $0.045 | 54.2s | 79% | |
| Mistral Medium 3.1 | 84% | $0.0058 | 42.8s | 81% | |
| Mistral Small 4 (Reasoning) | 83% | $0.0027 | 32.6s | 81% | |
Fantasy: entering an ancient ruin
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning) | 87% | $0.018 | 22.3s | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.014 | 16.2s | |
| GPT-5.4 Mini | 86% | $0.016 | 17.1s | |
| GPT-5.4 | 89% | $0.039 | 1.2m | |
| Z.AI GLM 5 Turbo | 85% | $0.0090 | 25.9s | |
| GPT-5.4 (Reasoning, Low) | 89% | $0.056 | 1.3m | |
| Claude Sonnet 4.6 | 87% | $0.040 | 37.2s | |
| Writer: Palmyra X5 | 84% | $0.013 | 22.1s | |
| Claude Sonnet 4.5 | 87% | $0.045 | 39.5s | |
| Qwen 3.5 35B | 81% | $0.043 | 2.3m | |
| Z.AI GLM 5 | 83% | $0.0095 | 1.5m | |
| DeepSeek V3 (2024-12-26) | 78% | $0.0029 | 35.6s | |
| MiniMax M2.7 | 82% | $0.0032 | 59.3s | |
| Qwen 3.5 Flash | 81% | $0.0033 | 45.4s | |
| Qwen 3.5 122B | 80% | $0.016 | 39.5s | |
| Qwen3 235B A22B Instruct 2507 | 84% | $0.0017 | 1.1m | |
| GPT-5.4 (Reasoning) | 92% | $0.081 | 2.3m | |
| Qwen 3.5 397B A17B | 84% | $0.0048 | 3.3m | |
| GPT-4.1 | 79% | $0.021 | 40.8s | |
| DeepSeek-V2 Chat | 78% | $0.0029 | 42.0s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 (Reasoning) | 92% | 98% | 90% | |
| GPT-5.4 | 89% | 97% | 88% | |
| GPT-5.1 | 86% | 98% | 85% | |
| GPT-5.4 (Reasoning, Low) | 89% | 92% | 84% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | 96% | 84% | |
| Claude Sonnet 4.6 | 87% | 96% | 84% | |
| Claude Sonnet 4.5 | 87% | 95% | 83% | |
| Claude Opus 4.6 | 86% | 96% | 82% | |
| Claude Sonnet 4.6 (Reasoning) | 87% | 95% | 82% | |
| Claude Opus 4.6 (Reasoning) | 85% | 97% | 82% | |
| GPT-5.4 Mini | 86% | 93% | 81% | |
| GPT-5.4 Mini (Reasoning) | 87% | 92% | 81% | |
| Qwen 3.5 397B A17B | 84% | 94% | 81% | |
| Z.AI GLM 5 Turbo | 85% | 93% | 80% | |
| GPT-5 | 85% | 95% | 80% | |
| Gemini 3.1 Pro (Preview) | 83% | 93% | 78% | |
| Qwen 3.5 35B | 81% | 96% | 78% | |
| Claude Opus 4.5 | 83% | 91% | 77% | |
| Qwen 3.5 27B | 79% | 97% | 77% | |
| Qwen 3.5 122B | 80% | 95% | 77% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.014 | 16.2s | 84% | |
| GPT-5.4 | 89% | $0.039 | 1.2m | 88% | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.018 | 22.3s | 81% | |
| GPT-5.4 (Reasoning) | 92% | $0.081 | 2.3m | 90% | |
| GPT-5.4 Mini | 86% | $0.016 | 17.1s | 81% | |
| Claude Sonnet 4.6 | 87% | $0.040 | 37.2s | 84% | |
| Claude Sonnet 4.5 | 87% | $0.045 | 39.5s | 83% | |
| Z.AI GLM 5 Turbo | 85% | $0.0090 | 25.9s | 80% | |
| GPT-5.4 (Reasoning, Low) | 89% | $0.056 | 1.3m | 84% |