Bad Writing Habits
Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, overuse of the past progressive, weak dialogue tags, filter words, purple prose, clichés, and characteristic AI-ism words, adverbs, and names, among others.
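A minimal sketch of how detectors for a few of these anti-patterns might work. The word lists and the passive-voice regex below are illustrative assumptions, not the benchmark's actual rules:

```python
import re

# Illustrative word lists — NOT the benchmark's actual rule sets.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "wondered"}
WEAK_DIALOGUE_TAGS = {"exclaimed", "stated", "queried", "opined"}
# Crude passive-voice heuristic: a be-verb followed by an -ed participle.
PASSIVE = re.compile(r"\b(?:was|were|is|are|been|being)\s+\w+ed\b", re.I)

def detect_anti_patterns(text):
    """Return only the anti-pattern categories that fired, with their hits."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits = {
        "filter_words": sorted(words & FILTER_WORDS),
        "weak_dialogue_tags": sorted(words & WEAK_DIALOGUE_TAGS),
        "passive_voice": PASSIVE.findall(text),
    }
    return {k: v for k, v in hits.items() if v}

sample = 'She felt the door was opened by the wind. "Stop!" he exclaimed.'
print(detect_anti_patterns(sample))
```

A real checker would need part-of-speech tagging to avoid false positives (e.g. adjectival participles), but this shows the shape of a rule-based pass.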
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| GPT-5.4 Mini | 87% | $0.015 | 16.8s | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.015 | 16.8s | |
| GPT-5.4 Mini (Reasoning) | 88% | $0.022 | 28.1s | |
| GPT-5.4 | 90% | $0.049 | 1.4m | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.055 | 1.4m | |
| Writer: Palmyra X5 | 84% | $0.011 | 22.0s | |
| Z.AI GLM 5 Turbo | 84% | $0.0081 | 33.2s | |
| Qwen3 235B A22B Instruct 2507 | 85% | $0.0011 | 59.2s | |
| Grok 4.20 (Beta) | 82% | $0.018 | 15.8s | |
| Mistral Small 4 (Reasoning) | 82% | $0.0022 | 30.2s | |
| Claude Sonnet 4.5 | 84% | $0.035 | 38.1s | |
| Z.AI GLM 5 | 83% | $0.0084 | 1.2m | |
| Mistral Small 4 | 81% | $0.0014 | 18.2s | |
| Rocinante 12B | 82% | $0.0014 | 38.4s | |
| Mistral Medium 3.1 | 81% | $0.0048 | 36.5s | |
| Grok 4.20 (Beta, Reasoning) | 83% | $0.039 | 34.0s | |
| DeepSeek V3 (2025-03-24) | 82% | $0.0014 | 39.4s | |
| Grok 4.1 Fast | 81% | $0.0018 | 37.8s | |
| Qwen 3.5 Flash | 81% | $0.0025 | 47.5s | |
| GPT-5.4 Nano (Reasoning, Low) | 81% | $0.0055 | 20.6s | |
Cost vs Performance
Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.
5 low-scoring outliers hidden: Inception Mercury 2 (67.7%), GPT-5 Nano (67.7%), Inception Mercury (67.5%), Stealth: Aurora Alpha (66.9%), Nemotron 3 Nano (65.7%).
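The quadrant placement described above can be sketched with stdlib `statistics.median`. The rows below are a small illustrative subset of the table, and the quadrant labels are my own naming, not the page's:

```python
from statistics import median

# (model, total cost in $, score %) — a few rows from the table above
models = [
    ("GPT-5.4 Mini", 0.015, 87),
    ("GPT-5.4", 0.049, 90),
    ("Qwen3 235B A22B Instruct 2507", 0.0011, 85),
    ("Mistral Small 4", 0.0014, 81),
]

# Quadrant lines are drawn at the median cost and median score.
cost_med = median(c for _, c, _ in models)
score_med = median(s for _, _, s in models)

def quadrant(cost, score):
    side = "cheap" if cost <= cost_med else "expensive"
    level = "strong" if score >= score_med else "weak"
    return f"{side}/{level}"

for name, cost, score in models:
    print(name, quadrant(cost, score))
```

Note that with median splits roughly half the models land on each side of every line, so "weak" here only means below the median of the plotted field.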
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| GPT-5.4 | 90% | 94% | 85% | |
| GPT-5.4 (Reasoning, Low) | 90% | 94% | 85% | |
| GPT-5.4 (Reasoning) | 90% | 94% | 85% | |
| GPT-5.4 Mini | 87% | 95% | 83% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | 95% | 82% | |
| GPT-5.4 Mini (Reasoning) | 88% | 94% | 82% | |
| GPT-5.1 | 86% | 93% | 80% | |
| Qwen 3.5 397B A17B | 85% | 92% | 79% | |
| GPT-5 | 84% | 93% | 78% | |
| Gemini 3.1 Pro (Preview) | 83% | 92% | 77% | |
| Qwen3 235B A22B Instruct 2507 | 85% | 91% | 77% | |
| Z.AI GLM 5 Turbo | 84% | 90% | 76% | |
| Claude Opus 4.6 (Reasoning) | 84% | 91% | 76% | |
| Writer: Palmyra X5 | 84% | 89% | 76% | |
| Qwen 3.5 Flash | 81% | 93% | 76% | |
| Claude Opus 4 | 84% | 90% | 76% | |
| Qwen 3.5 9B | 81% | 93% | 75% | |
| Grok 4.20 (Beta, Reasoning) | 83% | 89% | 75% | |
| Grok 4.1 Fast | 81% | 92% | 75% | |
| Qwen 3.5 35B | 81% | 92% | 75% | |
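Stability above is defined as median × consistency. A minimal sketch, assuming consistency is derived from the spread of per-run scores (the page does not give its exact formula, so the spread-based definition below is an assumption):

```python
from statistics import median, pstdev

def stability(run_scores):
    """Stability = median score × consistency, on a 0-1 scale.

    Assumption: consistency is modeled here as 1 minus the population
    standard deviation of the run scores; the site may define it differently.
    """
    med = median(run_scores)
    consistency = 1 - pstdev(run_scores)
    return med * consistency

runs = [0.90, 0.89, 0.91, 0.90]  # hypothetical per-run scores for one model
print(round(stability(runs), 3))
```

Whatever the exact consistency formula, the multiplicative form means a high-scoring but erratic model is penalized relative to a slightly lower-scoring but repeatable one.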
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed, and stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| GPT-5.4 Mini | 87% | $0.015 | 16.8s | 83% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.015 | 16.8s | 82% | |
| GPT-5.4 Mini (Reasoning) | 88% | $0.022 | 28.1s | 82% | |
| GPT-5.4 | 90% | $0.049 | 1.4m | 85% | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.055 | 1.4m | 85% | |
| Qwen3 235B A22B Instruct 2507 | 85% | $0.0011 | 59.2s | 77% | |
| Writer: Palmyra X5 | 84% | $0.011 | 22.0s | 76% | |
| Z.AI GLM 5 Turbo | 84% | $0.0081 | 33.2s | 76% | |
| Mistral Small 4 (Reasoning) | 82% | $0.0022 | 30.2s | 75% | |
| Mistral Small 4 | 81% | $0.0014 | 18.2s | 74% | |
| Grok 4.20 (Beta) | 82% | $0.018 | 15.8s | 74% | |
| Grok 4.1 Fast | 81% | $0.0018 | 37.8s | 75% | |
| Qwen 3.5 Flash | 81% | $0.0025 | 47.5s | 76% | |
| GPT-5.4 Nano (Reasoning, Low) | 81% | $0.0055 | 20.6s | 74% | |
| Mistral Medium 3.1 | 81% | $0.0048 | 36.5s | 75% | |
| DeepSeek V3 (2025-03-24) | 82% | $0.0014 | 39.4s | 74% | |
| GPT-5.4 Nano | 80% | $0.0057 | 26.3s | 75% | |
| GPT-5.4 Nano (Reasoning) | 80% | $0.0061 | 24.5s | 75% | |
| Claude Sonnet 4.5 | 84% | $0.035 | 38.1s | 75% | |
| GPT-5.4 (Reasoning) | 90% | $0.089 | 2.6m | 85% | |
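The composite ranking blends performance, cost, speed, and stability. Since the page does not publish its weighting, the sketch below assumes an equal-weight blend of min-max-normalized metrics, with cost and time inverted so that cheaper and faster score higher:

```python
def normalize(values, invert=False):
    """Min-max normalize to [0, 1]; invert for metrics where lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    norm = [(v - lo) / (hi - lo) for v in values]
    return [1 - n for n in norm] if invert else norm

def composite(scores, costs, times, stabilities):
    # Equal-weight average of normalized metrics — an assumed weighting.
    parts = [
        normalize(scores),
        normalize(costs, invert=True),   # cheaper is better
        normalize(times, invert=True),   # faster is better
        normalize(stabilities),
    ]
    return [sum(col) / len(col) for col in zip(*parts)]

# Illustrative rows from the table above: (score %, cost $, time s, stability %)
rows = [(87, 0.015, 16.8, 83), (90, 0.049, 84.0, 85), (85, 0.0011, 59.2, 77)]
print(composite(*map(list, zip(*rows))))
```

On this toy subset the equal-weight blend already reproduces the table's ordering, with the mid-priced, fast GPT-5.4 Mini edging out the higher-scoring but slower and pricier GPT-5.4.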
Scores by prompt and scenario: the first six scenario columns use the "genre" prompt, the next six the Novelcrafter Default Prompt, and the last six the Detailed Writing Rules prompt.
| Model | Total | Literary fiction: old friends reunite | Thriller: chase through city streets | Romance: separated couple reunites | Fantasy: entering an ancient ruin | Mystery: examining a crime scene | Horror: alone in an eerie place at night | Literary fiction: old friends reunite | Thriller: chase through city streets | Romance: separated couple reunites | Fantasy: entering an ancient ruin | Mystery: examining a crime scene | Horror: alone in an eerie place at night | Literary fiction: old friends reunite | Thriller: chase through city streets | Romance: separated couple reunites | Fantasy: entering an ancient ruin | Mystery: examining a crime scene | Horror: alone in an eerie place at night |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | 90% | 89% | 92% | 92% | 84% | 92% | 88% | 88% | 92% | 89% | 90% | 91% | 90% | 91% | 93% | 92% | 89% | 92% | 90% |
| GPT-5.4 (Reasoning) | 90% | 88% | 91% | 91% | 89% | 92% | 88% | 86% | 93% | 89% | 90% | 93% | 92% | 87% | 91% | 88% | 92% | 91% | 93% |
| GPT-5.4 (Reasoning, Low) | 90% | 89% | 91% | 90% | 86% | 91% | 86% | 86% | 93% | 90% | 90% | 89% | 90% | 88% | 91% | 92% | 89% | 93% | 90% |
| GPT-5.4 Mini (Reasoning) | 88% | 86% | 90% | 89% | 84% | 89% | 85% | 84% | 90% | 87% | 85% | 89% | 86% | 87% | 90% | 87% | 87% | 90% | 90% |
| GPT-5.4 Mini | 87% | 88% | 89% | 88% | 83% | 89% | 86% | 87% | 87% | 90% | 86% | 87% | 85% | 88% | 88% | 89% | 86% | 89% | 86% |
| GPT-5.4 Mini (Reasoning, Low) | 87% | 87% | 87% | 90% | 83% | 87% | 86% | 84% | 88% | 86% | 84% | 87% | 86% | 86% | 89% | 88% | 87% | 88% | 88% |
| GPT-5.1 | 86% | 84% | 83% | 83% | 80% | 85% | 83% | 84% | 88% | 87% | 87% | 90% | 88% | 85% | 90% | 86% | 86% | 90% | 89% |
| Qwen 3.5 397B A17B | 85% | 78% | 87% | 83% | 80% | 84% | 86% | 85% | 86% | 85% | 86% | 85% | 86% | 87% | 91% | 87% | 84% | 88% | 87% |
| Qwen3 235B A22B Instruct 2507 | 85% | 83% | 86% | 83% | 78% | 81% | 81% | 86% | 87% | 83% | 80% | 85% | 85% | 89% | 91% | 87% | 84% | 87% | 87% |
| GPT-5 | 84% | 81% | 82% | 80% | 79% | 85% | 83% | 84% | 85% | 85% | 82% | 88% | 86% | 86% | 86% | 84% | 85% | 87% | 89% |
| Writer: Palmyra X5 | 84% | 83% | 86% | 80% | 75% | 81% | 81% | 85% | 88% | 83% | 78% | 83% | 84% | 90% | 90% | 90% | 84% | 89% | 85% |
| Claude Sonnet 4.5 | 84% | 81% | 85% | 80% | 77% | 85% | 79% | 84% | 85% | 83% | 74% | 83% | 86% | 86% | 90% | 88% | 87% | 90% | 89% |
| Z.AI GLM 5 Turbo | 84% | 81% | 85% | 84% | 77% | 80% | 83% | 83% | 87% | 84% | 77% | 86% | 84% | 84% | 90% | 85% | 85% | 88% | 88% |
| Claude Opus 4.6 (Reasoning) | 84% | 78% | 81% | 78% | 78% | 81% | 83% | 81% | 82% | 85% | 84% | 86% | 82% | 85% | 90% | 89% | 85% | 92% | 88% |
| Claude Opus 4 | 84% | 82% | 81% | 85% | 74% | 83% | 79% | 83% | 85% | 86% | 77% | 82% | 83% | 92% | 86% | 88% | 84% | 88% | 87% |
Detailed Writing Rules
Literary fiction: old friends reunite
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| Grok 4.20 (Beta) | 88% | $0.017 | 14.1s | |
| Writer: Palmyra X5 | 90% | $0.014 | 23.9s | |
| Qwen3 235B A22B Instruct 2507 | 89% | $0.0013 | 1.2m | |
| Mistral Medium 3.1 | 87% | $0.0052 | 38.3s | |
| Mistral Small 4 (Reasoning) | 85% | $0.0025 | 29.1s | |
| Mistral Large | 86% | $0.018 | 33.2s | |
| DeepSeek V3 (2025-03-24) | 85% | $0.0018 | 38.5s | |
| Qwen 3 32B | 86% | $0.0019 | 1.7m | |
| GPT-5.4 Mini | 88% | $0.015 | 16.1s | |
| Z.AI GLM 5 | 88% | $0.011 | 1.1m | |
| Hermes 3 405B | 85% | $0.0054 | 35.9s | |
| MiniMax M2.7 | 86% | $0.0040 | 1.3m | |
| ByteDance Seed 1.6 Flash | 84% | $0.0014 | 27.7s | |
| Mistral Small 4 | 84% | $0.0022 | 26.2s | |
| GPT-5.4 Nano (Reasoning, Low) | 84% | $0.0044 | 18.2s | |
| Grok 4.1 Fast | 85% | $0.0026 | 47.0s | |
| GPT-5.4 Nano | 84% | $0.0051 | 18.4s | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.030 | 32.6s | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | $0.015 | 15.7s | |
| Mistral Large 2 | 86% | $0.018 | 32.8s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| Claude Opus 4 | 92% | 96% | 89% | |
| Qwen3 235B A22B Instruct 2507 | 89% | 96% | 86% | |
| Writer: Palmyra X5 | 90% | 96% | 86% | |
| GPT-5.4 | 91% | 94% | 86% | |
| GPT-5.4 (Reasoning, Low) | 88% | 97% | 85% | |
| GPT-5.4 Mini (Reasoning) | 87% | 97% | 85% | |
| Z.AI GLM 5 | 88% | 97% | 85% | |
| GPT-5.4 (Reasoning) | 87% | 96% | 84% | |
| Qwen 3.5 397B A17B | 87% | 95% | 84% | |
| Claude Sonnet 4.6 (Reasoning) | 85% | 97% | 83% | |
| GPT-5.4 Mini | 88% | 96% | 83% | |
| o4 Mini | 84% | 99% | 83% | |
| MiniMax M2.7 | 86% | 96% | 83% | |
| GPT-5.2 | 84% | 98% | 82% | |
| WizardLM 2 8x22b | 84% | 97% | 82% | |
| MiniMax M2.5 | 85% | 95% | 82% | |
| Grok 4.1 Fast | 85% | 96% | 82% | |
| Mistral Large 2 | 86% | 96% | 82% | |
| GPT-5 | 86% | 96% | 82% | |
| Qwen 3 32B | 86% | 94% | 82% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed, and stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| Writer: Palmyra X5 | 90% | $0.014 | 23.9s | 86% | |
| Qwen3 235B A22B Instruct 2507 | 89% | $0.0013 | 1.2m | 86% | |
| GPT-5.4 Mini | 88% | $0.015 | 16.1s | 83% | |
| Grok 4.20 (Beta) | 88% | $0.017 | 14.1s | 82% | |
| Z.AI GLM 5 | 88% | $0.011 | 1.1m | 85% | |
| GPT-5.4 | 91% | $0.049 | 1.4m | 86% | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.030 | 32.6s | 85% | |
| Mistral Medium 3.1 | 87% | $0.0052 | 38.3s | 81% | |
| Mistral Small 4 (Reasoning) | 85% | $0.0025 | 29.1s | 82% | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | $0.015 | 15.7s | 81% | |
| Mistral Large 2 | 86% | $0.018 | 32.8s | 82% | |
| Grok 4.1 Fast | 85% | $0.0026 | 47.0s | 82% | |
| GPT-5.4 Nano | 84% | $0.0051 | 18.4s | 81% | |
| DeepSeek V3 (2025-03-24) | 85% | $0.0018 | 38.5s | 81% | |
| o4 Mini | 84% | $0.014 | 25.2s | 83% | |
| MiniMax M2.7 | 86% | $0.0040 | 1.3m | 83% | |
| Mistral Large | 86% | $0.018 | 33.2s | 81% | |
| GPT-5.4 Nano (Reasoning, Low) | 84% | $0.0044 | 18.2s | 80% | |
| Claude Sonnet 4 | 88% | $0.043 | 51.6s | 81% | |
| GPT-5.4 Nano (Reasoning) | 84% | $0.0059 | 25.8s | 81% | |
Thriller: chase through city streets
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| Z.AI GLM 5 Turbo | 90% | $0.0078 | 26.6s | |
| Writer: Palmyra X5 | 90% | $0.011 | 18.7s | |
| Qwen3 235B A22B Instruct 2507 | 91% | $0.0014 | 59.9s | |
| Z.AI GLM 5 | 91% | $0.0075 | 44.3s | |
| GPT-5.4 Mini (Reasoning) | 90% | $0.023 | 28.1s | |
| GPT-5.4 Mini (Reasoning, Low) | 89% | $0.014 | 16.8s | |
| GPT-5.4 (Reasoning, Low) | 91% | $0.050 | 1.2m | |
| Rocinante 12B | 87% | $0.0015 | 36.4s | |
| GPT-5.4 | 93% | $0.046 | 1.4m | |
| GPT-5.4 Mini | 88% | $0.015 | 16.7s | |
| Claude Sonnet 4.6 | 90% | $0.036 | 40.3s | |
| Claude Sonnet 4.5 | 90% | $0.041 | 37.3s | |
| DeepSeek V3 (2025-03-24) | 84% | $0.0016 | 14.5s | |
| ByteDance Seed 1.6 Flash | 84% | $0.0012 | 24.5s | |
| Claude Sonnet 4 | 86% | $0.038 | 44.7s | |
| MiniMax M2.5 | 86% | $0.0034 | 1.6m | |
| Qwen 3.5 397B A17B | 91% | $0.0049 | 3.3m | |
| Hermes 3 70B | 82% | $0.0015 | 21.9s | |
| Hermes 3 405B | 84% | $0.0054 | 37.5s | |
| Claude Sonnet 4.6 (Reasoning) | 90% | $0.065 | 1.1m | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| GPT-5.4 | 93% | 99% | 91% | |
| Qwen3 235B A22B Instruct 2507 | 91% | 98% | 90% | |
| GPT-5.4 (Reasoning, Low) | 91% | 97% | 89% | |
| GPT-5.4 (Reasoning) | 91% | 98% | 89% | |
| Qwen 3.5 397B A17B | 91% | 96% | 87% | |
| Writer: Palmyra X5 | 90% | 96% | 87% | |
| GPT-5.4 Mini | 88% | 99% | 87% | |
| GPT-5.4 Mini (Reasoning) | 90% | 97% | 87% | |
| Claude Sonnet 4.6 (Reasoning) | 90% | 96% | 86% | |
| Z.AI GLM 5 Turbo | 90% | 93% | 86% | |
| GPT-5.1 | 90% | 96% | 86% | |
| Z.AI GLM 5 | 91% | 95% | 86% | |
| GPT-5.4 Mini (Reasoning, Low) | 89% | 97% | 86% | |
| Claude Sonnet 4.5 | 90% | 95% | 85% | |
| Claude Opus 4.6 (Reasoning) | 90% | 95% | 85% | |
| Claude Opus 4.5 | 87% | 96% | 85% | |
| Claude Opus 4.6 | 89% | 95% | 84% | |
| GPT-5 | 86% | 97% | 83% | |
| MiniMax M2.5 | 86% | 95% | 83% | |
| Claude Sonnet 4.6 | 90% | 92% | 82% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed, and stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| Qwen3 235B A22B Instruct 2507 | 91% | $0.0014 | 59.9s | 90% | |
| GPT-5.4 | 93% | $0.046 | 1.4m | 91% | |
| Writer: Palmyra X5 | 90% | $0.011 | 18.7s | 87% | |
| Z.AI GLM 5 Turbo | 90% | $0.0078 | 26.6s | 86% | |
| Z.AI GLM 5 | 91% | $0.0075 | 44.3s | 86% | |
| GPT-5.4 Mini (Reasoning) | 90% | $0.023 | 28.1s | 87% | |
| GPT-5.4 (Reasoning, Low) | 91% | $0.050 | 1.2m | 89% | |
| GPT-5.4 Mini | 88% | $0.015 | 16.7s | 87% | |
| GPT-5.4 Mini (Reasoning, Low) | 89% | $0.014 | 16.8s | 86% | |
| Qwen 3.5 397B A17B | 91% | $0.0049 | 3.3m | 87% | |
| Claude Sonnet 4.5 | 90% | $0.041 | 37.3s | 85% | |
| Claude Sonnet 4.6 (Reasoning) | 90% | $0.065 | 1.1m | 86% | |
| Claude Sonnet 4.6 | 90% | $0.036 | 40.3s | 82% | |
| Rocinante 12B | 87% | $0.0015 | 36.4s | 80% | |
| Mistral Medium 3.1 | 86% | $0.0059 | 40.6s | 82% | |
| GPT-5.1 | 90% | $0.052 | 2.2m | 86% | |
| MiniMax M2.5 | 86% | $0.0034 | 1.6m | 83% | |
| MiniMax M2.7 | 87% | $0.0035 | 1.1m | 80% | |
| Claude Opus 4.6 (Reasoning) | 90% | $0.091 | 1.2m | 85% | |
| Claude Opus 4.5 | 87% | $0.069 | 43.9s | 85% | |
Romance: separated couple reunites
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| Writer: Palmyra X5 | 90% | $0.013 | 22.8s | |
| GPT-5.4 (Reasoning, Low) | 92% | $0.056 | 1.3m | |
| GPT-5.4 Mini | 89% | $0.014 | 15.7s | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | $0.014 | 16.1s | |
| GPT-5.4 | 92% | $0.051 | 1.3m | |
| Grok 4.20 (Beta) | 87% | $0.019 | 15.1s | |
| Claude Sonnet 4 | 88% | $0.045 | 54.2s | |
| Hermes 3 405B | 84% | $0.0054 | 49.2s | |
| Qwen 3.5 Flash | 85% | $0.0024 | 35.9s | |
| Qwen3 235B A22B Instruct 2507 | 87% | $0.0011 | 49.4s | |
| MiniMax M2.5 | 85% | $0.0043 | 1.8m | |
| Z.AI GLM 5 | 88% | $0.012 | 1.8m | |
| Claude Sonnet 4.5 | 88% | $0.045 | 41.1s | |
| GPT-5.4 Nano (Reasoning) | 83% | $0.0055 | 23.1s | |
| Grok 4.1 Fast | 86% | $0.0021 | 39.6s | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.022 | 26.8s | |
| Z.AI GLM 5 Turbo | 85% | $0.013 | 40.7s | |
| Stealth: Hunter Alpha | 84% | $0.0000 | 48.4s | |
| Hermes 3 70B | 83% | $0.0015 | 38.9s | |
| Mistral Small 4 (Reasoning) | 83% | $0.0027 | 32.6s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| GPT-5.4 (Reasoning, Low) | 92% | 97% | 89% | |
| GPT-5.4 | 92% | 95% | 88% | |
| GPT-5.4 Mini | 89% | 97% | 86% | |
| Writer: Palmyra X5 | 90% | 95% | 85% | |
| Qwen 3.5 397B A17B | 87% | 98% | 85% | |
| Claude Opus 4.6 (Reasoning) | 89% | 95% | 85% | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | 96% | 85% | |
| GPT-5.4 Mini (Reasoning) | 87% | 98% | 85% | |
| Grok 4.20 (Beta, Reasoning) | 85% | 99% | 84% | |
| Grok 4.20 (Beta) | 87% | 96% | 84% | |
| GPT-5.4 (Reasoning) | 88% | 95% | 83% | |
| Claude Opus 4.6 | 87% | 96% | 83% | |
| Z.AI GLM 5 | 88% | 93% | 83% | |
| Qwen3 235B A22B Instruct 2507 | 87% | 96% | 83% | |
| MoonshotAI: Kimi K2.5 | 86% | 93% | 82% | |
| Claude Sonnet 4.5 | 88% | 93% | 82% | |
| Claude Sonnet 4.6 (Reasoning) | 87% | 93% | 82% | |
| GPT-5.1 | 86% | 95% | 82% | |
| GPT-5.4 Nano (Reasoning) | 83% | 96% | 81% | |
| Stealth: Hunter Alpha | 84% | 96% | 81% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed, and stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| Writer: Palmyra X5 | 90% | $0.013 | 22.8s | 85% | |
| GPT-5.4 Mini | 89% | $0.014 | 15.7s | 86% | |
| GPT-5.4 (Reasoning, Low) | 92% | $0.056 | 1.3m | 89% | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | $0.014 | 16.1s | 85% | |
| GPT-5.4 | 92% | $0.051 | 1.3m | 88% | |
| Grok 4.20 (Beta) | 87% | $0.019 | 15.1s | 84% | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.022 | 26.8s | 85% | |
| Qwen3 235B A22B Instruct 2507 | 87% | $0.0011 | 49.4s | 83% | |
| Claude Sonnet 4.5 | 88% | $0.045 | 41.1s | 82% | |
| Grok 4.20 (Beta, Reasoning) | 85% | $0.034 | 25.6s | 84% | |
| Grok 4.1 Fast | 86% | $0.0021 | 39.6s | 79% | |
| Z.AI GLM 5 | 88% | $0.012 | 1.8m | 83% | |
| Qwen 3.5 Flash | 85% | $0.0024 | 35.9s | 80% | |
| GPT-5.4 Nano (Reasoning) | 83% | $0.0055 | 23.1s | 81% | |
| Claude Opus 4.6 (Reasoning) | 89% | $0.098 | 1.3m | 85% | |
| GPT-5.4 Nano | 83% | $0.0050 | 19.6s | 81% | |
| Stealth: Hunter Alpha | 84% | $0.0000 | 48.4s | 81% | |
| Claude Sonnet 4 | 88% | $0.045 | 54.2s | 79% | |
| Mistral Medium 3.1 | 84% | $0.0058 | 42.8s | 81% | |
| Mistral Small 4 (Reasoning) | 83% | $0.0027 | 32.6s | 81% | |
Fantasy: entering an ancient ruin
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| GPT-5.4 Mini (Reasoning) | 87% | $0.018 | 22.3s | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.014 | 16.2s | |
| GPT-5.4 Mini | 86% | $0.016 | 17.1s | |
| GPT-5.4 | 89% | $0.039 | 1.2m | |
| Z.AI GLM 5 Turbo | 85% | $0.0090 | 25.9s | |
| GPT-5.4 (Reasoning, Low) | 89% | $0.056 | 1.3m | |
| Claude Sonnet 4.6 | 87% | $0.040 | 37.2s | |
| Writer: Palmyra X5 | 84% | $0.013 | 22.1s | |
| Claude Sonnet 4.5 | 87% | $0.045 | 39.5s | |
| Qwen 3.5 35B | 81% | $0.043 | 2.3m | |
| Z.AI GLM 5 | 83% | $0.0095 | 1.5m | |
| DeepSeek V3 (2024-12-26) | 78% | $0.0029 | 35.6s | |
| MiniMax M2.7 | 82% | $0.0032 | 59.3s | |
| Qwen 3.5 Flash | 81% | $0.0033 | 45.4s | |
| Qwen 3.5 122B | 80% | $0.016 | 39.5s | |
| Qwen3 235B A22B Instruct 2507 | 84% | $0.0017 | 1.1m | |
| GPT-5.4 (Reasoning) | 92% | $0.081 | 2.3m | |
| Qwen 3.5 397B A17B | 84% | $0.0048 | 3.3m | |
| GPT-4.1 | 79% | $0.021 | 40.8s | |
| DeepSeek-V2 Chat | 78% | $0.0029 | 42.0s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| GPT-5.4 (Reasoning) | 92% | 98% | 90% | |
| GPT-5.4 | 89% | 97% | 88% | |
| GPT-5.1 | 86% | 98% | 85% | |
| GPT-5.4 (Reasoning, Low) | 89% | 92% | 84% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | 96% | 84% | |
| Claude Sonnet 4.6 | 87% | 96% | 84% | |
| Claude Sonnet 4.5 | 87% | 95% | 83% | |
| Claude Opus 4.6 | 86% | 96% | 82% | |
| Claude Sonnet 4.6 (Reasoning) | 87% | 95% | 82% | |
| Claude Opus 4.6 (Reasoning) | 85% | 97% | 82% | |
| GPT-5.4 Mini | 86% | 93% | 81% | |
| GPT-5.4 Mini (Reasoning) | 87% | 92% | 81% | |
| Qwen 3.5 397B A17B | 84% | 94% | 81% | |
| Z.AI GLM 5 Turbo | 85% | 93% | 80% | |
| GPT-5 | 85% | 95% | 80% | |
| Gemini 3.1 Pro (Preview) | 83% | 93% | 78% | |
| Qwen 3.5 35B | 81% | 96% | 78% | |
| Claude Opus 4.5 | 83% | 91% | 77% | |
| Qwen 3.5 27B | 79% | 97% | 77% | |
| Qwen 3.5 122B | 80% | 95% | 77% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed, and stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.014 | 16.2s | 84% | |
| GPT-5.4 | 89% | $0.039 | 1.2m | 88% | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.018 | 22.3s | 81% | |
| GPT-5.4 (Reasoning) | 92% | $0.081 | 2.3m | 90% | |
| GPT-5.4 Mini | 86% | $0.016 | 17.1s | 81% | |
| Claude Sonnet 4.6 | 87% | $0.040 | 37.2s | 84% | |
| Claude Sonnet 4.5 | 87% | $0.045 | 39.5s | 83% | |
| Z.AI GLM 5 Turbo | 85% | $0.0090 | 25.9s | 80% | |
| GPT-5.4 (Reasoning, Low) | 89% | $0.056 | 1.3m | 84% | |
| GPT-5.1 | 86% | $0.053 | 1.8m | 85% | |
| Writer: Palmyra X5 | 84% | $0.013 | 22.1s | 76% | |
| Claude Opus 4.6 | 86% | $0.087 | 1.1m | 82% | |
| Qwen 3.5 Flash | 81% | $0.0033 | 45.4s | 76% | |
| Qwen 3.5 122B | 80% | $0.016 | 39.5s | 77% | |
| Grok 4.20 (Beta, Reasoning) | 81% | $0.030 | 24.6s | 77% | |
| Qwen 3.5 9B | 81% | $0.0011 | 43.6s | 75% | |
| Qwen3 235B A22B Instruct 2507 | 84% | $0.0017 | 1.1m | 73% | |
| GPT-5.4 Nano (Reasoning) | 79% | $0.0060 | 26.5s | 76% | |
| MiniMax M2.7 | 82% | $0.0032 | 59.3s | 74% | |
| o4 Mini | 79% | $0.019 | 34.0s | 77% | |
Mystery: examining a crime scene
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| GPT-5.4 Mini | 89% | $0.014 | 15.4s | |
| GPT-5.4 Mini (Reasoning) | 90% | $0.026 | 33.4s | |
| Rocinante 12B | 87% | $0.0022 | 45.1s | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | $0.014 | 15.7s | |
| Stealth: Hunter Alpha | 86% | $0.0000 | 49.6s | |
| Writer: Palmyra X5 | 89% | $0.013 | 21.6s | |
| Grok 4.1 Fast | 86% | $0.0017 | 49.2s | |
| Z.AI GLM 5 Turbo | 88% | $0.0088 | 33.3s | |
| Qwen3 235B A22B Instruct 2507 | 87% | $0.0011 | 47.2s | |
| Claude Haiku 4.5 | 87% | $0.015 | 23.7s | |
| Mistral Large | 86% | $0.018 | 30.5s | |
| GPT-5.4 (Reasoning, Low) | 93% | $0.054 | 1.2m | |
| MiniMax M2.7 | 85% | $0.0043 | 1.3m | |
| Grok 4.20 (Beta) | 86% | $0.015 | 12.7s | |
| Claude Sonnet 4.5 | 90% | $0.042 | 39.8s | |
| GPT-5.4 Nano (Reasoning, Low) | 82% | $0.0050 | 22.8s | |
| Mistral Small 4 | 83% | $0.0017 | 17.4s | |
| GPT-5.4 Nano | 83% | $0.0053 | 18.9s | |
| Hermes 3 405B | 84% | $0.0049 | 25.9s | |
| GPT-5.4 | 92% | $0.044 | 1.3m | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| GPT-5.4 (Reasoning, Low) | 93% | 98% | 91% | |
| GPT-5.4 Mini (Reasoning) | 90% | 99% | 90% | |
| GPT-5.1 | 90% | 99% | 89% | |
| Claude Opus 4.6 (Reasoning) | 92% | 96% | 88% | |
| GPT-5.4 Mini | 89% | 98% | 87% | |
| GPT-5.4 | 92% | 96% | 87% | |
| GPT-5.4 (Reasoning) | 91% | 94% | 86% | |
| GPT-5 | 87% | 98% | 86% | |
| Claude Sonnet 4.5 | 90% | 96% | 85% | |
| Writer: Palmyra X5 | 89% | 96% | 85% | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | 95% | 84% | |
| Grok 4.20 (Beta, Reasoning) | 86% | 97% | 84% | |
| Claude Opus 4 | 88% | 95% | 84% | |
| Grok 4.1 Fast | 86% | 96% | 83% | |
| Claude Opus 4.6 | 87% | 95% | 83% | |
| Rocinante 12B | 87% | 95% | 83% | |
| Qwen3 235B A22B Instruct 2507 | 87% | 96% | 83% | |
| Claude Haiku 4.5 | 87% | 96% | 83% | |
| Z.AI GLM 5 | 88% | 95% | 82% | |
| Z.AI GLM 5 Turbo | 88% | 94% | 82% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed, and stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning) | 90% | $0.026 | 33.4s | 90% | |
| GPT-5.4 Mini | 89% | $0.014 | 15.4s | 87% | |
| GPT-5.4 (Reasoning, Low) | 93% | $0.054 | 1.2m | 91% | |
| Writer: Palmyra X5 | 89% | $0.013 | 21.6s | 85% | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | $0.014 | 15.7s | 84% | |
| GPT-5.4 | 92% | $0.044 | 1.3m | 87% | |
| Claude Sonnet 4.5 | 90% | $0.042 | 39.8s | 85% | |
| Z.AI GLM 5 Turbo | 88% | $0.0088 | 33.3s | 82% | |
| Claude Haiku 4.5 | 87% | $0.015 | 23.7s | 83% | |
| Qwen3 235B A22B Instruct 2507 | 87% | $0.0011 | 47.2s | 83% | |
| GPT-5.1 | 90% | $0.052 | 1.6m | 89% | |
| Rocinante 12B | 87% | $0.0022 | 45.1s | 83% | |
| Grok 4.1 Fast | 86% | $0.0017 | 49.2s | 83% | |
| Grok 4.20 (Beta) | 86% | $0.015 | 12.7s | 81% | |
| Stealth: Hunter Alpha | 86% | $0.0000 | 49.6s | 81% | |
| Claude Opus 4.6 (Reasoning) | 92% | $0.091 | 1.2m | 88% | |
| Claude Sonnet 4.6 | 89% | $0.037 | 40.6s | 82% | |
| GPT-5.4 Nano | 83% | $0.0053 | 18.9s | 82% | |
| Grok 4.20 (Beta, Reasoning) | 86% | $0.040 | 35.2s | 84% | |
| o4 Mini | 84% | $0.017 | 30.1s | 81% | |
Horror: alone in an eerie place at night
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| GPT-5.4 Mini (Reasoning) | 90% | $0.017 | 22.1s | |
| Z.AI GLM 5 Turbo | 88% | $0.0072 | 27.5s | |
| GPT-5.4 Nano (Reasoning, Low) | 85% | $0.0045 | 19.4s | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | $0.013 | 14.9s | |
| Z.AI GLM 5 | 88% | $0.0075 | 50.7s | |
| Qwen3 235B A22B Instruct 2507 | 87% | $0.0015 | 48.5s | |
| GPT-5.4 Mini | 86% | $0.014 | 15.8s | |
| Mistral Large | 86% | $0.015 | 24.6s | |
| Claude 3.5 Haiku | 83% | $0.0050 | 9.0s | |
| GPT-5.4 | 90% | $0.039 | 1.1m | |
| Writer: Palmyra X5 | 85% | $0.012 | 19.1s | |
| Claude Sonnet 4.5 | 89% | $0.040 | 37.3s | |
| Mistral Large 3 | 83% | $0.0037 | 29.1s | |
| Qwen 3.5 397B A17B | 87% | $0.012 | 1.3m | |
| GPT-5.4 Nano | 84% | $0.0047 | 16.7s | |
| MiniMax M2.7 | 87% | $0.0029 | 1.0m | |
| Hermes 3 405B | 81% | $0.0050 | 26.4s | |
| Grok 4.20 (Beta) | 86% | $0.017 | 15.6s | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.048 | 1.1m | |
| Claude Sonnet 4.6 | 88% | $0.034 | 35.1s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| GPT-5.4 (Reasoning) | 93% | 98% | 91% | |
| GPT-5.4 | 90% | 97% | 87% | |
| GPT-5.4 Mini (Reasoning) | 90% | 95% | 87% | |
| GPT-5.4 (Reasoning, Low) | 90% | 95% | 86% | |
| Z.AI GLM 5 Turbo | 88% | 95% | 86% | |
| Claude Opus 4.6 (Reasoning) | 88% | 97% | 86% | |
| Claude Sonnet 4.6 (Reasoning) | 90% | 95% | 85% | |
| GPT-5 | 89% | 96% | 85% | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | 96% | 84% | |
| Gemini 3.1 Pro (Preview) | 86% | 97% | 84% | |
| Grok 4.20 (Beta) | 86% | 98% | 84% | |
| GPT-5.1 | 89% | 94% | 84% | |
| Claude Opus 4.5 | 87% | 95% | 84% | |
| Qwen 3.5 397B A17B | 87% | 97% | 83% | |
| Claude Sonnet 4.5 | 89% | 94% | 83% | |
| Claude Opus 4 | 87% | 95% | 83% | |
| Claude Sonnet 4.6 | 88% | 94% | 82% | |
| GPT-5.4 Nano | 84% | 97% | 82% | |
| Mistral Large | 86% | 94% | 82% | |
| GPT-5.4 Mini | 86% | 93% | 81% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed, and stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning) | 90% | $0.017 | 22.1s | 87% | |
| Z.AI GLM 5 Turbo | 88% | $0.0072 | 27.5s | 86% | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | $0.013 | 14.9s | 84% | |
| GPT-5.4 | 90% | $0.039 | 1.1m | 87% | |
| GPT-5.4 (Reasoning) | 93% | $0.075 | 2.1m | 91% | |
| Grok 4.20 (Beta) | 86% | $0.017 | 15.6s | 84% | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.048 | 1.1m | 86% | |
| Claude Sonnet 4.5 | 89% | $0.040 | 37.3s | 83% | |
| Qwen 3.5 397B A17B | 87% | $0.012 | 1.3m | 83% | |
| GPT-5.4 Mini | 86% | $0.014 | 15.8s | 81% | |
| Claude Sonnet 4.6 | 88% | $0.034 | 35.1s | 82% | |
| Mistral Large | 86% | $0.015 | 24.6s | 82% | |
| Qwen3 235B A22B Instruct 2507 | 87% | $0.0015 | 48.5s | 80% | |
| GPT-5.4 Nano | 84% | $0.0047 | 16.7s | 82% | |
| MiniMax M2.7 | 87% | $0.0029 | 1.0m | 80% | |
| Z.AI GLM 5 | 88% | $0.0075 | 50.7s | 79% | |
| GPT-5.4 Nano (Reasoning, Low) | 85% | $0.0045 | 19.4s | 80% | |
| Writer: Palmyra X5 | 85% | $0.012 | 19.1s | 80% | |
| Claude Sonnet 4.6 (Reasoning) | 90% | $0.074 | 1.2m | 85% | |
| Claude Haiku 4.5 | 84% | $0.014 | 22.8s | 81% | |
genre
Literary fiction: old friends reunite
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| GPT-5.4 Mini | 88% | $0.018 | 20.0s | |
| DeepSeek V3 (2025-03-24) | 85% | $0.0012 | 39.8s | |
| Mistral Small Creative | 83% | $0.0007 | 10.0s | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.018 | 21.3s | |
| Ministral 8B | 83% | $0.0003 | 12.7s | |
| Ministral 3 14B | 82% | $0.0006 | 17.7s | |
| Mistral Large | 85% | $0.012 | 33.8s | |
| GPT-5.4 Mini (Reasoning) | 86% | $0.030 | 33.3s | |
| Writer: Palmyra X5 | 83% | $0.011 | 23.1s | |
| Mistral Medium 3.1 | 83% | $0.0047 | 41.0s | |
| ByteDance Seed 1.6 Flash | 83% | $0.0013 | 30.7s | |
| Grok 4.20 (Beta) | 83% | $0.016 | 14.2s | |
| Ministral 3B | 78% | $0.0001 | 7.0s | |
| o4 Mini | 81% | $0.017 | 27.0s | |
| Mistral Small 4 | 81% | $0.0015 | 23.5s | |
| Mistral Large 3 | 79% | $0.0028 | 31.2s | |
| Grok 4.1 Fast | 81% | $0.0019 | 38.6s | |
| Qwen3 235B A22B Instruct 2507 | 83% | $0.0011 | 1.2m | |
| GPT-5.4 Nano (Reasoning, Low) | 78% | $0.0054 | 19.1s | |
| Mistral Small 3.2 24B | 78% | $0.0006 | 22.9s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| GPT-5.4 (Reasoning) | 88% | 98% | 86% | |
| GPT-5.4 Mini | 88% | 96% | 86% | |
| GPT-5.4 | 89% | 96% | 85% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | 97% | 85% | |
| GPT-5.4 (Reasoning, Low) | 89% | 95% | 84% | |
| GPT-5.4 Mini (Reasoning) | 86% | 96% | 84% | |
| GPT-5.1 | 84% | 97% | 83% | |
| Grok 4.1 Fast | 81% | 98% | 80% | |
| Grok 4.20 (Beta, Reasoning) | 83% | 96% | 80% | |
| Ministral 3 14B | 82% | 96% | 79% | |
| Mistral Medium 3.1 | 83% | 96% | 79% | |
| Grok 4 | 81% | 97% | 79% | |
| Mistral Small Creative | 83% | 93% | 79% | |
| DeepSeek V3 (2025-03-24) | 85% | 91% | 79% | |
| Mistral Large | 85% | 92% | 78% | |
| Ministral 8B | 83% | 93% | 78% | |
| Qwen 3 32B | 81% | 97% | 78% | |
| GPT-5 | 81% | 94% | 78% | |
| o4 Mini | 81% | 94% | 78% | |
| Grok 4.20 (Beta) | 83% | 95% | 78% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed, and stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| GPT-5.4 Mini | 88% | $0.018 | 20.0s | 86% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.018 | 21.3s | 85% | |
| GPT-5.4 Mini (Reasoning) | 86% | $0.030 | 33.3s | 84% | |
| Mistral Small Creative | 83% | $0.0007 | 10.0s | 79% | |
| Ministral 3 14B | 82% | $0.0006 | 17.7s | 79% | |
| DeepSeek V3 (2025-03-24) | 85% | $0.0012 | 39.8s | 79% | |
| Ministral 8B | 83% | $0.0003 | 12.7s | 78% | |
| Mistral Large | 85% | $0.012 | 33.8s | 78% | |
| Mistral Medium 3.1 | 83% | $0.0047 | 41.0s | 79% | |
| Grok 4.20 (Beta) | 83% | $0.016 | 14.2s | 78% | |
| Grok 4.1 Fast | 81% | $0.0019 | 38.6s | 80% | |
| Writer: Palmyra X5 | 83% | $0.011 | 23.1s | 77% | |
| ByteDance Seed 1.6 Flash | 83% | $0.0013 | 30.7s | 77% | |
| Mistral Small 4 | 81% | $0.0015 | 23.5s | 75% | |
| GPT-5.4 | 89% | $0.070 | 2.0m | 85% | |
| o4 Mini | 81% | $0.017 | 27.0s | 78% | |
| Qwen3 235B A22B Instruct 2507 | 83% | $0.0011 | 1.2m | 78% | |
| GPT-5.4 (Reasoning, Low) | 89% | $0.070 | 1.9m | 84% | |
| GPT-4.1 | 81% | $0.017 | 34.0s | 77% | |
| Mistral Small 4 (Reasoning) | 81% | $0.0024 | 37.2s | 75% | |
Thriller: chase through city streets
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| GPT-5.4 Mini (Reasoning) | 90% | $0.022 | 26.3s | |
| GPT-5.4 Mini | 89% | $0.015 | 18.3s | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.013 | 16.0s | |
| Writer: Palmyra X5 | 86% | $0.011 | 22.5s | |
| Gemini 2.5 Flash | 81% | $0.0039 | 8.1s | |
| Qwen3 235B A22B Instruct 2507 | 86% | $0.0011 | 1.1m | |
| GPT-5.4 | 92% | $0.050 | 1.4m | |
| GPT-5.4 (Reasoning, Low) | 91% | $0.051 | 1.4m | |
| Mistral Small 4 (Reasoning) | 83% | $0.0023 | 32.8s | |
| Qwen 3.5 397B A17B | 87% | $0.016 | 1.5m | |
| Z.AI GLM 5 Turbo | 85% | $0.0068 | 30.7s | |
| GPT-4.1 | 83% | $0.018 | 46.3s | |
| Qwen 3.5 35B | 84% | $0.0068 | 26.9s | |
| Mistral Small 4 | 79% | $0.0011 | 17.2s | |
| Gemini 2.5 Flash Lite (Reasoning) | 81% | $0.0027 | 28.7s | |
| Ministral 8B | 79% | $0.0002 | 8.4s | |
| MiniMax M2.5 | 81% | $0.0026 | 35.8s | |
| GPT-4.1 Mini | 81% | $0.0025 | 14.4s | |
| GPT-5.4 Nano (Reasoning, Low) | 80% | $0.0049 | 17.8s | |
| Claude Sonnet 4.5 | 85% | $0.027 | 34.3s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 | 92% | 98% | 90% | |
| GPT-5.4 (Reasoning) | 91% | 95% | 88% | |
| GPT-5.4 (Reasoning, Low) | 91% | 94% | 86% | |
| GPT-5.4 Mini | 89% | 96% | 86% | |
| GPT-5.4 Mini (Reasoning) | 90% | 93% | 85% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | 94% | 83% | |
| Qwen 3.5 397B A17B | 87% | 94% | 82% | |
| Writer: Palmyra X5 | 86% | 94% | 81% | |
| Gemini 3.1 Pro (Preview) | 86% | 91% | 80% | |
| GPT-5 | 82% | 96% | 80% | |
| o4 Mini High | 81% | 97% | 79% | |
| Qwen3 235B A22B Instruct 2507 | 86% | 93% | 79% | |
| GPT-5.1 | 83% | 93% | 78% | |
| MiniMax M2.5 | 81% | 95% | 78% | |
| GPT-4.1 | 83% | 92% | 78% | |
| Claude Sonnet 4.5 | 85% | 93% | 78% | |
| Z.AI GLM 5 Turbo | 85% | 94% | 78% | |
| Gemini 2.5 Flash Lite (Reasoning) | 81% | 96% | 78% | |
| GPT-5.4 Nano (Reasoning, Low) | 80% | 96% | 78% | |
| Qwen 3.5 Flash | 80% | 95% | 77% | |
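The stability column above appears to follow the stated formula (median score × consistency); a minimal sketch of the arithmetic, assuming the percentages are simply multiplied as fractions and rounded:

```python
def stability(median_score: float, consistency: float) -> float:
    """Stability as the product of median score and consistency,
    both given as fractions (e.g. 0.92 for 92%)."""
    return round(median_score * consistency, 2)

# Checking against the GPT-5.4 row above (92% score, 98% consistency -> 90%):
print(stability(0.92, 0.98))  # 0.9
```

This reproduces the table's values to within rounding, e.g. the o4 Mini High row (81% × 97% → 79%).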
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini | 89% | $0.015 | 18.3s | 86% | |
| GPT-5.4 Mini (Reasoning) | 90% | $0.022 | 26.3s | 85% | |
| GPT-5.4 | 92% | $0.050 | 1.4m | 90% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.013 | 16.0s | 83% | |
| Writer: Palmyra X5 | 86% | $0.011 | 22.5s | 81% | |
| GPT-5.4 (Reasoning, Low) | 91% | $0.051 | 1.4m | 86% | |
| Qwen3 235B A22B Instruct 2507 | 86% | $0.0011 | 1.1m | 79% | |
| Z.AI GLM 5 Turbo | 85% | $0.0068 | 30.7s | 78% | |
| Qwen 3.5 397B A17B | 87% | $0.016 | 1.5m | 82% | |
| Qwen 3.5 35B | 84% | $0.0068 | 26.9s | 76% | |
| Claude Sonnet 4.5 | 85% | $0.027 | 34.3s | 78% | |
| MiniMax M2.5 | 81% | $0.0026 | 35.8s | 78% | |
| GPT-4.1 Mini | 81% | $0.0025 | 14.4s | 77% | |
| Gemini 2.5 Flash Lite (Reasoning) | 81% | $0.0027 | 28.7s | 78% | |
| GPT-5.4 Nano (Reasoning, Low) | 80% | $0.0049 | 17.8s | 78% | |
| Mistral Small 4 (Reasoning) | 83% | $0.0023 | 32.8s | 75% | |
| GPT-4.1 | 83% | $0.018 | 46.3s | 78% | |
| Gemini 2.5 Flash (Reasoning) | 81% | $0.0100 | 19.8s | 76% | |
| Ministral 8B | 79% | $0.0002 | 8.4s | 75% | |
| o4 Mini High | 81% | $0.022 | 37.6s | 79% | |
Romance: separated couple reunites
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Rocinante 12B | 84% | $0.0007 | 17.7s | |
| GPT-5.4 Mini (Reasoning) | 89% | $0.019 | 21.8s | |
| GPT-5.4 Mini (Reasoning, Low) | 90% | $0.017 | 19.1s | |
| GPT-5.4 Mini | 88% | $0.018 | 19.0s | |
| Mistral Small Creative | 80% | $0.0007 | 11.0s | |
| DeepSeek V3 (2025-03-24) | 85% | $0.0011 | 38.6s | |
| Mistral Large | 82% | $0.011 | 28.6s | |
| Mistral Small 4 | 81% | $0.0016 | 23.0s | |
| Claude 3.5 Haiku | 82% | $0.0021 | 8.5s | |
| Mistral Small 4 (Reasoning) | 82% | $0.0025 | 37.5s | |
| Writer: Palmyra X5 | 80% | $0.011 | 23.9s | |
| ByteDance Seed 1.6 Flash | 81% | $0.0012 | 26.6s | |
| Ministral 3 3B | 79% | $0.0002 | 4.1s | |
| Ministral 3 14B | 79% | $0.0005 | 12.6s | |
| Z.AI GLM 5 Turbo | 84% | $0.0082 | 37.6s | |
| MiniMax M2.7 | 83% | $0.0039 | 1.1m | |
| Qwen 3.5 35B | 82% | $0.0096 | 39.3s | |
| Qwen3 235B A22B Instruct 2507 | 83% | $0.0006 | 52.1s | |
| Grok 4 Fast | 79% | $0.0016 | 22.9s | |
| Mistral Large 3 | 81% | $0.0028 | 32.4s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 | 92% | 97% | 90% | |
| GPT-5.4 (Reasoning, Low) | 90% | 97% | 88% | |
| GPT-5.4 Mini (Reasoning, Low) | 90% | 98% | 88% | |
| GPT-5.4 (Reasoning) | 91% | 97% | 87% | |
| GPT-5.4 Mini (Reasoning) | 89% | 96% | 86% | |
| GPT-5.4 Mini | 88% | 96% | 85% | |
| DeepSeek V3 (2025-03-24) | 85% | 96% | 81% | |
| Claude Opus 4 | 85% | 96% | 80% | |
| Qwen 3.5 397B A17B | 83% | 96% | 79% | |
| Mistral Small 4 | 81% | 97% | 79% | |
| Mistral Small 4 (Reasoning) | 82% | 96% | 79% | |
| Qwen 3.5 35B | 82% | 97% | 79% | |
| Qwen3 235B A22B Instruct 2507 | 83% | 93% | 78% | |
| MiniMax M2.7 | 83% | 93% | 78% | |
| Mistral Medium 3.1 | 81% | 96% | 78% | |
| Qwen 3.5 122B | 80% | 98% | 78% | |
| ByteDance Seed 1.6 Flash | 81% | 96% | 77% | |
| GPT-5.1 | 83% | 93% | 77% | |
| Qwen 3.5 Flash | 80% | 97% | 77% | |
| Claude Sonnet 4 | 79% | 97% | 77% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning, Low) | 90% | $0.017 | 19.1s | 88% | |
| GPT-5.4 Mini (Reasoning) | 89% | $0.019 | 21.8s | 86% | |
| GPT-5.4 Mini | 88% | $0.018 | 19.0s | 85% | |
| DeepSeek V3 (2025-03-24) | 85% | $0.0011 | 38.6s | 81% | |
| GPT-5.4 | 92% | $0.065 | 1.9m | 90% | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.067 | 1.7m | 88% | |
| Mistral Small 4 | 81% | $0.0016 | 23.0s | 79% | |
| Mistral Small 4 (Reasoning) | 82% | $0.0025 | 37.5s | 79% | |
| ByteDance Seed 1.6 Flash | 81% | $0.0012 | 26.6s | 77% | |
| Qwen3 235B A22B Instruct 2507 | 83% | $0.0006 | 52.1s | 78% | |
| Qwen 3.5 35B | 82% | $0.0096 | 39.3s | 79% | |
| Z.AI GLM 5 Turbo | 84% | $0.0082 | 37.6s | 76% | |
| Mistral Large | 82% | $0.011 | 28.6s | 77% | |
| Rocinante 12B | 84% | $0.0007 | 17.7s | 72% | |
| Ministral 3 3B | 79% | $0.0002 | 4.1s | 75% | |
| Mistral Small Creative | 80% | $0.0007 | 11.0s | 75% | |
| Claude 3.5 Haiku | 82% | $0.0021 | 8.5s | 73% | |
| MiniMax M2.7 | 83% | $0.0039 | 1.1m | 78% | |
| Mistral Large 3 | 81% | $0.0028 | 32.4s | 76% | |
| Qwen 3.5 Flash | 80% | $0.0020 | 36.1s | 77% | |
Fantasy: entering an ancient ruin
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning, Low) | 83% | $0.016 | 18.6s | |
| GPT-5.4 Mini | 83% | $0.016 | 18.8s | |
| Rocinante 12B | 78% | $0.0009 | 26.7s | |
| GPT-5.4 Mini (Reasoning) | 84% | $0.024 | 29.9s | |
| ByteDance Seed 1.6 Flash | 76% | $0.0014 | 29.6s | |
| Mistral Small 4 | 76% | $0.0015 | 19.7s | |
| Qwen 3.5 9B | 79% | $0.0008 | 56.9s | |
| GPT-5.4 Nano | 77% | $0.0050 | 17.8s | |
| Qwen3 235B A22B Instruct 2507 | 78% | $0.0010 | 56.7s | |
| Z.AI GLM 5 | 79% | $0.0078 | 45.3s | |
| Gemini 3.1 Flash Lite (Preview) | 76% | $0.0029 | 8.8s | |
| Hermes 3 405B | 80% | $0.0030 | 1.6m | |
| GPT-4o Mini (temp=1) | 77% | $0.0012 | 28.4s | |
| Claude 3 Haiku | 74% | $0.0025 | 16.9s | |
| o4 Mini | 77% | $0.015 | 23.0s | |
| Grok 4.20 (Beta) | 78% | $0.018 | 19.1s | |
| Z.AI GLM 5 Turbo | 77% | $0.0074 | 34.2s | |
| Mistral Medium 3.1 | 74% | $0.0048 | 33.3s | |
| Mistral NeMo | 73% | $0.0005 | 7.9s | |
| DeepSeek-V2 Chat | 76% | $0.0023 | 46.2s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 (Reasoning) | 89% | 95% | 85% | |
| GPT-5.4 (Reasoning, Low) | 86% | 98% | 85% | |
| GPT-5.4 | 84% | 96% | 81% | |
| GPT-5.4 Mini (Reasoning, Low) | 83% | 97% | 80% | |
| GPT-5.4 Mini | 83% | 96% | 80% | |
| GPT-5.4 Mini (Reasoning) | 84% | 92% | 76% | |
| Qwen 3.5 9B | 79% | 96% | 75% | |
| GPT-5.4 Nano | 77% | 96% | 75% | |
| Grok 4.20 (Beta) | 78% | 96% | 75% | |
| Qwen3 235B A22B Instruct 2507 | 78% | 95% | 75% | |
| GPT-5.1 | 80% | 94% | 74% | |
| Qwen 3.5 397B A17B | 80% | 92% | 74% | |
| GPT-5.2 | 77% | 97% | 74% | |
| o4 Mini | 77% | 95% | 74% | |
| Rocinante 12B | 78% | 93% | 74% | |
| Claude Opus 4.6 (Reasoning) | 78% | 94% | 74% | |
| DeepSeek-V2 Chat | 76% | 97% | 73% | |
| GPT-5 Mini | 77% | 96% | 73% | |
| GPT-5 | 79% | 95% | 73% | |
| Grok 4.1 Fast | 75% | 97% | 73% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini | 83% | $0.016 | 18.8s | 80% | |
| GPT-5.4 Mini (Reasoning, Low) | 83% | $0.016 | 18.6s | 80% | |
| GPT-5.4 (Reasoning, Low) | 86% | $0.056 | 1.4m | 85% | |
| GPT-5.4 Mini (Reasoning) | 84% | $0.024 | 29.9s | 76% | |
| GPT-5.4 (Reasoning) | 89% | $0.079 | 2.2m | 85% | |
| GPT-5.4 | 84% | $0.048 | 1.4m | 81% | |
| Qwen 3.5 9B | 79% | $0.0008 | 56.9s | 75% | |
| Rocinante 12B | 78% | $0.0009 | 26.7s | 74% | |
| GPT-5.4 Nano | 77% | $0.0050 | 17.8s | 75% | |
| Qwen3 235B A22B Instruct 2507 | 78% | $0.0010 | 56.7s | 75% | |
| Grok 4.20 (Beta) | 78% | $0.018 | 19.1s | 75% | |
| Gemini 3.1 Flash Lite (Preview) | 76% | $0.0029 | 8.8s | 72% | |
| Hermes 3 405B | 80% | $0.0030 | 1.6m | 73% | |
| o4 Mini | 77% | $0.015 | 23.0s | 74% | |
| Z.AI GLM 5 | 79% | $0.0078 | 45.3s | 72% | |
| GPT-5 Mini | 77% | $0.010 | 41.6s | 73% | |
| DeepSeek-V2 Chat | 76% | $0.0023 | 46.2s | 73% | |
| ByteDance Seed 1.6 Flash | 76% | $0.0014 | 29.6s | 72% | |
| Mistral Small 4 | 76% | $0.0015 | 19.7s | 71% | |
| Mistral Large | 76% | $0.014 | 34.6s | 73% | |
Mystery: examining a crime scene
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Rocinante 12B | 85% | $0.0006 | 25.4s | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.016 | 18.8s | |
| GPT-5.4 Mini | 89% | $0.017 | 19.5s | |
| Mistral Small 4 (Reasoning) | 83% | $0.0022 | 30.7s | |
| Qwen 3 32B | 83% | $0.0012 | 39.7s | |
| Mistral Small Creative | 81% | $0.0006 | 9.8s | |
| GPT-5.4 Nano | 80% | $0.0057 | 20.4s | |
| Mistral NeMo | 81% | $0.0003 | 11.1s | |
| Mistral Large 3 | 82% | $0.0026 | 28.4s | |
| Mistral Small 4 | 79% | $0.0011 | 18.0s | |
| LFM2 24B | 78% | $0.0002 | 33.2s | |
| Ministral 3 14B | 79% | $0.0005 | 12.5s | |
| GPT-5.4 Nano (Reasoning, Low) | 81% | $0.0055 | 20.4s | |
| GPT-5.4 Mini (Reasoning) | 89% | $0.031 | 37.4s | |
| DeepSeek V3 (2025-03-24) | 80% | $0.0016 | 33.7s | |
| Qwen 3.5 Flash | 81% | $0.0019 | 36.5s | |
| o4 Mini | 81% | $0.014 | 22.3s | |
| Hermes 3 405B | 82% | $0.0020 | 59.2s | |
| Grok 4.1 Fast | 80% | $0.0016 | 40.6s | |
| Gemma 3 12B | 79% | $0.0004 | 37.1s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 (Reasoning) | 92% | 98% | 91% | |
| GPT-5.4 | 92% | 96% | 88% | |
| GPT-5.4 (Reasoning, Low) | 91% | 94% | 87% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | 96% | 84% | |
| GPT-5.4 Mini (Reasoning) | 89% | 95% | 84% | |
| GPT-5.1 | 85% | 98% | 83% | |
| GPT-5 | 85% | 96% | 83% | |
| Claude Sonnet 4.5 | 85% | 96% | 81% | |
| Qwen 3.5 397B A17B | 84% | 95% | 80% | |
| Qwen 3 32B | 83% | 97% | 80% | |
| GPT-5.4 Mini | 89% | 92% | 80% | |
| Claude Opus 4 | 83% | 94% | 79% | |
| Mistral Small 4 (Reasoning) | 83% | 94% | 78% | |
| o4 Mini | 81% | 94% | 77% | |
| Writer: Palmyra X5 | 81% | 97% | 77% | |
| GPT-5.4 Nano (Reasoning, Low) | 81% | 95% | 77% | |
| GPT-5.4 Nano | 80% | 94% | 77% | |
| Grok 4 | 79% | 98% | 77% | |
| Rocinante 12B | 85% | 88% | 77% | |
| Qwen3 235B A22B Instruct 2507 | 81% | 95% | 77% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.016 | 18.8s | 84% | |
| GPT-5.4 Mini | 89% | $0.017 | 19.5s | 80% | |
| GPT-5.4 Mini (Reasoning) | 89% | $0.031 | 37.4s | 84% | |
| Rocinante 12B | 85% | $0.0006 | 25.4s | 77% | |
| Qwen 3 32B | 83% | $0.0012 | 39.7s | 80% | |
| GPT-5.4 | 92% | $0.057 | 1.6m | 88% | |
| Mistral Small 4 (Reasoning) | 83% | $0.0022 | 30.7s | 78% | |
| Claude Sonnet 4.5 | 85% | $0.029 | 35.7s | 81% | |
| Mistral Large 3 | 82% | $0.0026 | 28.4s | 76% | |
| GPT-5.4 (Reasoning, Low) | 91% | $0.060 | 1.6m | 87% | |
| Mistral Small Creative | 81% | $0.0006 | 9.8s | 75% | |
| Mistral NeMo | 81% | $0.0003 | 11.1s | 75% | |
| GPT-5.4 Nano (Reasoning, Low) | 81% | $0.0055 | 20.4s | 77% | |
| GPT-5.4 Nano | 80% | $0.0057 | 20.4s | 77% | |
| Writer: Palmyra X5 | 81% | $0.011 | 24.2s | 77% | |
| o4 Mini | 81% | $0.014 | 22.3s | 77% | |
| Ministral 3 14B | 79% | $0.0005 | 12.5s | 75% | |
| Qwen 3.5 Flash | 81% | $0.0019 | 36.5s | 75% | |
| Mistral Small 4 | 79% | $0.0011 | 18.0s | 75% | |
| MiniMax M2.5 | 82% | $0.0030 | 1.0m | 76% | |
Horror: alone in an eerie place at night
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| DeepSeek V3 (2025-03-24) | 85% | $0.0011 | 25.4s | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | $0.013 | 15.6s | |
| GPT-5.4 Mini (Reasoning) | 85% | $0.015 | 18.0s | |
| GPT-5.4 Mini | 86% | $0.013 | 15.9s | |
| Qwen 3.5 Flash | 81% | $0.0015 | 28.4s | |
| Z.AI GLM 5 | 82% | $0.0072 | 59.0s | |
| LFM2 24B | 80% | $0.0002 | 27.7s | |
| Rocinante 12B | 81% | $0.0007 | 21.4s | |
| Mistral Small 4 | 83% | $0.0012 | 21.9s | |
| Mistral NeMo | 77% | $0.0003 | 10.0s | |
| Qwen 3.5 397B A17B | 86% | $0.017 | 1.7m | |
| Z.AI GLM 5 Turbo | 83% | $0.0066 | 34.0s | |
| Claude 3.5 Haiku | 78% | $0.0022 | 10.2s | |
| Mistral Small 4 (Reasoning) | 80% | $0.0021 | 28.9s | |
| Qwen 3.5 9B | 79% | $0.0006 | 53.5s | |
| Ministral 3 3B | 77% | $0.0003 | 7.5s | |
| Gemma 3 12B | 78% | $0.0002 | 36.7s | |
| Ministral 8B | 78% | $0.0002 | 11.1s | |
| Qwen3 235B A22B Instruct 2507 | 81% | $0.0011 | 1.1m | |
| Mistral Small Creative | 80% | $0.0006 | 9.8s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 | 88% | 97% | 86% | |
| GPT-5.4 Mini (Reasoning) | 85% | 97% | 83% | |
| GPT-5.4 (Reasoning) | 88% | 95% | 82% | |
| GPT-5.4 (Reasoning, Low) | 86% | 95% | 82% | |
| GPT-5.4 Mini | 86% | 96% | 82% | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | 95% | 82% | |
| Qwen 3.5 397B A17B | 86% | 95% | 81% | |
| Claude Opus 4.6 (Reasoning) | 83% | 97% | 80% | |
| DeepSeek V3 (2025-03-24) | 85% | 94% | 80% | |
| Claude Opus 4.6 | 82% | 96% | 80% | |
| GPT-5.1 | 83% | 95% | 79% | |
| Claude Opus 4.5 | 84% | 94% | 79% | |
| GPT-5 | 83% | 96% | 79% | |
| Qwen 3.5 Flash | 81% | 95% | 79% | |
| Claude Sonnet 4.6 (Reasoning) | 83% | 93% | 78% | |
| Mistral Small 4 | 83% | 96% | 78% | |
| Z.AI GLM 5 Turbo | 83% | 94% | 78% | |
| Gemini 3.1 Pro (Preview) | 85% | 93% | 78% | |
| Claude Sonnet 4.6 | 81% | 95% | 77% | |
| Grok 4.20 (Beta) | 80% | 96% | 77% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini | 86% | $0.013 | 15.9s | 82% | |
| GPT-5.4 Mini (Reasoning) | 85% | $0.015 | 18.0s | 83% | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | $0.013 | 15.6s | 82% | |
| DeepSeek V3 (2025-03-24) | 85% | $0.0011 | 25.4s | 80% | |
| Mistral Small 4 | 83% | $0.0012 | 21.9s | 78% | |
| Qwen 3.5 Flash | 81% | $0.0015 | 28.4s | 79% | |
| Z.AI GLM 5 Turbo | 83% | $0.0066 | 34.0s | 78% | |
| GPT-5.4 | 88% | $0.048 | 1.4m | 86% | |
| Qwen 3.5 397B A17B | 86% | $0.017 | 1.7m | 81% | |
| Rocinante 12B | 81% | $0.0007 | 21.4s | 74% | |
| Grok 4.1 Fast | 80% | $0.0013 | 27.0s | 75% | |
| Grok 4.20 (Beta) | 80% | $0.016 | 15.3s | 77% | |
| LFM2 24B | 80% | $0.0002 | 27.7s | 75% | |
| Mistral Small Creative | 80% | $0.0006 | 9.8s | 73% | |
| Ministral 3 14B | 78% | $0.0005 | 13.1s | 74% | |
| Mistral Large 2 | 81% | $0.0094 | 25.1s | 75% | |
| Z.AI GLM 5 | 82% | $0.0072 | 59.0s | 76% | |
| Mistral Small 4 (Reasoning) | 80% | $0.0021 | 28.9s | 74% | |
| Writer: Palmyra X5 | 81% | $0.011 | 22.8s | 73% | |
| Ministral 8B | 78% | $0.0002 | 11.1s | 73% | |
Novelcrafter Default Prompt
Literary fiction: old friends reunite
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Mistral Small Creative | 86% | $0.0005 | 8.0s | |
| Mistral Large 2 | 88% | $0.0099 | 26.9s | |
| Mistral Small 4 | 86% | $0.0012 | 20.4s | |
| Ministral 3 14B | 86% | $0.0005 | 11.5s | |
| GPT-5.4 Mini | 87% | $0.015 | 16.0s | |
| Mistral Small 4 (Reasoning) | 85% | $0.0017 | 27.6s | |
| Grok 4.20 (Beta) | 85% | $0.017 | 15.1s | |
| Mistral Large 3 | 85% | $0.0019 | 19.0s | |
| Mistral Medium 3.1 | 85% | $0.0039 | 31.4s | |
| Qwen3 235B A22B Instruct 2507 | 86% | $0.0012 | 1.2m | |
| LFM2 24B | 82% | $0.0002 | 23.9s | |
| Writer: Palmyra X5 | 85% | $0.011 | 22.5s | |
| Grok 4.20 (Beta, Reasoning) | 91% | $0.045 | 38.3s | |
| Ministral 3 3B | 78% | $0.0003 | 7.7s | |
| Hermes 3 405B | 82% | $0.0023 | 35.3s | |
| GPT-5.4 Mini (Reasoning, Low) | 84% | $0.015 | 17.3s | |
| Llama 3.1 8B | 80% | $0.0002 | 28.3s | |
| Grok 4 Fast | 82% | $0.0018 | 28.2s | |
| Grok 4.1 Fast | 85% | $0.0018 | 1.1m | |
| Ministral 3 8B | 80% | $0.0004 | 9.3s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Grok 4.20 (Beta, Reasoning) | 91% | 94% | 87% | |
| GPT-5.4 Mini | 87% | 98% | 85% | |
| GPT-5.4 | 88% | 97% | 85% | |
| Mistral Large 2 | 88% | 95% | 85% | |
| GPT-5.4 (Reasoning, Low) | 86% | 97% | 84% | |
| GPT-5.4 (Reasoning) | 86% | 97% | 83% | |
| Mistral Small 4 | 86% | 96% | 83% | |
| Gemini 3.1 Pro (Preview) | 83% | 99% | 83% | |
| Writer: Palmyra X5 | 85% | 97% | 82% | |
| Qwen 3.5 397B A17B | 85% | 97% | 82% | |
| Mistral Medium 3.1 | 85% | 96% | 82% | |
| Mistral Large 3 | 85% | 97% | 82% | |
| Mistral Small Creative | 86% | 93% | 82% | |
| GPT-5.4 Mini (Reasoning) | 84% | 97% | 81% | |
| GPT-5 | 84% | 97% | 81% | |
| Ministral 3 14B | 86% | 95% | 81% | |
| Claude Opus 4 | 83% | 97% | 81% | |
| GPT-5.1 | 84% | 94% | 81% | |
| Qwen3 235B A22B Instruct 2507 | 86% | 93% | 80% | |
| Grok 4.1 Fast | 85% | 95% | 80% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Mistral Large 2 | 88% | $0.0099 | 26.9s | 85% | |
| Grok 4.20 (Beta, Reasoning) | 91% | $0.045 | 38.3s | 87% | |
| GPT-5.4 Mini | 87% | $0.015 | 16.0s | 85% | |
| Mistral Small Creative | 86% | $0.0005 | 8.0s | 82% | |
| Mistral Small 4 | 86% | $0.0012 | 20.4s | 83% | |
| Ministral 3 14B | 86% | $0.0005 | 11.5s | 81% | |
| Mistral Large 3 | 85% | $0.0019 | 19.0s | 82% | |
| Mistral Medium 3.1 | 85% | $0.0039 | 31.4s | 82% | |
| Writer: Palmyra X5 | 85% | $0.011 | 22.5s | 82% | |
| Mistral Small 4 (Reasoning) | 85% | $0.0017 | 27.6s | 80% | |
| Qwen3 235B A22B Instruct 2507 | 86% | $0.0012 | 1.2m | 80% | |
| GPT-5.4 Mini (Reasoning, Low) | 84% | $0.015 | 17.3s | 80% | |
| GPT-5.4 Mini (Reasoning) | 84% | $0.025 | 28.5s | 81% | |
| Grok 4.1 Fast | 85% | $0.0018 | 1.1m | 80% | |
| Grok 4.20 (Beta) | 85% | $0.017 | 15.1s | 79% | |
| Grok 4 Fast | 82% | $0.0018 | 28.2s | 79% | |
| GPT-5.4 Nano (Reasoning, Low) | 83% | $0.0062 | 22.3s | 79% | |
| LFM2 24B | 82% | $0.0002 | 23.9s | 78% | |
| Stealth: Healer Alpha | 81% | $0.0000 | 25.3s | 79% | |
| GPT-5.4 Nano (Reasoning) | 82% | $0.0069 | 27.2s | 79% | |
Thriller: chase through city streets
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Rocinante 12B | 86% | $0.0008 | 26.2s | |
| Qwen3 235B A22B Instruct 2507 | 87% | $0.0006 | 49.7s | |
| GPT-5.4 Mini (Reasoning) | 90% | $0.020 | 29.8s | |
| Z.AI GLM 5 Turbo | 87% | $0.0063 | 29.9s | |
| Writer: Palmyra X5 | 88% | $0.010 | 20.5s | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | $0.014 | 16.1s | |
| Mistral Small 4 (Reasoning) | 84% | $0.0015 | 21.5s | |
| Grok 4.1 Fast | 85% | $0.0014 | 23.5s | |
| Mistral Small 4 | 83% | $0.0009 | 12.8s | |
| Claude Haiku 4.5 | 85% | $0.0089 | 18.4s | |
| MiniMax M2.5 | 86% | $0.0026 | 46.1s | |
| Gemini 2.5 Flash (Reasoning) | 83% | $0.0076 | 14.4s | |
| Stealth: Hunter Alpha | 84% | $0.0000 | 47.7s | |
| GPT-5.4 Mini | 87% | $0.012 | 14.1s | |
| Qwen 3.5 35B | 85% | $0.0086 | 33.0s | |
| Z.AI GLM 4.7 | 84% | $0.0095 | 1.5m | |
| Ministral 3 14B | 79% | $0.0004 | 8.9s | |
| Gemini 2.5 Flash Lite | 83% | $0.0007 | 9.0s | |
| Grok 4.20 (Beta) | 84% | $0.017 | 17.9s | |
| Mistral Medium 3.1 | 82% | $0.0041 | 30.6s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 | 92% | 98% | 90% | |
| GPT-5.4 (Reasoning) | 93% | 95% | 88% | |
| GPT-5.4 (Reasoning, Low) | 93% | 96% | 87% | |
| GPT-5.4 Mini (Reasoning) | 90% | 96% | 86% | |
| Gemini 3.1 Pro (Preview) | 87% | 98% | 85% | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | 97% | 85% | |
| GPT-5.1 | 88% | 95% | 85% | |
| Writer: Palmyra X5 | 88% | 97% | 84% | |
| Grok 4.1 Fast | 85% | 99% | 84% | |
| GPT-5 | 85% | 98% | 84% | |
| GPT-5.4 Mini | 87% | 96% | 83% | |
| Qwen 3.5 35B | 85% | 96% | 82% | |
| Z.AI GLM 5 Turbo | 87% | 95% | 82% | |
| Qwen3 235B A22B Instruct 2507 | 87% | 94% | 82% | |
| Claude Sonnet 4.5 | 85% | 97% | 82% | |
| Claude Haiku 4.5 | 85% | 94% | 81% | |
| Claude Opus 4.6 (Reasoning) | 82% | 98% | 81% | |
| Grok 4.20 (Beta) | 84% | 94% | 81% | |
| Claude Opus 4 | 85% | 95% | 81% | |
| Z.AI GLM 4.7 | 84% | 94% | 80% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning) | 90% | $0.020 | 29.8s | 86% | |
| GPT-5.4 Mini (Reasoning, Low) | 88% | $0.014 | 16.1s | 85% | |
| Writer: Palmyra X5 | 88% | $0.010 | 20.5s | 84% | |
| Grok 4.1 Fast | 85% | $0.0014 | 23.5s | 84% | |
| Z.AI GLM 5 Turbo | 87% | $0.0063 | 29.9s | 82% | |
| GPT-5.4 Mini | 87% | $0.012 | 14.1s | 83% | |
| Qwen3 235B A22B Instruct 2507 | 87% | $0.0006 | 49.7s | 82% | |
| GPT-5.4 | 92% | $0.045 | 1.4m | 90% | |
| GPT-5.4 (Reasoning, Low) | 93% | $0.046 | 1.2m | 87% | |
| Qwen 3.5 35B | 85% | $0.0086 | 33.0s | 82% | |
| Claude Haiku 4.5 | 85% | $0.0089 | 18.4s | 81% | |
| MiniMax M2.5 | 86% | $0.0026 | 46.1s | 80% | |
| Mistral Small 4 (Reasoning) | 84% | $0.0015 | 21.5s | 80% | |
| Mistral Small 4 | 83% | $0.0009 | 12.8s | 78% | |
| Rocinante 12B | 86% | $0.0008 | 26.2s | 75% | |
| Gemini 2.5 Flash Lite | 83% | $0.0007 | 9.0s | 77% | |
| Grok 4.20 (Beta) | 84% | $0.017 | 17.9s | 81% | |
| Z.AI GLM 5 | 86% | $0.0062 | 1.3m | 78% | |
| Claude Sonnet 4.5 | 85% | $0.028 | 34.3s | 82% | |
| GPT-5.4 Nano (Reasoning, Low) | 81% | $0.0055 | 22.0s | 79% | |
Romance: separated couple reunites
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| GPT-5.4 Mini | 90% | $0.014 | 15.5s | |
| Mistral Small Creative | 86% | $0.0006 | 9.0s | |
| Mistral Large 3 | 84% | $0.0026 | 28.0s | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | $0.015 | 16.6s | |
| Ministral 3 14B | 84% | $0.0004 | 9.5s | |
| Mistral Small 4 | 84% | $0.0011 | 16.9s | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.021 | 24.2s | |
| Qwen 3.5 Flash | 84% | $0.0026 | 45.9s | |
| Qwen 3 32B | 84% | $0.0015 | 1.1m | |
| Qwen 3.5 122B | 84% | $0.016 | 47.3s | |
| ByteDance Seed 1.6 Flash | 84% | $0.0012 | 28.0s | |
| Mistral Small 4 (Reasoning) | 85% | $0.0019 | 29.4s | |
| Mistral Large 2 | 83% | $0.0097 | 32.4s | |
| Qwen 3.5 35B | 84% | $0.016 | 1.0m | |
| Mistral Medium 3.1 | 82% | $0.0040 | 36.4s | |
| Mistral Large | 85% | $0.010 | 27.4s | |
| Grok 4.20 (Beta, Reasoning) | 87% | $0.046 | 44.0s | |
| Stealth: Healer Alpha | 80% | $0.0000 | 26.3s | |
| LFM2 24B | 81% | $0.0001 | 19.3s | |
| Z.AI GLM 5 Turbo | 84% | $0.0085 | 41.1s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 (Reasoning, Low) | 90% | 99% | 90% | |
| GPT-5.4 Mini | 90% | 98% | 88% | |
| GPT-5.4 (Reasoning) | 89% | 97% | 87% | |
| GPT-5.4 Mini (Reasoning) | 87% | 98% | 85% | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | 98% | 84% | |
| GPT-5.4 | 89% | 94% | 83% | |
| Mistral Small Creative | 86% | 96% | 83% | |
| Grok 4.20 (Beta, Reasoning) | 87% | 93% | 83% | |
| GPT-5.1 | 87% | 96% | 83% | |
| Claude Opus 4 | 86% | 96% | 82% | |
| Qwen 3.5 397B A17B | 85% | 98% | 82% | |
| o4 Mini High | 84% | 98% | 82% | |
| Qwen 3.5 Flash | 84% | 97% | 82% | |
| Gemini 3.1 Pro (Preview) | 83% | 97% | 81% | |
| DeepSeek V3 (2025-03-24) | 84% | 96% | 81% | |
| Mistral Large 2 | 83% | 96% | 80% | |
| Claude Opus 4.6 (Reasoning) | 85% | 93% | 80% | |
| Qwen 3.5 27B | 83% | 97% | 80% | |
| Mistral Small 4 | 84% | 95% | 80% | |
| Qwen 3.5 122B | 84% | 93% | 79% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini | 90% | $0.014 | 15.5s | 88% | |
| Mistral Small Creative | 86% | $0.0006 | 9.0s | 83% | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | $0.015 | 16.6s | 84% | |
| GPT-5.4 Mini (Reasoning) | 87% | $0.021 | 24.2s | 85% | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.052 | 1.4m | 90% | |
| Mistral Small 4 | 84% | $0.0011 | 16.9s | 80% | |
| Ministral 3 14B | 84% | $0.0004 | 9.5s | 79% | |
| Qwen 3.5 Flash | 84% | $0.0026 | 45.9s | 82% | |
| ByteDance Seed 1.6 Flash | 84% | $0.0012 | 28.0s | 79% | |
| Grok 4.20 (Beta, Reasoning) | 87% | $0.046 | 44.0s | 83% | |
| o4 Mini High | 84% | $0.021 | 39.0s | 82% | |
| Mistral Large 2 | 83% | $0.0097 | 32.4s | 80% | |
| Mistral Small 4 (Reasoning) | 85% | $0.0019 | 29.4s | 76% | |
| DeepSeek V3 (2025-03-24) | 84% | $0.0013 | 1.1m | 81% | |
| Mistral Large | 85% | $0.010 | 27.4s | 77% | |
| o4 Mini | 83% | $0.012 | 22.7s | 79% | |
| GPT-5.4 | 89% | $0.051 | 1.4m | 83% | |
| Mistral Large 3 | 84% | $0.0026 | 28.0s | 75% | |
| Mistral Medium 3.1 | 82% | $0.0040 | 36.4s | 79% | |
| Qwen 3.5 122B | 84% | $0.016 | 47.3s | 79% | |
Fantasy: entering an ancient ruin
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| GPT-5.4 Mini | 86% | $0.013 | 15.7s | |
| GPT-5.4 Mini (Reasoning, Low) | 84% | $0.014 | 15.9s | |
| Gemini 3.1 Flash Lite (Preview) | 81% | $0.0027 | 8.1s | |
| GPT-5.4 | 90% | $0.042 | 1.3m | |
| GPT-5.4 Mini (Reasoning) | 85% | $0.020 | 37.9s | |
| Mistral Small Creative | 78% | $0.0007 | 8.7s | |
| Qwen 3.5 Flash | 82% | $0.0039 | 1.1m | |
| Qwen 3.5 9B | 82% | $0.0009 | 1.3m | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.049 | 1.2m | |
| Qwen3 235B A22B Instruct 2507 | 80% | $0.0010 | 41.5s | |
| Qwen 3.5 122B | 81% | $0.015 | 43.6s | |
| ByteDance Seed 1.6 Flash | 79% | $0.0013 | 29.3s | |
| Grok 4.20 (Beta, Reasoning) | 87% | $0.045 | 40.0s | |
| Grok 4.1 Fast | 78% | $0.0018 | 35.4s | |
| Mistral Small 4 | 77% | $0.0013 | 17.2s | |
| DeepSeek V3 (2025-03-24) | 77% | $0.0013 | 43.2s | |
| Rocinante 12B | 78% | $0.0010 | 21.7s | |
| LFM2 24B | 77% | $0.0002 | 28.5s | |
| Qwen 3 32B | 80% | $0.0016 | 1.0m | |
| GPT-5.4 Nano (Reasoning) | 78% | $0.0066 | 26.7s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 (Reasoning, Low) | 90% | 98% | 88% | |
| GPT-5.4 (Reasoning) | 90% | 96% | 87% | |
| GPT-5.4 | 90% | 95% | 86% | |
| GPT-5.1 | 87% | 97% | 86% | |
| Grok 4.20 (Beta, Reasoning) | 87% | 97% | 85% | |
| GPT-5.4 Mini | 86% | 95% | 82% | |
| Qwen 3.5 397B A17B | 86% | 96% | 82% | |
| Gemini 3.1 Pro (Preview) | 83% | 98% | 81% | |
| Qwen 3.5 Flash | 82% | 98% | 81% | |
| GPT-5.4 Mini (Reasoning) | 85% | 95% | 80% | |
| Claude Opus 4.6 (Reasoning) | 84% | 94% | 80% | |
| Qwen 3.5 35B | 83% | 95% | 80% | |
| Gemini 3.1 Flash Lite (Preview) | 81% | 98% | 80% | |
| GPT-5.4 Mini (Reasoning, Low) | 84% | 94% | 79% | |
| Qwen 3.5 122B | 81% | 96% | 79% | |
| o4 Mini High | 81% | 97% | 79% | |
| Qwen 3.5 9B | 82% | 95% | 79% | |
| Qwen3 235B A22B Instruct 2507 | 80% | 98% | 78% | |
| Qwen 3.5 27B | 80% | 96% | 77% | |
| GPT-5 | 82% | 93% | 77% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini | 86% | $0.013 | 15.7s | 82% | |
| GPT-5.4 | 90% | $0.042 | 1.3m | 86% | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.049 | 1.2m | 88% | |
| Grok 4.20 (Beta, Reasoning) | 87% | $0.045 | 40.0s | 85% | |
| GPT-5.4 Mini (Reasoning, Low) | 84% | $0.014 | 15.9s | 79% | |
| Gemini 3.1 Flash Lite (Preview) | 81% | $0.0027 | 8.1s | 80% | |
| GPT-5.4 Mini (Reasoning) | 85% | $0.020 | 37.9s | 80% | |
| GPT-5.1 | 87% | $0.049 | 1.5m | 86% | |
| Qwen 3.5 Flash | 82% | $0.0039 | 1.1m | 81% | |
| Qwen3 235B A22B Instruct 2507 | 80% | $0.0010 | 41.5s | 78% | |
| Qwen 3.5 122B | 81% | $0.015 | 43.6s | 79% | |
| Qwen 3.5 9B | 82% | $0.0009 | 1.3m | 79% | |
| Qwen 3.5 35B | 83% | $0.019 | 1.2m | 80% | |
| GPT-5.4 (Reasoning) | 90% | $0.074 | 2.2m | 87% | |
| o4 Mini High | 81% | $0.025 | 47.7s | 79% | |
| Rocinante 12B | 78% | $0.0010 | 21.7s | 74% | |
| Mistral Small 4 | 77% | $0.0013 | 17.2s | 74% | |
| Grok 4.1 Fast | 78% | $0.0018 | 35.4s | 74% | |
| ByteDance Seed 1.6 Flash | 79% | $0.0013 | 29.3s | 73% | |
| GPT-5.4 Nano (Reasoning, Low) | 77% | $0.0058 | 22.7s | 74% | |
Mystery: examining a crime scene
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Stealth: Healer Alpha | 85% | $0.0000 | 22.2s | |
| Grok 4.20 (Beta) | 87% | $0.016 | 16.9s | |
| Z.AI GLM 5 Turbo | 86% | $0.0062 | 27.2s | |
| Cohere Command R+ (Aug. 2024) | 89% | $0.017 | 52.5s | |
| GPT-5.4 Mini | 87% | $0.015 | 17.5s | |
| GPT-5.4 Mini (Reasoning) | 89% | $0.025 | 33.6s | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.014 | 16.6s | |
| Mistral Medium 3.1 | 84% | $0.0039 | 28.3s | |
| Mistral Small 4 (Reasoning) | 84% | $0.0021 | 28.4s | |
| Grok 4.20 (Beta, Reasoning) | 90% | $0.053 | 47.5s | |
| GPT-5.4 | 91% | $0.047 | 1.4m | |
| Mistral Small Creative | 82% | $0.0005 | 6.9s | |
| Hermes 3 405B | 82% | $0.0019 | 37.5s | |
| Claude Sonnet 4.6 | 85% | $0.026 | 39.4s | |
| Mistral Large 2 | 84% | $0.0090 | 23.4s | |
| Stealth: Hunter Alpha | 84% | $0.0000 | 42.9s | |
| Qwen 3.5 Flash | 84% | $0.0033 | 55.3s | |
| LFM2 24B | 80% | $0.0002 | 25.5s | |
| GPT-5.4 Nano (Reasoning, Low) | 84% | $0.0068 | 24.2s | |
| Mistral Large 3 | 81% | $0.0024 | 24.5s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 (Reasoning) | 93% | 96% | 90% | |
| GPT-5.4 | 91% | 97% | 89% | |
| Grok 4.20 (Beta, Reasoning) | 90% | 97% | 87% | |
| GPT-5.4 (Reasoning, Low) | 89% | 97% | 87% | |
| GPT-5.1 | 90% | 94% | 87% | |
| GPT-5.4 Mini (Reasoning) | 89% | 97% | 86% | |
| Z.AI GLM 5 Turbo | 86% | 98% | 84% | |
| GPT-5.4 Mini | 87% | 97% | 84% | |
| Grok 4.20 (Beta) | 87% | 94% | 83% | |
| Mistral Large 2 | 84% | 99% | 83% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | 96% | 83% | |
| GPT-5 | 88% | 94% | 83% | |
| ByteDance Seed 2.0 Lite | 87% | 94% | 82% | |
| Mistral Medium 3.1 | 84% | 96% | 81% | |
| Claude Opus 4.6 (Reasoning) | 86% | 96% | 81% | |
| Qwen 3.5 397B A17B | 85% | 95% | 81% | |
| Qwen 3.5 9B | 83% | 97% | 81% | |
| Cohere Command R+ (Aug. 2024) | 89% | 92% | 81% | |
| Qwen 3.5 27B | 82% | 98% | 81% | |
| Claude Sonnet 4.6 (Reasoning) | 83% | 96% | 81% | |
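The stability column above can be sketched as a simple product. This is a minimal sketch, assuming stability = median score × consistency; note that the displayed percentages are rounded, so a product of the rounded table values can differ from the listed stability by about a point (the underlying computation presumably uses unrounded values).

```python
# Sketch of the stability metric (an assumption based on the table caption:
# stability = median score x consistency, both expressed as fractions).
def stability(median_score: float, consistency: float) -> float:
    """Return the stability product as a fraction in [0, 1]."""
    return median_score * consistency

# Example using the GPT-5.4 (Reasoning) row (93% median, 96% consistency).
# The product of the rounded inputs is ~89%, one point below the listed 90%,
# which is consistent with the table being computed on unrounded values.
print(round(stability(0.93, 0.96) * 100))
```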
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning) | 89% | $0.025 | 33.6s | 86% | |
| GPT-5.4 | 91% | $0.047 | 1.4m | 89% | |
| Z.AI GLM 5 Turbo | 86% | $0.0062 | 27.2s | 84% | |
| GPT-5.4 Mini | 87% | $0.015 | 17.5s | 84% | |
| GPT-5.4 Mini (Reasoning, Low) | 87% | $0.014 | 16.6s | 83% | |
| Grok 4.20 (Beta) | 87% | $0.016 | 16.9s | 83% | |
| Grok 4.20 (Beta, Reasoning) | 90% | $0.053 | 47.5s | 87% | |
| Cohere Command R+ (Aug. 2024) | 89% | $0.017 | 52.5s | 81% | |
| Stealth: Healer Alpha | 85% | $0.0000 | 22.2s | 80% | |
| Mistral Large 2 | 84% | $0.0090 | 23.4s | 83% | |
| Mistral Medium 3.1 | 84% | $0.0039 | 28.3s | 81% | |
| Stealth: Hunter Alpha | 84% | $0.0000 | 42.9s | 80% | |
| GPT-5.4 (Reasoning, Low) | 89% | $0.055 | 1.4m | 87% | |
| Qwen3 235B A22B Instruct 2507 | 85% | $0.0008 | 1.0m | 80% | |
| GPT-5.1 | 90% | $0.053 | 2.2m | 87% | |
| GPT-5.4 Nano (Reasoning, Low) | 84% | $0.0068 | 24.2s | 79% | |
| GPT-5.4 Nano (Reasoning) | 84% | $0.0079 | 32.4s | 80% | |
| ByteDance Seed 2.0 Lite | 87% | $0.013 | 2.4m | 82% | |
| Mistral Small 4 (Reasoning) | 84% | $0.0021 | 28.4s | 78% | |
| GPT-5.4 (Reasoning) | 93% | $0.087 | 2.5m | 90% | |
Horror: alone in an eerie place at night
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Rocinante 12B | 84% | $0.0006 | 11.9s | |
| GPT-5.4 Mini | 85% | $0.013 | 15.4s | |
| GPT-5.4 Mini (Reasoning) | 86% | $0.014 | 20.5s | |
| Z.AI GLM 5 Turbo | 84% | $0.0060 | 27.7s | |
| GPT-5.4 Nano | 84% | $0.0060 | 23.9s | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | $0.012 | 15.0s | |
| Mistral Small 4 (Reasoning) | 82% | $0.0017 | 23.6s | |
| Mistral Medium 3.1 | 83% | $0.0030 | 25.4s | |
| Writer: Palmyra X5 | 84% | $0.010 | 20.4s | |
| Ministral 8B | 80% | $0.0002 | 5.2s | |
| Grok 4.20 (Beta) | 86% | $0.017 | 17.0s | |
| Hermes 3 405B | 82% | $0.0020 | 35.8s | |
| GPT-5.4 Nano (Reasoning, Low) | 82% | $0.0060 | 21.9s | |
| Qwen3 235B A22B Instruct 2507 | 85% | $0.0008 | 1.0m | |
| GPT-5.4 | 90% | $0.040 | 1.2m | |
| Aion 2.0 | 83% | $0.0050 | 1.1m | |
| Qwen 3.5 9B | 83% | $0.0018 | 2.7m | |
| Qwen 3 32B | 82% | $0.0012 | 30.5s | |
| Claude Haiku 4.5 | 81% | $0.0093 | 19.7s | |
| Grok 4.1 Fast | 81% | $0.0014 | 31.7s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| GPT-5.4 (Reasoning, Low) | 90% | 97% | 87% | |
| GPT-5.4 (Reasoning) | 92% | 93% | 87% | |
| Gemini 3.1 Pro (Preview) | 87% | 99% | 85% | |
| GPT-5.4 | 90% | 93% | 84% | |
| GPT-5.4 Mini (Reasoning) | 86% | 98% | 84% | |
| GPT-5.1 | 88% | 95% | 84% | |
| GPT-5.4 Mini | 85% | 96% | 83% | |
| GPT-5.4 Nano | 84% | 97% | 82% | |
| Qwen 3.5 397B A17B | 86% | 95% | 82% | |
| Writer: Palmyra X5 | 84% | 97% | 82% | |
| Z.AI GLM 5 | 85% | 96% | 82% | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | 96% | 82% | |
| Qwen 3.5 27B | 82% | 98% | 81% | |
| GPT-5.4 Nano (Reasoning) | 83% | 97% | 80% | |
| Qwen 3.5 35B | 84% | 96% | 80% | |
| Cohere Command R+ (Aug. 2024) | 81% | 98% | 80% | |
| MiniMax M2.7 | 83% | 95% | 80% | |
| Grok 4.20 (Beta) | 86% | 93% | 80% | |
| Grok 4.20 (Beta, Reasoning) | 85% | 92% | 80% | |
| Qwen3 235B A22B Instruct 2507 | 85% | 95% | 80% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 Mini (Reasoning) | 86% | $0.014 | 20.5s | 84% | |
| GPT-5.4 Mini (Reasoning, Low) | 86% | $0.012 | 15.0s | 82% | |
| GPT-5.4 Mini | 85% | $0.013 | 15.4s | 83% | |
| GPT-5.4 Nano | 84% | $0.0060 | 23.9s | 82% | |
| GPT-5.4 (Reasoning, Low) | 90% | $0.041 | 1.1m | 87% | |
| Writer: Palmyra X5 | 84% | $0.010 | 20.4s | 82% | |
| Z.AI GLM 5 | 85% | $0.0061 | 59.0s | 82% | |
| Grok 4.20 (Beta) | 86% | $0.017 | 17.0s | 80% | |
| Rocinante 12B | 84% | $0.0006 | 11.9s | 78% | |
| Qwen3 235B A22B Instruct 2507 | 85% | $0.0008 | 1.0m | 80% | |
| GPT-5.4 Nano (Reasoning) | 83% | $0.0054 | 24.4s | 80% | |
| Mistral Medium 3.1 | 83% | $0.0030 | 25.4s | 80% | |
| GPT-5.4 | 90% | $0.040 | 1.2m | 84% | |
| Mistral Small 4 (Reasoning) | 82% | $0.0017 | 23.6s | 79% | |
| MiniMax M2.7 | 83% | $0.0040 | 55.8s | 80% | |
| GPT-5.4 Nano (Reasoning, Low) | 82% | $0.0060 | 21.9s | 79% | |
| Qwen 3.5 Flash | 83% | $0.0021 | 44.4s | 79% | |
| Qwen 3.5 397B A17B | 86% | $0.0084 | 2.0m | 82% | |
| Z.AI GLM 5 Turbo | 84% | $0.0060 | 27.7s | 77% | |
| Qwen 3 32B | 82% | $0.0012 | 30.5s | 78% | |