Voice/dialogue sheets
Extract dialogue from a given text as voice sheets.
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Stealth: Healer Alpha | 76% | $0.0000 | 5.0s | |
| Gemini 2.5 Flash Lite | 66% | $0.0001 | 610ms | |
| Mistral Small 3.2 24B | 72% | $0.0001 | 2.1s | |
| Stealth: Hunter Alpha | 70% | $0.0000 | 18.7s | |
| Mistral Small 4 | 60% | $0.0001 | 1.3s | |
| GPT-4o Mini (temp=0) | 60% | $0.0001 | 3.7s | |
| Llama 3.1 8B | 54% | $0.0001 | 919ms | |
| Qwen3 235B A22B Instruct 2507 | 72% | $0.0001 | 4.9s | |
| DeepSeek-V2 Chat | 80% | $0.0001 | 8.2s | |
| Gemma 3 12B | 52% | $0.0000 | 4.1s | |
| Z.AI GLM 4.5 | 76% | $0.0004 | 5.3s | |
| Hermes 3 405B | 58% | $0.0000 | 13.3s | |
| Gemini 3.1 Flash Lite (Preview) | 70% | $0.0003 | 979ms | |
| DeepSeek V3 (2025-03-24) | 84% | $0.0003 | 6.1s | |
| ByteDance Seed 1.6 Flash | 66% | $0.0002 | 4.5s | |
| Grok 4 Fast | 84% | $0.0003 | 3.6s | |
| GPT-4o Mini (temp=1) | 60% | $0.0001 | 15.0s | |
| Hermes 3 70B | 68% | $0.0002 | 6.0s | |
| DeepSeek V3.1 | 76% | $0.0002 | 8.4s | |
| GPT-4.1 Mini | 76% | $0.0003 | 2.7s | |
Cost vs Performance
Compares each model's total cost for this test against its test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.
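The quadrant layout described above can be sketched as follows. The model names and (cost, score) pairs here are hypothetical, and the tie-breaking at the median lines is an assumption; the chart itself only states that the lines sit at the medians.

```python
from statistics import median

# Hypothetical (cost in $, score as %) pairs; models without
# cost data are excluded, as the chart description says.
points = {
    "Model A": (0.0003, 84),
    "Model B": (0.0033, 100),
    "Model C": (0.0001, 66),
    "Model D": (0.0017, 96),
}

# The quadrant lines are the medians across the plotted models.
cost_line = median(c for c, _ in points.values())
score_line = median(s for _, s in points.values())

def quadrant(cost: float, score: float) -> str:
    """Classify a point relative to the median quadrant lines."""
    value = "cheap" if cost <= cost_line else "expensive"
    quality = "strong" if score >= score_line else "weak"
    return f"{value}/{quality}"

for name, (cost, score) in points.items():
    print(name, quadrant(*points[name]))
```

The cheap/strong quadrant is where the interesting models live: high score at below-median cost.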
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| Claude Sonnet 4 | 100% | 100% | 100% | |
| Claude 3.7 Sonnet | 100% | 100% | 100% | |
| GPT-4.1 | 96% | 61% | 61% | |
| GPT-4o, May 13th (temp=0) | 96% | 61% | 61% | |
| ByteDance Seed 1.6 | 94% | 53% | 53% | |
| GPT-4o, Aug. 6th (temp=0) | 92% | 46% | 46% | |
| Claude Sonnet 4.6 (Reasoning) | 90% | 40% | 40% | |
| GPT-4o, May 13th (temp=1) | 90% | 40% | 40% | |
| Qwen 3.5 122B | 88% | 35% | 35% | |
| Gemini 2.5 Flash | 88% | 35% | 35% | |
| Claude Opus 4 | 86% | 31% | 31% | |
| Qwen 3.5 Plus (2026-02-15) | 86% | 31% | 31% | |
| DeepSeek V3 (2025-03-24) | 84% | 27% | 27% | |
| Grok 4.1 Fast | 84% | 27% | 27% | |
| Grok 4 Fast | 84% | 27% | 27% | |
| Claude Sonnet 4.5 | 84% | 27% | 27% | |
| Qwen 3.5 397B A17B | 82% | 23% | 23% | |
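The ranking rule named in the caption ("median × consistency") can be sketched as a sort key. The rows below are hypothetical, and the leaderboard may normalise or rescale the product before display; this only illustrates the ordering.

```python
# Minimal sketch: rank models by median score times run-to-run
# consistency, both taken as fractions in [0, 1]. Hypothetical data.
models = {
    "Model A": {"median": 1.00, "consistency": 1.00},
    "Model B": {"median": 0.96, "consistency": 0.61},
    "Model C": {"median": 0.90, "consistency": 0.40},
}

ranked = sorted(
    models.items(),
    key=lambda kv: kv[1]["median"] * kv[1]["consistency"],
    reverse=True,  # most stable first
)

for name, m in ranked:
    print(name, round(m["median"] * m["consistency"], 2))
```

Multiplying rather than averaging means a model must be both accurate and repeatable: a high median with poor consistency (or vice versa) drops it down the table.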
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Claude Sonnet 4.6 | 100% | $0.0033 | 1.9s | 100% | |
| Claude Sonnet 4 | 100% | $0.0033 | 2.8s | 100% | |
| Claude 3.7 Sonnet | 100% | $0.0034 | 3.5s | 100% | |
| Claude Opus 4.6 | 100% | $0.0055 | 3.9s | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | $0.0060 | 3.1s | 100% | |
| GPT-4.1 | 96% | $0.0017 | 2.6s | 61% | |
| GPT-4o, May 13th (temp=0) | 96% | $0.0037 | 5.2s | 61% | |
| GPT-4o, Aug. 6th (temp=0) | 92% | $0.0022 | 2.4s | 46% | |
| Gemini 2.5 Flash | 88% | $0.0005 | 933ms | 35% | |
| ByteDance Seed 1.6 | 94% | $0.0013 | 14.2s | 53% | |
| Qwen 3.5 Plus (2026-02-15) | 86% | $0.0005 | 6.6s | 31% | |
| Grok 4 Fast | 84% | $0.0003 | 3.6s | 27% | |
| Grok 4.1 Fast | 84% | $0.0004 | 4.4s | 27% | |
| GPT-4o, May 13th (temp=1) | 90% | $0.0037 | 5.2s | 40% | |
| DeepSeek V3 (2025-03-24) | 84% | $0.0003 | 6.1s | 27% | |
| Claude Haiku 4.5 | 82% | $0.0011 | 1.6s | 23% | |
| Gemini 3 Flash (Preview) | 80% | $0.0006 | 1.8s | 20% | |
| Claude Sonnet 4.6 (Reasoning) | 90% | $0.0053 | 3.3s | 40% | |
| Mistral Large 3 | 80% | $0.0004 | 4.1s | 20% | |
| GPT-4o, Aug. 6th (temp=1) | 82% | $0.0022 | 2.3s | 23% | |
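The composite ranking above combines performance, cost, speed and stability, but the exact weighting is not given here. As a hedged illustration only, this sketch assumes a simple scheme: min-max-normalise each dimension (inverting cost and latency, where lower is better) and average the four. All rows are hypothetical.

```python
# Hypothetical per-model raw metrics: score and stability as fractions,
# cost in $, latency in seconds.
models = {
    "Model A": {"score": 1.00, "cost": 0.0033, "latency": 1.9, "stability": 1.00},
    "Model B": {"score": 0.96, "cost": 0.0017, "latency": 2.6, "stability": 0.61},
    "Model C": {"score": 0.84, "cost": 0.0003, "latency": 3.6, "stability": 0.27},
}

def normalise(values, invert=False):
    """Min-max-scale to [0, 1]; invert=True flips so lower raw is better."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a zero range
    return [(hi - v) / span if invert else (v - lo) / span for v in values]

names = list(models)
cols = {
    "score": normalise([models[n]["score"] for n in names]),
    "cost": normalise([models[n]["cost"] for n in names], invert=True),
    "latency": normalise([models[n]["latency"] for n in names], invert=True),
    "stability": normalise([models[n]["stability"] for n in names]),
}

# Unweighted mean of the four normalised dimensions (an assumption).
composite = {
    n: sum(cols[c][i] for c in cols) / len(cols)
    for i, n in enumerate(names)
}
for n, v in sorted(composite.items(), key=lambda kv: -kv[1]):
    print(n, round(v, 2))
```

Min-max normalisation makes the composite sensitive to the set of models compared: adding one very slow or very expensive model rescales everyone else's speed or cost dimension.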
| Model | Total ▼ | Simple | Simple (1-shot) | Simple (5-shot) | Multiple speakers | Unattributed dialogue |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% |
| Claude Opus 4.6 | 100% | 100% | 100% | 100% | 100% | 100% |
| Claude Sonnet 4.6 | 100% | 100% | 100% | 100% | 100% | 100% |
| Claude Sonnet 4 | 100% | 100% | 100% | 100% | 100% | 100% |
| Claude 3.7 Sonnet | 100% | 100% | 100% | 100% | 100% | 100% |
| GPT-4.1 | 96% | 90% | 100% | 100% | 90% | 100% |
| GPT-4o, May 13th (temp=0) | 96% | 100% | 100% | 100% | 80% | 100% |
| ByteDance Seed 1.6 | 94% | 90% | 100% | 100% | 80% | 100% |
| GPT-4o, Aug. 6th (temp=0) | 92% | 100% | 100% | 100% | 60% | 100% |
| Claude Sonnet 4.6 (Reasoning) | 90% | 100% | 90% | 60% | 100% | 100% |
| GPT-4o, May 13th (temp=1) | 90% | 90% | 100% | 70% | 90% | 100% |
| Qwen 3.5 122B | 88% | 60% | 90% | 100% | 90% | 100% |
| Gemini 2.5 Flash | 88% | 60% | 100% | 90% | 100% | 90% |
| Claude Opus 4 | 86% | 100% | 100% | 100% | 100% | 30% |
| Qwen 3.5 Plus (2026-02-15) | 86% | 100% | 90% | 80% | 100% | 60% |
Simple
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 90% | $0.0000 | 446ms | |
| Mistral Small 4 | 90% | $0.0001 | 1.2s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0001 | 808ms | |
| Qwen3 235B A22B Instruct 2507 | 90% | $0.0000 | 1.7s | |
| Gemini 2.5 Flash | 60% | $0.0002 | 697ms | |
| Llama 3.1 70B | 70% | $0.0002 | 1.1s | |
| Mistral Large 3 | 100% | $0.0001 | 1.4s | |
| GPT-4.1 Mini | 90% | $0.0001 | 2.4s | |
| Mistral Small 3.2 24B | 60% | $0.0000 | 1.6s | |
| DeepSeek V3 (2024-12-26) | 100% | $0.0001 | 2.4s | |
| Grok 4 Fast | 80% | $0.0002 | 2.3s | |
| Grok 4.20 (Beta) | 100% | $0.0005 | 613ms | |
| GPT-4o Mini (temp=0) | 100% | $0.0001 | 3.1s | |
| Z.AI GLM 4.5 | 100% | $0.0001 | 2.7s | |
| DeepSeek V3 (2025-03-24) | 90% | $0.0001 | 3.5s | |
| Claude 3.5 Haiku | 100% | $0.0004 | 1.6s | |
| Stealth: Hunter Alpha | 90% | $0.0000 | 10.1s | |
| Hermes 3 70B | 60% | $0.0001 | 2.9s | |
| DeepSeek-V2 Chat | 100% | $0.0000 | 4.3s | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0002 | 3.6s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| Claude Sonnet 4 | 100% | 100% | 100% | |
| Claude Opus 4 | 100% | 100% | 100% | |
| Z.AI GLM 4.5 | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | 100% | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | 100% | 100% | |
| Mistral Large 3 | 100% | 100% | 100% | |
| GPT-4o, May 13th (temp=0) | 100% | 100% | 100% | |
| DeepSeek-V2 Chat | 100% | 100% | 100% | |
| Claude 3.5 Sonnet | 100% | 100% | 100% | |
| Grok 4.20 (Beta) | 100% | 100% | 100% | |
| Claude 3.5 Haiku | 100% | 100% | 100% | |
| DeepSeek V3 (2024-12-26) | 100% | 100% | 100% | |
| Claude 3.7 Sonnet | 100% | 100% | 100% | |
| Hermes 3 405B | 100% | 100% | 100% | |
| GPT-4o, Aug. 6th (temp=1) | 100% | 100% | 100% | |
| GPT-4o, Aug. 6th (temp=0) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0001 | 808ms | 100% | |
| Mistral Large 3 | 100% | $0.0001 | 1.4s | 100% | |
| Grok 4.20 (Beta) | 100% | $0.0005 | 613ms | 100% | |
| DeepSeek V3 (2024-12-26) | 100% | $0.0001 | 2.4s | 100% | |
| Claude 3.5 Haiku | 100% | $0.0004 | 1.6s | 100% | |
| Z.AI GLM 4.5 | 100% | $0.0001 | 2.7s | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0001 | 3.1s | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0002 | 3.6s | 100% | |
| DeepSeek-V2 Chat | 100% | $0.0000 | 4.3s | 100% | |
| GPT-4o, Aug. 6th (temp=0) | 100% | $0.0009 | 1.5s | 100% | |
| GPT-4o, Aug. 6th (temp=1) | 100% | $0.0009 | 1.5s | 100% | |
| Claude Sonnet 4.6 | 100% | $0.0013 | 1.4s | 100% | |
| Claude Sonnet 4 | 100% | $0.0013 | 2.1s | 100% | |
| Writer: Palmyra X5 | 100% | $0.0004 | 5.9s | 100% | |
| Claude 3.7 Sonnet | 100% | $0.0016 | 2.3s | 100% | |
| GPT-4o, May 13th (temp=0) | 100% | $0.0014 | 3.9s | 100% | |
| Claude 3.5 Sonnet | 100% | $0.0015 | 4.0s | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | $0.0023 | 2.3s | 100% | |
| Hermes 3 405B | 100% | $0.0000 | 11.1s | 100% | |
| Claude Opus 4.6 | 100% | $0.0022 | 4.3s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 50.0% | Matches Regex | | |
Simple (1-shot)
Performance Score Distribution (Top 20)
| Model | Score | |
|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | |
| Z.AI GLM 5 Turbo | 100% | |
| GPT-5.4 (Reasoning) | 100% | |
| GPT-5 Mini | 100% | |
| GPT-5.1 | 100% | |
| Claude Opus 4.6 | 100% | |
| GPT-5 | 100% | |
| Qwen 3.5 397B A17B | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | |
| Claude Sonnet 4.6 | 100% | |
| ByteDance Seed 1.6 | 100% | |
| o4 Mini High | 100% | |
| Claude Opus 4.5 | 100% | |
| Claude Sonnet 4 | 100% | |
| GPT-4.1 | 100% | |
| o4 Mini | 100% | |
| Claude Sonnet 4.5 | 100% | |
| Qwen 3.5 35B | 100% | |
| Claude Opus 4 | 100% | |
| ByteDance Seed 2.0 Mini | 100% | |
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Stealth: Healer Alpha | 100% | $0.0000 | 3.5s | |
| Gemma 3 12B | 100% | $0.0000 | 1.9s | |
| Mistral Small Creative | 90% | $0.0000 | 877ms | |
| Ministral 3 8B | 100% | $0.0000 | 644ms | |
| Mistral Small 3.2 24B | 100% | $0.0000 | 1.2s | |
| Gemini 2.5 Flash Lite | 90% | $0.0000 | 451ms | |
| GPT-4.1 Nano | 100% | $0.0000 | 1.6s | |
| Mistral Small 4 | 100% | $0.0001 | 1.1s | |
| Llama 3.1 8B | 60% | $0.0001 | 801ms | |
| DeepSeek-V2 Chat | 100% | $0.0000 | 4.4s | |
| Hermes 3 405B | 90% | $0.0000 | 9.7s | |
| GPT-4o Mini (temp=0) | 100% | $0.0001 | 3.2s | |
| Qwen3 235B A22B Instruct 2507 | 100% | $0.0001 | 4.5s | |
| GPT-5.4 Nano (Reasoning, Low) | 70% | $0.0001 | 1.0s | |
| DeepSeek V3.1 | 90% | $0.0001 | 5.8s | |
| Hermes 3 70B | 100% | $0.0001 | 3.5s | |
| DeepSeek V3 (2025-03-24) | 100% | $0.0001 | 3.0s | |
| Rocinante 12B | 80% | $0.0001 | 8.0s | |
| Claude 3 Haiku | 100% | $0.0001 | 3.6s | |
| Gemini 3.1 Flash Lite (Preview) | 70% | $0.0001 | 801ms | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| ByteDance Seed 1.6 | 100% | 100% | 100% | |
| o4 Mini High | 100% | 100% | 100% | |
| Claude Opus 4.5 | 100% | 100% | 100% | |
| Claude Sonnet 4 | 100% | 100% | 100% | |
| GPT-4.1 | 100% | 100% | 100% | |
| o4 Mini | 100% | 100% | 100% | |
| Claude Sonnet 4.5 | 100% | 100% | 100% | |
| Qwen 3.5 35B | 100% | 100% | 100% | |
| Claude Opus 4 | 100% | 100% | 100% | |
| ByteDance Seed 2.0 Mini | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Ministral 3 8B | 100% | $0.0000 | 644ms | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0000 | 1.2s | 100% | |
| Gemma 3 12B | 100% | $0.0000 | 1.9s | 100% | |
| Mistral Small 4 | 100% | $0.0001 | 1.1s | 100% | |
| GPT-4.1 Nano | 100% | $0.0000 | 1.6s | 100% | |
| Stealth: Healer Alpha | 100% | $0.0000 | 3.5s | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0001 | 3.2s | 100% | |
| DeepSeek-V2 Chat | 100% | $0.0000 | 4.4s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0002 | 666ms | 100% | |
| DeepSeek V3 (2025-03-24) | 100% | $0.0001 | 3.0s | 100% | |
| Qwen3 235B A22B Instruct 2507 | 100% | $0.0001 | 4.5s | 100% | |
| Hermes 3 70B | 100% | $0.0001 | 3.5s | 100% | |
| GPT-4.1 Mini | 100% | $0.0002 | 2.3s | 100% | |
| Claude 3 Haiku | 100% | $0.0001 | 3.6s | 100% | |
| Z.AI GLM 4.5 | 100% | $0.0002 | 2.7s | 100% | |
| WizardLM 2 8x22b | 100% | $0.0002 | 4.5s | 100% | |
| GPT-5.4 Mini | 100% | $0.0004 | 914ms | 100% | |
| Grok 4.20 (Beta) | 100% | $0.0005 | 552ms | 100% | |
| Claude 3.5 Haiku | 100% | $0.0004 | 1.1s | 100% | |
| Claude Haiku 4.5 | 100% | $0.0005 | 1.2s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 90.0% | Matches Regex | | |
Simple (5-shot)
Performance Score Distribution (Top 20)
| Model | Score | |
|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | |
| Z.AI GLM 5 Turbo | 100% | |
| GPT-5.4 (Reasoning) | 100% | |
| GPT-5.1 | 100% | |
| Claude Opus 4.6 | 100% | |
| Qwen 3.5 397B A17B | 100% | |
| Qwen 3.5 122B | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | |
| Claude Sonnet 4.6 | 100% | |
| ByteDance Seed 1.6 | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | |
| Claude Opus 4.5 | 100% | |
| Grok 4.1 Fast | 100% | |
| Claude Sonnet 4 | 100% | |
| GPT-4.1 | 100% | |
| Claude Sonnet 4.5 | 100% | |
| Qwen 3.5 35B | 100% | |
| Claude Opus 4 | 100% | |
| ByteDance Seed 2.0 Mini | 100% | |
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemma 3 4B | 100% | $0.0000 | 929ms | |
| Mistral Small Creative | 100% | $0.0001 | 653ms | |
| Gemini 2.5 Flash Lite | 70% | $0.0001 | 493ms | |
| Ministral 3 8B | 100% | $0.0001 | 633ms | |
| Mistral Small 3.2 24B | 100% | $0.0001 | 1.2s | |
| Mistral Small 4 | 100% | $0.0001 | 844ms | |
| Llama 3.1 8B | 80% | $0.0002 | 739ms | |
| Ministral 3 14B | 100% | $0.0002 | 779ms | |
| GPT-4o Mini (temp=0) | 100% | $0.0002 | 959ms | |
| Gemma 3 12B | 100% | $0.0000 | 1.7s | |
| GPT-4o Mini (temp=1) | 100% | $0.0002 | 977ms | |
| Mistral NeMO | 60% | $0.0001 | 1.3s | |
| GPT-4.1 Nano | 100% | $0.0001 | 1.8s | |
| GPT-5.4 Nano | 100% | $0.0002 | 1.1s | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0002 | 1.2s | |
| Gemini 3.1 Flash Lite (Preview) | 90% | $0.0003 | 754ms | |
| Claude 3 Haiku | 100% | $0.0003 | 1.1s | |
| GPT-5.4 Nano (Reasoning) | 80% | $0.0003 | 1.6s | |
| Arcee AI: Trinity Large (Preview) | 100% | $0.0000 | 3.1s | |
| Gemini 2.5 Flash | 90% | $0.0004 | 583ms | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| ByteDance Seed 1.6 | 100% | 100% | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.5 | 100% | 100% | 100% | |
| Grok 4.1 Fast | 100% | 100% | 100% | |
| Claude Sonnet 4 | 100% | 100% | 100% | |
| GPT-4.1 | 100% | 100% | 100% | |
| Claude Sonnet 4.5 | 100% | 100% | 100% | |
| Qwen 3.5 35B | 100% | 100% | 100% | |
| Claude Opus 4 | 100% | 100% | 100% | |
| ByteDance Seed 2.0 Mini | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Mistral Small Creative | 100% | $0.0001 | 653ms | 100% | |
| Gemma 3 4B | 100% | $0.0000 | 929ms | 100% | |
| Ministral 3 8B | 100% | $0.0001 | 633ms | 100% | |
| Ministral 3 14B | 100% | $0.0002 | 779ms | 100% | |
| Mistral Small 4 | 100% | $0.0001 | 844ms | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0002 | 959ms | 100% | |
| GPT-4o Mini (temp=1) | 100% | $0.0002 | 977ms | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0001 | 1.2s | 100% | |
| Gemma 3 12B | 100% | $0.0000 | 1.7s | 100% | |
| GPT-5.4 Nano | 100% | $0.0002 | 1.1s | 100% | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0002 | 1.2s | 100% | |
| Claude 3 Haiku | 100% | $0.0003 | 1.1s | 100% | |
| GPT-4.1 Nano | 100% | $0.0001 | 1.8s | 100% | |
| Mistral Large 3 | 100% | $0.0005 | 1.4s | 100% | |
| Arcee AI: Trinity Large (Preview) | 100% | $0.0000 | 3.1s | 100% | |
| Grok 4.20 (Beta) | 100% | $0.0007 | 601ms | 100% | |
| GPT-4.1 Mini | 100% | $0.0004 | 1.9s | 100% | |
| Llama 3.1 70B | 100% | $0.0008 | 993ms | 100% | |
| Z.AI GLM 4.5 | 100% | $0.0003 | 2.8s | 100% | |
| GPT-5.4 Mini | 100% | $0.0009 | 1.1s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Matches Regex | | |
Multiple speakers
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Stealth: Hunter Alpha | 60% | $0.0000 | 15.5s | |
| Stealth: Healer Alpha | 60% | $0.0000 | 4.7s | |
| Gemini 2.5 Flash Lite (Reasoning) | 90% | $0.0005 | 3.8s | |
| GPT-4.1 Mini | 90% | $0.0006 | 3.8s | |
| DeepSeek V3.2 | 80% | $0.0002 | 9.0s | |
| Gemini 2.5 Flash | 100% | $0.0008 | 1.3s | |
| Grok 4.1 Fast | 60% | $0.0005 | 5.3s | |
| Grok 4 Fast | 60% | $0.0005 | 5.5s | |
| Gemini 3 Flash (Preview) | 100% | $0.0010 | 2.0s | |
| Gemini 2.5 Flash (Reasoning) | 60% | $0.0017 | 3.1s | |
| DeepSeek V3.1 | 70% | $0.0003 | 13.7s | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0008 | 9.9s | |
| Claude Haiku 4.5 | 100% | $0.0018 | 2.1s | |
| GPT-4.1 | 90% | $0.0028 | 3.3s | |
| Mistral Large 2 | 100% | $0.0029 | 8.3s | |
| ByteDance Seed 1.6 | 80% | $0.0024 | 25.7s | |
| Z.AI GLM 5 | 60% | $0.0024 | 13.1s | |
| GPT-4o, Aug. 6th (temp=0) | 60% | $0.0034 | 3.4s | |
| Aion 2.0 | 70% | $0.0021 | 25.5s | |
| ByteDance Seed 2.0 Lite | 100% | $0.0029 | 33.8s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | |
| Claude Sonnet 4 | 100% | 100% | 100% | |
| Claude Sonnet 4.5 | 100% | 100% | 100% | |
| Claude Opus 4 | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview) | 100% | 100% | 100% | |
| Claude Haiku 4.5 | 100% | 100% | 100% | |
| ByteDance Seed 2.0 Lite | 100% | 100% | 100% | |
| Claude 3.7 Sonnet | 100% | 100% | 100% | |
| Mistral Large 2 | 100% | 100% | 100% | |
| Gemini 2.5 Flash | 100% | 100% | 100% | |
| Qwen 3.5 122B | 90% | 40% | 40% | |
| GPT-4.1 | 90% | 40% | 40% | |
| Gemini 2.5 Flash Lite (Reasoning) | 90% | 40% | 40% | |
| GPT-4o, May 13th (temp=1) | 90% | 40% | 40% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash | 100% | $0.0008 | 1.3s | 100% | |
| Gemini 3 Flash (Preview) | 100% | $0.0010 | 2.0s | 100% | |
| Claude Haiku 4.5 | 100% | $0.0018 | 2.1s | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0008 | 9.9s | 100% | |
| Mistral Large 2 | 100% | $0.0029 | 8.3s | 100% | |
| Claude Sonnet 4.6 | 100% | $0.0054 | 2.6s | 100% | |
| Claude Sonnet 4.5 | 100% | $0.0054 | 3.5s | 100% | |
| Claude Sonnet 4 | 100% | $0.0054 | 4.0s | 100% | |
| Claude 3.7 Sonnet | 100% | $0.0055 | 4.7s | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | $0.0060 | 3.2s | 100% | |
| Claude Opus 4.6 | 100% | $0.0091 | 5.0s | 100% | |
| Claude Opus 4.6 (Reasoning) | 100% | $0.0098 | 4.1s | 100% | |
| ByteDance Seed 2.0 Lite | 100% | $0.0029 | 33.8s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 90% | $0.0005 | 3.8s | 40% | |
| GPT-4.1 Mini | 90% | $0.0006 | 3.8s | 40% | |
| GPT-4.1 | 90% | $0.0028 | 3.3s | 40% | |
| Gemini 3 Pro (Preview) | 100% | $0.021 | 12.9s | 100% | |
| GPT-4o, May 13th (temp=1) | 90% | $0.0057 | 8.8s | 40% | |
| Gemini 3.1 Pro (Preview) | 100% | $0.023 | 20.9s | 100% | |
| DeepSeek V3.2 | 80% | $0.0002 | 9.0s | 20% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 20.0% | Matches Regex | | |
Unattributed dialogue
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 971ms | |
| Gemini 2.5 Flash Lite | 70% | $0.0001 | 781ms | |
| Llama 3.1 8B | 90% | $0.0001 | 1.2s | |
| Inception Mercury | 100% | $0.0002 | 1.0s | |
| Inception Mercury 2 | 100% | $0.0005 | 815ms | |
| Mistral Small 3.2 24B | 100% | $0.0001 | 3.5s | |
| Gemini 3.1 Flash Lite (Preview) | 60% | $0.0005 | 1.2s | |
| Llama 3.1 70B | 100% | $0.0005 | 2.3s | |
| Grok 4 Fast | 100% | $0.0003 | 3.2s | |
| Gemini 2.5 Flash | 90% | $0.0007 | 1.4s | |
| Grok 4.1 Fast | 100% | $0.0003 | 4.0s | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0004 | 3.4s | |
| Stealth: Healer Alpha | 90% | $0.0000 | 5.6s | |
| ByteDance Seed 1.6 Flash | 70% | $0.0003 | 5.0s | |
| Mistral Large 3 | 100% | $0.0005 | 4.0s | |
| Gemini 3 Flash (Preview) | 100% | $0.0009 | 2.6s | |
| Gemma 3 27B | 100% | $0.0001 | 8.6s | |
| Nemotron 3 Super | 100% | $0.0000 | 10.6s | |
| Qwen 2.5 72B | 100% | $0.0002 | 8.2s | |
| Mistral Small 4 (Reasoning) | 80% | $0.0006 | 5.3s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| ByteDance Seed 1.6 | 100% | 100% | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | 100% | 100% | |
| o4 Mini High | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 971ms | 100% | |
| Inception Mercury | 100% | $0.0002 | 1.0s | 100% | |
| Inception Mercury 2 | 100% | $0.0005 | 815ms | 100% | |
| Llama 3.1 70B | 100% | $0.0005 | 2.3s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0001 | 3.5s | 100% | |
| Grok 4 Fast | 100% | $0.0003 | 3.2s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0004 | 3.4s | 100% | |
| Gemini 3 Flash (Preview) | 100% | $0.0009 | 2.6s | 100% | |
| Grok 4.1 Fast | 100% | $0.0003 | 4.0s | 100% | |
| Claude Haiku 4.5 | 100% | $0.0016 | 1.9s | 100% | |
| Mistral Large 3 | 100% | $0.0005 | 4.0s | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | $0.0020 | 3.0s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0023 | 3.6s | 100% | |
| GPT-4.1 | 100% | $0.0024 | 4.2s | 100% | |
| Qwen 2.5 72B | 100% | $0.0002 | 8.2s | 100% | |
| Gemma 3 27B | 100% | $0.0001 | 8.6s | 100% | |
| GPT-4o, Aug. 6th (temp=0) | 100% | $0.0030 | 3.5s | 100% | |
| MiniMax M2.5 | 100% | $0.0008 | 8.3s | 100% | |
| MiniMax M2.7 | 100% | $0.0009 | 8.7s | 100% | |
| Nemotron 3 Super | 100% | $0.0000 | 10.6s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 90.0% | Matches Regex | | |