Novel outline
Handle questions about the outline of a novel in various formats
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Inception Mercury | 73% | $0.0000 | 679ms | |
| Stealth: Aurora Alpha | 88% | — | 1.3s | |
| Inception Mercury 2 | 86% | $0.0003 | 567ms | |
| Gemini 2.5 Flash Lite | 57% | $0.0002 | 575ms | |
| Ministral 3 3B | 57% | $0.0002 | 1.3s | |
| Gemini 2.5 Flash Lite (Reasoning) | 98% | $0.0004 | 2.5s | |
| Ministral 3 8B | 67% | $0.0002 | 1.8s | |
| Mistral NeMO | 68% | $0.0002 | 2.6s | |
| DeepSeek V4 Flash | 63% | $0.0001 | 3.1s | |
| Mistral Small 4 | 56% | $0.0002 | 1.7s | |
| Mistral Small Creative | 58% | $0.0003 | 2.3s | |
| Llama 3.1 8B | 54% | $0.0002 | 2.0s | |
| GPT-5.4 Nano (Reasoning, Low) | 88% | $0.0005 | 3.0s | |
| GPT-4.1 Mini | 66% | $0.0003 | 2.5s | |
| Gemini 3.1 Flash Lite (Preview) | 83% | $0.0007 | 1.2s | |
| Ministral 3 14B | 69% | $0.0003 | 3.3s | |
| Gemini 3.1 Flash Lite | 86% | $0.0007 | 1.5s | |
| Arcee AI: Trinity Mini | 59% | $0.0002 | 3.9s | |
| Gemini 3.1 Flash Lite (Reasoning) | 86% | $0.0007 | 1.2s | |
| GPT-5.4 Nano (Reasoning) | 85% | $0.0005 | 3.1s | |
Cost vs Performance
Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.
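
The quadrant split described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual plotting code: the sample rows are borrowed from the price-performance table purely as inputs, the `quadrants` helper and its labels are hypothetical, and how ties at the median are assigned is an assumption.

```python
from statistics import median

# (model, score in %, cost in USD); None marks a model without cost data.
rows = [
    ("Inception Mercury", 73, 0.0000),
    ("Inception Mercury 2", 86, 0.0003),
    ("Gemini 2.5 Flash Lite", 57, 0.0002),
    ("Gemini 2.5 Flash Lite (Reasoning)", 98, 0.0004),
    ("Llama 3.1 8B", 54, 0.0002),
    ("Stealth: Aurora Alpha", 88, None),
]

def quadrants(rows):
    """Assign each priced model to a quadrant around the median score and cost."""
    # Models without cost data are left out, as the chart description states.
    priced = [r for r in rows if r[2] is not None]
    score_mid = median(r[1] for r in priced)
    cost_mid = median(r[2] for r in priced)
    result = {}
    for name, score, cost in priced:
        # Assumption: values exactly on a median line count as the "good" side.
        perf = "high score" if score >= score_mid else "low score"
        price = "low cost" if cost <= cost_mid else "high cost"
        result[name] = f"{perf}, {price}"
    return result

for name, label in quadrants(rows).items():
    print(f"{name}: {label}")
```

Drawing the lines at the medians means roughly half of the priced models fall on each side of each line, whatever the absolute prices are.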
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| Aion 2.0 | 100% | 100% | 100% | |
| Qwen 3.6 35B | 100% | 100% | 100% | |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | |
| Gemini 2.5 Pro | 100% | 100% | 100% | |
| Qwen 3.5 35B | 100% | 100% | 100% | |
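
The stability ranking above is described as median × consistency. A minimal sketch of that product, assuming "median" means the median of a model's per-run scores and treating consistency as a given fraction (the page does not say how consistency is computed). The model names and numbers below are hypothetical, not values from this page.

```python
from statistics import median

def stability(run_scores, consistency):
    """Median run score times consistency, both as fractions in [0, 1]."""
    return median(run_scores) * consistency

# Hypothetical models: (per-run scores, consistency).
models = {
    "model-a": ([1.0, 1.0, 1.0], 1.0),
    "model-b": ([1.0, 0.9, 1.0], 0.8),
    "model-c": ([0.7, 0.9, 0.8], 0.9),
}

# Rank from most to least stable.
ranking = sorted(models, key=lambda name: stability(*models[name]), reverse=True)
print(ranking)
```

Because the two factors are multiplied, a model with a perfect median score still drops in the ranking when its runs disagree with each other.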
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 (Reasoning, Low) | 100% | $0.0030 | 3.1s | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | $0.0027 | 4.0s | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | $0.0050 | 3.0s | 100% | |
| Qwen 3.6 35B | 100% | $0.0019 | 10.0s | 100% | |
| GPT-5.4 (Reasoning) | 100% | $0.0034 | 8.1s | 100% | |
| Qwen 3.5 Flash | 100% | $0.0009 | 15.4s | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | $0.010 | 3.1s | 100% | |
| Aion 2.0 | 100% | $0.0021 | 15.6s | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | $0.0022 | 15.5s | 100% | |
| GPT-5.5 (Reasoning) | 100% | $0.011 | 3.8s | 100% | |
| o4 Mini High | 99% | $0.0032 | 7.7s | 87% | |
| Qwen 3.5 35B | 100% | $0.0045 | 13.6s | 100% | |
| DeepSeek V4 Flash (Reasoning) | 98% | $0.0002 | 8.7s | 82% | |
| Gemini 2.5 Pro | 100% | $0.0095 | 6.9s | 100% | |
| Qwen 3.5 122B | 100% | $0.0057 | 12.6s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 98% | $0.0004 | 2.5s | 72% | |
| Gemini 3 Pro (Preview) | 100% | $0.011 | 6.9s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 97% | $0.0016 | 3.0s | 69% | |
| Gemini 3.1 Pro (Preview) | 100% | $0.012 | 10.0s | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | $0.0033 | 22.2s | 100% | |
| Model | Total | outline-count: Count chapters | outline-count: Count acts | outline-count: Count scenes | pov-count: Count point of views for Jack Harper | pov-count: Count point of views for Olivia | pov-count: Count point of views for Jack and Olivia |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Qwen3.6 Max Preview | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 122B | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 27B | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
outline-count
Count chapters
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Inception Mercury | 100% | $0.0000 | 333ms | |
| Gemma 3 4B | 100% | $0.0001 | 335ms | |
| Stealth: Aurora Alpha | 100% | — | 607ms | |
| LFM2 24B | 100% | $0.0000 | 1.4s | |
| Gemini 2.5 Flash Lite | 100% | $0.0001 | 442ms | |
| Ministral 8B | 100% | $0.0001 | 413ms | |
| Gemma 3 12B | 100% | $0.0001 | 642ms | |
| Ministral 3B | 80% | $0.0001 | 715ms | |
| Inception Mercury 2 | 100% | $0.0002 | 400ms | |
| GPT-4.1 Nano | 100% | $0.0000 | 1.2s | |
| Ministral 3 3B | 100% | $0.0002 | 665ms | |
| Gemini 2.5 Flash | 100% | $0.0002 | 487ms | |
| GPT-4o Mini (temp=1) | 100% | $0.0002 | 662ms | |
| Mistral Small 4 | 100% | $0.0001 | 947ms | |
| DeepSeek V4 Flash | 100% | $0.0001 | 1.4s | |
| Gemma 3 27B | 100% | $0.0001 | 1.1s | |
| GPT-4o Mini (temp=0) | 100% | $0.0002 | 845ms | |
| Gemma 4 26B | 100% | $0.0002 | 3.2s | |
| Ministral 3 8B | 100% | $0.0002 | 815ms | |
| Llama 3.1 8B | 100% | $0.0002 | 1.4s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Inception Mercury | 100% | $0.0000 | 333ms | 100% | |
| Gemma 3 4B | 100% | $0.0001 | 335ms | 100% | |
| Ministral 8B | 100% | $0.0001 | 413ms | 100% | |
| Inception Mercury 2 | 100% | $0.0002 | 400ms | 100% | |
| Gemini 2.5 Flash Lite | 100% | $0.0001 | 442ms | 100% | |
| Gemini 2.5 Flash | 100% | $0.0002 | 487ms | 100% | |
| Stealth: Aurora Alpha | 100% | — | 607ms | 100% | |
| Gemma 3 12B | 100% | $0.0001 | 642ms | 100% | |
| Ministral 3 3B | 100% | $0.0002 | 665ms | 100% | |
| GPT-4o Mini (temp=1) | 100% | $0.0002 | 662ms | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0004 | 658ms | 100% | |
| Ministral 3 8B | 100% | $0.0002 | 815ms | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0002 | 845ms | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0004 | 692ms | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0004 | 709ms | 100% | |
| Mistral Small 4 | 100% | $0.0001 | 947ms | 100% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0003 | 826ms | 100% | |
| GPT-5.4 Nano | 100% | $0.0003 | 834ms | 100% | |
| Gemma 3 27B | 100% | $0.0001 | 1.1s | 100% | |
| GPT-4.1 Nano | 100% | $0.0000 | 1.2s | 100% | |
| Median | Evaluator |
|---|---|
| 100.0% | Contains a count of nouns |
Count acts
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 752ms | |
| Inception Mercury | 100% | $0.0000 | 334ms | |
| Gemma 3 4B | 100% | $0.0001 | 311ms | |
| Inception Mercury 2 | 100% | $0.0001 | 301ms | |
| Ministral 8B | 100% | $0.0001 | 520ms | |
| Gemini 2.5 Flash Lite | 100% | $0.0001 | 429ms | |
| Ministral 3B | 100% | $0.0001 | 718ms | |
| GPT-4.1 Nano | 100% | $0.0000 | 1.0s | |
| Gemini 2.5 Flash | 100% | $0.0002 | 512ms | |
| Mistral Small 4 | 100% | $0.0001 | 1.2s | |
| Ministral 3 3B | 100% | $0.0002 | 1.0s | |
| GPT-4o Mini (temp=1) | 100% | $0.0002 | 663ms | |
| GPT-4o Mini (temp=0) | 100% | $0.0002 | 838ms | |
| Mistral Small 4 (Reasoning) | 100% | $0.0002 | 1.1s | |
| Llama 3.1 8B | 100% | $0.0002 | 946ms | |
| LFM2 24B | 100% | $0.0001 | 3.2s | |
| DeepSeek V4 Flash (Reasoning) | 100% | $0.0001 | 1.6s | |
| Arcee AI: Trinity Mini | 100% | $0.0001 | 1.5s | |
| Gemma 3 12B | 100% | $0.0001 | 1.6s | |
| DeepSeek V4 Flash | 100% | $0.0001 | 2.6s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemma 3 4B | 100% | $0.0001 | 311ms | 100% | |
| Inception Mercury | 100% | $0.0000 | 334ms | 100% | |
| Inception Mercury 2 | 100% | $0.0001 | 301ms | 100% | |
| Gemini 2.5 Flash Lite | 100% | $0.0001 | 429ms | 100% | |
| Ministral 8B | 100% | $0.0001 | 520ms | 100% | |
| Gemini 2.5 Flash | 100% | $0.0002 | 512ms | 100% | |
| Ministral 3B | 100% | $0.0001 | 718ms | 100% | |
| GPT-4o Mini (temp=1) | 100% | $0.0002 | 663ms | 100% | |
| Gemini 3.1 Flash Lite | 100% | $0.0004 | 657ms | 100% | |
| Stealth: Aurora Alpha | 100% | — | 752ms | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0004 | 697ms | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0002 | 838ms | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0004 | 769ms | 100% | |
| GPT-4.1 Nano | 100% | $0.0000 | 1.0s | 100% | |
| Llama 3.1 8B | 100% | $0.0002 | 946ms | 100% | |
| Ministral 3 3B | 100% | $0.0002 | 1.0s | 100% | |
| GPT-5.4 Nano | 100% | $0.0003 | 970ms | 100% | |
| Mistral Small 4 (Reasoning) | 100% | $0.0002 | 1.1s | 100% | |
| Mistral Small 4 | 100% | $0.0001 | 1.2s | 100% | |
| GPT-5.4 Mini | 100% | $0.0011 | 595ms | 100% | |
| Median | Evaluator |
|---|---|
| 100.0% | Contains a count of nouns |
Count scenes
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Inception Mercury | 90% | $0.0000 | 746ms | |
| Stealth: Aurora Alpha | 70% | — | 801ms | |
| Inception Mercury 2 | 60% | $0.0004 | 610ms | |
| Gemini 2.5 Flash Lite (Reasoning) | 90% | $0.0003 | 2.0s | |
| Arcee AI: Trinity Mini | 100% | $0.0001 | 2.9s | |
| Gemini 3.1 Flash Lite | 70% | $0.0008 | 3.1s | |
| Mistral NeMO | 90% | $0.0003 | 3.1s | |
| Grok 4 Fast | 70% | $0.0004 | 2.6s | |
| Gemini 3.1 Flash Lite (Reasoning) | 60% | $0.0008 | 1.4s | |
| Grok 4.1 Fast | 100% | $0.0005 | 4.5s | |
| Mistral Small 4 (Reasoning) | 90% | $0.0005 | 3.3s | |
| GPT-5.4 Nano (Reasoning, Low) | 90% | $0.0005 | 6.3s | |
| Stealth: Hunter Alpha | 80% | $0.0000 | 6.2s | |
| Grok 4.20 (Beta) | 80% | $0.0019 | 1.1s | |
| ByteDance Seed 1.6 Flash | 90% | $0.0004 | 5.4s | |
| DeepSeek V4 Flash (Reasoning) | 100% | $0.0002 | 8.0s | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0016 | 3.6s | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0016 | 3.0s | |
| GPT-5.4 Mini (Reasoning) | 90% | $0.0017 | 3.9s | |
| Grok 4.20 | 80% | $0.0021 | 2.8s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Arcee AI: Trinity Mini | 100% | $0.0001 | 2.9s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0016 | 3.0s | 100% | |
| Grok 4.1 Fast | 100% | $0.0005 | 4.5s | 100% | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0016 | 3.6s | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | $0.0028 | 4.3s | 100% | |
| DeepSeek V4 Flash (Reasoning) | 100% | $0.0002 | 8.0s | 100% | |
| GPT-5.1 | 100% | $0.0022 | 5.6s | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | $0.0032 | 4.5s | 100% | |
| o4 Mini | 100% | $0.0028 | 5.8s | 100% | |
| GPT-5.4 (Reasoning) | 100% | $0.0034 | 5.2s | 100% | |
| GPT-5 Mini | 100% | $0.0012 | 9.0s | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | $0.0061 | 4.0s | 100% | |
| Grok 4.20 (Reasoning) | 100% | $0.0034 | 9.0s | 100% | |
| Qwen 3 32B | 100% | $0.0004 | 12.7s | 100% | |
| o4 Mini High | 100% | $0.0036 | 8.7s | 100% | |
| Qwen 3.6 35B | 100% | $0.0024 | 12.9s | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | $0.010 | 4.0s | 100% | |
| GPT-5 Nano | 100% | $0.0005 | 16.7s | 100% | |
| Qwen 3.5 35B | 100% | $0.0041 | 12.7s | 100% | |
| Nemotron 3 Super | 100% | $0.0000 | 18.1s | 100% | |
| Median | Evaluator |
|---|---|
| 70.0% | Contains a count of nouns |
pov-count
Count point of views for Jack Harper
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Inception Mercury | 80% | $0.0000 | 817ms | |
| Stealth: Aurora Alpha | 100% | — | 4.2s | |
| Inception Mercury 2 | 100% | $0.0004 | 638ms | |
| Ministral 3 8B | 60% | $0.0003 | 2.4s | |
| Mistral Small Creative | 90% | $0.0003 | 2.7s | |
| GPT-5.4 Nano (Reasoning, Low) | 90% | $0.0005 | 2.2s | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0004 | 2.4s | |
| DeepSeek V4 Flash | 70% | $0.0001 | 3.8s | |
| GPT-5.4 Nano (Reasoning) | 70% | $0.0006 | 2.3s | |
| Gemini 3.1 Flash Lite | 90% | $0.0009 | 1.5s | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.7s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | |
| Stealth: Healer Alpha | 70% | $0.0000 | 5.2s | |
| DeepSeek V4 Flash (Reasoning) | 100% | $0.0002 | 7.9s | |
| Ministral 3 14B | 60% | $0.0004 | 4.0s | |
| Arcee AI: Trinity Large (Preview) | 70% | $0.0000 | 5.2s | |
| Grok 4.1 Fast | 90% | $0.0006 | 5.0s | |
| Stealth: Hunter Alpha | 70% | $0.0000 | 13.5s | |
| Gemma 3 12B | 100% | $0.0001 | 8.6s | |
| DeepSeek V3.2 | 100% | $0.0005 | 17.4s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Inception Mercury 2 | 100% | $0.0004 | 638ms | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0004 | 2.4s | 100% | |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0009 | 1.7s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | 100% | |
| Stealth: Aurora Alpha | 100% | — | 4.2s | 100% | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0021 | 2.9s | 100% | |
| DeepSeek V4 Flash (Reasoning) | 100% | $0.0002 | 7.9s | 100% | |
| Gemma 3 12B | 100% | $0.0001 | 8.6s | 100% | |
| Mistral Medium 3.1 | 100% | $0.0012 | 6.3s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0023 | 4.0s | 100% | |
| GPT-4.1 | 100% | $0.0029 | 4.7s | 100% | |
| GPT-5 Mini | 100% | $0.0012 | 8.8s | 100% | |
| Z.AI GLM 5 Turbo | 100% | $0.0027 | 5.6s | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | $0.0032 | 4.6s | 100% | |
| GPT-5.2 | 100% | $0.0039 | 4.8s | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | $0.0044 | 4.0s | 100% | |
| o4 Mini | 100% | $0.0031 | 7.5s | 100% | |
| GPT-5.4 (Reasoning) | 100% | $0.0043 | 4.8s | 100% | |
| Llama 3.1 Nemotron 70B | 100% | $0.0007 | 13.8s | 100% | |
| Qwen 3.6 Flash | 100% | $0.0033 | 8.9s | 100% | |
| Median | Evaluator |
|---|---|
| 80.0% | Contains a count of nouns |
Count point of views for Olivia
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 764ms | |
| Inception Mercury 2 | 100% | $0.0003 | 618ms | |
| Ministral 3 8B | 80% | $0.0003 | 2.4s | |
| GPT-4.1 Mini | 90% | $0.0003 | 2.0s | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0004 | 2.4s | |
| Ministral 3 14B | 100% | $0.0003 | 3.3s | |
| Gemini 3.1 Flash Lite (Preview) | 80% | $0.0007 | 1.2s | |
| Stealth: Hunter Alpha | 80% | $0.0000 | 23.6s | |
| Grok 4.1 Fast | 80% | $0.0004 | 3.2s | |
| Nemotron 3 Super | 90% | $0.0000 | 10.7s | |
| Gemini 3.1 Flash Lite | 80% | $0.0007 | 1.3s | |
| DeepSeek V4 Flash (Reasoning) | 100% | $0.0002 | 21.4s | |
| Gemini 3.1 Flash Lite (Reasoning) | 80% | $0.0008 | 1.4s | |
| GPT-5.4 Nano (Reasoning) | 90% | $0.0005 | 5.3s | |
| DeepSeek V3.2 | 80% | $0.0004 | 10.0s | |
| Gemma 4 31B | 90% | $0.0003 | 8.5s | |
| Qwen3 235B A22B Instruct 2507 | 70% | $0.0003 | 6.0s | |
| GPT-OSS 120B | 100% | $0.0003 | 8.7s | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0014 | 3.2s | |
| GPT-5.4 Mini (Reasoning) | 100% | $0.0015 | 3.6s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5.1 | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 764ms | 100% | |
| Inception Mercury 2 | 100% | $0.0003 | 618ms | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0004 | 2.4s | 100% | |
| Ministral 3 14B | 100% | $0.0003 | 3.3s | 100% | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0014 | 3.2s | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | $0.0015 | 3.6s | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | $0.0022 | 3.0s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0021 | 3.4s | 100% | |
| GPT-5.1 | 100% | $0.0018 | 4.7s | 100% | |
| GPT-5.2 | 100% | $0.0025 | 3.7s | 100% | |
| GPT-OSS 120B | 100% | $0.0003 | 8.7s | 100% | |
| Xiaomi MIMO v2.5 Pro | 100% | $0.0014 | 6.6s | 100% | |
| GPT-5.4 (Reasoning) | 100% | $0.0031 | 3.3s | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | $0.0031 | 4.4s | 100% | |
| Z.AI GLM 5 Turbo | 100% | $0.0025 | 6.3s | 100% | |
| o4 Mini | 100% | $0.0027 | 6.6s | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | $0.0047 | 2.7s | 100% | |
| Grok 4.20 (Reasoning) | 100% | $0.0029 | 6.5s | 100% | |
| o4 Mini High | 100% | $0.0032 | 6.8s | 100% | |
| Qwen 3.6 35B | 100% | $0.0021 | 9.3s | 100% | |
| Median | Evaluator |
|---|---|
| 80.0% | Contains a count of nouns |
Count point of views for Jack and Olivia
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.0s | |
| GPT-5.4 Nano (Reasoning, Low) | 95% | $0.0007 | 3.2s | |
| Gemini 3.1 Flash Lite | 75% | $0.0009 | 1.6s | |
| Gemini 3.1 Flash Lite (Preview) | 80% | $0.0010 | 1.7s | |
| Gemini 3.1 Flash Lite (Reasoning) | 75% | $0.0009 | 1.6s | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0008 | 4.8s | |
| Stealth: Hunter Alpha | 80% | $0.0000 | 7.5s | |
| Gemini 2.5 Flash Lite (Reasoning) | 95% | $0.0009 | 5.1s | |
| Mistral Large 3 | 90% | $0.0012 | 4.8s | |
| Qwen3 235B A22B Instruct 2507 | 100% | $0.0004 | 16.4s | |
| Z.AI GLM 4.5 | 100% | $0.0009 | 8.1s | |
| DeepSeek V4 Flash (Reasoning) | 90% | $0.0003 | 10.5s | |
| Xiaomi MIMO v2.5 | 75% | $0.0013 | 5.9s | |
| Llama 3.1 Nemotron 70B | 95% | $0.0006 | 14.0s | |
| Gemini 2.5 Flash (Reasoning) | 80% | $0.0024 | 3.9s | |
| ByteDance Seed 1.6 Flash | 100% | $0.0010 | 15.0s | |
| DeepSeek-V2 Chat | 95% | $0.0003 | 17.9s | |
| MiniMax M2.7 | 90% | $0.0009 | 11.7s | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0021 | 16.3s | |
| Nemotron 3 Super | 80% | $0.0000 | 13.6s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Qwen3.6 Max Preview | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| Qwen 3.6 Flash | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| GPT-5.2 | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.0s | 100% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0008 | 4.8s | 100% | |
| Z.AI GLM 4.5 | 100% | $0.0009 | 8.1s | 100% | |
| Z.AI GLM 5 Turbo | 100% | $0.0030 | 7.2s | 100% | |
| Qwen3 235B A22B Instruct 2507 | 100% | $0.0004 | 16.4s | 100% | |
| ByteDance Seed 1.6 Flash | 100% | $0.0010 | 15.0s | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | $0.0041 | 6.0s | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | $0.0051 | 5.3s | 100% | |
| Qwen 3.6 Flash | 100% | $0.0035 | 10.6s | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0021 | 16.3s | 100% | |
| GPT-5.2 | 100% | $0.0055 | 6.4s | 100% | |
| Qwen 3.5 Flash | 100% | $0.0011 | 20.6s | 100% | |
| Writer: Palmyra X5 | 100% | $0.0039 | 12.4s | 100% | |
| Qwen 3.6 35B | 100% | $0.0030 | 16.3s | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | $0.0072 | 4.3s | 100% | |
| Z.AI GLM 5 | 100% | $0.0032 | 21.5s | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | $0.0033 | 23.3s | 100% | |
| Aion 2.0 | 100% | $0.0028 | 24.9s | 100% | |
| GPT-5.4 (Reasoning) | 100% | $0.0062 | 18.8s | 100% | |
| Z.AI GLM 4.6 | 100% | $0.0023 | 34.1s | 100% | |
| Median | Evaluator |
|---|---|
| 75.0% | Either/Or composite |