Novel outline
Handle questions about the outline of a novel in various formats
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Inception Mercury | 73% | $0.0000 | 679ms | |
| Stealth: Aurora Alpha | 88% | — | 1.3s | |
| Inception Mercury 2 | 86% | $0.0003 | 567ms | |
| Gemini 2.5 Flash Lite | 57% | $0.0002 | 575ms | |
| Ministral 3 3B | 57% | $0.0002 | 1.3s | |
| Gemini 2.5 Flash Lite (Reasoning) | 98% | $0.0004 | 2.5s | |
| Ministral 3 8B | 67% | $0.0002 | 1.8s | |
| Mistral NeMO | 68% | $0.0002 | 2.6s | |
| Gemini 3.1 Flash Lite (Preview) | 83% | $0.0007 | 1.2s | |
| GPT-5.4 Nano (Reasoning, Low) | 88% | $0.0005 | 3.0s | |
| Mistral Small 4 | 56% | $0.0002 | 1.7s | |
| GPT-4.1 Mini | 66% | $0.0003 | 2.5s | |
| Mistral Small Creative | 58% | $0.0003 | 2.3s | |
| Llama 3.1 8B | 54% | $0.0002 | 2.0s | |
| Ministral 3 14B | 69% | $0.0003 | 3.3s | |
| GPT-5.4 Nano (Reasoning) | 85% | $0.0005 | 3.1s | |
| Arcee AI: Trinity Mini | 59% | $0.0002 | 3.9s | |
| Grok 4.1 Fast | 84% | $0.0005 | 4.0s | |
| Mistral Small 4 (Reasoning) | 80% | $0.0005 | 4.6s | |
| Stealth: Healer Alpha | 72% | $0.0000 | 6.2s | |
Cost vs Performance
Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.
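The quadrant split described above can be sketched roughly as follows. This is a minimal illustration, not the site's implementation: the function name `quadrants`, the quadrant labels, and the tie-handling at the median are all assumptions, and the sample rows reuse a few values from the price-performance table above.

```python
from statistics import median

def quadrants(rows):
    """Split (name, cost, score) rows into quadrants around the median cost and score."""
    # Models without cost data are left out, as the chart description states.
    priced = [r for r in rows if r[1] is not None]
    cost_mid = median(c for _, c, _ in priced)
    score_mid = median(s for _, _, s in priced)
    result = {}
    for name, cost, score in priced:
        # Assumption: values exactly on a median line count as cheap / strong.
        cost_side = "cheap" if cost <= cost_mid else "expensive"
        score_side = "strong" if score >= score_mid else "weak"
        result[name] = f"{cost_side}/{score_side}"
    return result

rows = [
    ("Inception Mercury 2", 0.0003, 86),
    ("Gemini 2.5 Flash Lite", 0.0002, 57),
    ("Gemini 2.5 Flash Lite (Reasoning)", 0.0004, 98),
    ("Grok 4.1 Fast", 0.0005, 84),
    ("Stealth: Aurora Alpha", None, 88),  # no cost data, so excluded
]
result = quadrants(rows)
for name, label in result.items():
    print(f"{name}: {label}")
```

Because the lines sit at the medians of whichever models are plotted, a model's quadrant can change when the set of models changes, even if its own cost and score do not.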
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| Aion 2.0 | 100% | 100% | 100% | |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | |
| Gemini 2.5 Pro | 100% | 100% | 100% | |
| Qwen 3.5 35B | 100% | 100% | 100% | |
| Qwen 3.5 Flash | 100% | 100% | 100% | |
| o4 Mini High | 99% | 87% | 87% | |
| Claude Sonnet 4.6 (Reasoning) | 98% | 74% | 74% | |
| Gemini 2.5 Flash Lite (Reasoning) | 98% | 72% | 72% | |
| o4 Mini | 98% | 72% | 72% | |
| GPT-5 Nano | 95% | 70% | 70% | |
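The ranking formula given for this table, stability = median × consistency, can be sketched as below. The page does not say how consistency is measured, so this sketch takes it as a given input; the helper name `stability` and the per-run scores are hypothetical. The Score column may not be the median itself: o4 Mini High shows a 99% score with 87% consistency and 87% stability, which would be consistent with a median of 100%.

```python
from statistics import median

def stability(run_scores, consistency):
    """Stability as median run score times consistency, both as fractions in [0, 1]."""
    return median(run_scores) * consistency

# Hypothetical per-run scores; the consistency value is taken as given.
runs = [1.0, 1.0, 1.0, 0.9, 1.0]
value = stability(runs, 0.87)
print(f"{value:.0%}")
```

Using the median rather than the mean means a single weak run does not lower the first factor; under this formula any run-to-run variation would have to be reflected in the consistency term.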
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| GPT-5.4 (Reasoning, Low) | 100% | $0.0030 | 3.1s | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | $0.0027 | 4.0s | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | $0.0050 | 3.0s | 100% | |
| GPT-5.4 (Reasoning) | 100% | $0.0034 | 8.1s | 100% | |
| Qwen 3.5 Flash | 100% | $0.0009 | 15.4s | 100% | |
| Aion 2.0 | 100% | $0.0021 | 15.6s | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | $0.0022 | 15.5s | 100% | |
| o4 Mini High | 99% | $0.0032 | 7.7s | 87% | |
| Qwen 3.5 35B | 100% | $0.0045 | 13.6s | 100% | |
| Gemini 2.5 Pro | 100% | $0.0095 | 6.9s | 100% | |
| Qwen 3.5 122B | 100% | $0.0057 | 12.6s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 98% | $0.0004 | 2.5s | 72% | |
| Gemini 3 Pro (Preview) | 100% | $0.011 | 6.9s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 97% | $0.0016 | 3.0s | 69% | |
| Gemini 3.1 Pro (Preview) | 100% | $0.012 | 10.0s | 100% | |
| o4 Mini | 98% | $0.0027 | 6.2s | 72% | |
| GPT-5 Mini | 97% | $0.0011 | 7.6s | 69% | |
| Claude Opus 4.6 (Reasoning) | 100% | $0.016 | 6.7s | 100% | |
| GPT-5.1 | 96% | $0.0024 | 4.9s | 62% | |
| Qwen 3.5 27B | 100% | $0.0040 | 28.1s | 100% | |
| Model | Total | outline-count: Count chapters | outline-count: Count acts | outline-count: Count scenes | pov-count: Count point of views for Jack Harper | pov-count: Count point of views for Olivia | pov-count: Count point of views for Jack and Olivia |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 122B | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 27B | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Aion 2.0 | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Gemini 2.5 Pro | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 35B | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 Flash | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
outline-count
Count chapters
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Inception Mercury | 100% | $0.0000 | 333ms | |
| Gemma 3 4B | 100% | $0.0001 | 335ms | |
| Stealth: Aurora Alpha | 100% | — | 607ms | |
| LFM2 24B | 100% | $0.0000 | 1.4s | |
| Gemini 2.5 Flash Lite | 100% | $0.0001 | 442ms | |
| Ministral 8B | 100% | $0.0001 | 413ms | |
| Gemma 3 12B | 100% | $0.0001 | 642ms | |
| Ministral 3B | 80% | $0.0001 | 715ms | |
| Inception Mercury 2 | 100% | $0.0002 | 400ms | |
| GPT-4.1 Nano | 100% | $0.0000 | 1.2s | |
| Ministral 3 3B | 100% | $0.0002 | 665ms | |
| Gemini 2.5 Flash | 100% | $0.0002 | 487ms | |
| GPT-4o Mini (temp=1) | 100% | $0.0002 | 662ms | |
| Mistral Small 4 | 100% | $0.0001 | 947ms | |
| Gemma 3 27B | 100% | $0.0001 | 1.1s | |
| GPT-4o Mini (temp=0) | 100% | $0.0002 | 845ms | |
| Ministral 3 8B | 100% | $0.0002 | 815ms | |
| Llama 3.1 8B | 100% | $0.0002 | 1.4s | |
| Mistral Small 3.2 24B | 100% | $0.0002 | 1.4s | |
| Arcee AI: Trinity Large (Preview) | 100% | $0.0000 | 2.3s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| ByteDance Seed 1.6 | 100% | 100% | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| o4 Mini High | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Inception Mercury | 100% | $0.0000 | 333ms | 100% | |
| Gemma 3 4B | 100% | $0.0001 | 335ms | 100% | |
| Ministral 8B | 100% | $0.0001 | 413ms | 100% | |
| Inception Mercury 2 | 100% | $0.0002 | 400ms | 100% | |
| Gemini 2.5 Flash Lite | 100% | $0.0001 | 442ms | 100% | |
| Gemini 2.5 Flash | 100% | $0.0002 | 487ms | 100% | |
| Stealth: Aurora Alpha | 100% | — | 607ms | 100% | |
| Gemma 3 12B | 100% | $0.0001 | 642ms | 100% | |
| Ministral 3 3B | 100% | $0.0002 | 665ms | 100% | |
| GPT-4o Mini (temp=1) | 100% | $0.0002 | 662ms | 100% | |
| Ministral 3 8B | 100% | $0.0002 | 815ms | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0002 | 845ms | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0004 | 692ms | 100% | |
| Mistral Small 4 | 100% | $0.0001 | 947ms | 100% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0003 | 826ms | 100% | |
| GPT-5.4 Nano | 100% | $0.0003 | 834ms | 100% | |
| Gemma 3 27B | 100% | $0.0001 | 1.1s | 100% | |
| GPT-4.1 Nano | 100% | $0.0000 | 1.2s | 100% | |
| LFM2 24B | 100% | $0.0000 | 1.4s | 100% | |
| GPT-5.4 Mini | 100% | $0.0011 | 541ms | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Contains a count of nouns |
Count acts
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 752ms | |
| Inception Mercury | 100% | $0.0000 | 334ms | |
| Gemma 3 4B | 100% | $0.0001 | 311ms | |
| Inception Mercury 2 | 100% | $0.0001 | 301ms | |
| Ministral 8B | 100% | $0.0001 | 520ms | |
| Gemini 2.5 Flash Lite | 100% | $0.0001 | 429ms | |
| Ministral 3B | 100% | $0.0001 | 718ms | |
| GPT-4.1 Nano | 100% | $0.0000 | 1.0s | |
| Gemini 2.5 Flash | 100% | $0.0002 | 512ms | |
| Mistral Small 4 | 100% | $0.0001 | 1.2s | |
| Ministral 3 3B | 100% | $0.0002 | 1.0s | |
| GPT-4o Mini (temp=1) | 100% | $0.0002 | 663ms | |
| GPT-4o Mini (temp=0) | 100% | $0.0002 | 838ms | |
| Mistral Small 4 (Reasoning) | 100% | $0.0002 | 1.1s | |
| Llama 3.1 8B | 100% | $0.0002 | 946ms | |
| LFM2 24B | 100% | $0.0001 | 3.2s | |
| Arcee AI: Trinity Mini | 100% | $0.0001 | 1.5s | |
| Gemma 3 12B | 100% | $0.0001 | 1.6s | |
| Arcee AI: Trinity Large (Preview) | 100% | $0.0000 | 2.5s | |
| GPT-4.1 Mini | 100% | $0.0002 | 2.0s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| ByteDance Seed 1.6 | 100% | 100% | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| o4 Mini High | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemma 3 4B | 100% | $0.0001 | 311ms | 100% | |
| Inception Mercury | 100% | $0.0000 | 334ms | 100% | |
| Inception Mercury 2 | 100% | $0.0001 | 301ms | 100% | |
| Gemini 2.5 Flash Lite | 100% | $0.0001 | 429ms | 100% | |
| Ministral 8B | 100% | $0.0001 | 520ms | 100% | |
| Gemini 2.5 Flash | 100% | $0.0002 | 512ms | 100% | |
| Ministral 3B | 100% | $0.0001 | 718ms | 100% | |
| GPT-4o Mini (temp=1) | 100% | $0.0002 | 663ms | 100% | |
| Stealth: Aurora Alpha | 100% | — | 752ms | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0002 | 838ms | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0004 | 769ms | 100% | |
| GPT-4.1 Nano | 100% | $0.0000 | 1.0s | 100% | |
| Llama 3.1 8B | 100% | $0.0002 | 946ms | 100% | |
| Ministral 3 3B | 100% | $0.0002 | 1.0s | 100% | |
| GPT-5.4 Nano | 100% | $0.0003 | 970ms | 100% | |
| Mistral Small 4 (Reasoning) | 100% | $0.0002 | 1.1s | 100% | |
| Mistral Small 4 | 100% | $0.0001 | 1.2s | 100% | |
| GPT-5.4 Mini | 100% | $0.0011 | 595ms | 100% | |
| GPT-5.4 Nano (Reasoning, Low) | 100% | $0.0003 | 1.1s | 100% | |
| GPT-5.4 | 100% | $0.0010 | 730ms | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Contains a count of nouns |
Count scenes
Performance Score Distribution (Top 20)
| Model | Score | |
|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | |
| GPT-5.4 (Reasoning) | 100% | |
| GPT-5 Mini | 100% | |
| GPT-5.1 | 100% | |
| Claude Opus 4.6 | 100% | |
| GPT-5 | 100% | |
| Qwen 3.5 397B A17B | 100% | |
| Qwen 3.5 122B | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | |
| Qwen 3.5 27B | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | |
| o4 Mini High | 100% | |
| Grok 4.1 Fast | 100% | |
| Aion 2.0 | 100% | |
| Gemini 3 Pro (Preview) | 100% | |
| Gemini 2.5 Pro | 100% | |
| o4 Mini | 100% | |
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Inception Mercury | 90% | $0.0000 | 746ms | |
| Stealth: Aurora Alpha | 70% | — | 801ms | |
| Inception Mercury 2 | 60% | $0.0004 | 610ms | |
| Gemini 2.5 Flash Lite (Reasoning) | 90% | $0.0003 | 2.0s | |
| Arcee AI: Trinity Mini | 100% | $0.0001 | 2.9s | |
| Mistral NeMO | 90% | $0.0003 | 3.1s | |
| Grok 4 Fast | 70% | $0.0004 | 2.6s | |
| Grok 4.1 Fast | 100% | $0.0005 | 4.5s | |
| Mistral Small 4 (Reasoning) | 90% | $0.0005 | 3.3s | |
| GPT-5.4 Nano (Reasoning, Low) | 90% | $0.0005 | 6.3s | |
| Stealth: Hunter Alpha | 80% | $0.0000 | 6.2s | |
| Grok 4.20 (Beta) | 80% | $0.0019 | 1.1s | |
| ByteDance Seed 1.6 Flash | 90% | $0.0004 | 5.4s | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0016 | 3.6s | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0016 | 3.0s | |
| GPT-5.4 Mini (Reasoning) | 90% | $0.0017 | 3.9s | |
| MiniMax M2.7 | 70% | $0.0007 | 7.9s | |
| GPT-5.1 | 100% | $0.0022 | 5.6s | |
| MiniMax M2.5 | 90% | $0.0009 | 8.7s | |
| Nemotron 3 Super | 100% | $0.0000 | 18.1s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| o4 Mini High | 100% | 100% | 100% | |
| Grok 4.1 Fast | 100% | 100% | 100% | |
| Aion 2.0 | 100% | 100% | 100% | |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | |
| Gemini 2.5 Pro | 100% | 100% | 100% | |
| o4 Mini | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Arcee AI: Trinity Mini | 100% | $0.0001 | 2.9s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0016 | 3.0s | 100% | |
| Grok 4.1 Fast | 100% | $0.0005 | 4.5s | 100% | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0016 | 3.6s | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | $0.0028 | 4.3s | 100% | |
| GPT-5.1 | 100% | $0.0022 | 5.6s | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | $0.0032 | 4.5s | 100% | |
| o4 Mini | 100% | $0.0028 | 5.8s | 100% | |
| GPT-5.4 (Reasoning) | 100% | $0.0034 | 5.2s | 100% | |
| GPT-5 Mini | 100% | $0.0012 | 9.0s | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | $0.0061 | 4.0s | 100% | |
| Qwen 3 32B | 100% | $0.0004 | 12.7s | 100% | |
| o4 Mini High | 100% | $0.0036 | 8.7s | 100% | |
| GPT-5 Nano | 100% | $0.0005 | 16.7s | 100% | |
| Qwen 3.5 35B | 100% | $0.0041 | 12.7s | 100% | |
| Nemotron 3 Super | 100% | $0.0000 | 18.1s | 100% | |
| Aion 2.0 | 100% | $0.0021 | 16.0s | 100% | |
| Qwen 3.5 Flash | 100% | $0.0009 | 18.0s | 100% | |
| GPT-5 | 100% | $0.0057 | 12.1s | 100% | |
| Gemini 2.5 Pro | 100% | $0.0100 | 7.1s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 60.0% | Contains a count of nouns |
pov-count
Count point of views for Jack Harper
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Inception Mercury | 80% | $0.0000 | 817ms | |
| Stealth: Aurora Alpha | 100% | — | 4.2s | |
| Inception Mercury 2 | 100% | $0.0004 | 638ms | |
| GPT-5.4 Nano (Reasoning, Low) | 90% | $0.0005 | 2.2s | |
| Ministral 3 8B | 60% | $0.0003 | 2.4s | |
| Mistral Small Creative | 90% | $0.0003 | 2.7s | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0004 | 2.4s | |
| GPT-5.4 Nano (Reasoning) | 70% | $0.0006 | 2.3s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | |
| Stealth: Healer Alpha | 70% | $0.0000 | 5.2s | |
| Ministral 3 14B | 60% | $0.0004 | 4.0s | |
| Grok 4.1 Fast | 90% | $0.0006 | 5.0s | |
| Mistral Medium 3.1 | 100% | $0.0012 | 6.3s | |
| Arcee AI: Trinity Large (Preview) | 70% | $0.0000 | 5.2s | |
| DeepSeek V3.2 | 100% | $0.0005 | 17.4s | |
| Stealth: Hunter Alpha | 70% | $0.0000 | 13.5s | |
| Gemma 3 12B | 100% | $0.0001 | 8.6s | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0021 | 2.9s | |
| ByteDance Seed 1.6 Flash | 80% | $0.0007 | 9.8s | |
| GPT-5.4 Mini (Reasoning) | 90% | $0.0022 | 3.3s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| o4 Mini High | 100% | 100% | 100% | |
| GPT-5.2 | 100% | 100% | 100% | |
| Aion 2.0 | 100% | 100% | 100% | |
| Z.AI GLM 4.6 | 100% | 100% | 100% | |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | |
| Claude Sonnet 4 | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Inception Mercury 2 | 100% | $0.0004 | 638ms | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0004 | 2.4s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0009 | 1.7s | 100% | |
| Stealth: Aurora Alpha | 100% | — | 4.2s | 100% | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0021 | 2.9s | 100% | |
| Gemma 3 12B | 100% | $0.0001 | 8.6s | 100% | |
| Mistral Medium 3.1 | 100% | $0.0012 | 6.3s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0023 | 4.0s | 100% | |
| Claude 3.5 Haiku | 100% | $0.0024 | 4.6s | 100% | |
| GPT-4.1 | 100% | $0.0029 | 4.7s | 100% | |
| GPT-5 Mini | 100% | $0.0012 | 8.8s | 100% | |
| Z.AI GLM 5 Turbo | 100% | $0.0027 | 5.6s | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | $0.0032 | 4.6s | 100% | |
| GPT-5.2 | 100% | $0.0039 | 4.8s | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | $0.0044 | 4.0s | 100% | |
| o4 Mini | 100% | $0.0031 | 7.5s | 100% | |
| GPT-5.4 (Reasoning) | 100% | $0.0043 | 4.8s | 100% | |
| Llama 3.1 Nemotron 70B | 100% | $0.0007 | 13.8s | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | $0.0059 | 3.6s | 100% | |
| DeepSeek V3.2 | 100% | $0.0005 | 17.4s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 80.0% | Contains a count of nouns |
Count point of views for Olivia
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 764ms | |
| Inception Mercury 2 | 100% | $0.0003 | 618ms | |
| Ministral 3 8B | 80% | $0.0003 | 2.4s | |
| GPT-4.1 Mini | 90% | $0.0003 | 2.0s | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0004 | 2.4s | |
| Gemini 3.1 Flash Lite (Preview) | 80% | $0.0007 | 1.2s | |
| Ministral 3 14B | 100% | $0.0003 | 3.3s | |
| Grok 4.1 Fast | 80% | $0.0004 | 3.2s | |
| Stealth: Hunter Alpha | 80% | $0.0000 | 23.6s | |
| GPT-5.4 Nano (Reasoning) | 90% | $0.0005 | 5.3s | |
| Nemotron 3 Super | 90% | $0.0000 | 10.7s | |
| DeepSeek V3.2 | 80% | $0.0004 | 10.0s | |
| Qwen3 235B A22B Instruct 2507 | 70% | $0.0003 | 6.0s | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0014 | 3.2s | |
| GPT-5.4 Mini (Reasoning) | 100% | $0.0015 | 3.6s | |
| GPT-5.4 (Reasoning, Low) | 100% | $0.0022 | 3.0s | |
| MiniMax M2.7 | 90% | $0.0009 | 9.7s | |
| Llama 3.1 Nemotron 70B | 90% | $0.0006 | 10.8s | |
| GPT-5.1 | 100% | $0.0018 | 4.7s | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0021 | 3.4s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.1 | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| o4 Mini High | 100% | 100% | 100% | |
| GPT-5.2 | 100% | 100% | 100% | |
| Claude Opus 4.5 | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 764ms | 100% | |
| Inception Mercury 2 | 100% | $0.0003 | 618ms | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0004 | 2.4s | 100% | |
| Ministral 3 14B | 100% | $0.0003 | 3.3s | 100% | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0014 | 3.2s | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | $0.0015 | 3.6s | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | $0.0022 | 3.0s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0021 | 3.4s | 100% | |
| GPT-5.1 | 100% | $0.0018 | 4.7s | 100% | |
| GPT-5.2 | 100% | $0.0025 | 3.7s | 100% | |
| GPT-5.4 (Reasoning) | 100% | $0.0031 | 3.3s | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | $0.0031 | 4.4s | 100% | |
| Z.AI GLM 5 Turbo | 100% | $0.0025 | 6.3s | 100% | |
| o4 Mini | 100% | $0.0027 | 6.6s | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | $0.0047 | 2.7s | 100% | |
| o4 Mini High | 100% | $0.0032 | 6.8s | 100% | |
| Claude Sonnet 4.6 | 100% | $0.0063 | 2.0s | 100% | |
| MiniMax M2.5 | 100% | $0.0009 | 13.8s | 100% | |
| GPT-5 Nano | 100% | $0.0005 | 16.3s | 100% | |
| Qwen 3.5 Flash | 100% | $0.0011 | 17.2s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 70.0% | Contains a count of nouns |
Count point of views for Jack and Olivia
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time | |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.0s | |
| Gemini 3.1 Flash Lite (Preview) | 80% | $0.0010 | 1.7s | |
| GPT-5.4 Nano (Reasoning, Low) | 95% | $0.0007 | 3.2s | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0008 | 4.8s | |
| Stealth: Hunter Alpha | 80% | $0.0000 | 7.5s | |
| Gemini 2.5 Flash Lite (Reasoning) | 95% | $0.0009 | 5.1s | |
| Mistral Large 3 | 90% | $0.0012 | 4.8s | |
| Z.AI GLM 4.5 | 100% | $0.0009 | 8.1s | |
| Qwen3 235B A22B Instruct 2507 | 100% | $0.0004 | 16.4s | |
| Gemini 2.5 Flash (Reasoning) | 80% | $0.0024 | 3.9s | |
| Claude 3.5 Haiku | 100% | $0.0025 | 4.9s | |
| Llama 3.1 Nemotron 70B | 95% | $0.0006 | 14.0s | |
| ByteDance Seed 1.6 Flash | 100% | $0.0010 | 15.0s | |
| Mistral Large | 85% | $0.0037 | 2.0s | |
| MiniMax M2.7 | 90% | $0.0009 | 11.7s | |
| Claude Haiku 4.5 | 65% | $0.0032 | 3.1s | |
| DeepSeek-V2 Chat | 95% | $0.0003 | 17.9s | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0021 | 16.3s | |
| GPT-5 Mini | 90% | $0.0016 | 11.1s | |
| Z.AI GLM 5 Turbo | 100% | $0.0030 | 7.2s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability | |
|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| GPT-5.2 | 100% | 100% | 100% | |
| Aion 2.0 | 100% | 100% | 100% | |
| Z.AI GLM 4.6 | 100% | 100% | 100% | |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | |
| Gemini 2.5 Pro | 100% | 100% | 100% | |
| Qwen 3.5 35B | 100% | 100% | 100% | |
| Qwen 3.5 Flash | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability | |
|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0003 | 1.0s | 100% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0008 | 4.8s | 100% | |
| Z.AI GLM 4.5 | 100% | $0.0009 | 8.1s | 100% | |
| Claude 3.5 Haiku | 100% | $0.0025 | 4.9s | 100% | |
| Z.AI GLM 5 Turbo | 100% | $0.0030 | 7.2s | 100% | |
| Qwen3 235B A22B Instruct 2507 | 100% | $0.0004 | 16.4s | 100% | |
| ByteDance Seed 1.6 Flash | 100% | $0.0010 | 15.0s | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | $0.0041 | 6.0s | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | $0.0051 | 5.3s | 100% | |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0021 | 16.3s | 100% | |
| GPT-5.2 | 100% | $0.0055 | 6.4s | 100% | |
| Qwen 3.5 Flash | 100% | $0.0011 | 20.6s | 100% | |
| Writer: Palmyra X5 | 100% | $0.0039 | 12.4s | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | $0.0072 | 4.3s | 100% | |
| Z.AI GLM 5 | 100% | $0.0032 | 21.5s | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | $0.0033 | 23.3s | 100% | |
| Aion 2.0 | 100% | $0.0028 | 24.9s | 100% | |
| GPT-5.4 (Reasoning) | 100% | $0.0062 | 18.8s | 100% | |
| Z.AI GLM 4.6 | 100% | $0.0023 | 34.1s | 100% | |
| GPT-4o, May 13th (temp=0) | 100% | $0.013 | 7.2s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 72.5% | Either/Or composite |