Tool usage within Novelcrafter
Output messages related to tool usage within Novelcrafter.
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 3.3s | |
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.5s | |
| Ministral 3B | 80% | $0.0000 | 2.1s | |
| Inception Mercury | 97% | $0.0003 | 1.3s | |
| Inception Mercury 2 | 97% | $0.0004 | 1.1s | |
| Mistral NeMo | 97% | $0.0001 | 3.2s | |
| Ministral 8B | 80% | $0.0001 | 6.9s | |
| Ministral 3 3B | 90% | $0.0001 | 3.6s | |
| Mistral Small 3.2 24B | 100% | $0.0001 | 4.9s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0008 | 2.6s | |
| Llama 3.1 8B | 60% | $0.0001 | 4.1s | |
| GPT-4.1 Nano | 90% | $0.0001 | 4.0s | |
| Gemini 2.5 Flash | 100% | $0.0010 | 2.4s | |
| GPT-4.1 Mini | 97% | $0.0006 | 4.3s | |
| Claude 3 Haiku | 100% | $0.0007 | 4.2s | |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 5.5s | |
| Arcee AI: Trinity Mini | 97% | $0.0002 | 6.4s | |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 8.0s | |
| Ministral 3 8B | 100% | $0.0002 | 6.6s | |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.7s | |
Cost vs Performance
Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.
19 low-scoring outliers hidden: Z.AI GLM 5 Turbo (90.0%), Qwen 3.5 35B (90.0%), Z.AI GLM 4.7 Flash (90.0%), Nemotron 3 Super (90.0%), DeepSeek V3 (2025-03-24) (90.0%), Ministral 3 14B (90.0%), GPT-4.1 Nano (90.0%), Cohere Command R+ (Aug. 2024) (90.0%), Ministral 3 3B (90.0%), Llama 3.1 70B (86.7%), Qwen 3.5 Flash (80.0%), Mistral Small Creative (80.0%), Ministral 8B (80.0%), Ministral 3B (80.0%), Nemotron 3 Nano (73.3%), Llama 3.1 8B (60.0%), Rocinante 12B (20.0%), ByteDance Seed 1.6 Flash (0.0%), LFM2 24B (0.0%).
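The quadrant assignment described above (lines at the median cost and median score) can be sketched as follows; the four models here are a small hypothetical subset of the table above, used only to illustrate the classification:

```python
from statistics import median

# Hypothetical subset of (cost in $, score in %) pairs from the table above.
models = {
    "Gemini 2.5 Flash Lite": (0.0002, 100.0),
    "Ministral 3B": (0.0000, 80.0),
    "Gemini 2.5 Flash": (0.0010, 100.0),
    "Llama 3.1 8B": (0.0001, 60.0),
}

# Quadrant lines sit at the median cost and the median score.
cost_median = median(cost for cost, _ in models.values())
score_median = median(score for _, score in models.values())

def quadrant(cost: float, score: float) -> str:
    """Classify a model relative to the two median lines."""
    side = "cheap" if cost <= cost_median else "expensive"
    level = "strong" if score >= score_median else "weak"
    return f"{side}/{level}"

for name, (cost, score) in models.items():
    print(f"{name}: {quadrant(cost, score)}")
```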
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| o4 Mini High | 100% | 100% | 100% | |
| GPT-5.2 | 100% | 100% | 100% | |
| Claude Opus 4.5 | 100% | 100% | 100% | |
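The stability metric used for the ranking above is stated as median × consistency. A minimal sketch, assuming per-run percentage scores and taking "consistency" to mean 1 minus the normalized score spread (the page does not define it):

```python
from statistics import median, pstdev

def stability(run_scores: list[float]) -> float:
    """Stability = median score × consistency, per the ranking above.

    The page does not define 'consistency'; as a stand-in we assume
    1 - (population std-dev / 100), so identical runs give exactly 1.0.
    """
    consistency = 1.0 - pstdev(run_scores) / 100.0
    return median(run_scores) * consistency  # result on a 0-100 scale

print(stability([100.0, 100.0, 100.0]))  # three perfect runs -> 100.0
```

Under this assumption, any spread between runs pulls stability below the median score, which matches the intent of the ranking even if the exact consistency formula differs.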
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.5s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0008 | 2.6s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0010 | 2.4s | 100% | |
| Stealth: Aurora Alpha | 100% | — | 3.3s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0001 | 4.9s | 100% | |
| Claude 3 Haiku | 100% | $0.0007 | 4.2s | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 5.5s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0017 | 3.7s | 100% | |
| Grok 4.20 (Beta) | 100% | $0.0026 | 2.5s | 100% | |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.7s | 100% | |
| GPT-5.4 Mini | 100% | $0.0024 | 3.2s | 100% | |
| Ministral 3 8B | 100% | $0.0002 | 6.6s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0007 | 6.0s | 100% | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0025 | 3.2s | 100% | |
| GPT-5.4 Nano | 100% | $0.0014 | 5.1s | 100% | |
| Grok 4 Fast | 100% | $0.0004 | 6.6s | 100% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0013 | 5.5s | 100% | |
| Gemma 3 4B | 100% | $0.0001 | 8.0s | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | $0.0029 | 3.8s | 100% | |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 8.0s | 100% | |
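The page does not publish the composite formula behind this ranking. One plausible shape, shown purely as a hypothetical sketch: normalize each factor to [0, 1] (cheaper and faster are better) and average with equal weights. The `cost_cap` and `time_cap` parameters are made-up normalization caps, not values from the benchmark:

```python
def composite_score(score: float, cost: float, time_s: float,
                    stability: float,
                    cost_cap: float = 0.01, time_cap: float = 10.0) -> float:
    """Hypothetical composite of performance, cost, speed, and stability.

    Each factor is normalized to [0, 1] (cheaper and faster are better)
    and combined with equal weights; cost_cap/time_cap are assumed caps.
    """
    perf = score / 100.0
    cheapness = 1.0 - min(cost / cost_cap, 1.0)
    speed = 1.0 - min(time_s / time_cap, 1.0)
    stab = stability / 100.0
    return 100.0 * (perf + cheapness + speed + stab) / 4.0

# e.g. Gemini 2.5 Flash Lite's row: 100% score, $0.0002, 1.5s, 100% stability
print(composite_score(100.0, 0.0002, 1.5, 100.0))
```

The actual weighting used by the leaderboard may differ; the sketch only illustrates why a cheap, fast, perfectly stable model tops this table.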
| Model | Total ▼ | Create alternate prose sections |
|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% |
| Gemini 3.1 Pro (Preview) | 100% | 100% |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% |
| GPT-5.4 (Reasoning) | 100% | 100% |
| GPT-5 Mini | 100% | 100% |
| Claude Opus 4.6 | 100% | 100% |
| GPT-5 | 100% | 100% |
| Qwen 3.5 397B A17B | 100% | 100% |
| Qwen 3.5 122B | 100% | 100% |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% |
| GPT-5.4 (Reasoning, Low) | 100% | 100% |
| Z.AI GLM 5 | 100% | 100% |
| Claude Sonnet 4.6 | 100% | 100% |
| MoonshotAI: Kimi K2.5 | 100% | 100% |
| Qwen 3.5 27B | 100% | 100% |
Create alternate prose sections
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 3.3s | |
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.5s | |
| Ministral 3B | 80% | $0.0000 | 2.1s | |
| Inception Mercury | 97% | $0.0003 | 1.3s | |
| Inception Mercury 2 | 97% | $0.0004 | 1.1s | |
| Mistral NeMo | 97% | $0.0001 | 3.2s | |
| Ministral 8B | 80% | $0.0001 | 6.9s | |
| Ministral 3 3B | 90% | $0.0001 | 3.6s | |
| Mistral Small 3.2 24B | 100% | $0.0001 | 4.9s | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0008 | 2.6s | |
| Llama 3.1 8B | 60% | $0.0001 | 4.1s | |
| GPT-4.1 Nano | 90% | $0.0001 | 4.0s | |
| Gemini 2.5 Flash | 100% | $0.0010 | 2.4s | |
| GPT-4.1 Mini | 97% | $0.0006 | 4.3s | |
| Claude 3 Haiku | 100% | $0.0007 | 4.2s | |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 5.5s | |
| Arcee AI: Trinity Mini | 97% | $0.0002 | 6.4s | |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 8.0s | |
| Ministral 3 8B | 100% | $0.0002 | 6.6s | |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.7s | |
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% | |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% | |
| GPT-5 Mini | 100% | 100% | 100% | |
| Claude Opus 4.6 | 100% | 100% | 100% | |
| GPT-5 | 100% | 100% | 100% | |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | |
| Qwen 3.5 122B | 100% | 100% | 100% | |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% | |
| Z.AI GLM 5 | 100% | 100% | 100% | |
| Claude Sonnet 4.6 | 100% | 100% | 100% | |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | |
| Qwen 3.5 27B | 100% | 100% | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | 100% | 100% | |
| Gemini 3 Flash (Preview, Reasoning) | 100% | 100% | 100% | |
| o4 Mini High | 100% | 100% | 100% | |
| GPT-5.2 | 100% | 100% | 100% | |
| Claude Opus 4.5 | 100% | 100% | 100% | |
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.5s | 100% | |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0008 | 2.6s | 100% | |
| Gemini 2.5 Flash | 100% | $0.0010 | 2.4s | 100% | |
| Stealth: Aurora Alpha | 100% | — | 3.3s | 100% | |
| Mistral Small 3.2 24B | 100% | $0.0001 | 4.9s | 100% | |
| Claude 3 Haiku | 100% | $0.0007 | 4.2s | 100% | |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 5.5s | 100% | |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0017 | 3.7s | 100% | |
| Grok 4.20 (Beta) | 100% | $0.0026 | 2.5s | 100% | |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.7s | 100% | |
| GPT-5.4 Mini | 100% | $0.0024 | 3.2s | 100% | |
| Ministral 3 8B | 100% | $0.0002 | 6.6s | 100% | |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0007 | 6.0s | 100% | |
| GPT-5.4 Mini (Reasoning, Low) | 100% | $0.0025 | 3.2s | 100% | |
| GPT-5.4 Nano | 100% | $0.0014 | 5.1s | 100% | |
| Grok 4 Fast | 100% | $0.0004 | 6.6s | 100% | |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0013 | 5.5s | 100% | |
| Gemma 3 4B | 100% | $0.0001 | 8.0s | 100% | |
| GPT-5.4 Mini (Reasoning) | 100% | $0.0029 | 3.8s | 100% | |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 8.0s | 100% | |
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Match blue prose section | ||
| 100.0% | Match green prose section | ||
| 100.0% | Match red prose section |