Tool usage within Novelcrafter
Output messages related to tool usage within Novelcrafter.
Performance Score Distribution (Top 20)
Price-Performance Score Distribution (Top 20)
| Model | Score | Cost | Time |
|---|---|---|---|
| Stealth: Aurora Alpha | 100% | — | 3.3s |
| Ministral 3B | 80% | $0.0000 | 2.1s |
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.5s |
| Inception Mercury | 97% | $0.0003 | 1.3s |
| Inception Mercury 2 | 97% | $0.0004 | 1.1s |
| Mistral NeMo | 97% | $0.0001 | 3.2s |
| Ministral 8B | 80% | $0.0001 | 6.9s |
| Ministral 3 3B | 90% | $0.0001 | 3.6s |
| Mistral Small 3.2 24B | 100% | $0.0001 | 4.9s |
| Llama 3.1 8B | 60% | $0.0001 | 4.1s |
| GPT-4.1 Nano | 90% | $0.0001 | 4.0s |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0008 | 2.6s |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0007 | 5.1s |
| Gemini 3.1 Flash Lite | 100% | $0.0007 | 5.6s |
| Gemini 2.5 Flash | 100% | $0.0010 | 2.4s |
| GPT-4.1 Mini | 97% | $0.0006 | 4.3s |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 5.5s |
| Arcee AI: Trinity Mini | 97% | $0.0002 | 6.4s |
| Claude 3 Haiku | 100% | $0.0007 | 4.2s |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 8.0s |
Cost vs Performance
Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.
22 low-scoring outliers hidden: Z.AI GLM 5 Turbo (90.0%), Qwen 3.5 35B (90.0%), Z.AI GLM 4.7 Flash (90.0%), Nemotron 3 Super (90.0%), DeepSeek V3 (2025-03-24) (90.0%), Grok 4.20 (90.0%), Grok 4.3 (90.0%), Ministral 3 14B (90.0%), GPT-4.1 Nano (90.0%), Cohere Command R+ (Aug. 2024) (90.0%), Ministral 3 3B (90.0%), Llama 3.1 70B (86.7%), Qwen 3.5 Flash (80.0%), Mistral Small Creative (80.0%), Ministral 8B (80.0%), Ministral 3B (80.0%), Nemotron 3 Nano (73.3%), Qwen 3.6 35B (66.7%), Llama 3.1 8B (60.0%), Rocinante 12B (20.0%), ByteDance Seed 1.6 Flash (0.0%), LFM2 24B (0.0%).
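The quadrant placement described for this chart can be sketched in a few lines. This is a minimal illustration, not the site's actual implementation: the model list below is a small illustrative subset of the table above, and the quadrant lines are drawn at the median cost and median score.

```python
from statistics import median

# Illustrative (model, cost_usd, score) tuples; a subset, not the full data.
models = [
    ("Gemini 2.5 Flash Lite", 0.0002, 1.00),
    ("Ministral 3B", 0.0000, 0.80),
    ("Gemini 2.5 Flash", 0.0010, 1.00),
    ("Llama 3.1 8B", 0.0001, 0.60),
]

# Quadrant lines sit at the median of each axis.
cost_median = median(c for _, c, _ in models)
score_median = median(s for _, _, s in models)

def quadrant(cost: float, score: float) -> str:
    """Classify a model relative to the median cost/score quadrant lines."""
    cheap = cost <= cost_median
    strong = score >= score_median
    if cheap and strong:
        return "cheap & strong"      # the quadrant you want to be in
    if cheap:
        return "cheap & weak"
    if strong:
        return "expensive & strong"
    return "expensive & weak"

for name, cost, score in models:
    print(f"{name}: {quadrant(cost, score)}")
```

Ties on the line itself are counted toward the favourable side here; the chart may break ties differently.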
Most Stable Models (Top 20)
Ranked by stability (median × consistency).
| Model | Score | Consistency | Stability |
|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% |
| Qwen3.6 Max Preview | 100% | 100% | 100% |
| Gemini 3.1 Pro (Preview) | 100% | 100% | 100% |
| Z.AI GLM 5.1 | 100% | 100% | 100% |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% |
| Grok 4.3 (Reasoning) | 100% | 100% | 100% |
| GPT-5.4 (Reasoning) | 100% | 100% | 100% |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% |
| GPT-5 Mini | 100% | 100% | 100% |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% |
| Claude Opus 4.6 | 100% | 100% | 100% |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% |
| GPT-5 | 100% | 100% | 100% |
| Qwen 3.5 397B A17B | 100% | 100% | 100% |
| Gemma 4 31B (Reasoning) | 100% | 100% | 100% |
| Qwen 3.5 122B | 100% | 100% | 100% |
| Gemma 4 26B (Reasoning) | 100% | 100% | 100% |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% |
| GPT-5.4 (Reasoning, Low) | 100% | 100% | 100% |
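The stability metric named above (median × consistency) reduces to a simple product. A minimal sketch, assuming both inputs are fractions in [0, 1]:

```python
def stability(median_score: float, consistency: float) -> float:
    """Stability as the product of a model's median run score and its
    run-to-run consistency, both expressed as fractions in [0, 1]."""
    return median_score * consistency

# A model at 100% median score with 100% consistency gets 100% stability,
# which is why the top of the table above is a wall of 100% entries.
print(f"{stability(1.0, 1.0):.0%}")
print(f"{stability(0.9, 0.8):.0%}")
```

The product punishes both weak median performance and erratic runs: a model that sometimes aces the test but varies a lot ranks below a steadily good one.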
Top Overall Models (Top 20)
Ranked by composite score (performance, cost, speed & stability).
| Model | Score | Cost | Speed | Stability |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | 100% | $0.0002 | 1.5s | 100% |
| Stealth: Aurora Alpha | 100% | — | 3.3s | 100% |
| Gemini 3.1 Flash Lite (Preview) | 100% | $0.0008 | 2.6s | 100% |
| Gemini 2.5 Flash | 100% | $0.0010 | 2.4s | 100% |
| Mistral Small 3.2 24B | 100% | $0.0001 | 4.9s | 100% |
| Claude 3 Haiku | 100% | $0.0007 | 4.2s | 100% |
| GPT-4o Mini (temp=0) | 100% | $0.0003 | 5.5s | 100% |
| Stealth: Healer Alpha | 100% | $0.0000 | 6.7s | 100% |
| Gemini 3.1 Flash Lite (Reasoning) | 100% | $0.0007 | 5.1s | 100% |
| Ministral 3 8B | 100% | $0.0002 | 6.6s | 100% |
| Gemini 3.1 Flash Lite | 100% | $0.0007 | 5.6s | 100% |
| Grok 4 Fast | 100% | $0.0004 | 6.6s | 100% |
| Gemini 2.5 Flash Lite (Reasoning) | 100% | $0.0007 | 6.0s | 100% |
| Gemini 2.5 Flash (Reasoning) | 100% | $0.0017 | 3.7s | 100% |
| GPT-5.4 Nano | 100% | $0.0014 | 5.1s | 100% |
| Gemma 3 4B | 100% | $0.0001 | 8.0s | 100% |
| Grok 4.20 (Beta) | 100% | $0.0026 | 2.5s | 100% |
| GPT-5.4 Nano (Reasoning) | 100% | $0.0013 | 5.5s | 100% |
| GPT-5.4 Mini | 100% | $0.0024 | 3.2s | 100% |
| GPT-4o Mini (temp=1) | 100% | $0.0003 | 8.0s | 100% |
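The exact weighting behind the composite score is not stated on this page. The sketch below assumes a hypothetical equal-weight blend, with made-up normalization caps (`max_cost`, `max_seconds`) purely to illustrate how a score, a cost, a latency, and a stability figure could be folded into one ranking number in which lower cost and lower latency help.

```python
def composite(score: float, cost_usd: float, seconds: float,
              stability: float, max_cost: float = 0.01,
              max_seconds: float = 10.0) -> float:
    """Hypothetical equal-weight composite of performance, cost, speed,
    and stability. Cost and time are inverted and clamped so that
    cheaper/faster runs contribute larger terms; all terms lie in [0, 1]."""
    cost_term = 1.0 - min(cost_usd / max_cost, 1.0)
    speed_term = 1.0 - min(seconds / max_seconds, 1.0)
    return (score + cost_term + speed_term + stability) / 4.0

# Illustrative: the top table row, scored under these assumed caps.
print(composite(1.0, 0.0002, 1.5, 1.0))
```

With equal weights, two models tied at 100% score and 100% stability are separated purely by cost and latency, which matches the ordering pattern visible in the table above; the real leaderboard may weight the terms differently.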
| Model | Total ▼ | Create alternate prose sections |
|---|---|---|
| Claude Opus 4.6 (Reasoning) | 100% | 100% |
| Qwen3.6 Max Preview | 100% | 100% |
| Gemini 3.1 Pro (Preview) | 100% | 100% |
| Z.AI GLM 5.1 | 100% | 100% |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% |
| Grok 4.3 (Reasoning) | 100% | 100% |
| GPT-5.4 (Reasoning) | 100% | 100% |
| Claude Opus 4.7 (Reasoning) | 100% | 100% |
| GPT-5.5 (Reasoning) | 100% | 100% |
| GPT-5 Mini | 100% | 100% |
| GPT-5.5 (Reasoning, Low) | 100% | 100% |
| Claude Opus 4.6 | 100% | 100% |
| MoonshotAI: Kimi K2.6 | 100% | 100% |
| GPT-5 | 100% | 100% |
| Qwen 3.5 397B A17B | 100% | 100% |
Create alternate prose sections
| Median | Evaluator | Top 3 | Flop 3 |
|---|---|---|---|
| 100.0% | Match blue prose section | | |
| 100.0% | Match green prose section | | |
| 100.0% | Match red prose section | | |