Tool usage within Novelcrafter

Output messages related to tool usage within Novelcrafter.

Price-Performance Score Distribution (Top 20)

Model                              Score  Cost     Time
Stealth: Aurora Alpha              100%   n/a      3.3s
Ministral 3B                       80%    $0.0000  2.1s
Gemini 2.5 Flash Lite              100%   $0.0002  1.5s
Inception Mercury                  97%    $0.0003  1.3s
Inception Mercury 2                97%    $0.0004  1.1s
Mistral NeMO                       97%    $0.0001  3.2s
Ministral 8B                       80%    $0.0001  6.9s
Ministral 3 3B                     90%    $0.0001  3.6s
Mistral Small 3.2 24B              100%   $0.0001  4.9s
Llama 3.1 8B                       60%    $0.0001  4.1s
GPT-4.1 Nano                       90%    $0.0001  4.0s
Gemini 3.1 Flash Lite (Preview)    100%   $0.0008  2.6s
Gemini 3.1 Flash Lite (Reasoning)  100%   $0.0007  5.1s
Gemini 3.1 Flash Lite              100%   $0.0007  5.6s
Gemini 2.5 Flash                   100%   $0.0010  2.4s
GPT-4.1 Mini                       97%    $0.0006  4.3s
GPT-4o Mini (temp=0)               100%   $0.0003  5.5s
Arcee AI: Trinity Mini             97%    $0.0002  6.4s
Claude 3 Haiku                     100%   $0.0007  4.2s
GPT-4o Mini (temp=1)               100%   $0.0003  8.0s

Cost vs Performance

Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.

22 low-scoring outliers hidden: Z.AI GLM 5 Turbo (90.0%), Qwen 3.5 35B (90.0%), Z.AI GLM 4.7 Flash (90.0%), Nemotron 3 Super (90.0%), DeepSeek V3 (2025-03-24) (90.0%), Grok 4.20 (90.0%), Grok 4.3 (90.0%), Ministral 3 14B (90.0%), GPT-4.1 Nano (90.0%), Cohere Command R+ (Aug. 2024) (90.0%), Ministral 3 3B (90.0%), Llama 3.1 70B (86.7%), Qwen 3.5 Flash (80.0%), Mistral Small Creative (80.0%), Ministral 8B (80.0%), Ministral 3B (80.0%), Nemotron 3 Nano (73.3%), Qwen 3.6 35B (66.7%), Llama 3.1 8B (60.0%), Rocinante 12B (20.0%), ByteDance Seed 1.6 Flash (0.0%), LFM2 24B (0.0%).
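The quadrant split described for this chart can be sketched in a few lines of Python. This is only an illustration: the five rows are copied from the Top 20 table above, so the medians here are for that small sample rather than for every model on the page, and the function name `quadrants`, the labels, and the tie handling (a value equal to the median counts as "cheap" or "high") are assumptions, not the site's actual logic.

```python
from statistics import median

# (model, cost in USD, score in %) copied from the Top 20 table above.
rows = [
    ("Gemini 2.5 Flash Lite", 0.0002, 100.0),
    ("Inception Mercury", 0.0003, 97.0),
    ("Gemini 2.5 Flash", 0.0010, 100.0),
    ("Llama 3.1 8B", 0.0001, 60.0),
    ("Claude 3 Haiku", 0.0007, 100.0),
]


def quadrants(rows):
    """Assign each model to a quadrant around the median cost and median score."""
    cost_mid = median(cost for _, cost, _ in rows)
    score_mid = median(score for _, _, score in rows)
    result = {}
    for name, cost, score in rows:
        # Assumption: ties with the median fall on the favourable side.
        cost_side = "cheap" if cost <= cost_mid else "expensive"
        score_side = "high" if score >= score_mid else "low"
        result[name] = f"{cost_side}/{score_side}"
    return result


for name, quadrant in quadrants(rows).items():
    print(f"{name}: {quadrant}")
```

Models landing in the cheap/high quadrant are the ones the chart is meant to surface; with the full data set the median lines would sit elsewhere.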

Most Stable Models (Top 20)

Ranked by stability (median × consistency).

Model                           Score  Consistency  Stability
Claude Opus 4.6 (Reasoning)     100%   100%         100%
Qwen3.6 Max Preview             100%   100%         100%
Gemini 3.1 Pro (Preview)        100%   100%         100%
Z.AI GLM 5.1                    100%   100%         100%
Claude Sonnet 4.6 (Reasoning)   100%   100%         100%
Grok 4.3 (Reasoning)            100%   100%         100%
GPT-5.4 (Reasoning)             100%   100%         100%
Claude Opus 4.7 (Reasoning)     100%   100%         100%
GPT-5.5 (Reasoning)             100%   100%         100%
GPT-5 Mini                      100%   100%         100%
GPT-5.5 (Reasoning, Low)        100%   100%         100%
Claude Opus 4.6                 100%   100%         100%
MoonshotAI: Kimi K2.6           100%   100%         100%
GPT-5                           100%   100%         100%
Qwen 3.5 397B A17B              100%   100%         100%
Gemma 4 31B (Reasoning)         100%   100%         100%
Qwen 3.5 122B                   100%   100%         100%
Gemma 4 26B (Reasoning)         100%   100%         100%
Grok 4.20 (Beta, Reasoning)     100%   100%         100%
GPT-5.4 (Reasoning, Low)        100%   100%         100%
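As a rough illustration of the "median × consistency" ranking formula, the Python sketch below multiplies the median of a model's per-run scores by a consistency value. The page does not say how consistency is derived or which scores feed the median, so the function name `stability`, the per-run input, and the 0 to 1 scale are all assumptions made for this example.

```python
from statistics import median


def stability(run_scores, consistency):
    """Stability as median score times consistency.

    run_scores: hypothetical per-run scores on a 0..1 scale.
    consistency: a value on a 0..1 scale; how the site computes it is
    not stated, so it is passed in rather than derived here.
    """
    return median(run_scores) * consistency


# A model that scores 100% on every run with full consistency.
print(stability([1.0, 1.0, 1.0], 1.0))

# A hypothetical model with a lower median and lower consistency.
print(stability([0.8, 1.0, 0.9], 0.5))
```

Because the two factors are multiplied, a model reaches 100% stability only when both its median score and its consistency are 100%, which matches every row in the table above.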

Top Overall Models (Top 20)

Ranked by composite score (performance, cost, speed & stability).

Model                              Score  Cost     Speed  Stability
Gemini 2.5 Flash Lite              100%   $0.0002  1.5s   100%
Stealth: Aurora Alpha              100%   n/a      3.3s   100%
Gemini 3.1 Flash Lite (Preview)    100%   $0.0008  2.6s   100%
Gemini 2.5 Flash                   100%   $0.0010  2.4s   100%
Mistral Small 3.2 24B              100%   $0.0001  4.9s   100%
Claude 3 Haiku                     100%   $0.0007  4.2s   100%
GPT-4o Mini (temp=0)               100%   $0.0003  5.5s   100%
Stealth: Healer Alpha              100%   $0.0000  6.7s   100%
Gemini 3.1 Flash Lite (Reasoning)  100%   $0.0007  5.1s   100%
Ministral 3 8B                     100%   $0.0002  6.6s   100%
Gemini 3.1 Flash Lite              100%   $0.0007  5.6s   100%
Grok 4 Fast                        100%   $0.0004  6.6s   100%
Gemini 2.5 Flash Lite (Reasoning)  100%   $0.0007  6.0s   100%
Gemini 2.5 Flash (Reasoning)       100%   $0.0017  3.7s   100%
GPT-5.4 Nano                       100%   $0.0014  5.1s   100%
Gemma 3 4B                         100%   $0.0001  8.0s   100%
Grok 4.20 (Beta)                   100%   $0.0026  2.5s   100%
GPT-5.4 Nano (Reasoning)           100%   $0.0013  5.5s   100%
GPT-5.4 Mini                       100%   $0.0024  3.2s   100%
GPT-4o Mini (temp=1)               100%   $0.0003  8.0s   100%

Create alternate prose sections