Tool usage within Novelcrafter

Output messages that are related to tool usage within Novelcrafter

Price-Performance Score Distribution (Top 20)

Model                             Score  Cost     Time
Stealth: Aurora Alpha             100%   n/a      3.3s
Gemini 2.5 Flash Lite             100%   $0.0002  1.5s
Ministral 3B                      80%    $0.0000  2.1s
Inception Mercury                 97%    $0.0003  1.3s
Inception Mercury 2               97%    $0.0004  1.1s
Mistral NeMO                      97%    $0.0001  3.2s
Ministral 8B                      80%    $0.0001  6.9s
Ministral 3 3B                    90%    $0.0001  3.6s
Mistral Small 3.2 24B             100%   $0.0001  4.9s
Gemini 3.1 Flash Lite (Preview)   100%   $0.0008  2.6s
Llama 3.1 8B                      60%    $0.0001  4.1s
GPT-4.1 Nano                      90%    $0.0001  4.0s
Gemini 2.5 Flash                  100%   $0.0010  2.4s
GPT-4.1 Mini                      97%    $0.0006  4.3s
Claude 3 Haiku                    100%   $0.0007  4.2s
GPT-4o Mini (temp=0)              100%   $0.0003  5.5s
Arcee AI: Trinity Mini            97%    $0.0002  6.4s
GPT-4o Mini (temp=1)              100%   $0.0003  8.0s
Ministral 3 8B                    100%   $0.0002  6.6s
Stealth: Healer Alpha             100%   $0.0000  6.7s

Cost vs Performance

Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.

19 low-scoring outliers hidden: Z.AI GLM 5 Turbo (90.0%), Qwen 3.5 35B (90.0%), Z.AI GLM 4.7 Flash (90.0%), Nemotron 3 Super (90.0%), DeepSeek V3 (2025-03-24) (90.0%), Ministral 3 14B (90.0%), GPT-4.1 Nano (90.0%), Cohere Command R+ (Aug. 2024) (90.0%), Ministral 3 3B (90.0%), Llama 3.1 70B (86.7%), Qwen 3.5 Flash (80.0%), Mistral Small Creative (80.0%), Ministral 8B (80.0%), Ministral 3B (80.0%), Nemotron 3 Nano (73.3%), Llama 3.1 8B (60.0%), Rocinante 12B (20.0%), ByteDance Seed 1.6 Flash (0.0%), LFM2 24B (0.0%).
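
The quadrant rule described above can be sketched in a few lines of Python. This is only an illustration, not the site's actual code: the function name `assign_quadrants`, the quadrant labels, and the handling of values that sit exactly on a median line are assumptions. The sample rows are taken from the Top 20 table, and models without cost data are skipped, as the chart description states.

```python
from statistics import median

# (model, total cost in USD or None when unavailable, score in %)
# Values copied from the Top 20 table; None marks missing cost data.
results = [
    ("Stealth: Aurora Alpha", None, 100),
    ("Gemini 2.5 Flash Lite", 0.0002, 100),
    ("Ministral 3B", 0.0000, 80),
    ("Gemini 2.5 Flash", 0.0010, 100),
    ("Llama 3.1 8B", 0.0001, 60),
    ("Claude 3 Haiku", 0.0007, 100),
]


def assign_quadrants(rows):
    """Place each priced model in a quadrant split at the median cost and score."""
    # Only models with available cost data take part in the chart.
    priced = [row for row in rows if row[1] is not None]
    cost_line = median(cost for _, cost, _ in priced)
    score_line = median(score for _, _, score in priced)

    quadrants = {}
    for name, cost, score in priced:
        # Assumption: ties on a median line count as "low cost" / "high score".
        cost_side = "low cost" if cost <= cost_line else "high cost"
        score_side = "high score" if score >= score_line else "low score"
        quadrants[name] = f"{cost_side}, {score_side}"
    return cost_line, score_line, quadrants


cost_line, score_line, quadrants = assign_quadrants(results)
for name, label in quadrants.items():
    print(f"{name}: {label}")
```

With many models scoring 100%, the median score line can coincide with the top of the scale, so the tie rule decides where most models land.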

Most Stable Models (Top 20)

Ranked by stability (median × consistency).

Model                                Score  Consistency  Stability
Claude Opus 4.6 (Reasoning)          100%   100%         100%
Gemini 3.1 Pro (Preview)             100%   100%         100%
Claude Sonnet 4.6 (Reasoning)        100%   100%         100%
GPT-5.4 (Reasoning)                  100%   100%         100%
GPT-5 Mini                           100%   100%         100%
Claude Opus 4.6                      100%   100%         100%
GPT-5                                100%   100%         100%
Qwen 3.5 397B A17B                   100%   100%         100%
Qwen 3.5 122B                        100%   100%         100%
Grok 4.20 (Beta, Reasoning)          100%   100%         100%
GPT-5.4 (Reasoning, Low)             100%   100%         100%
Z.AI GLM 5                           100%   100%         100%
Claude Sonnet 4.6                    100%   100%         100%
MoonshotAI: Kimi K2.5                100%   100%         100%
Qwen 3.5 27B                         100%   100%         100%
GPT-5.4 Mini (Reasoning)             100%   100%         100%
Gemini 3 Flash (Preview, Reasoning)  100%   100%         100%
o4 Mini High                         100%   100%         100%
GPT-5.2                              100%   100%         100%
Claude Opus 4.5                      100%   100%         100%
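
The ranking rule given for this table, stability = median × consistency, can be sketched as follows. The page does not say how consistency itself is computed, so this sketch takes it as an input. The function name `stability` and the two sample models are hypothetical and exist only for illustration.

```python
from statistics import median


def stability(run_scores, consistency):
    """Return median score times consistency, both as fractions in [0, 1]."""
    # Assumption: consistency is supplied by the caller, since the page
    # does not define how it is derived.
    return median(run_scores) * consistency


# Hypothetical models: (name, scores across repeated runs, consistency)
models = [
    ("Model B", [0.9, 1.0, 0.8], 0.85),
    ("Model A", [1.0, 1.0, 1.0], 1.0),
]

# Rank from most to least stable.
ranked = sorted(models, key=lambda m: stability(m[1], m[2]), reverse=True)
for name, scores, consistency in ranked:
    print(f"{name}: {stability(scores, consistency):.1%}")
```

A model that always scores 100% with perfect consistency reaches a stability of 100%, which matches every row in the table above.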

Top Overall Models (Top 20)

Ranked by composite score (performance, cost, speed & stability).

Model                              Score  Cost     Speed  Stability
Gemini 2.5 Flash Lite              100%   $0.0002  1.5s   100%
Gemini 3.1 Flash Lite (Preview)    100%   $0.0008  2.6s   100%
Gemini 2.5 Flash                   100%   $0.0010  2.4s   100%
Stealth: Aurora Alpha              100%   n/a      3.3s   100%
Mistral Small 3.2 24B              100%   $0.0001  4.9s   100%
Claude 3 Haiku                     100%   $0.0007  4.2s   100%
GPT-4o Mini (temp=0)               100%   $0.0003  5.5s   100%
Gemini 2.5 Flash (Reasoning)       100%   $0.0017  3.7s   100%
Grok 4.20 (Beta)                   100%   $0.0026  2.5s   100%
Stealth: Healer Alpha              100%   $0.0000  6.7s   100%
GPT-5.4 Mini                       100%   $0.0024  3.2s   100%
Ministral 3 8B                     100%   $0.0002  6.6s   100%
Gemini 2.5 Flash Lite (Reasoning)  100%   $0.0007  6.0s   100%
GPT-5.4 Mini (Reasoning, Low)      100%   $0.0025  3.2s   100%
GPT-5.4 Nano                       100%   $0.0014  5.1s   100%
Grok 4 Fast                        100%   $0.0004  6.6s   100%
GPT-5.4 Nano (Reasoning)           100%   $0.0013  5.5s   100%
Gemma 3 4B                         100%   $0.0001  8.0s   100%
GPT-5.4 Mini (Reasoning)           100%   $0.0029  3.8s   100%
GPT-4o Mini (temp=1)               100%   $0.0003  8.0s   100%

Create alternate prose sections