Match red prose section

Test: Tool usage within Novelcrafter

Avg. Score: 96.3%
Scenarios: 1

Overall Performance

| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash Lite | 100.0% | $0.0002 | 1.5s | 100% |
| 2 | Inception Mercury | 100.0% | $0.0003 | 1.3s | 100% |
| 3 | Mistral NeMO | 100.0% | $0.0001 | 3.2s | 100% |
| 4 | Stealth: Aurora Alpha | 100.0% |  | 3.3s | 100% |
| 5 | Gemini 3.1 Flash Lite (Preview) | 100.0% | $0.0008 | 2.6s | 100% |
| 6 | GPT-4.1 Nano | 100.0% | $0.0001 | 4.0s | 100% |
| 7 | Gemini 2.5 Flash | 100.0% | $0.0010 | 2.4s | 100% |
| 8 | Mistral Small 3.2 24B | 100.0% | $0.0001 | 4.9s | 100% |
| 9 | GPT-4.1 Mini | 100.0% | $0.0006 | 4.3s | 100% |
| 10 | Claude 3 Haiku | 100.0% | $0.0007 | 4.2s | 100% |
| 11 | GPT-4o Mini (temp=0) | 100.0% | $0.0003 | 5.5s | 100% |
| 12 | Stealth: Healer Alpha | 100.0% | $0.0000 | 6.7s | 100% |
| 13 | Gemini 3.1 Flash Lite (Reasoning) | 100.0% | $0.0007 | 5.1s | 100% |
| 14 | Ministral 3 8B | 100.0% | $0.0002 | 6.6s | 100% |
| 15 | Mistral Small 4 | 100.0% | $0.0004 | 6.2s | 100% |
| 16 | Gemini 3.1 Flash Lite | 100.0% | $0.0007 | 5.6s | 100% |
| 17 | Grok 4 Fast | 100.0% | $0.0004 | 6.6s | 100% |
| 18 | Gemini 2.5 Flash Lite (Reasoning) | 100.0% | $0.0007 | 6.0s | 100% |
| 19 | Gemini 2.5 Flash (Reasoning) | 100.0% | $0.0017 | 3.7s | 100% |
| 20 | DeepSeek V4 Flash (Reasoning) | 100.0% | $0.0001 | 7.5s | 100% |
| 21 | Gemini 3 Flash (Preview) | 100.0% | $0.0015 | 4.6s | 100% |
| 22 | GPT-5.4 Nano | 100.0% | $0.0014 | 5.1s | 100% |
| 23 | Gemma 3 4B | 100.0% | $0.0001 | 8.0s | 100% |
| 24 | Grok 4.20 (Beta) | 100.0% | $0.0026 | 2.5s | 100% |
| 25 | GPT-5.4 Nano (Reasoning) | 100.0% | $0.0013 | 5.5s | 100% |
| 26 | GPT-5.4 Mini | 100.0% | $0.0024 | 3.2s | 100% |
| 27 | GPT-4o Mini (temp=1) | 100.0% | $0.0003 | 8.0s | 100% |
| 28 | GPT-5.4 Mini (Reasoning, Low) | 100.0% | $0.0025 | 3.2s | 100% |
| 29 | Mistral Small 4 (Reasoning) | 100.0% | $0.0006 | 7.7s | 100% |
| 30 | Grok 4.1 Fast | 100.0% | $0.0005 | 8.4s | 100% |
| 31 | GPT-5.4 Mini (Reasoning) | 100.0% | $0.0029 | 3.8s | 100% |
| 32 | Nemotron 3 Super | 100.0% | $0.0000 | 10.3s | 100% |
| 33 | Arcee AI: Trinity Large (Preview) | 100.0% | $0.0000 | 10.9s | 100% |
| 34 | Qwen 2.5 72B | 100.0% | $0.0003 | 10.8s | 100% |
| 35 | Xiaomi MIMO v2.5 | 100.0% | $0.0014 | 8.4s | 100% |
| 36 | Mistral Large | 100.0% | $0.0026 | 7.0s | 100% |
| 37 | Claude Haiku 4.5 | 100.0% | $0.0031 | 5.9s | 100% |
| 38 | Gemma 4 26B | 100.0% | $0.0002 | 12.9s | 100% |
| 39 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0004 | 12.7s | 100% |
| 40 | Qwen3 235B A22B Instruct 2507 | 100.0% | $0.0003 | 13.2s | 100% |
| 41 | GPT-OSS 120B | 100.0% | $0.0003 | 13.4s | 100% |
| 42 | GPT-4.1 | 100.0% | $0.0039 | 6.3s | 100% |
| 43 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.0033 | 7.8s | 100% |
| 44 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.0046 | 5.3s | 100% |
| 45 | GPT-4o, Aug. 6th (temp=1) | 100.0% | $0.0048 | 4.8s | 100% |
| 46 | Gemini 3.5 Flash (Reasoning, Minimal) | 100.0% | $0.0052 | 3.9s | 100% |
| 47 | WizardLM 2 8x22b | 100.0% | $0.0008 | 13.8s | 100% |
| 48 | Gemma 3 12B | 100.0% | $0.0002 | 15.4s | 100% |
| 49 | DeepSeek-V2 Chat | 100.0% | $0.0001 | 15.5s | 100% |
| 50 | GPT-5.4 Nano (Reasoning, Low) | 100.0% | $0.0014 | 12.8s | 100% |
| 51 | Hermes 3 70B | 100.0% | $0.0003 | 15.3s | 100% |
| 52 | Llama 3.1 Nemotron 70B | 100.0% | $0.0003 | 15.6s | 100% |
| 53 | GPT-5 Mini | 100.0% | $0.0023 | 12.2s | 100% |
| 54 | Stealth: Hunter Alpha | 100.0% | $0.0000 | 17.4s | 100% |
| 55 | Mistral Medium 3.1 | 100.0% | $0.0014 | 14.3s | 100% |
| 56 | Z.AI GLM 4.5 | 100.0% | $0.0015 | 14.7s | 100% |
| 57 | Qwen 3 32B | 100.0% | $0.0005 | 17.2s | 100% |
| 58 | Mistral Large 2 | 100.0% | $0.0029 | 11.7s | 100% |
| 59 | MiniMax M2.7 | 100.0% | $0.0012 | 15.6s | 100% |
| 60 | Mistral Large 3 | 100.0% | $0.0012 | 16.3s | 100% |
| 61 | Z.AI GLM 4.5 Air | 100.0% | $0.0008 | 17.5s | 100% |
| 62 | Writer: Palmyra X5 | 100.0% | $0.0037 | 13.0s | 100% |
| 63 | Qwen 3.6 Flash | 100.0% | $0.0035 | 14.0s | 100% |
| 64 | Xiaomi MIMO v2.5 Pro | 100.0% | $0.0025 | 16.3s | 100% |
| 65 | DeepSeek V4 Flash | 100.0% | $0.0002 | 23.3s | 100% |
| 66 | DeepSeek V3.1 | 100.0% | $0.0005 | 22.7s | 100% |
| 67 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0014 | 21.3s | 100% |
| 68 | DeepSeek V4 Pro | 100.0% | $0.0012 | 22.0s | 100% |
| 69 | o4 Mini | 100.0% | $0.0053 | 13.5s | 100% |
| 70 | Gemma 3 27B | 100.0% | $0.0002 | 25.3s | 100% |
| 71 | MiniMax M2.5 | 100.0% | $0.0012 | 23.7s | 100% |
| 72 | Qwen 3.5 122B | 100.0% | $0.0055 | 14.2s | 100% |
| 73 | Hermes 3 405B | 100.0% | $0.0000 | 26.6s | 100% |
| 74 | Gemma 4 26B (Reasoning) | 100.0% | $0.0004 | 26.0s | 100% |
| 75 | GPT-4o, May 13th (temp=0) | 100.0% | $0.0084 | 8.7s | 100% |
| 76 | GPT-4o, May 13th (temp=1) | 100.0% | $0.0085 | 8.6s | 100% |
| 77 | DeepSeek V4 Pro (Reasoning) | 100.0% | $0.0017 | 23.7s | 100% |
| 78 | o4 Mini High | 100.0% | $0.0059 | 14.9s | 100% |
| 79 | Aion 2.0 | 100.0% | $0.0018 | 23.9s | 100% |
| 80 | Qwen 3.5 27B | 100.0% | $0.0040 | 19.2s | 100% |
| 81 | Gemma 4 31B | 100.0% | $0.0002 | 27.9s | 100% |
| 82 | DeepSeek V3.2 | 100.0% | $0.0003 | 28.8s | 100% |
| 83 | Z.AI GLM 5 | 100.0% | $0.0023 | 25.8s | 100% |
| 84 | Claude Sonnet 4 | 100.0% | $0.0094 | 11.4s | 100% |
| 85 | Claude Sonnet 4.6 | 100.0% | $0.0096 | 11.3s | 100% |
| 86 | Qwen 3.5 9B | 100.0% | $0.0004 | 33.1s | 100% |
| 87 | Z.AI GLM 5.1 | 100.0% | $0.0037 | 26.7s | 100% |
| 88 | Claude Sonnet 4.5 | 100.0% | $0.010 | 12.4s | 100% |
| 89 | Gemini 3.5 Flash (Reasoning) | 100.0% | $0.013 | 7.6s | 100% |
| 90 | Claude 3.7 Sonnet | 100.0% | $0.011 | 12.5s | 100% |
| 91 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.011 | 13.1s | 100% |
| 92 | Claude 3.5 Sonnet | 100.0% | $0.0094 | 16.7s | 100% |
| 93 | Gemini 2.5 Pro | 100.0% | $0.012 | 11.2s | 100% |
| 94 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0039 | 31.8s | 100% |
| 95 | Grok 4.20 (Beta, Reasoning) | 100.0% | $0.014 | 11.4s | 100% |
| 96 | Gemma 4 31B (Reasoning) | 100.0% | $0.0004 | 43.6s | 100% |
| 97 | GPT-5 Nano | 100.0% | $0.0016 | 42.4s | 100% |
| 98 | Claude Opus 4.5 | 100.0% | $0.016 | 12.8s | 100% |
| 99 | Z.AI GLM 4.7 | 100.0% | $0.0028 | 41.8s | 100% |
| 100 | Grok 4.20 (Reasoning) | 100.0% | $0.0073 | 32.7s | 100% |
| 101 | Grok 4 | 100.0% | $0.012 | 22.9s | 100% |
| 102 | ByteDance Seed 1.6 | 100.0% | $0.0033 | 43.2s | 100% |
| 103 | Qwen 3.5 Plus (2026-04-20) | 100.0% | $0.0053 | 40.2s | 100% |
| 104 | Gemini 3 Pro (Preview) | 100.0% | $0.017 | 14.8s | 100% |
| 105 | Gemini 3.1 Pro (Preview) | 100.0% | $0.018 | 18.9s | 100% |
| 106 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.016 | 23.6s | 100% |
| 107 | ByteDance Seed 2.0 Lite | 100.0% | $0.0044 | 50.6s | 100% |
| 108 | GPT-5.4 | 100.0% | $0.017 | 26.6s | 100% |
| 109 | Qwen3.7 Max | 100.0% | $0.016 | 32.8s | 100% |
| 110 | Qwen 3.5 397B A17B | 100.0% | $0.0066 | 54.0s | 100% |
| 111 | Inception Mercury 2 | 90.0% | $0.0004 | 1.1s | 40% |
| 112 | Ministral 3 3B | 90.0% | $0.0001 | 3.6s | 40% |
| 113 | ByteDance Seed 2.0 Mini | 100.0% | $0.0011 | 1.1m | 100% |
| 114 | Qwen 3.6 27B | 100.0% | $0.0089 | 51.9s | 100% |
| 115 | GPT-5.4 (Reasoning) | 100.0% | $0.018 | 31.7s | 100% |
| 116 | Arcee AI: Trinity Mini | 90.0% | $0.0002 | 6.4s | 40% |
| 117 | Grok 4.3 (Reasoning) | 100.0% | $0.0094 | 53.3s | 100% |
| 118 | Grok 4.3 | 90.0% | $0.0011 | 5.7s | 40% |
| 119 | Z.AI GLM 4.6 | 100.0% | $0.0038 | 1.1m | 100% |
| 120 | GPT-5.5 | 100.0% | $0.025 | 19.6s | 100% |
| 121 | Llama 3.1 70B | 90.0% | $0.0005 | 8.3s | 40% |
| 122 | GPT-5.5 (Reasoning, Low) | 100.0% | $0.026 | 19.5s | 100% |
| 123 | Ministral 3 14B | 90.0% | $0.0003 | 9.7s | 40% |
| 124 | Grok 4.20 | 90.0% | $0.0015 | 7.8s | 40% |
| 125 | GPT-5.5 (Reasoning) | 100.0% | $0.027 | 20.7s | 100% |
| 126 | Claude Opus 4.6 | 100.0% | $0.026 | 23.1s | 100% |
| 127 | MoonshotAI: Kimi K2.6 | 100.0% | $0.0093 | 59.5s | 100% |
| 128 | DeepSeek V3 (2025-03-24) | 90.0% | $0.0005 | 13.1s | 40% |
| 129 | Z.AI GLM 5 Turbo | 90.0% | $0.0023 | 9.4s | 40% |
| 130 | Claude Opus 4.7 | 100.0% | $0.032 | 15.4s | 100% |
| 131 | GPT-5.1 | 100.0% | $0.020 | 43.1s | 100% |
| 132 | Cohere Command R+ (Aug. 2024) | 90.0% | $0.0051 | 10.5s | 40% |
| 133 | Qwen 3.5 35B | 90.0% | $0.0040 | 14.6s | 40% |
| 134 | Z.AI GLM 4.7 Flash | 90.0% | $0.0005 | 23.1s | 40% |
| 135 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.029 | 27.5s | 100% |
| 136 | GPT-5 | 100.0% | $0.022 | 45.8s | 100% |
| 137 | Claude Opus 4.7 (Reasoning) | 100.0% | $0.036 | 16.6s | 100% |
| 138 | Ministral 3B | 80.0% | $0.0000 | 2.1s | 20% |
| 139 | Ministral 8B | 80.0% | $0.0001 | 6.9s | 20% |
| 140 | Mistral Small Creative | 80.0% | $0.0004 | 6.5s | 20% |
| 141 | Nemotron 3 Nano | 80.0% | $0.0001 | 7.3s | 20% |
| 142 | GPT-5.2 | 100.0% | $0.028 | 50.8s | 100% |
| 143 | Claude Opus 4 | 100.0% | $0.043 | 20.4s | 100% |
| 144 | Qwen 3.5 Flash | 80.0% | $0.0010 | 25.2s | 20% |
| 145 | Llama 3.1 8B | 60.0% | $0.0001 | 4.1s | 2% |
| 146 | Qwen 3.6 35B | 70.0% | $0.0030 | 18.5s | 8% |
| 147 | Qwen3.6 Max Preview | 100.0% | $0.023 | 1.6m | 100% |
| 148 | Rocinante 12B | 30.0% | $0.0002 | 12.4s | 0% |
| 149 | LFM2 24B | 0.0% | $0.0001 | 8.7s | 0% |
| 150 | ByteDance Seed 1.6 Flash | 0.0% | $0.0006 | 13.5s | 0% |
Mean score across all 150 models: 96.27%
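The headline average appears to be an unweighted mean of the 150 per-model scores in the Overall Performance table. The sketch below checks this assumption using a tally of that table (128 models at 100%, 12 at 90%, 5 at 80%, one each at 70%, 60% and 30%, and two at 0%). The `score_counts` tally and the `overall_average` helper are illustrative names, not part of the benchmark.

```python
from typing import Dict

# Assumption: the headline "Avg. Score" is the unweighted mean of every
# model's score. Counts were tallied by hand from the Overall Performance table.
score_counts: Dict[float, int] = {
    100.0: 128,
    90.0: 12,
    80.0: 5,
    70.0: 1,
    60.0: 1,
    30.0: 1,
    0.0: 2,
}


def overall_average(counts: Dict[float, int]) -> float:
    """Return the mean score across all models, given {score: number_of_models}."""
    total_models = sum(counts.values())
    total_score = sum(score * n for score, n in counts.items())
    return total_score / total_models


n_models = sum(score_counts.values())
avg = overall_average(score_counts)
print(n_models)       # 150
print(f"{avg:.2f}%")  # 96.27%
print(f"{avg:.1f}%")  # 96.3%
```

Rounded to one decimal place, this gives the 96.3% shown at the top of the page.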

Individual Scenarios

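| Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Inception Mercury | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Rocinante 12B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0% |
| ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |
| LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |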
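In the Individual Scenarios table, each model's Avg column appears to be the plain mean of its ten scores (#1 to #10), each of which is either 100 or 0 in this data. The sketch below reproduces a few rows under that assumption; the `runs` dictionary and the `row_average` helper are illustrative names, not part of the benchmark.

```python
from typing import Dict, List

# A few rows copied from the Individual Scenarios table (columns #1 .. #10).
runs: Dict[str, List[int]] = {
    "Qwen3.7 Max": [100] * 10,
    "Z.AI GLM 5 Turbo": [100] * 9 + [0],
    "Qwen 3.6 35B": [100] * 7 + [0] * 3,
    "Rocinante 12B": [100] * 3 + [0] * 7,
    "LFM2 24B": [0] * 10,
}


def row_average(scores: List[int]) -> float:
    """Assumed definition of the Avg column: the arithmetic mean of the ten scores."""
    return sum(scores) / len(scores)


for model, scores in runs.items():
    print(f"{model}: {row_average(scores):.1f}%")
```

Each computed value matches the Score listed for the same model in the Overall Performance table (for example, 90.0% for Z.AI GLM 5 Turbo and 30.0% for Rocinante 12B).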