Match blue prose section

Test: Tool usage within Novelcrafter

Avg. Score: 94.0%
Scenarios: 1

Overall Performance

| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash Lite | 100.0% | $0.0002 | 1.5s | 100% |
| 2 | Gemini 2.5 Flash | 100.0% | $0.0010 | 2.4s | 100% |
| 3 | Stealth: Aurora Alpha | 100.0% | | 3.3s | 100% |
| 4 | Mistral Small 3.2 24B | 100.0% | $0.0001 | 4.9s | 100% |
| 5 | Claude 3 Haiku | 100.0% | $0.0007 | 4.2s | 100% |
| 6 | GPT-4o Mini (temp=0) | 100.0% | $0.0003 | 5.5s | 100% |
| 7 | Arcee AI: Trinity Mini | 100.0% | $0.0002 | 6.4s | 100% |
| 8 | Ministral 3 8B | 100.0% | $0.0002 | 6.6s | 100% |
| 9 | Grok 4 Fast | 100.0% | $0.0004 | 6.6s | 100% |
| 10 | Gemma 3 4B | 100.0% | $0.0001 | 8.0s | 100% |
| 11 | GPT-4o Mini (temp=1) | 100.0% | $0.0003 | 8.0s | 100% |
| 12 | Grok 4.1 Fast | 100.0% | $0.0005 | 8.4s | 100% |
| 13 | Claude 3.5 Haiku | 100.0% | $0.0018 | 6.4s | 100% |
| 14 | Claude Haiku 4.5 | 100.0% | $0.0031 | 5.9s | 100% |
| 15 | Mistral Large | 100.0% | $0.0026 | 7.0s | 100% |
| 16 | Qwen 2.5 72B | 100.0% | $0.0003 | 10.8s | 100% |
| 17 | GPT-4o, Aug. 6th (temp=1) | 100.0% | $0.0048 | 4.8s | 100% |
| 18 | GPT-4.1 | 100.0% | $0.0039 | 6.3s | 100% |
| 19 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.0046 | 5.3s | 100% |
| 20 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0004 | 12.7s | 100% |
| 21 | Gemma 3 12B | 100.0% | $0.0002 | 15.4s | 100% |
| 22 | DeepSeek-V2 Chat | 100.0% | $0.0001 | 15.5s | 100% |
| 23 | Hermes 3 70B | 100.0% | $0.0003 | 15.3s | 100% |
| 24 | GPT-5 Mini | 100.0% | $0.0023 | 12.2s | 100% |
| 25 | Llama 3.1 Nemotron 70B | 100.0% | $0.0003 | 15.6s | 100% |
| 26 | Mistral Large 2 | 100.0% | $0.0029 | 11.7s | 100% |
| 27 | Z.AI GLM 4.5 | 100.0% | $0.0015 | 14.7s | 100% |
| 28 | Mistral Large 3 | 100.0% | $0.0012 | 16.3s | 100% |
| 29 | Writer: Palmyra X5 | 100.0% | $0.0037 | 13.0s | 100% |
| 30 | GPT-4o, May 13th (temp=0) | 100.0% | $0.0084 | 8.7s | 100% |
| 31 | GPT-4o, May 13th (temp=1) | 100.0% | $0.0085 | 8.6s | 100% |
| 32 | o4 Mini | 100.0% | $0.0053 | 13.5s | 100% |
| 33 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0014 | 21.3s | 100% |
| 34 | DeepSeek V3.1 | 100.0% | $0.0005 | 22.7s | 100% |
| 35 | o4 Mini High | 100.0% | $0.0059 | 14.9s | 100% |
| 36 | Gemma 3 27B | 100.0% | $0.0002 | 25.3s | 100% |
| 37 | Claude Sonnet 4 | 100.0% | $0.0094 | 11.4s | 100% |
| 38 | Claude Sonnet 4.6 | 100.0% | $0.0096 | 11.3s | 100% |
| 39 | Hermes 3 405B | 100.0% | $0.0000 | 26.6s | 100% |
| 40 | Claude Sonnet 4.5 | 100.0% | $0.010 | 12.4s | 100% |
| 41 | Claude 3.7 Sonnet | 100.0% | $0.011 | 12.5s | 100% |
| 42 | DeepSeek V3.2 | 100.0% | $0.0003 | 28.8s | 100% |
| 43 | Z.AI GLM 5 | 100.0% | $0.0023 | 25.8s | 100% |
| 44 | Gemini 2.5 Pro | 100.0% | $0.012 | 11.2s | 100% |
| 45 | Claude 3.5 Sonnet | 100.0% | $0.0094 | 16.7s | 100% |
| 46 | Claude Opus 4.5 | 100.0% | $0.016 | 12.8s | 100% |
| 47 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0039 | 31.8s | 100% |
| 48 | Gemini 3 Pro (Preview) | 100.0% | $0.017 | 14.8s | 100% |
| 49 | Grok 4 | 100.0% | $0.012 | 22.9s | 100% |
| 50 | GPT-5 Nano | 100.0% | $0.0016 | 42.4s | 100% |
| 51 | Gemini 3.1 Pro (Preview) | 100.0% | $0.018 | 18.9s | 100% |
| 52 | Z.AI GLM 4.7 | 100.0% | $0.0028 | 41.8s | 100% |
| 53 | Mistral NeMO | 90.0% | $0.0001 | 3.2s | 40% |
| 54 | Ministral 3 3B | 90.0% | $0.0001 | 3.6s | 40% |
| 55 | GPT-4.1 Mini | 90.0% | $0.0006 | 4.3s | 40% |
| 56 | Gemini 3 Flash (Preview) | 90.0% | $0.0015 | 4.6s | 40% |
| 57 | Ministral 3 14B | 90.0% | $0.0003 | 9.7s | 40% |
| 58 | DeepSeek V3 (2025-03-24) | 90.0% | $0.0005 | 13.1s | 40% |
| 59 | WizardLM 2 8x22b | 90.0% | $0.0008 | 13.8s | 40% |
| 60 | Claude Opus 4.6 | 100.0% | $0.026 | 23.1s | 100% |
| 61 | Mistral Medium 3.1 | 90.0% | $0.0014 | 14.3s | 40% |
| 62 | Qwen 3.5 397B A17B | 100.0% | $0.0066 | 54.0s | 100% |
| 63 | Cohere Command R+ (Aug. 2024) | 90.0% | $0.0051 | 10.5s | 40% |
| 64 | Ministral 3B | 80.0% | $0.0000 | 2.1s | 20% |
| 65 | Z.AI GLM 4.7 Flash | 90.0% | $0.0005 | 23.1s | 40% |
| 66 | GPT-4.1 Nano | 80.0% | $0.0001 | 4.0s | 20% |
| 67 | Minimax M2.5 | 90.0% | $0.0012 | 23.7s | 40% |
| 68 | Ministral 8B | 80.0% | $0.0001 | 6.9s | 20% |
| 69 | Mistral Small Creative | 80.0% | $0.0004 | 6.5s | 20% |
| 70 | Z.AI GLM 4.6 | 100.0% | $0.0038 | 1.1m | 100% |
| 71 | Llama 3.1 70B | 80.0% | $0.0005 | 8.3s | 20% |
| 72 | Arcee AI: Trinity Large (Preview) | 80.0% | $0.0000 | 10.9s | 20% |
| 73 | GPT-5 | 100.0% | $0.022 | 45.8s | 100% |
| 74 | Claude Opus 4 | 100.0% | $0.043 | 20.4s | 100% |
| 75 | GPT-5.2 | 100.0% | $0.028 | 50.8s | 100% |
| 76 | ByteDance Seed 1.6 | 90.0% | $0.0033 | 43.2s | 40% |
| 77 | Llama 3.1 8B | 60.0% | $0.0001 | 4.1s | 2% |
| 78 | GPT-5.1 | 90.0% | $0.020 | 43.1s | 40% |
| 79 | Rocinante 12B | 10.0% | $0.0002 | 12.4s | 0% |
| 80 | ByteDance Seed 1.6 Flash | 0.0% | $0.0006 | 13.5s | 0% |

Average score: 94.00%

Individual Scenarios

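| Model | # 1 | # 2 | # 3 | # 4 | # 5 | # 6 | # 7 | # 8 | # 9 | # 10 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0% |
| GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0% |
| Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0% |
| Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0% |
| Rocinante 12B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0% |
| ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0% |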
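
The Avg column above appears consistent with a plain arithmetic mean of the ten numbered columns, and the 94.0% headline figure with the mean of the 80 per-model averages. The page does not state how either number is computed, so the following is only a sketch under that assumption; the names `row_average`, `scores_by_model`, and `distribution` are hypothetical, and the values are copied from the table.

```python
from statistics import mean

# Scores for three rows, copied from the table above (columns # 1 to # 10).
scores_by_model = {
    "GPT-5.1": [100] * 9 + [0],
    "Llama 3.1 8B": [100] * 6 + [0] * 4,
    "Rocinante 12B": [100] + [0] * 9,
}


def row_average(scores):
    """Arithmetic mean of one model's ten scores (assumed formula for Avg)."""
    return mean(scores)


for model, scores in scores_by_model.items():
    print(f"{model}: {row_average(scores):.1f}%")

# Number of models at each average, counted from the table: {average: models}.
distribution = {100: 58, 90: 13, 80: 6, 60: 1, 10: 1, 0: 1}

# Mean of the per-model averages, weighted by how many models share each value.
overall = sum(avg * n for avg, n in distribution.items()) / sum(distribution.values())
print(f"overall: {overall:.1f}%")
```

Under this assumption the three rows come out at 90.0%, 60.0%, and 10.0%, and the overall figure at 94.0%, matching the values shown on the page. The Stability column is not derived here because the page gives no definition for it.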