NC Bench
Dialogue tags
Various tasks related to dialogue tags in text.
Write 500 words with 30% dialogue
Tags: 0-shot, Creative writing, Rule following
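The page does not describe how a response is scored. As a rough illustration of what "500 words with 30% dialogue" could mean in practice, the sketch below counts the words in a text and the share of them that fall inside double-quoted spans. `dialogue_share` is a hypothetical helper, and treating quoted words as dialogue is an assumption, not the benchmark's documented method.

```python
import re

# Assumption: dialogue is any text inside straight or curly double quotes.
QUOTED = re.compile(r'"([^"]*)"|“([^”]*)”')


def dialogue_share(text: str) -> tuple[int, float]:
    """Return (total word count, fraction of words inside quoted spans)."""
    total = len(text.split())
    if total == 0:
        return 0, 0.0
    # findall yields one (straight, curly) pair per match; one side is empty.
    quoted_words = sum(
        len((straight or curly).split())
        for straight, curly in QUOTED.findall(text)
    )
    return total, quoted_words / total


sample = '"Hello there," she said. He nodded.'
total, share = dialogue_share(sample)
print(f"{total} words, {share:.0%} dialogue")
```

A checker built on this idea would compare `total` against 500 and `share` against 0.30, with whatever tolerance the scorer chooses.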
| Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| o4 Mini High | 100% | 100% | 100% | 100% | 89% | 89% | 85% | 50% | 50% | 25% | 79% |
| o4 Mini | 100% | 100% | 100% | 80% | 76% | 61% | 55% | 53% | 51% | 43% | 72% |
| Claude Opus 4.5 | 96% | 65% | 65% | 60% | 58% | 52% | 50% | 50% | 50% | 47% | 59% |
| Claude Opus 4.6 | 90% | 72% | 50% | 50% | 40% | 18% | 18% | 1% | 1% | 1% | 34% |
| MoonshotAI: Kimi K2.5 | 94% | 91% | 55% | 39% | 27% | 27% | 0% | 0% | 0% | 0% | 33% |
| Claude 3.7 Sonnet | 50% | 50% | 49% | 43% | 43% | 41% | 38% | 18% | 0% | 0% | 33% |
| GPT-4o, Aug. 6th (temp=0) | 50% | 43% | 43% | 43% | 43% | 41% | 34% | 5% | 0% | 0% | 30% |
| Inflection 3 (Productivity) | 98% | 97% | 52% | 26% | 10% | 8% | 7% | 0% | 0% | 0% | 30% |
| Claude Opus 4 | 50% | 50% | 50% | 49% | 34% | 34% | 22% | 5% | 0% | 0% | 29% |
| Claude 3.0 Sonnet | 52% | 51% | 50% | 49% | 42% | 26% | 15% | 0% | 0% | 0% | 28% |
| Claude Sonnet 4 | 87% | 54% | 50% | 42% | 17% | 9% | 8% | 5% | 0% | 0% | 27% |
| Toppy M 7B | 50% | 47% | 45% | 43% | 42% | 42% | 1% | 0% | 0% | 0% | 27% |
| Z.AI GLM 4.7 | 67% | 50% | 43% | 43% | 28% | 19% | 14% | 3% | 0% | 0% | 27% |
| lzlv 70B | 49% | 45% | 44% | 43% | 40% | 29% | 15% | 2% | 1% | 0% | 27% |
| Gemini 3 Flash (Preview) | 50% | 50% | 50% | 49% | 34% | 30% | 5% | 0% | 0% | 0% | 27% |
| Hermes 2 Theta 8B | 49% | 48% | 44% | 43% | 42% | 28% | 0% | 0% | 0% | 0% | 26% |
| Gemini 2.5 Pro | 98% | 94% | 41% | 14% | 2% | 0% | 0% | 0% | 0% | 0% | 25% |
| GPT-4o Mini (temp=0) | 50% | 50% | 47% | 22% | 22% | 18% | 14% | 14% | 10% | 0% | 24% |
| Mistral Large | 49% | 49% | 45% | 36% | 35% | 28% | 0% | 0% | 0% | 0% | 24% |
| Hermes 3 405B | 50% | 50% | 45% | 38% | 30% | 14% | 10% | 0% | 0% | 0% | 24% |
| Inflection 3 (PI) | 97% | 53% | 49% | 13% | 3% | 3% | 1% | 0% | 0% | 0% | 22% |
| Claude 3.5 Sonnet (new) | 49% | 47% | 47% | 41% | 18% | 7% | 3% | 3% | 1% | 1% | 22% |
| Claude Haiku 4.5 | 50% | 49% | 41% | 38% | 34% | 3% | 1% | 0% | 0% | 0% | 22% |
| AI21 Jamba 1.5 Large | 50% | 45% | 41% | 27% | 23% | 2% | 1% | 0% | 0% | 0% | 19% |
| Llama 3.2 11B (Vision) | 49% | 45% | 34% | 22% | 18% | 18% | 0% | 0% | 0% | 0% | 19% |
| Claude 2.0 | 50% | 50% | 38% | 26% | 10% | 0% | 0% | 0% | 0% | 0% | 17% |
| Gemini 2.5 Flash Lite | 50% | 47% | 45% | 30% | 0% | 0% | 0% | 0% | 0% | 0% | 17% |
| Mistral Medium | 50% | 48% | 47% | 18% | 4% | 1% | 1% | 0% | 0% | 0% | 17% |
| Claude 3 Haiku | 79% | 71% | 15% | 3% | 1% | 0% | 0% | 0% | 0% | 0% | 17% |
| Gemini 3 Pro (Preview) | 50% | 45% | 43% | 27% | 0% | 0% | 0% | 0% | 0% | 0% | 17% |
| Claude Sonnet 4.5 | 47% | 38% | 37% | 11% | 7% | 7% | 6% | 5% | 0% | 0% | 16% |
| Z.AI GLM 4.7 Flash | 50% | 50% | 45% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 15% |
| AI21 Jamba | 50% | 50% | 44% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 15% |
| GPT-4o, May 13th (temp=0) | 45% | 23% | 22% | 16% | 15% | 10% | 8% | 3% | 1% | 0% | 14% |
| Ministral 8B | 49% | 48% | 41% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 14% |
| Llama 3.2 90B (Vision) | 87% | 48% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 13% |
| Llama 3 Euryale 70B v2.1 | 54% | 51% | 12% | 4% | 3% | 1% | 1% | 0% | 0% | 0% | 13% |
| Phi-3.5 Mini 128k | 44% | 43% | 26% | 10% | 1% | 0% | 0% | 0% | 0% | 0% | 12% |
| Llama 3.2 3B | 45% | 41% | 34% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 12% |
| Phi-3 Medium 128k | 35% | 33% | 22% | 11% | 10% | 1% | 0% | 0% | 0% | 0% | 11% |
| Cohere Command R+ (Apr. 2024) | 50% | 49% | 7% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 11% |
| MythoMax 13B | 51% | 50% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% |
| Llama 3.1 Euryale 70B v2.2 | 50% | 42% | 14% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% |
| Gemma 2 27B | 52% | 50% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| AI21 Jamba 1.5 Mini | 48% | 43% | 10% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| GPT-4o, May 13th (temp=1) | 47% | 31% | 13% | 6% | 4% | 0% | 0% | 0% | 0% | 0% | 10% |
| Hermes 3 70B | 41% | 31% | 26% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| MN GRAND Gutenberg Lyra4 12B Madness | 50% | 49% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| GPT-4 Turbo | 38% | 25% | 21% | 7% | 2% | 1% | 0% | 0% | 0% | 0% | 9% |
| GPT-4o, Aug. 6th (temp=1) | 43% | 29% | 14% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 9% |
| Llama 3.1 405B | 50% | 34% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Qwen 2 72B | 45% | 34% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Mistral NeMO | 44% | 26% | 6% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| DeepSeek-V2 Chat | 49% | 26% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Cohere Command R+ (Aug. 2024) | 41% | 30% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Llama 3.2 1B | 49% | 24% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 7% |
| Z.AI GLM 4.6 | 67% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 7% |
| Ministral 3B | 67% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 7% |
| GPT-4.1 Mini | 45% | 11% | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 6% |
| Phi-3 Mini 128k | 48% | 7% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 6% |
| Llama 3.1 8B | 53% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Mistral Nemo 12B Celeste | 26% | 26% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| WizardLM 2 8x22b | 49% | 2% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Gemini Flash 1.5 | 49% | 2% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| MythoMist 7B | 49% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Gemini 2.5 Flash | 50% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Claude 3.5 Haiku | 20% | 19% | 9% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| GPT-4o Mini (temp=1) | 41% | 7% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Qwen 2.5 72B | 47% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Rocinante 12B | 43% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Llama 3 70B | 43% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Qwen 2 7B | 39% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| GPT-4.1 | 41% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Llama 3.1 Nemotron 70B | 38% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Magnum v2 72B | 36% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Llama 3.1 70B | 18% | 7% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Goliath 120B | 24% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Mistral Large 2 | 26% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Gemini Pro 1.5 | 21% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| Liquid: LFM 40B MoE | 10% | 9% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| Claude 3.5 Sonnet | 10% | 5% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| Fimbulvetr 11B v2 | 14% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| EVA Qwen 2.5 14B | 14% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Z.AI GLM 4.5 | 14% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Lumimaid v0.2 8B | 6% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Mistral Small Creative | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Magnum 72B | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Sao10K L3.1 70B Hanami x1 | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Writer: Palmyra X5 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Llama 3 TenyxChat-DaybreakStorywriter 70B | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Gemma 2 9B | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| GPT-4.1 Nano | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Claude 2.1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
Average across all models: 13.67%
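The page does not state how each model's Total is derived, but the figures are consistent with the arithmetic mean of that model's ten run scores, rounded to a whole percent. The sketch below reproduces the Total for the first two models from their listed runs. `total_score` is a hypothetical name, and because the displayed run values are themselves rounded, a recomputed Total may be off by one point for borderline rows.

```python
from statistics import mean


def total_score(runs: list[int]) -> int:
    """Mean of the run percentages, rounded to a whole percent (assumed rule)."""
    return round(mean(runs))


# Run scores as listed for the first two models.
o4_mini_high = [100, 100, 100, 100, 89, 89, 85, 50, 50, 25]
o4_mini = [100, 100, 100, 80, 76, 61, 55, 53, 51, 43]

print(total_score(o4_mini_high))  # mean is 78.8
print(total_score(o4_mini))       # mean is 71.9
```

The same calculation applied to the Total values of all 93 models gives roughly 13.7%, in line with the overall figure reported for the test.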