NC Bench
Dialogue tags
Various tasks related to dialogue tags in text.
Write 500 words with 50% dialogue
0-shot
Creative writing
Rule following
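The task asks for a piece in which about half of the words are dialogue. The page does not say how NC Bench measures this, so the following is only a minimal sketch under one assumption: that "dialogue" means any text inside double quotes, and that the share is counted in whitespace-separated words. The function name `dialogue_share` and the sample text are hypothetical.

```python
import re

# Assumption: dialogue is whatever sits inside double quotes (straight or curly).
# This is an illustrative rule, not NC Bench's documented scoring method.
QUOTED = re.compile(r'"[^"]*"|\u201c[^\u201d]*\u201d')


def dialogue_share(text: str) -> float:
    """Return the fraction of whitespace-separated words that fall inside quotes."""
    total_words = len(text.split())
    if total_words == 0:
        return 0.0
    quoted_words = sum(len(span.split()) for span in QUOTED.findall(text))
    return quoted_words / total_words


sample = 'She looked up. "Are you coming?" he asked. "Not yet," she said.'
print(f"Dialogue share: {dialogue_share(sample):.0%}")
```

A checker like this would compare the returned share against the 50% target; how close a response has to be to earn a given score is not stated on the page.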
| Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | 94% | 92% | 92% | 89% | 72% | 64% | 58% | 50% | 50% | 49% | 71% |
| o4 Mini | 100% | 100% | 90% | 71% | 59% | 50% | 50% | 49% | 45% | 38% | 65% |
| o4 Mini High | 100% | 91% | 91% | 62% | 52% | 50% | 50% | 50% | 22% | 0% | 57% |
| Claude Opus 4.5 | 68% | 67% | 66% | 66% | 54% | 50% | 49% | 22% | 10% | 0% | 45% |
| MoonshotAI: Kimi K2.5 | 100% | 60% | 45% | 39% | 25% | 24% | 24% | 16% | 7% | 0% | 34% |
| Llama 3.2 11B (Vision) | 77% | 66% | 61% | 49% | 43% | 0% | 0% | 0% | 0% | 0% | 30% |
| Claude 3.7 Sonnet | 50% | 49% | 49% | 48% | 41% | 30% | 26% | 2% | 1% | 0% | 30% |
| Cohere Command R+ (Apr. 2024) | 50% | 50% | 49% | 48% | 47% | 43% | 0% | 0% | 0% | 0% | 29% |
| Claude Opus 4 | 59% | 50% | 50% | 47% | 47% | 18% | 10% | 7% | 1% | 0% | 29% |
| Claude 2.0 | 90% | 50% | 50% | 49% | 20% | 18% | 0% | 0% | 0% | 0% | 28% |
| GPT-4o Mini (temp=0) | 50% | 47% | 47% | 47% | 41% | 34% | 7% | 2% | 0% | 0% | 27% |
| Claude 3.5 Sonnet (new) | 49% | 49% | 49% | 43% | 41% | 22% | 10% | 1% | 0% | 0% | 26% |
| Claude Sonnet 4 | 48% | 47% | 45% | 44% | 23% | 4% | 4% | 1% | 0% | 0% | 22% |
| Cohere Command R+ (Aug. 2024) | 85% | 79% | 48% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 22% |
| Hermes 3 70B | 81% | 59% | 39% | 22% | 13% | 0% | 0% | 0% | 0% | 0% | 21% |
| Llama 3.1 8B | 59% | 50% | 30% | 30% | 22% | 22% | 0% | 0% | 0% | 0% | 21% |
| Qwen 2 7B | 50% | 42% | 41% | 29% | 23% | 22% | 0% | 0% | 0% | 0% | 21% |
| GPT-4 Turbo | 69% | 64% | 50% | 11% | 6% | 5% | 0% | 0% | 0% | 0% | 20% |
| Gemini 2.5 Flash Lite | 50% | 49% | 43% | 41% | 10% | 7% | 0% | 0% | 0% | 0% | 20% |
| Claude Sonnet 4.5 | 48% | 47% | 46% | 38% | 13% | 3% | 1% | 0% | 0% | 0% | 20% |
| Llama 3 70B | 58% | 41% | 35% | 32% | 26% | 0% | 0% | 0% | 0% | 0% | 19% |
| GPT-4o, Aug. 6th (temp=1) | 50% | 46% | 45% | 26% | 15% | 7% | 1% | 1% | 0% | 0% | 19% |
| GPT-4.1 | 50% | 48% | 34% | 26% | 22% | 3% | 0% | 0% | 0% | 0% | 18% |
| Phi-3 Medium 128k | 57% | 49% | 41% | 20% | 11% | 2% | 1% | 0% | 0% | 0% | 18% |
| Mistral Medium | 87% | 50% | 39% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 18% |
| Mistral Large | 51% | 50% | 26% | 24% | 15% | 9% | 3% | 2% | 0% | 0% | 18% |
| Llama 3.2 3B | 49% | 47% | 44% | 27% | 13% | 1% | 0% | 0% | 0% | 0% | 18% |
| Rocinante 12B | 100% | 33% | 24% | 16% | 3% | 1% | 0% | 0% | 0% | 0% | 18% |
| Claude Haiku 4.5 | 52% | 47% | 45% | 26% | 5% | 0% | 0% | 0% | 0% | 0% | 18% |
| Inflection 3 (Productivity) | 72% | 47% | 24% | 24% | 7% | 0% | 0% | 0% | 0% | 0% | 17% |
| Ministral 8B | 48% | 39% | 36% | 29% | 20% | 1% | 0% | 0% | 0% | 0% | 17% |
| Hermes 3 405B | 49% | 48% | 38% | 31% | 5% | 1% | 0% | 0% | 0% | 0% | 17% |
| Qwen 2 72B | 50% | 48% | 39% | 29% | 5% | 0% | 0% | 0% | 0% | 0% | 17% |
| Magnum 72B | 58% | 50% | 50% | 11% | 1% | 1% | 0% | 0% | 0% | 0% | 17% |
| Z.AI GLM 4.7 | 70% | 60% | 18% | 16% | 4% | 0% | 0% | 0% | 0% | 0% | 17% |
| Gemini 2.5 Pro | 67% | 50% | 34% | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 16% |
| WizardLM 2 8x22b | 50% | 41% | 33% | 21% | 3% | 3% | 0% | 0% | 0% | 0% | 15% |
| Gemini 2.5 Flash | 49% | 45% | 43% | 14% | 0% | 0% | 0% | 0% | 0% | 0% | 15% |
| GPT-4o, Aug. 6th (temp=0) | 44% | 38% | 37% | 30% | 0% | 0% | 0% | 0% | 0% | 0% | 15% |
| Hermes 2 Theta 8B | 50% | 40% | 38% | 10% | 5% | 4% | 1% | 0% | 0% | 0% | 15% |
| Llama 3.1 Euryale 70B v2.2 | 42% | 40% | 37% | 18% | 3% | 0% | 0% | 0% | 0% | 0% | 14% |
| Gemini Flash 1.5 | 64% | 47% | 29% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 14% |
| Fimbulvetr 11B v2 | 49% | 29% | 26% | 15% | 15% | 3% | 1% | 0% | 0% | 0% | 14% |
| Inflection 3 (PI) | 48% | 48% | 26% | 10% | 5% | 0% | 0% | 0% | 0% | 0% | 14% |
| Phi-3 Mini 128k | 48% | 35% | 26% | 20% | 5% | 4% | 0% | 0% | 0% | 0% | 14% |
| AI21 Jamba 1.5 Large | 52% | 48% | 32% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 14% |
| Llama 3.1 405B | 50% | 50% | 32% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 13% |
| GPT-4o Mini (temp=1) | 49% | 41% | 34% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 12% |
| Llama 3 Euryale 70B v2.1 | 54% | 36% | 31% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 12% |
| Llama 3.1 70B | 50% | 34% | 34% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 12% |
| Mistral Large 2 | 57% | 50% | 4% | 4% | 3% | 0% | 0% | 0% | 0% | 0% | 12% |
| Ministral 3B | 48% | 48% | 16% | 3% | 3% | 1% | 0% | 0% | 0% | 0% | 12% |
| GPT-4o, May 13th (temp=1) | 47% | 44% | 18% | 8% | 0% | 0% | 0% | 0% | 0% | 0% | 12% |
| Gemma 2 27B | 49% | 38% | 19% | 7% | 3% | 0% | 0% | 0% | 0% | 0% | 12% |
| AI21 Jamba | 50% | 50% | 13% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 11% |
| Gemini 3 Flash (Preview) | 48% | 43% | 22% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% |
| Llama 3.2 1B | 43% | 42% | 26% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 11% |
| Mistral NeMO | 32% | 31% | 22% | 20% | 2% | 1% | 0% | 0% | 0% | 0% | 11% |
| Phi-3.5 Mini 128k | 46% | 22% | 21% | 14% | 2% | 2% | 0% | 0% | 0% | 0% | 11% |
| MythoMist 7B | 50% | 50% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| MythoMax 13B | 50% | 30% | 18% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| GPT-4.1 Mini | 43% | 38% | 7% | 5% | 3% | 2% | 1% | 0% | 0% | 0% | 10% |
| Liquid: LFM 40B MoE | 47% | 37% | 12% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| EVA Qwen 2.5 14B | 71% | 20% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| Gemini 3 Pro (Preview) | 48% | 41% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 9% |
| Goliath 120B | 43% | 31% | 14% | 4% | 1% | 0% | 0% | 0% | 0% | 0% | 9% |
| Llama 3.2 90B (Vision) | 49% | 22% | 12% | 3% | 3% | 2% | 2% | 1% | 0% | 0% | 9% |
| Claude 3.5 Haiku | 41% | 27% | 8% | 3% | 3% | 1% | 0% | 0% | 0% | 0% | 8% |
| Toppy M 7B | 50% | 31% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Lumimaid v0.2 8B | 43% | 23% | 10% | 6% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| AI21 Jamba 1.5 Mini | 41% | 23% | 14% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Z.AI GLM 4.5 | 46% | 18% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 6% |
| Claude 3.5 Sonnet | 31% | 19% | 4% | 3% | 1% | 1% | 0% | 0% | 0% | 0% | 6% |
| Mistral Nemo 12B Celeste | 31% | 23% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Gemma 2 9B | 49% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| DeepSeek-V2 Chat | 45% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Magnum v2 72B | 26% | 14% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Claude 3 Haiku | 38% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Gemini Pro 1.5 | 26% | 9% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Claude 3.0 Sonnet | 21% | 8% | 4% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Qwen 2.5 72B | 14% | 12% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Writer: Palmyra X5 | 30% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Z.AI GLM 4.7 Flash | 14% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| lzlv 70B | 10% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| MN GRAND Gutenberg Lyra4 12B Madness | 12% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Llama 3.1 Nemotron 70B | 11% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| GPT-4.1 Nano | 11% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Sao10K L3.1 70B Hanami x1 | 8% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| GPT-4o, May 13th (temp=0) | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Z.AI GLM 4.6 | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Llama 3 TenyxChat-DaybreakStorywriter 70B | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Mistral Small Creative | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Claude 2.1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
14.88%
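The Total figure for each model appears consistent with the arithmetic mean of its ten run scores. This is an inference from the displayed numbers, not a rule stated on the page, and because the displayed percentages look rounded, a recomputed mean can differ from the listed Total by up to about a point. A minimal sketch of the check, using the run values shown for Claude Opus 4.6 and o4 Mini (the helper name `mean_of_runs` is hypothetical):

```python
def mean_of_runs(runs):
    """Arithmetic mean of a model's per-run scores, in percent."""
    return sum(runs) / len(runs)


# Run scores as displayed in the results for two models.
claude_opus_46 = [94, 92, 92, 89, 72, 64, 58, 50, 50, 49]  # listed Total: 71%
o4_mini = [100, 100, 90, 71, 59, 50, 50, 49, 45, 38]       # listed Total: 65%

for name, runs in [("Claude Opus 4.6", claude_opus_46), ("o4 Mini", o4_mini)]:
    print(f"{name}: mean of runs = {mean_of_runs(runs):.1f}%")
```

Read this way, the Total rewards consistency across runs: a model that scores 100% once and 0% on most other runs ends up well below a model that stays near 50% every time.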