NC Bench
Dialogue tags
Various tasks related to dialogue tags in text.
Write 200 words with 90% dialogue
Tags: 0-shot, Creative writing, Rule following
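This page does not say how responses are scored. As a purely illustrative sketch, one way to measure the two constraints in the prompt is shown below. It assumes that a "word" is any whitespace-separated token and that "dialogue" is text inside straight or curly double quotation marks; single-quoted or nested dialogue is not handled. The names `word_count` and `dialogue_fraction` are hypothetical and are not part of NC Bench.

```python
import re

# Hypothetical helper, not NC Bench's scorer.
# Assumption: dialogue is text inside straight ("...") or curly (\u201c...\u201d) double quotes.
QUOTED = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def word_count(text: str) -> int:
    """Count whitespace-separated tokens as words."""
    return len(text.split())


def dialogue_fraction(text: str) -> float:
    """Share of words (0.0 to 1.0) that fall inside double quotation marks."""
    total = word_count(text)
    if total == 0:
        return 0.0
    # Each match fills exactly one of the two groups (straight or curly quotes).
    quoted_words = sum(
        word_count(m.group(1) or m.group(2) or "") for m in QUOTED.finditer(text)
    )
    return quoted_words / total


sample = '"Run," she said. "Now."'
print(word_count(sample), dialogue_fraction(sample))
```

Under these assumptions, a response that follows the prompt would have a `word_count` near 200 and a `dialogue_fraction` near 0.9.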
Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
GPT-4o Mini (temp=0) | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 97% | 90% | 98%
Claude Opus 4.6 | 100% | 100% | 100% | 99% | 99% | 99% | 97% | 90% | 89% | 86% | 96%
Claude Opus 4.5 | 99% | 98% | 96% | 96% | 95% | 91% | 87% | 67% | 52% | 49% | 83%
Claude Opus 4 | 99% | 99% | 95% | 93% | 89% | 86% | 81% | 68% | 64% | 21% | 79%
MoonshotAI: Kimi K2.5 | 100% | 97% | 95% | 87% | 84% | 81% | 60% | 53% | 51% | 50% | 76%
Gemini 2.5 Flash Lite | 95% | 88% | 88% | 88% | 84% | 76% | 67% | 67% | 50% | 49% | 75%
o4 Mini High | 99% | 97% | 89% | 82% | 77% | 75% | 68% | 53% | 50% | 49% | 74%
Claude Haiku 4.5 | 100% | 98% | 93% | 87% | 87% | 60% | 55% | 52% | 50% | 44% | 73%
DeepSeek-V2 Chat | 99% | 92% | 92% | 87% | 83% | 72% | 50% | 50% | 49% | 49% | 72%
GPT-4o Mini (temp=1) | 95% | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 66% | 44% | 68%
GPT-4o, Aug. 6th (temp=0) | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 66% | 68%
Claude 3.7 Sonnet | 95% | 84% | 67% | 67% | 64% | 59% | 55% | 52% | 52% | 50% | 64%
GPT-4o, Aug. 6th (temp=1) | 68% | 68% | 68% | 68% | 66% | 66% | 65% | 62% | 56% | 49% | 64%
Z.AI GLM 4.7 | 98% | 93% | 93% | 68% | 66% | 65% | 59% | 50% | 18% | 18% | 63%
GPT-4o, May 13th (temp=0) | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 52% | 18% | 62%
Llama 3.1 70B | 90% | 72% | 72% | 68% | 68% | 64% | 64% | 62% | 53% | 2% | 61%
GPT-4 Turbo | 91% | 80% | 79% | 76% | 67% | 54% | 49% | 47% | 44% | 27% | 61%
Llama 3.2 90B (Vision) | 99% | 94% | 80% | 80% | 77% | 51% | 50% | 50% | 13% | 3% | 60%
Gemini 3 Flash (Preview) | 94% | 68% | 68% | 65% | 64% | 62% | 59% | 52% | 32% | 32% | 60%
GPT-4.1 Nano | 100% | 99% | 96% | 92% | 66% | 63% | 41% | 18% | 14% | 2% | 59%
GPT-4.1 | 68% | 68% | 68% | 68% | 66% | 66% | 52% | 50% | 48% | 34% | 59%
Llama 3.2 3B | 100% | 99% | 71% | 67% | 50% | 50% | 49% | 45% | 44% | 10% | 58%
Qwen 2.5 72B | 100% | 100% | 79% | 72% | 53% | 50% | 50% | 50% | 20% | 0% | 57%
Gemma 2 9B | 95% | 85% | 75% | 59% | 57% | 51% | 50% | 47% | 20% | 19% | 56%
o4 Mini | 100% | 94% | 82% | 72% | 55% | 50% | 40% | 34% | 23% | 5% | 55%
Claude 3.5 Sonnet | 92% | 91% | 65% | 65% | 56% | 56% | 44% | 40% | 20% | 18% | 55%
Ministral 8B | 95% | 95% | 78% | 68% | 51% | 50% | 49% | 44% | 18% | 0% | 55%
Writer: Palmyra X5 | 93% | 93% | 53% | 51% | 51% | 50% | 50% | 50% | 49% | 0% | 54%
Claude Sonnet 4.5 | 86% | 61% | 57% | 50% | 50% | 50% | 50% | 49% | 49% | 36% | 54%
AI21 Jamba 1.5 Large | 99% | 99% | 97% | 68% | 59% | 47% | 34% | 18% | 14% | 0% | 54%
Llama 3 TenyxChat-DaybreakStorywriter 70B | 75% | 68% | 68% | 68% | 64% | 59% | 50% | 32% | 19% | 18% | 52%
AI21 Jamba 1.5 Mini | 100% | 84% | 80% | 80% | 54% | 49% | 35% | 31% | 5% | 1% | 52%
Gemini 2.5 Flash | 68% | 68% | 66% | 56% | 50% | 50% | 49% | 44% | 29% | 18% | 50%
Mistral NeMO | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 49% | 47% | 46% | 49%
Llama 3.1 405B | 85% | 78% | 77% | 76% | 60% | 55% | 30% | 22% | 1% | 0% | 48%
Magnum 72B | 100% | 99% | 92% | 50% | 50% | 48% | 37% | 3% | 0% | 0% | 48%
Phi-3 Mini 128k | 99% | 68% | 53% | 51% | 50% | 49% | 45% | 43% | 14% | 0% | 47%
Gemini 2.5 Pro | 93% | 67% | 63% | 63% | 49% | 45% | 28% | 18% | 18% | 18% | 46%
GPT-4o, May 13th (temp=1) | 68% | 68% | 66% | 52% | 52% | 44% | 44% | 23% | 23% | 18% | 46%
Z.AI GLM 4.6 | 77% | 52% | 51% | 51% | 50% | 48% | 48% | 43% | 18% | 18% | 46%
Llama 3.1 Nemotron 70B | 97% | 52% | 52% | 51% | 51% | 50% | 50% | 26% | 19% | 0% | 45%
Z.AI GLM 4.5 | 91% | 76% | 50% | 50% | 50% | 50% | 28% | 18% | 18% | 10% | 44%
GPT-4.1 Mini | 68% | 68% | 68% | 67% | 44% | 40% | 23% | 19% | 18% | 18% | 43%
Z.AI GLM 4.7 Flash | 95% | 62% | 55% | 50% | 50% | 47% | 18% | 18% | 18% | 18% | 43%
Llama 3.2 11B (Vision) | 93% | 80% | 53% | 50% | 50% | 45% | 33% | 20% | 1% | 0% | 42%
Hermes 3 70B | 82% | 72% | 60% | 50% | 50% | 48% | 28% | 22% | 2% | 0% | 41%
Qwen 2 72B | 90% | 66% | 50% | 49% | 47% | 46% | 30% | 21% | 13% | 0% | 41%
Llama 3 70B | 68% | 68% | 68% | 52% | 50% | 26% | 20% | 18% | 18% | 18% | 41%
AI21 Jamba | 97% | 51% | 50% | 50% | 50% | 49% | 48% | 3% | 1% | 1% | 40%
Claude 3.0 Sonnet | 91% | 80% | 73% | 66% | 38% | 31% | 14% | 2% | 0% | 0% | 40%
Hermes 3 405B | 60% | 55% | 51% | 51% | 45% | 43% | 34% | 26% | 14% | 10% | 39%
Goliath 120B | 58% | 50% | 50% | 50% | 50% | 48% | 41% | 30% | 10% | 1% | 39%
Phi-3.5 Mini 128k | 91% | 50% | 49% | 49% | 48% | 45% | 38% | 0% | 0% | 0% | 37%
Cohere Command R+ (Apr. 2024) | 92% | 56% | 50% | 49% | 47% | 41% | 9% | 1% | 1% | 0% | 34%
Mistral Nemo 12B Celeste | 95% | 57% | 50% | 49% | 48% | 34% | 0% | 0% | 0% | 0% | 33%
Magnum v2 72B | 50% | 48% | 47% | 46% | 45% | 45% | 44% | 5% | 0% | 0% | 33%
Gemini Flash 1.5 | 54% | 51% | 50% | 50% | 48% | 47% | 15% | 0% | 0% | 0% | 31%
Sao10K L3.1 70B Hanami x1 | 94% | 50% | 48% | 41% | 32% | 29% | 18% | 1% | 0% | 0% | 31%
Llama 3.1 8B | 62% | 50% | 48% | 45% | 41% | 26% | 22% | 18% | 0% | 0% | 31%
EVA Qwen 2.5 14B | 50% | 50% | 45% | 45% | 33% | 30% | 30% | 25% | 2% | 0% | 31%
Claude 2.0 | 68% | 49% | 49% | 32% | 19% | 18% | 18% | 18% | 18% | 18% | 31%
Inflection 3 (PI) | 82% | 52% | 50% | 50% | 49% | 14% | 5% | 0% | 0% | 0% | 30%
Ministral 3B | 84% | 50% | 46% | 30% | 30% | 21% | 18% | 15% | 0% | 0% | 29%
Mistral Medium | 72% | 68% | 50% | 43% | 30% | 22% | 10% | 0% | 0% | 0% | 29%
MythoMist 7B | 88% | 55% | 50% | 49% | 34% | 7% | 5% | 0% | 0% | 0% | 29%
Claude 3 Haiku | 50% | 50% | 50% | 47% | 38% | 22% | 18% | 7% | 1% | 0% | 28%
Mistral Large | 50% | 50% | 48% | 41% | 41% | 38% | 12% | 1% | 0% | 0% | 28%
Claude Sonnet 4 | 51% | 44% | 40% | 21% | 20% | 20% | 19% | 19% | 19% | 18% | 27%
Claude 3.5 Sonnet (new) | 56% | 36% | 32% | 28% | 21% | 21% | 20% | 19% | 19% | 18% | 27%
Lumimaid v0.2 8B | 68% | 50% | 49% | 39% | 18% | 18% | 18% | 3% | 0% | 0% | 26%
Gemini 3 Pro (Preview) | 50% | 49% | 49% | 47% | 43% | 5% | 5% | 3% | 1% | 0% | 25%
Gemma 2 27B | 72% | 49% | 32% | 18% | 18% | 18% | 12% | 5% | 3% | 0% | 23%
Mistral Large 2 | 50% | 49% | 49% | 39% | 24% | 7% | 2% | 1% | 1% | 0% | 22%
Claude 3.5 Haiku | 54% | 53% | 27% | 21% | 16% | 8% | 5% | 3% | 1% | 0% | 19%
MythoMax 13B | 75% | 54% | 50% | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 19%
Rocinante 12B | 50% | 49% | 46% | 22% | 10% | 4% | 0% | 0% | 0% | 0% | 18%
Llama 3.2 1B | 68% | 45% | 26% | 25% | 9% | 0% | 0% | 0% | 0% | 0% | 17%
Llama 3.1 Euryale 70B v2.2 | 53% | 51% | 50% | 4% | 2% | 0% | 0% | 0% | 0% | 0% | 16%
Qwen 2 7B | 50% | 49% | 48% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 15%
Fimbulvetr 11B v2 | 50% | 50% | 45% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 14%
Inflection 3 (Productivity) | 41% | 34% | 14% | 13% | 4% | 4% | 3% | 0% | 0% | 0% | 11%
WizardLM 2 8x22b | 50% | 44% | 6% | 4% | 1% | 0% | 0% | 0% | 0% | 0% | 11%
Mistral Small Creative | 47% | 23% | 18% | 4% | 3% | 0% | 0% | 0% | 0% | 0% | 10%
Phi-3 Medium 128k | 50% | 36% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 9%
Liquid: LFM 40B MoE | 41% | 26% | 14% | 6% | 2% | 0% | 0% | 0% | 0% | 0% | 9%
lzlv 70B | 68% | 11% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8%
Cohere Command R+ (Aug. 2024) | 31% | 10% | 10% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
MN GRAND Gutenberg Lyra4 12B Madness | 42% | 10% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
Llama 3 Euryale 70B v2.1 | 18% | 4% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 3%
Hermes 2 Theta 8B | 17% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2%
Gemini Pro 1.5 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Toppy M 7B | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Claude 2.1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Average total across all models: 40.48%
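The Total column appears to be the arithmetic mean of a model's ten run scores: for example, Claude Opus 4.5's runs sum to 830, which gives 83%. The sketch below recomputes a Total from the displayed run values. This is an inference from the table, not a documented NC Bench formula, and `total_score` is a hypothetical helper. Because the run values are shown rounded, a recomputed mean may not round to the listed Total: Llama 3.1 70B's displayed runs average 61.5, while its Total reads 61%.

```python
def total_score(runs: list[float]) -> float:
    """Arithmetic mean of per-run percentages.

    Assumption: this is how the Total column is derived; the page does not say.
    """
    if not runs:
        raise ValueError("runs must be non-empty")
    return sum(runs) / len(runs)


# Displayed (rounded) run values copied from the table.
claude_opus_45 = [99, 98, 96, 96, 95, 91, 87, 67, 52, 49]
llama_31_70b = [90, 72, 72, 68, 68, 64, 64, 62, 53, 2]

print(total_score(claude_opus_45))  # 83.0, matching the listed 83%
print(total_score(llama_31_70b))    # 61.5, listed as 61%
```

The small mismatch in the second case is expected if the site averages unrounded run scores before rounding the Total.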