NC Bench
N-Length Sentences
Write sentences with exactly N words
Write sentences with 10 words each
0-shot
Rule following
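The page does not show how a response is scored, so the following is only a minimal sketch of one plausible check for the "exactly N words" rule. It assumes a "word" is any whitespace-separated token and that a response has already been split into sentences; `count_words` and `score_response` are hypothetical names, not part of NC Bench.

```python
def count_words(sentence: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(sentence.split())


def score_response(sentences: list[str], n: int = 10) -> float:
    """Return the percentage of sentences that contain exactly n words."""
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if count_words(s) == n)
    return 100.0 * hits / len(sentences)


sample = [
    "The quick brown fox jumps over the lazy dog again.",  # 10 words
    "This sentence is short.",                              # 4 words
]
print(score_response(sample, n=10))  # 50.0
```

A real scorer would also have to decide how to treat hyphenated words, numerals, and contractions, which is where whitespace splitting can disagree with a human count.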
| Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Gemini 3 Flash (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100% |
| Claude Opus 4.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 100% |
| Llama 3.1 405B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 97% | 99% |
| o4 Mini High | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 96% | 96% | 99% |
| Claude Opus 4 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 88% | 99% |
| o4 Mini | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 97% | 99% |
| Llama 3 TenyxChat-DaybreakStorywriter 70B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 95% | 95% | 99% |
| Llama 3.1 Nemotron 70B | 100% | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 95% | 95% | 98% |
| Claude 3.5 Sonnet (new) | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 97% | 97% | 95% | 98% |
| Llama 3.2 90B (Vision) | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 97% | 96% | 91% | 98% |
| Claude 3.5 Haiku | 100% | 100% | 100% | 100% | 97% | 97% | 95% | 95% | 95% | 92% | 97% |
| Claude Sonnet 4.5 | 100% | 100% | 100% | 100% | 100% | 98% | 97% | 96% | 90% | 87% | 97% |
| Llama 3.1 70B | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 95% | 87% | 87% | 96% |
| Llama 3.1 8B | 100% | 100% | 100% | 98% | 97% | 96% | 95% | 93% | 91% | 77% | 95% |
| Claude Opus 4.6 | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 76% | 74% | 95% |
| Llama 3.2 11B (Vision) | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 94% | 80% | 77% | 95% |
| Llama 3.2 3B | 100% | 97% | 97% | 97% | 95% | 95% | 92% | 92% | 90% | 87% | 94% |
| Llama 3 70B | 100% | 97% | 97% | 95% | 95% | 95% | 92% | 92% | 90% | 87% | 94% |
| GPT-4o, Aug. 6th (temp=1) | 100% | 97% | 95% | 95% | 95% | 92% | 92% | 90% | 90% | 90% | 94% |
| Gemini 2.5 Pro | 100% | 97% | 96% | 96% | 95% | 95% | 93% | 88% | 88% | 84% | 93% |
| GPT-4o, Aug. 6th (temp=0) | 100% | 95% | 95% | 95% | 92% | 92% | 92% | 92% | 90% | 90% | 93% |
| GPT-4.1 Mini | 97% | 97% | 96% | 95% | 92% | 92% | 92% | 90% | 88% | 84% | 92% |
| GPT-4o Mini (temp=1) | 97% | 95% | 95% | 95% | 92% | 92% | 92% | 92% | 90% | 83% | 92% |
| Claude 3.7 Sonnet | 100% | 94% | 94% | 93% | 93% | 93% | 93% | 91% | 91% | 81% | 92% |
| GPT-4 Turbo | 98% | 97% | 94% | 94% | 91% | 91% | 88% | 88% | 87% | 87% | 92% |
| GPT-4.1 | 100% | 95% | 95% | 95% | 92% | 92% | 92% | 90% | 83% | 79% | 91% |
| Claude 3.5 Sonnet | 100% | 100% | 97% | 95% | 92% | 90% | 90% | 86% | 83% | 78% | 91% |
| Claude Sonnet 4 | 100% | 100% | 100% | 100% | 97% | 95% | 90% | 88% | 74% | 64% | 91% |
| Z.AI GLM 4.7 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 90% | 74% | 39% | 90% |
| Z.AI GLM 4.7 Flash | 100% | 100% | 100% | 100% | 96% | 96% | 79% | 77% | 74% | 74% | 90% |
| GPT-4o Mini (temp=0) | 92% | 90% | 90% | 90% | 90% | 90% | 90% | 87% | 84% | 84% | 89% |
| GPT-4o, May 13th (temp=0) | 97% | 97% | 97% | 97% | 92% | 92% | 87% | 87% | 70% | 63% | 88% |
| Hermes 3 405B | 97% | 97% | 92% | 92% | 90% | 88% | 87% | 87% | 84% | 67% | 88% |
| GPT-4o, May 13th (temp=1) | 97% | 94% | 92% | 92% | 90% | 87% | 83% | 80% | 80% | 65% | 86% |
| Qwen 2.5 72B | 100% | 96% | 94% | 92% | 91% | 88% | 85% | 83% | 65% | 37% | 83% |
| Sao10K L3.1 70B Hanami x1 | 97% | 95% | 95% | 92% | 90% | 81% | 72% | 69% | 66% | 61% | 82% |
| Magnum v2 72B | 90% | 90% | 87% | 87% | 84% | 84% | 78% | 78% | 78% | 57% | 81% |
| Claude 3.0 Sonnet | 100% | 100% | 90% | 83% | 79% | 79% | 74% | 74% | 70% | 57% | 81% |
| Llama 3.1 Euryale 70B v2.2 | 95% | 93% | 93% | 86% | 79% | 77% | 75% | 75% | 70% | 60% | 80% |
| Claude Haiku 4.5 | 89% | 89% | 86% | 86% | 82% | 77% | 74% | 74% | 74% | 67% | 80% |
| Mistral Small Creative | 90% | 90% | 90% | 87% | 80% | 79% | 75% | 70% | 65% | 63% | 79% |
| Qwen 2 72B | 94% | 86% | 85% | 84% | 84% | 79% | 77% | 75% | 65% | 59% | 79% |
| Writer: Palmyra X5 | 90% | 89% | 87% | 79% | 78% | 77% | 73% | 73% | 54% | 48% | 75% |
| GPT-4.1 Nano | 83% | 83% | 82% | 80% | 79% | 77% | 72% | 68% | 66% | 55% | 74% |
| Claude 3 Haiku | 97% | 84% | 84% | 83% | 83% | 75% | 74% | 63% | 61% | 28% | 73% |
| AI21 Jamba 1.5 Large | 100% | 100% | 87% | 85% | 83% | 74% | 74% | 74% | 30% | 25% | 73% |
| Magnum 72B | 95% | 92% | 90% | 87% | 76% | 76% | 76% | 57% | 51% | 24% | 72% |
| Lumimaid v0.2 8B | 89% | 87% | 80% | 80% | 79% | 76% | 75% | 63% | 48% | 25% | 70% |
| Llama 3 Euryale 70B v2.1 | 88% | 87% | 87% | 87% | 83% | 83% | 74% | 65% | 42% | 0% | 70% |
| Z.AI GLM 4.6 | 100% | 96% | 90% | 80% | 77% | 70% | 55% | 42% | 35% | 18% | 66% |
| Phi-3 Medium 128k | 80% | 78% | 77% | 71% | 67% | 61% | 59% | 56% | 53% | 52% | 66% |
| Cohere Command R+ (Aug. 2024) | 95% | 84% | 81% | 70% | 70% | 63% | 56% | 56% | 18% | 10% | 60% |
| Phi-3.5 Mini 128k | 76% | 73% | 70% | 69% | 65% | 63% | 56% | 55% | 49% | 26% | 60% |
| Gemini 2.5 Flash Lite | 88% | 78% | 76% | 61% | 61% | 60% | 50% | 39% | 37% | 22% | 57% |
| WizardLM 2 8x22b | 77% | 74% | 66% | 66% | 59% | 57% | 54% | 52% | 46% | 19% | 57% |
| Gemma 2 9B | 83% | 62% | 60% | 59% | 57% | 55% | 43% | 42% | 40% | 25% | 52% |
| Inflection 3 (PI) | 68% | 64% | 63% | 58% | 56% | 52% | 50% | 48% | 29% | 17% | 50% |
| Qwen 2 7B | 84% | 74% | 74% | 62% | 61% | 44% | 29% | 20% | 19% | 16% | 48% |
| Gemini 2.5 Flash | 69% | 62% | 62% | 54% | 52% | 50% | 45% | 41% | 27% | 14% | 48% |
| Llama 3.2 1B | 100% | 79% | 74% | 61% | 46% | 40% | 32% | 30% | 7% | 1% | 47% |
| Cohere Command R+ (Apr. 2024) | 83% | 70% | 70% | 70% | 53% | 35% | 35% | 11% | 9% | 1% | 44% |
| Phi-3 Mini 128k | 80% | 59% | 57% | 56% | 45% | 40% | 34% | 31% | 23% | 0% | 42% |
| Gemma 2 27B | 82% | 48% | 45% | 43% | 43% | 40% | 39% | 32% | 21% | 21% | 41% |
| Claude 2.1 | 70% | 70% | 61% | 40% | 38% | 33% | 27% | 26% | 25% | 13% | 40% |
| AI21 Jamba 1.5 Mini | 74% | 74% | 68% | 59% | 52% | 30% | 30% | 7% | 7% | 0% | 40% |
| Mistral Large 2 | 81% | 70% | 57% | 46% | 43% | 28% | 19% | 19% | 17% | 14% | 39% |
| Claude 2.0 | 74% | 71% | 59% | 55% | 44% | 39% | 16% | 14% | 10% | 5% | 39% |
| Toppy M 7B | 62% | 57% | 48% | 42% | 42% | 37% | 32% | 23% | 21% | 0% | 36% |
| Gemini Pro 1.5 | 55% | 49% | 47% | 41% | 37% | 35% | 34% | 31% | 28% | 0% | 36% |
| Gemini Flash 1.5 | 86% | 57% | 45% | 39% | 31% | 30% | 28% | 23% | 13% | 5% | 36% |
| Hermes 3 70B | 81% | 57% | 42% | 41% | 40% | 33% | 24% | 8% | 7% | 6% | 34% |
| Mistral Large | 65% | 51% | 51% | 42% | 38% | 26% | 21% | 21% | 16% | 4% | 33% |
| Z.AI GLM 4.5 | 56% | 54% | 48% | 39% | 39% | 39% | 25% | 11% | 10% | 5% | 33% |
| Inflection 3 (Productivity) | 56% | 53% | 52% | 46% | 46% | 31% | 12% | 11% | 9% | 7% | 32% |
| EVA Qwen 2.5 14B | 67% | 61% | 61% | 34% | 31% | 31% | 10% | 6% | 4% | 0% | 30% |
| Mistral Medium | 73% | 53% | 50% | 35% | 34% | 18% | 9% | 5% | 2% | 2% | 28% |
| lzlv 70B | 60% | 52% | 52% | 45% | 28% | 26% | 9% | 5% | 1% | 0% | 28% |
| Mistral NeMO | 57% | 48% | 45% | 24% | 24% | 15% | 14% | 11% | 10% | 9% | 26% |
| Ministral 3B | 39% | 32% | 31% | 30% | 30% | 27% | 26% | 21% | 7% | 6% | 25% |
| AI21 Jamba | 62% | 53% | 52% | 32% | 22% | 6% | 5% | 3% | 3% | 0% | 24% |
| Mistral Nemo 12B Celeste | 34% | 31% | 28% | 27% | 23% | 22% | 19% | 19% | 19% | 15% | 24% |
| Rocinante 12B | 56% | 43% | 35% | 32% | 26% | 21% | 14% | 3% | 1% | 0% | 23% |
| Fimbulvetr 11B v2 | 50% | 46% | 32% | 28% | 24% | 21% | 14% | 2% | 1% | 0% | 22% |
| Ministral 8B | 43% | 36% | 34% | 28% | 28% | 16% | 15% | 10% | 0% | 0% | 21% |
| MythoMist 7B | 51% | 30% | 27% | 26% | 19% | 18% | 9% | 9% | 0% | 0% | 19% |
| Goliath 120B | 58% | 51% | 25% | 24% | 24% | 3% | 1% | 0% | 0% | 0% | 18% |
| MN GRAND Gutenberg Lyra4 12B Madness | 31% | 26% | 20% | 20% | 19% | 18% | 13% | 13% | 12% | 7% | 18% |
| Hermes 2 Theta 8B | 36% | 34% | 26% | 21% | 15% | 7% | 3% | 1% | 1% | 1% | 14% |
| DeepSeek-V2 Chat | 25% | 23% | 18% | 18% | 15% | 13% | 10% | 8% | 7% | 4% | 14% |
| Liquid: LFM 40B MoE | 46% | 30% | 20% | 15% | 7% | 4% | 1% | 1% | 1% | 0% | 13% |
| MythoMax 13B | 37% | 15% | 4% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 6% |
Average across all models: 65.02%
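The page does not say how the Total column is derived. For the rows checked, it is consistent with the mean of the ten run scores rounded to the nearest percent. The sketch below reproduces that arithmetic for two rows copied from the table; `mean_percent` is a hypothetical helper, and the averaging rule itself is an assumption rather than a documented NC Bench formula.

```python
def mean_percent(runs: list[int]) -> float:
    """Arithmetic mean of per-run scores, in percent."""
    return sum(runs) / len(runs)


# Run 1 to Run 10 scores copied from the table rows.
claude_opus_4_6 = [100, 100, 100, 100, 100, 100, 99, 98, 76, 74]
mythomax_13b = [37, 15, 4, 2, 1, 0, 0, 0, 0, 0]

print(mean_percent(claude_opus_4_6))  # 94.7 (listed Total: 95%)
print(mean_percent(mythomax_13b))     # 5.9 (listed Total: 6%)
```

Because the per-run values shown are already rounded, a mean recomputed this way can differ slightly from a Total computed on unrounded scores.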