NC Bench
N-Length Sentences
Write sentences with exactly N words
Write sentences with 20 words each
0-shot
Rule following
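The page does not say how a response is graded. As a rough illustration only, the sketch below scores a response as the percentage of its sentences that contain exactly N words. The sentence splitter, the whitespace-based word count, and the function names (`split_sentences`, `count_words`, `score_response`) are all assumptions made for this example, not the benchmark's actual method.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def count_words(sentence: str) -> int:
    # Assumption: words are whitespace-separated tokens.
    return len(sentence.split())


def score_response(text: str, n: int = 20) -> float:
    """Return the percentage of sentences that have exactly n words."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if count_words(s) == n)
    return 100.0 * hits / len(sentences)


if __name__ == "__main__":
    good = " ".join(["word"] * 19) + " end."  # 20 words
    bad = "Too short."                        # 2 words
    print(score_response(good + " " + bad))   # one of two sentences matches
```

A real grader would likely need stricter rules for abbreviations, hyphenated words, and numbered lists, which this sketch ignores.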
Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 94% | 87% | 98%
Gemini 3 Pro (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 93% | 80% | 97%
Gemini 3 Flash (Preview) | 100% | 100% | 100% | 100% | 100% | 95% | 95% | 95% | 94% | 90% | 97%
o4 Mini High | 100% | 100% | 100% | 100% | 100% | 100% | 95% | 92% | 87% | 83% | 96%
o4 Mini | 100% | 100% | 100% | 100% | 100% | 100% | 92% | 92% | 84% | 83% | 95%
Z.AI GLM 4.7 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 51% | 95%
Z.AI GLM 4.7 Flash | 100% | 100% | 100% | 94% | 93% | 90% | 84% | 74% | 74% | 60% | 87%
Llama 3.1 405B | 100% | 100% | 92% | 92% | 84% | 84% | 84% | 76% | 70% | 69% | 85%
Claude Opus 4.5 | 96% | 92% | 88% | 88% | 84% | 80% | 80% | 75% | 74% | 60% | 82%
GPT-4.1 | 93% | 92% | 92% | 89% | 87% | 85% | 84% | 60% | 59% | 57% | 80%
Llama 3.2 3B | 100% | 100% | 92% | 87% | 76% | 76% | 72% | 67% | 60% | 49% | 78%
GPT-4o, May 13th (temp=0) | 100% | 100% | 90% | 88% | 84% | 80% | 69% | 64% | 44% | 18% | 74%
Llama 3 70B | 100% | 93% | 84% | 83% | 76% | 70% | 69% | 69% | 47% | 32% | 72%
Claude 3.5 Sonnet (new) | 92% | 84% | 76% | 76% | 76% | 69% | 69% | 67% | 58% | 52% | 72%
GPT-4o Mini (temp=0) | 72% | 72% | 72% | 72% | 72% | 72% | 72% | 72% | 72% | 68% | 71%
Gemini 2.5 Pro | 89% | 87% | 80% | 74% | 74% | 72% | 69% | 64% | 40% | 40% | 69%
Llama 3.2 90B (Vision) | 95% | 83% | 80% | 80% | 76% | 67% | 61% | 54% | 53% | 34% | 68%
Claude Opus 4 | 83% | 75% | 70% | 70% | 66% | 64% | 61% | 61% | 61% | 53% | 66%
GPT-4o Mini (temp=1) | 100% | 80% | 80% | 72% | 70% | 65% | 59% | 55% | 34% | 29% | 64%
GPT-4o, May 13th (temp=1) | 92% | 87% | 80% | 74% | 72% | 57% | 51% | 48% | 39% | 13% | 61%
Llama 3 TenyxChat-DaybreakStorywriter 70B | 76% | 70% | 69% | 69% | 59% | 59% | 58% | 55% | 50% | 38% | 60%
Qwen 2.5 72B | 100% | 61% | 61% | 61% | 61% | 61% | 61% | 61% | 40% | 35% | 60%
GPT-4.1 Nano | 76% | 76% | 76% | 75% | 64% | 59% | 59% | 57% | 30% | 26% | 60%
Claude Opus 4.6 | 100% | 93% | 80% | 61% | 55% | 47% | 47% | 37% | 37% | 32% | 59%
Llama 3.1 70B | 74% | 72% | 65% | 61% | 60% | 59% | 51% | 46% | 43% | 38% | 57%
Sao10K L3.1 70B Hanami x1 | 76% | 75% | 74% | 58% | 52% | 46% | 44% | 40% | 36% | 25% | 53%
Claude Sonnet 4.5 | 84% | 69% | 69% | 62% | 60% | 58% | 44% | 42% | 17% | 2% | 51%
GPT-4.1 Mini | 87% | 82% | 67% | 59% | 59% | 42% | 38% | 34% | 32% | 4% | 50%
Claude Sonnet 4 | 83% | 77% | 72% | 72% | 66% | 60% | 23% | 15% | 12% | 3% | 48%
Qwen 2 72B | 72% | 67% | 63% | 59% | 55% | 47% | 36% | 34% | 24% | 12% | 47%
Claude 3.5 Haiku | 84% | 83% | 76% | 54% | 49% | 47% | 27% | 16% | 15% | 3% | 45%
Llama 3.1 8B | 87% | 84% | 74% | 48% | 45% | 33% | 30% | 25% | 21% | 5% | 45%
Llama 3.1 Euryale 70B v2.2 | 83% | 74% | 66% | 64% | 59% | 57% | 47% | 1% | 0% | 0% | 45%
Claude 3.7 Sonnet | 59% | 53% | 52% | 52% | 50% | 47% | 45% | 27% | 24% | 4% | 41%
Llama 3.1 Nemotron 70B | 66% | 55% | 52% | 45% | 43% | 37% | 30% | 29% | 27% | 27% | 41%
GPT-4 Turbo | 57% | 49% | 45% | 43% | 43% | 38% | 38% | 37% | 27% | 22% | 40%
Writer: Palmyra X5 | 67% | 62% | 53% | 45% | 42% | 37% | 22% | 18% | 8% | 0% | 35%
GPT-4o, Aug. 6th (temp=1) | 66% | 52% | 50% | 41% | 36% | 34% | 29% | 22% | 14% | 11% | 35%
Llama 3.2 11B (Vision) | 86% | 59% | 42% | 40% | 39% | 39% | 14% | 12% | 9% | 2% | 34%
Llama 3 Euryale 70B v2.1 | 56% | 50% | 44% | 37% | 37% | 37% | 22% | 14% | 10% | 7% | 31%
Claude 3.5 Sonnet | 62% | 54% | 53% | 44% | 19% | 18% | 16% | 13% | 6% | 1% | 28%
Claude Haiku 4.5 | 71% | 66% | 52% | 50% | 33% | 10% | 0% | 0% | 0% | 0% | 28%
Gemma 2 9B | 47% | 45% | 44% | 42% | 31% | 18% | 15% | 14% | 3% | 0% | 26%
Magnum 72B | 61% | 57% | 43% | 41% | 29% | 12% | 9% | 1% | 0% | 0% | 25%
Qwen 2 7B | 100% | 61% | 29% | 27% | 14% | 14% | 1% | 1% | 0% | 0% | 25%
Magnum v2 72B | 49% | 49% | 39% | 38% | 32% | 19% | 14% | 4% | 0% | 0% | 24%
Gemma 2 27B | 44% | 43% | 39% | 28% | 27% | 20% | 14% | 12% | 5% | 0% | 23%
Mistral Small Creative | 44% | 40% | 30% | 28% | 22% | 21% | 16% | 13% | 3% | 0% | 22%
Llama 3.2 1B | 69% | 61% | 35% | 15% | 15% | 15% | 5% | 0% | 0% | 0% | 21%
Claude 2.0 | 35% | 33% | 28% | 27% | 26% | 14% | 12% | 12% | 10% | 10% | 21%
MythoMist 7B | 100% | 37% | 15% | 12% | 7% | 6% | 3% | 1% | 0% | 0% | 18%
Gemini Pro 1.5 | 44% | 30% | 28% | 25% | 25% | 15% | 4% | 3% | 0% | 0% | 18%
Inflection 3 (Productivity) | 40% | 30% | 24% | 20% | 18% | 13% | 12% | 11% | 0% | 0% | 17%
Inflection 3 (PI) | 37% | 33% | 32% | 23% | 16% | 14% | 7% | 3% | 0% | 0% | 16%
Gemini Flash 1.5 | 27% | 25% | 20% | 20% | 17% | 13% | 13% | 13% | 12% | 0% | 16%
GPT-4o, Aug. 6th (temp=0) | 25% | 21% | 17% | 15% | 13% | 13% | 12% | 12% | 12% | 11% | 15%
Z.AI GLM 4.6 | 59% | 36% | 13% | 6% | 5% | 5% | 2% | 1% | 0% | 0% | 13%
Toppy M 7B | 36% | 32% | 22% | 20% | 12% | 2% | 0% | 0% | 0% | 0% | 13%
Lumimaid v0.2 8B | 52% | 30% | 14% | 12% | 10% | 5% | 0% | 0% | 0% | 0% | 12%
Gemini 2.5 Flash Lite | 57% | 34% | 4% | 4% | 2% | 0% | 0% | 0% | 0% | 0% | 10%
WizardLM 2 8x22b | 41% | 21% | 19% | 5% | 5% | 1% | 1% | 0% | 0% | 0% | 9%
Phi-3 Medium 128k | 32% | 15% | 15% | 10% | 7% | 5% | 3% | 0% | 0% | 0% | 9%
MN GRAND Gutenberg Lyra4 12B Madness | 17% | 16% | 11% | 10% | 10% | 7% | 6% | 6% | 2% | 0% | 9%
AI21 Jamba 1.5 Mini | 61% | 20% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 9%
Ministral 3B | 32% | 19% | 15% | 11% | 4% | 0% | 0% | 0% | 0% | 0% | 8%
Hermes 3 70B | 28% | 16% | 15% | 8% | 7% | 7% | 0% | 0% | 0% | 0% | 8%
lzlv 70B | 25% | 20% | 18% | 7% | 7% | 0% | 0% | 0% | 0% | 0% | 8%
Hermes 2 Theta 8B | 32% | 18% | 15% | 8% | 3% | 2% | 0% | 0% | 0% | 0% | 8%
Claude 3 Haiku | 27% | 15% | 13% | 10% | 4% | 0% | 0% | 0% | 0% | 0% | 7%
Phi-3 Mini 128k | 23% | 20% | 8% | 8% | 3% | 1% | 1% | 0% | 0% | 0% | 7%
Fimbulvetr 11B v2 | 19% | 10% | 9% | 8% | 7% | 2% | 0% | 0% | 0% | 0% | 6%
Cohere Command R+ (Aug. 2024) | 25% | 19% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
Mistral Nemo 12B Celeste | 12% | 12% | 11% | 9% | 0% | 0% | 0% | 0% | 0% | 0% | 4%
Liquid: LFM 40B MoE | 25% | 14% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4%
Hermes 3 405B | 20% | 16% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4%
Goliath 120B | 15% | 9% | 5% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 3%
EVA Qwen 2.5 14B | 21% | 12% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3%
Claude 2.1 | 15% | 10% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3%
Phi-3.5 Mini 128k | 15% | 14% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3%
Cohere Command R+ (Apr. 2024) | 16% | 12% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3%
Ministral 8B | 13% | 7% | 2% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 2%
AI21 Jamba | 15% | 5% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2%
Mistral Medium | 8% | 3% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
MythoMax 13B | 3% | 3% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
Mistral Large | 6% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
Mistral NeMO | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Claude 3.0 Sonnet | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Gemini 2.5 Flash | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Rocinante 12B | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
DeepSeek-V2 Chat | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Z.AI GLM 4.5 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
AI21 Jamba 1.5 Large | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Mistral Large 2 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
32.96%
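The page does not state how the Total value for each model is derived. For the rows spot-checked here it sits close to the arithmetic mean of the ten run scores, so the sketch below computes that mean for one row. Treat this as an assumption: the displayed percentages may themselves be rounded, and at least one listed Total (GPT-4o Mini at temp=0, shown as 71%) does not match the rounded mean of its displayed runs. The helper name `mean_score` is hypothetical.

```python
def mean_score(runs: list[float]) -> float:
    """Arithmetic mean of per-run scores, in percent."""
    return sum(runs) / len(runs)


# Run scores as listed for MoonshotAI: Kimi K2.5 (Total shown as 98%).
kimi_runs = [100, 100, 100, 100, 100, 100, 100, 100, 94, 87]
print(f"{mean_score(kimi_runs):.1f}%")  # prints 98.1%

# Run scores as listed for GPT-4o Mini (temp=0) (Total shown as 71%).
mini_runs = [72, 72, 72, 72, 72, 72, 72, 72, 72, 68]
print(f"{mean_score(mini_runs):.1f}%")  # prints 71.6%
```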