GPT-5.1 - NC Bench

openai/gpt-5.1

GPT-5.1

via OpenRouter

Release Date

Nov 13th, 2025

Parameters

–

Context Size

400k

Benchmark Cost

$7.77

Speed

49.3 tok/s

Creative writing

62.70%

Rule following

89.92%

Utility

89.46%

Mathematics

100.00%

Tooling

96.18%

Language

82.45%

Logic

91.94%

Codex Violation Detection

Detects factual inconsistencies between a story bible and prose passages. The model must output structured XML identifying each violation with paragraph number and substring.

Scenario	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10	Total
matrix
Large codex (40 entries), long passage (1,019 words) 0-shot Tooling, Utility, Logic, Rule following	95%	95%	93%	93%	92%	91%	91%	90%	87%	86%	91%
Large codex (40 entries), short passage (165 words) 0-shot Tooling, Utility, Logic, Rule following	96%	96%	96%	96%	96%	93%	92%	92%	92%	91%	94%
Small codex (7 entries), long passage (734 words) 0-shot Tooling, Utility, Logic, Rule following	100%	100%	100%	100%	95%	95%	95%	95%	95%	95%	97%
Small codex (7 entries), short passage (165 words) 0-shot Tooling, Utility, Logic, Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
tiers
5 codex entries 0-shot Tooling, Utility, Logic, Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
10 codex entries 0-shot Tooling, Utility, Logic, Rule following	100%	100%	100%	100%	92%	92%	92%	92%	92%	92%	95%
20 codex entries 0-shot Tooling, Utility, Logic, Rule following	97%	97%	97%	97%	97%	97%	97%	94%	92%	92%	96%
40 codex entries 0-shot Tooling, Utility, Logic, Rule following	100%	100%	100%	100%	97%	97%	97%	97%	95%	93%	98%
96.37%
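
The task description above says the model must return structured XML that names each violation by paragraph number and substring. The page does not show the actual schema, so the following is only a minimal sketch of how such output could be parsed and sanity-checked. The element name `violation`, the `paragraph` attribute, the sample text, and both helper functions are assumptions made for illustration, not the benchmark's real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical response format: the benchmark's real schema is not shown on this page.
SAMPLE_OUTPUT = """
<violations>
  <violation paragraph="2">her green eyes</violation>
  <violation paragraph="5">the northern gate</violation>
</violations>
"""


def parse_violations(xml_text):
    """Return (paragraph_number, substring) pairs found in a model response."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for node in root.iter("violation"):
        paragraph = int(node.get("paragraph"))
        substring = (node.text or "").strip()
        results.append((paragraph, substring))
    return results


def substring_is_present(passage_paragraphs, paragraph, substring):
    """Check that the quoted substring occurs in the cited paragraph (1-indexed)."""
    if not 1 <= paragraph <= len(passage_paragraphs):
        return False
    return substring in passage_paragraphs[paragraph - 1]


if __name__ == "__main__":
    for paragraph, substring in parse_violations(SAMPLE_OUTPUT):
        print(paragraph, repr(substring))
```

Checking that the substring really appears in the cited paragraph is one plausible way to reject hallucinated spans before comparing against the expected violations.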

Data extraction

Extract key details from a given block of text.

Scenario	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10	Total
All valid emails 0-shot Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Contextual pronoun 0-shot Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Fruits excluding citrus 0-shot Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Future event time 0-shot Utility, Logic	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Guess the pet 0-shot Utility, Logic	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Highest-rated movie 0-shot Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Indirect birth year 0-shot Utility, Mathematics, Logic	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
What instrument does Lucy play? 0-shot Utility, Logic	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
What's the color of the car? 0-shot Utility, Logic	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
What's the correct time? 0-shot Utility, Logic	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
Who's the sister? 0-shot Utility, Logic	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Who's the tallest? 0-shot Utility, Logic	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
91.67%

Dialogue tags

Various tasks related to dialogue tags in text.

Scenario	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10	Total
dialogue-200
Write 200 words with 10% dialogue 0-shot Creative writing, Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Write 200 words with 50% dialogue 0-shot Creative writing, Rule following	100%	100%	100%	100%	100%	100%	100%	100%	96%	50%	95%
Write 200 words with 90% dialogue 0-shot Creative writing, Rule following	100%	100%	95%	89%	85%	68%	68%	68%	68%	42%	78%
dialogue-500
Write 500 words with 30% dialogue 0-shot Creative writing, Rule following	100%	49%	48%	35%	21%	2%	0%	0%	0%	0%	25%
Write 500 words with 50% dialogue 0-shot Creative writing, Rule following	87%	83%	64%	50%	2%	0%	0%	0%	0%	0%	29%
Write 500 words with 70% dialogue 0-shot Creative writing, Rule following	50%	41%	34%	24%	5%	3%	2%	1%	0%	0%	16%
Ungrouped
Write unattributed dialogue 0-shot Creative writing, Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	61%	96%
62.70%
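
The scenarios above ask for a target share of dialogue in a passage of fixed length. The page does not say how that share is measured, so the sketch below is only one plausible approximation: it counts the words inside straight double quotes and divides by the total word count. The function name `dialogue_ratio` and the sample sentence are hypothetical, and the benchmark's own scorer may differ (for example in how it treats curly quotes or dialogue tags).

```python
import re


def dialogue_ratio(text):
    """Fraction of whitespace-separated words that sit inside straight double quotes."""
    total_words = len(text.split())
    if total_words == 0:
        return 0.0
    # Assumption: dialogue is delimited by plain ASCII double quotes.
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    return dialogue_words / total_words


if __name__ == "__main__":
    sample = 'She paused. "Are you coming?" he asked.'
    print(f"{dialogue_ratio(sample):.0%} dialogue")
```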

Language Comprehension

Does the model understand more than just English?

Scenario	Run 1	Run 2	Run 3	Run 4	Run 5	Total
Asking for directions (Dutch) 0-shot Language	100%	100%	100%	100%	100%	100%
Asking for directions (German) 0-shot Language	100%	100%	100%	100%	0%	80%
Friend got new kittens (German) 0-shot Language	100%	100%	0%	0%	0%	40%
Friend got new kittens (Tagalog) 0-shot Language	100%	100%	100%	100%	0%	80%
75.00%
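
The figures in this table are consistent with a simple two-step average: each scenario's Total matches the mean of its runs, and the 75.00% that closes the table matches the unweighted mean of the four scenario totals. The page does not document its aggregation method, so the sketch below is an inference from the numbers shown rather than a description of the benchmark's code. The variable names are illustrative.

```python
from statistics import mean

# Run scores copied from the Language Comprehension table (percent).
runs = {
    "Asking for directions (Dutch)": [100, 100, 100, 100, 100],
    "Asking for directions (German)": [100, 100, 100, 100, 0],
    "Friend got new kittens (German)": [100, 100, 0, 0, 0],
    "Friend got new kittens (Tagalog)": [100, 100, 100, 100, 0],
}

# Step 1: average the runs of each scenario (the Total column).
scenario_totals = {name: mean(scores) for name, scores in runs.items()}

# Step 2: average the scenario totals, giving every scenario equal weight.
section_score = mean(list(scenario_totals.values()))

if __name__ == "__main__":
    for name, total in scenario_totals.items():
        print(f"{name}: {total}%")
    print(f"Section: {section_score:.2f}%")
```

The same calculation reproduces the closing figures of the other tables on this page to within the rounding of the displayed totals.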

Language Writing

Can the model generate text in different languages?

Scenario	Run 1	Run 2	Run 3	Run 4	Run 5	Total
Character dialogue (French) in a story 0-shot Language	94%	86%	85%	83%	81%	86%
Character dialogue (German) in a story 0-shot Language	91%	91%	88%	84%	75%	86%
Character dialogue (Hindi) in a story 0-shot Language	100%	100%	100%	100%	50%	90%
Character dialogue (Italian) in a story 0-shot Language	98%	98%	94%	83%	77%	90%
Character dialogue (Spanish) in a story 0-shot Language	98%	94%	88%	86%	85%	90%
88.40%

N-Length Sentences

Write sentences with exactly N words

Scenario	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10	Total
Write sentences with 5 words each 0-shot Rule following	100%	100%	100%	100%	100%	100%	99%	99%	99%	92%	99%
Write sentences with 10 words each 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	98%	100%
Write sentences with 20 words each 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	90%	88%	98%
98.86%
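
A check for the "exactly N words per sentence" rule could look roughly like the sketch below. It splits text on sentence-ending punctuation, counts whitespace-separated words, and reports the share of sentences that hit the target. Both function names are hypothetical, and the splitting rule is a simplifying assumption: the benchmark's own tokenization and scoring are not described on this page.

```python
import re


def sentence_word_counts(text):
    """Split on ., ! or ? followed by whitespace and count the words in each sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def share_with_exact_length(text, n):
    """Fraction of sentences that contain exactly n words."""
    counts = sentence_word_counts(text)
    if not counts:
        return 0.0
    return sum(1 for c in counts if c == n) / len(counts)


if __name__ == "__main__":
    sample = "The cat sat very still. Rain fell on the roof. Nobody came home."
    print(sentence_word_counts(sample))
    print(f"{share_with_exact_length(sample, 5):.0%} of sentences have 5 words")
```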

Novel outline

Handle questions about the outline of a novel in various formats

Scenario	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10	Total
outline-count
Count acts 0-shot Tooling, Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Count chapters 0-shot Tooling, Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Count scenes 0-shot Tooling, Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
pov-count
Count point of views for Jack and Olivia 0-shot Tooling, Utility	100%	100%	100%	100%	100%	100%	100%	100%	50%	0%	85%
Count point of views for Jack Harper 0-shot Tooling, Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	0%	90%
Count point of views for Olivia 0-shot Tooling, Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
95.83%

Tool usage within Novelcrafter

Output messages that are related to tool usage within Novelcrafter

Scenario	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10	Total
Create alternate prose sections 0-shot Tooling, Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	67%	97%
96.67%

Voice/dialogue sheets

Extract dialogue from given text as voice sheets.

Scenario	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10	Total
Multiple speakers 0-shot Utility	100%	100%	0%	0%	0%	0%	0%	0%	0%	0%	20%
Simple 0-shot Utility	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
Simple (1-shot) 1-shot Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Simple (5-shot) Few-shot Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Unattributed dialogue 0-shot Utility	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
64.00%

Write N of X

Write exactly N words/sentences/paragraphs...

Scenario	Run 1	Run 2	Run 3	Run 4	Run 5	Run 6	Run 7	Run 8	Run 9	Run 10	Total
paragraphs
1 paragraph summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
3 paragraph summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
5 paragraph summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
sentences
1 sentence summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
3 sentence summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
10 sentence summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	98%	100%
20 sentence summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	92%	99%
50 sentence summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
words
10 word summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
20 word summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
50 word summary 0-shot Rule following	100%	100%	100%	100%	100%	100%	98%	98%	98%	92%	99%
100 word summary 0-shot Rule following	100%	100%	100%	100%	100%	98%	98%	92%	92%	92%	97%
200 word summary 0-shot Rule following	100%	98%	98%	98%	98%	92%	92%	77%	77%	27%	86%
98.54%