openai/gpt-5.1

GPT-5.1

Release Date: Nov 13th, 2025
Parameters: (not listed)
Context Size: 400k
Benchmark Cost: $7.77
Speed: 49.3 tok/s

Creative writing: 62.70%
Rule following: 89.92%
Utility: 89.46%
Mathematics: 100.00%
Tooling: 96.18%
Language: 82.45%
Logic: 91.94%

Codex Violation Detection

Detects factual inconsistencies between a story bible and prose passages. The model must output structured XML identifying each violation with paragraph number and substring.

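Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
Group: matrix
0-shot (Tooling, Utility, Logic, Rule following) | 95% | 95% | 93% | 93% | 92% | 91% | 91% | 90% | 87% | 86% | 91%
0-shot (Tooling, Utility, Logic, Rule following) | 96% | 96% | 96% | 96% | 96% | 93% | 92% | 92% | 92% | 91% | 94%
0-shot (Tooling, Utility, Logic, Rule following) | 100% | 100% | 100% | 100% | 95% | 95% | 95% | 95% | 95% | 95% | 97%
0-shot (Tooling, Utility, Logic, Rule following) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Group: tiers
0-shot (Tooling, Utility, Logic, Rule following) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
0-shot (Tooling, Utility, Logic, Rule following) | 100% | 100% | 100% | 100% | 92% | 92% | 92% | 92% | 92% | 92% | 95%
0-shot (Tooling, Utility, Logic, Rule following) | 97% | 97% | 97% | 97% | 97% | 97% | 97% | 94% | 92% | 92% | 96%
0-shot (Tooling, Utility, Logic, Rule following) | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 97% | 95% | 93% | 98%
Benchmark score: 96.37%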
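
The task description above says the model must return structured XML that identifies each violation by paragraph number and substring. The page does not show the schema, so the following is only a minimal sketch of how such output might be parsed: the `violations`, `violation`, and `substring` element names, the `paragraph` attribute, and the sample text are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample response. The element and attribute names are assumptions
# for illustration; the benchmark's real schema is not shown on the page.
SAMPLE_OUTPUT = """
<violations>
  <violation paragraph="2">
    <substring>her green eyes</substring>
  </violation>
  <violation paragraph="5">
    <substring>the summer of 1994</substring>
  </violation>
</violations>
"""


def parse_violations(xml_text):
    """Return a list of (paragraph_number, substring) pairs."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for node in root.iter("violation"):
        paragraph = int(node.get("paragraph"))
        substring = (node.findtext("substring") or "").strip()
        results.append((paragraph, substring))
    return results


if __name__ == "__main__":
    for paragraph, substring in parse_violations(SAMPLE_OUTPUT):
        print(paragraph, substring)
```

A grader along these lines could then compare the parsed pairs against the expected violations for each passage.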

Dialogue tags

Various tasks related to dialogue tags in text.

Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
Group: dialogue-200
0-shot (Creative writing, Rule following) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
0-shot (Creative writing, Rule following) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 50% | 95%
0-shot (Creative writing, Rule following) | 100% | 100% | 95% | 89% | 85% | 68% | 68% | 68% | 68% | 42% | 78%
Group: dialogue-500
0-shot (Creative writing, Rule following) | 100% | 49% | 48% | 35% | 21% | 2% | 0% | 0% | 0% | 0% | 25%
0-shot (Creative writing, Rule following) | 87% | 83% | 64% | 50% | 2% | 0% | 0% | 0% | 0% | 0% | 29%
0-shot (Creative writing, Rule following) | 50% | 41% | 34% | 24% | 5% | 3% | 2% | 1% | 0% | 0% | 16%
Group: Ungrouped
0-shot (Creative writing, Rule following) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 61% | 96%
Benchmark score: 62.70%

Language Comprehension

Does the model understand more than just English?

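Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Total
0-shot (Language) | 100% | 100% | 100% | 100% | 100% | 100%
0-shot (Language) | 100% | 100% | 100% | 100% | 0% | 80%
0-shot (Language) | 100% | 100% | 0% | 0% | 0% | 40%
0-shot (Language) | 100% | 100% | 100% | 100% | 0% | 80%
Benchmark score: 75.00%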
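
The page does not state how the figures are aggregated, but the numbers in this section are consistent with each scenario total being the mean of its runs and the benchmark score being the mean of the scenario totals. A small sketch of that arithmetic, using the run values listed above (the variable names are illustrative):

```python
from statistics import mean

# Run scores (in percent) for the four Language Comprehension scenarios,
# copied from the table above.
runs_by_scenario = [
    [100, 100, 100, 100, 100],
    [100, 100, 100, 100, 0],
    [100, 100, 0, 0, 0],
    [100, 100, 100, 100, 0],
]

# Assumed aggregation: mean of runs per scenario, then mean of scenario totals.
scenario_totals = [mean(runs) for runs in runs_by_scenario]
benchmark_score = mean(scenario_totals)

print(scenario_totals)           # per-scenario totals
print(f"{benchmark_score:.2f}%")  # benchmark score
```

With these inputs the scenario totals come out to 100, 80, 40, and 80, and the benchmark score to 75.00%, matching the reported values.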

Language Writing

Can the model generate text in different languages?

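Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Total
0-shot (Language) | 94% | 86% | 85% | 83% | 81% | 86%
0-shot (Language) | 91% | 91% | 88% | 84% | 75% | 86%
0-shot (Language) | 100% | 100% | 100% | 100% | 50% | 90%
0-shot (Language) | 98% | 98% | 94% | 83% | 77% | 90%
0-shot (Language) | 98% | 94% | 88% | 86% | 85% | 90%
Benchmark score: 88.40%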

N-Length Sentences

Write sentences with exactly N words

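Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
0-shot (Rule following) | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 99% | 99% | 92% | 99%
0-shot (Rule following) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100%
0-shot (Rule following) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 90% | 88% | 98%
Benchmark score: 98.86%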
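
To make the task concrete, here is a rough sketch of how a sentence could be checked against the "exactly N words" rule. Counting whitespace-separated tokens is an assumption; the page does not describe how the benchmark itself counts words (for example, how it treats hyphenated words or numerals), and `has_exactly_n_words` is a hypothetical helper.

```python
def has_exactly_n_words(sentence, n):
    """Return True if the sentence has exactly n whitespace-separated words.

    Assumption: a "word" is any run of non-whitespace characters, so
    trailing punctuation stays attached to the word before it.
    """
    return len(sentence.split()) == n


if __name__ == "__main__":
    print(has_exactly_n_words("The cat sat on the mat.", 6))
    print(has_exactly_n_words("The cat sat on the mat.", 5))
```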

Novel outline

Handle questions about the outline of a novel in various formats

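Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
Group: outline-count
0-shot (Tooling, Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
0-shot (Tooling, Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
0-shot (Tooling, Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Group: pov-count
0-shot (Tooling, Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 0% | 85%
0-shot (Tooling, Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 90%
0-shot (Tooling, Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Benchmark score: 95.83%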

Tool usage within Novelcrafter

Output messages that are related to tool usage within Novelcrafter

Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
0-shot (Tooling, Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 67% | 97%
Benchmark score: 96.67%

Voice/dialogue sheets

Extract dialogue from given text as voice sheets.

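Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
0-shot (Utility) | 100% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 20%
0-shot (Utility) | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
1-shot (Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Few-shot (Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
0-shot (Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Benchmark score: 64.00%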