openai/gpt-5

GPT-5

Release Date: Aug 7th, 2025
Parameters: not listed
Context Size: 400k tokens
Benchmark Cost: $13.52
Speed: 55.7 tok/s

Creative writing: 83.95%
Rule following: 93.82%
Utility: 81.93%
Mathematics: 100.00%
Tooling: 82.79%
Language: 84.73%
Logic: 91.68%

Codex Violation Detection

Detect factual inconsistencies between a story bible and prose passages. The model must output structured XML that identifies each violation by its paragraph number and the offending substring.

Categories: Tooling, Utility, Logic, Rule following

Group   Shots   Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9  Run 10  Total
matrix  0-shot    93%    93%    93%    92%    92%    91%    91%    90%    90%     90%    92%
matrix  0-shot    98%    98%    96%    96%    96%    96%    96%    94%    93%     92%    96%
matrix  0-shot   100%   100%    95%    95%    95%    95%    90%    90%    90%     86%    94%
matrix  0-shot   100%   100%   100%   100%   100%   100%   100%   100%   100%    100%   100%
tiers   0-shot   100%   100%   100%   100%   100%   100%    91%    91%    91%     91%    96%
tiers   0-shot   100%   100%   100%   100%    92%    92%    92%    92%    92%     92%    95%
tiers   0-shot    97%    97%    97%    97%    97%    97%    97%    97%    97%     92%    97%
tiers   0-shot   100%   100%   100%   100%   100%   100%    97%    95%    92%     92%    98%

Overall: 95.86%
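The page does not show the exact XML schema this task expects. As a sketch of what checking such output could involve, the Python below assumes a hypothetical format (a `<violations>` root holding `<violation paragraph="N">substring</violation>` elements, with 1-based paragraph numbers) and keeps only the violations whose substring actually occurs in the cited paragraph. The element names, the sample passage, and the helper functions are illustrative assumptions, not the benchmark's grader.

```python
import xml.etree.ElementTree as ET

# Hypothetical model response; the benchmark's real XML schema is not shown on the page.
SAMPLE_RESPONSE = """
<violations>
  <violation paragraph="2">blue eyes</violation>
  <violation paragraph="3">in Paris</violation>
</violations>
"""


def parse_violations(xml_text):
    """Return (paragraph_number, substring) pairs from the assumed XML format."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for node in root.iter("violation"):
        paragraph = int(node.get("paragraph"))
        substring = (node.text or "").strip()
        pairs.append((paragraph, substring))
    return pairs


def check_violations(pairs, paragraphs):
    """Keep pairs whose substring really occurs in the cited (1-based) paragraph."""
    valid = []
    for number, substring in pairs:
        in_range = 1 <= number <= len(paragraphs)
        if in_range and substring and substring in paragraphs[number - 1]:
            valid.append((number, substring))
    return valid


# Hypothetical prose passage, already split into paragraphs.
paragraphs = [
    "Mara stepped off the night train.",
    "She blinked her blue eyes against the glare.",
    "By noon she was in Paris.",
]

print(check_violations(parse_violations(SAMPLE_RESPONSE), paragraphs))
# [(2, 'blue eyes'), (3, 'in Paris')]
```

This only verifies that each citation points at real text in the right paragraph; deciding whether the cited text genuinely contradicts the story bible is the model's part of the task.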

Dialogue tags

Various tasks related to dialogue tags in text.

Categories: Creative writing, Rule following

Group         Shots   Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9  Run 10  Total
dialogue-200  0-shot   100%   100%   100%   100%   100%   100%   100%   100%   100%    100%   100%
dialogue-200  0-shot   100%   100%   100%   100%   100%   100%   100%   100%   100%     49%    95%
dialogue-200  0-shot   100%    97%    89%    68%    68%    50%    49%    49%    45%      2%    62%
dialogue-500  0-shot   100%   100%   100%   100%   100%    99%    98%    95%    39%      5%    83%
dialogue-500  0-shot   100%   100%   100%   100%   100%   100%    99%    98%    97%     50%    94%
dialogue-500  0-shot    99%    91%    85%    78%    67%    62%    45%     5%     0%      0%    53%
Ungrouped     0-shot   100%   100%   100%   100%   100%   100%   100%   100%   100%    100%   100%

Overall: 83.95%

Language Comprehension

Does the model understand more than just English?

Categories: Language

Shots   Run 1  Run 2  Run 3  Run 4  Run 5  Total
0-shot   100%   100%   100%   100%   100%   100%
0-shot   100%   100%   100%   100%     0%    80%
0-shot   100%   100%   100%     0%     0%    60%
0-shot   100%   100%   100%   100%     0%    80%

Overall: 80.00%

Language Writing

Can the model generate text in different languages?

Categories: Language

Shots   Run 1  Run 2  Run 3  Run 4  Run 5  Total
0-shot    90%    89%    87%    87%    81%    87%
0-shot    97%    92%    91%    80%    79%    88%
0-shot   100%   100%   100%   100%    50%    90%
0-shot    97%    93%    90%    88%    83%    90%
0-shot    97%    88%    88%    88%    79%    88%

Overall: 88.51%

N-Length Sentences

Write sentences with exactly N words.

Categories: Rule following

Shots   Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9  Run 10  Total
0-shot   100%   100%   100%   100%   100%   100%   100%   100%   100%    100%   100%
0-shot   100%   100%   100%   100%   100%   100%   100%   100%    99%     98%   100%
0-shot   100%   100%   100%   100%   100%   100%   100%   100%    96%     96%    99%

Overall: 99.60%
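Scoring this task hinges on what counts as a word, which the page does not define. A minimal sketch, assuming a word is any whitespace-separated token containing at least one letter or digit (so a lone hyphen is not counted) and that a run's score is the fraction of sentences that hit N exactly; the function names are hypothetical:

```python
def count_words(sentence: str) -> int:
    """Count whitespace-separated tokens that contain at least one letter or digit.

    Assumption: punctuation-only tokens such as "-" are not words.
    """
    return sum(1 for token in sentence.split() if any(ch.isalnum() for ch in token))


def has_exactly_n_words(sentence: str, n: int) -> bool:
    return count_words(sentence) == n


def run_score(sentences: list[str], n: int) -> float:
    """Fraction of sentences with exactly n words (assumed per-run score)."""
    if not sentences:
        return 0.0
    return sum(has_exactly_n_words(s, n) for s in sentences) / len(sentences)


outputs = [
    "The old lighthouse keeper counted every passing ship.",  # 8 words
    "Rain fell.",                                             # 2 words
    "She smiled - and the room went quiet again.",            # 8 words; "-" is skipped
]
print(run_score(outputs, 8))  # 2 of the 3 sentences match
```

Counting with `str.isalnum` keeps the check usable for non-English output too, since it accepts any Unicode letter or digit rather than only ASCII.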

Novel outline

Handle questions about a novel's outline, given in various formats.

Categories: Tooling, Utility

Group          Shots   Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9  Run 10  Total
outline-count  0-shot   100%   100%   100%   100%   100%   100%     0%     0%     0%      0%    60%
outline-count  0-shot   100%   100%   100%   100%   100%   100%   100%   100%   100%      0%    90%
outline-count  0-shot   100%   100%   100%   100%   100%   100%   100%   100%   100%    100%   100%
pov-count      0-shot   100%   100%   100%   100%   100%   100%   100%   100%    50%      0%    85%
pov-count      0-shot   100%   100%     0%     0%     0%     0%     0%     0%     0%      0%    20%
pov-count      0-shot   100%   100%     0%     0%     0%     0%     0%     0%     0%      0%    20%

Overall: 62.50%
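The page does not state how the Total column or the section scores are computed. The numbers are consistent with plain unweighted means: for Novel outline, the scenario totals 60, 90, 100, 85, 20 and 20 average to exactly 62.5. A sketch of that assumed aggregation, using the per-run values from the Novel outline results (the function names are hypothetical):

```python
# Per-run scores (percent) for the six Novel outline scenarios, in table order.
novel_outline_runs = [
    [100, 100, 100, 100, 100, 100, 0, 0, 0, 0],        # outline-count
    [100, 100, 100, 100, 100, 100, 100, 100, 100, 0],  # outline-count
    [100] * 10,                                        # outline-count
    [100, 100, 100, 100, 100, 100, 100, 100, 50, 0],   # pov-count
    [100, 100, 0, 0, 0, 0, 0, 0, 0, 0],                # pov-count
    [100, 100, 0, 0, 0, 0, 0, 0, 0, 0],                # pov-count
]


def scenario_total(runs: list[float]) -> float:
    """Assumed Total column: the plain mean of a scenario's runs."""
    return sum(runs) / len(runs)


def section_score(all_runs: list[list[float]]) -> float:
    """Assumed section score: the unweighted mean of the scenario totals."""
    totals = [scenario_total(runs) for runs in all_runs]
    return sum(totals) / len(totals)


print([scenario_total(runs) for runs in novel_outline_runs])
# [60.0, 90.0, 100.0, 85.0, 20.0, 20.0]
print(f"{section_score(novel_outline_runs):.2f}%")
# 62.50%
```

Recomputing other sections the same way from the rounded per-run percentages lands close to, but not always exactly on, the published figure (Codex Violation Detection comes out at about 95.79% rather than 95.86%), which fits the displayed run values being rounded.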

Tool usage within Novelcrafter

Output messages related to tool usage within Novelcrafter.

Categories: Tooling, Utility

Shots   Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9  Run 10  Total
0-shot   100%   100%   100%   100%   100%   100%   100%   100%   100%    100%   100%

Overall: 100.00%

Voice/dialogue sheets

Extract dialogue from given text as voice sheets.

Categories: Utility

Shots     Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9  Run 10  Total
0-shot       0%     0%     0%     0%     0%     0%     0%     0%     0%      0%     0%
0-shot       0%     0%     0%     0%     0%     0%     0%     0%     0%      0%     0%
1-shot     100%   100%   100%   100%   100%   100%   100%   100%   100%    100%   100%
Few-shot   100%   100%   100%   100%   100%   100%   100%   100%   100%      0%    90%
0-shot     100%   100%   100%   100%   100%   100%   100%   100%   100%    100%   100%

Overall: 58.00%
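The page does not show what a voice sheet must look like. Assuming it simply collects each character's spoken lines, a naive Python sketch pairs each double-quoted line with the speaker named in the dialogue tag that follows it (`Name said`, `Name asked`, and so on). The regex, the list of tag verbs, and the output shape are all assumptions; untagged lines, pronoun tags, and curly quotes would need far more than this.

```python
import re
from collections import defaultdict

# Naive assumption: a spoken line is a double-quoted span followed by a tag like
# `Mara said`. Untagged lines, pronoun tags ("she said"), and curly quotes are ignored.
TAGGED_LINE = re.compile(
    r'"([^"]+)"\s*([A-Z][a-z]+)\s+(?:said|asked|replied|whispered|shouted)\b'
)


def voice_sheets(text: str) -> dict[str, list[str]]:
    """Group quoted lines by the speaker named in the dialogue tag that follows them."""
    sheets = defaultdict(list)
    for line, speaker in TAGGED_LINE.findall(text):
        # Drop the comma that closes a tagged line inside the quotes.
        sheets[speaker].append(line.strip().rstrip(","))
    return dict(sheets)


passage = (
    '"We leave at dawn," Mara said. '
    '"And if the bridge is out?" Tomas asked. '
    '"Then we swim," Mara replied.'
)
print(voice_sheets(passage))
# {'Mara': ['We leave at dawn', 'Then we swim'], 'Tomas': ['And if the bridge is out?']}
```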