openai/gpt-5-mini

GPT-5 Mini

Release Date: Aug 7th, 2025

Parameters: (not listed)

Context Size: 400k

Benchmark Cost: $2.27

Speed: 77.9 tok/s

Creative writing: 81.05%

Rule following: 93.20%

Utility: 90.78%

Mathematics: 100.00%

Tooling: 95.67%

Language: 91.65%

Logic: 91.57%

Codex Violation Detection

Detects factual inconsistencies between a story bible and prose passages. The model must output structured XML identifying each violation with paragraph number and substring.

Scenario Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
matrix
0-shot (Tooling, Utility, Logic, Rule following) 95% 94% 94% 93% 93% 92% 92% 92% 92% 91% 93%
0-shot (Tooling, Utility, Logic, Rule following) 96% 94% 94% 94% 94% 93% 91% 89% 83% 78% 91%
0-shot (Tooling, Utility, Logic, Rule following) 95% 95% 90% 90% 90% 90% 90% 90% 90% 86% 91%
0-shot (Tooling, Utility, Logic, Rule following) 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
tiers
0-shot (Tooling, Utility, Logic, Rule following) 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
0-shot (Tooling, Utility, Logic, Rule following) 100% 92% 92% 92% 92% 92% 92% 92% 92% 92% 93%
0-shot (Tooling, Utility, Logic, Rule following) 92% 89% 89% 89% 89% 89% 89% 88% 86% 85% 89%
0-shot (Tooling, Utility, Logic, Rule following) 100% 100% 100% 100% 100% 100% 100% 100% 95% 95% 99%
94.38%
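
To illustrate the kind of output this task asks for, here is a minimal sketch of reading an XML violation report and checking that each reported substring really occurs in the paragraph it cites. The benchmark's actual schema is not given on this page: the element names `violations`, `violation`, and `substring`, the `paragraph` attribute, and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_violations(xml_text):
    """Return (paragraph_number, substring) pairs from a violation report.

    Assumes a hypothetical schema:
    <violations><violation paragraph="N"><substring>...</substring></violation></violations>
    """
    root = ET.fromstring(xml_text)
    found = []
    for violation in root.iter("violation"):
        paragraph = int(violation.get("paragraph"))
        substring = (violation.findtext("substring") or "").strip()
        found.append((paragraph, substring))
    return found


def is_grounded(violation, paragraphs):
    """Check that the substring appears in the cited paragraph (1-indexed)."""
    number, substring = violation
    if not 1 <= number <= len(paragraphs):
        return False
    return bool(substring) and substring in paragraphs[number - 1]


# Invented sample data, for illustration only.
sample_report = """
<violations>
  <violation paragraph="2">
    <substring>her blue eyes</substring>
  </violation>
</violations>
"""
sample_paragraphs = [
    "Mara stepped off the train.",
    "The lamplight caught her blue eyes as she turned.",
]

violations = parse_violations(sample_report)
grounded = [is_grounded(v, sample_paragraphs) for v in violations]
```

A grader along these lines could reject reports that cite text not present in the passage before comparing them with the expected violations.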

Dialogue tags

Various tasks related to dialogue tags in text.

Scenario Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
dialogue-200
0-shot (Creative writing, Rule following) 100% 100% 100% 100% 100% 100% 100% 100% 100% 99% 100%
0-shot (Creative writing, Rule following) 100% 100% 100% 100% 100% 100% 100% 100% 50% 50% 90%
0-shot (Creative writing, Rule following) 100% 100% 72% 50% 50% 50% 50% 50% 50% 49% 62%
dialogue-500
0-shot (Creative writing, Rule following) 100% 100% 100% 99% 97% 96% 96% 88% 73% 42% 89%
0-shot (Creative writing, Rule following) 100% 100% 100% 99% 86% 79% 72% 50% 50% 49% 78%
0-shot (Creative writing, Rule following) 100% 100% 100% 99% 99% 50% 50% 50% 49% 0% 70%
Ungrouped
0-shot (Creative writing, Rule following) 100% 100% 100% 100% 100% 100% 61% 61% 61% 0% 78%
81.05%

Language Comprehension

Does the model understand more than just English?

Scenario Run 1 Run 2 Run 3 Run 4 Run 5 Total
0-shot (Language) 100% 100% 100% 100% 0% 80%
0-shot (Language) 100% 100% 100% 100% 100% 100%
0-shot (Language) 100% 100% 100% 100% 0% 80%
0-shot (Language) 100% 100% 100% 100% 100% 100%
90.00%
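
The page does not say how the figure that closes each section is derived. For this section it is consistent with an unweighted mean of the four scenario totals (80%, 100%, 80%, 100%). The sketch below assumes that rule; `section_score` is a hypothetical helper name, not part of the benchmark.

```python
from statistics import mean


def section_score(scenario_totals):
    """Unweighted mean of per-scenario totals (in percent), rounded to 2 places.

    Assumption: each scenario counts equally toward the section figure.
    """
    return round(mean(scenario_totals), 2)


# Scenario totals as displayed in the Language Comprehension table.
language_comprehension = [80, 100, 80, 100]
print(f"{section_score(language_comprehension):.2f}%")
```

Where the displayed totals are themselves rounded, a mean taken over them can differ from the published section figure by a fraction of a point.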

Language Writing

Can the model generate text in different languages?

Scenario Run 1 Run 2 Run 3 Run 4 Run 5 Total
0-shot (Language) 100% 100% 96% 94% 89% 96%
0-shot (Language) 95% 94% 94% 88% 85% 91%
0-shot (Language) 100% 100% 95% 94% 55% 89%
0-shot (Language) 100% 100% 100% 96% 94% 98%
0-shot (Language) 100% 100% 92% 81% 81% 91%
92.97%

N-Length Sentences

Write sentences with exactly N words.

Scenario Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
0-shot (Rule following) 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
0-shot (Rule following) 100% 100% 100% 100% 100% 100% 100% 100% 98% 97% 100%
0-shot (Rule following) 100% 100% 100% 100% 100% 100% 95% 93% 90% 83% 96%
98.56%
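
As a rough illustration of what "exactly N words" could mean in practice, here is a minimal checker. It assumes words are whitespace-separated tokens and that a run is scored as the share of sentences with the right count; the page does not describe the benchmark's real tokenization or scoring, and both function names are hypothetical.

```python
def has_exactly_n_words(sentence: str, n: int) -> bool:
    """True if the sentence splits into exactly n whitespace-separated tokens.

    Assumption: punctuation attached to a word does not count as a separate word.
    """
    return len(sentence.split()) == n


def run_score(sentences, n):
    """Percentage of sentences that contain exactly n words (0.0 for no input)."""
    if not sentences:
        return 0.0
    hits = sum(has_exactly_n_words(s, n) for s in sentences)
    return 100.0 * hits / len(sentences)


# Invented sample sentences, for illustration only.
samples = [
    "The cat sat on the mat.",       # 6 words
    "Rain fell hard all night long.",  # 6 words
    "She left early.",               # 3 words
    "Nobody answered the second knock either.",  # 6 words
]
score = run_score(samples, 6)
```

Under these assumptions three of the four sample sentences pass for N = 6, so the run would score 75%.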

Novel outline

Handle questions about the outline of a novel in various formats.

Scenario Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
outline-count
0-shot (Tooling, Utility) 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
0-shot (Tooling, Utility) 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
0-shot (Tooling, Utility) 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
pov-count
0-shot (Tooling, Utility) 100% 100% 100% 100% 100% 100% 100% 100% 50% 50% 90%
0-shot (Tooling, Utility) 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
0-shot (Tooling, Utility) 100% 100% 100% 100% 100% 100% 100% 100% 100% 0% 90%
96.67%

Tool usage within Novelcrafter

Output messages related to tool usage within Novelcrafter.

Scenario Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
0-shot (Tooling, Utility) 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
100.00%

Voice/dialogue sheets

Extract dialogue from given text as voice sheets.

Scenario Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
0-shot (Utility) 0% 0% 0% 0% 0% 0% 0% 0% 0% 0% 0%
0-shot (Utility) 100% 100% 100% 100% 100% 100% 100% 0% 0% 0% 70%
1-shot (Utility) 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
Few-shot (Utility) 100% 100% 100% 100% 100% 100% 100% 100% 100% 0% 90%
0-shot (Utility) 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
72.00%