openai/gpt-5

GPT-5

Release Date

Aug 7th, 2025

Context Size

400k

Reasoning

Yes

Benchmark Cost

$28.06

Speed

55.9 tok/s

Categories

Creative Writing: 86.9%
Tooling: 100.0%
Language: 91.5%
Utility: 93.5%
Reasoning: 95.7%
Text Editing: 98.9%
Rule Following: 77.1%
Hallucination: 91.8%

Subcategories

[Chart: per-subcategory scores for AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption]

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.

Scenario                      #1   #2   #3   #4   #5   Total
Detailed Writing Rules
                              88   87   84   83   82   85%
                              92   91   89   88   87   89%
                              90   86   85   84   84   86%
                              89   87   87   87   86   87%
                              87   84   83   83   82   84%
                              89   86   86   85   85   86%
Detailed Writing Rules                                 86.14%
genre
                              83   81   78   76   76   79%
                              86   85   82   82   80   83%
                              84   84   82   79   78   81%
                              87   87   86   83   83   85%
                              85   85   80   79   71   80%
                              85   83   83   83   79   82%
genre                                                  81.77%
Novelcrafter Default Prompt
                              87   84   82   82   77   82%
                              91   90   85   84   81   86%
                              86   84   84   82   82   84%
                              93   89   88   87   84   88%
                              89   87   83   82   81   85%
                              86   86   86   85   83   85%
Novelcrafter Default Prompt                            84.98%
Overall                                                84.30%
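
The overall figure is consistent with an unweighted mean of the three prompt-group scores, although the page does not state how it aggregates them. A minimal sketch, assuming simple averaging (the function name `overall_score` is illustrative, not part of the benchmark):

```python
# Group scores copied from the Bad Writing Habits results.
# ASSUMPTION: the overall score is an unweighted mean of the group scores;
# the benchmark does not document its aggregation rule.
group_scores = {
    "Detailed Writing Rules": 86.14,
    "genre": 81.77,
    "Novelcrafter Default Prompt": 84.98,
}


def overall_score(scores: dict) -> float:
    """Return the unweighted mean of the group scores, rounded to 2 decimals."""
    return round(sum(scores.values()) / len(scores), 2)


print(f"{overall_score(group_scores):.2f}%")  # prints 84.30%
```

The same rule reproduces the Text Replacement overall: the mean of 99.00 and 99.80 is 99.40.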

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.

Scenario   #1    #2    #3    #4    #5    Total
           97    97    97    96    95    96%
           99    99    99    99    99    99%
           98    98    98    98    96    98%
           100   99    99    99    99    99%
Overall                                  98.03%
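
The benchmark's own harness is not shown here. As a rough illustration of the "well-formed XML" requirement only, the sketch below checks whether a model response parses at all, using Python's standard `xml.etree.ElementTree`. The element names (`codex`, `entry`, `name`) and the sample strings are hypothetical, not the benchmark's actual schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical model outputs; the tag names are assumptions for illustration.
sample = '<codex><entry type="character"><name>Mira</name></entry></codex>'
broken = '<codex><entry type="character"><name>Mira</entry></codex>'

print(is_well_formed(sample))  # True
print(is_well_formed(broken))  # False: <name> is closed by </entry>
```

Well-formedness is only a precondition; a real scorer would also have to compare the extracted entries against the expected ones.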

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.

Scenario          #1    #2    #3    #4    #5    #6    #7    Total
Generic Prompt
                  100   100   100   100   100   100   100   100%
                  100   100   100   100   100   100   100   100%
                  99    99    99    99    99    99    90    98%
                  100   100   100   100   100   100   100   100%
                  100   100   100   100   100   100   100   100%
                  100   100   100   100   100   100   100   100%
                  98    98    98    98    97    97    96    97%
                  100   100   100   100   100   100   100   100%
                  100   100   100   100   93    89    89    96%
Generic Prompt                                              99.00%
Specific Prompt
                  100   100   100   100   100   100   100   100%
                  100   100   100   100   100   100   100   100%
                  100   100   100   100   100   100   99    100%
                  100   100   100   100   100   100   100   100%
                  100   100   100   100   100   100   100   100%
                  100   100   100   100   100   100   100   100%
                  99    99    99    99    98    97    96    98%
                  100   100   100   100   100   100   100   100%
                  100   100   100   100   100   100   100   100%
Specific Prompt                                             99.80%
Overall                                                     99.40%
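
Scoring "each expected change independently" can be pictured as a checklist in which every required substitution passes or fails on its own, so one missed rename does not zero the whole response. The sketch below is an assumption about the general approach, not the benchmark's code; `score_replacement`, the pass rule, and the sample strings are all hypothetical.

```python
def score_replacement(output: str, expected_changes: list[tuple[str, str]]) -> float:
    """Return the percentage of expected changes that the output satisfies.

    Each change is an (old, new) pair. ASSUMED pass rule: `new` appears in the
    output and `old` no longer does. Every pair is checked independently.
    """
    if not expected_changes:
        return 100.0
    passed = sum(
        1 for old, new in expected_changes
        if new in output and old not in output
    )
    return 100.0 * passed / len(expected_changes)


# Hypothetical task: rename a character and expand a contraction.
changes = [("Anna", "Elena"), ("didn't", "did not")]

print(score_replacement("Elena did not see the ship.", changes))  # 100.0
print(score_replacement("Elena didn't see the ship.", changes))   # 50.0
```

Plain substring checks are the simplest choice here; a stricter scorer might match on word boundaries to avoid false hits inside longer words.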

Tool usage within Novelcrafter

Evaluates the output messages a model produces for tool usage within Novelcrafter.

Scenario   #1    #2    #3    #4    #5    #6    #7    #8    #9    #10   Total
           100   100   100   100   100   100   100   100   100   100   100%
Overall                                                                100.00%