openai/gpt-5.4

GPT-5.4 (Reasoning)

Release Date

Mar 5th, 2026

Context Size

1M tokens

Reasoning

Yes

Benchmark Cost

$22.15

Speed

48.7 tok/s

Categories

Creative Writing: 91.2%
Tooling: 100.0%
Language: 94.9%
Utility: 96.9%
Reasoning: 94.8%
Text Editing: 98.4%
Rule Following: 79.3%
Hallucination: 90.4%

Subcategories

Chart axes: AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.

Scenario   #1   #2   #3   #4   #5   Total
Detailed Writing Rules
           94   93   92   92   90   92%
           94   93   93   92   92   93%
           89   89   88   85   84   87%
           96   92   92   90   86   91%
           91   89   88   87   84   88%
           92   91   90   90   89   91%
Detailed Writing Rules: 90.15%
genre
           93   89   89   89   86   89%
           93   88   87   87   86   88%
           89   89   88   88   87   88%
           94   93   93   91   91   92%
           93   91   90   90   88   91%
           94   93   92   89   87   91%
genre: 89.98%
Novelcrafter Default Prompt
           94   90   90   89   89   90%
           96   93   93   91   85   92%
           88   86   86   86   84   86%
           95   94   93   91   90   93%
           91   90   89   88   86   89%
           95   95   92   92   89   93%
Novelcrafter Default Prompt: 90.37%
Total: 90.17%
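
The 90.17% figure for Bad Writing Habits is consistent with an unweighted mean of the three prompt-group averages (90.15%, 89.98%, 90.37%). The page does not state how scores are aggregated, so the snippet below is only a sanity check under that assumption; `unweighted_mean` is a hypothetical helper, not part of the benchmark.

```python
# Group averages copied from the Bad Writing Habits results above.
group_averages = {
    "Detailed Writing Rules": 90.15,
    "genre": 89.98,
    "Novelcrafter Default Prompt": 90.37,
}


def unweighted_mean(values):
    """Plain arithmetic mean; assumes every group carries equal weight."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: the section score is the mean of the group averages,
# rounded to two decimals as displayed on the page.
section_score = round(unweighted_mean(group_averages.values()), 2)
print(f"{section_score:.2f}%")
```

The same assumption also matches the Text Replacement section, where the mean of 97.86% and 99.80% is 98.83%.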

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.

Scenario   #1   #2   #3   #4   #5   Total
           98   98   98   98   97   98%
           98   98   98   98   97   98%
           98   98   98   97   97   97%
           94   94   94   94   94   94%
Total: 96.63%
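
As a rough illustration of the "well-formed XML" requirement in Codex Extraction, the sketch below parses a model response with Python's standard `xml.etree.ElementTree` and reports whether it parses at all. The element names (`codex`, `entry`, `name`) are hypothetical; the page does not show the schema the benchmark expects, and it may check considerably more than well-formedness.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical responses; the tag names are illustrative only.
good = '<codex><entry type="character"><name>Mara</name></entry></codex>'
bad = '<codex><entry type="character"><name>Mara</entry></codex>'  # <name> never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```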

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.

Scenario   #1   #2   #3   #4   #5   #6   #7   Total
Generic Prompt
           100  100  100  100  100  100  100  100%
           100  100  100  100  100  100  100  100%
           100  99   99   99   99   99   96   99%
           100  100  100  99   99   99   98   99%
           100  100  100  100  100  100  100  100%
           100  100  100  100  100  99   74   96%
           98   98   98   97   97   97   96   97%
           100  100  100  100  100  100  100  100%
           99   92   89   88   88   88   81   89%
Generic Prompt: 97.86%
Specific Prompt
           100  100  100  100  100  100  100  100%
           100  100  100  100  100  100  100  100%
           100  100  100  100  100  100  100  100%
           100  100  100  100  100  100  100  100%
           100  100  100  100  100  100  100  100%
           100  100  100  100  100  100  100  100%
           100  100  98   98   98   98   96   98%
           100  100  100  100  100  100  100  100%
           100  100  100  100  100  99   99   100%
Specific Prompt: 99.80%
Total: 98.83%
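
The Text Replacement description says each expected change is checked independently. One way to read that is as a list of separate pass/fail checks whose pass rate becomes the score. The sketch below follows that reading; the check format (`"present"` / `"absent"` substring tests) and the function name `score_replacement` are assumptions for illustration, not the benchmark's actual scorer.

```python
def score_replacement(output: str, checks: list[tuple[str, str]]) -> float:
    """Score an output as the percentage of independent checks it passes.

    Each check is (kind, text): "present" passes when text occurs in the
    output, "absent" passes when it does not.
    """
    passed = 0
    for kind, text in checks:
        found = text in output
        if (kind == "present" and found) or (kind == "absent" and not found):
            passed += 1
    return 100.0 * passed / len(checks)


# Hypothetical task: rename "Elena" to "Mara" and expand contractions.
checks = [
    ("present", "Mara"),     # new name appears
    ("absent", "Elena"),     # old name is gone
    ("present", "did not"),  # contraction expanded
    ("absent", "didn't"),    # contraction removed
]

full = "Mara did not open the door. Mara waited."
partial = "Mara didn't open the door. Mara waited."

print(score_replacement(full, checks))
print(score_replacement(partial, checks))
```

Scoring each change separately means a response that renames the character correctly but misses the contraction still earns partial credit, which would explain non-round per-run scores such as 74 or 81.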

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario   #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
           100  100  100  100  100  100  100  100  100  100  100%
Total: 100.00%