openai/gpt-5.4

GPT-5.4

Release Date

Mar 5th, 2026

Context Size

1M tokens

Reasoning

No

Benchmark Cost

$9.35

Speed

51.1 tok/s

Categories

Creative Writing: 90.9%
Tooling: 97.6%
Language: 81.5%
Utility: 81.9%
Reasoning: 93.9%
Text Editing: 96.7%
Rule Following: 58.1%
Hallucination: 73.8%

Subcategories

[Chart: per-subcategory scores, values not captured]
AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.

Scenario                       #1   #2   #3   #4   #5   Total
Detailed Writing Rules
                               91   91   90   88   87     89%
                               92   91   89   89   89     90%
                               94   93   91   89   86     91%
                               94   94   90   90   89     92%
                               95   93   92   90   88     92%
                               94   93   93   92   92     93%
Detailed Writing Rules                                 90.98%
genre
                               87   86   84   82   81     84%
                               90   89   89   87   86     88%
                               92   91   89   88   87     89%
                               95   92   92   90   89     92%
                               94   93   93   91   90     92%
                               94   92   92   91   91     92%
genre                                                  89.58%
Novelcrafter Default Prompt
                               94   92   91   89   86     90%
                               96   91   90   87   86     90%
                               89   89   88   88   85     88%
                               93   92   92   90   89     91%
                               93   91   89   87   83     89%
                               93   93   92   91   91     92%
Novelcrafter Default Prompt                            89.96%
Overall                                                90.17%
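
The description above names filter words as one of the detected anti-patterns. As a rough illustration only, a detector for that single pattern could be sketched as below. The word list, the function name `filter_word_rate`, and the per-100-words metric are all assumptions made for this example; the benchmark's actual detection rules are not given in the source.

```python
import re

# Hypothetical word list, chosen only for illustration; not the benchmark's list.
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}


def filter_word_rate(text: str) -> float:
    """Return the number of filter words per 100 words (0.0 for empty input)."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in FILTER_WORDS)
    return 100.0 * hits / len(words)


sample = "She felt the cold and noticed the door was open."
print(filter_word_rate(sample))  # 2 hits in 10 words -> 20.0
```

A real checker would presumably combine many such detectors (passive voice, clichés, AI-ism words, and so on) and weight them; this sketch covers only one of them.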

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.

Scenario     #1   #2   #3   #4   #5   Total
             97   94   94   93   93     94%
             95   95   95   94   93     95%
             97   97   97   96   92     96%
             87   87   87   86   84     86%
Overall                              92.70%
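
The benchmark requires codex entries to come back as well-formed XML. A minimal sketch of a well-formedness check using Python's standard `xml.etree.ElementTree` follows. The element and attribute names (`codex`, `entry`, `type`, `name`) and the helper `parse_codex_entries` are hypothetical; the source does not specify the expected schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_codex_entries(xml_text: str) -> Optional[list]:
    """Return the extracted entries, or None if the XML is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Mismatched or unclosed tags end up here.
        return None
    entries = []
    # "entry", "type", and "name" are assumed names for illustration only.
    for entry in root.iter("entry"):
        entries.append({
            "type": entry.get("type", ""),
            "name": (entry.findtext("name") or "").strip(),
        })
    return entries


good = "<codex><entry type='character'><name>Mara</name></entry></codex>"
bad = "<codex><entry type='character'><name>Mara</entry></codex>"

print(parse_codex_entries(good))  # [{'type': 'character', 'name': 'Mara'}]
print(parse_codex_entries(bad))   # None
```

Well-formedness is only the first gate; a scorer would still need to compare the extracted entries against the expected ones.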

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.

Scenario             #1   #2   #3   #4   #5   #6   #7   Total
Generic Prompt
                    100  100  100  100  100   89   89     97%
                    100  100  100  100  100  100  100    100%
                     99   99   99   99   99   99   99     99%
                    100  100  100  100   99   99   99    100%
                    100  100  100  100  100  100  100    100%
                    100  100  100  100   99   99   97     99%
                     96   96   96   96   92   91   90     94%
                    100  100  100  100  100  100  100    100%
                    100   93   91   89   89   89   89     91%
Generic Prompt                                         97.79%
Specific Prompt
                    100  100  100   89   89   89   89     94%
                    100  100  100  100  100  100  100    100%
                     99   99   99   99   99   99   98     99%
                    100  100  100  100  100  100   99    100%
                    100  100  100  100  100  100  100    100%
                    100  100  100  100  100  100   99    100%
                     98   97   97   97   96   96   95     97%
                    100  100  100  100  100  100  100    100%
                    100  100  100  100  100  100  100    100%
Specific Prompt                                        98.77%
Overall                                                98.28%
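
The description states that each expected change is checked independently. One way this kind of scoring could work is sketched below, under the assumption that every expectation reduces to "this text must appear" and/or "this text must not appear". The `ExpectedChange` structure and `score_replacement` function are hypothetical and are not taken from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One independently checked expectation (hypothetical structure)."""
    must_contain: str = ""      # text that should appear in the output
    must_not_contain: str = ""  # text that should no longer appear


def score_replacement(output: str, changes: list) -> float:
    """Return the percentage of expected changes that the output satisfies."""
    if not changes:
        return 100.0
    passed = 0
    for change in changes:
        ok = True
        if change.must_contain and change.must_contain not in output:
            ok = False
        if change.must_not_contain and change.must_not_contain in output:
            ok = False
        passed += ok  # each change counts on its own, pass or fail
    return 100.0 * passed / len(changes)


# Example: rename "Anna" to "Mara" and expand the contraction "didn't".
output = "Mara did not open the door."
changes = [
    ExpectedChange(must_contain="Mara", must_not_contain="Anna"),
    ExpectedChange(must_contain="did not", must_not_contain="didn't"),
    ExpectedChange(must_contain="the gate"),  # deliberately unmet
]
print(round(score_replacement(output, changes), 1))  # 66.7
```

Because each check is independent, one missed replacement lowers the score proportionally instead of failing the whole scenario, which fits the partial scores such as 89 in the table.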

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario     #1   #2   #3   #4   #5   #6   #7   #8   #9  #10   Total
            100  100  100  100  100  100  100  100  100   67     97%
Overall                                                       96.67%