openai/gpt-5.1

GPT-5.1

Release Date: Nov 13th, 2025
Context Size: 400k tokens
Reasoning: Yes
Benchmark Cost: $17.12
Speed: 60.3 tok/s

Categories

Creative Writing: 87.2%
Tooling: 98.0%
Language: 93.6%
Utility: 95.3%
Reasoning: 95.1%
Text Editing: 98.5%
Rule Following: 74.1%
Hallucination: 98.4%

Subcategories

Subcategories charted (scale 20% to 100%): AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
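
As a rough illustration of one such check, the sketch below measures how often filter words appear in a passage. The word list and the `filter_word_rate` name are assumptions made for this example; the benchmark's actual detectors and thresholds are not described here.

```python
import re

# Illustrative filter-word list (an assumption, not the benchmark's own list).
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}


def filter_word_rate(text: str) -> float:
    """Return filter words per 100 words of text (0.0 for empty input)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in FILTER_WORDS)
    return 100.0 * hits / len(words)


rate = filter_word_rate("She felt the cold and noticed the door was open.")
print(rate)
```

A real detector would likely combine many such signals (passive voice, dialogue tags, clichés) rather than rely on a single word list.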

Scenario                      #1    #2    #3    #4    #5    Total
Detailed Writing Rules
                              87    87    86    85    85    86%
                              94    92    89    88    85    89%
                              88    86    84    83    83    85%
                              91    91    91    90    89    90%
                              90    87    86    84    82    86%
                              92    91    89    89    86    90%
Detailed Writing Rules                                      87.70%
genre
                              83    83    79    78    76    80%
                              87    84    83    81    80    83%
                              85    85    85    83    82    84%
                              86    86    85    85    83    85%
                              89    84    83    82    79    83%
                              87    86    84    80    78    83%
genre                                                       82.98%
Novelcrafter Default Prompt
                              89    89    88    86    86    87%
                              91    90    88    88    84    88%
                              87    86    86    83    79    84%
                              93    92    92    89    85    90%
                              90    87    87    85    83    87%
                              91    90    89    88    83    88%
Novelcrafter Default Prompt                                 87.50%
Overall                                                     86.06%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
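
A minimal sketch of how such output could be validated, assuming a hypothetical `<codex>` root with `<entry type="...">` children that each hold a `<name>`. The element names and the `parse_codex` helper are illustrative only; the benchmark's real schema is not given here.

```python
import xml.etree.ElementTree as ET


def parse_codex(xml_text: str) -> list[dict[str, str]]:
    """Parse hypothetical codex XML; raise ValueError if it is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    entries = []
    for entry in root.iter("entry"):
        entries.append(
            {
                "type": entry.get("type", ""),
                "name": (entry.findtext("name") or "").strip(),
            }
        )
    return entries


sample = """<codex>
  <entry type="character"><name>Mara</name></entry>
  <entry type="location"><name>Eastport</name></entry>
</codex>"""

entries = parse_codex(sample)
print(entries)
```

Output with an unclosed tag, such as `<codex><entry></codex>`, raises `ValueError` instead of returning partial entries.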

Scenario    #1    #2    #3    #4    #5    Total
            97    96    95    95    95    96%
            99    99    99    98    98    98%
            96    96    96    96    94    95%
            99    98    96    96    95    97%
Overall                                   96.61%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
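
Scoring each expected change independently could look roughly like the sketch below. The `ExpectedChange` type, its fields, and the substring-based matching are assumptions for illustration; the benchmark's actual checker is not documented here.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One hypothetical check: text that must appear and text that must be gone."""

    must_contain: str
    must_not_contain: str = ""


def score_replacement(output: str, checks: list[ExpectedChange]) -> float:
    """Return the percentage of checks that pass, each judged on its own."""
    if not checks:
        return 100.0
    passed = 0
    for check in checks:
        present = check.must_contain in output
        absent = not check.must_not_contain or check.must_not_contain not in output
        if present and absent:
            passed += 1
    return 100.0 * passed / len(checks)


# Two renames were applied, but the tense rewrite was missed.
output = "Mara did not know that Eastport was burning."
checks = [
    ExpectedChange(must_contain="Mara", must_not_contain="Anna"),
    ExpectedChange(must_contain="Eastport", must_not_contain="Westhaven"),
    ExpectedChange(must_contain="does not", must_not_contain="did not"),
]
score = score_replacement(output, checks)
print(round(score, 2))
```

Because each check is evaluated separately, one missed transformation lowers the score proportionally instead of failing the whole scenario.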

Scenario            #1    #2    #3    #4    #5    #6    #7    Total
Generic Prompt
                    100   100   100   100   100   100   89    98%
                    100   100   100   100   100   100   100   100%
                    99    99    99    99    99    99    98    99%
                    100   100   100   100   100   100   100   100%
                    100   100   100   100   100   100   100   100%
                    100   100   100   100   100   100   100   100%
                    98    98    97    96    96    95    95    96%
                    100   100   100   100   100   100   100   100%
                    100   100   100   93    93    89    89    95%
Generic Prompt                                                98.73%
Specific Prompt
                    100   100   100   100   100   100   100   100%
                    100   100   100   100   100   100   100   100%
                    100   100   100   99    99    99    98    99%
                    100   100   100   100   100   100   100   100%
                    100   100   100   100   100   100   100   100%
                    100   100   100   100   100   100   100   100%
                    99    98    98    98    98    98    97    98%
                    100   100   100   100   100   100   100   100%
                    100   100   100   100   99    99    99    100%
Specific Prompt                                               99.67%
Overall                                                       99.20%

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario    #1    #2    #3    #4    #5    #6    #7    #8    #9    #10   Total
            100   100   100   100   100   100   100   100   100   67    97%
Overall                                                                 96.67%