openai/gpt-4.1

GPT-4.1

Release Date

Apr 14th, 2025

Context Size

1M tokens

Reasoning

No

Benchmark Cost

$5.27

Speed

118.7 tok/s

Categories

Creative Writing: 81.2%
Tooling: 98.9%
Language: 93.9%
Utility: 90.6%
Reasoning: 88.5%
Text Editing: 94.4%
Rule Following: 66.8%
Hallucination: 95.2%

Subcategories

[Chart: score per subcategory, 0% to 100%. Labels: AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption]

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, AI-ism words/adverbs/names, and more.
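As a rough illustration of one such check, the sketch below computes how often filter words appear in a passage. The word list and the `filter_word_rate` function are hypothetical, chosen only for illustration; the benchmark's actual detection rules and scoring are not described here.

```python
import re

# Hypothetical word list for illustration only; the benchmark's real lists are not given.
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}


def filter_word_rate(text: str) -> float:
    """Return the number of filter words per 100 words of `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FILTER_WORDS)
    return 100.0 * hits / len(words)


sample = "She felt the cold and noticed the door was open."
print(filter_word_rate(sample))  # 2 of 10 words are filter words -> 20.0
```

A real detector would likely need lemmatization and context (for example, "felt" as a noun), which this sketch ignores.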

Scenario #1 #2 #3 #4 #5 Total
Detailed Writing Rules
82 82 82 78 73 79%
85 82 80 78 75 80%
86 85 84 83 79 83%
84 82 79 78 76 80%
81 80 79 79 74 79%
85 83 80 79 77 81%
Detailed Writing Rules 80.39%
genre
86 77 77 77 76 79%
86 83 81 79 75 81%
86 81 81 81 78 81%
84 83 79 76 75 79%
84 83 80 78 77 80%
87 86 85 81 76 83%
genre 80.68%
Novelcrafter Default Prompt
81 81 78 74 74 78%
83 82 79 76 73 79%
85 83 77 77 75 80%
85 82 82 79 73 80%
83 82 81 78 76 80%
85 85 83 83 79 83%
Novelcrafter Default Prompt 79.89%
80.32%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
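A minimal sketch of the well-formedness part of such a check, using Python's standard `xml.etree.ElementTree`. The `<codex>`, `<entry>`, and `<name>` element names and the `type` attribute are assumptions made for illustration; the schema the benchmark expects is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_codex(xml_text: str) -> Optional[list]:
    """Return extracted entries, or None if the XML is not well-formed.

    The <entry type="..."><name>...</name></entry> layout is a hypothetical
    schema used only for this example.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return [
        {"type": entry.get("type", ""), "name": entry.findtext("name", default="")}
        for entry in root.iter("entry")
    ]


sample = """<codex>
  <entry type="character"><name>Mira</name></entry>
  <entry type="location"><name>Ashfall Keep</name></entry>
</codex>"""

print(parse_codex(sample))
print(parse_codex("<codex><entry>"))  # unclosed tags -> None
```

Rejecting malformed output before comparing entries keeps a parsing failure from being mistaken for a missing entry.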

Scenario #1 #2 #3 #4 #5 Total
92 91 91 91 90 91%
95 95 94 93 90 93%
96 96 96 90 87 93%
91 91 90 84 81 87%
91.17%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
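Scoring each expected change independently could look roughly like the sketch below. The `ExpectedChange` type and the substring-based pass condition are assumptions for illustration, not the benchmark's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One hypothetical check: `old` must be gone and `new` must be present."""
    old: str
    new: str


def score_replacement(output: str, changes: list) -> float:
    """Return the percentage of expected changes satisfied by `output`."""
    if not changes:
        return 100.0
    passed = sum(
        1 for c in changes if c.new in output and c.old not in output
    )
    return 100.0 * passed / len(changes)


changes = [
    ExpectedChange(old="Alice", new="Elena"),   # character rename
    ExpectedChange(old="don't", new="do not"),  # contraction expansion
    ExpectedChange(old="Paris", new="Lyon"),    # location rename
]
output = "Elena said she do not want to leave Paris."
print(round(score_replacement(output, changes), 1))  # 2 of 3 checks pass -> 66.7
```

Because each change is checked on its own, one missed rename lowers the score proportionally instead of failing the whole scenario.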

Scenario #1 #2 #3 #4 #5 #6 #7 Total
Generic Prompt
100 100 100 97 97 97 94 98%
100 100 100 100 100 100 100 100%
99 99 99 99 98 98 98 99%
100 100 100 100 100 100 99 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
94 94 94 93 90 81 77 89%
100 100 100 100 100 100 100 100%
97 93 89 89 89 89 88 90%
Generic Prompt 97.32%
Specific Prompt
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
94 93 92 92 92 92 91 92%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
Specific Prompt 99.14%
98.23%

Tool usage within Novelcrafter

Tests output messages related to tool usage within Novelcrafter.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
100 100 100 100 100 100 100 100 100 100 100%
100.00%