openai/o4-mini

o4 Mini

Release Date

Apr 16th, 2025

Context Size

200k

Reasoning

Yes

Benchmark Cost

$13.16

Speed

106.2 tok/s

Categories

Creative Writing: 82.0%
Tooling: 100.0%
Language: 80.0%
Utility: 96.3%
Reasoning: 94.4%
Text Editing: 90.6%
Rule Following: 64.6%
Hallucination: 98.8%

Subcategories

Chart of per-subcategory scores (values not preserved): AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
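
As a rough illustration of what detecting one of these anti-patterns could look like, the sketch below flags filter words in a passage. The word list and the function name `filter_word_hits` are assumptions made for this example; the benchmark's actual lists and scoring are not described here.

```python
import re

# Illustrative list only (assumption); the benchmark's real word lists are not given.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed"}


def filter_word_hits(text: str) -> list[str]:
    """Return every filter word found in the text, in order of appearance."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word in FILTER_WORDS]


hits = filter_word_hits("She felt the cold and noticed a light.")
print(hits)
```

A real checker would likely need part-of-speech information to avoid false hits (for example "saw" used as a noun); this sketch only matches surface forms.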

Scenario                      #1  #2  #3  #4  #5  Total
Detailed Writing Rules
                              81  80  79  78  76  79%
                              84  84  82  78  78  81%
                              84  84  84  83  83  84%
                              87  85  84  83  82  84%
                              90  83  82  81  81  83%
                              86  85  81  79  77  82%
Detailed Writing Rules: 82.11%
genre
                              79  79  78  75  73  77%
                              81  75  74  73  69  74%
                              84  83  83  81  76  81%
                              83  83  82  79  76  81%
                              82  81  77  75  69  77%
                              82  82  80  78  77  80%
genre: 78.17%
Novelcrafter Default Prompt
                              88  78  76  75  73  78%
                              83  79  78  76  74  78%
                              87  84  81  81  80  83%
                              90  84  84  81  81  84%
                              87  83  83  82  79  83%
                              82  82  81  79  78  80%
Novelcrafter Default Prompt: 80.94%
Overall: 80.41%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
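
A minimal sketch of the well-formedness side of this check, using Python's standard XML parser. The `<codex>`/`<entry>`/`<name>` element names, the `type` attribute, and the function `parse_codex` are hypothetical; the schema the benchmark actually expects is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Entry kinds named in the description above.
ALLOWED_TYPES = {"character", "location", "object", "lore"}


def parse_codex(xml_text: str) -> Optional[list[dict]]:
    """Parse model output; return None if the XML is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # malformed XML fails outright
    entries = []
    # Hypothetical schema: <entry type="..."><name>...</name></entry>
    for entry in root.iter("entry"):
        entry_type = entry.get("type")
        name = (entry.findtext("name", default="") or "").strip()
        if entry_type in ALLOWED_TYPES and name:
            entries.append({"type": entry_type, "name": name})
    return entries


good = '<codex><entry type="character"><name>Mara</name></entry></codex>'
print(parse_codex(good))
print(parse_codex("<codex><entry>"))
```

Separating "did it parse" from "are the entries right" lets a scorer report malformed output differently from incomplete extraction.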

Scenario                      #1  #2  #3  #4  #5  Total
                              97  97  96  94  92  95%
                              96  96  96  94  92  95%
                              96  96  96  95  95  96%
                              99  98  96  92  90  95%
Overall: 95.15%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
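
Scoring each expected change independently could be sketched as follows. The `ExpectedChange` structure and `score_replacement` function are assumptions made for illustration, not the benchmark's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpectedChange:
    """One change the transformed text should show (hypothetical structure)."""
    must_contain: Optional[str] = None      # text expected after the edit
    must_not_contain: Optional[str] = None  # text that should be gone


def score_replacement(output: str, changes: list[ExpectedChange]) -> float:
    """Return the percentage of expected changes satisfied, each checked on its own."""
    if not changes:
        return 100.0
    passed = 0
    for change in changes:
        ok = True
        if change.must_contain is not None and change.must_contain not in output:
            ok = False
        if change.must_not_contain is not None and change.must_not_contain in output:
            ok = False
        if ok:
            passed += 1
    return 100.0 * passed / len(changes)


output = "Mara walked to Eastport. She did not stop."
changes = [
    ExpectedChange(must_contain="Mara", must_not_contain="Anna"),       # rename applied
    ExpectedChange(must_contain="did not", must_not_contain="didn't"),  # contraction expanded
    ExpectedChange(must_contain="Westport"),                            # rename missed
]
print(round(score_replacement(output, changes), 1))
```

Because every change is checked separately, one missed rename lowers the score without hiding the changes that were applied correctly.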

Scenario            #1   #2   #3   #4   #5   #6   #7  Total
Generic Prompt
                   100  100  100  100  100  100  100  100%
                   100  100  100  100  100  100  100  100%
                    98   98   98   98   97   96   93   97%
                   100  100  100  100  100   99   99  100%
                   100  100  100  100  100  100  100  100%
                   100  100  100  100  100   99   99  100%
                    91   81   80   78   77   49   47   72%
                   100  100  100   99   99   97   97   99%
                    97   97   92   91   88   88   82   91%
Generic Prompt: 95.30%
Specific Prompt
                   100  100  100  100  100  100  100  100%
                   100  100  100  100  100  100  100  100%
                   100  100  100  100  100   99    5   86%
                   100  100  100  100  100  100   99  100%
                   100  100  100  100  100  100  100  100%
                   100  100  100  100  100  100   99  100%
                    99   99   98   98   98   97   96   98%
                   100  100  100  100  100  100  100  100%
                   100  100  100  100  100   96   90   98%
Specific Prompt: 98.02%
Overall: 96.66%

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario            #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
                   100  100  100  100  100  100  100  100  100  100  100%
Overall: 100.00%