openai/o4-mini-high

o4 Mini High

Release Date

Apr 16th, 2025

Context Size

200k

Reasoning

Yes

Benchmark Cost

$25.42

Speed

108.9 tok/s

Categories

Creative Writing: 82.7%
Tooling: 100.0%
Language: 79.8%
Utility: 98.7%
Reasoning: 95.0%
Text Editing: 94.4%
Rule Following: 72.7%
Hallucination: 99.1%

Subcategories

AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
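
As a rough illustration of what one such check might look like, the sketch below counts filter words in a passage. The word list and the function names are assumptions made for this example; the benchmark's actual detectors and word lists are not given here.

```python
import re

# Hypothetical word list for illustration only; the benchmark's real list is not published here.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_filter_words(text: str) -> int:
    """Count how many tokens belong to the assumed filter-word list."""
    return sum(1 for word in tokenize(text) if word in FILTER_WORDS)


def filter_word_rate(text: str) -> float:
    """Return filter words per 100 words, or 0.0 for empty text."""
    words = tokenize(text)
    if not words:
        return 0.0
    return 100.0 * count_filter_words(text) / len(words)
```

A real detector would likely combine many such checks (passive voice, dialogue tags, clichés) and weight them; this sketch covers only one pattern.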

Scenario   #1   #2   #3   #4   #5  Total
Detailed Writing Rules
           84   81   77   75   74    78%
           84   81   80   79   78    81%
           83   81   81   80   79    81%
           89   85   83   83   81    84%
           86   81   80   79   79    81%
           87   86   84   80   78    83%
Detailed Writing Rules: 81.40%
genre
           78   75   75   72   69    74%
           81   81   78   76   76    78%
           84   81   80   80   78    81%
           86   82   80   78   76    80%
           80   78   78   77   71    77%
           82   82   81   81   79    81%
genre: 78.48%
Novelcrafter Default Prompt
           82   82   81   80   78    81%
           88   83   81   81   76    82%
           94   89   85   83   82    87%
           87   84   81   80   80    82%
           85   84   84   83   83    84%
           84   84   84   83   78    83%
Novelcrafter Default Prompt: 83.01%
Overall: 80.96%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
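
A minimal sketch of the well-formedness part of such a check, using Python's standard `xml.etree.ElementTree`. The `<codex>`, `<entry>`, and `<name>` element names and the `type` attribute are hypothetical; the schema the benchmark expects is not shown here.

```python
import xml.etree.ElementTree as ET


def parse_codex(xml_text: str):
    """Return (type, name) pairs for each entry, or None if the XML is not well-formed.

    The element and attribute names are assumptions made for this example.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Unclosed tags, stray text after the root, and similar errors land here.
        return None
    return [(entry.get("type"), entry.findtext("name")) for entry in root.iter("entry")]
```

A response that fails to parse would presumably score zero on structure before its extracted entries are even compared with the expected ones.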

Scenario   #1   #2   #3   #4   #5  Total
           98   98   97   97   96    97%
           98   97   97   96   95    96%
           97   97   97   96   96    97%
           99   99   98   95   94    97%
Overall: 96.85%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
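
Scoring each expected change independently could be sketched as follows. The `ExpectedChange` class and its substring-based checks are assumptions for illustration, not the benchmark's actual scorer.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One independently checked expectation (hypothetical structure)."""
    must_contain: str = ""      # text that should appear after the transformation
    must_not_contain: str = ""  # text that should be gone after the transformation


def change_passes(change: ExpectedChange, output: str) -> bool:
    """Check a single expectation against the model output."""
    if change.must_contain and change.must_contain not in output:
        return False
    if change.must_not_contain and change.must_not_contain in output:
        return False
    return True


def score(output: str, changes: list[ExpectedChange]) -> float:
    """Return the percentage of expected changes that pass."""
    if not changes:
        return 100.0
    passed = sum(change_passes(change, output) for change in changes)
    return 100.0 * passed / len(changes)
```

Because each change is checked on its own, one missed rename lowers the score proportionally instead of failing the whole scenario.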

Scenario   #1   #2   #3   #4   #5   #6   #7  Total
Generic Prompt
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
           99   99   98   98   97   96   95    98%
          100  100  100  100  100  100   99   100%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100   99   100%
           94   94   92   92   86   83   49    84%
          100  100  100  100  100   99   97    99%
           96   96   93   92   92   89   89    92%
Generic Prompt: 97.04%
Specific Prompt
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
          100  100  100   99   99   99   99   100%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100   99   99   100%
           99   98   98   97   96   95   92    96%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100   93    99%
Specific Prompt: 99.43%
Overall: 98.23%

Tool usage within Novelcrafter

Evaluates the output messages the model produces for tool usage within Novelcrafter.

Scenario   #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
          100  100  100  100  100  100  100  100  100  100   100%
Overall: 100.00%