gpt-4o-mini-2024-07-18

GPT-4o Mini (temp=1)
via openai

Release Date: Jul 18th, 2024
Context Size: 128k
Reasoning: No
Benchmark Cost: $0.42

Categories

Creative Writing: 74.4%
Tooling: 99.3%
Language: 77.5%
Utility: 82.2%
Reasoning: 80.3%
Text Editing: 85.8%
Rule Following: 56.5%
Hallucination: 76.8%

Subcategories

AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
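One of the listed checks, filter words, can be illustrated with a minimal sketch. This is only an assumption about how such a detector might work, not the benchmark's actual implementation: the word list, the `filter_word_rate` function, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical word list; the benchmark's real lists are not given in the source.
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}


def filter_word_rate(text: str) -> float:
    """Return the number of filter words per 100 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FILTER_WORDS)
    return 100.0 * hits / len(words)


sample = "She felt the wind and noticed the door was open."
rate = filter_word_rate(sample)
print(f"{rate:.1f} filter words per 100 words")
```

A real detector would presumably combine many such signals (passive voice, dialogue tags, clichés) into one score; this sketch covers a single signal only.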

Scenario    #1   #2   #3   #4   #5   Total
Detailed Writing Rules
            76   74   73   72   71   73%
            79   78   78   75   75   77%
            81   76   70   69   67   73%
            80   80   76   76   73   77%
            77   75   74   72   69   73%
            87   80   77   76   74   79%
Detailed Writing Rules: 75.26%
genre
            83   82   76   76   67   77%
            81   76   76   73   73   76%
            74   74   71   69   67   71%
            81   75   75   75   74   76%
            74   73   69   69   68   71%
            79   79   76   72   71   76%
genre: 74.30%
Novelcrafter Default Prompt
            77   76   75   73   70   74%
            79   75   74   74   72   75%
            85   75   74   72   71   75%
            79   79   77   73   70   76%
            81   80   80   76   73   78%
            79   78   75   74   74   76%
Novelcrafter Default Prompt: 75.65%
Overall: 75.07%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
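A well-formedness check of the kind this test implies can be sketched with Python's standard library. The element names (`codex`, `entry`, `name`) and the `type` attribute are hypothetical, since the source does not give the expected schema; only the parsing calls are real `xml.etree.ElementTree` API.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical model outputs: the second one closes <entry> before <name>.
good = '<codex><entry type="character"><name>Mira</name></entry></codex>'
bad = '<codex><entry type="character"><name>Mira</entry></codex>'

print(is_well_formed(good))
print(is_well_formed(bad))

# Once the output parses, the entry types can be read back for comparison.
entry_types = [e.get("type") for e in ET.fromstring(good).iter("entry")]
```

Well-formedness is only the first gate; a scorer would presumably also compare the extracted entries against the expected characters, locations, objects, and lore.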

Scenario    #1   #2   #3   #4   #5   Total
            88   87   87   85   82   86%
            87   86   84   84   84   85%
            90   90   88   88   84   88%
            85   79   77   76   76   79%
Overall: 84.29%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
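Scoring each expected change independently could look roughly like the sketch below. It is an illustration under stated assumptions, not the benchmark's actual scorer: the `ExpectedChange` type, the `score_replacement` function, and the sample text are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One change the rewritten text is expected to show (hypothetical type)."""
    must_contain: str      # text that should appear after the transformation
    must_not_contain: str  # text that should be gone after the transformation


def score_replacement(output: str, changes: list[ExpectedChange]) -> float:
    """Return the percentage of expected changes that were applied.

    Each change is checked on its own, so one missed rename does not
    cancel the credit earned for the others.
    """
    if not changes:
        return 100.0
    passed = sum(
        1
        for c in changes
        if c.must_contain in output and c.must_not_contain not in output
    )
    return 100.0 * passed / len(changes)


changes = [
    ExpectedChange(must_contain="Mira", must_not_contain="Anna"),      # rename
    ExpectedChange(must_contain="do not", must_not_contain="don't"),   # contraction
]
output = "Mira said they don't know the way."
score = score_replacement(output, changes)
print(f"{score:.0f}%")
```

Here the rename succeeded but the contraction was left in place, so the output earns partial credit instead of failing outright.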

Scenario    #1    #2    #3    #4    #5    #6    #7    Total
Generic Prompt
            100   100   100   100   100   100   100   100%
            100   100   100   100   100   100   100   100%
            99    95    95    95    95    95    95    96%
            100   100   100   100   100   100   100   100%
            100   100   96    96    96    96    96    97%
            74    74    74    74    74    74    74    74%
            89    89    88    78    76    76    76    82%
            100   100   100   100   100   100   100   100%
            93    89    89    89    89    89    89    89%
Generic Prompt: 93.07%
Specific Prompt
            100   100   100   100   100   100   100   100%
            100   100   100   100   100   100   100   100%
            99    99    97    96    96    96    95    97%
            100   100   100   100   100   100   100   100%
            100   100   100   100   100   100   100   100%
            91    91    91    91    91    91    91    91%
            91    90    90    89    89    89    76    88%
            100   100   100   100   100   100   100   100%
            93    93    89    89    89    89    89    90%
Specific Prompt: 96.20%
Overall: 94.63%

Tool usage within Novelcrafter

Checks the output messages a model produces for tool usage within Novelcrafter.

Scenario    #1    #2    #3    #4    #5    #6    #7    #8    #9    #10   Total
            100   100   100   100   100   100   100   100   100   100   100%
Overall: 100.00%