gpt-4o-mini-2024-07-18

GPT-4o Mini (temp=0)
via OpenAI

Release Date

Jul 18th, 2024

Context Size

128k tokens

Reasoning

No

Benchmark Cost

$0.44

Categories

Creative Writing: 73.1%
Tooling: 98.5%
Language: 75.0%
Utility: 81.4%
Reasoning: 81.3%
Text Editing: 84.6%
Rule Following: 58.8%
Hallucination: 73.6%

Subcategories

[Chart: subcategory scores for AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption]

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, overuse of the past progressive, weak dialogue tags, filter words, purple prose, clichés, AI-ism words/adverbs/names, and more.
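
A minimal sketch of how a few of these checks could be approximated with regular expressions, assuming simple word-list and pattern heuristics. The word list, the regex patterns, and the `find_habits` function are hypothetical illustrations; the benchmark's actual detector and its lists are not described here.

```python
import re

# Hypothetical, deliberately tiny word list; not the benchmark's real list.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed", "watched"}
# Dialogue tag followed by an -ly adverb, e.g. 'said quietly'.
WEAK_TAG_ADVERB = re.compile(r"\b(?:said|asked|replied)\s+\w+ly\b", re.IGNORECASE)
# Rough heuristic: a form of "to be" followed by a word ending in -ed.
# It misses irregular participles ("was taken") and flags some adjectives ("was tired").
PASSIVE = re.compile(r"\b(?:was|were|is|are|been|being|be)\s+\w+ed\b", re.IGNORECASE)


def find_habits(text: str) -> dict:
    """Count a few prose anti-patterns in `text` using simple heuristics."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "filter_words": sum(1 for w in words if w in FILTER_WORDS),
        "adverb_dialogue_tags": len(WEAK_TAG_ADVERB.findall(text)),
        "passive_voice": len(PASSIVE.findall(text)),
    }


sample = 'She felt the door was opened slowly. "Go," he said quietly.'
print(find_habits(sample))
# {'filter_words': 1, 'adverb_dialogue_tags': 1, 'passive_voice': 1}
```

Pattern matching like this is cheap and deterministic, but it only approximates judgments such as "purple prose" or "cliché", which need richer rules or a model-based grader.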

Scenario  #1  #2  #3  #4  #5  Total
Detailed Writing Rules
          76  75  72  69  63  71%
          76  74  71  71  68  72%
          74  73  73  72  72  73%
          75  74  74  72  70  73%
          77  73  72  72  72  73%
          79  78  76  76  74  77%
Detailed Writing Rules total: 73.19%
genre
          77  76  72  70  69  73%
          74  71  70  66  63  69%
          77  74  73  72  71  73%
          77  76  75  74  73  75%
          81  74  71  68  68  72%
          76  72  71  70  69  72%
genre total: 72.32%
Novelcrafter Default Prompt
          74  74  72  68  66  71%
          77  75  75  73  73  75%
          74  72  71  70  69  71%
          77  73  73  71  71  73%
          77  77  76  74  72  75%
          79  74  72  71  67  73%
Novelcrafter Default Prompt total: 72.93%
Overall: 72.81%
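
The figures in this table are consistent with simple averaging: each row's Total matches the mean of its #1 to #5 cells rounded to a whole percent, and the overall score matches the mean of the three per-prompt totals. This is inferred from the numbers rather than documented, and because the displayed cells are themselves rounded, a few Totals elsewhere on the page do not reproduce exactly. A quick check in Python:

```python
from statistics import mean

# Figures copied from the Bad Writing Habits table.
first_row = [76, 75, 72, 69, 63]  # first "Detailed Writing Rules" row, Total shown as 71%
prompt_totals = {
    "Detailed Writing Rules": 73.19,
    "genre": 72.32,
    "Novelcrafter Default Prompt": 72.93,
}

row_total = round(mean(first_row))                     # mean of the five cells
overall = round(mean(prompt_totals.values()), 2)       # mean of the per-prompt totals
print(row_total, f"{overall:.2f}%")  # 71 72.81%
```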

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
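
To illustrate the "well-formed XML" requirement, here is a minimal sketch of validating such output with Python's standard `xml.etree.ElementTree`. The `<codex>`/`<entry>` layout, the `type` attribute values, the sample entries, and `parse_codex` are all hypothetical; the benchmark's actual schema and scoring are not shown on this page.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical schema: <codex> holding <entry type="..."><name>...</name></entry> elements.
ALLOWED_TYPES = {"character", "location", "object", "lore"}

SAMPLE = """<codex>
  <entry type="character"><name>Mira</name><description>A ferry pilot.</description></entry>
  <entry type="location"><name>Saltmarsh</name><description>A fishing town.</description></entry>
</codex>"""


def parse_codex(xml_text: str) -> Optional[List[Tuple[str, str]]]:
    """Return (type, name) pairs, or None if the XML is malformed or an entry is invalid."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # not well-formed XML
    entries = []
    for entry in root.findall("entry"):
        kind = entry.get("type")
        name = (entry.findtext("name") or "").strip()
        if kind not in ALLOWED_TYPES or not name:
            return None  # unknown entry type or missing name
        entries.append((kind, name))
    return entries


print(parse_codex(SAMPLE))
# [('character', 'Mira'), ('location', 'Saltmarsh')]
```

Separating the parse step (well-formedness) from the per-entry checks (schema) makes it easy to tell a malformed response apart from one that parses but extracts the wrong things.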

Scenario  #1  #2  #3  #4  #5  Total
          88  88  87  86  85  87%
          87  86  84  82  81  84%
          89  89  89  89  89  89%
          79  79  79  78  77  78%
Overall: 84.57%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Each expected change is scored independently.

Scenario  #1   #2   #3   #4   #5   #6   #7   Total
Generic Prompt
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
           95   95   95   95   95   95   95   95%
          100  100  100  100  100  100  100  100%
           96   96   96   96   96   96   96   96%
           74   74   74   74   74   74   74   74%
           76   76   76   76   76   76   76   76%
          100  100  100  100  100  100  100  100%
           89   89   89   89   89   89   89   89%
Generic Prompt total: 92.20%
Specific Prompt
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
           97   97   97   97   96   96   96   96%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
           91   91   91   91   91   91   91   91%
           92   92   91   91   91   91   91   91%
          100  100  100  100  100  100  100  100%
           89   89   89   89   89   89   89   89%
Specific Prompt total: 96.45%
Overall: 94.32%

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario  #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
          100  100  100  100  100  100  100  100  100  100  100%
Overall: 100.00%