openai/gpt-4o-2024-08-06

GPT-4o, Aug. 6th (temp=1)

Release Date: Aug 6th, 2024
Context Size: 128k tokens
Reasoning: No
Benchmark Cost: $8.66
Speed: 115.5 tok/s

Categories

Creative Writing: 75.5%
Tooling: 99.7%
Language: 82.2%
Utility: 82.4%
Reasoning: 86.9%
Text Editing: 86.7%
Rule Following: 67.9%
Hallucination: 79.5%

Subcategories

[Chart: per-subcategory scores for AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption]

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, overuse of the past progressive, weak dialogue tags, filter words, purple prose, clichés, and AI-ism words, adverbs, and names, among others.
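To give a rough sense of how such detectors can work, the Python sketch below counts four of the listed habits using regular expressions and word lists. The word lists, the regex heuristics, and the `detect_habits` function are hypothetical illustrations, not the benchmark's actual detector, and suffix matching will misfire on some sentences (for example, it flags "was tired" as passive).

```python
import re
from collections import Counter

# Hypothetical word lists for illustration; the benchmark's real lists are not published here.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "watched", "seemed"}
WEAK_DIALOGUE_TAGS = {"exclaimed", "hissed", "growled", "retorted", "murmured"}

# Crude heuristics: a form of "to be" followed by an -ed word (passive voice),
# or "was/were" followed by an -ing word (past progressive).
# Suffix matching misfires on adjectives such as "was tired".
PASSIVE_RE = re.compile(r"\b(?:is|are|was|were|been|being)\s+\w+ed\b", re.IGNORECASE)
PAST_PROGRESSIVE_RE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z']+")


def detect_habits(text: str) -> Counter:
    """Count a few prose anti-patterns in `text` (illustrative sketch only)."""
    counts = Counter()
    counts["passive_voice"] = len(PASSIVE_RE.findall(text))
    counts["past_progressive"] = len(PAST_PROGRESSIVE_RE.findall(text))
    for word in WORD_RE.findall(text.lower()):
        if word in FILTER_WORDS:
            counts["filter_words"] += 1
        elif word in WEAK_DIALOGUE_TAGS:
            counts["weak_dialogue_tags"] += 1
    return counts


sample = 'She felt the door was opened. He was walking home. "Stop!" she hissed.'
print(detect_habits(sample))
```

A production detector would likely need part-of-speech tagging rather than suffix matching to tell passive verbs apart from adjectives.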

Scenario  #1  #2  #3  #4  #5  Total

Detailed Writing Rules
          80  78  77  72  66  75%
          84  84  79  78  77  81%
          80  80  77  76  71  77%
          80  79  78  78  73  78%
          83  80  78  76  73  78%
          82  77  76  75  74  77%
Detailed Writing Rules: 77.44%

genre
          77  75  73  69  69  73%
          84  82  80  78  76  80%
          80  73  72  72  71  73%
          79  75  75  71  69  74%
          79  76  71  70  69  73%
          84  82  78  76  73  78%
genre: 75.25%

Novelcrafter Default Prompt
          75  72  70  68  67  70%
          83  80  80  76  76  79%
          77  77  76  75  73  76%
          86  80  78  71  70  77%
          80  80  74  72  69  75%
          82  80  77  75  74  78%
Novelcrafter Default Prompt: 75.78%

Total: 76.16%
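The section total appears consistent with an unweighted mean of the three prompt-variant totals. The Python sketch below checks that assumption; `section_total` is a hypothetical helper, and the same rule also reproduces the Text Replacement total of 96.85% from its Generic Prompt and Specific Prompt totals.

```python
from statistics import mean


def section_total(group_totals):
    """Assumed rule: unweighted mean of the group totals, rounded to 2 decimals."""
    return round(mean(group_totals), 2)


# Group totals copied from the tables on this page.
bad_writing_habits = [77.44, 75.25, 75.78]  # Detailed Writing Rules, genre, Novelcrafter Default Prompt
text_replacement = [96.28, 97.42]           # Generic Prompt, Specific Prompt

print(f"{section_total(bad_writing_habits):.2f}%")  # 76.16%
print(f"{section_total(text_replacement):.2f}%")    # 96.85%
```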

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
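As an illustration of the kind of output this task expects, here is a minimal Python sketch that rejects malformed XML and pulls out typed entries. The `<codex>`, `<entry type="...">`, and `<name>` layout and the `parse_codex` function are assumptions for illustration; the benchmark's real schema is not shown on this page.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Entry types named in the task description; the XML layout itself is hypothetical.
VALID_TYPES = {"character", "location", "object", "lore"}


def parse_codex(xml_text: str) -> Optional[List[Tuple[str, str]]]:
    """Return (type, name) pairs for valid entries, or None if the XML is malformed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # not well-formed XML
    entries = []
    for entry in root.findall("entry"):
        entry_type = entry.get("type")
        name = (entry.findtext("name") or "").strip()
        # Skip entries with an unknown type or an empty name.
        if entry_type in VALID_TYPES and name:
            entries.append((entry_type, name))
    return entries


sample = (
    "<codex>"
    '<entry type="character"><name>Mara</name></entry>'
    '<entry type="location"><name> Old Mill </name></entry>'
    '<entry type="weather"><name>Fog</name></entry>'
    "</codex>"
)
print(parse_codex(sample))            # [('character', 'Mara'), ('location', 'Old Mill')]
print(parse_codex("<codex><entry>"))  # None
```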

Scenario  #1  #2  #3  #4  #5  Total
          93  92  91  91  91  91%
          95  95  94  93  89  93%
          94  94  93  92  91  93%
          93  92  91  91  88  91%
Total: 92.15%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
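A minimal sketch of per-change scoring, assuming each expected change can be expressed as an old string that must disappear and a new string that must appear. `ExpectedChange`, `contains_phrase`, and `score_replacement` are hypothetical names; the benchmark's actual checks (for tense rewriting or POV shifts, for example) are likely more involved.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class ExpectedChange:
    """Hypothetical check: `old` must no longer appear and `new` must appear."""
    old: str
    new: str


def contains_phrase(text: str, phrase: str) -> bool:
    # Whole-word, case-sensitive match, so "Ann" does not match inside "Anna".
    return re.search(rf"\b{re.escape(phrase)}\b", text) is not None


def score_replacement(output: str, changes: List[ExpectedChange]) -> float:
    """Fraction of expected changes satisfied, each checked independently."""
    if not changes:
        return 1.0
    passed = sum(
        1
        for change in changes
        if not contains_phrase(output, change.old) and contains_phrase(output, change.new)
    )
    return passed / len(changes)


changes = [
    ExpectedChange("Anna", "Mara"),          # character rename
    ExpectedChange("Westport", "Eastport"),  # location rename
    ExpectedChange("didn't", "did not"),     # contraction expansion
    ExpectedChange("He", "She"),             # gender swap
]
print(score_replacement("Mara did not go to Eastport.", changes))  # 0.75
```

Because each change is checked on its own, one missed change costs only its share of the score rather than failing the whole scenario.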

Scenario  #1   #2   #3   #4   #5   #6   #7   Total

Generic Prompt
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          99   99   98   98   98   97   97   98%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  99   99   99   100%
          91   89   89   86   81   78   20   76%
          100  100  100  97   96   90   89   96%
          99   97   97   97   97   96   93   96%
Generic Prompt: 96.28%

Specific Prompt
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  99   99   100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          98   93   92   88   75   74   20   77%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
Specific Prompt: 97.42%

Total: 96.85%

Tool Usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario  #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
          100  100  100  100  100  100  100  100  100  100  100%
Total: 100.00%