openai/gpt-4o-2024-08-06

GPT-4o, Aug. 6th (temp=0)

Release Date: Aug 6th, 2024
Context Size: 128k tokens
Reasoning: No
Benchmark Cost: $9.89
Speed: 633.6 tok/s

Categories

Creative Writing: 73.6%
Tooling: 99.9%
Language: 75.0%
Utility: 82.1%
Reasoning: 87.6%
Text Editing: 93.8%
Rule Following: 74.2%
Hallucination: 73.4%

Subcategories

[Chart of per-subcategory scores; individual values not recoverable. Subcategories shown: AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption.]

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, overuse of the past progressive, weak dialogue tags, filter words, purple prose, clichés, and AI-ism words, adverbs, and names.
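As a rough illustration of what detecting such habits can involve, here is a minimal Python sketch that counts two of the listed patterns: filter words and dialogue tags followed by an "-ly" word. The word list, the regular expression, and the function name `find_habits` are hypothetical; the benchmark's actual detectors and word lists are not described here, and naive matching like this will produce false positives (for example, "said only").

```python
import re

# Hypothetical word list for illustration; not the benchmark's actual list.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "watched", "seemed"}

# Naive pattern for a speech verb followed by an -ly word ("said quietly").
# It also flags non-adverbs such as "said only", so expect false positives.
ADVERB_TAG = re.compile(r"\b(?:said|asked|replied)\s+\w+ly\b", re.IGNORECASE)


def find_habits(text: str) -> dict[str, int]:
    """Count filter words and adverb-laden dialogue tags in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "filter_words": sum(1 for w in words if w in FILTER_WORDS),
        "adverb_dialogue_tags": len(ADVERB_TAG.findall(text)),
    }


sample = 'She felt the cold. "Go," he said quietly, and she noticed the door.'
print(find_habits(sample))  # {'filter_words': 2, 'adverb_dialogue_tags': 1}
```

A real detector would likely use part-of-speech tagging rather than suffix matching; the regex keeps the sketch dependency-free.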

Scenario  #1  #2  #3  #4  #5  Total

Detailed Writing Rules
          78  75  73  73  67  73%
          78  74  74  68  63  71%
          80  77  73  72  71  75%
          80  79  78  76  75  78%
          78  76  75  74  74  75%
          77  77  76  73  72  75%
Detailed Writing Rules total: 74.54%

Genre
          77  71  64  63  63  68%
          72  72  72  67  66  70%
          77  75  75  73  72  74%
          75  74  74  74  73  74%
          78  76  74  70  68  73%
          81  76  73  71  71  74%
Genre total: 72.20%

Novelcrafter Default Prompt
          74  73  73  72  68  72%
          78  72  65  63  60  68%
          77  74  74  74  68  73%
          81  81  79  76  74  78%
          74  72  71  69  60  69%
          85  78  77  74  72  77%
Novelcrafter Default Prompt total: 72.94%

Overall: 73.22%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
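One plausible first step in checking such output is confirming that it is well-formed XML before looking at its contents. The sketch below uses Python's standard `xml.etree.ElementTree` to reject malformed responses and collect `(type, name)` pairs. The `<codex>` / `<entry type="...">` / `<name>` layout and the `parse_codex_entries` function are a hypothetical schema for illustration, not the format the benchmark actually requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema for illustration only; the benchmark's real XML format is not given here.
ALLOWED_TYPES = {"character", "location", "object", "lore"}


def parse_codex_entries(xml_text: str) -> list[tuple[str, str]]:
    """Return (type, name) pairs from a well-formed response, or raise ValueError."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"malformed XML: {exc}") from exc
    entries = []
    for entry in root.findall("entry"):
        kind = entry.get("type", "")
        name = (entry.findtext("name") or "").strip()
        if kind not in ALLOWED_TYPES or not name:
            continue  # skip entries that do not fit the assumed schema
        entries.append((kind, name))
    return entries


sample = """<codex>
  <entry type="character"><name>Mira</name></entry>
  <entry type="location"><name>Harrow Keep</name></entry>
</codex>"""
print(parse_codex_entries(sample))  # [('character', 'Mira'), ('location', 'Harrow Keep')]
```

Separating the parse failure (a `ValueError`) from schema mismatches (silently skipped entries) makes it easy to score "invalid XML" and "wrong content" as different failure modes.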

Scenario  #1  #2  #3  #4  #5  Total
          94  93  93  93  93  93%
          94  94  94  93  92  93%
          94  93  93  93  93  93%
          94  94  94  94  93  93%

Overall: 93.29%

Text Replacement

Tests deterministic text transformations: renaming characters and locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Each expected change is checked and scored independently.
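To make per-change scoring concrete, here is a minimal sketch in which each expected change is a hypothetical `(old, new)` pair: a change passes if `new` appears in the output and `old` no longer does, and the score is the fraction of changes that pass. The `ExpectedChange` type and `score_transformation` function are illustrative names, not the benchmark's actual scorer, and plain substring matching is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One expected edit (hypothetical format): `new` must appear, `old` must be gone."""
    old: str
    new: str


def score_transformation(output: str, changes: list[ExpectedChange]) -> float:
    """Check each expected change on its own and return the fraction that passed."""
    if not changes:
        return 1.0  # nothing was expected, so nothing can be missed
    # Naive substring matching: a rename of "Ann" would also fail on "Anna".
    passed = sum(1 for c in changes if c.new in output and c.old not in output)
    return passed / len(changes)


changes = [
    ExpectedChange(old="Anna", new="Beth"),     # character rename
    ExpectedChange(old="don't", new="do not"),  # contraction expansion
]
output = 'Beth said, "I don\'t know."'
print(score_transformation(output, changes))  # 0.5: the rename passed, the contraction did not
```

Scoring changes independently means one missed contraction costs only its share of the score instead of failing the whole passage.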

Scenario   #1   #2   #3   #4   #5   #6   #7  Total

Generic Prompt
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
           99   99   99   99   99   99   99   99%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
           93   93   92   92   91   87   20   81%
          100   96   96   96   96   96   96   97%
           97   97   97   97   97   97   97   97%
Generic Prompt total: 97.01%

Specific Prompt
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
           99   99   99   95   92   92   92   95%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
Specific Prompt total: 99.50%

Overall: 98.26%

Tool usage within Novelcrafter

Evaluates the model's output messages related to tool usage within Novelcrafter.

Scenario   #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
          100  100  100  100  100  100  100  100  100  100  100%

Overall: 100.00%