openai/gpt-4o-2024-05-13

GPT-4o, May 13th (temp=1)

Release Date

May 13th, 2024

Context Size

128k

Reasoning

No

Benchmark Cost

$19.93

Speed

176.7 tok/s

Categories

Creative Writing: 75.9%
Tooling: 99.7%
Language: 92.5%
Utility: 80.7%
Reasoning: 86.0%
Text Editing: 92.4%
Rule Following: 69.9%
Hallucination: 73.4%

Subcategories

AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
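
As a rough illustration of what detecting such anti-patterns can look like, the sketch below counts two of the listed habits (filter words and adverb-laden dialogue tags) in a passage. It is not the benchmark's actual detector: the word list, the regular expression, and the function name `count_habits` are all assumptions made for this example.

```python
import re

# Illustrative filter-word list (an assumption, not the benchmark's list).
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}

# Hypothetical pattern for a weak dialogue tag: closing quote, pronoun,
# "said", then an adverb ending in -ly (e.g. '" he said angrily').
WEAK_TAG = re.compile(r'"\s*(?:he|she|they)\s+said\s+\w+ly\b', re.IGNORECASE)


def count_habits(text: str) -> dict:
    """Count a few prose anti-patterns in the given text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    filter_hits = sum(1 for word in words if word in FILTER_WORDS)
    adverb_tags = len(WEAK_TAG.findall(text))
    return {
        "words": len(words),
        "filter_words": filter_hits,
        "adverb_dialogue_tags": adverb_tags,
    }


sample = 'She felt the cold and noticed the door. "Go," he said angrily.'
print(count_habits(sample))
```

A real detector would need far broader rules (passive voice, past progressive, clichés, and so on); this only shows the counting approach for two of them.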

Scenario   #1   #2   #3   #4   #5   Total
Detailed Writing Rules
           78   77   73   73   72   75%
           78   78   78   76   73   77%
           82   81   77   76   75   78%
           85   79   78   78   76   79%
           80   80   78   77   76   78%
           83   82   79   76   69   78%
Detailed Writing Rules: 77.49%
genre
           77   76   73   71   65   72%
           82   80   76   75   72   77%
           81   79   76   71   65   74%
           82   76   75   72   71   75%
           81   81   75   71   70   76%
           79   76   75   74   73   75%
genre: 75.07%
Novelcrafter Default Prompt
           78   75   73   71   70   73%
           81   80   79   77   74   78%
           82   80   74   73   67   75%
           79   79   77   76   72   77%
           82   80   80   79   70   78%
           81   78   77   72   70   76%
Novelcrafter Default Prompt: 76.17%
Overall: 76.24%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
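
A minimal sketch of the "well-formed XML" part of this check, using Python's standard `xml.etree.ElementTree` parser. The `<codex>`/`<entry>` element names, the `type` and `name` attributes, and the function `parse_codex` are hypothetical; the benchmark's real schema is not given here. Only the four entry types come from the description above.

```python
import xml.etree.ElementTree as ET

# Entry types named in the description above.
ALLOWED_TYPES = {"character", "location", "object", "lore"}


def parse_codex(xml_text: str) -> list[dict]:
    """Parse a hypothetical <codex><entry type=".." name="..">..</entry></codex>
    document. Raises ValueError if the XML is malformed or a type is unknown."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc

    entries = []
    for entry in root.iter("entry"):
        entry_type = entry.get("type", "")
        if entry_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown entry type: {entry_type!r}")
        entries.append({
            "type": entry_type,
            "name": entry.get("name", ""),
            "description": (entry.text or "").strip(),
        })
    return entries


good = (
    '<codex>'
    '<entry type="character" name="Mara">A ferry pilot.</entry>'
    '<entry type="location" name="Eastport">A harbor town.</entry>'
    '</codex>'
)
print(parse_codex(good))
```

An unclosed tag such as `<codex><entry>` would be rejected by the parser before any entry is inspected, which is the behavior a well-formedness check needs.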

Scenario   #1   #2   #3   #4   #5   Total
           93   93   92   91   86   91%
           95   95   93   93   92   94%
           95   94   93   91   90   92%
           94   91   90   89   87   90%
Overall: 91.84%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
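
Scoring "each expected change independently" can be pictured as a checklist: every expected change passes or fails on its own, and the score is the share that passed. The sketch below assumes a simple substring check; the `ExpectedChange` class and `score_replacement` function are hypothetical names, and the benchmark's real checks may be stricter.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One independently checked change (hypothetical structure)."""
    must_contain: str = ""      # text that should appear in the output
    must_not_contain: str = ""  # text that should no longer appear


def score_replacement(output: str, changes: list[ExpectedChange]) -> float:
    """Return the percentage of expected changes the output satisfies."""
    if not changes:
        return 100.0
    passed = 0
    for change in changes:
        present_ok = (not change.must_contain) or (change.must_contain in output)
        absent_ok = (not change.must_not_contain) or (change.must_not_contain not in output)
        if present_ok and absent_ok:
            passed += 1
    return 100.0 * passed / len(changes)


output = "Mara did not go to Eastport."
changes = [
    ExpectedChange(must_contain="Mara", must_not_contain="Anna"),         # rename character
    ExpectedChange(must_contain="did not", must_not_contain="didn't"),    # expand contraction
    ExpectedChange(must_contain="Eastport", must_not_contain="Westport"), # rename location
    ExpectedChange(must_contain="walked"),                                # missed change
]
print(score_replacement(output, changes))  # prints 75.0
```

Because each change is checked separately, one missed rename lowers the score without zeroing out the changes that were applied correctly.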

Scenario   #1   #2   #3   #4   #5   #6   #7   Total
Generic Prompt
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
           99   99   99   98   98   98   98    98%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100   93    99%
          100  100  100  100  100  100  100   100%
           91   89   89   86   79   77   76    84%
          100  100  100  100  100   99   96    99%
          100   99   97   97   97   96   96    97%
Generic Prompt: 97.54%
Specific Prompt
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
          100   99   99   99   99   99   99    99%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
           95   94   92   90   90   89   89    91%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
Specific Prompt: 98.99%
Overall: 98.27%

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario   #1   #2   #3   #4   #5   #6   #7   #8   #9  #10   Total
          100  100  100  100  100  100  100  100  100  100   100%
Overall: 100.00%