openai/gpt-4o-2024-05-13

GPT-4o, May 13th (temp=0)

Release Date

May 13th, 2024

Context Size

128k

Reasoning

No

Benchmark Cost

$21.59

Speed

159.1 tok/s

Categories

Creative Writing: 74.9%
Tooling: 99.2%
Language: 98.7%
Utility: 83.1%
Reasoning: 88.6%
Text Editing: 95.4%
Rule Following: 73.2%
Hallucination: 69.8%

Subcategories

AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
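As a rough illustration of what one such check might look like, the sketch below measures how often filter words appear in a passage. The word list, the function name `filter_word_rate`, and the per-word rate are all assumptions made for this example; the benchmark's actual detectors and scoring are not described here.

```python
import re

# Hypothetical, deliberately tiny list of filter words (assumption for illustration).
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed"}


def filter_word_rate(text: str) -> float:
    """Return the fraction of words in `text` that are filter words."""
    # Lowercase and split into simple word tokens (letters and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in FILTER_WORDS)
    return hits / len(words)


sample = "She felt the cold and noticed the door."
print(filter_word_rate(sample))
```

A real detector would likely combine many such signals (passive voice, dialogue tags, clichés) and weight them; this only shows the shape of a single check.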

Scenario #1 #2 #3 #4 #5 Total

Detailed Writing Rules
74 73 71 70 67 71%
79 74 71 71 70 73%
81 79 78 76 72 77%
85 79 78 75 73 78%
78 78 77 75 73 76%
77 77 75 74 72 75%
Detailed Writing Rules: 74.99%

genre
79 74 73 73 71 74%
71 67 65 63 62 65%
81 80 80 78 75 79%
82 80 77 76 74 78%
83 76 75 72 68 75%
79 75 73 73 73 75%
genre: 74.17%

Novelcrafter Default Prompt
76 73 71 69 68 72%
72 69 69 64 63 68%
79 79 77 74 71 76%
84 77 77 75 74 77%
77 75 75 74 73 75%
80 79 78 76 75 78%
Novelcrafter Default Prompt: 74.22%

Overall: 74.46%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
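A minimal sketch of the well-formedness side of this task, using Python's standard `xml.etree.ElementTree` parser. The element and attribute names (`codex`, `entry`, `type`, `name`) and the helper `parse_codex` are assumptions for illustration; the benchmark's real schema is not given here.

```python
import xml.etree.ElementTree as ET


def parse_codex(xml_text: str):
    """Return (type, name) pairs for each entry, or None if the XML is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Unclosed tags, stray text outside the root, and similar errors land here.
        return None
    entries = []
    for entry in root.iter("entry"):
        name = (entry.findtext("name") or "").strip()
        entries.append((entry.get("type"), name))
    return entries


good = (
    '<codex>'
    '<entry type="character"><name>Mara</name></entry>'
    '<entry type="location"><name>Eastport</name></entry>'
    '</codex>'
)
print(parse_codex(good))
print(parse_codex("<codex><entry>"))
```

Parsing first and comparing entries second keeps malformed output from being scored as if it were merely incomplete.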

Scenario #1 #2 #3 #4 #5 Total
94 94 94 93 93 93%
96 96 95 95 94 95%
95 94 94 94 91 94%
99 95 94 94 94 95%
Overall: 94.45%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
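Scoring each expected change independently could be sketched as follows. The `ExpectedChange` structure, its field names, and the substring-based checks are assumptions for this example, not the benchmark's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExpectedChange:
    """One independently checked expectation (hypothetical structure)."""
    must_contain: Optional[str] = None      # text that should appear after the edit
    must_not_contain: Optional[str] = None  # text that should be gone after the edit


def score_replacement(output: str, checks: List[ExpectedChange]) -> float:
    """Return the percentage of checks that `output` satisfies."""
    if not checks:
        return 100.0
    passed = 0
    for check in checks:
        ok = True
        if check.must_contain is not None and check.must_contain not in output:
            ok = False
        if check.must_not_contain is not None and check.must_not_contain in output:
            ok = False
        if ok:
            passed += 1
    return 100.0 * passed / len(checks)


# Example: rename Anna -> Maria, expand "didn't", rename Eastport -> Westport.
output = "Maria did not go to Eastport."
checks = [
    ExpectedChange(must_contain="Maria"),
    ExpectedChange(must_not_contain="Anna"),
    ExpectedChange(must_contain="did not"),
    ExpectedChange(must_not_contain="didn't"),
    ExpectedChange(must_contain="Westport"),  # fails: the location was not renamed
]
print(score_replacement(output, checks))
```

Because every check counts separately, one missed rename lowers the score without wiping out credit for the changes that were made correctly.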

Scenario #1 #2 #3 #4 #5 #6 #7 Total

Generic Prompt
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
99 99 99 99 99 99 99 99%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
94 91 90 90 90 89 78 89%
100 100 100 100 100 100 100 100%
97 97 97 97 97 97 96 97%
Generic Prompt: 98.26%

Specific Prompt
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 99 99 99 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
97 96 96 96 94 92 91 95%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
Specific Prompt: 99.39%

Overall: 98.82%

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
100 100 100 100 100 100 100 100 100 100 100%
Overall: 100.00%