claude-opus-4-20250514

Claude Opus 4
via Anthropic

Release Date

May 22nd, 2025

Context Size

200k

Reasoning

No

Benchmark Cost

$69.41

Speed

Categories

Creative Writing: 83.8%
Tooling: 100.0%
Language: 93.0%
Utility: 88.8%
Reasoning: 92.6%
Text Editing: 97.3%
Rule Following: 70.4%
Hallucination: 75.7%

Subcategories

AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
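As a rough illustration of what this kind of detection can look like, the sketch below counts two of the listed anti-patterns, filter words and adverb-laden dialogue tags, in a passage. The word list, the regular expression, and the function name `count_habits` are assumptions made for this example; the benchmark's actual rules and scoring are not described here.

```python
import re

# Hypothetical filter-word list, for illustration only.
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}

# Hypothetical pattern for a weak dialogue tag: a plain tag verb
# followed directly by an "-ly" adverb, e.g. "said quietly".
WEAK_TAG_PATTERN = re.compile(r"\b(?:said|asked|replied)\s+\w+ly\b", re.IGNORECASE)


def count_habits(text: str) -> dict:
    """Count a few prose anti-patterns in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "filter_words": sum(1 for w in words if w in FILTER_WORDS),
        "adverb_dialogue_tags": len(WEAK_TAG_PATTERN.findall(text)),
        "total_words": len(words),
    }


sample = 'She felt the cold and noticed the door. "Go," he said quietly.'
report = count_habits(sample)
```

A real checker would likely normalize the counts by passage length and cover many more patterns; this only shows the general shape of a rule-based pass.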

Scenario   #1   #2   #3   #4   #5   Total
Detailed Writing Rules
           91   87   85   79   77     84%
           90   88   87   85   84     87%
           94   94   92   90   89     92%
           92   91   89   86   85     88%
           94   89   86   85   85     88%
           91   89   88   85   78     86%
Detailed Writing Rules: 87.49%
genre
           78   77   73   73   69     74%
           82   79   78   78   77     79%
           86   84   83   78   77     82%
           85   85   84   83   77     83%
           88   86   84   82   82     85%
           87   85   80   78   76     81%
genre: 80.54%
Novelcrafter Default Prompt
           80   78   78   77   72     77%
           86   86   85   79   76     83%
           85   84   84   82   80     83%
           85   85   81   81   80     82%
           89   87   86   86   82     86%
           89   85   84   83   82     85%
Novelcrafter Default Prompt: 82.56%
Total: 83.53%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
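A minimal sketch of the well-formedness part of such a check, using Python's standard `xml.etree.ElementTree`. The `<codex>` and `<entry>` element names and the `type` and `name` attributes are assumptions chosen for illustration; the schema the benchmark actually expects is not given here.

```python
import xml.etree.ElementTree as ET


def parse_codex(xml_text: str):
    """Return (type, name) pairs for each <entry>, or None if the XML is malformed.

    The element and attribute names are hypothetical.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Malformed XML (unclosed tag, stray text, etc.) fails outright.
        return None
    return [(entry.get("type"), entry.get("name")) for entry in root.iter("entry")]


good = '<codex><entry type="character" name="Mira"/><entry type="location" name="Harbor"/></codex>'
bad = "<codex><entry></codex>"

entries = parse_codex(good)
broken = parse_codex(bad)
```

Parsing first and inspecting entries second separates "the output is not XML at all" from "the output is XML but misses expected entries", which are different failure modes.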

Scenario   #1   #2   #3   #4   #5   Total
           98   98   98   98   98     98%
           99   98   97   96   96     97%
           98   97   97   97   97     97%
          100  100  100   99   98     99%
Total: 97.85%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
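Scoring each expected change independently could be sketched as follows. The `ExpectedChange` type, its fields, and the substring-based checks are assumptions for illustration, not the benchmark's actual scorer.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One independently checked expectation (hypothetical structure)."""
    must_contain: str = ""      # text that should appear after the transformation
    must_not_contain: str = ""  # text that should no longer appear


def score_output(output: str, changes: list) -> float:
    """Return the percentage of expected changes that the output satisfies."""
    if not changes:
        return 0.0
    passed = 0
    for change in changes:
        ok = True
        if change.must_contain and change.must_contain not in output:
            ok = False
        if change.must_not_contain and change.must_not_contain in output:
            ok = False
        passed += ok
    return 100.0 * passed / len(changes)


output = "Elena did not open the gate."
changes = [
    ExpectedChange(must_contain="Elena", must_not_contain="Anna"),     # rename
    ExpectedChange(must_contain="did not", must_not_contain="didn't"),  # contraction
    ExpectedChange(must_contain="open"),                                # preserved word
    ExpectedChange(must_not_contain="gate"),                            # word avoidance (fails)
]
score = score_output(output, changes)
```

Because each change is checked on its own, a single missed rename lowers the score proportionally instead of failing the whole scenario.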

Scenario   #1   #2   #3   #4   #5   #6   #7   Total
Generic Prompt
          100  100  100  100  100  100  100    100%
          100  100  100  100  100  100  100    100%
          100  100  100   99   99   99   95     99%
          100  100  100  100  100  100  100    100%
          100  100  100  100  100  100  100    100%
          100  100  100  100  100  100  100    100%
           96   96   95   95   94   94   77     92%
          100  100  100  100  100  100  100    100%
           97   96   96   96   96   96   95     96%
Generic Prompt: 98.58%
Specific Prompt
          100  100  100  100  100  100  100    100%
          100  100  100  100  100  100  100    100%
          100  100  100  100  100  100  100    100%
          100  100  100  100  100  100  100    100%
          100  100  100  100  100  100  100    100%
          100  100  100  100  100  100  100    100%
           99   98   98   97   96   96   95     97%
          100  100  100  100  100  100  100    100%
          100  100  100  100  100  100  100    100%
Specific Prompt: 99.67%
Total: 99.13%

Tool usage within Novelcrafter

Tests the output messages related to tool usage within Novelcrafter.

Scenario   #1   #2   #3   #4   #5   #6   #7   #8   #9  #10   Total
          100  100  100  100  100  100  100  100  100  100    100%
Total: 100.00%