claude-sonnet-4-20250514

Claude Sonnet 4
via anthropic

Release Date: May 22nd, 2025
Context Size: 200k
Reasoning: No
Benchmark Cost: $12.94
Speed: not listed

Categories

Creative Writing: 79.2%
Tooling: 100.0%
Language: 91.3%
Utility: 84.0%
Reasoning: 94.5%
Text Editing: 99.1%
Rule Following: 81.5%
Hallucination: 80.1%

Subcategories

Subcategory chart (per-subcategory scores not recoverable): AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.

Scenario  #1   #2   #3   #4   #5   Total

Detailed Writing Rules
          79   76   71   71   67   73%
          95   90   87   83   80   87%
          93   89   86   86   86   88%
          90   88   86   81   80   85%
          96   91   90   86   78   88%
          91   90   89   84   77   86%
Detailed Writing Rules: 84.56%

genre
          74   74   68   67   64   69%
          81   80   78   76   75   78%
          85   81   79   78   78   80%
          79   79   74   73   65   74%
          80   80   79   78   77   79%
          86   82   75   75   75   78%
genre: 76.49%

Novelcrafter Default Prompt
          75   74   72   69   67   71%
          83   81   77   77   75   79%
          88   86   80   79   72   81%
          87   80   78   77   77   80%
          89   83   81   79   77   82%
          80   78   77   76   72   76%
Novelcrafter Default Prompt: 78.21%

Total: 79.75%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
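The page does not show the expected XML schema or how responses are validated. As a minimal sketch, assuming a hypothetical `<codex>` root whose `<entry type="...">` children hold a `<name>` and an optional `<description>`, the Python below shows one way to confirm that a response is well-formed XML and pull out the entries. `parse_codex_entries` and all tag names are illustrative assumptions, not the benchmark's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry types, mirroring the categories named in the description.
ALLOWED_TYPES = {"character", "location", "object", "lore"}


def parse_codex_entries(xml_text: str) -> list[dict[str, str]]:
    """Parse a hypothetical <codex> document into a list of entry dicts.

    Raises ValueError if the XML is malformed or an entry is incomplete.
    """
    try:
        root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc

    entries = []
    for entry in root.findall("entry"):
        entry_type = entry.get("type", "")
        name = (entry.findtext("name") or "").strip()
        # Reject entries with an unknown type or an empty name.
        if entry_type not in ALLOWED_TYPES or not name:
            raise ValueError(f"invalid entry: type={entry_type!r}, name={name!r}")
        entries.append({
            "type": entry_type,
            "name": name,
            # A missing <description> becomes an empty string.
            "description": (entry.findtext("description") or "").strip(),
        })
    return entries


sample = """<codex>
  <entry type="character"><name>Mira</name><description>A cartographer.</description></entry>
  <entry type="location"><name>Harrowgate</name></entry>
</codex>"""

for e in parse_codex_entries(sample):
    print(e["type"], e["name"])
```

With the sample above, the loop prints `character Mira` and `location Harrowgate`; malformed input such as `<codex><entry></codex>` raises `ValueError` instead of returning partial results.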

Scenario  #1   #2   #3   #4   #5   Total
          98   97   96   95   94   96%
          96   95   95   95   95   95%
          98   96   96   96   96   97%
          99   99   99   99   99   99%
Total: 96.69%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
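The scoring code itself is not published on this page. The sketch below illustrates the idea of checking each expected change independently, under the assumption that every change can be written as a hypothetical `ExpectedChange(old, new)` pair: the old string must be gone from the output and the new one present. `ExpectedChange` and `score_replacement` are illustrative names, and real checks for tense or POV rewrites would likely need more than whole-word matching.

```python
import re
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One hypothetical expected edit: `old` should be gone, `new` present."""
    old: str
    new: str


def _contains_word(text: str, word: str) -> bool:
    # Whole-word, case-sensitive match so "Ann" does not match "Anna".
    return re.search(rf"\b{re.escape(word)}\b", text) is not None


def score_replacement(output: str, changes: list[ExpectedChange]) -> float:
    """Return the fraction of expected changes applied in `output`.

    Each change is checked on its own, so one missed rename costs only
    its share of the score instead of failing the whole scenario.
    """
    if not changes:
        return 1.0
    passed = sum(
        1 for c in changes
        if not _contains_word(output, c.old) and _contains_word(output, c.new)
    )
    return passed / len(changes)


changes = [
    ExpectedChange("Ann", "Mara"),          # character rename
    ExpectedChange("Oakvale", "Elmstead"),  # location rename
    ExpectedChange("didn't", "did not"),    # contraction expansion
]
output = "Mara walked through Oakvale. She did not look back."
print(f"{score_replacement(output, changes):.2f}")
```

Here the location rename is missed, so the output passes 2 of 3 checks and prints `0.67` rather than scoring zero for the whole scenario.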

Scenario  #1   #2   #3   #4   #5   #6   #7   Total

Generic Prompt
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          99   99   99   99   99   99   99   99%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          97   97   97   96   96   96   96   97%
          100  100  100  100  100  100  100  100%
          100  100  100  100  99   99   99   100%
Generic Prompt: 99.55%

Specific Prompt
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  99   100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
          100  99   98   98   97   97   96   98%
          100  100  100  100  100  100  100  100%
          100  100  100  100  100  100  100  100%
Specific Prompt: 99.77%

Total: 99.66%

Tool usage within Novelcrafter

Evaluates the model's output messages related to tool usage within Novelcrafter.

Scenario  #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
          100  100  100  100  100  100  100  100  100  100  100%
Total: 100.00%