anthropic/claude-3.5-sonnet

Claude 3.5 Sonnet

Release Date: Jun 20th, 2024
Context Size: 200k
Reasoning: No
Benchmark Cost: $28.78
Speed: 431.2 tok/s

Categories

Creative Writing: 78.7%
Tooling: 100.0%
Language: 85.6%
Utility: 76.7%
Reasoning: 90.3%
Text Editing: 96.6%
Rule Following: 69.7%
Hallucination: 76.3%

Subcategories

AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
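As a rough illustration of what detecting such habits can look like, the sketch below counts two of the listed anti-patterns (filter words and weak dialogue tags) with simple word matching. The word lists and the `count_habits` function are hypothetical and chosen only for this example; the benchmark's actual detection rules are not described here.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; the benchmark's real
# lists and rules are not given in the source.
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}
WEAK_DIALOGUE_TAGS = {"exclaimed", "interjected", "opined"}


def count_habits(text: str) -> Counter:
    """Count flagged words per habit type, case-insensitively."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        if word in FILTER_WORDS:
            counts["filter_word"] += 1
        if word in WEAK_DIALOGUE_TAGS:
            counts["weak_dialogue_tag"] += 1
    return counts


sample = 'She felt the cold and noticed the door. "Stop!" he exclaimed.'
print(count_habits(sample))
```

A real detector would need more than word lists (passive voice and past progressive overuse, for instance, depend on grammar), so this only shows the counting idea.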

Scenario #1 #2 #3 #4 #5 Total
Detailed Writing Rules
79 78 77 72 71 75%
92 83 82 82 80 84%
88 85 85 83 81 84%
90 85 82 79 72 82%
91 85 85 85 81 86%
88 86 83 83 79 84%
Detailed Writing Rules: 82.39%
genre
73 73 71 64 64 69%
76 75 74 74 71 74%
83 81 81 77 76 80%
81 80 78 77 76 78%
86 83 82 78 77 81%
83 79 74 74 73 76%
genre: 76.46%
Novelcrafter Default Prompt
80 72 66 61 59 68%
80 74 71 69 67 72%
79 79 78 76 74 77%
85 80 77 76 71 78%
87 85 83 80 77 82%
82 81 79 79 72 79%
Novelcrafter Default Prompt: 75.85%
Overall: 78.24%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
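One part of such a check, whether the returned XML is well-formed and which entries it contains, can be sketched with Python's standard library. The `<codex>`/`<entry>`/`<name>` schema and the `parse_codex` helper below are assumptions made for illustration; the benchmark's actual output schema is not shown here.

```python
import xml.etree.ElementTree as ET


def parse_codex(xml_text: str):
    """Return (type, name) pairs, or None if the XML is not well-formed.

    The element and attribute names are a hypothetical schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return [(entry.get("type"), entry.findtext("name"))
            for entry in root.iter("entry")]


good = (
    "<codex>"
    '<entry type="character"><name>Mira</name></entry>'
    '<entry type="location"><name>Harrow Keep</name></entry>'
    "</codex>"
)
# The <name> tag is never closed, so this is not well-formed.
bad = '<codex><entry type="character"><name>Mira</entry></codex>'

print(parse_codex(good))
print(parse_codex(bad))
```

Returning `None` for malformed output keeps "could not parse" distinct from "parsed but found no entries", which would be an empty list.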

Scenario #1 #2 #3 #4 #5 Total
96 96 96 94 93 95%
96 96 96 96 96 96%
97 97 97 97 96 97%
95 94 94 94 94 94%
Overall: 95.57%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
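Scoring each expected change independently might look like the sketch below, where every change is a separate pass/fail check and the score is the percentage that pass. The `ExpectedChange` type and the substring-based checks are assumptions for illustration, not the benchmark's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    # Hypothetical representation of one expected edit.
    must_contain: str      # text that should appear in the output
    must_not_contain: str  # text that should no longer appear


def score(output: str, changes: list[ExpectedChange]) -> float:
    """Percentage of expected changes satisfied, each checked on its own."""
    if not changes:
        raise ValueError("at least one expected change is required")
    passed = sum(
        1 for c in changes
        if c.must_contain in output and c.must_not_contain not in output
    )
    return 100.0 * passed / len(changes)


changes = [
    ExpectedChange(must_contain="Marcus", must_not_contain="John"),    # rename
    ExpectedChange(must_contain="did not", must_not_contain="didn't"), # expand contraction
]
# The rename was applied, but the contraction was left unexpanded.
output = "Marcus didn't wait. Marcus left."
print(score(output, changes))
```

Because each change is checked separately, a missed contraction lowers the score without cancelling the credit for a correct rename.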

Scenario #1 #2 #3 #4 #5 #6 #7 Total
Generic Prompt
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
99 99 99 99 99 99 99 99%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
91 89 89 88 88 88 88 89%
100 100 100 100 100 100 100 100%
99 99 99 99 99 99 99 99%
Generic Prompt: 98.54%
Specific Prompt
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
99 99 99 99 99 99 99 99%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
95 94 94 94 94 94 94 94%
100 100 100 100 100 100 97 100%
100 100 100 100 100 100 100 100%
Specific Prompt: 99.16%
Overall: 98.85%

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
100 100 100 100 100 100 100 100 100 100 100%
Overall: 100.00%