anthropic/claude-sonnet-4.6

Claude Sonnet 4.6 (Reasoning)

Release Date: Feb 17th, 2026
Context Size: 1M
Benchmark Cost: $43.81
Speed: 69.8 tok/s

Creative writing: 79.98%
Rule following: 91.28%
Utility: 91.14%
Mathematics: 100.00%
Tooling: 92.20%
Language: 98.79%
Logic: 87.50%

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, overuse of the past progressive, weak dialogue tags, filter words, purple prose, clichés, AI-isms (telltale words, adverbs, and names), and more.

Scenario                             #1   #2   #3   #4   #5  Total
Detailed Writing Rules
Creative writing, Rule following     91   87   86   85   84    87%
Creative writing, Rule following     95   92   90   88   87    90%
Creative writing, Rule following     86   86   85   84   83    85%
Creative writing, Rule following     93   89   86   84   83    87%
Creative writing, Rule following     89   89   88   88   80    87%
Creative writing, Rule following     93   92   91   90   86    90%
Detailed Writing Rules total: 87.64%
genre
Creative writing, Rule following     80   78   77   74   74    76%
Creative writing, Rule following     88   85   84   81   78    83%
Creative writing, Rule following     85   78   74   71   71    76%
Creative writing, Rule following     81   77   77   74   67    75%
Creative writing, Rule following     80   77   76   76   74    77%
Creative writing, Rule following     83   82   80   79   70    79%
genre total: 77.68%
Novelcrafter Default Prompt
Creative writing, Rule following     84   83   80   78   78    81%
Creative writing, Rule following     90   87   82   82   78    84%
Creative writing, Rule following     85   82   80   77   75    80%
Creative writing, Rule following     85   85   84   81   80    83%
Creative writing, Rule following     83   80   79   77   75    79%
Creative writing, Rule following     87   78   77   75   73    78%
Novelcrafter Default Prompt total: 80.72%
Bad Writing Habits total: 82.01%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.

Scenario                    #1   #2   #3   #4   #5  Total
Utility, Rule following     98   98   97   97   97    97%
Utility, Rule following     98   98   98   97   97    97%
Utility, Rule following     98   98   98   97   97    98%
Utility, Rule following    100   98   96   96   95    97%
Codex Extraction total: 97.39%

Codex Red Herring (False Positive Detection)

Tests whether models correctly report "no violations" when a codex is fully consistent with the prose passage. Models that hallucinate violations (false positives) fail. Scenarios form a 2×2 matrix of text length × codex size, run with both basic and detailed codex entries.

Scenario                                    #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
basic entries
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100   17   17    83%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100   100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100   100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100   25   25    85%
basic entries total: 92.08%
detailed entries
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100   25    93%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100   100%
Tooling, Utility, Logic, Rule following    100  100   25   25   25   25   25   25   25   17    39%
Tooling, Utility, Logic, Rule following    100  100  100  100   25   25   25   25   25   25    55%
detailed entries total: 71.67%
Codex Red Herring total: 81.88%

Codex Violation Detection

Detects factual inconsistencies between a story bible and prose passages. The model must output structured XML identifying each violation with paragraph number and substring.
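The description above specifies only two things about the output: it is XML, and each violation carries a paragraph number and a substring. Below is a minimal Python sketch of how such a report could be parsed and sanity-checked. The tag and attribute names (`<violations>`, `<violation paragraph="N">`) and the helpers `parse_violations` and `is_grounded` are hypothetical illustrations, not the benchmark's actual schema or scorer. The sketch also assumes 1-based paragraph numbering.

```python
import xml.etree.ElementTree as ET

# Hypothetical report shape (not the benchmark's actual schema):
# <violations>
#   <violation paragraph="2">exact substring from paragraph 2</violation>
# </violations>


def parse_violations(xml_text: str) -> list[tuple[int, str]]:
    """Return (paragraph number, substring) pairs from a violation report."""
    root = ET.fromstring(xml_text)
    pairs = []
    for element in root.iter("violation"):
        # A missing paragraph attribute becomes 0, which is_grounded rejects.
        number = int(element.get("paragraph", "0"))
        substring = (element.text or "").strip()
        pairs.append((number, substring))
    return pairs


def is_grounded(violation: tuple[int, str], paragraphs: list[str]) -> bool:
    """True if the cited paragraph exists (1-based) and contains the substring."""
    number, substring = violation
    if not 1 <= number <= len(paragraphs) or not substring:
        return False
    return substring in paragraphs[number - 1]


paragraphs = [
    "Mara brushed the rain from her green eyes.",
    "She reached for the sword her brother had forged.",
]
report = '<violations><violation paragraph="1">green eyes</violation></violations>'

for violation in parse_violations(report):
    print(violation, is_grounded(violation, paragraphs))  # (1, 'green eyes') True
```

Under this assumed schema, a report for a fully consistent passage (the Red Herring case) would be an empty `<violations/>` element, which parses to an empty list.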

Scenario                                    #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
matrix
Tooling, Utility, Logic, Rule following     99   98   98   98   98   97   97   96   96   94    97%
Tooling, Utility, Logic, Rule following    100   97   97   97   97   97   97   96   96   96    97%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100   97   83   83   83    95%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100   100%
matrix total: 97.27%
tiers
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100   100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100   86   86   86   86   86    93%
Tooling, Utility, Logic, Rule following     96   96   96   96   96   92   92   92   92   88    93%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100   100%
tiers total: 96.62%
Codex Violation Detection total: 96.95%

Dialogue tags

Various tasks related to dialogue tags in text.

Scenario                             #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
dialogue-200
Creative writing, Rule following    100  100  100  100  100  100  100  100  100   91    99%
Creative writing, Rule following    100  100  100  100  100   99   99   99   97   93    99%
Creative writing, Rule following    100   97   90   75   72   63   60   56   53   50    72%
dialogue-200 total: 89.72%
dialogue-500
Creative writing, Rule following    100   98   98   97   95   92   86   59   53   42    82%
Creative writing, Rule following     96   73   60   55   50   46   39   31   30   26    50%
Creative writing, Rule following     54   46   39   35   22   22   18   18    3    1    26%
dialogue-500 total: 52.68%
Ungrouped
Creative writing, Rule following    100  100  100  100  100  100  100  100  100   61    96%
Dialogue tags total: 74.75%

N-Length Sentences

Write sentences containing exactly N words.
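Checking this task mostly comes down to how a "word" is counted. Below is a minimal Python sketch. It assumes a word is a run of letters or digits, optionally joined by internal apostrophes or hyphens, so "don't" and "well-known" each count once. It also assumes a response is scored as the share of its sentences with exactly N words. Both the tokenization rule and the scoring are assumptions, not the benchmark's documented method.

```python
import re

# Assumed tokenization: letters/digits, optionally joined by an internal
# apostrophe or hyphen ("don't", "well-known"); punctuation is ignored.
WORD = re.compile(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*")


def word_count(sentence: str) -> int:
    """Count words in a sentence under the assumed tokenization."""
    return len(WORD.findall(sentence))


def share_with_length(sentences: list[str], n: int) -> float:
    """Fraction of sentences that contain exactly n words."""
    if not sentences:
        return 0.0
    hits = sum(1 for sentence in sentences if word_count(sentence) == n)
    return hits / len(sentences)


print(word_count("Don't stop, well-known friend!"))  # 4
print(share_with_length(["The tide came in.", "Night fell fast."], 4))  # 0.5
```

A stricter checker might also treat numbers or hyphenated compounds differently. Whatever rule is chosen has to match the one stated in the prompt, or correct answers will be marked wrong.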

Scenario           #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
Rule following    100  100  100  100  100  100  100  100  100  100   100%
Rule following    100  100  100  100  100  100  100  100  100  100   100%
Rule following    100  100  100  100  100  100  100  100  100  100   100%
N-Length Sentences total: 100.00%

Novel outline

Handle questions about a novel's outline, presented in various formats.

Scenario             #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
outline-count
Tooling, Utility    100  100  100  100  100  100  100  100  100  100   100%
Tooling, Utility    100  100  100  100  100  100  100  100  100  100   100%
Tooling, Utility    100  100  100  100  100  100  100  100  100    0    90%
outline-count total: 96.67%
pov-count
Tooling, Utility    100  100  100  100  100  100  100  100  100  100   100%
Tooling, Utility    100  100  100  100  100  100  100  100  100  100   100%
Tooling, Utility    100  100  100  100  100  100  100  100  100  100   100%
pov-count total: 100.00%
Novel outline total: 98.33%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
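Scoring each expected change independently means a single missed edit costs only its own share of the score instead of failing the whole response. Below is a minimal Python sketch of that idea. It assumes each expected change can be expressed as a string that must appear in the output plus, optionally, a string that must no longer appear. `ExpectedChange` and `score_changes` are hypothetical names, and the benchmark's real checks are likely richer than plain substring tests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpectedChange:
    """One independently checked edit (hypothetical representation)."""
    must_contain: str                        # text the output should include
    must_not_contain: Optional[str] = None   # text that should be gone


def score_changes(output: str, changes: list[ExpectedChange]) -> float:
    """Return the fraction of expected changes present in the output."""
    if not changes:
        return 1.0
    passed = 0
    for change in changes:
        ok = change.must_contain in output
        if change.must_not_contain is not None:
            ok = ok and change.must_not_contain not in output
        if ok:
            passed += 1
    return passed / len(changes)


# Source: "Anna walked to Riverton. She didn't stop."
# Task: rename Anna -> Clara and Riverton -> Lakeside, expand contractions.
output = "Clara walked to Lakeside. She didn't stop."
changes = [
    ExpectedChange("Clara", "Anna"),
    ExpectedChange("Lakeside", "Riverton"),
    ExpectedChange("did not", "didn't"),
]
print(round(score_changes(output, changes), 2))  # 0.67
```

Here the two renames succeed and the contraction is left unexpanded, so the response earns partial credit for 2 of 3 changes rather than a zero.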

Scenario                            #1   #2   #3   #4   #5   #6   #7  Total
Generic Prompt
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language           100   99   99   99   99   99   99    99%
Rule following, Language           100  100   99   99   99   99   99    99%
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language, Logic     97   96   96   96   95   95   94    96%
Rule following, Language           100  100  100  100  100  100   99   100%
Rule following, Language           100  100  100  100   99   97   97    99%
Generic Prompt total: 99.22%
Specific Prompt
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language, Logic    100   99   99   98   98   98   98    99%
Rule following, Language           100  100  100  100  100  100  100   100%
Rule following, Language           100  100  100  100  100  100  100   100%
Specific Prompt total: 99.85%
Text Replacement total: 99.54%

Tool usage within Novelcrafter

Output messages related to tool usage within Novelcrafter.

Scenario             #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
Tooling, Utility    100  100  100  100  100  100  100  100  100  100   100%
Tool usage within Novelcrafter total: 100.00%

Voice/dialogue sheets

Extract dialogue from given text as voice sheets.

Scenario               #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
Utility               100  100  100  100  100  100  100  100  100  100   100%
Utility               100  100  100  100  100  100  100  100  100  100   100%
1-shot (Utility)      100  100  100  100  100  100  100  100  100    0    90%
Few-shot (Utility)    100  100  100  100  100  100    0    0    0    0    60%
Utility               100  100  100  100  100  100  100  100  100  100   100%
Voice/dialogue sheets total: 90.00%