anthropic/claude-opus-4.6

Claude Opus 4.6 (Reasoning)

Release Date: Feb 4th, 2026
Context Size: 1M tokens
Benchmark Cost: $44.01
Speed: 54.5 tok/s

Creative writing: 82.72%
Rule following: 93.67%
Utility: 95.22%
Mathematics: 100.00%
Tooling: 97.64%
Language: 98.29%
Logic: 91.95%

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
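
One of the anti-patterns named above, filter words, can be illustrated with a simple counter. This is a minimal sketch and not the benchmark's detector: the word list, the tokenizing regex, and both function names are assumptions made for illustration.

```python
import re

# Hypothetical filter-word list; the benchmark's actual list is not published here.
FILTER_WORDS = frozenset({"saw", "felt", "heard", "noticed", "realized", "seemed"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def filter_word_hits(text: str) -> list[str]:
    """Return every filter word found in the text, in order of appearance."""
    return [word for word in tokenize(text) if word in FILTER_WORDS]


def hits_per_100_words(text: str) -> float:
    """Density of filter words, normalized to a 100-word passage."""
    words = tokenize(text)
    if not words:
        return 0.0
    return 100.0 * len(filter_word_hits(text)) / len(words)
```

A density measure like this makes passages of different lengths comparable, which is one plausible way to turn raw hits into a percentage score.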

Scenario                            #1   #2   #3   #4   #5   Total
Detailed Writing Rules
Creative writing, Rule following    87   85   84   84   83   85%
Creative writing, Rule following    90   89   88   87   86   88%
Creative writing, Rule following    90   87   84   83   81   85%
Creative writing, Rule following    94   94   92   89   89   92%
Creative writing, Rule following    91   91   89   89   85   89%
Creative writing, Rule following    93   93   89   88   88   90%
Detailed Writing Rules: 88.04%
genre
Creative writing, Rule following    82   80   78   76   74   78%
Creative writing, Rule following    86   83   83   82   82   83%
Creative writing, Rule following    81   80   78   78   74   78%
Creative writing, Rule following    84   84   81   79   76   81%
Creative writing, Rule following    82   80   79   75   75   78%
Creative writing, Rule following    85   82   81   79   78   81%
genre: 79.94%
Novelcrafter Default Prompt
Creative writing, Rule following    88   86   85   84   79   84%
Creative writing, Rule following    87   85   82   79   77   82%
Creative writing, Rule following    85   83   80   78   77   81%
Creative writing, Rule following    89   86   85   84   84   86%
Creative writing, Rule following    88   87   86   82   80   85%
Creative writing, Rule following    83   83   83   82   80   82%
Novelcrafter Default Prompt: 83.25%
Overall: 83.74%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
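
The "well-formed XML" requirement above can be checked mechanically. The sketch below assumes a hypothetical schema (a `<codex>` root with `<entry type="..." name="..."/>` children); the benchmark's real element and attribute names are not given in this page.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on any parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def entries_by_type(xml_text: str) -> dict[str, list[str]]:
    """Group entry names by their type attribute (hypothetical schema)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in ET.fromstring(xml_text).iter("entry"):
        grouped[entry.get("type", "unknown")].append(entry.get("name", ""))
    return dict(grouped)
```

Separating the well-formedness check from the content check lets a scorer distinguish a model that extracted the wrong entries from one that produced unparseable output.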

Scenario #1 #2 #3 #4 #5 Total
UtilityRule following
999898989798%
UtilityRule following
1009999999799%
UtilityRule following
999999999999%
UtilityRule following
999999999999%
98.48%

Codex Red Herring (False Positive Detection)

Tests whether models correctly report "no violations" when a codex is fully consistent with the prose passage. Models that hallucinate false violations (false positives) fail. Uses a 2×2 matrix of text length × codex size, with bare and detailed-entry variants.
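
The scenario grid described above (text length × codex size, repeated for each entry variant) can be enumerated as follows. This is a sketch under assumptions: the level names `short`/`long` and `small`/`large`, and the `RedHerringScenario` type, are hypothetical labels rather than the benchmark's own.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical level names; only the 2x2 shape and the two variants come from the description.
TEXT_LENGTHS = ("short", "long")
CODEX_SIZES = ("small", "large")
ENTRY_VARIANTS = ("basic", "detailed")


@dataclass(frozen=True)
class RedHerringScenario:
    entry_variant: str
    text_length: str
    codex_size: str


def build_scenarios() -> list[RedHerringScenario]:
    """Enumerate the 2x2 grid once per entry variant (2 * 2 * 2 = 8 scenarios)."""
    return [
        RedHerringScenario(variant, length, size)
        for variant, length, size in product(ENTRY_VARIANTS, TEXT_LENGTHS, CODEX_SIZES)
    ]


def is_clean_report(reported_violations: list[str]) -> bool:
    """The codex is consistent, so any reported violation is a false positive."""
    return len(reported_violations) == 0
```

The eight combinations match the four rows listed under each of the two entry groups in the results. The pass/fail check is a simplification: the results include partial scores such as 25, so the real scorer is evidently not purely binary.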

Scenario                                   #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
basic entries
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100  100%
basic entries: 100.00%
detailed entries
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  25   25   25   25   70%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  25   25   85%
detailed entries: 88.75%
Overall: 94.38%

Codex Violation Detection

Detects factual inconsistencies between a story bible and prose passages. The model must output structured XML identifying each violation with paragraph number and substring.
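
A violation report of the kind described above (paragraph number plus substring) could be parsed and sanity-checked like this. The element and attribute names (`violation`, `paragraph`, `substring`) are assumptions for illustration; the benchmark's actual XML schema is not shown in this page.

```python
import xml.etree.ElementTree as ET


def parse_violations(xml_text: str) -> list[tuple[int, str]]:
    """Extract (paragraph_number, substring) pairs from a hypothetical report format."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("violation"):
        paragraph = int(node.get("paragraph", "0"))
        substring = (node.findtext("substring") or "").strip()
        results.append((paragraph, substring))
    return results


def substring_in_paragraph(paragraphs: list[str], paragraph_number: int, substring: str) -> bool:
    """Check that the quoted substring really occurs in the cited paragraph (1-indexed)."""
    if not 1 <= paragraph_number <= len(paragraphs):
        return False
    return substring in paragraphs[paragraph_number - 1]
```

Verifying that each quoted substring occurs in the cited paragraph is one way to catch reports that name a real inconsistency but point at the wrong location.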

Scenario                                   #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
matrix
Tooling, Utility, Logic, Rule following    98   97   97   97   97   97   96   95   94   94   96%
Tooling, Utility, Logic, Rule following    100  100  98   98   98   96   94   94   94   94   97%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  92   92   98%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  97   100%
matrix: 97.70%
tiers
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility, Logic, Rule following    100  100  100  100  100  100  100  100  100  100  100%
tiers: 100.00%
Overall: 98.85%

Dialogue tags

Various tasks related to dialogue tags in text.

Scenario                            #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
dialogue-200
Creative writing, Rule following    100  100  100  100  100  100  100  100  100  100  100%
Creative writing, Rule following    100  100  100  99   99   99   98   95   61   50   90%
Creative writing, Rule following    100  100  100  100  98   98   95   88   81   80   94%
dialogue-200: 94.64%
dialogue-500
Creative writing, Rule following    100  100  99   96   87   83   38   36   24   13   68%
Creative writing, Rule following    86   80   75   66   53   50   50   48   18   1    53%
Creative writing, Rule following    91   86   86   56   55   53   48   34   34   22   57%
dialogue-500: 58.90%
Ungrouped
Creative writing, Rule following    100  100  100  100  100  100  100  100  100  100  100%
Overall: 80.09%

N-Length Sentences

Tests whether the model can write sentences containing exactly N words.
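
An exact-word-count constraint like this is easy to verify programmatically. The sketch below assumes that a "word" is any whitespace-separated token, so contractions and hyphenated compounds each count once; the benchmark's actual counting rule is not stated.

```python
def count_words(sentence: str) -> int:
    """Count whitespace-separated tokens (assumed definition of a word)."""
    return len(sentence.split())


def has_exactly_n_words(sentence: str, n: int) -> bool:
    """Return True only when the sentence has exactly n words."""
    return count_words(sentence) == n
```

For example, `has_exactly_n_words("The rain fell softly tonight.", 5)` holds under this definition, while a different tokenization rule (say, splitting on hyphens) would change the count for sentences with compound words.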

Scenario          #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
Rule following    100  100  100  100  100  100  100  100  100  100  100%
Rule following    100  100  100  100  100  100  100  100  98   94   99%
Rule following    100  100  100  100  100  100  100  100  100  100  100%
Overall: 99.75%

Novel outline

Answers questions about the outline of a novel in various formats.

Scenario            #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
outline-count
Tooling, Utility    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility    100  100  100  100  100  100  100  100  100  100  100%
outline-count: 100.00%
pov-count
Tooling, Utility    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility    100  100  100  100  100  100  100  100  100  100  100%
Tooling, Utility    100  100  100  100  100  100  100  100  100  100  100%
pov-count: 100.00%
Overall: 100.00%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
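
Scoring "each expected change independently" can be pictured as a list of per-change checks whose pass rate becomes the score. This is a sketch, not the benchmark's scorer: the `ExpectedChange` type, its two fields, and the substring-based checks are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExpectedChange:
    """One independently checked change (hypothetical representation)."""
    must_contain: Optional[str] = None      # text that should appear after the change
    must_not_contain: Optional[str] = None  # text that should be gone after the change


def change_applied(change: ExpectedChange, output: str) -> bool:
    """A change passes when its required text is present and its forbidden text is absent."""
    if change.must_contain is not None and change.must_contain not in output:
        return False
    if change.must_not_contain is not None and change.must_not_contain in output:
        return False
    return True


def score(output: str, changes: list[ExpectedChange]) -> float:
    """Percentage of expected changes that were applied; 100.0 when nothing is expected."""
    if not changes:
        return 100.0
    passed = sum(change_applied(change, output) for change in changes)
    return 100.0 * passed / len(changes)
```

Checking changes one by one means a single missed contraction or rename lowers the score slightly instead of failing the whole scenario, which is consistent with the near-perfect but not uniformly perfect rows in the results.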

Scenario                           #1   #2   #3   #4   #5   #6   #7   Total
Generic Prompt
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language           99   99   99   99   99   99   99   99%
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language, Logic    98   97   97   96   96   95   95   96%
Rule following, Language           99   99   99   99   99   99   99   99%
Rule following, Language           100  100  100  100  100  99   99   100%
Generic Prompt: 99.34%
Specific Prompt
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language, Logic    100  100  99   99   98   98   98   99%
Rule following, Language           100  100  100  100  100  100  100  100%
Rule following, Language           100  100  100  100  100  100  100  100%
Specific Prompt: 99.86%
Overall: 99.60%

Tool usage within Novelcrafter

Output messages related to tool usage within Novelcrafter.

Scenario            #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
Tooling, Utility    100  100  100  100  100  100  100  100  100  100  100%
Overall: 100.00%

Voice/dialogue sheets

Extract dialogue from given text as voice sheets.

Scenario            #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Total
Utility             100  100  100  100  100  100  100  100  100  100  100%
Utility             100  100  100  100  100  100  100  100  100  100  100%
1-shot Utility      100  100  100  100  100  100  100  100  100  100  100%
Few-shot Utility    100  100  100  100  100  100  100  100  100  100  100%
Utility             100  100  100  100  100  100  100  100  100  100  100%
Overall: 100.00%