Anthropic

Comparing 13 models from Anthropic.

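| Model | Total | Released | Context | CoT | Tooling | Creative Writing | Language | Utility | Reasoning | Text Editing | Rule Following | Hallucination |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 95.02% | Feb 4, 2026 | 1m | | 100.00% | 84.55% | 96.12% | 98.93% | 93.77% | 98.86% | 89.78% | 98.13% |
| Claude Sonnet 4.6 (Reasoning) | 93.66% | Feb 17, 2026 | 1m | | 100.00% | 83.09% | 97.58% | 97.88% | 92.76% | 98.30% | 85.73% | 93.96% |
| Claude Opus 4.6 | 92.35% | Feb 4, 2026 | 1m | | 100.00% | 83.59% | 96.13% | 90.72% | 93.33% | 98.35% | 83.11% | 93.60% |
| Claude Sonnet 4.6 | 91.15% | Feb 17, 2026 | 1m | | 100.00% | 83.31% | 100.00% | 88.52% | 88.48% | 96.37% | 82.50% | 89.99% |
| Claude Opus 4.5 | 89.69% | Nov 24, 2025 | 200k | | 100.00% | 81.71% | 99.66% | 89.84% | 93.93% | 97.69% | 72.61% | 82.06% |
| Claude Sonnet 4 | 88.72% | May 22, 2025 | 200k | | 100.00% | 79.21% | 91.31% | 84.02% | 94.48% | 99.13% | 81.52% | 80.08% |
| Claude Sonnet 4.5 | 88.03% | Sep 29, 2025 | 1m | | 100.00% | 84.19% | 92.39% | 83.78% | 92.50% | 99.02% | 76.80% | 75.57% |
| Claude Opus 4 | 87.69% | May 22, 2025 | 200k | | 100.00% | 83.79% | 93.01% | 88.81% | 92.59% | 97.25% | 70.37% | 75.68% |
| Claude Haiku 4.5 | 85.14% | Oct 15, 2025 | 200k | | 99.10% | 78.96% | 91.84% | 72.48% | 87.76% | 96.81% | 70.35% | 83.86% |
| Claude 3.5 Sonnet | 84.24% | Jun 20, 2024 | 200k | | 100.00% | 78.69% | 85.62% | 76.75% | 90.30% | 96.57% | 69.67% | 76.31% |
| Claude 3.5 Haiku | 83.73% | Oct 22, 2024 | 200k | | 99.69% | 75.28% | 82.12% | 82.57% | 82.23% | 64.18% | 100.00% | |
| Claude 3.7 Sonnet | 83.39% | Feb 19, 2025 | 200k | | 99.32% | 76.31% | 92.95% | 62.54% | 89.94% | 97.12% | 73.78% | 75.18% |
| Claude 3 Haiku | 71.19% | Mar 13, 2024 | 200k | | 99.47% | 74.53% | 72.76% | 68.47% | 77.94% | 64.36% | 51.15% | 60.81% |

The Claude 3.5 Haiku row has only seven category scores. They appear here in their original order, so the columns they belong to are uncertain.
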
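
The page does not say how Total is computed. For the rows checked here, it agrees with an unweighted mean of the eight category scores to within rounding. The sketch below illustrates that check on two rows; the function name `total_matches_mean` and the 0.01 tolerance are assumptions made for this example, not part of the leaderboard.

```python
from statistics import mean

# (listed Total, category scores) copied from two rows of the table above.
rows = {
    "Claude Opus 4.6 (Reasoning)": (
        95.02,
        [100.00, 84.55, 96.12, 98.93, 93.77, 98.86, 89.78, 98.13],
    ),
    "Claude 3 Haiku": (
        71.19,
        [99.47, 74.53, 72.76, 68.47, 77.94, 64.36, 51.15, 60.81],
    ),
}


def total_matches_mean(listed_total, category_scores, tolerance=0.01):
    """Return True if the listed total is within `tolerance` of the plain mean.

    The tolerance (an assumption) absorbs rounding to two decimal places.
    """
    return abs(mean(category_scores) - listed_total) <= tolerance


for name, (total, scores) in rows.items():
    print(name, total_matches_mean(total, scores))
```

Whether the site weights categories equally for every model is not confirmed by the page, so treat this as a consistency check rather than a definition.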
Model Performance
Cost vs Performance

Compares total benchmark cost against overall score for Anthropic models. Quadrant lines are drawn at the median values.
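
As a minimal sketch of how median quadrant lines split such a chart, the snippet below takes the median of cost and of score and labels each model by which side of each line it falls on. The scores come from the table; the cost figures are hypothetical placeholders because the page text does not include them. The function name `assign_quadrants` and the handling of ties (a value equal to the median counts as "low cost" or "high score") are also assumptions.

```python
from statistics import median


def assign_quadrants(models):
    """Label each model relative to the median cost and median score.

    `models` maps a model name to a (cost, score) tuple.
    """
    cost_median = median(cost for cost, _ in models.values())
    score_median = median(score for _, score in models.values())

    quadrants = {}
    for name, (cost, score) in models.items():
        # Tie handling is an assumption: values on the line go to the
        # cheaper / higher-scoring side.
        cost_side = "low cost" if cost <= cost_median else "high cost"
        score_side = "high score" if score >= score_median else "low score"
        quadrants[name] = f"{cost_side}, {score_side}"
    return quadrants


# Scores are from the table; costs are hypothetical placeholders.
example = {
    "Claude Opus 4.6": (4.0, 92.35),
    "Claude Sonnet 4.6": (2.0, 91.15),
    "Claude Haiku 4.5": (1.0, 85.14),
}

for name, label in assign_quadrants(example).items():
    print(f"{name}: {label}")
```

Because the chart hides one low-scoring outlier, the medians it draws may be computed over a different set of models than the full table.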

1 low-scoring outlier hidden: Claude 3 Haiku (71.2%).

Cost Breakdown

Total benchmark cost per model, broken down by input, reasoning, and output tokens.