Anthropic
Comparing 16 models from Anthropic, sorted by total score in descending order.
| Model | Total | Released | Context | CoT | Tooling | Creative Writing | Language | Utility | Reasoning | Text Editing | Rule Following | Hallucination |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 (Reasoning) | 95.06% | Feb 4, 26 | 1m | ✓ | 100.00% | 84.55% | 96.12% | 98.93% | 93.19% | 98.86% | 89.78% | 99.06% |
| Claude Sonnet 4.6 (Reasoning) | 93.64% | Feb 17, 26 | 1m | ✓ | 100.00% | 83.09% | 97.58% | 97.88% | 89.61% | 98.30% | 85.73% | 96.98% |
| Claude Opus 4.7 (Reasoning) | 92.53% | Apr 16, 26 | 1m | ✓ | 100.00% | 84.73% | 98.77% | 97.87% | 88.52% | 97.58% | 74.04% | 98.69% |
| Claude Opus 4.8 (Reasoning) | 92.33% | May 27, 26 | 1m | ✓ | 99.53% | 85.25% | 96.38% | 99.26% | 91.76% | 98.78% | 70.27% | 97.41% |
| Claude Opus 4.6 | 92.31% | Feb 4, 26 | 1m | – | 100.00% | 83.59% | 96.13% | 90.72% | 89.75% | 98.35% | 83.11% | 96.80% |
| Claude Opus 4.8 (Reasoning, Low) | 91.89% | May 27, 26 | 1m | ✓ | 99.48% | 85.86% | 96.31% | 98.00% | 88.83% | 98.71% | 70.56% | 97.33% |
| Claude Sonnet 4.6 | 90.66% | Feb 17, 26 | 1m | – | 100.00% | 83.31% | 100.00% | 88.52% | 79.62% | 96.37% | 82.50% | 94.99% |
| Claude Sonnet 5 (Reasoning) | 90.40% | Jun 30, 26 | 1m | ✓ | 99.17% | 81.31% | 99.04% | 92.87% | 87.77% | 97.89% | 68.34% | 96.77% |
| Claude Sonnet 5 (Reasoning, Low) | 90.16% | Jun 30, 26 | 1m | ✓ | 99.52% | 81.43% | 98.06% | 93.53% | 87.78% | 97.93% | 66.62% | 96.44% |
| Claude Opus 4.7 | 89.90% | Apr 16, 26 | 1m | – | 99.48% | 84.74% | 92.32% | 95.77% | 88.29% | 97.55% | 68.08% | 92.95% |
| Claude Opus 4.5 | 89.60% | Nov 24, 25 | 200k | – | 100.00% | 81.71% | 99.66% | 89.84% | 84.25% | 97.69% | 72.61% | 91.03% |
| Claude Sonnet 4 | 87.64% | May 22, 25 | 200k | – | 100.00% | 79.21% | 91.31% | 84.02% | 75.86% | 99.13% | 81.52% | 90.04% |
| Claude Sonnet 4.5 | 87.54% | Sep 29, 25 | 1m | – | 97.75% | 84.19% | 92.39% | 83.78% | 78.61% | 99.02% | 76.80% | 87.78% |
| Claude Sonnet 5 | 87.34% | Jun 30, 26 | 1m | – | 95.99% | 82.58% | 95.50% | 88.57% | 75.35% | 97.65% | 74.36% | 88.74% |
| Claude Opus 4 | 87.22% | May 22, 25 | 200k | – | 100.00% | 83.79% | 93.01% | 88.81% | 76.72% | 97.25% | 70.37% | 87.84% |
| Claude Haiku 4.5 | 83.36% | Oct 15, 25 | 200k | – | 96.75% | 78.96% | 91.84% | 72.48% | 67.77% | 96.81% | 70.35% | 91.93% |
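The page does not say how the Total column is derived. The figures are consistent with a plain, unweighted mean of the eight category scores, and the sketch below checks that reading against four rows copied from the table. This is an inference from the numbers, not a documented formula. The helper name `mean_gap` is made up for illustration.

```python
# Each entry: (published Total, category scores) in percent, copied from the table.
# Category order: Tooling, Creative Writing, Language, Utility, Reasoning,
# Text Editing, Rule Following, Hallucination.
rows = {
    "Claude Opus 4.6 (Reasoning)": (95.06, [100.00, 84.55, 96.12, 98.93, 93.19, 98.86, 89.78, 99.06]),
    "Claude Sonnet 4.6 (Reasoning)": (93.64, [100.00, 83.09, 97.58, 97.88, 89.61, 98.30, 85.73, 96.98]),
    "Claude Opus 4.7 (Reasoning)": (92.53, [100.00, 84.73, 98.77, 97.87, 88.52, 97.58, 74.04, 98.69]),
    "Claude Haiku 4.5": (83.36, [96.75, 78.96, 91.84, 72.48, 67.77, 96.81, 70.35, 91.93]),
}


def mean_gap(total, scores):
    """Absolute difference between the published Total and the unweighted mean."""
    return abs(sum(scores) / len(scores) - total)


# Assumption being tested: Total is the unweighted mean of the category scores.
gaps = {name: mean_gap(total, scores) for name, (total, scores) in rows.items()}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.3f} percentage points")
```

For these four rows the gap stays below 0.01 percentage points, which rounding in the displayed values could account for. A weighted scheme that happens to match these rows cannot be ruled out from the table alone.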
Model Performance
Cost vs Performance
Compares total benchmark cost against overall score for Anthropic models. Quadrant lines are drawn at the median values.
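A minimal sketch of how median quadrant lines could be computed for a chart like this. The scores are Totals from the table, but the page's cost figures are not part of this text, so the costs here are arbitrary placeholders. The function names are also hypothetical, and which side of a line a tied point falls on is an arbitrary choice.

```python
from statistics import median


def quadrant_lines(points):
    """Return (median_cost, median_score) for a list of (cost, score) pairs."""
    costs = [cost for cost, _ in points]
    scores = [score for _, score in points]
    return median(costs), median(scores)


def quadrant(cost, score, cost_line, score_line):
    """Label one point relative to the two median lines.

    Points exactly on a line are counted as low cost / high score.
    """
    cost_side = "low cost" if cost <= cost_line else "high cost"
    score_side = "high score" if score >= score_line else "low score"
    return f"{cost_side}, {score_side}"


# (placeholder cost in arbitrary units, Total score in percent from the table).
# The cost values are NOT real data; they only make the example runnable.
points = {
    "Claude Opus 4.6 (Reasoning)": (4.0, 95.06),
    "Claude Sonnet 4.6": (2.0, 90.66),
    "Claude Opus 4.5": (3.0, 89.60),
    "Claude Haiku 4.5": (1.0, 83.36),
}

cost_line, score_line = quadrant_lines(list(points.values()))
labels = {
    name: quadrant(cost, score, cost_line, score_line)
    for name, (cost, score) in points.items()
}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Using the median rather than the mean keeps one very expensive or very weak model from shifting the quadrant lines.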
Cost Breakdown
Total benchmark cost per model, broken down by input, reasoning, and output tokens, shown either in USD or as token counts.