xAI
Comparing 9 models from xAI.
| Model | Total | Released | Context | CoT | Tooling | Creative Writing | Language | Utility | Reasoning | Text Editing | Rule Following | Hallucination |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grok 4.3 (Reasoning) | 93.60% | Apr 30, 26 | 2m | ✓ | 99.97% | 85.11% | 97.50% | 92.94% | 95.46% | 97.64% | 82.80% | 97.40% |
| Grok 4.20 (Beta, Reasoning) | 91.49% | Mar 12, 26 | 2m | ✓ | 100.00% | 84.50% | 99.08% | 95.41% | 82.64% | 98.69% | 75.31% | 96.28% |
| Grok 4.20 (Reasoning) | 91.39% | Mar 31, 26 | 2m | ✓ | 100.00% | 86.25% | 96.61% | 92.61% | 79.11% | 98.83% | 82.04% | 95.66% |
| Grok 4.1 Fast | 89.55% | Nov 19, 25 | 2m | ✓ | 100.00% | 82.14% | 88.76% | 84.12% | 93.58% | 97.87% | 70.87% | 99.02% |
| Grok 4 | 88.12% | Jul 9, 25 | 256k | ✓ | 99.99% | 77.34% | 90.61% | 89.67% | 96.01% | 98.76% | 63.09% | 89.45% |
| Grok 4 Fast | 86.15% | Sep 19, 25 | 2m | ✓ | 99.65% | 77.03% | 84.61% | 76.76% | 94.89% | 97.26% | 67.91% | 91.09% |
| Grok 4.20 (Beta) | 83.85% | Mar 12, 26 | 2m | – | 100.00% | 82.80% | 91.17% | 82.15% | 87.05% | 95.49% | 53.89% | 78.28% |
| Grok 4.20 | 81.70% | Mar 31, 26 | 2m | – | 94.00% | 83.44% | 78.86% | 84.11% | 84.81% | 95.63% | 59.71% | 73.01% |
| Grok 4.3 | 78.66% | Apr 30, 26 | 2m | – | 92.65% | 84.51% | 84.74% | 66.41% | 84.62% | 90.19% | 49.02% | 77.16% |
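The page does not say how Total is derived. The figures in the table appear consistent with an unweighted mean of the eight category scores (Tooling through Hallucination), rounded to two decimals. The sketch below checks that reading against the rows above; the names `ROWS` and `mean_gap` are illustrative, and the averaging rule is an inference from the numbers, not a documented formula.

```python
# Each entry: model -> (Total, [Tooling, Creative Writing, Language, Utility,
#                               Reasoning, Text Editing, Rule Following, Hallucination])
# Values are copied from the table above.
ROWS = {
    "Grok 4.3 (Reasoning)": (93.60, [99.97, 85.11, 97.50, 92.94, 95.46, 97.64, 82.80, 97.40]),
    "Grok 4.20 (Beta, Reasoning)": (91.49, [100.00, 84.50, 99.08, 95.41, 82.64, 98.69, 75.31, 96.28]),
    "Grok 4.20 (Reasoning)": (91.39, [100.00, 86.25, 96.61, 92.61, 79.11, 98.83, 82.04, 95.66]),
    "Grok 4.1 Fast": (89.55, [100.00, 82.14, 88.76, 84.12, 93.58, 97.87, 70.87, 99.02]),
    "Grok 4": (88.12, [99.99, 77.34, 90.61, 89.67, 96.01, 98.76, 63.09, 89.45]),
    "Grok 4 Fast": (86.15, [99.65, 77.03, 84.61, 76.76, 94.89, 97.26, 67.91, 91.09]),
    "Grok 4.20 (Beta)": (83.85, [100.00, 82.80, 91.17, 82.15, 87.05, 95.49, 53.89, 78.28]),
    "Grok 4.20": (81.70, [94.00, 83.44, 78.86, 84.11, 84.81, 95.63, 59.71, 73.01]),
    "Grok 4.3": (78.66, [92.65, 84.51, 84.74, 66.41, 84.62, 90.19, 49.02, 77.16]),
}


def mean_gap(total, scores):
    """Absolute difference between the listed Total and the plain mean of the scores."""
    return abs(sum(scores) / len(scores) - total)


# Largest disagreement across all rows, in percentage points.
worst_gap = max(mean_gap(total, scores) for total, scores in ROWS.values())

# ASSUMPTION: a gap under 0.01 is treated as rounding noise.
consistent_with_mean = worst_gap < 0.01
print(f"largest gap: {worst_gap:.4f} points; consistent with mean: {consistent_with_mean}")
```

If the check holds, a model's Total can be read as equal weight on every category, so a single weak column such as Rule Following pulls the overall score down by one eighth of its shortfall.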
Model Performance
Cost vs Performance
Plots total benchmark cost against overall score for xAI models. Quadrant lines are drawn at the median cost and the median score.
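A minimal sketch of how median-based quadrants can be assigned, assuming each model is a `(cost, score)` pair. The function name `quadrants`, the tie-handling rule, and the example points are all hypothetical; the per-model costs are not listed in the text, so the sample values are placeholders and not figures from the chart.

```python
from statistics import median


def quadrants(points):
    """Label each (cost, score) point relative to the median cost and median score.

    points: dict mapping a name to a (cost, score) tuple.
    ASSUMPTION: points lying exactly on a median line count as
    "low cost" / "high score"; the chart may treat ties differently.
    """
    cost_mid = median(cost for cost, _ in points.values())
    score_mid = median(score for _, score in points.values())
    labels = {}
    for name, (cost, score) in points.items():
        cost_side = "low cost" if cost <= cost_mid else "high cost"
        score_side = "high score" if score >= score_mid else "low score"
        labels[name] = f"{cost_side}, {score_side}"
    return labels


# Placeholder points for illustration only (NOT values from the page).
example = {
    "A": (1.0, 90.0),
    "B": (4.0, 95.0),
    "C": (2.0, 70.0),
    "D": (8.0, 80.0),
}
labels = quadrants(example)
for name, label in labels.items():
    print(name, "->", label)
```

Using medians rather than means keeps the quadrant lines from being dragged by a single very expensive or very weak model, which matters with only nine points.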
Cost Breakdown
Total benchmark cost per model, broken down by input, reasoning, and output tokens, shown either in USD or in token counts.
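The relationship between a token view and a USD view can be sketched as below, assuming each token type is billed at its own per-million-token rate. The types `TokenUsage` and `PricePerMillion`, the function `cost_breakdown`, and every number in the example are hypothetical; the page gives no token counts or prices here, and separate billing for reasoning tokens is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Token counts for one benchmark run (hypothetical structure)."""
    input_tokens: int
    reasoning_tokens: int
    output_tokens: int


@dataclass(frozen=True)
class PricePerMillion:
    """USD price per one million tokens, per token type (hypothetical structure)."""
    input_usd: float
    reasoning_usd: float
    output_usd: float


def cost_breakdown(usage, price):
    """Convert token counts to USD per token type, plus a total."""
    parts = {
        "input": usage.input_tokens / 1_000_000 * price.input_usd,
        "reasoning": usage.reasoning_tokens / 1_000_000 * price.reasoning_usd,
        "output": usage.output_tokens / 1_000_000 * price.output_usd,
    }
    parts["total"] = sum(parts.values())
    return parts


# Placeholder usage and prices for illustration only (NOT values from the page).
usage = TokenUsage(input_tokens=2_000_000, reasoning_tokens=1_000_000, output_tokens=500_000)
price = PricePerMillion(input_usd=1.0, reasoning_usd=4.0, output_usd=4.0)
breakdown = cost_breakdown(usage, price)
print(breakdown)
```

Keeping the three components separate makes it visible when reasoning tokens, which only the CoT-enabled models in the table would produce, account for most of a run's cost.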