xAI

Comparing 4 models from xAI.

Model	Total	Released	Context	CoT	Tooling	Creative Writing	Language	Utility	Reasoning	Text Editing	Rule Following	Hallucination
Grok 4.3 (Reasoning)	90.99%	Apr 30, 26	1m	✓	98.47%	85.11%	97.50%	92.94%	75.64%	97.64%	82.80%	97.86%
Grok 4.20 (Reasoning)	90.87%	Mar 31, 26	2m	✓	100.00%	86.25%	96.61%	92.61%	74.28%	98.83%	82.04%	96.30%
Grok 4.20	81.21%	Mar 31, 26	2m	–	95.00%	83.44%	78.86%	84.11%	72.46%	95.63%	59.71%	80.48%
Grok 4.3	78.00%	Apr 30, 26	1m	–	91.21%	84.51%	84.74%	66.41%	70.17%	90.19%	49.02%	87.79%
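
The page does not say how the Total column is derived. The listed figures are, however, consistent with an unweighted mean of the eight category scores. The sketch below checks that assumption against the rows above; `ROWS` and `category_mean` are illustrative names, not part of the source.

```python
from statistics import mean

# Scores copied from the table above, in column order:
# Tooling, Creative Writing, Language, Utility, Reasoning,
# Text Editing, Rule Following, Hallucination.
# Each entry maps a model name to (listed Total, category scores).
ROWS = {
    "Grok 4.3 (Reasoning)": (90.99, [98.47, 85.11, 97.50, 92.94, 75.64, 97.64, 82.80, 97.86]),
    "Grok 4.20 (Reasoning)": (90.87, [100.00, 86.25, 96.61, 92.61, 74.28, 98.83, 82.04, 96.30]),
    "Grok 4.20": (81.21, [95.00, 83.44, 78.86, 84.11, 72.46, 95.63, 59.71, 80.48]),
    "Grok 4.3": (78.00, [91.21, 84.51, 84.74, 66.41, 70.17, 90.19, 49.02, 87.79]),
}


def category_mean(scores):
    """Unweighted mean of the category scores (assumed aggregation rule)."""
    return mean(scores)


for model, (total, scores) in ROWS.items():
    # The table shows two decimals, so small rounding differences are expected.
    print(f"{model}: listed {total:.2f}, category mean {category_mean(scores):.3f}")
```

If the leaderboard weights categories differently, the agreement here would be coincidental, so treat this as a consistency check only.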

Model Performance

Cost vs Performance

Compares total benchmark cost against overall score for xAI models. Quadrant lines are drawn at the median values.
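
A minimal sketch of how median quadrant lines could split such a chart. The per-model cost figures are not included in this text, so the costs below are placeholders chosen purely for illustration. The `quadrant` function, its labels, and its tie-breaking rule are likewise assumptions.

```python
from statistics import median


def quadrant(points):
    """Label each model by its side of the median cost and median score.

    points maps a model name to a (cost, score) pair.
    Ties are placed on the favorable side (an arbitrary choice).
    """
    cost_mid = median(cost for cost, _ in points.values())
    score_mid = median(score for _, score in points.values())
    labels = {}
    for name, (cost, score) in points.items():
        cost_side = "low cost" if cost <= cost_mid else "high cost"
        score_side = "high score" if score >= score_mid else "low score"
        labels[name] = f"{cost_side}, {score_side}"
    return labels


# Scores come from the Total column of the table.
# Costs are PLACEHOLDERS, not values from the source.
example = {
    "Grok 4.3 (Reasoning)": (4.0, 90.99),
    "Grok 4.20 (Reasoning)": (3.0, 90.87),
    "Grok 4.20": (1.0, 81.21),
    "Grok 4.3": (2.0, 78.00),
}

for name, label in quadrant(example).items():
    print(f"{name}: {label}")
```

With an even number of models, `statistics.median` averages the two middle values, so each quadrant line falls between data points and not on one.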

Cost Breakdown

Total benchmark cost per model, broken down by input, reasoning, and output tokens. The chart is available in USD and token views.
