xAI

Comparing 5 models from xAI.

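Sorted by Total, highest first.

| Model | Total | Released | Context | CoT | Tooling | Creative Writing | Language | Utility | Reasoning | Text Editing | Rule Following | Hallucination |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grok 4.20 (Beta, Reasoning) | 91.49% | Mar 12, 2026 | 2m | | 100.00% | 84.50% | 99.08% | 95.41% | 82.64% | 98.69% | 75.31% | 96.28% |
| Grok 4.1 Fast | 89.55% | Nov 19, 2025 | 2m | | 100.00% | 82.14% | 88.76% | 84.12% | 93.58% | 97.87% | 70.87% | 99.02% |
| Grok 4 | 88.12% | Jul 9, 2025 | 256k | | 99.99% | 77.34% | 90.61% | 89.67% | 96.01% | 98.76% | 63.09% | 89.45% |
| Grok 4 Fast | 86.15% | Sep 19, 2025 | 2m | | 99.65% | 77.03% | 84.61% | 76.76% | 94.89% | 97.26% | 67.91% | 91.09% |
| Grok 4.20 (Beta) | 83.85% | Mar 12, 2026 | 2m | | 100.00% | 82.80% | 91.17% | 82.15% | 87.05% | 95.49% | 53.89% | 78.28% |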
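
The page does not say how Total is computed, but the listed figures are consistent with an unweighted mean of the eight category scores in each row. The sketch below checks this in Python using the values transcribed from the table. The helper names (`mean_score`, `max_deviation`) are illustrative, and reading "CoT" as a separate indicator column with no percentage is an assumption about the flattened header.

```python
# Each entry: model -> (listed Total, eight category scores in row order).
# Assumed column order: Tooling, Creative Writing, Language, Utility,
# Reasoning, Text Editing, Rule Following, Hallucination.
ROWS = {
    "Grok 4.20 (Beta, Reasoning)": (91.49, [100.00, 84.50, 99.08, 95.41, 82.64, 98.69, 75.31, 96.28]),
    "Grok 4.1 Fast": (89.55, [100.00, 82.14, 88.76, 84.12, 93.58, 97.87, 70.87, 99.02]),
    "Grok 4": (88.12, [99.99, 77.34, 90.61, 89.67, 96.01, 98.76, 63.09, 89.45]),
    "Grok 4 Fast": (86.15, [99.65, 77.03, 84.61, 76.76, 94.89, 97.26, 67.91, 91.09]),
    "Grok 4.20 (Beta)": (83.85, [100.00, 82.80, 91.17, 82.15, 87.05, 95.49, 53.89, 78.28]),
}


def mean_score(scores):
    """Unweighted mean of a row's category scores."""
    return sum(scores) / len(scores)


def max_deviation(rows):
    """Largest gap between a listed Total and the recomputed mean."""
    return max(abs(mean_score(scores) - total) for total, scores in rows.values())


if __name__ == "__main__":
    for name, (total, scores) in ROWS.items():
        print(f"{name}: listed {total:.2f}, recomputed mean {mean_score(scores):.3f}")
    print(f"largest deviation: {max_deviation(ROWS):.4f}")
```

Every recomputed mean lands within 0.01 percentage points of the listed Total, which is about what rounding the displayed scores to two decimals would produce. This is an observation about the numbers, not a documented scoring rule.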

Model Performance

Cost vs Performance

Compares total benchmark cost against overall score for xAI models. Quadrant lines are drawn at the median values.
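
To make the median-based quadrants concrete, here is a minimal Python sketch that splits a set of (cost, score) points at the median of each axis. The function name `quadrants`, the label wording, and the treatment of points that fall exactly on a median line are assumptions. The example values are placeholders and are not costs or scores taken from the page.

```python
from statistics import median


def quadrants(points):
    """Label each point by its side of the median cost and median score.

    points: dict mapping a name to a (cost, score) tuple.
    Points exactly on a median line are counted as "low cost" and
    "high score" (an arbitrary tie-breaking choice for this sketch).
    """
    cost_mid = median(cost for cost, _ in points.values())
    score_mid = median(score for _, score in points.values())
    labels = {}
    for name, (cost, score) in points.items():
        cost_side = "low cost" if cost <= cost_mid else "high cost"
        score_side = "high score" if score >= score_mid else "low score"
        labels[name] = f"{cost_side}, {score_side}"
    return labels


if __name__ == "__main__":
    # Placeholder data for illustration only.
    example = {
        "model_a": (1.0, 90.0),
        "model_b": (2.0, 70.0),
        "model_c": (3.0, 80.0),
        "model_d": (4.0, 95.0),
        "model_e": (5.0, 60.0),
    }
    for name, label in quadrants(example).items():
        print(f"{name}: {label}")
```

With an odd number of models, as here, one model sits exactly on each median line, so how ties are assigned changes which quadrant that model appears in.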

Cost Breakdown

Total benchmark cost per model, broken down by input, reasoning, and output tokens.