Mistral AI

Comparing 12 models from Mistral AI, sorted by total score (highest first).

Model	Total	Released	Context	CoT	Tooling	Creative Writing	Language	Utility	Reasoning	Text Editing	Rule Following	Hallucination
Mistral Large 3	84.29%	Dec 1, 25	262k	–	97.22%	81.21%	92.02%	84.91%	75.26%	94.09%	64.41%	85.18%
Mistral Large 2	81.50%	Jul 24, 24	128k	–	97.31%	81.86%	85.22%	69.19%	75.76%	94.16%	63.05%	85.45%
Mistral Small 4 (Reasoning)	79.48%	Mar 16, 26	265k	✓	97.28%	81.67%	60.53%	85.61%	66.73%	90.58%	60.28%	93.18%
Mistral Small 3.2 24B	77.36%	Jun 20, 25	131k	–	96.82%	71.87%	72.77%	73.17%	63.87%	89.48%	64.08%	86.84%
Mistral Medium 3.1	76.08%	Aug 13, 25	131k	–	94.17%	81.70%	49.50%	80.13%	71.84%	93.77%	48.60%	88.93%
Mistral Small 4	75.23%	Mar 16, 26	265k	–	93.85%	81.12%	51.96%	78.28%	61.67%	91.00%	62.17%	81.83%
Ministral 3 14B	70.45%	Dec 2, 25	262k	–	87.84%	79.11%	30.00%	79.03%	67.43%	86.20%	50.83%	83.19%
Ministral 3 8B	69.98%	Dec 2, 25	262k	–	95.77%	77.26%	48.96%	74.43%	59.53%	78.52%	31.34%	94.02%
Ministral 3 3B	65.02%	Dec 2, 25	131k	–	89.41%	75.45%	68.10%	72.38%	54.06%	69.80%	15.87%	75.10%
Mistral NeMo	63.80%	Jul 18, 24	128k	–	79.34%	76.72%	80.80%	51.55%	44.87%	73.69%	34.11%	69.32%
Ministral 8B	63.77%	Oct 16, 24	128k	–	84.65%	76.87%	53.91%	46.82%	63.20%	77.52%	15.27%	91.89%
Ministral 3B	59.25%	Oct 16, 24	128k	–	84.70%	75.49%	42.25%	49.17%	53.21%	70.91%	24.45%	73.79%
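
The Total column appears to be the unweighted mean of the eight category scores (Tooling through Hallucination). The page does not state this, so treat it as an inference from the numbers rather than a documented formula. A minimal Python sketch that checks the first three rows, with values copied from the table above (the helper name `total_matches_mean` and the 0.01-point tolerance are illustrative choices, not part of the source):

```python
from statistics import mean

# (model, listed total, eight category scores), copied from the table.
# Category order: Tooling, Creative Writing, Language, Utility,
# Reasoning, Text Editing, Rule Following, Hallucination.
ROWS = [
    ("Mistral Large 3", 84.29,
     [97.22, 81.21, 92.02, 84.91, 75.26, 94.09, 64.41, 85.18]),
    ("Mistral Large 2", 81.50,
     [97.31, 81.86, 85.22, 69.19, 75.76, 94.16, 63.05, 85.45]),
    ("Mistral Small 4 (Reasoning)", 79.48,
     [97.28, 81.67, 60.53, 85.61, 66.73, 90.58, 60.28, 93.18]),
]


def total_matches_mean(listed_total, scores, tolerance=0.01):
    """Return True if the listed total is within `tolerance` points of the plain average."""
    return abs(mean(scores) - listed_total) < tolerance


for name, listed_total, scores in ROWS:
    print(f"{name}: mean={mean(scores):.4f} listed={listed_total:.2f} "
          f"match={total_matches_mean(listed_total, scores)}")
```

A tolerance is used instead of exact equality because the listed totals are rounded to two decimals.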

Model Performance

Cost vs Performance

Compares total benchmark cost against overall score for Mistral AI models. Quadrant lines are drawn at the median values.
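
As a rough illustration of the median split described above, the following Python sketch computes the median of the Total scores from the table and classifies a point into a quadrant. The per-model costs are not included in this text, so the cost values below are placeholders, and the `quadrant` helper and its tie-handling (points on a median line count as low cost or high score) are assumptions made for the example.

```python
from statistics import median

# Total scores copied from the table above, in table order.
TOTAL_SCORES = [84.29, 81.50, 79.48, 77.36, 76.08, 75.23,
                70.45, 69.98, 65.02, 63.80, 63.77, 59.25]


def quadrant(cost, score, cost_median, score_median):
    """Classify one point relative to the two median lines."""
    cost_side = "low cost" if cost <= cost_median else "high cost"
    score_side = "high score" if score >= score_median else "low score"
    return f"{cost_side}, {score_side}"


# With 12 models, the median is the average of the 6th and 7th scores.
score_median = median(TOTAL_SCORES)
print(f"median total score: {score_median:.2f}")

# Placeholder costs, NOT taken from the page; substitute real benchmark costs.
example_costs = [1.0, 2.0, 3.0, 4.0]
cost_median = median(example_costs)

print(quadrant(1.0, 84.29, cost_median, score_median))
print(quadrant(4.0, 59.25, cost_median, score_median))
```

Using medians rather than means keeps the quadrant lines from being pulled by a single very expensive or very low-scoring model.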

Cost Breakdown

Total benchmark cost per model, broken down by input, reasoning, and output tokens, shown either in USD or in token counts.
