DeepSeek

Comparing 9 models from DeepSeek.

Model	Total	Released	Context	CoT	Tooling	Creative Writing	Language	Utility	Reasoning	Text Editing	Rule Following	Hallucination
DeepSeek V4 Pro (Reasoning)	89.28%	Apr 24, 2026	1M	✓	99.40%	82.99%	88.51%	93.24%	83.52%	98.56%	72.74%	95.25%
DeepSeek V4 Flash (Reasoning)	88.06%	Apr 24, 2026	1M	✓	96.17%	83.03%	94.76%	87.53%	85.45%	96.15%	64.50%	96.92%
DeepSeek-V2 Chat	84.09%	May 6, 2024	128k	–	98.80%	77.20%	100.00%	83.82%	70.75%	90.90%	68.78%	82.48%
DeepSeek V3 (2024-12-26)	82.62%	Dec 26, 2024	163.8k	–	98.58%	77.88%	87.88%	81.87%	69.12%	93.58%	66.39%	85.69%
DeepSeek V3.1	82.35%	Aug 21, 2025	163.8k	–	96.80%	77.45%	96.87%	76.65%	72.08%	87.27%	66.15%	85.55%
DeepSeek V3.2	82.22%	Dec 1, 2025	163.8k	✓	99.99%	79.95%	85.01%	81.58%	75.48%	95.78%	53.75%	86.25%
DeepSeek V4 Pro	82.05%	Apr 24, 2026	1M	–	99.57%	83.70%	72.80%	77.57%	71.72%	97.98%	63.74%	89.36%
DeepSeek V4 Flash	82.02%	Apr 24, 2026	1M	–	94.00%	83.42%	88.50%	83.26%	68.68%	93.25%	57.32%	87.73%
DeepSeek V3 (2025-03-24)	79.93%	Mar 24, 2025	163.8k	–	86.36%	82.34%	86.42%	80.62%	69.32%	89.57%	67.94%	76.88%
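
The page does not say how the Total column is derived. For the three rows checked below, the listed Total is consistent with an unweighted mean of the eight category scores (Tooling through Hallucination). The sketch is a sanity check under that assumption, not a statement of the site's actual scoring method; the names `ROWS`, `mean_score`, and `total_matches_mean` are illustrative.

```python
# Category scores (%) copied from the table, in column order:
# Tooling, Creative Writing, Language, Utility, Reasoning,
# Text Editing, Rule Following, Hallucination.
ROWS = {
    "DeepSeek V4 Pro (Reasoning)": (
        89.28, [99.40, 82.99, 88.51, 93.24, 83.52, 98.56, 72.74, 95.25]),
    "DeepSeek V4 Flash (Reasoning)": (
        88.06, [96.17, 83.03, 94.76, 87.53, 85.45, 96.15, 64.50, 96.92]),
    "DeepSeek-V2 Chat": (
        84.09, [98.80, 77.20, 100.00, 83.82, 70.75, 90.90, 68.78, 82.48]),
}


def mean_score(scores):
    """Unweighted arithmetic mean of the category scores."""
    return sum(scores) / len(scores)


def total_matches_mean(total, scores, tolerance=0.01):
    """True if the listed total agrees with the mean within `tolerance`.

    Assumption: the displayed values are rounded to two decimals, so a
    small tolerance is allowed instead of exact equality.
    """
    return abs(mean_score(scores) - total) <= tolerance


for name, (total, scores) in ROWS.items():
    verdict = "consistent" if total_matches_mean(total, scores) else "differs"
    print(f"{name}: listed {total:.2f}%, mean {mean_score(scores):.2f}% ({verdict})")
```

If the site weights categories differently for other rows or other vendors, the check would report "differs" for those rows.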

Model Performance

Cost vs Performance

Plots each DeepSeek model's total benchmark cost against its overall score. Quadrant lines are drawn at the median cost and the median score.
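
A minimal sketch of how median quadrant lines split such a chart. The cost figures are not part of this text, so the points below are placeholders rather than real model data. The quadrant labels and the tie rule (a point exactly on a median counts as low cost or high score) are assumptions made for illustration.

```python
from statistics import median


def classify_quadrants(points):
    """Assign each (cost, score) point to a quadrant around the medians.

    `points` maps a name to a (cost, score) tuple. Tie rule (an assumption):
    cost <= median cost is "low cost", score >= median score is "high score".
    """
    median_cost = median(cost for cost, _ in points.values())
    median_score = median(score for _, score in points.values())
    quadrants = {}
    for name, (cost, score) in points.items():
        cost_side = "low cost" if cost <= median_cost else "high cost"
        score_side = "high score" if score >= median_score else "low score"
        quadrants[name] = f"{cost_side}, {score_side}"
    return quadrants


# Placeholder points as (cost in USD, overall score in %); not real data.
example_points = {
    "model_a": (1.0, 90.0),
    "model_b": (4.0, 88.0),
    "model_c": (2.0, 80.0),
    "model_d": (5.0, 79.0),
    "model_e": (3.0, 85.0),
}

for name, quadrant in classify_quadrants(example_points).items():
    print(f"{name}: {quadrant}")
```

Because the lines sit at the medians of the plotted set, a model's quadrant is relative to the other models on the chart and can change when models are added or removed.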

Cost Breakdown

Total benchmark cost per model, broken down by input, reasoning, and output tokens. The breakdown can be viewed in USD or in token counts.
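
A rough sketch of how a token breakdown might be converted to a USD breakdown. The page gives neither token counts nor prices, so every number below is a placeholder. Billing reasoning tokens at the output rate is an assumption and may not match any given provider's pricing; `TokenUsage` and `cost_breakdown_usd` are hypothetical names.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class TokenUsage:
    """Token counts for one benchmark run (hypothetical structure)."""
    input_tokens: int
    reasoning_tokens: int
    output_tokens: int


def cost_breakdown_usd(usage, input_price_per_m, output_price_per_m):
    """Convert token counts to USD given prices per million tokens.

    Assumption: reasoning tokens are billed at the output-token rate.
    """
    breakdown = {
        "input": usage.input_tokens * input_price_per_m / TOKENS_PER_MILLION,
        "reasoning": usage.reasoning_tokens * output_price_per_m / TOKENS_PER_MILLION,
        "output": usage.output_tokens * output_price_per_m / TOKENS_PER_MILLION,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Placeholder usage and prices; not taken from the page.
usage = TokenUsage(input_tokens=2_000_000, reasoning_tokens=1_000_000, output_tokens=500_000)
for part, usd in cost_breakdown_usd(usage, input_price_per_m=1.0, output_price_per_m=4.0).items():
    print(f"{part}: ${usd:.2f}")
```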
