DeepSeek

Comparing 9 models from DeepSeek.

| Model | Total | Released | Context | CoT | Tooling | Creative Writing | Language | Utility | Reasoning | Text Editing | Rule Following | Hallucination |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepSeek V4 Pro (Reasoning) | 90.10% | Apr 24, 26 | 1m | | 99.28% | 82.99% | 88.51% | 93.24% | 94.61% | 98.56% | 72.74% | 90.88% |
| DeepSeek V4 Flash (Reasoning) | 89.01% | Apr 24, 26 | 1m | | 96.00% | 83.03% | 94.76% | 87.53% | 95.08% | 96.15% | 64.50% | 95.02% |
| DeepSeek-V2 Chat | 84.83% | May 6, 24 | 128k | | 99.76% | 77.20% | 100.00% | 83.82% | 88.70% | 90.90% | 68.78% | 69.48% |
| DeepSeek V3 (2024-12-26) | 83.68% | Dec 26, 24 | 163.8k | | 100.00% | 77.88% | 87.88% | 81.87% | 88.71% | 93.58% | 66.39% | 73.11% |
| DeepSeek V4 Pro | 82.63% | Apr 24, 26 | 1m | | 99.49% | 83.70% | 72.80% | 77.57% | 87.07% | 97.98% | 63.74% | 78.72% |
| DeepSeek V3.1 | 82.39% | Aug 21, 25 | 163.8k | | 97.96% | 77.45% | 96.87% | 76.65% | 83.95% | 87.27% | 66.15% | 72.80% |
| DeepSeek V3.2 | 82.25% | Dec 1, 25 | 163.8k | | 99.99% | 79.95% | 85.01% | 81.58% | 89.46% | 95.78% | 53.75% | 72.50% |
| DeepSeek V4 Flash | 82.02% | Apr 24, 26 | 1m | | 95.80% | 83.42% | 88.50% | 83.26% | 79.16% | 93.25% | 57.32% | 75.47% |
| DeepSeek V3 (2025-03-24) | 81.99% | Mar 24, 25 | 163.8k | | 93.53% | 82.34% | 86.42% | 80.62% | 88.45% | 89.57% | 67.94% | 67.07% |

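
The page does not say how the Total column is computed, but the listed numbers are consistent with an unweighted mean of the eight category scores in each row. The sketch below checks that reading against the first two rows; the helper name `recomputed_total` is illustrative, and the equal weighting is an inference from the data, not a documented formula.

```python
from statistics import mean

# Listed total and the eight category scores, copied from the first two
# table rows in column order.
rows = {
    "DeepSeek V4 Pro (Reasoning)": (
        90.10,
        [99.28, 82.99, 88.51, 93.24, 94.61, 98.56, 72.74, 90.88],
    ),
    "DeepSeek V4 Flash (Reasoning)": (
        89.01,
        [96.00, 83.03, 94.76, 87.53, 95.08, 96.15, 64.50, 95.02],
    ),
}


def recomputed_total(scores):
    """Unweighted mean of the category scores, rounded to two decimals.

    Assumption: every category carries equal weight.
    """
    return round(mean(scores), 2)


for name, (listed, scores) in rows.items():
    print(f"{name}: listed {listed:.2f}, recomputed {recomputed_total(scores):.2f}")
```

If a row's recomputed value ever differed from the listed total, that would suggest weighting or rounding rules this sketch does not model.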
Model Performance
Cost vs Performance

Compares total benchmark cost against overall score for DeepSeek models. Quadrant lines are drawn at the median values.
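
As a rough illustration of how median quadrant lines split such a chart, the sketch below labels each point by its position relative to the median cost and median score. The model names and all numbers are placeholders, not values from this page, and the tie rule (a point exactly on a median line counts as the better side) is an assumption.

```python
from statistics import median


def quadrants(points):
    """Label each (name, cost, score) point against the median lines.

    Assumption: a point sitting exactly on a median line is assigned to the
    "low cost" or "high score" side.
    """
    cost_mid = median(cost for _, cost, _ in points)
    score_mid = median(score for _, _, score in points)
    labels = {}
    for name, cost, score in points:
        cost_side = "low cost" if cost <= cost_mid else "high cost"
        score_side = "high score" if score >= score_mid else "low score"
        labels[name] = f"{cost_side}, {score_side}"
    return labels


# Placeholder data only: (name, total cost in USD, overall score in percent).
sample = [
    ("model-a", 1.0, 90.0),
    ("model-b", 4.0, 85.0),
    ("model-c", 2.0, 80.0),
]

for name, label in quadrants(sample).items():
    print(name, "->", label)
```

Because the lines sit at the medians, roughly half the models fall on each side of each axis regardless of how the costs and scores are distributed.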

Cost Breakdown

Total benchmark cost per model, broken down by input, reasoning, and output tokens, shown either in USD or as token counts.
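
A breakdown of this kind can be sketched as token counts multiplied by per-token prices. Everything in the example is hypothetical: the `Usage` type, the prices, the token counts, and in particular the assumption that reasoning tokens are billed at the output rate unless a separate rate is given.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one benchmark run (hypothetical structure)."""
    input_tokens: int
    reasoning_tokens: int
    output_tokens: int


def cost_breakdown(usage, price_in, price_out, price_reasoning=None):
    """Return USD cost per token type plus the total.

    Prices are USD per one million tokens. Assumption: reasoning tokens use
    the output price when no separate reasoning price is supplied.
    """
    if price_reasoning is None:
        price_reasoning = price_out
    parts = {
        "input": usage.input_tokens / 1_000_000 * price_in,
        "reasoning": usage.reasoning_tokens / 1_000_000 * price_reasoning,
        "output": usage.output_tokens / 1_000_000 * price_out,
    }
    parts["total"] = parts["input"] + parts["reasoning"] + parts["output"]
    return parts


# Placeholder numbers, not taken from the page.
run = Usage(input_tokens=2_000_000, reasoning_tokens=1_000_000, output_tokens=500_000)
breakdown = cost_breakdown(run, price_in=0.5, price_out=2.0)
print(breakdown)
```

Keeping the raw counts in `Usage` and applying prices separately mirrors the two views described above: the same run can be reported in tokens or in USD.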