OpenAI

Comparing 25 models from OpenAI, sorted by total score in descending order.

| Model | Total | Released | Context | Tooling | Creative Writing | Language | Utility | Reasoning | Text Editing | Rule Following | Hallucination |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 (Reasoning) | 93.24% | Mar 5, 26 | 1m | 100.00% | 91.17% | 94.90% | 96.89% | 94.78% | 98.42% | 79.29% | 90.43% |
| GPT-5 Mini | 92.62% | Aug 7, 25 | 400k | 99.99% | 80.48% | 96.49% | 98.39% | 94.36% | 97.13% | 76.44% | 97.71% |
| GPT-5.1 | 92.54% | Nov 13, 25 | 400k | 98.00% | 87.20% | 93.64% | 95.33% | 95.14% | 98.54% | 74.05% | 98.44% |
| GPT-5 | 91.93% | Aug 7, 25 | 400k | 100.00% | 86.87% | 91.50% | 93.53% | 95.67% | 98.90% | 77.13% | 91.84% |
| GPT-5.4 (Reasoning, Low) | 91.41% | Mar 5, 26 | 1m | 99.96% | 90.51% | 90.79% | 95.32% | 94.34% | 98.01% | 70.02% | 92.29% |
| GPT-5.4 Mini (Reasoning) | 90.65% | Mar 17, 26 | 400k | 100.00% | 88.66% | 98.12% | 94.44% | 94.56% | 95.78% | 57.38% | 96.22% |
| o4 Mini High | 90.29% | Apr 16, 25 | 200k | 100.00% | 82.72% | 79.76% | 98.67% | 95.02% | 94.36% | 72.70% | 99.06% |
| GPT-5.2 | 90.26% | Dec 10, 25 | 400k | 100.00% | 80.36% | 91.19% | 96.22% | 94.54% | 97.54% | 67.10% | 95.09% |
| GPT-4.1 | 88.68% | Apr 14, 25 | 1m | 98.86% | 81.24% | 93.91% | 90.57% | 88.46% | 94.40% | 66.78% | 95.24% |
| o4 Mini | 88.35% | Apr 16, 25 | 200k | 100.00% | 82.04% | 80.00% | 96.31% | 94.45% | 90.61% | 64.61% | 98.75% |
| GPT-5.4 Mini (Reasoning, Low) | 85.75% | Mar 17, 26 | 400k | 100.00% | 87.72% | 92.45% | 88.49% | 92.28% | 92.63% | 33.99% | 98.43% |
| GPT-4o, May 13th (temp=0) | 85.36% | May 13, 24 | 128k | 99.22% | 74.89% | 98.72% | 83.13% | 88.58% | 95.35% | 73.24% | 69.76% |
| GPT-5.4 | 84.32% | Mar 5, 26 | 1m | 97.65% | 90.94% | 81.49% | 81.95% | 93.92% | 96.73% | 58.11% | 73.78% |
| GPT-4o, May 13th (temp=1) | 83.80% | May 13, 24 | 128k | 99.68% | 75.88% | 92.52% | 80.69% | 85.98% | 92.41% | 69.88% | 73.40% |
| GPT-4.1 Mini | 83.20% | Apr 14, 25 | 1m | 97.92% | 74.52% | 89.64% | 82.30% | 85.83% | 95.62% | 58.59% | 81.14% |
| GPT-4o, Aug. 6th (temp=1) | 82.62% | Aug 6, 24 | 128k | 99.73% | 75.50% | 82.21% | 82.44% | 86.91% | 86.72% | 67.91% | 79.53% |
| GPT-5 Nano | 82.60% | Aug 7, 25 | 400k | 99.21% | 67.04% | 77.18% | 93.91% | 89.61% | 82.74% | 57.57% | 93.53% |
| GPT-4o, Aug. 6th (temp=0) | 82.45% | Aug 6, 24 | 128k | 99.95% | 73.65% | 75.00% | 82.11% | 87.59% | 93.77% | 74.19% | 73.35% |
| GPT-5.4 Mini | 82.43% | Mar 17, 26 | 400k | 99.86% | 88.10% | 88.75% | 79.37% | 88.04% | 90.60% | 46.32% | 78.40% |
| GPT-5.4 Nano (Reasoning) | 81.36% | Mar 17, 26 | 400k | 95.71% | 80.97% | 83.99% | 93.34% | 88.48% | 83.32% | 27.15% | 97.95% |
| GPT-5.4 Nano (Reasoning, Low) | 79.48% | Mar 17, 26 | 400k | 89.75% | 80.93% | 81.87% | 91.42% | 78.93% | 82.23% | 31.65% | 99.03% |
| GPT-4o Mini (temp=1) | 79.08% | Jul 18, 24 | 128k | 99.26% | 74.37% | 77.50% | 82.16% | 80.28% | 85.78% | 56.50% | 76.78% |
| GPT-4o Mini (temp=0) | 78.29% | Jul 18, 24 | 128k | 98.54% | 73.10% | 75.00% | 81.43% | 81.26% | 84.62% | 58.84% | 73.56% |
| GPT-5.4 Nano | 74.40% | Mar 17, 26 | 400k | 90.22% | 80.50% | 80.82% | 78.57% | 75.66% | 79.22% | 20.94% | 89.29% |
| GPT-4.1 Nano | 71.94% | Apr 14, 25 | 1m | 81.37% | 71.81% | 78.95% | 68.45% | 70.24% | 76.06% | 40.88% | 87.73% |

Model Performance
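
The page does not say how Total is derived. For the rows checked here, it matches the unweighted mean of the eight category scores (the percentage columns after Context) to within rounding. The sketch below reproduces that check for three rows copied from the table; the function name `matches_listed_total` and the 0.01 tolerance are illustrative assumptions, and the mean relationship is an observation rather than a documented formula.

```python
# Category scores copied from the table, in column order after "Context".
# Each entry maps a model name to (listed Total, [eight category scores]).
rows = {
    "GPT-5.4 (Reasoning)": (93.24, [100.00, 91.17, 94.90, 96.89, 94.78, 98.42, 79.29, 90.43]),
    "GPT-5 Mini": (92.62, [99.99, 80.48, 96.49, 98.39, 94.36, 97.13, 76.44, 97.71]),
    "GPT-4.1 Nano": (71.94, [81.37, 71.81, 78.95, 68.45, 70.24, 76.06, 40.88, 87.73]),
}


def mean_score(scores):
    """Unweighted arithmetic mean of the category scores."""
    return sum(scores) / len(scores)


def matches_listed_total(total, scores, tol=0.01):
    """True if the listed Total agrees with the mean up to two-decimal rounding.

    The tolerance is an assumption chosen to absorb rounding of the
    published figures; it is not taken from the page.
    """
    return abs(mean_score(scores) - total) < tol


for model, (total, scores) in rows.items():
    status = "consistent" if matches_listed_total(total, scores) else "differs"
    print(f"{model}: listed {total:.2f}%, mean {mean_score(scores):.3f}% ({status})")
```

If the mean does hold across all rows, a single weak category such as Rule Following pulls Total down by one eighth of its shortfall, which helps explain why models with near-perfect Tooling scores can still rank low.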
Cost vs Performance

Compares total benchmark cost against overall score for OpenAI models. Quadrant lines are drawn at the median values.
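
A median-based quadrant split of this kind can be sketched as follows. The model names and cost figures are placeholders invented for illustration (the page's cost data is not reproduced here), and the rule for points that fall exactly on a median line is an assumption.

```python
from statistics import median


def assign_quadrants(points):
    """Label each model by its position relative to the median cost and score.

    points: dict mapping model name -> (cost, score).
    Ties on a median line go to "low cost" / "high score"; this tie rule is
    an assumption, since the page does not specify one.
    """
    cost_median = median(cost for cost, _ in points.values())
    score_median = median(score for _, score in points.values())
    labels = {}
    for name, (cost, score) in points.items():
        cost_side = "low cost" if cost <= cost_median else "high cost"
        score_side = "high score" if score >= score_median else "low score"
        labels[name] = f"{cost_side}, {score_side}"
    return labels


# Placeholder data: (benchmark cost in USD, overall score in percent).
example_points = {
    "model_a": (1.0, 90.0),
    "model_b": (4.0, 92.0),
    "model_c": (2.0, 75.0),
    "model_d": (8.0, 80.0),
}

for name, label in assign_quadrants(example_points).items():
    print(f"{name}: {label}")
```

Because the lines sit at the medians, roughly half the models fall on each side of each line regardless of the absolute values, so a quadrant describes a model's standing relative to the other OpenAI models rather than against a fixed threshold.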

Cost Breakdown

Total benchmark cost per model, broken down by input, reasoning, and output tokens. Toggle between USD and token views.
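
As a rough illustration of how such a breakdown could be computed from token counts, the sketch below splits cost into the same three components. The token counts and per-million-token prices are placeholders, not values from the page, and billing reasoning tokens at the output rate is an assumption about how the benchmark prices them.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for one model over the whole benchmark (hypothetical type)."""
    input_tokens: int
    reasoning_tokens: int
    output_tokens: int


def cost_breakdown(usage, input_price_per_m, output_price_per_m):
    """Return USD cost per component plus the total.

    Prices are in USD per one million tokens. Reasoning tokens are charged
    at the output price here, which is an assumption.
    """
    parts = {
        "input": usage.input_tokens / 1_000_000 * input_price_per_m,
        "reasoning": usage.reasoning_tokens / 1_000_000 * output_price_per_m,
        "output": usage.output_tokens / 1_000_000 * output_price_per_m,
    }
    parts["total"] = sum(parts.values())
    return parts


# Placeholder usage and prices, chosen only to make the arithmetic easy to follow.
example_usage = TokenUsage(input_tokens=2_000_000, reasoning_tokens=500_000, output_tokens=1_000_000)
example_costs = cost_breakdown(example_usage, input_price_per_m=1.00, output_price_per_m=4.00)

for component, usd in example_costs.items():
    print(f"{component}: ${usd:.2f}")
```

The token view of the chart corresponds to the raw `TokenUsage` counts, while the USD view corresponds to the same counts multiplied by the per-token prices.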