OpenAI

Comparing 11 models from OpenAI.

| Model | Total | Released | Context size (tokens) | Creative writing | Rule following | Utility | Mathematics | Tooling | Language | Logic |
|---|---|---|---|---|---|---|---|---|---|---|
| o4 Mini High | 86.94% | Apr 16, 2025 | 200k | 78.16% | 92.07% | 85.67% | 100.00% | 90.77% | 77.40% | 87.50% |
| o4 Mini | 85.85% | Apr 16, 2025 | 200k | 66.30% | 87.60% | 87.00% | 100.00% | 92.31% | 79.31% | 88.75% |
| GPT-4.1 | 81.18% | Apr 14, 2025 | 1M | 41.20% | 74.44% | 89.33% | 100.00% | 87.69% | 89.65% | 82.50% |
| GPT-4o, Aug. 6th (temp=0) | 74.15% | Aug 6, 2024 | 128k | 56.94% | 75.15% | 78.67% | 100.00% | 65.38% | 73.40% | 81.25% |
| GPT-4o, May 13th (temp=0) | 73.49% | May 13, 2024 | 128k | 38.10% | 68.91% | 79.67% | 100.00% | 66.15% | 92.86% | 81.25% |
| GPT-4o, May 13th (temp=1) | 72.74% | May 13, 2024 | 128k | 36.93% | 69.46% | 79.17% | 100.00% | 67.31% | 84.83% | 81.25% |
| GPT-4o, Aug. 6th (temp=1) | 71.70% | Aug 6, 2024 | 128k | 49.45% | 73.25% | 75.00% | 100.00% | 60.77% | 78.17% | 81.25% |
| GPT-4.1 Mini | 70.71% | Apr 14, 2025 | 1M | 33.96% | 67.76% | 75.56% | 100.00% | 64.36% | 87.26% | 81.25% |
| GPT-4o Mini (temp=0) | 69.33% | Jul 18, 2024 | 128k | 53.82% | 76.54% | 68.50% | 100.00% | 46.54% | 73.52% | 93.75% |
| GPT-4o Mini (temp=1) | 68.89% | Jul 18, 2024 | 128k | 43.05% | 74.97% | 69.17% | 100.00% | 49.62% | 77.02% | 91.25% |
| GPT-4.1 Nano | 61.69% | Apr 14, 2025 | 1M | 30.15% | 64.20% | 60.50% | 100.00% | 47.69% | 77.34% | 86.88% |
[Figure: Cost vs Performance]

Compares total benchmark cost against overall score (Total) for OpenAI models. Quadrant lines are drawn at the median cost and the median score.
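The horizontal (score) quadrant line can be recomputed from the Total column. The sketch below assumes the chart's score axis is the Total column and that it plots exactly the 11 models in the table; the page does not list per-model costs, so the vertical (cost) line is not computed here. The name `TOTAL_SCORES` is hypothetical.

```python
from statistics import median

# Overall "Total" scores (percent) copied from the table above.
# Assumption: these are the values plotted on the chart's score axis.
TOTAL_SCORES = {
    "o4 Mini High": 86.94,
    "o4 Mini": 85.85,
    "GPT-4.1": 81.18,
    "GPT-4o, Aug. 6th (temp=0)": 74.15,
    "GPT-4o, May 13th (temp=0)": 73.49,
    "GPT-4o, May 13th (temp=1)": 72.74,
    "GPT-4o, Aug. 6th (temp=1)": 71.70,
    "GPT-4.1 Mini": 70.71,
    "GPT-4o Mini (temp=0)": 69.33,
    "GPT-4o Mini (temp=1)": 68.89,
    "GPT-4.1 Nano": 61.69,
}

# Score-axis quadrant line: the median of all Total scores.
# With 11 models, statistics.median returns the 6th value in sorted order.
score_line = median(TOTAL_SCORES.values())

# Models strictly above or below the line; a model sitting exactly on the
# median falls in neither list.
above = sorted(name for name, s in TOTAL_SCORES.items() if s > score_line)
below = sorted(name for name, s in TOTAL_SCORES.items() if s < score_line)

print(f"Median Total score: {score_line:.2f}%")
print("Above median:", above)
print("Below median:", below)
```

With an odd number of models the median is one of the models' own scores (here GPT-4o, May 13th (temp=1) at 72.74%), so that model sits on the line itself; the cost-axis line would be found the same way from the per-model costs.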