OpenAI - Vendors - NC Bench

OpenAI

Comparing 11 models from OpenAI.

Model	Total	Released	Context	Size	Creative writing	Rule following	Utility	Mathematics	Tooling	Language	Logic
o4 Mini High	86.94%	Apr 16, 2025	200k	–	78.16%	92.07%	85.67%	100.00%	90.77%	77.40%	87.50%
o4 Mini	85.85%	Apr 16, 2025	200k	–	66.30%	87.60%	87.00%	100.00%	92.31%	79.31%	88.75%
GPT-4.1	81.18%	Apr 14, 2025	1M	–	41.20%	74.44%	89.33%	100.00%	87.69%	89.65%	82.50%
GPT-4o, Aug. 6th (temp=0)	74.15%	Aug 6, 2024	128k	–	56.94%	75.15%	78.67%	100.00%	65.38%	73.40%	81.25%
GPT-4o, May 13th (temp=0)	73.49%	May 13, 2024	128k	–	38.10%	68.91%	79.67%	100.00%	66.15%	92.86%	81.25%
GPT-4o, May 13th (temp=1)	72.74%	May 13, 2024	128k	–	36.93%	69.46%	79.17%	100.00%	67.31%	84.83%	81.25%
GPT-4o, Aug. 6th (temp=1)	71.70%	Aug 6, 2024	128k	–	49.45%	73.25%	75.00%	100.00%	60.77%	78.17%	81.25%
GPT-4.1 Mini	70.71%	Apr 14, 2025	1M	–	33.96%	67.76%	75.56%	100.00%	64.36%	87.26%	81.25%
GPT-4o Mini (temp=0)	69.33%	Jul 18, 2024	128k	–	53.82%	76.54%	68.50%	100.00%	46.54%	73.52%	93.75%
GPT-4o Mini (temp=1)	68.89%	Jul 18, 2024	128k	–	43.05%	74.97%	69.17%	100.00%	49.62%	77.02%	91.25%
GPT-4.1 Nano	61.69%	Apr 14, 2025	1M	–	30.15%	64.20%	60.50%	100.00%	47.69%	77.34%	86.88%

Model Performance

Cost vs Performance

This chart compares each OpenAI model's total benchmark cost against its overall (Total) score. Quadrant lines mark the median cost and the median score across the plotted models.
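The median split described above can be sketched in Python as follows. This is a minimal illustration, not NC Bench's plotting code: the function name `cost_performance_quadrants`, the rule for points that fall exactly on a median line, and all cost figures are assumptions, since this page does not list per-model costs. Only the `totals` list is taken from the table.

```python
from statistics import median


def cost_performance_quadrants(points):
    """Split (name, cost, score) points into quadrants at the median cost and median score.

    Assumption: a point lying exactly on a median line counts as
    "low cost" (cost <= median) or "high score" (score >= median).
    """
    cost_line = median(cost for _, cost, _ in points)
    score_line = median(score for _, _, score in points)
    quadrants = {}
    for name, cost, score in points:
        cost_side = "low cost" if cost <= cost_line else "high cost"
        score_side = "high score" if score >= score_line else "low score"
        quadrants[name] = f"{cost_side}, {score_side}"
    return cost_line, score_line, quadrants


# Hypothetical (model, cost, score) values for illustration only.
example = [
    ("model_a", 1.0, 90.0),
    ("model_b", 2.0, 60.0),
    ("model_c", 3.0, 80.0),
    ("model_d", 4.0, 70.0),
]
cost_line, score_line, quadrants = cost_performance_quadrants(example)
print(cost_line, score_line)  # 2.5 75.0
for name, label in quadrants.items():
    print(name, label)

# Total scores from the table above (percent).
totals = [86.94, 85.85, 81.18, 74.15, 73.49, 72.74, 71.70, 70.71, 69.33, 68.89, 61.69]
print(median(totals))  # 72.74
```

With an even number of points, `statistics.median` averages the two middle values, so the lines can fall between models. With all 11 models plotted, the score line would sit at 72.74%, the GPT-4o, May 13th (temp=1) entry.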