# OpenAI

Comparing 11 models from OpenAI, sorted by total score in descending order.
| Model | Total | Released | Context | Size | Creative writing | Rule following | Utility | Mathematics | Tooling | Language | Logic |
|---|---|---|---|---|---|---|---|---|---|---|---|
| o4 Mini High | 86.94% | Apr 16, 2025 | 200k | – | 78.16% | 92.07% | 85.67% | 100.00% | 90.77% | 77.40% | 87.50% |
| o4 Mini | 85.85% | Apr 16, 2025 | 200k | – | 66.30% | 87.60% | 87.00% | 100.00% | 92.31% | 79.31% | 88.75% |
| GPT-4.1 | 81.18% | Apr 14, 2025 | 1m | – | 41.20% | 74.44% | 89.33% | 100.00% | 87.69% | 89.65% | 82.50% |
| GPT-4o, Aug. 6th (temp=0) | 74.15% | Aug 6, 2024 | 128k | – | 56.94% | 75.15% | 78.67% | 100.00% | 65.38% | 73.40% | 81.25% |
| GPT-4o, May 13th (temp=0) | 73.49% | May 13, 2024 | 128k | – | 38.10% | 68.91% | 79.67% | 100.00% | 66.15% | 92.86% | 81.25% |
| GPT-4o, May 13th (temp=1) | 72.74% | May 13, 2024 | 128k | – | 36.93% | 69.46% | 79.17% | 100.00% | 67.31% | 84.83% | 81.25% |
| GPT-4o, Aug. 6th (temp=1) | 71.70% | Aug 6, 2024 | 128k | – | 49.45% | 73.25% | 75.00% | 100.00% | 60.77% | 78.17% | 81.25% |
| GPT-4.1 Mini | 70.71% | Apr 14, 2025 | 1m | – | 33.96% | 67.76% | 75.56% | 100.00% | 64.36% | 87.26% | 81.25% |
| GPT-4o Mini (temp=0) | 69.33% | Jul 18, 2024 | 128k | – | 53.82% | 76.54% | 68.50% | 100.00% | 46.54% | 73.52% | 93.75% |
| GPT-4o Mini (temp=1) | 68.89% | Jul 18, 2024 | 128k | – | 43.05% | 74.97% | 69.17% | 100.00% | 49.62% | 77.02% | 91.25% |
| GPT-4.1 Nano | 61.69% | Apr 14, 2025 | 1m | – | 30.15% | 64.20% | 60.50% | 100.00% | 47.69% | 77.34% | 86.88% |
## Model Performance

### Cost vs Performance

This chart compares total benchmark cost against overall score for OpenAI models. Quadrant lines are drawn at the median values.
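The quadrant layout can be sketched in a few lines: compute the median score and median cost, then classify each model by which side of each median line it falls on. The overall scores below come from the table above; the cost figures are hypothetical placeholders, since the source does not list per-model costs.

```python
from statistics import median

# (overall score %, cost in $) — scores are from the table; costs are
# hypothetical illustration values, not real benchmark costs.
models = {
    "o4 Mini High": (86.94, 4.0),
    "GPT-4.1": (81.18, 2.0),
    "GPT-4o Mini (temp=0)": (69.33, 0.5),
    "GPT-4.1 Nano": (61.69, 0.2),
}

# The medians are where the quadrant lines are drawn.
score_med = median(s for s, _ in models.values())
cost_med = median(c for _, c in models.values())

def quadrant(score: float, cost: float) -> str:
    """Classify a model relative to the median score/cost lines."""
    vert = "high-score" if score >= score_med else "low-score"
    horiz = "high-cost" if cost >= cost_med else "low-cost"
    return f"{vert}/{horiz}"

for name, (s, c) in models.items():
    print(f"{name}: {quadrant(s, c)}")
```

With this classification, the interesting corner is "high-score/low-cost": models above the median score line but left of the median cost line.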