Vendors
Model creators (vendors) and how their models compare across the benchmark, sorted by best score in descending order.
| Vendor | Models | Avg Score | Best Score | Best Model |
|---|---|---|---|---|
| Anthropic | 16 | 90.10% | 95.06% | Claude Opus 4.6 (Reasoning) |
| Qwen | 16 | 86.77% | 94.55% | Qwen3.7 Max |
| Google | 21 | 84.62% | 94.08% | Gemini 3.1 Pro (Preview) |
| OpenAI | 27 | 84.73% | 93.85% | GPT-5.4 (Reasoning) |
| Z.AI | 9 | 88.14% | 93.74% | Z.AI GLM 5.1 |
| MoonshotAI | 2 | 91.71% | 92.57% | MoonshotAI: Kimi K2.6 |
| xAI | 4 | 85.27% | 90.99% | Grok 4.3 (Reasoning) |
| minimax | 3 | 87.80% | 90.45% | MiniMax M3 |
| bytedance-seed | 4 | 82.62% | 89.59% | ByteDance Seed 1.6 |
| DeepSeek | 9 | 83.63% | 89.28% | DeepSeek V4 Pro (Reasoning) |
| aion-labs | 1 | 86.66% | 86.66% | Aion 2.0 |
| xiaomi | 2 | 85.00% | 86.05% | Xiaomi MIMO v2.5 Pro |
| Mistral AI | 12 | 72.18% | 84.29% | Mistral Large 3 |
| inception | 1 | 81.99% | 81.99% | Inception Mercury 2 |
| NVIDIA | 3 | 76.97% | 81.69% | Nemotron 3 Super |
| Nous Research | 2 | 75.27% | 80.80% | Hermes 3 405B |
| Writer | 1 | 78.11% | 78.11% | Writer: Palmyra X5 |
| Meta | 1 | 77.41% | 77.41% | Llama 3.1 70B |
| TheDrummer | 1 | 72.68% | 72.68% | Cydonia 24B V4.1 |
| Microsoft | 1 | 71.45% | 71.45% | WizardLM 2 8x22b |
| arcee-ai | 1 | 67.68% | 67.68% | Arcee AI: Trinity Mini |
| Cohere | 1 | 67.04% | 67.04% | Cohere Command R+ (Aug. 2024) |