Vendors
Model creators (vendors) and how their models compare across the benchmark, ranked by each vendor's best-scoring model from highest to lowest.
| Vendor | Models | Avg Score | Best Score | Best Model |
|---|---|---|---|---|
| Anthropic | 13 | 87.23% | 95.02% | Claude Opus 4.6 (Reasoning) |
| Google | 13 | 84.01% | 94.37% | Gemini 3.1 Pro (Preview) |
| OpenAI | 19 | 85.95% | 93.24% | GPT-5.4 (Reasoning) |
| Qwen | 8 | 86.99% | 91.73% | Qwen 3.5 397B A17B |
| Z.AI | 5 | 88.02% | 91.23% | Z.AI GLM 5 |
| MoonshotAI | 1 | 91.04% | 91.04% | MoonshotAI: Kimi K2.5 |
| bytedance-seed | 4 | 83.92% | 90.70% | ByteDance Seed 1.6 |
| xAI | 3 | 87.94% | 89.55% | Grok 4.1 Fast |
| aion-labs | 1 | 89.21% | 89.21% | Aion 2.0 |
| minimax | 1 | 88.71% | 88.71% | Minimax M2.5 |
| openrouter | 3 | 85.68% | 87.33% | Stealth: Hunter Alpha |
| Mistral AI | 12 | 73.37% | 85.43% | Mistral Large 3 |
| DeepSeek | 5 | 83.03% | 84.83% | DeepSeek-V2 Chat |
| NVIDIA | 3 | 79.02% | 84.56% | Nemotron 3 Super |
| inception | 2 | 81.67% | 83.85% | Inception Mercury 2 |
| Nous Research | 2 | 77.71% | 82.86% | Hermes 3 405B |
| Writer | 1 | 79.57% | 79.57% | Writer: Palmyra X5 |
| Meta | 2 | 70.88% | 78.40% | Llama 3.1 70B |
| arcee-ai | 2 | 72.12% | 73.33% | Arcee AI: Trinity Large (Preview) |
| Microsoft | 1 | 71.07% | 71.07% | WizardLM 2 8x22b |
| Cohere | 1 | 69.03% | 69.03% | Cohere Command R+ (Aug. 2024) |
| Liquid AI | 1 | 58.77% | 58.77% | LFM2 24B |
| TheDrummer | 1 | 54.55% | 54.55% | Rocinante 12B |
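Each row rolls up several per-model results into a model count, an average, a best score, and the model that achieved it. The sketch below shows one way such a roll-up could be computed. It assumes that Avg Score is the unweighted arithmetic mean of each model's score and that Best Score is the maximum, which agrees with the single-model rows (for example MoonshotAI and Writer), where both columns are equal. The names `summarize_by_vendor`, `to_markdown`, `VendorSummary`, and the `VendorA`/`VendorB` inputs are hypothetical and are not taken from the benchmark's own tooling or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class VendorSummary:
    vendor: str
    models: int
    avg_score: float   # assumed: unweighted mean of per-model scores (in percent)
    best_score: float  # maximum per-model score (in percent)
    best_model: str


def summarize_by_vendor(results):
    """Group (vendor, model, score) rows and compute per-vendor statistics.

    Hypothetical helper: assumes scores are already percentages and that the
    average is a plain mean across a vendor's models.
    """
    by_vendor = defaultdict(list)
    for vendor, model, score in results:
        by_vendor[vendor].append((model, score))

    summaries = []
    for vendor, entries in by_vendor.items():
        # max() keeps the first model seen if two models tie on score.
        best_model, best_score = max(entries, key=lambda e: e[1])
        summaries.append(VendorSummary(
            vendor=vendor,
            models=len(entries),
            avg_score=mean(score for _, score in entries),
            best_score=best_score,
            best_model=best_model,
        ))
    # Order rows by best score, highest first, as in the table above.
    summaries.sort(key=lambda s: s.best_score, reverse=True)
    return summaries


def to_markdown(summaries):
    """Render the summaries as a markdown table with two-decimal percentages."""
    lines = [
        "| Vendor | Models | Avg Score | Best Score | Best Model |",
        "|---|---|---|---|---|",
    ]
    for s in summaries:
        lines.append(
            f"| {s.vendor} | {s.models} | {s.avg_score:.2f}% | "
            f"{s.best_score:.2f}% | {s.best_model} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical per-model scores for illustration only, not benchmark results.
    rows = [
        ("VendorA", "model-a1", 90.0),
        ("VendorA", "model-a2", 80.0),
        ("VendorB", "model-b1", 85.0),
    ]
    print(to_markdown(summarize_by_vendor(rows)))
```

With these inputs, VendorA ranks first on its best score (90.00%) even though both vendors average 85.00%. This illustrates why a vendor with many models, such as Mistral AI, can rank high on Best Score while having a much lower Avg Score.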