Vendors
Model creators/vendors and how their models compare across the benchmark, sorted by best score (highest first).
| Vendor | Models | Avg Score | Best Score | Best Model |
|---|---|---|---|---|
| Anthropic | 13 | 87.23% | 95.02% | Claude Opus 4.6 (Reasoning) |
| Google | 13 | 84.01% | 94.37% | Gemini 3.1 Pro (Preview) |
| Z.AI | 6 | 89.06% | 94.27% | Z.AI GLM 5 Turbo |
| OpenAI | 25 | 85.08% | 93.24% | GPT-5.4 (Reasoning) |
| Qwen | 10 | 85.83% | 91.73% | Qwen 3.5 397B A17B |
| xAI | 5 | 87.83% | 91.49% | Grok 4.20 (Beta, Reasoning) |
| MoonshotAI | 1 | 91.04% | 91.04% | MoonshotAI: Kimi K2.5 |
| bytedance-seed | 4 | 83.92% | 90.70% | ByteDance Seed 1.6 |
| aion-labs | 1 | 89.21% | 89.21% | Aion 2.0 |
| minimax | 2 | 88.90% | 89.10% | MiniMax M2.7 |
| openrouter | 3 | 85.69% | 87.34% | Stealth: Hunter Alpha |
| Mistral AI | 14 | 74.23% | 85.43% | Mistral Large 3 |
| DeepSeek | 5 | 83.03% | 84.83% | DeepSeek-V2 Chat |
| NVIDIA | 3 | 79.00% | 84.56% | Nemotron 3 Super |
| inception | 2 | 81.67% | 83.85% | Inception Mercury 2 |
| Nous Research | 2 | 77.71% | 82.86% | Hermes 3 405B |
| Writer | 1 | 79.57% | 79.57% | Writer: Palmyra X5 |
| Meta | 2 | 70.88% | 78.40% | Llama 3.1 70B |
| arcee-ai | 2 | 72.12% | 73.33% | Arcee AI: Trinity Large (Preview) |
| Microsoft | 1 | 71.07% | 71.07% | WizardLM 2 8x22b |
| Cohere | 1 | 69.03% | 69.03% | Cohere Command R+ (Aug. 2024) |
| Liquid AI | 1 | 58.77% | 58.77% | LFM2 24B |
| TheDrummer | 1 | 54.55% | 54.55% | Rocinante 12B |
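
The page does not state how the per-vendor columns are derived. As a rough sketch, assuming Avg Score is the plain mean of a vendor's per-model scores and Best Score is the maximum, the aggregation could look like the following. The vendor names, model names, and scores in `model_scores` are hypothetical placeholders, not benchmark data, and `summarize` is an illustrative helper rather than anything the benchmark provides.

```python
from collections import defaultdict

# Hypothetical per-model scores in percent; NOT taken from the benchmark.
model_scores = [
    ("VendorA", "model-a1", 90.0),
    ("VendorA", "model-a2", 80.0),
    ("VendorB", "model-b1", 85.0),
]


def summarize(rows):
    """Group (vendor, model, score) rows by vendor and compute the
    model count, mean score, best score, and best model per vendor.

    Assumption: the average is an unweighted mean over a vendor's models.
    """
    by_vendor = defaultdict(list)
    for vendor, model, score in rows:
        by_vendor[vendor].append((score, model))

    summary = []
    for vendor, entries in by_vendor.items():
        scores = [score for score, _ in entries]
        # max() on (score, model) tuples picks the highest score first.
        best_score, best_model = max(entries)
        summary.append({
            "vendor": vendor,
            "models": len(entries),
            "avg_score": sum(scores) / len(scores),
            "best_score": best_score,
            "best_model": best_model,
        })

    # Order rows by best score, highest first, like the table above.
    summary.sort(key=lambda row: row["best_score"], reverse=True)
    return summary


for row in summarize(model_scores):
    print(
        f"| {row['vendor']} | {row['models']} | "
        f"{row['avg_score']:.2f}% | {row['best_score']:.2f}% | {row['best_model']} |"
    )
```

Under this assumption, a vendor with a single model has identical Avg Score and Best Score, which matches the single-model rows in the table (for example MoonshotAI and Writer).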