Vendors
Model creators (vendors) and how their models compare across the benchmark, sorted by best score in descending order.
| Vendor | Models | Avg Score | Best Score | Best Model |
|---|---|---|---|---|
| MoonshotAI | 1 | 88.77% | 88.77% | MoonshotAI: Kimi K2.5 |
| Anthropic | 10 | 77.22% | 88.14% | Claude Opus 4.6 |
| OpenAI | 11 | 74.24% | 86.94% | o4 Mini High |
| Google | 7 | 68.94% | 85.69% | Gemini 3 Pro (Preview) |
| Z.AI | 5 | 80.58% | 85.57% | Z.AI GLM 4.7 |
| Meta | 8 | 62.10% | 75.12% | Llama 3.1 405B |
| DeepSeek | 1 | 71.45% | 71.45% | DeepSeek-V2 Chat |
| NVIDIA | 1 | 69.72% | 69.72% | Llama 3.1 Nemotron 70B |
| Envoid | 1 | 68.29% | 68.29% | Llama 3 TenyxChat-DaybreakStorywriter 70B |
| Writer | 1 | 66.98% | 66.98% | Writer: Palmyra X5 |
| Mistral AI | 6 | 54.91% | 65.49% | Mistral Large 2 |
| Nous Research | 2 | 62.21% | 64.82% | Hermes 3 405B |
| Qwen | 1 | 62.93% | 62.93% | Qwen 2.5 72B |
| Sao10K | 3 | 55.69% | 60.77% | Sao10K L3.1 70B Hanami x1 |
| Microsoft | 4 | 50.55% | 55.49% | Phi-3.5 Mini 128k |
| Inflection AI | 2 | 51.62% | 54.05% | Inflection 3 (Productivity) |
| Cohere | 1 | 50.95% | 50.95% | Cohere Command R+ (Aug. 2024) |
| Alpindale | 1 | 49.62% | 49.62% | Goliath 120B |
| TheDrummer | 1 | 47.32% | 47.32% | Rocinante 12B |
| NeverSleep | 1 | 44.80% | 44.80% | Lumimaid v0.2 8B |
| DavidAU | 1 | 38.49% | 38.49% | MN GRAND Gutenberg Lyra4 12B Madness |
| Gryphe | 1 | 35.63% | 35.63% | MythoMax 13B |
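
The per-vendor columns above are aggregates over individual model results. As a minimal sketch of how such a summary could be derived, assume each result is a `(vendor, model, score)` tuple with the score in percent. The function name `summarize_vendors`, the field names, and the sample data are hypothetical and are not taken from the benchmark itself.

```python
from collections import defaultdict


def summarize_vendors(results):
    """Aggregate (vendor, model, score) tuples into one summary row per vendor."""
    by_vendor = defaultdict(list)
    for vendor, model, score in results:
        by_vendor[vendor].append((model, score))

    rows = []
    for vendor, scored in by_vendor.items():
        # The best model is the one with the highest score for this vendor.
        best_model, best_score = max(scored, key=lambda pair: pair[1])
        avg_score = sum(score for _, score in scored) / len(scored)
        rows.append({
            "vendor": vendor,
            "models": len(scored),
            "avg_score": avg_score,
            "best_score": best_score,
            "best_model": best_model,
        })

    # Order rows by best score, highest first, matching the table's ordering.
    rows.sort(key=lambda row: row["best_score"], reverse=True)
    return rows


# Hypothetical sample data, for illustration only (not real benchmark results).
sample = [
    ("VendorA", "a-large", 80.0),
    ("VendorA", "a-small", 60.0),
    ("VendorB", "b-one", 75.0),
]
rows = summarize_vendors(sample)
for row in rows:
    print(f'{row["vendor"]}: {row["models"]} models, '
          f'avg {row["avg_score"]:.2f}%, best {row["best_score"]:.2f}% '
          f'({row["best_model"]})')
```

Under this scheme a vendor with a single model has identical average and best scores, as in the single-model rows of the table, while a vendor with many models can have a high best score alongside a much lower average.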