Qwen
Comparing 16 models from Qwen.
| Model | Total | Released | Context (tokens) | CoT | Tooling | Creative Writing | Language | Utility | Reasoning | Text Editing | Rule Following | Hallucination |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.7 Max | 94.55% | May 21, 2026 | 1M | ✓ | 100.00% | 85.39% | 97.05% | 99.54% | 83.47% | 98.08% | 95.76% | 97.15% |
| Qwen3.6 Max Preview | 93.72% | Apr 27, 2026 | 262.1k | ✓ | 100.00% | 88.42% | 100.00% | 98.34% | 85.79% | 98.58% | 82.79% | 95.86% |
| Qwen 3.5 397B A17B | 91.09% | Feb 15, 2026 | 128k | ✓ | 99.81% | 86.93% | 95.01% | 97.50% | 81.97% | 98.05% | 79.39% | 90.04% |
| Qwen 3.5 122B | 90.32% | Feb 25, 2026 | 262k | ✓ | 99.33% | 83.02% | 95.01% | 96.36% | 79.24% | 96.31% | 80.00% | 93.29% |
| Qwen 3.5 27B | 90.05% | Feb 25, 2026 | 262k | ✓ | 99.17% | 82.54% | 95.52% | 95.67% | 79.44% | 98.69% | 76.04% | 93.29% |
| Qwen 3.5 Plus (2026-04-20) | 89.79% | Apr 20, 2026 | 1M | – | 95.33% | 85.18% | 97.14% | 96.42% | 80.60% | 97.70% | 67.53% | 98.38% |
| Qwen 3.6 Flash | 89.31% | Apr 27, 2026 | 1M | ✓ | 99.78% | 86.02% | 89.33% | 96.09% | 79.51% | 96.09% | 71.50% | 96.13% |
| Qwen 3.6 27B | 88.33% | Apr 27, 2026 | 262.1k | ✓ | 97.91% | 82.81% | 89.01% | 94.32% | 79.84% | 93.97% | 71.37% | 97.42% |
| Qwen 3.6 35B | 87.66% | Apr 27, 2026 | 262.1k | ✓ | 82.67% | 85.97% | 93.56% | 96.20% | 76.37% | 95.10% | 77.34% | 94.07% |
| Qwen 3.5 35B | 87.01% | Feb 25, 2026 | 262k | ✓ | 94.74% | 83.51% | 91.95% | 96.42% | 77.89% | 94.95% | 67.42% | 89.24% |
| Qwen 3.5 Plus (2026-02-15) | 86.17% | Feb 15, 2026 | 1M | – | 99.78% | 77.07% | 95.10% | 86.65% | 81.81% | 98.10% | 64.21% | 86.62% |
| Qwen 3.5 Flash | 85.66% | Feb 25, 2026 | 1M | ✓ | 89.39% | 83.81% | 91.94% | 96.11% | 79.34% | 92.80% | 63.19% | 88.70% |
| Qwen 3.5 9B | 84.05% | Mar 10, 2026 | 262k | ✓ | 96.84% | 84.35% | 88.18% | 94.02% | 70.90% | 85.35% | 60.98% | 91.81% |
| Qwen 3 32B | 79.37% | Apr 28, 2025 | 41k | – | 95.19% | 81.30% | 84.61% | 81.66% | 66.35% | 89.95% | 46.83% | 89.06% |
| Qwen3 235B A22B Instruct 2507 | 78.07% | Jul 21, 2025 | 262.1k | – | 93.86% | 84.81% | 60.83% | 83.15% | 68.43% | 91.75% | 65.42% | 76.34% |
| Qwen 2.5 72B | 73.17% | Sep 19, 2024 | 131.1k | – | 96.82% | 75.16% | 68.95% | 76.43% | 61.71% | 89.18% | 31.55% | 85.54% |
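The Total column appears to be the unweighted mean of the eight category scores. For Qwen3.7 Max, (100.00 + 85.39 + 97.05 + 99.54 + 83.47 + 98.08 + 95.76 + 97.15) / 8 = 94.555, within 0.01 of the listed 94.55%, and the other rows agree to the same tolerance. The page does not state this formula, so the sketch below treats it as an assumption; `unweighted_total` and `ROWS` are illustrative names, and the scores are copied from the table.

```python
from statistics import fmean

# The eight category columns, in table order.
CATEGORIES = [
    "Tooling", "Creative Writing", "Language", "Utility",
    "Reasoning", "Text Editing", "Rule Following", "Hallucination",
]

# Rows copied from the table: (listed Total, category scores in %).
ROWS = {
    "Qwen3.7 Max": (94.55, [100.00, 85.39, 97.05, 99.54, 83.47, 98.08, 95.76, 97.15]),
    "Qwen 3.6 35B": (87.66, [82.67, 85.97, 93.56, 96.20, 76.37, 95.10, 77.34, 94.07]),
    "Qwen 3.5 35B": (87.01, [94.74, 83.51, 91.95, 96.42, 77.89, 94.95, 67.42, 89.24]),
    "Qwen 2.5 72B": (73.17, [96.82, 75.16, 68.95, 76.43, 61.71, 89.18, 31.55, 85.54]),
}


def unweighted_total(scores: list[float]) -> float:
    """Assumed reconstruction of Total: plain mean of the category scores."""
    if len(scores) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} scores, got {len(scores)}")
    return fmean(scores)


for name, (listed, scores) in ROWS.items():
    recomputed = unweighted_total(scores)
    # Category scores are already rounded to two decimals, so allow 0.01 slack.
    status = "match" if abs(recomputed - listed) <= 0.01 else "MISMATCH"
    print(f"{name:<14} listed {listed:6.2f}  recomputed {recomputed:7.3f}  {status}")
```

The tolerance is needed because the inputs are themselves rounded: Qwen 3.5 35B recomputes to 87.015 against a listed 87.01%.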
Model Performance
Cost vs Performance
Chart of total benchmark cost against overall score for Qwen models, with quadrant lines drawn at the median cost and median score. Two low-scoring outliers are hidden: Qwen 2.5 72B (73.2%) and Qwen3 235B A22B Instruct 2507 (78.1%).
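The median-based quadrant split can be sketched as follows. Per-model costs are not listed on this page, so the example uses made-up model names and costs. How ties at the median are handled, and whether the medians are computed before or after the two outliers are hidden, are assumptions rather than details from the source.

```python
from statistics import median


def quadrant_labels(points: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Label each (cost_usd, score_pct) point by its side of the two median lines.

    Values equal to a median count as "low" (an assumption; the source does
    not say how ties are drawn).
    """
    cost_mid = median(cost for cost, _ in points.values())
    score_mid = median(score for _, score in points.values())
    labels = {}
    for name, (cost, score) in points.items():
        score_side = "high score" if score > score_mid else "low score"
        cost_side = "high cost" if cost > cost_mid else "low cost"
        labels[name] = f"{score_side} / {cost_side}"
    return labels


# Hypothetical models and costs, for illustration only.
points = {
    "model_a": (1.0, 90.0),
    "model_b": (3.0, 95.0),
    "model_c": (2.0, 85.0),
    "model_d": (4.0, 80.0),
}
for name, label in quadrant_labels(points).items():
    print(name, label)
```

With these inputs the median lines fall at a cost of 2.5 and a score of 87.5, so `model_a` lands in the cheap, high-scoring quadrant.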
Cost Breakdown
Chart of total benchmark cost per model, broken down by input, reasoning, and output tokens, shown in either USD or token counts.
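A minimal sketch of the input / reasoning / output breakdown in its two views (tokens and USD), assuming cost is each token count multiplied by a per-million-token price for that token type. The `TokenUsage` and `Pricing` classes, the token counts, and the prices are all hypothetical; this page lists neither. A separate reasoning price is kept so it can be set equal to the output rate for a provider that bills reasoning tokens that way.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    # Token counts for one model's full benchmark run (hypothetical values below).
    input_tokens: int
    reasoning_tokens: int
    output_tokens: int


@dataclass
class Pricing:
    # Hypothetical USD prices per million tokens for each token type.
    input_per_m: float
    reasoning_per_m: float
    output_per_m: float


def token_view(usage: TokenUsage) -> dict[str, int]:
    """Token view: raw counts per type plus their total."""
    parts = {
        "input": usage.input_tokens,
        "reasoning": usage.reasoning_tokens,
        "output": usage.output_tokens,
    }
    parts["total"] = sum(parts.values())
    return parts


def usd_view(usage: TokenUsage, price: Pricing) -> dict[str, float]:
    """USD view: each count times its per-million price, plus their total."""
    parts = {
        "input": usage.input_tokens * price.input_per_m / 1_000_000,
        "reasoning": usage.reasoning_tokens * price.reasoning_per_m / 1_000_000,
        "output": usage.output_tokens * price.output_per_m / 1_000_000,
    }
    parts["total"] = sum(parts.values())
    return parts


usage = TokenUsage(input_tokens=2_000_000, reasoning_tokens=500_000, output_tokens=1_000_000)
price = Pricing(input_per_m=0.50, reasoning_per_m=2.00, output_per_m=2.00)
print(token_view(usage))
print(usd_view(usage, price))
```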