Language Comprehension

Does the model understand more than just English?

Price-Performance Score Distribution (Top 20)

Model                            Score  Cost     Time
Mistral NeMO                     95%    $0.0000  625ms
Inception Mercury                65%    $0.0000  562ms
Ministral 3 3B                   100%   $0.0000  833ms
Gemini 2.5 Flash Lite            80%    $0.0001  648ms
Ministral 8B                     55%    $0.0000  540ms
GPT-5.4 Mini                     80%    $0.0002  747ms
GPT-4o Mini (temp=1)             55%    $0.0000  942ms
Gemini 3.1 Flash Lite (Preview)  95%    $0.0001  975ms
GPT-5.4 Nano                     70%    $0.0001  990ms
Inception Mercury 2              75%    $0.0003  771ms
Gemini 2.5 Flash                 75%    $0.0002  817ms
Stealth: Aurora Alpha            85%    —        1.3s
Grok 4.20 (Beta)                 85%    $0.0005  726ms
Mistral Small 4                  55%    $0.0001  1.1s
Gemini 3 Flash (Preview)         90%    $0.0004  1.5s
Claude 3 Haiku                   65%    $0.0001  1.3s
Gemma 3 4B                       70%    $0.0000  1.7s
Mistral Small 3.2 24B            75%    $0.0000  2.4s
Llama 3.1 8B                     55%    $0.0000  1.3s
GPT-4.1 Nano                     65%    $0.0000  1.7s

Cost vs Performance

Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.

1 low-scoring outlier hidden: Ministral 3B (25.0%).
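
The quadrant rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual implementation: the function name `quadrants`, the tuple layout, and the tie-handling (a value equal to the median counts as "high score" or "low cost") are assumptions, and the sample rows are only a handful copied from the price-performance table rather than the full set of plotted models. Hiding low-scoring outliers is not modeled.

```python
from statistics import median

# Illustrative sample: (model, score, cost in USD) taken from a few rows of the
# price-performance table. The real chart covers more models than this.
SAMPLE = [
    ("Mistral NeMO", 0.95, 0.0000),
    ("Gemini 2.5 Flash Lite", 0.80, 0.0001),
    ("GPT-5.4 Mini", 0.80, 0.0002),
    ("Grok 4.20 (Beta)", 0.85, 0.0005),
    ("Stealth: Aurora Alpha", 0.85, None),  # no cost data, so it is not plotted
    ("Claude 3 Haiku", 0.65, 0.0001),
]


def quadrants(rows):
    """Assign each model with cost data to a quadrant split at the medians."""
    # Only models with available cost data are shown.
    plotted = [(name, score, cost) for name, score, cost in rows if cost is not None]

    # Quadrant lines sit at the median score and the median cost.
    score_mid = median(score for _, score, _ in plotted)
    cost_mid = median(cost for _, _, cost in plotted)

    result = {}
    for name, score, cost in plotted:
        # Assumption: ties with the median fall on the favorable side.
        vertical = "high score" if score >= score_mid else "low score"
        horizontal = "low cost" if cost <= cost_mid else "high cost"
        result[name] = f"{vertical}, {horizontal}"
    return result, score_mid, cost_mid


if __name__ == "__main__":
    labels, score_mid, cost_mid = quadrants(SAMPLE)
    print(f"median score={score_mid}, median cost={cost_mid}")
    for name, label in labels.items():
        print(f"{name}: {label}")
```

Because the split uses medians rather than fixed thresholds, the quadrant a model lands in depends on which other models are plotted alongside it.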

Most Stable Models (Top 20)

Ranked by stability (median × consistency).
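
As a rough sketch of that ranking rule, the product can be computed as follows. The page does not say how consistency is measured, so it is taken here as a given value between 0 and 1; the function name `stability` and the list-of-run-scores input are likewise assumptions made for illustration.

```python
from statistics import median


def stability(run_scores, consistency):
    """Return median score x consistency, the product named in the ranking rule.

    run_scores: per-run scores in [0, 1] (hypothetical input shape).
    consistency: a value in [0, 1], taken as given because the page
        does not define how it is derived.
    """
    return median(run_scores) * consistency


if __name__ == "__main__":
    # A model that scores 100% on every run with 100% consistency.
    print(stability([1.0, 1.0, 1.0], 1.0))
    # A model with a good median but only moderate consistency ranks lower.
    print(stability([0.9, 1.0, 0.8], 0.5))
```

Multiplying the two terms means a model needs both a high median and high consistency to reach 100%; a shortfall in either one pulls the stability value down.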

Model                          Score  Consistency  Stability
Claude Opus 4.6 (Reasoning)    100%   100%         100%
Z.AI GLM 5 Turbo               100%   100%         100%
Claude Sonnet 4.6 (Reasoning)  100%   100%         100%
Claude Opus 4.6                100%   100%         100%
Qwen 3.5 397B A17B             100%   100%         100%
Qwen 3.5 122B                  100%   100%         100%
Grok 4.20 (Beta, Reasoning)    100%   100%         100%
Claude Sonnet 4.6              100%   100%         100%
MoonshotAI: Kimi K2.5          100%   100%         100%
Qwen 3.5 27B                   100%   100%         100%
ByteDance Seed 1.6             100%   100%         100%
GPT-5.4 Mini (Reasoning)       100%   100%         100%
Claude Opus 4.5                100%   100%         100%
Aion 2.0                       100%   100%         100%
Z.AI GLM 4.6                   100%   100%         100%
MiniMax M2.7                   100%   100%         100%
Qwen 3.5 35B                   100%   100%         100%
Claude Opus 4                  100%   100%         100%
Qwen 3.5 Plus (2026-02-15)     100%   100%         100%
Mistral Large 3                100%   100%         100%

Top Overall Models (Top 20)

Ranked by composite score (performance, cost, speed, and stability).

Model                          Score  Cost     Speed  Stability
Ministral 3 3B                 100%   $0.0000  833ms  100%
Mistral Large 3                100%   $0.0002  3.7s   100%
DeepSeek V3 (2024-12-26)       100%   $0.0002  4.8s   100%
GPT-4o, May 13th (temp=0)      100%   $0.0016  2.4s   100%
Qwen 3.5 Plus (2026-02-15)     100%   $0.0003  4.8s   100%
Mistral Large 2                100%   $0.0009  3.8s   100%
DeepSeek-V2 Chat               100%   $0.0000  6.6s   100%
Claude Sonnet 4.6              100%   $0.0021  3.1s   100%
GPT-5.4 Mini (Reasoning)       100%   $0.0012  4.8s   100%
DeepSeek V3 (2025-03-24)       100%   $0.0001  7.5s   100%
Claude Opus 4.5                100%   $0.0035  3.7s   100%
Claude Opus 4.6                100%   $0.0037  4.3s   100%
Hermes 3 405B                  100%   $0.0000  12.1s  100%
Z.AI GLM 5 Turbo               100%   $0.0030  9.5s   100%
Aion 2.0                       100%   $0.0011  15.9s  100%
ByteDance Seed 1.6             100%   $0.0013  15.8s  100%
Claude Sonnet 4.6 (Reasoning)  100%   $0.0061  8.3s   100%
Claude Opus 4.6 (Reasoning)    100%   $0.0085  7.3s   100%
Qwen 3.5 122B                  100%   $0.0053  13.6s  100%
Grok 4.20 (Beta, Reasoning)    100%   $0.0117.1s      100%
Model                          Total  Friend got new kittens (Tagalog)  Friend got new kittens (German)  Asking for directions (German)  Asking for directions (Dutch)
Claude Opus 4.6 (Reasoning)    100%   100%                              100%                             100%                            100%
Z.AI GLM 5 Turbo               100%   100%                              100%                             100%                            100%
Claude Sonnet 4.6 (Reasoning)  100%   100%                              100%                             100%                            100%
Claude Opus 4.6                100%   100%                              100%                             100%                            100%
Qwen 3.5 397B A17B             100%   100%                              100%                             100%                            100%
Qwen 3.5 122B                  100%   100%                              100%                             100%                            100%
Grok 4.20 (Beta, Reasoning)    100%   100%                              100%                             100%                            100%
Claude Sonnet 4.6              100%   100%                              100%                             100%                            100%
MoonshotAI: Kimi K2.5          100%   100%                              100%                             100%                            100%
Qwen 3.5 27B                   100%   100%                              100%                             100%                            100%
ByteDance Seed 1.6             100%   100%                              100%                             100%                            100%
GPT-5.4 Mini (Reasoning)       100%   100%                              100%                             100%                            100%
Claude Opus 4.5                100%   100%                              100%                             100%                            100%
Aion 2.0                       100%   100%                              100%                             100%                            100%
Z.AI GLM 4.6                   100%   100%                              100%                             100%                            100%
Showing 1–15 of 118 models.

Friend got new kittens (Tagalog)

Friend got new kittens (German)

Asking for directions (German)

Asking for directions (Dutch)