Language Comprehension

Does the model understand more than just English?

Price-Performance Score Distribution (Top 20)


| Model | Score | Cost | Time |
| --- | --- | --- | --- |
| Mistral NeMO | 95% | $0.0000 | 625ms |
| Inception Mercury | 65% | $0.0000 | 562ms |
| Ministral 3 3B | 100% | $0.0000 | 833ms |
| Gemini 2.5 Flash Lite | 80% | $0.0001 | 648ms |
| Ministral 8B | 55% | $0.0000 | 540ms |
| GPT-4o Mini (temp=1) | 55% | $0.0000 | 942ms |
| Stealth: Aurora Alpha | 85% | — | 1.3s |
| GPT-5.4 Nano | 70% | $0.0001 | 990ms |
| Gemini 3.1 Flash Lite (Preview) | 95% | $0.0001 | 975ms |
| Gemini 3.1 Flash Lite (Reasoning) | 95% | $0.0002 | 1.6s |
| Gemini 3.1 Flash Lite | 85% | $0.0001 | 905ms |
| GPT-5.4 Mini | 80% | $0.0002 | 747ms |
| Gemma 3 4B | 70% | $0.0000 | 1.7s |
| Mistral Small 4 | 55% | $0.0001 | 1.1s |
| Arcee AI: Trinity Large (Preview) | 80% | $0.0000 | 2.5s |
| Llama 3.1 8B | 55% | $0.0000 | 1.3s |
| Mistral Small 3.2 24B | 75% | $0.0000 | 2.4s |
| GPT-4.1 Nano | 65% | $0.0000 | 1.7s |
| Inception Mercury 2 | 75% | $0.0003 | 771ms |
| Gemini 2.5 Flash | 75% | $0.0002 | 817ms |

Cost vs Performance

Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.

1 low-scoring outlier hidden: Ministral 3B (25.0%).
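
The quadrant split described above can be sketched in a few lines of Python. This is only an illustration, not the site's actual code: the function name `quadrants`, the labels, and the tie-handling (a value equal to the median counts as "low cost" or "high score") are assumptions. The sample rows are copied from the Top 20 list, with costs as displayed (rounded to four decimals).

```python
from statistics import median

# (model, score, cost in USD) for a few rows of the Top 20 list.
# Costs are the rounded values shown on the page, not exact billing figures.
rows = [
    ("Mistral NeMO", 0.95, 0.0000),
    ("Gemini 2.5 Flash Lite", 0.80, 0.0001),
    ("GPT-5.4 Mini", 0.80, 0.0002),
    ("Mistral Small 4", 0.55, 0.0001),
    ("Inception Mercury 2", 0.75, 0.0003),
]


def quadrants(rows):
    """Assign each model to a quadrant, splitting at the median cost and score.

    Assumption: ties with the median fall on the favorable side
    (low cost, high score). The page does not say how ties are handled.
    """
    cost_mid = median(cost for _, _, cost in rows)
    score_mid = median(score for _, score, _ in rows)
    result = {}
    for name, score, cost in rows:
        cost_side = "low cost" if cost <= cost_mid else "high cost"
        score_side = "high score" if score >= score_mid else "low score"
        result[name] = f"{cost_side}, {score_side}"
    return result


if __name__ == "__main__":
    for name, label in quadrants(rows).items():
        print(f"{name}: {label}")
```

Models without cost data (such as Stealth: Aurora Alpha) would be filtered out before calling the function, matching the note that only models with available cost data are shown.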

Most Stable Models (Top 20)

Ranked by stability (median × consistency).

| Model | Score | Consistency | Stability |
| --- | --- | --- | --- |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% |
| Qwen3.6 Max Preview | 100% | 100% | 100% |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% |
| Claude Opus 4.6 | 100% | 100% | 100% |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% |
| Qwen 3.5 397B A17B | 100% | 100% | 100% |
| Qwen 3.5 122B | 100% | 100% | 100% |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% |
| Claude Sonnet 4.6 | 100% | 100% | 100% |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% |
| Qwen 3.5 27B | 100% | 100% | 100% |
| ByteDance Seed 1.6 | 100% | 100% | 100% |
| GPT-5.4 Mini (Reasoning) | 100% | 100% | 100% |
| Claude Opus 4.5 | 100% | 100% | 100% |
| Aion 2.0 | 100% | 100% | 100% |
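
The stability ranking is described as median × consistency. A minimal sketch of that product follows, under stated assumptions: the page does not define how consistency is computed, so it is taken here as a ready-made value between 0 and 1, and the function name `stability` and its inputs are hypothetical.

```python
from statistics import median


def stability(run_scores, consistency):
    """Stability as the median run score multiplied by consistency.

    run_scores: per-run scores in the range 0..1 (hypothetical input).
    consistency: a value in 0..1; how the site derives it is not stated,
    so it is passed in rather than computed here.
    """
    return median(run_scores) * consistency


if __name__ == "__main__":
    # A model that scores 100% on every run with 100% consistency,
    # like every row in the table above.
    print(stability([1.0, 1.0, 1.0], 1.0))
    # A less consistent model is pulled down even with a good median.
    print(stability([0.8, 0.9, 1.0], 0.5))
```

Because the two factors are multiplied, a model reaches 100% stability only when both its median score and its consistency are 100%, which is the case for all twenty rows listed.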

Top Overall Models (Top 20)

Ranked by composite score (performance, cost, speed & stability).

| Model | Score | Cost | Speed | Stability |
| --- | --- | --- | --- | --- |
| Ministral 3 3B | 100% | $0.0000 | 833ms | 100% |
| Mistral Large 3 | 100% | $0.0002 | 3.7s | 100% |
| DeepSeek V3 (2024-12-26) | 100% | $0.0002 | 4.8s | 100% |
| Qwen 3.5 Plus (2026-02-15) | 100% | $0.0003 | 4.8s | 100% |
| Mistral Large 2 | 100% | $0.0009 | 3.8s | 100% |
| DeepSeek-V2 Chat | 100% | $0.0000 | 6.6s | 100% |
| GPT-4o, May 13th (temp=0) | 100% | $0.0016 | 2.4s | 100% |
| DeepSeek V3 (2025-03-24) | 100% | $0.0001 | 7.5s | 100% |
| GPT-5.4 Mini (Reasoning) | 100% | $0.0012 | 4.8s | 100% |
| Claude Sonnet 4.6 | 100% | $0.0021 | 3.1s | 100% |
| Hermes 3 405B | 100% | $0.0000 | 12.1s | 100% |
| Claude Opus 4.5 | 100% | $0.0035 | 3.7s | 100% |
| Claude Opus 4.6 | 100% | $0.0037 | 4.3s | 100% |
| GPT-5.5 (Reasoning, Low) | 100% | $0.0039 | 4.7s | 100% |
| Z.AI GLM 5 Turbo | 100% | $0.0030 | 9.5s | 100% |
| Aion 2.0 | 100% | $0.0011 | 15.9s | 100% |
| Claude Opus 4.7 (Reasoning) | 100% | $0.0056 | 3.2s | 100% |
| ByteDance Seed 1.6 | 100% | $0.0013 | 15.8s | 100% |
| GPT-5.5 (Reasoning) | 100% | $0.0068 | 6.3s | 100% |
| Claude Sonnet 4.6 (Reasoning) | 100% | $0.0061 | 8.3s | 100% |
| Model | Total | Friend got new kittens (Tagalog) | Friend got new kittens (German) | Asking for directions (German) | Asking for directions (Dutch) |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | 100% | 100% |
| Qwen3.6 Max Preview | 100% | 100% | 100% | 100% | 100% |
| Z.AI GLM 5 Turbo | 100% | 100% | 100% | 100% | 100% |
| Claude Sonnet 4.6 (Reasoning) | 100% | 100% | 100% | 100% | 100% |
| Claude Opus 4.7 (Reasoning) | 100% | 100% | 100% | 100% | 100% |
| GPT-5.5 (Reasoning) | 100% | 100% | 100% | 100% | 100% |
| GPT-5.5 (Reasoning, Low) | 100% | 100% | 100% | 100% | 100% |
| Claude Opus 4.6 | 100% | 100% | 100% | 100% | 100% |
| MoonshotAI: Kimi K2.6 | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 397B A17B | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 122B | 100% | 100% | 100% | 100% | 100% |
| Qwen 3.5 Plus (2026-04-20) | 100% | 100% | 100% | 100% | 100% |
| Grok 4.20 (Beta, Reasoning) | 100% | 100% | 100% | 100% | 100% |
| Claude Sonnet 4.6 | 100% | 100% | 100% | 100% | 100% |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | 100% | 100% |

Sorted by Total, descending. Showing models 1 to 15 of 147 (page 1 of 10).

Friend got new kittens (Tagalog)

Friend got new kittens (German)

Asking for directions (German)

Asking for directions (Dutch)