Relationship tree

Extracts a deterministic XML family and relationship tree from cumulative literary prose.

Price-Performance Score Distribution (Top 20)

Model                                Score  Cost     Time
GPT-5.5 (Reasoning, Low)             87%    $0.097   42.6s
Gemini 3 Flash (Preview)             81%    $0.0087  7.1s
Gemini 2.5 Pro                       89%    $0.099   58.2s
Gemini 2.5 Flash                     80%    $0.0075  6.1s
GPT-5.4 (Reasoning, Low)             88%    $0.053   35.5s
GPT-5.5                              88%    $0.105   37.1s
Xiaomi MIMO v2.5                     84%    $0.0027  1.1m
Gemini 3.1 Flash Lite (Preview)      78%    $0.0046  3.2s
Grok 4.20 (Beta, Reasoning)          83%    $0.032   30.1s
GPT-5.2                              87%    $0.062   50.7s
Gemini 3.1 Flash Lite (Reasoning)    78%    $0.0033  3.2s
Gemma 4 26B                          79%    $0.0018  27.2s
DeepSeek V4 Pro (Reasoning)          85%    $0.020   2.3m
Qwen 3.6 Flash                       81%    $0.015   51.7s
Gemini 3.1 Flash Lite                76%    $0.0031  3.3s
Claude Opus 4.7 (Reasoning)          89%    $0.184   25.0s
GPT-5.4                              81%    $0.033   17.4s
Gemini 3 Flash (Preview, Reasoning)  85%    $0.030   40.3s
Claude Sonnet 4.6                    83%    $0.081   23.3s
DeepSeek V3.1                        74%    $0.0055  52.0s

Cost vs Performance

Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.

2 low-scoring outliers hidden: Llama 3.1 8B (38.0%), Gemma 3 12B (37.1%).
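
The median-based quadrant split described above can be sketched as follows. This is a minimal illustration, not the site's implementation: the function name `quadrants`, the label wording, the handling of values that fall exactly on a median line, and the sample models are all assumptions made for the example.

```python
from statistics import median


def quadrants(models):
    """Label each model by its position relative to the median cost and score.

    `models` maps a model name to a (cost_usd, score) pair.
    Assumption: values exactly on a median line count as "cheap" / "high-score".
    """
    cost_median = median(cost for cost, _ in models.values())
    score_median = median(score for _, score in models.values())

    labels = {}
    for name, (cost, score) in models.items():
        price = "cheap" if cost <= cost_median else "expensive"
        quality = "high-score" if score >= score_median else "low-score"
        labels[name] = f"{price}, {quality}"
    return labels


# Hypothetical data, not taken from the leaderboard.
sample = {
    "model-a": (0.010, 0.90),
    "model-b": (0.200, 0.92),
    "model-c": (0.005, 0.70),
    "model-d": (0.150, 0.75),
    "model-e": (0.050, 0.80),
}
labels = quadrants(sample)
```

Models in the "cheap, high-score" quadrant are the ones a cost-sensitive reader would look at first.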

Most Stable Models (Top 20)

Ranked by stability (median × consistency).

Model                             Score  Consistency  Stability
GPT-5.4 (Reasoning)               96%    93%          90%
Claude Opus 4.6 (Reasoning)       95%    93%          88%
GPT-5.5 (Reasoning)               94%    93%          87%
Claude Opus 4.8 (Reasoning)       94%    90%          86%
Claude Sonnet 4.6 (Reasoning)     92%    88%          83%
MoonshotAI: Kimi K2.6             92%    89%          82%
GPT-5.4 (Reasoning, Low)          88%    93%          80%
Claude Opus 4.6                   91%    90%          80%
GPT-5.2                           87%    92%          79%
Z.AI GLM 5.1                      91%    88%          79%
MiniMax M3                        87%    91%          78%
Claude Opus 4.5                   85%    92%          78%
Claude Opus 4.8 (Reasoning, Low)  90%    86%          77%
Z.AI GLM 5 Turbo                  87%    86%          77%
GPT-5.4                           81%    95%          77%
Grok 4.20 (Beta, Reasoning)       83%    92%          77%
Claude Sonnet 4.5                 79%    97%          77%
Claude Opus 4.7 (Reasoning)       89%    83%          76%
Qwen 3.6 Flash                    81%    94%          76%
Gemini 3 Flash (Preview)          81%    94%          76%
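
The stability ranking is described as median × consistency. A minimal sketch of that product is shown below, assuming both quantities are fractions in [0, 1]. The page does not say how consistency is computed, so it is passed in as a given value; the function name `stability` and the sample runs are hypothetical. Multiplying the displayed Score and Consistency columns does not always reproduce the listed Stability (for example, 88% × 93% is about 82%, while 80% is listed), so the displayed Score may not be the per-run median used in the ranking.

```python
from statistics import median


def stability(run_scores, consistency):
    """Return median(run_scores) * consistency, both expressed in [0, 1].

    Assumption: `consistency` is supplied by the caller, because the source
    does not define how it is derived.
    """
    if not run_scores:
        raise ValueError("run_scores must not be empty")
    if not 0.0 <= consistency <= 1.0:
        raise ValueError("consistency must be between 0 and 1")
    return median(run_scores) * consistency


# Hypothetical per-run scores for one model, not taken from the leaderboard.
runs = [0.90, 0.80, 0.85]
value = stability(runs, consistency=0.9)  # median 0.85 times 0.9
```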

Top Overall Models (Top 20)

Ranked by composite score (performance, cost, speed & stability).

Model                              Score  Cost     Speed  Stability
GPT-5.4 (Reasoning, Low)           88%    $0.053   35.5s  80%
Gemini 3 Flash (Preview)           81%    $0.0087  7.1s   76%
Gemini 3.1 Flash Lite (Preview)    78%    $0.0046  3.2s   76%
Gemini 2.5 Flash                   80%    $0.0075  6.1s   75%
GPT-5.2                            87%    $0.062   50.7s  79%
Grok 4.20 (Beta, Reasoning)        83%    $0.032   30.1s  77%
Gemma 4 26B                        79%    $0.0018  27.2s  76%
GPT-5.4                            81%    $0.033   17.4s  77%
Gemini 3.1 Flash Lite (Reasoning)  78%    $0.0033  3.2s   74%
Qwen 3.6 Flash                     81%    $0.015   51.7s  76%
Xiaomi MIMO v2.5                   84%    $0.0027  1.1m   73%
Claude Opus 4.6                    91%    $0.154   31.4s  80%
Gemini 3.1 Flash Lite              76%    $0.0031  3.3s   71%
GPT-5.4 (Reasoning)                96%    $0.175   2.6m   90%
Claude Sonnet 4.6                  83%    $0.081   23.3s  76%
GPT-5.5                            88%    $0.105   37.1s  75%
Inception Mercury 2                76%    $0.0096  12.8s  71%
ByteDance Seed 1.6                 81%    $0.014   1.2m   72%
Grok 4.20 (Reasoning)              81%    $0.027   40.0s  71%
GPT-5 Mini                         84%    $0.019   2.0m   75%

Core relationship tree

Family relationship tree