Voice/dialogue sheets

Extract the dialogue from a given text as voice sheets.

Price-Performance Score Distribution (Top 20)

| Model | Score | Cost | Time |
| --- | --- | --- | --- |
| Stealth: Healer Alpha | 76% | $0.0000 | 5.0s |
| DeepSeek V4 Flash | 62% | $0.0001 | 3.7s |
| Gemini 2.5 Flash Lite | 66% | $0.0001 | 610ms |
| Mistral Small 3.2 24B | 72% | $0.0001 | 2.1s |
| Stealth: Hunter Alpha | 70% | $0.0000 | 18.7s |
| Mistral Small 4 | 60% | $0.0001 | 1.3s |
| GPT-4o Mini (temp=0) | 60% | $0.0001 | 3.7s |
| Llama 3.1 8B | 54% | $0.0001 | 919ms |
| Gemma 4 26B | 88% | $0.0001 | 4.5s |
| Qwen3 235B A22B Instruct 2507 | 72% | $0.0001 | 4.9s |
| DeepSeek-V2 Chat | 80% | $0.0001 | 8.2s |
| Gemma 4 31B | 100% | $0.0001 | 8.3s |
| Gemma 3 12B | 52% | $0.0000 | 4.1s |
| DeepSeek V4 Flash (Reasoning) | 78% | $0.0002 | 20.6s |
| Z.AI GLM 4.5 | 76% | $0.0004 | 5.3s |
| Hermes 3 405B | 58% | $0.0000 | 13.3s |
| Gemini 3.1 Flash Lite (Reasoning) | 78% | $0.0003 | 1.4s |
| Gemini 3.1 Flash Lite (Preview) | 70% | $0.0003 | 979ms |
| DeepSeek V3 (2025-03-24) | 84% | $0.0003 | 6.1s |
| ByteDance Seed 1.6 Flash | 66% | $0.0002 | 4.5s |

Cost vs Performance

Compares total cost for this test against the test score. Quadrant lines are drawn at the median values. Only models with available cost data are shown.

2 low-scoring outliers hidden: Ministral 3 3B (0.0%), LFM2 24B (0.0%).
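
The quadrant split described above can be sketched as follows. This is only an illustration: the function name `classify_quadrants`, the tie-breaking rule (a value equal to the median counts as "low cost" or "high score"), and the choice of sample rows are assumptions, not something the page specifies. The cost and score figures are taken from the tables on this page.

```python
from statistics import median

# (model name, total cost in USD, score as a fraction); a small sample of rows
SAMPLE = [
    ("Gemma 4 31B", 0.0001, 1.00),
    ("Claude Opus 4.6", 0.0055, 1.00),
    ("GPT-4.1", 0.0017, 0.96),
    ("Claude Haiku 4.5", 0.0011, 0.82),
    ("Gemini 3 Flash (Preview)", 0.0006, 0.80),
]


def classify_quadrants(models):
    """Assign each model to a quadrant using the median cost and median score."""
    cost_line = median(cost for _, cost, _ in models)
    score_line = median(score for _, _, score in models)
    quadrants = {}
    for name, cost, score in models:
        # Assumption: ties fall on the favourable side of each quadrant line.
        cost_label = "low cost" if cost <= cost_line else "high cost"
        score_label = "high score" if score >= score_line else "low score"
        quadrants[name] = f"{cost_label}, {score_label}"
    return quadrants


for model, quadrant in classify_quadrants(SAMPLE).items():
    print(f"{model}: {quadrant}")
```

Because the lines sit at the medians of whichever models are plotted, hiding or adding models (such as the two outliers noted above) can shift the lines and move a model into a different quadrant.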

Most Stable Models (Top 20)

Ranked by stability (median × consistency).

| Model | Score | Consistency | Stability |
| --- | --- | --- | --- |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% |
| Claude Opus 4.6 | 100% | 100% | 100% |
| Claude Sonnet 4.6 | 100% | 100% | 100% |
| Claude Sonnet 4 | 100% | 100% | 100% |
| Gemma 4 31B | 100% | 100% | 100% |
| Claude 3.7 Sonnet | 100% | 100% | 100% |
| Qwen3.6 Max Preview | 96% | 61% | 61% |
| GPT-4.1 | 96% | 61% | 61% |
| GPT-4o, May 13th (temp=0) | 96% | 61% | 61% |
| ByteDance Seed 1.6 | 94% | 53% | 53% |
| Gemma 4 31B (Reasoning) | 92% | 46% | 46% |
| Grok 4.20 (Reasoning) | 92% | 46% | 46% |
| GPT-4o, Aug. 6th (temp=0) | 92% | 46% | 46% |
| Claude Sonnet 4.6 (Reasoning) | 90% | 40% | 40% |
| GPT-4o, May 13th (temp=1) | 90% | 40% | 40% |
| Gemma 4 26B | 88% | 35% | 35% |
| Qwen 3.5 122B | 88% | 35% | 35% |
| Gemini 2.5 Flash | 88% | 35% | 35% |
| Claude Opus 4 | 86% | 31% | 31% |
| Qwen 3.5 Plus (2026-02-15) | 86% | 31% | 31% |
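
The stability ranking is described only as "median × consistency". A minimal sketch of that product is shown below. The page does not say what the median is taken over or how consistency is computed, so the function name `stability`, the idea of passing a list of per-run scores, and the use of fractions in the range 0 to 1 are all assumptions made for illustration.

```python
from statistics import median


def stability(run_scores, consistency):
    """Return median(run_scores) * consistency, both expressed as fractions.

    Assumption: `run_scores` holds one score per run or subtest; the page
    does not define which values the median is computed from.
    """
    return median(run_scores) * consistency


# Hypothetical inputs: a median of 1.0 leaves consistency unchanged.
print(stability([0.9, 1.0, 1.0], 0.61))
```

In the table above, Stability equals Consistency in every listed row. Under this formula that would imply a median at or near 100% for each of those models, which suggests the median is not the same quantity as the Score column.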

Top Overall Models (Top 20)

Ranked by composite score (performance, cost, speed, and stability).

| Model | Score | Cost | Speed | Stability |
| --- | --- | --- | --- | --- |
| Gemma 4 31B | 100% | $0.0001 | 8.3s | 100% |
| Claude Sonnet 4.6 | 100% | $0.0033 | 1.9s | 100% |
| Claude Sonnet 4 | 100% | $0.0033 | 2.8s | 100% |
| Claude 3.7 Sonnet | 100% | $0.0034 | 3.5s | 100% |
| Claude Opus 4.6 | 100% | $0.0055 | 3.9s | 100% |
| Claude Opus 4.6 (Reasoning) | 100% | $0.0060 | 3.1s | 100% |
| GPT-4.1 | 96% | $0.0017 | 2.6s | 61% |
| GPT-4o, May 13th (temp=0) | 96% | $0.0037 | 5.2s | 61% |
| GPT-4o, Aug. 6th (temp=0) | 92% | $0.0022 | 2.4s | 46% |
| Gemini 2.5 Flash | 88% | $0.0005 | 933ms | 35% |
| ByteDance Seed 1.6 | 94% | $0.0013 | 14.2s | 53% |
| Gemma 4 26B | 88% | $0.0001 | 4.5s | 35% |
| Qwen 3.5 Plus (2026-02-15) | 86% | $0.0005 | 6.6s | 31% |
| Grok 4 Fast | 84% | $0.0003 | 3.6s | 27% |
| Grok 4.1 Fast | 84% | $0.0004 | 4.4s | 27% |
| GPT-4o, May 13th (temp=1) | 90% | $0.0037 | 5.2s | 40% |
| DeepSeek V3 (2025-03-24) | 84% | $0.0003 | 6.1s | 27% |
| Grok 4.20 (Reasoning) | 92% | $0.0035 | 12.9s | 46% |
| Claude Haiku 4.5 | 82% | $0.0011 | 1.6s | 23% |
| Gemini 3 Flash (Preview) | 80% | $0.0006 | 1.8s | 20% |
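
| Model | Total | Simple | Simple (1-shot) | Simple (5-shot) | Multiple speakers | Unattributed dialogue |
| --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 (Reasoning) | 100% | 100% | 100% | 100% | 100% | 100% |
| Claude Opus 4.6 | 100% | 100% | 100% | 100% | 100% | 100% |
| Claude Sonnet 4.6 | 100% | 100% | 100% | 100% | 100% | 100% |
| Claude Sonnet 4 | 100% | 100% | 100% | 100% | 100% | 100% |
| Gemma 4 31B | 100% | 100% | 100% | 100% | 100% | 100% |
| Claude 3.7 Sonnet | 100% | 100% | 100% | 100% | 100% | 100% |
| Qwen3.6 Max Preview | 96% | 90% | 100% | 100% | 90% | 100% |
| GPT-4.1 | 96% | 90% | 100% | 100% | 90% | 100% |
| GPT-4o, May 13th (temp=0) | 96% | 100% | 100% | 100% | 80% | 100% |
| ByteDance Seed 1.6 | 94% | 90% | 100% | 100% | 80% | 100% |
| Gemma 4 31B (Reasoning) | 92% | 90% | 90% | 100% | 80% | 100% |
| Grok 4.20 (Reasoning) | 92% | 100% | 100% | 100% | 60% | 100% |
| GPT-4o, Aug. 6th (temp=0) | 92% | 100% | 100% | 100% | 60% | 100% |
| Claude Sonnet 4.6 (Reasoning) | 90% | 100% | 90% | 60% | 100% | 100% |
| GPT-4o, May 13th (temp=1) | 90% | 90% | 100% | 70% | 90% | 100% |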
Showing models 1–15 of 147 (page 1 of 10).
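
The page does not state how the Total column is derived. For the rows shown above, it is consistent with an unweighted mean of the five subtest scores, and the sketch below checks that for a few of them. The helper name `total_from_subtests` and the rounding to whole percentages are assumptions; the figures are copied from the table.

```python
from statistics import mean

# Subtest order: Simple, Simple (1-shot), Simple (5-shot),
# Multiple speakers, Unattributed dialogue
ROWS = {
    "GPT-4.1": (96, [90, 100, 100, 90, 100]),
    "ByteDance Seed 1.6": (94, [90, 100, 100, 80, 100]),
    "Grok 4.20 (Reasoning)": (92, [100, 100, 100, 60, 100]),
    "Claude Sonnet 4.6 (Reasoning)": (90, [100, 90, 60, 100, 100]),
}


def total_from_subtests(subtest_scores):
    """Unweighted mean of the subtest percentages, rounded to a whole percent."""
    return round(mean(subtest_scores))


for model, (listed_total, subtests) in ROWS.items():
    computed = total_from_subtests(subtests)
    status = "matches" if computed == listed_total else "differs from"
    print(f"{model}: computed {computed}% {status} listed {listed_total}%")
```

This only shows agreement on the visible rows; the remaining models in the full list of 147 may follow a different rule.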

Simple

Simple (1-shot)

Simple (5-shot)

Few-shot Rule Following

Multiple speakers

Unattributed dialogue