Run: 2026-04-27-01KQ777SZV6RB3P89BH96HTSY5

Run Details

Model:: Qwen 3.6 Flash
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 27.1s
Cost:: $0.009231
Provider:: Alibaba

Evaluation summary

Score	Evaluator	Details Link
86.38%	Matches word count	(details)
64.59%	Dialogue to Total Word Ratio	(details)
75.4868%	Overall
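
The overall figure is consistent with an unweighted mean of the two evaluator scores. This is an inference from the numbers shown, not a documented formula: the mean of the displayed (rounded) scores is 75.485%, and the small gap to 75.4868% would be explained if the mean were taken over unrounded scores. A minimal sketch of that check, where the dictionary and variable names are illustrative only:

```python
# Scores as displayed in the evaluation summary (already rounded to 2 decimals).
scores = {
    "Matches word count": 86.38,
    "Dialogue to Total Word Ratio": 64.59,
}

# Assumption: the overall score is the unweighted mean of the evaluator scores.
overall_from_rounded = sum(scores.values()) / len(scores)

reported_overall = 75.4868

# Each displayed score can be off by up to 0.005 due to rounding,
# so their mean can differ from the reported value by up to 0.005.
gap = abs(overall_from_rounded - reported_overall)
print(f"mean of displayed scores: {overall_from_rounded:.4f}%, gap: {gap:.4f}")
```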

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

"Okay, the target is Timothy's seventh birthday at Jump 'n' Bounce," Molly whispered. "How many guards?" Bill hissed. "Three. A bored uncle guarding the cookie table and a ferocious toddler named Brayden with a lightsaber." "Brayden? Last time he ate my mask and broke my ankle." "We need a distraction. I'll release the rubber chicken." "Too obvious. Use the piñata strategy." "Did you pack the glitter bombs? We can blind the parents." "Glitter is illegal in six zones. Pack the whoopee smoke." "The map says the presents are under the inflatable dinosaur. What if Brayden naps under it?" "We wake Brayden. Chaos covers the loot." "And the loot? All of it?" "Every wrapped box. I've got the wheelbarrow loaded with anti-gravity tape." "Anti-gravity? Molly, last time you used anti-gravity tape, we floated into the ceiling fan." "That was the cheap brand. This is industrial strength." "I've got the laser pointer mapped to the wrapping paper stash. But Molly, what if they're gluten-free? We can't steal non-vegan gifts." "Bill, focus! We want the Lego castle and the Switchable Power." "What if Brayden is awake?" "Then you're on crowd control with squeaky shoes." "Ready when you are, partner." "Don't call me partner. Call me the one who steals the biggest bag of loot."

Evaluation details

Result	Evaluator	Details	Meta Data
86.38%	Matches word count	n/a	neededClean: false, words: 211
64.59%	Dialogue to Total Word Ratio	Ratio: 98.13%, Deviation: 8.13%	neededClean: false, wordsTotal: 214, wordsDialogue: 210
75.4868%	Overall
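
The reported "Ratio: 98.13%, Deviation: 8.13%" can be reproduced from the meta data: 210 dialogue words out of 214 total, compared against the 90% target in the scenario. The sketch below shows that arithmetic, plus one plausible way to count dialogue words (text inside straight double quotes). The function names are hypothetical, the evaluator's actual tokenizer is not shown in this report, and whether the deviation is signed or absolute is an assumption. How the deviation maps to the 64.59% score is not stated, so it is not modeled here.

```python
import re

TARGET_RATIO = 0.90  # from the scenario: "Write 200 words with 90% dialogue"


def dialogue_stats(words_dialogue: int, words_total: int, target: float = TARGET_RATIO):
    """Return (ratio, deviation) of dialogue words against the target ratio."""
    ratio = words_dialogue / words_total
    # Assumption: deviation is the absolute distance from the target.
    deviation = abs(ratio - target)
    return ratio, deviation


def count_dialogue_words(text: str):
    """Count whitespace-separated words inside straight double quotes, and in total.

    Assumption: dialogue is anything between a pair of '"' characters.
    """
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    total = len(text.split())
    return dialogue, total


# Reproduce the reported figures from the meta data counts.
ratio, deviation = dialogue_stats(words_dialogue=210, words_total=214)
summary = f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}"
print(summary)

# Small usage example for the counter on two lines in the style of the response.
sample = '"How many guards?" Bill hissed. "Three."'
sample_counts = count_dialogue_words(sample)
print(sample_counts)
```

On the counts above, only the four words of attribution ("Molly whispered" and "Bill hissed") fall outside quotation marks, which is why the response overshoots the 90% target.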