Run: 2026-05-01-01KQHSKPG2THHKQ4EW1271Z5X1

Run Details

Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 6.7s
Cost: $0.000908
Provider: xAI

Evaluation summary

Score	Evaluator	Details Link
27.17%	Matches word count	(details)
36.79%	Dialogue to Total Word Ratio	(details)
31.9769%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

“Bill, you’re sure this is the one? A kiddie party? We’re stealing from toddlers now?”

“Toddlers with rich parents, Molly. Those presents are stacked like pirate treasure. We hit it during musical chairs, in and out.”

“I draw the line at wearing the clown suit. I’m not painting my face.”

“You’re not the clown, I’m the clown. You’re the balloon artist. Nobody suspects balloon artists. We slip in, I do the funny walk, you palm the gift bags while they’re distracted by my balloon poodle.”

“Your balloon poodle looks like a sausage with legs. Last time a kid cried.”

“Exactly. Tears create chaos. Chaos is our friend. While they’re consoling little Timmy, we vacuum up the loot. Think of it as early redistribution of wealth.”

“I’m thinking more like we’re going to hell, Bill. What’s next, nicking teeth from under pillows?”

“Don’t tempt me. Focus. You enter from the garden pretending to deliver the bouncy castle. I burst out of the cake yelling ‘It’s magic time!’ Then we both yell ‘Pin the tail on the donkey’ and start stuffing sacks.”

“You’ve actually practiced yelling ‘It’s magic time,’ haven’t you?”

“Every morning in the shower. You in or not?”

“Fine. But if I end up twisting balloon animals while you moonwalk in size-twenty shoes, I’m keeping the biggest teddy bear.”

Evaluation details

Result	Evaluator	Details	Meta Data
27.17%	Matches word count	n/a	neededClean: false; words: 219
36.79%	Dialogue to Total Word Ratio	Ratio: 100.00%, Deviation: 10.00%	neededClean: false; wordsTotal: 220; wordsDialogue: 220
31.9769%	Overall
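
The ratio and deviation reported for the dialogue evaluator can be reproduced in spirit with a few lines of Python. This is only a sketch, not the evaluator's actual implementation: it assumes dialogue is any text between curly double quotes, that words are whitespace-separated tokens, and that the deviation is the absolute gap between the measured ratio and the 90% target from the scenario. The function name `dialogue_ratio` and the `target` parameter are hypothetical; the `wordsTotal` and `wordsDialogue` keys mirror the meta data fields above.

```python
import re


def dialogue_ratio(text: str, target: float = 0.90) -> dict:
    """Estimate the share of words that sit inside curly double quotes.

    Assumptions (not confirmed by the report): words are whitespace-separated
    tokens, dialogue is everything between an opening and a closing curly
    double quote, and deviation is the absolute distance from `target`.
    """
    words_total = len(text.split())

    # Collect every span between an opening and a closing curly double quote.
    quoted_spans = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(span.split()) for span in quoted_spans)

    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


if __name__ == "__main__":
    # Made-up sample line for illustration; it is not part of the model response.
    sample = "“You in or not?” Bill asked. “Fine.”"
    result = dialogue_ratio(sample)
    print(result["wordsDialogue"], "of", result["wordsTotal"], "words are dialogue")
```

Under these assumptions, a response made up entirely of quoted speech gives a ratio of 100% and a deviation of 10 percentage points from the 90% target, which matches the figures in the Details column. How that deviation is then mapped to the 36.79% score is not stated in the report, so the sketch stops at the deviation.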