Run: 2026-05-01-01KQHSH13D1BBHFE5Y10CZ86BF

Run Details

Model:: Grok 4.20 (Reasoning)
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 1m 27s
Cost:: $0.016853
Provider:: xAI

Evaluation summary

Score	Evaluator	Details Link
90.48%	Matches word count	(details)
92.75%	Dialogue to Total Word Ratio	(details)
91.6159%	Overall
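
The overall figure appears to be the unweighted mean of the two evaluator scores. The report does not state this; it is inferred from the numbers shown. A minimal Python sketch of that check, using the rounded scores from the table (the function name `overall_score` is hypothetical):

```python
def overall_score(scores):
    """Return the unweighted mean of evaluator scores.

    Assumption: the report aggregates by simple averaging; this is
    inferred from the displayed numbers, not documented in the report.
    """
    return sum(scores) / len(scores)


# Rounded scores as displayed in the summary table.
rounded_scores = [90.48, 92.75]
print(f"{overall_score(rounded_scores):.3f}%")
```

Averaging the rounded scores gives 91.615%. The small difference from the reported 91.6159% would be consistent with the report averaging unrounded scores.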

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

"Bill, this plan is insane but tell me again how we're going to break into that children's party and steal all the presents without getting caught," said Molly. "There will be parents and vicious kids everywhere."

"Easy Molly, we arrive as surprise entertainers, a pair of bumbling magicians," replied Bill. "While I do card tricks you load the bags and if anyone asks it's all part of the show. We fake it like always. The kids won't know the difference and the adults will be too busy to notice. Plus we bring our own cake with a file in it just in case."

"A file in the cake? That's for prison breaks not birthday parties you moron," said Molly. "This is the worst idea since we tried to steal the Easter bunny. Why are we doing this?"

"Imagine the loot Molly all those new bikes computers and stuffed animals we can sell for top dollar. It's perfect," said Bill. "We'll be in and out before the cake is cut. No one will be the wiser. Ready partner?"

"Ready as I'll ever be for this madness but if a kid cries I'm out. Let's review the escape route before we put on those polka dot suits and ridiculous noses," said Molly.

Evaluation details

Result	Evaluator	Details	Meta Data
90.48%	Matches word count	n/a	neededClean: false, words: 210
92.75%	Dialogue to Total Word Ratio	Ratio: 95.24%, Deviation: 5.24%	neededClean: false, wordsTotal: 210, wordsDialogue: 200
91.6159%	Overall
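
The Details and Meta Data values for the dialogue evaluator can be cross-checked with simple arithmetic. The sketch below assumes the target ratio is the 90% named in the scenario and that the deviation is the absolute difference from that target. It does not reproduce the 92.75% score, because the report does not show how the deviation maps to a score. The naive counter treats any text between straight double quotes as dialogue, which is an assumption about the evaluator's method. All function names are hypothetical.

```python
import re


def count_words(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_dialogue_words(text):
    """Count words between straight double quotes.

    Assumption: dialogue is exactly the quoted text; the real
    evaluator may tokenize or detect dialogue differently.
    """
    quoted_spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in quoted_spans)


def ratio_and_deviation(words_dialogue, words_total, target=0.90):
    """Return (dialogue ratio, absolute deviation from the target ratio)."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Values taken from the Meta Data column of the report.
ratio, deviation = ratio_and_deviation(words_dialogue=200, words_total=210)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Small illustration of the quote-based counter on one tagged line.
sample = '"Ready partner?" said Bill.'
print(count_dialogue_words(sample), "of", count_words(sample), "words are dialogue")
```

Applied to the model response, a quote-based count leaves only the five two-word dialogue tags (10 words) outside the quotes, which matches the 10-word gap between wordsTotal and wordsDialogue.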