Run: 2026-02-20-01KHY8RZ464VB5CZEBHXS9TQ2R

Run Details

Model: Grok 4
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 17.6s
Cost: $0.007925
Provider: xAI

Evaluation summary

Score	Evaluator
51.93%	Matches word count
73.72%	Dialogue to Total Word Ratio
62.8206%	Overall
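
The report does not state how the Overall figure is derived. A minimal sketch, assuming it is the unweighted mean of the two evaluator scores: the displayed scores are rounded to two decimals, so the check below only tests whether 62.8206% falls inside the range that such a mean could take. The variable names are illustrative and not taken from the evaluation tool.

```python
# Evaluator scores as displayed in the summary (rounded to two decimals).
displayed_scores = [51.93, 73.72]
reported_overall = 62.8206

# Assumption: Overall is the unweighted mean of the evaluator scores.
mean_of_displayed = sum(displayed_scores) / len(displayed_scores)

# Each displayed score may differ from its true value by up to 0.005,
# so the true mean lies somewhere in this interval.
half_step = 0.005
lowest_mean = sum(s - half_step for s in displayed_scores) / len(displayed_scores)
highest_mean = sum(s + half_step for s in displayed_scores) / len(displayed_scores)

is_consistent = lowest_mean <= reported_overall <= highest_mean

print(f"Mean of displayed scores: {mean_of_displayed:.4f}%")
print(f"Reported overall within rounding range: {is_consistent}")
```

The reported value sits inside the rounding range, which is compatible with a simple average but does not prove that the tool computes it that way.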

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

Molly paced the dingy motel room, her eyes gleaming with mischief, while Bill hunched over a crumpled blueprint of the suburban McMansion.

"Alright, Bill, listen up," Molly said, jabbing a finger at the map. "The party's at noon tomorrow—little Timmy's turning eight. Balloons, cake, the works. Our target: that pile of presents under the bouncy castle."

Bill scratched his beard. "Presents? For kids? Molly, that's low even for us. What if we get caught by a pack of sugar-rushed munchkins?"

"Low? It's genius! We'll dress as clowns—big shoes, red noses, the whole shebang. You distract with balloon animals, I'll snatch the gifts into our giant clown pants."

"Clown pants? Those things could hide a pony! But what about the parents? Soccer moms with pepper spray?"

"Easy—I'll spike the punch with laxatives. They'll be too busy in the bathroom to notice. We zip-line out the window with the loot. Imagine the haul: bikes, toys, maybe a PlayStation!"

Bill chuckled. "You're insane. Remember the zoo heist? We almost got mauled by penguins."

"This is foolproof! In and out in ten minutes. We'll fence the stuff at the black market toy exchange. Come on, partner—let's make some kids cry... for comedy's sake."

Bill sighed. "Fine, but if a toddler bites me, you're on diaper duty."

Evaluation details

Result	Evaluator	Details	Meta Data
51.93%	Matches word count	n/a	neededClean: false; words: 216
73.72%	Dialogue to Total Word Ratio	Ratio: 82.57%, Deviation: 7.43%	neededClean: false; wordsTotal: 218; wordsDialogue: 180
62.8206%	Overall
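
The Ratio and Deviation figures can be reproduced from the metadata counts. The sketch below assumes the ratio is wordsDialogue divided by wordsTotal and that the deviation is the absolute distance from the 90% target named in the scenario. The function names are hypothetical, and the mapping from deviation to the 73.72% score is not stated in the report, so it is not modelled here.

```python
def dialogue_ratio_pct(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of all words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_pct(ratio_pct: float, target_pct: float = 90.0) -> float:
    """Return the absolute distance between the ratio and the target.

    The 90% default comes from the scenario; treating the deviation as an
    absolute difference is an assumption.
    """
    return abs(target_pct - ratio_pct)


# Counts taken from the evaluator metadata.
ratio = dialogue_ratio_pct(words_dialogue=180, words_total=218)
deviation = deviation_pct(ratio)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 82.57%, Deviation: 7.43%"
```

The two evaluators report different totals (216 and 218 words), which suggests they tokenise the response differently. The report does not say how.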