Run: 2026-05-01-01KQHR26AYZEES3D3FCJ66FCSW

Run Details

Model:: Grok 4.20 (Reasoning)
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 1m 24s
Cost:: $0.020535
Provider:: xAI

Evaluation summary

Score	Evaluator	Details Link
27.17%	Matches word count	(details)
85.02%	Dialogue to Total Word Ratio	(details)
56.0928%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

Bill scratched his chin. "So Molly, how exactly are we gonna pull off breaking into this children's party to steal all the presents? The place will be crawling with hyper kids and suspicious moms."

"Easy Bill," Molly replied. "We arrive disguised as the pony ride operators. While the little darlings are riding, we sack up the gifts from the table. Use the bouncy castle for our getaway. It's brilliant because no one suspects the entertainment of being thieves."

"But what about the birthday boy? Won't he be devastated without his presents? And if we get caught we'll never live down being the jerks who robbed a kids party," Bill said.

"Bill stop worrying. Kids get presents all the time. This is our ticket to the big time. We'll be laughing about this heist for years. During musical chairs that's when we strike. I'll signal you with a sneeze and you knock over the cake as distraction. Then we grab everything and run to the van parked behind the clown's car. Trust me it's foolproof."

"Okay Molly you convinced me. But next time can we steal from adults? This feels too evil. Let's go over the timing again so we don't mess up."

"Deal. We hit at three when the ice cream is served. Now let's practice our pony voices."

Evaluation details

Result	Evaluator	Details	Meta Data
27.17%	Matches word count	n/a	neededClean: false, words: 219
85.02%	Dialogue to Total Word Ratio	Ratio: 96.35%, Deviation: 6.35%	neededClean: false, wordsTotal: 219, wordsDialogue: 211
56.0928%	Overall
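
The reported ratio and deviation follow directly from the metadata counts: 211 dialogue words out of 219 total is 96.35%, which is 6.35 points away from the 90% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way to obtain such counts from a response. The function names, the whitespace tokenization, and the rule that dialogue is whatever sits between double quotes are all assumptions made for illustration; the evaluator's actual implementation is not shown in this report, and the mapping from deviation to the 85.02% score is not reproduced here.

```python
import re


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target_ratio: float = 0.90) -> tuple[float, float]:
    """Return (ratio, absolute deviation from target) as fractions."""
    ratio = words_dialogue / words_total if words_total else 0.0
    return ratio, abs(ratio - target_ratio)


def count_words(text: str) -> tuple[int, int]:
    """Return (words_total, words_dialogue) for a piece of prose.

    Assumption: a word is a whitespace-separated token, and dialogue is
    any text enclosed in double quotes (straight or curly).
    """
    # Treat curly quotes the same as straight quotes.
    normalised = text.replace("\u201c", '"').replace("\u201d", '"')
    quoted_spans = re.findall(r'"([^"]*)"', normalised)
    words_total = len(normalised.split())
    words_dialogue = sum(len(span.split()) for span in quoted_spans)
    return words_total, words_dialogue


# Counts reported in the metadata for this run.
ratio, deviation = ratio_and_deviation(words_dialogue=211, words_total=219)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

With the reported counts this prints the same ratio and deviation as the Details column. The 8 words that fall outside the quotes match the three narration fragments in the response ("Bill scratched his chin.", "Molly replied.", "Bill said.").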