Run: 2026-02-16-01KHKAMC0113WRJGFKFJBTMAY5

Run Details

Model:: GPT-5.1
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 30.1s
Cost:: $0.023563
Provider:: OpenAI

Evaluation summary

Score	Evaluator	Details Link
100.00%	Matches word count	(details)
36.79%	Dialogue to Total Word Ratio	(details)
68.3940%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

"Molly: A children’s party, Bill. We’ve finally hit moral rock bottom."

"Bill: Nonsense. Kids have too much joy. We’re just redistributing birthday inequality."

"Molly: You’re stealing unicorn puzzles from toddlers."

"Bill: Unicorn puzzles build unrealistic expectations. I’m basically a life coach."

"Molly: Philosophical kleptomania. Lovely. What’s the plan?"

"Bill: We blend in as entertainment. I juggle, you do face painting, we liberate the presents."

"Molly: My artistic ability peaks at stick figures."

"Bill: Perfect. Nothing distracts parents like a tragically bad tiger on their child’s forehead."

"Molly: And while they’re consoling the weeping tiger, we swipe the loot?"

"Bill: Exactly. You swap gift bags with our decoy sack, I wheel the real one out."

"Molly: What if some kid guards the pile like a sugar‑crazed dragon?"

"Bill: Then you tell them there’s a piñata emergency."

"Molly: That’s not a thing."

"Bill: It is when you shout ‘The candy is endangered!’"

"Molly: I can’t decide if this is genius or a cry for help."

"Bill: Think of the resale value on twelve remote‑control cars."

"Molly: And the guilt?"

"Bill: Optional accessory. Sold separately."

"Molly: Fine. But I’m keeping any stuffed sloth."

"Bill: Deal. Even thieves have standards."

"Molly: Especially about sloths."

Evaluation details

Result	Evaluator	Details	Meta Data
100.00%	Matches word count	n/a	neededClean=false; words=200
36.79%	Dialogue to Total Word Ratio	Ratio: 100.00%, Deviation: 10.00%	neededClean=false; wordsTotal=200; wordsDialogue=200
68.3940%	Overall
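
The report does not state how the evaluators turn a deviation into a score, so the following is only a sketch of arithmetic that happens to reproduce the figures above. The ratio and deviation follow directly from the reported meta data (wordsDialogue, wordsTotal) and the 90% target in the prompt. The exponential decay with a 10-point scale and the plain average used for the overall score are assumptions, chosen because they match 36.79% and 68.3940%. All function names are hypothetical.

```python
import math


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of words that are dialogue, as a fraction between 0 and 1."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def deviation(ratio: float, target: float) -> float:
    """Absolute distance between the measured ratio and the requested one."""
    return abs(ratio - target)


def decay_score(dev: float, scale: float = 0.10) -> float:
    # ASSUMPTION: exponential decay with a 0.10 scale. The report does not
    # document the evaluator's formula; this one merely fits the numbers.
    return math.exp(-dev / scale)


# Values taken from the evaluation details above.
ratio = dialogue_ratio(words_dialogue=200, words_total=200)
dev = deviation(ratio, target=0.90)

word_count_score = 1.0            # "Matches word count" reported 100.00%
ratio_score = decay_score(dev)    # assumed formula, see above

# ASSUMPTION: the overall score is the unweighted mean of both evaluators.
overall = (word_count_score + ratio_score) / 2

print(f"Ratio: {ratio:.2%}, Deviation: {dev:.2%}")
print(f"Ratio score: {ratio_score:.2%}")
print(f"Overall: {overall:.4%}")
```

Under these assumptions the script prints a ratio of 100.00%, a deviation of 10.00%, a ratio score of 36.79%, and an overall score of 68.3940%. The ratio reaches 100% because every line of the response, speaker labels included, sits inside quotation marks, so overshooting the 90% target is penalized just as undershooting would be.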