Run: 2026-04-24-01KQ0NWV6DJD8KYB0PATDCQ4CQ

Run Details

Model: GPT-5.5 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 11.9s
Cost: $0.013200
Provider: OpenAI

Evaluation summary

Score	Evaluator
99.92%	Matches word count
95.76%	Dialogue to Total Word Ratio
97.8381%	Overall
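
The report does not say how the overall score is aggregated. The displayed numbers are consistent with an unweighted mean of the two evaluator scores, and the short Python sketch below checks that reading. The mean is an assumption, not something the report states, and the inputs are the rounded values shown in the summary, so the comparison uses a tolerance.

```python
# Scores as displayed in the evaluation summary (already rounded to two decimals).
displayed_scores = {
    "Matches word count": 99.92,
    "Dialogue to Total Word Ratio": 95.76,
}
reported_overall = 97.8381

# Assumption: the overall score is the unweighted mean of the evaluator scores.
mean_of_displayed = sum(displayed_scores.values()) / len(displayed_scores)

# The inputs are rounded, so compare with a tolerance instead of testing equality.
difference = abs(mean_of_displayed - reported_overall)
print(f"mean of displayed scores: {mean_of_displayed:.4f}")
print(f"difference from reported overall: {difference:.4f}")
```

The small remaining gap is within what rounding each evaluator score to two decimals would produce, so it neither confirms nor rules out a different weighting.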

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

Molly unfolded a crayon map on the café table.

“Behold, Bill: Operation Birthday Bonanza.”

“Why is the house drawn as a cupcake?”

“Because it’s a themed party. Stay focused.”

“I am focused. I’m focused on the moral weight of robbing six-year-olds.”

“Seven-year-olds. Practically adults.”

“Last week you cried when a pigeon looked disappointed.”

“That pigeon knew what it did.”

“So the plan is: we stroll in, mingle, and steal presents?”

“Liberate presents.”

“From children.”

“From capitalism.”

“Wrapped in dinosaur paper.”

Molly tapped the map.

“We enter during musical chairs.”

“I hate musical chairs. Too much tension.”

“You’ll create a distraction.”

“With what?”

“Your balloon-animal skills.”

“I can make a snake.”

“Everyone can make a snake.”

“Mine has emotional range.”

A waiter passed. Bill lowered his voice.

“What about the birthday girl?”

“We avoid eye contact.”

“She’ll have cake on her face and trust in her heart.”

“Fine. We only take suspiciously large boxes.”

“Those are usually bicycles.”

“Exactly. Evidence of excess.”

Bill sighed.

“What if the children cry?”

Molly paused.

“We return everything and become magicians.”

“Can we start there instead?”

She looked at the crayon map.

“Bill, you’re ruining my criminal arc.”

“You drew glitter on the getaway van.”

“It’s called branding.”

Evaluation details

Result	Evaluator	Details	Meta Data
99.92%	Matches word count	n/a	neededClean: false, words: 203
95.76%	Dialogue to Total Word Ratio	Ratio: 85.44%, Deviation: 4.56%	neededClean: false, wordsTotal: 206, wordsDialogue: 176
97.8381%	Overall
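
The ratio and deviation reported for the dialogue evaluator can be reproduced from its metadata: 176 dialogue words out of 206 total, compared against the 90% target in the scenario. The Python sketch below shows that arithmetic, plus one possible way to count dialogue words. The helper names, the whitespace tokenization, and the rule that dialogue is whatever sits between curly double quotes are all assumptions; the evaluator's real counting rules are not given in the report, and the two evaluators report different totals (203 and 206), which suggests they do not tokenize identically. How the 95.76% score is derived from the deviation is also not stated, so it is not reproduced here.

```python
import re

TARGET_RATIO = 0.90  # from the scenario: 90% dialogue


def count_words(text: str) -> tuple[int, int]:
    """Return (total_words, dialogue_words) using whitespace splitting.

    Assumption: dialogue is any text between curly double quotes.
    """
    total = len(text.split())
    spoken = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(segment.split()) for segment in spoken)
    return total, dialogue


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return the dialogue share and its absolute distance from the target."""
    ratio = dialogue_words / total_words
    return ratio, abs(target - ratio)


# Values taken from the Meta Data column of the dialogue evaluator.
ratio, deviation = ratio_and_deviation(176, 206)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 85.44%, Deviation: 4.56%"
```

As a usage example, `count_words` applied to a two-line excerpt such as `Molly tapped the map.` followed by `“We enter during musical chairs.”` returns `(9, 5)`: nine words in total, five of them inside quotes.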