Run 2026-02-16-01KHK97V9EPKQPZ6RA6DE77PYS

Run Details

Model:: GPT-5.1
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 50.0s
Cost:: $0.018693
Provider:: OpenAI

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

AI

"Molly, I can't believe we're robbing a six-year-old's birthday party."

"Calm down, Bill. Rich six-year-old. Those gift bags are basically ATMs with glitter."

"There's a unicorn bouncy castle. That's not security, that's a trap for my dignity."

"Dignity doesn't pay rent. Presents do. Now, schedule: clown shows at three, magician at three-thirty, we slip in during the balloon-animal chaos."

"Explain how we don't get spotted while hauling a sack of Paw Patrol merchandise out the door."

"Simple. We bring our own sack of decoy presents, swap, and vanish."

"So we're… reverse Santas."

"Exactly. Anti-Clauses."

"I didn't grow up dreaming I'd become the Grinch's unpaid intern."

"Consider it character development. Did you print the fake 'Adult Supervision Inspector' badges?"

"Yes, but they say 'Supversion.' The printer jammed and I panicked."

"Perfect. No one reads badges at a sugar high."

"Fine. What about the kids? They see everything. They're like tiny drunk detectives."

"That’s why we bring camouflage."

"You bought camouflage for a suburban birthday?"

Molly pulled out two oversized mascot heads from the duffel.

"We go in as backup entertainment: Princess Llama and Sir Dance-a-lot."

Bill stared.

"I quit."

"You dance, I loot."

"…I'm billing overtime."

"And I demand cake hazard."

Evaluation details

Result:: 93.6496%
Evaluator:: Matches word count
Details:: n/a
Meta Data:: neededClean false; words 209

Result:: 96.7652%
Evaluator:: Dialogue to Total Word Ratio
Details:: Ratio: 94.26%, Deviation: 4.26%
Meta Data:: neededClean false; wordsTotal 209; wordsDialogue 197

95.2074%
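
The ratio and deviation reported above follow from the metadata counts: 197 dialogue words out of 209 total is about 94.26%, which is 4.26 percentage points off the 90% target in the scenario. The sketch below reproduces that arithmetic in Python. It assumes that dialogue is whatever sits between double quotes and that words are whitespace-separated tokens. The evaluator's actual tokenizer and the formula behind the percentage scores are not shown in this run, so the function names here are hypothetical and the code does not try to reproduce the scores.

```python
import re

TARGET_RATIO = 0.90  # from the scenario: "90% dialogue"


def stats_from_counts(words_total: int, words_dialogue: int,
                      target: float = TARGET_RATIO) -> dict:
    """Compute ratio and deviation (both in percent) from raw word counts."""
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio_pct": round(ratio * 100, 2),
        "deviation_pct": round(abs(ratio - target) * 100, 2),
    }


def dialogue_stats(text: str, target: float = TARGET_RATIO) -> dict:
    """Count words inside double quotes versus all words in the text.

    Assumption: straight or curly double quotes delimit dialogue, and a
    word is any whitespace-separated token. The real evaluator may differ.
    """
    spoken = re.findall('["“]([^"”]*)["”]', text)
    words_dialogue = sum(len(span.split()) for span in spoken)
    words_total = len(text.split())
    return stats_from_counts(words_total, words_dialogue, target)


if __name__ == "__main__":
    # Counts taken from the Meta Data of this run.
    print(stats_from_counts(209, 197))
```

Called with the counts from this run, `stats_from_counts(209, 197)` gives a ratio of 94.26 and a deviation of 4.26, matching the Details field. `dialogue_stats` applies the same calculation to raw text, under the quoting assumption stated above.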
