Run: 2026-04-24-01KQ0NQ2RGHKFNA8NMHMNZ6DPH

Run Details

Model:: GPT-5.5
Test:: Dialogue tags
Scenario:: Write 200 words with 10% dialogue
Duration:: 7.5s
Cost:: $0.008160
Provider:: OpenAI

Evaluation summary

Score	Evaluator
99.38%	Matches word count
91.94%	Dialogue to Total Word Ratio
95.6570%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

Molly crouched behind the hydrangeas, wearing a caterpillar costume that smelled faintly of attic and regret. Across the street, the Henderson house glowed with balloons, streamers, and the unmistakable shriek of sugared children. Bill unfolded a blueprint he had drawn on the back of a pizza menu, then held it upside down with professional confidence.

Inside, a magician produced scarves from his sleeve while parents applauded with the dead eyes of the defeated. Near the fireplace, a mountain of presents shimmered in dinosaur paper and smug ribbons.

“Cake first, loot second,” Molly whispered.

Bill nodded gravely. “No, presents first. Cake distracts.”

A golden retriever in a party hat spotted them and began wagging like a security alarm made of joy. Molly tossed it a sausage roll. The dog accepted the bribe and joined their criminal enterprise.

Their plan was simple: enter as entertainers, perform one balloon animal, create confusion, evacuate gifts in a laundry cart labeled Extremely Boring Towels. Unfortunately, Bill’s balloon animal was a sword, his only trick, and the birthday boy immediately challenged him to a duel.

Molly sighed, adjusted her antennae, and rolled the cart toward destiny, frosting, and probable community service.

Evaluation details

Result	Evaluator	Details	Meta Data
99.38%	Matches word count	n/a	neededClean: false; words: 195
91.94%	Dialogue to Total Word Ratio	Ratio: 4.62%, Deviation: 5.38%	neededClean: false; wordsTotal: 195; wordsDialogue: 9
95.6570%	Overall
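The dialogue evaluator's Details column comes down to simple arithmetic. 9 dialogue words out of 195 total gives 9 / 195 ≈ 4.62%, which is 5.38 percentage points short of the 10% target. The following is a minimal sketch that reproduces that calculation, and it rests on a few assumptions. The function names are hypothetical. Words are treated as whitespace-separated tokens. Dialogue is taken to be any text between curly double quotes (“ and ”). The report does not document the evaluator's real tokenizer. The Overall row also appears to be the unweighted mean of the two evaluator scores, (99.38 + 91.94) / 2 = 95.66, but that is an inference rather than a stated rule. The report does not say how the 91.94% score is derived from the deviation, so the sketch stops at the deviation.

```python
import re

# Assumption: dialogue is any text between curly double quotes “ and ”.
# The real evaluator's rule is not documented; this regex is illustrative.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed tokenization)."""
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int, float]:
    """Return (total words, dialogue words, dialogue ratio in percent)."""
    total = count_words(text)
    dialogue = sum(count_words(span) for span in DIALOGUE_RE.findall(text))
    ratio_pct = 100.0 * dialogue / total if total else 0.0
    return total, dialogue, ratio_pct


def deviation_pct(ratio_pct: float, target_pct: float = 10.0) -> float:
    """Absolute distance, in percentage points, from the target ratio."""
    return abs(target_pct - ratio_pct)


def overall_score(scores: list[float]) -> float:
    """Hypothetical: the Overall row as an unweighted mean of evaluator scores."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Reported counts from the run above: 9 dialogue words out of 195.
    ratio = 100.0 * 9 / 195
    print(f"Ratio: {ratio:.2f}%, Deviation: {deviation_pct(ratio):.2f}%")
    print(f"Overall: {overall_score([99.38, 91.94]):.2f}%")
```

Running it prints `Ratio: 4.62%, Deviation: 5.38%` and `Overall: 95.66%`, which matches the reported figures. When `dialogue_stats` is applied to the model response above, this assumed tokenization also yields the 195 total words and 9 dialogue words that the evaluator recorded.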