Run: 2026-04-24-01KQ03D1J2FJQEQ4NDBVW5JRAQ

Run Details

Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 6.8s
Cost: $0.001346
Provider: Amazon Bedrock

Evaluation summary

Score	Evaluator
99.38%	Matches word count
97.09%	Dialogue to Total Word Ratio
98.2323%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

Molly and Bill crouch behind the punch bowl, scheming.  
Molly: "Okay, Bill, the birthday cake is the perfect distraction."  
Bill: "Got it, I’ll juggle the clown balloons while you slip in."  
Molly: "Sneak past the piñata, grab the gift table, then vanish."  
Bill: "What about the kids? They’ll scream."  
Molly: "We’ll wear clown masks, they’ll think it’s part of the show."  
Bill: "Brilliant, and the sugar rush will keep them busy."  
Molly: "Timing is key—mid‑song, when the magician says ‘Abracadabra.’"  
Bill: "I’ll set off confetti cannons, then we’ll tip‑toe to the presents."  
Molly: "Don’t forget the Mom’s handbag—her credit cards are the real loot."  
Bill: "Got it, I’ll pocket those before the cake is cut."  
Molly: "Remember, no glitter left behind—clean up is optional."  
Bill: "If we’re caught, we’ll blame the clown."  
Molly: "Perfect, the kids love clowns, the parents fear them."  
Bill: "What’s our exit strategy?"  
Molly: "Slide down the slide, then dash through the backyard gate."  
Bill: "And the getaway car is parked at the neighbor’s birthday cake stand."  
Molly: "Great, I’ll bring the sack, you bring the grin."  
Bill: "Ready when you are, partner in crime."  
Molly: "On three—one, two, three—let's raid the party!"  
Bill: "Heist time!"

Evaluation details

Result	Evaluator	Details	Meta Data
99.38%	Matches word count	n/a	neededClean: false; words: 205
97.09%	Dialogue to Total Word Ratio	Ratio: 85.85%, Deviation: 4.15%	neededClean: false; wordsTotal: 205; wordsDialogue: 176
98.2323%	Overall
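The ratio figures follow directly from the metadata. 176 dialogue words out of 205 total is about 85.85%, which is 4.15 percentage points below the 90% target. The sketch below shows one way such counts could be produced. It is a hypothetical illustration, not the evaluator's code. The function names `count_words` and `dialogue_stats`, the rule that words break on whitespace and on em dashes (U+2014), and the rule that dialogue is any text between straight double quotes are all assumptions.

```python
import re

# Assumption: a "word" is a run of characters separated by whitespace or an
# em dash (U+2014); the evaluator's real tokenizer is not documented in the report.
WORD_BREAK = re.compile(r"[\s\u2014]+")
# Assumption: dialogue is any text enclosed in a pair of straight double quotes.
QUOTED = re.compile(r'"([^"]*)"')


def count_words(text: str) -> int:
    """Count non-empty tokens after splitting on whitespace and em dashes."""
    return sum(1 for token in WORD_BREAK.split(text) if token)


def dialogue_stats(text: str, target_ratio: float = 0.90) -> dict:
    """Return total words, quoted words, their ratio, and distance from the target."""
    words_total = count_words(text)
    words_dialogue = sum(count_words(span) for span in QUOTED.findall(text))
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(target_ratio - ratio),
    }


if __name__ == "__main__":
    # Tiny hypothetical sample: 10 words in total, 6 of them inside quotes.
    sample = 'Molly grins.\nMolly: "Timing is key\u2014mid-song."\nBill: "Heist time!"'
    print(dialogue_stats(sample))
    # Plugging in the recorded metadata reproduces the report's Details column.
    ratio = 176 / 205
    print(f"Ratio: {ratio:.2%}, Deviation: {abs(0.90 - ratio):.2%}")
```

By a hand count, applying `dialogue_stats` to the model response above gives 205 total words and 176 quoted words, matching the recorded metadata. That agreement depends on treating em dashes as word breaks: splitting on whitespace alone gives 200 total and 171 quoted words. The match is suggestive, not proof of how the evaluator tokenizes.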