Run: 2026-04-24-01KQ0296BCH170S43Q2CQEYFG1

Run Details

Model:: GPT-OSS 120B
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 8.9s
Cost:: $0.001641
Provider:: Amazon Bedrock

Evaluation summary

Score	Evaluator	Details Link
100.00%	Matches word count	(details)
98.39%	Dialogue to Total Word Ratio	(details)
99.1958%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

The dim garage smelled of pizza and old plans; Molly and Bill crouched over a crumpled invitation.  
Molly: “We slip through the pet door, same as before, kids distracted by the clown.”  
Bill: “Clown's pins cover us; they’ll think the noise is part of the act.”  
Molly: “I’ll disable the lights with the portable dimmer, then we grab the gift table before the piñata explodes.”  
Bill: “What about the security camera? I brought the signal jammer from the last job.”  
Molly: “Plug it into the power strip behind the cake stand, and the footage will loop a five‑minute clip of balloons.”  
Bill: “Got the zip‑ties for the presents, right? We’ll bag them fast, no torn wrapping.”  
Molly: “Remember, we leave the candy untouched; kids will think their treats are a miracle and not suspect a thing.”  
Bill: “All right, on the count of three, we bust in, grab the loot, and disappear before the parent‑teacher meeting starts.”  
Bill: “When the clown tosses the last balloon, I’ll swing the rope, slide under the table, and snatch the biggest gift box marked ‘Superhero’ before anyone notices.”  
Molly: “If any kid cries, we’ll blame the overdue nap time and slip out quietly.”

Evaluation details

Result	Evaluator	Details	Meta Data
100.00%	Matches word count	n/a	neededClean: false; words: 199
98.39%	Dialogue to Total Word Ratio	Ratio: 86.43%, Deviation: 3.57%	neededClean: false; wordsTotal: 199; wordsDialogue: 172
99.1958%	Overall
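
The ratio and deviation reported for the dialogue evaluator can be reproduced from the recorded word counts. The sketch below is illustrative only. The names `dialogue_stats` and `count_dialogue_words` are hypothetical, and the rule of counting whitespace-separated words inside curly double quotes is an assumption, not the evaluator's documented method. The 90% target is taken from the scenario.

```python
import re

# Target share of dialogue, taken from the scenario ("90% dialogue").
TARGET_RATIO = 0.90


def dialogue_stats(words_dialogue: int, words_total: int,
                   target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return (ratio, absolute deviation from the target ratio)."""
    ratio = words_dialogue / words_total
    deviation = abs(target - ratio)
    return ratio, deviation


def count_dialogue_words(text: str) -> int:
    """Count words inside curly double quotes.

    Assumption: a "dialogue word" is any whitespace-separated token
    between an opening and a closing curly double quote. Speaker labels
    such as "Molly:" fall outside the quotes and are not counted.
    """
    spans = re.findall(r"“([^”]*)”", text)
    return sum(len(span.split()) for span in spans)


# Word counts as recorded in the evaluator metadata.
ratio, deviation = dialogue_stats(words_dialogue=172, words_total=199)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 86.43%, Deviation: 3.57%"
```

The report does not state how the 98.39% score is derived from the deviation, or how the overall figure is computed, so neither is modeled here. The reported overall of 99.1958% is close to the plain mean of the two evaluator scores (99.195% from the rounded values), which suggests an unweighted average, but that is an inference rather than a documented rule.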