Run: 2026-04-24-01KQ0AM3PJTSRPNYW5CM8M4TCK

Run Details

Model:: GPT-OSS 120B
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 2m 23s
Cost:: $0.000875
Provider:: Novita

Evaluation summary

Score	Evaluator
99.38%	Matches word count
97.09%	Dialogue to Total Word Ratio
98.2323%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

Molly crouches behind the inflatable castle, Bill whispers from the snack table.  
Molly: "Okay, the party starts in fifteen, kids will be glued to the clown, perfect distraction."  
Bill: "Yeah, the clown's juggling knives—big mistake, they'll be too busy watching those."  
Molly: "We need the back door, the one that leads straight to the gift table."  
Bill: "Got it. I scoped the hallway; the sensor alarm is set for thirty seconds after motion."  
Molly: "Thirty seconds? We can sprint in, grab everything, and be out before the kids notice."  
Bill: "Our exit plan includes the puppy tunnel, you know, the one with the fake bone."  
Molly: "Brilliant. I’ll disable the lights with this tiny flashlight, you handle the door."  
Bill: "Fine, but watch the confetti cannon—once it fires, everyone will be covered in glitter."  
Molly: "That’s actually our advantage; they'll be too busy cleaning to notice missing presents."  
Bill: "I'll grab the candy stash as a backup if the gifts are locked."  
Molly: "Lock? Those gift boxes are just cardboard. Easy."  
Bill: "Alright, on my count—three, two, one—go!"  
Molly: "Leave the balloon arch; kids love it."  
Bill: "Perfect. Let’s make it unforgettable."  
Bill: "We vanish before the cake cutting."  
Lights dim.

Evaluation details

Result	Evaluator	Details	Meta Data
99.38%	Matches word count	n/a	neededClean: false, words: 205
97.09%	Dialogue to Total Word Ratio	Ratio: 85.85%, Deviation: 4.15%	neededClean: false, wordsTotal: 205, wordsDialogue: 176
98.2323%	Overall
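The dialogue-ratio details can be re-derived from the reported wordsDialogue and wordsTotal counts. Below is a minimal Python sketch. It assumes the ratio is wordsDialogue / wordsTotal and the deviation is the absolute gap to the 90% target from the scenario. The function name `dialogue_ratio` is hypothetical. This report does not show how the evaluator turns the deviation into its 97.09% score, so the sketch stops at the ratio and the deviation.

```python
# Hypothetical helper: recompute the dialogue-ratio figures from the reported
# word counts. Assumes ratio = dialogue words / total words, and
# deviation = |target - ratio|, both expressed in percentage points.


def dialogue_ratio(words_dialogue: int, words_total: int,
                   target_pct: float = 90.0) -> tuple[float, float]:
    """Return (ratio_pct, deviation_pct), each rounded to two decimals."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio_pct = 100.0 * words_dialogue / words_total
    deviation_pct = abs(target_pct - ratio_pct)
    return round(ratio_pct, 2), round(deviation_pct, 2)


if __name__ == "__main__":
    ratio, deviation = dialogue_ratio(176, 205)
    # Prints "Ratio: 85.85%, Deviation: 4.15%"
    print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

With 176 dialogue words out of 205, the sketch reproduces the 85.85% ratio and the 4.15-point shortfall from the 90% target listed in the table.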