Run: 2026-04-24-01KQ09VNRW1HWSV6WAEFM5YNJT

Run Details

Model:: GPT-OSS 120B
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 12.8s
Cost:: $0.001347
Provider:: Nebius

Evaluation summary

Score	Evaluator	Details Link
100.00%	Matches word count	(details)
88.44%	Dialogue to Total Word Ratio	(details)
94.2190%	Overall
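
The Overall figure is consistent with an unweighted mean of the two evaluator scores. The report does not state the aggregation rule, so the sketch below assumes it; the averaging and the name `overall_score` are inferred from the numbers above.

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores in percent (assumed aggregation rule)."""
    if not scores:
        raise ValueError("at least one evaluator score is required")
    return sum(scores) / len(scores)


# Scores as displayed in the summary table (already rounded to two decimals).
evaluator_scores = [100.00, 88.44]
overall = overall_score(evaluator_scores)
print(f"{overall:.2f}%")
```

Because the displayed inputs are rounded, the result agrees with the reported 94.2190% only to two decimal places.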

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

The dusty attic echoed with muffled giggles as Molly unfurled a crumpled map of the party venue for the loot.  
Molly: "We hit the kid zone at midnight, when the clown's nap time overlaps the cake frosting."  
Bill: "Midnight? The party ends at eleven; that's a two‑hour window for nap‑time mayhem."  
Molly: "Exactly, the parents will be snoozing, the kids will be asleep, and the candy will be unguarded."  
Bill: "What about the balloon arch? It could trigger the motion sensor."  
Molly: "I'll bring the anti‑balloon deflater; a tiny pinprick and poof—no more floating alarms."  
Bill: "And the gift table? It's a fortress of wrapping paper."  
Molly: "I've got a heat gun to melt tape without ripping the boxes, perfect for a silent swap."  
Bill: "Do we need disguises? I can dress as a giant dinosaur, it scares kids and gives us cover."  
Molly: "A dinosaur is cliché; I prefer a giant inflatable unicorn, double‑rainbow effect distracts the security cameras."  
Bill: "What about the toy robot that patrols the hall? It's programmed to squeal at intruders."  
Molly: "We'll jam its firmware with a homemade EMP; a forty‑second blackout and it's a tin can."  
Bill: "How many presents are we talking?"

Evaluation details

Result	Evaluator	Details	Meta Data
100.00%	Matches word count	n/a	neededClean: false; words: 201
88.44%	Dialogue to Total Word Ratio	Ratio: 84.08%, Deviation: 5.92%	neededClean: false; wordsTotal: 201; wordsDialogue: 169
94.2190%	Overall
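
The Ratio and Deviation values follow from the word counts in the Meta Data. The Python sketch below reproduces that arithmetic, assuming the 90% target from the scenario and a deviation measured as an absolute difference in percentage points. The function name is hypothetical. The report does not say how the deviation maps to the 88.44% score, so that step is not reproduced.

```python
def dialogue_ratio_stats(words_dialogue, words_total, target_pct=90.0):
    """Return (ratio, deviation) in percent for a dialogue word count.

    target_pct is the scenario's 90% dialogue goal; the absolute-difference
    deviation is an assumption that matches the reported figures.
    """
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio_pct = 100.0 * words_dialogue / words_total
    deviation_pct = abs(target_pct - ratio_pct)
    return ratio_pct, deviation_pct


# Word counts taken from the Meta Data for the ratio evaluator.
ratio_pct, deviation_pct = dialogue_ratio_stats(169, 201)
print(f"Ratio: {ratio_pct:.2f}%, Deviation: {deviation_pct:.2f}%")
```

With 169 dialogue words out of 201, this prints the same Ratio and Deviation shown in the Details column.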