Run: 2026-04-24-01KQ02NJVWFS6954YD4F59XWSF

Run Details

Model:: GPT-OSS 120B
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 1m 56s
Cost:: $0.000869
Provider:: DeepInfra

Evaluation summary

Score	Evaluator	Details Link
100.00%	Matches word count	(details)
97.47%	Dialogue to Total Word Ratio	(details)
98.7362%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

In the garage, Molly and Bill huddled over the party invitation, whispering.  
Molly: "The party is at 3 p.m., kids will be distracted by balloons."  
Bill: "We slip in through the back, grab the gift table, and vanish before the clown arrives."  
Molly: "What about the piñata? Should we swing it or just take the candy?"  
Bill: "Leave the piñata, it's messy; the presents are all in the gift bags."  
Molly: "The gift table is under a pink tarp. We need a ladder."  
Bill: "I brought one from the construction site, hidden in my van."  
Molly: "Great, we knock on the door, pretend to be entertainers."  
Bill: "I'll wear the oversized rabbit suit, you’ll be the magician."  
Molly: "Kids love magicians, they’ll gather around, we’ll distract them with tricks."  
Bill: "Meanwhile I’ll scoop up the bags, quick as a bunny."  
Molly: "We need to time it with the cake cutting, that’s the peak chaos."  
Bill: "Exactly, the sirens of frosting drown out any alarm."  
Molly: "After we’re out, we split the loot at the dock."  
Bill: "And we never speak of it again."  
Bill: "We'll vanish like ghosts, no footprints left."  
Molly: "And the kids will think it was magic."

Evaluation details

Result	Evaluator	Details	Meta Data
100.00%	Matches word count	n/a	neededClean: false, words: 200
97.47%	Dialogue to Total Word Ratio	Ratio: 86.00%, Deviation: 4.00%	neededClean: false, wordsTotal: 200, wordsDialogue: 172
98.7362%	Overall
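
The reported ratio and deviation follow from the word counts: 172 dialogue words out of 200 total is 86.00%, which is 4.00 points below the 90% target in the scenario. A minimal Python sketch of that arithmetic is below. It assumes dialogue means text inside straight double quotes and that words are whitespace-separated tokens; the function names are hypothetical and this is not the evaluator's actual code. The mapping from deviation to the 97.47% score is not stated in this report, so it is not reproduced here.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a scene.

    Assumption: dialogue is any text enclosed in straight double quotes,
    and a "word" is a whitespace-separated token.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation(ratio: float, target: float = 0.90) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


if __name__ == "__main__":
    # Counts taken from the Meta Data column of this report.
    ratio = 172 / 200
    print(f"Ratio: {ratio:.2%}, Deviation: {deviation(ratio):.2%}")
```

Under these assumptions, speaker labels such as Molly: and Bill: fall outside the quotes and count against the ratio, as does the opening line of narration.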