Run: 2026-04-24-01KQ045MCF1TD7QM6XQVYCYS7E

Run Details

Model:: GPT-OSS 120B
Test:: Dialogue tags
Scenario:: Write 200 words with 50% dialogue
Duration:: 8m 38s
Cost:: $0.002407
Provider:: Amazon Bedrock

Evaluation summary

Score	Evaluator	Details Link
100.00%	Matches word count	(details)
99.44%	Dialogue to Total Word Ratio	(details)
99.7200%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

Rain drummed on the roof of the center, where the planning session hid behind castles. Molly crouched beside a cracked cookie tin, studying the floor plan of the birthday bash. Bill perched on a wobbling chair, fiddling with a toy grenade. The room smelled of cake frosting and latex. Outside, giggling children shouted “Pin the Tail on the Donkey!” while parents struggled with party hats. The target: a room packed with wrapped gifts, each promising cash value to the buyer. Their mission was simple—sneak in, swipe every present, vanish before the cake was cut.  
Molly: "We'll slip through the piñata opening, it's the biggest gap after they dump the candy."  
Bill: "The aunt with the oversized clown nose watches the gift table, but she can't see behind the balloon arch."  
Molly: "We disable the noisy DJ console, then the kids think it's a game and keep dancing."  
Bill: "What about the chocolate fountain? It's sticky, but we can use it to slide under the table."  
Molly: "After we grab the presents, we dump them into the laundry basket and exit through the staff door."  
Bill: "Got it. Meet at the cupcake stand at midnight, and no one mentions word 'pirates'."

Evaluation details

Result	Evaluator	Details	Meta Data
100.00%	Matches word count	n/a	neededClean: false; words: 201
99.44%	Dialogue to Total Word Ratio	Ratio: 52.74%, Deviation: 2.74%	neededClean: false; wordsTotal: 201; wordsDialogue: 106
99.7200%	Overall
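
The reported figures can be cross-checked with a little arithmetic. The sketch below is not the evaluator's actual code. It assumes the ratio is wordsDialogue divided by wordsTotal, that the deviation is the absolute distance from the 50% target in the scenario, and that the overall score is the unweighted mean of the two evaluator scores. All three assumptions agree with the numbers above. The report does not show how the 2.74% deviation maps to the 99.44% score, so that step is not reproduced. The function names are hypothetical.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Dialogue words as a percentage of all words (assumed definition)."""
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 50.0) -> float:
    """Absolute distance from the target ratio, in percentage points (assumed)."""
    return abs(ratio_pct - target_pct)


def overall_score(scores: list[float]) -> float:
    """Unweighted mean of the evaluator scores (assumed aggregation)."""
    return sum(scores) / len(scores)


# Values taken from the Meta Data and Result columns of this run.
ratio = dialogue_ratio(words_dialogue=106, words_total=201)
deviation = deviation_from_target(ratio)
overall = overall_score([100.00, 99.44])

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
print(f"Overall: {overall:.4f}%")
```

Under these assumptions the script prints a ratio of 52.74%, a deviation of 2.74%, and an overall score of 99.7200%, matching the report.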