Run: 2026-07-01-01KWEAFTBWKHYSCY5MYCXCZ7JY

Run Details

Model:: Claude Sonnet 5 (Reasoning, Low)
Test:: Dialogue tags
Scenario:: Write 200 words with 50% dialogue
Duration:: 11.7s
Cost:: $0.004796
Provider:: Amazon Bedrock

Evaluation summary

Score	Evaluator
27.17%	Matches word count
0.00%	Dialogue to Total Word Ratio
13.5829%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

Molly slid the blueprint across the table, a crayon drawing of a castle bounce house with a suspicious number of arrows pointing to a table labeled "PRESENTS!!!" in glitter glue.

"This is our target?" Bill squinted. "A six-year-old drew our heist plan?"

"Timmy's cousin. She's got an eye for security gaps."

"Molly, there's a clown."

"Bozo's a distraction. Sends the kids into hysterics every time he pulls out that horn."

Bill rubbed his temples. "We're stealing birthday presents from children."

"We're liberating unopened Nerf guns and something called a Bakugan, whatever that is."

"This feels wrong."

"You said that about the diamond heist too."

"That was different! Diamonds don't have thank-you notes attached!"

Molly grinned, tapping the drawing. "The gift table's unguarded during cake time. Ninety seconds, tops."

"And if a parent notices?"

"We're 'entertainment.' You're juggling. I'm doing balloon animals."

"I don't know how to juggle."

"You'll learn."

"Molly—"

"Bill. There's a PlayStation in that pile. Wrapped, but I can tell by the shape."

Bill paused, considering. "...Fine. But I'm not wearing the clown wig."

"Deal. I already claimed it."

Evaluation details

Result	Evaluator	Details	Meta Data
27.17%	Matches word count	n/a	neededClean: false, words: 181
0.00%	Dialogue to Total Word Ratio	Ratio: 76.63%, Deviation: 26.63%	neededClean: false, wordsTotal: 184, wordsDialogue: 141
13.5829%	Overall
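The evaluator figures can be cross-checked with a little arithmetic. The sketch below assumes the dialogue ratio is wordsDialogue / wordsTotal and the deviation is the absolute distance from the 50% target in the prompt. With 141 and 184, those assumptions reproduce the reported 76.63% and 26.63%. It also assumes the overall score is the unweighted mean of the two evaluator scores, which is consistent with 13.5829% being half of roughly 27.17%. The report does not document either formula, so these are inferences from its numbers rather than the harness's actual implementation.

```python
# Cross-check of the run's evaluator metadata.
# ASSUMPTIONS (not documented in the report, inferred from its numbers):
#   ratio     = wordsDialogue / wordsTotal
#   deviation = |ratio - target|, with target = 50% from the prompt
#   overall   = unweighted mean of the per-evaluator scores

TARGET_DIALOGUE_RATIO = 0.50  # "50% words of dialogue" in the prompt


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of counted words that fall inside dialogue."""
    return words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_DIALOGUE_RATIO) -> float:
    """Absolute distance between the measured and requested ratio."""
    return abs(ratio - target)


def overall(scores: list[float]) -> float:
    """Assumed aggregation: plain mean of the evaluator scores."""
    return sum(scores) / len(scores)


ratio = dialogue_ratio(141, 184)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation(ratio):.2%}")  # Ratio: 76.63%, Deviation: 26.63%

# Using the rounded 27.17% word-count score gives about 13.585%,
# within rounding of the reported 13.5829%.
print(f"Overall (from rounded scores): {overall([0.2717, 0.0]):.4%}")
```

The mapping from 181 words to a 27.17% word-count score cannot be recovered from the report, so it is left out. Note also that the two evaluators count slightly different totals (181 and 184 words) for the same response.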