NC Bench
Run: 2024-11-04-01JBW8TN5SG51RYWZS497YC0R5
Run Details
Model: Claude 3.5 Haiku
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 5.1s
Cost: $0.001344
Evaluation summary
Score | Evaluator
93.65% | Matches word count
0.02% | Dialogue to Total Word Ratio
46.8365% | Overall
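The Overall figure looks like the unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so this is an assumption; the sketch below only checks that assumption against the rounded scores listed in the summary. The function name `mean_score` is hypothetical.

```python
def mean_score(scores):
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


# Scores as displayed in the summary, rounded to two decimals by the page.
scores = {
    "Matches word count": 93.65,
    "Dialogue to Total Word Ratio": 0.02,
}

overall = mean_score(scores)
reported_overall = 46.8365

# Each displayed score may be off by up to 0.005 due to rounding,
# so the mean may differ from the reported value by the same amount.
print(f"mean of displayed scores: {overall:.4f}%")
print(f"within rounding of reported: {abs(overall - reported_overall) < 0.005}")
```

The small gap between the two values is what one would expect if the page averages unrounded scores and rounds only for display.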
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly leaned back in the rickety kitchen chair, her combat boots propped on the formica table. Bill was hunched over a crude diagram sketched on the back of a pizza box. "So you're telling me we're gonna rob a five-year-old's birthday party?" Molly asked, twirling a lockpick between her fingers. Bill adjusted his thick-rimmed glasses. "Not just any party. This is the Jameson kid. His dad's a hedge fund manager. Those presents are gonna be top-tier." "Seriously? We're stealing from children now?" "Technically, we're liberating high-end merchandise from unsuspecting rich parents," Bill said with a wink. Molly snorted. "That's just stealing with extra words. What's the entry point?" Bill traced a line on the diagram. "Back patio. Bounce house provides perfect cover. We'll have a solid seven-minute window between cake cutting and gift opening." "And the security?" "Two grandmothers and an overworked clown. We've got this." Molly rolled her eyes. "This is officially the most ridiculous heist we've ever planned." "You said that about the retirement home job, and we cleared sixty grand in rare coin collections." "Fair point," Molly grinned. "When do we move?" "Saturday. Purple party hat. Be ready."
total_cost: 0.001344
Evaluation details
Result | Evaluator | Details | Meta Data
93.65% | Matches word count | n/a | neededClean: false, words: 191
0.02% | Dialogue to Total Word Ratio | Ratio: 67.01%, Deviation: 17.01% | neededClean: false, wordsTotal: 197, wordsDialogue: 132
46.8365% | Overall
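The reported ratio and deviation follow directly from the metadata: 132 dialogue words out of 197 total gives 67.01%, which is 17.01 points from the 50% target. A minimal sketch of that arithmetic is below, together with one possible way to count dialogue words. The counting rule (whitespace-split words inside straight double quotes) and the names `dialogue_stats` and `TARGET_RATIO` are assumptions for illustration; the benchmark's own tokenizer is not shown on the page, and the two evaluators report different totals (191 and 197), so its rules likely differ. How the deviation is mapped to the 0.02% score is also not shown and is not reproduced here.

```python
import re

TARGET_RATIO = 0.5  # the scenario asks for 50% dialogue


def dialogue_stats(text, target=TARGET_RATIO):
    """Return (words_total, words_dialogue, ratio, deviation) for a passage.

    Assumed rule: words are whitespace-separated tokens, and dialogue is
    anything between a pair of straight double quotes.
    """
    words_total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = words_dialogue / words_total if words_total else 0.0
    return words_total, words_dialogue, ratio, abs(ratio - target)


# Reproduce the reported figures from the metadata on this page.
reported_ratio = 132 / 197
print(f"Ratio: {reported_ratio * 100:.2f}%")
print(f"Deviation: {abs(reported_ratio - TARGET_RATIO) * 100:.2f}%")

# Small usage example on a made-up two-line exchange.
sample = 'Molly snorted. "That is just stealing." Bill shrugged. "Fair point."'
total, spoken, ratio, deviation = dialogue_stats(sample)
print(total, spoken, round(ratio, 2), round(deviation, 2))
```

Counting by quoted spans treats dialogue tags such as "Molly asked" as narration, which matches the intent of a dialogue-to-total ratio, though curly quotes or nested quotes would need extra handling.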