NC Bench
Run: 2024-07-19-01J35T3020DZP7RYKF5DDHDBNW
Run Details
Model: GPT-4o Mini (temp=1)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 3.2s
Cost: $0.000174
Evaluation summary
Score | Evaluator
97.63% | Matches word count
0.00% | Dialogue to Total Word Ratio
48.8138% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly glanced over the fence, eyeing the colorful balloons. "You really think we can pull this off?" Bill adjusted his ski mask. "Come on, Molly! It's just a bunch of kids. They won't even notice the presents missing!" "What if they do? I can’t bear the thought of a crying child!" Bill chuckled. "You’re worried about their feelings? We’re stealing their toys!” Molly crossed her arms. "Yeah, but they’re kids! They don’t deserve this.” "Neither do we! We need a payday! Did you see the way they stack those gifts? It’s like a treasure mountain!" Molly sighed. "Alright, but we stick to the plan. In and out, no crying kids, and definitely no dog!" "Relax! What’s a little barking gonna do? We're professionals!" "Right, because nothing says ‘professional’ like robbing a birthday party." Bill smirked. "Hey, we’re doing them a favor! Think of it this way: fewer presents mean more time for cakes and games!" "Does that include running away after we grab the gifts?" "Of course! Besides, who could resist frosting?" Molly rolled her eyes. "Let’s hope the balloons don’t pop. It’s hard to stay low-profile with a bunch of squeaky rubber."
total_cost: 0.00017385
Evaluation details
Result | Evaluator | Details | Meta Data
97.63% | Matches word count | n/a | neededClean: false, words: 193
0.00% | Dialogue to Total Word Ratio | Ratio: 85.57%, Deviation: 35.57% | neededClean: false, wordsTotal: 194, wordsDialogue: 166
48.8138% | Overall
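
The ratio and deviation figures reported above can be reproduced from the word counts. The following is a minimal sketch, not the benchmark's own code: the variable names are hypothetical, the 50% target is taken from the scenario description, and treating the overall score as an unweighted mean of the two evaluator scores is an assumption that happens to match the reported value to within rounding. How each evaluator maps its measurement to a percentage score (for example, a 35.57% deviation to 0.00%) is not shown in this report and is not modeled here.

```python
# Values copied from the evaluation details of this run.
words_dialogue = 166
words_total = 194
target_ratio = 0.50  # from the scenario: "50% dialogue"

# Share of all words that fall inside dialogue.
ratio = words_dialogue / words_total

# Absolute distance from the requested 50% share.
deviation = abs(ratio - target_ratio)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 85.57%, Deviation: 35.57%"

# Assumption: the overall score is the unweighted mean of the
# per-evaluator scores. Using the rounded 97.63% gives a value
# close to, but not exactly, the reported 48.8138%.
evaluator_scores = [97.63, 0.00]
overall = sum(evaluator_scores) / len(evaluator_scores)
print(f"Overall (approx.): {overall:.2f}%")
```

The response overshoots the dialogue target by a wide margin, which is why the run scores well on word count yet lands below 50% overall.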