NC Bench
Run: 2024-07-19-01J35T9J0VHFF63YCBY8K4QK89
Run Details
Model: GPT-4o Mini (temp=1)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 7.9s
Cost: $0.000193
Evaluation summary
Score       Evaluator
99.74%      Matches word count
0.00%       Dialogue to Total Word Ratio
49.8722%    Overall
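The Overall figure looks consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the short sketch below treats the mean as an assumption and only checks it against the displayed, rounded values; the variable names are illustrative.

```python
# Evaluator scores as displayed in the summary (already rounded to 2 decimals).
scores = {
    "Matches word count": 99.74,
    "Dialogue to Total Word Ratio": 0.00,
}

# ASSUMPTION: the overall score is the unweighted mean of the evaluator scores.
overall = sum(scores.values()) / len(scores)

# The page reports 49.8722%; the small gap comes from the inputs being rounded.
reported_overall = 49.8722
print(f"computed: {overall:.2f}%  reported: {reported_overall}%")
```

With the rounded inputs the mean agrees with the reported Overall to two decimal places, which is as far as the displayed scores allow the assumption to be checked.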
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly and Bill crouched behind a massive inflatable unicorn, its vibrant colors oddly comforting amid their shifty scheme. “Alright, Bill,” Molly whispered, adjusting her beanie. “We in and out like ninjas, right?” “More like ninj-less,” Bill smirked, clutching a sack the size of his ambition. “You sure we’ll fit? I mean, have you seen the size of that cake?” Molly rolled her eyes. “Focus, Bill. We don’t want the kids to spot us. Just keep your eyes on the prize.” “Prize? You mean the giant stuffed bear? It’s practically a small car!” “Exactly! And that’s why we’re here. You distract the clowns, I’ll grab the gifts.” “Clowns? You couldn’t have picked a less terrifying distraction?” Bill gulped. “What if they start juggling me?” “Then you just juggle them back! Just remember: we’re thieves, not terrified toddlers. You can do this!” “Alright, but if I see a pie-in-the-face gag, I’m bailing.” Bill took a deep breath and peeked out from behind the unicorn. “What’s the plan again?” “Stick to the shadows, avoid sticky fingers, and don’t trip over the obstacle course.” “Easy for you to say. You’re not the one with a knack for gravity!” Molly snickered, shaking her head. “Let’s do this, gravity boy!”
total_cost: 0.00019305
Evaluation details
Result      Evaluator                       Details                             Meta Data
99.74%      Matches word count              n/a                                 neededClean: false, words: 204
0.00%       Dialogue to Total Word Ratio    Ratio: 73.08%, Deviation: 63.08%    neededClean: false, wordsTotal: 208, wordsDialogue: 152
49.8722%    Overall
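The reported ratio and deviation follow directly from the metadata: 152 dialogue words out of 208 total is about 73.08%, which is 63.08 percentage points away from the 10% target. The sketch below reproduces that arithmetic and shows one hypothetical way dialogue words could be counted, by taking the text between curly quotation marks. The function `dialogue_stats` and its whitespace tokenization are assumptions for illustration; the evaluator's real counting rules are not shown on this page, so the function is not expected to reproduce the reported counts exactly.

```python
import re

def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage.

    ASSUMPTION: dialogue is whatever sits between curly quotes, and words
    are whitespace-separated tokens. The real evaluator may differ.
    """
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    total_words = len(text.split())
    return dialogue_words, total_words

# Small sanity check on a made-up sentence (not taken from the run).
sample = "“Focus, Bill,” Molly said. “Go!”"
sample_dialogue, sample_total = dialogue_stats(sample)

# Arithmetic using the counts reported in the Meta Data column.
words_dialogue = 152
words_total = 208
target_percent = 10.0  # from the scenario: 10% dialogue

ratio_percent = 100 * words_dialogue / words_total
deviation_points = abs(ratio_percent - target_percent)

print(f"Ratio: {ratio_percent:.2f}%, Deviation: {deviation_points:.2f}%")
```

The deviation here is an absolute difference in percentage points rather than a relative error, which matches the figures in the Details column.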