NC Bench
Run: 2024-07-19-01J35SHBZH7BVD6NTHME0MSPNS
Run Details
Model: GPT-4o Mini (temp=0)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 8.2s
Cost: $0.000180
Evaluation summary
Score | Evaluator
90.48% | Matches word count
0.00% | Dialogue to Total Word Ratio
45.2419% | Overall
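The overall figure looks consistent with an unweighted mean of the two evaluator scores. That is an assumption inferred from the displayed numbers, not a formula the page documents. A minimal Python sketch of that reading, with `overall_score` as a hypothetical helper name:

```python
# Assumption: the overall score is the unweighted mean of the evaluator scores.
# The page does not state this; it is inferred from the displayed numbers.
evaluator_scores = {
    "Matches word count": 90.48,
    "Dialogue to Total Word Ratio": 0.00,
}


def overall_score(scores):
    """Return the arithmetic mean of the per-evaluator percentages."""
    return sum(scores.values()) / len(scores)


overall = overall_score(evaluator_scores)
print(f"{overall:.2f}%")  # prints "45.24%"
```

The page shows 45.2419%, which agrees to two decimal places. The small difference in the later digits would be expected if the displayed evaluator scores are rounded before display.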
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly peered through the window of the brightly decorated house. “Look at all those gifts, Bill! We could retire on that haul!” Bill adjusted his ski mask. “Retire? I was thinking more like a nice vacation. You know, somewhere without kids.” Molly rolled her eyes. “Right, because nothing says ‘relaxation’ like a beach full of screaming toddlers.” “Okay, fine. But how do we get in? The door’s wide open, but I’m not walking into a party full of sugar-fueled six-year-olds.” Molly grinned. “Simple. We’ll disguise ourselves as clowns!” “Clowns? You want me to wear a red nose and oversized shoes?” “Think of it as method acting! Plus, who would suspect a couple of clowns?” Bill sighed. “I don’t know, Molly. Clowns are creepy. What if they start throwing cake at us?” “Then we throw the cake back! It’s a win-win!” “Right, because nothing says ‘professional thieves’ like a food fight.” Molly smirked. “Exactly! Now, grab that wig and let’s get our party hats on. We’ve got presents to pilfer!” “Fine, but if I end up with frosting in my hair, I’m blaming you.” “Deal! Now, let’s go steal some joy!”
total_cost: 0.00017985
Evaluation details
Result | Evaluator | Details | Meta Data
90.48% | Matches word count | n/a | neededClean: false, words: 190
0.00% | Dialogue to Total Word Ratio | Ratio: 87.11%, Deviation: 37.11% | neededClean: false, wordsTotal: 194, wordsDialogue: 169
Overall: 45.2419%
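The ratio and deviation reported for the dialogue evaluator can be reproduced from its meta data. The sketch below assumes the deviation is the absolute difference between the measured ratio and the 50% target from the scenario; the function names are hypothetical, and how the evaluator maps that deviation to a 0.00% score is not shown on the page.

```python
TARGET_RATIO = 0.50  # from the scenario: 50% dialogue


def dialogue_ratio(words_dialogue, words_total):
    """Fraction of all words that the evaluator counted as dialogue."""
    return words_dialogue / words_total


def deviation_from_target(ratio, target=TARGET_RATIO):
    """Assumed definition: absolute distance from the target ratio."""
    return abs(ratio - target)


# wordsDialogue and wordsTotal as reported in the evaluator meta data
ratio = dialogue_ratio(169, 194)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 87.11%, Deviation: 37.11%"
```

Note that the two evaluators report different totals (190 words versus 194), so they presumably tokenize the response differently.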