NC Bench
Run: 2024-07-19-01J35ST3940CNA9XJ6WTW2CXHW
Run Details
Model: GPT-4o Mini (temp=1)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 3.9s
Cost: $0.000189
Evaluation summary
Score | Evaluator
99.74% | Matches word count
36.79% | Dialogue to Total Word Ratio
68.2661% | Overall
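The overall figure appears to be the unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the following is a minimal sketch under that assumption, using the rounded scores as displayed; the helper name `overall_score` is hypothetical.

```python
from statistics import mean

def overall_score(evaluator_scores):
    """Unweighted mean of per-evaluator scores, in percent.

    Assumption: the benchmark averages its evaluators equally.
    This is inferred from the displayed numbers, not documented.
    """
    return mean(evaluator_scores)

# Scores as displayed in the summary (already rounded to two decimals)
scores = [99.74, 36.79]
overall = overall_score(scores)

# The page shows 68.2661%; a small gap is expected because the
# inputs here were rounded for display before being averaged.
print(f"{overall:.4f}")
print(abs(overall - 68.2661) < 0.01)
```

The mean of the rounded scores lands within 0.01 percentage points of the displayed overall, which is consistent with an equal-weight average computed on unrounded values.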
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
"Molly, are you sure this is a good idea?" "Bill, it’s genius! Kids won’t notice a couple of adults in costumes!" "In clown suits? It’s not exactly inconspicuous, is it?" "Hey, I’ve seen the way they behave. If we juggle and honk, they’ll think we’re part of the show!" "And you think they’ll just hand over the toys?" "Exactly! ‘Oh look, a clown! Here, take my bicycle!’" "Right, because kids are known for their generosity. What if they scream?" "Then we make balloon animals to distract them." "Bi-what? I can’t even tie my shoelaces, Molly!" "Come on, just twist them like this." "But I don't want a rubber chicken thrown at me!" "It’s all about the presentation. Plus, what’s better than leaving a party with a truckload of presents?" "What about the parents?" "They’ll be busy with the cake! ‘Cake-distract-o’ strategy." "Are we even going to fit all this in the van?" "Honestly, I was thinking of an inflatable pool. It’s easier to carry." "An inflatable pool? For toys? You’re nuts!" "Exactly! Our cover! C’mon Bill, it’ll be fun!" "Fine, but if this flops, I’ll never do this again!" “Deal. Now, let’s practice our clown faces!”
total_cost: 0.00018884999999999998
Evaluation details
Result | Evaluator | Details | Meta Data
99.74% | Matches word count | n/a | neededClean: false, words: 196
36.79% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false, wordsTotal: 197, wordsDialogue: 197
68.2661% | Overall
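The ratio and deviation reported for the dialogue evaluator follow from the metadata: 197 dialogue words out of 197 total gives a ratio of 100%, which is 10 percentage points away from the 90% target in the scenario. The sketch below reproduces that arithmetic. The word-counting rule (whitespace-separated tokens, with dialogue defined as text between double quotes) and the function names are assumptions, since the page does not show the evaluator's code.

```python
import math
import re

TARGET_RATIO = 0.90  # from the scenario: "90% dialogue"

# Assumed dialogue rule: text between straight or curly double quotes
QUOTED = re.compile(r'["“]([^"“”]*)["”]')

def count_words(text):
    """Return (dialogue_words, total_words) using whitespace tokenization."""
    total = len(text.split())
    dialogue = sum(len(span.split()) for span in QUOTED.findall(text))
    return dialogue, total

def ratio_and_deviation(words_dialogue, words_total, target=TARGET_RATIO):
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total if words_total else 0.0
    return ratio, abs(ratio - target)

# Counts taken from the evaluator metadata above
ratio, deviation = ratio_and_deviation(197, 197)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Guess, not documented: an exponential penalty that happens to
# reproduce the displayed 36.79% score for a 10% deviation.
guessed_score = math.exp(-deviation / 0.10)
print(f"{guessed_score:.2%}")
```

As a usage example, `count_words('"Hello there," said Bill. "Run!"')` counts three dialogue words out of five total. The exponential penalty is only one formula that fits this single data point; other scoring rules could produce the same value.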