NC Bench
Run: 2024-07-19-01J35SRYEH4CNNZGGB8Y3WSQX0
Run Details
Model: GPT-4o Mini (temp=1)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 5.8s
Cost: $0.000188
Evaluation summary
Score      Evaluator
90.48%     Matches word count
0.00%      Dialogue to Total Word Ratio
45.2419%   Overall
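The overall score is consistent with an unweighted mean of the two evaluator scores. The page does not state this, so treat it as an inference from the displayed numbers. The sketch below checks that reading. The dictionary name `scores` is hypothetical, and the inputs are the rounded percentages shown above, so the result matches the reported 45.2419% only to within rounding.

```python
from statistics import mean

# Evaluator scores as displayed on the page (rounded percentages).
scores = {
    "Matches word count": 90.48,
    "Dialogue to Total Word Ratio": 0.00,
}

# Assumption: the overall score is a plain, unweighted average.
overall = mean(scores.values())

print(f"Overall (assumed unweighted mean): {overall:.2f}%")
```

The small gap between this value and the reported 45.2419% would be explained if the word-count score carries more decimal places than the page displays.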
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly peered through the bushes, adjusting her ski mask. "Bill, are you sure this is a good idea? It’s a kid's party!" "Absolutely! Think of it as an... entrepreneurial venture," Bill replied, pulling out a blueprints of the backyard. “Yeah, ‘cause nothing screams success like robbing kids!” Molly snorted. “They’ll never miss ‘em! What do they know about Hot Wheels or Lego sets?” “Except for the whole crying thing when they realize they're gone. You’ve thought this through, huh?” “Alright, the plan is simple. We sneak in during the piñata smashing. It’ll be chaos!” “Brilliant! And what if we get caught?” “Caught? Pfft. It’s just a bunch of snotty four-year-olds! I’m not worried at all. Just stick to the plan!” “And what’s our code word? In case it gets too hot?” “Um… ‘candy cane’!” “Candy cane? Really? Just sounds too festive for a heist!” “Fine! What about ‘banana split’ then?” Molly shrugged. “Better. But it still sounds like dessert!” “Focus! Once they hit the piñata, we swoop in, grab the loot, and vanish!” “Like thieves in the night?” “Exactly! Now, let’s go steal some toys!” “Right behind you, Robin Hood!”
total_cost: 0.00018825
Evaluation details
Result     Evaluator                      Details                            Meta Data
90.48%     Matches word count             n/a                                neededClean: false; words: 190
0.00%      Dialogue to Total Word Ratio   Ratio: 88.54%, Deviation: 38.54%   neededClean: false; wordsTotal: 192; wordsDialogue: 170
45.2419%   Overall
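The ratio and deviation reported for the dialogue evaluator can be reproduced from its metadata (wordsDialogue = 170, wordsTotal = 192) and the 50% target named in the scenario. This is a minimal sketch: the function names are hypothetical, and treating the deviation as an absolute difference from the target is an assumption, since the page shows only one above-target case. How the evaluator converts the deviation into the 0.00% score is not stated on the page and is not modeled here.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of all words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 50.0) -> float:
    """Return the distance from the target ratio in percentage points.

    Assumption: the deviation is an absolute difference.
    """
    return abs(ratio_pct - target_pct)


# Values taken from the evaluator metadata on this page.
ratio = dialogue_ratio(words_dialogue=170, words_total=192)
deviation = deviation_from_target(ratio)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 88.54%, Deviation: 38.54%"
```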