NC Bench
Model: Llama 3.1 8B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Run ID: 2024-07-23-01J3G81SM5GYB4XC45X0R51YK8
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly leaned against the counter, sipping a lukewarm coffee, as Bill stared at the layout of the conference room projected behind him. Their newest marks were a children's party catering service, renting out bouncy castles and party hats to parents who would rather not do the work themselves. "Alright, so we've got our entry points, our mark, and our exit routes," Bill said, waving at the floor plan. "We're in, we grab the presents, and out." Molly raised an eyebrow. "You think everyone just tosses their gifts in a heap on the floor? That's adorable. They're probably color-coded and wrapped in matching bows." Bill shrugged. "It's worth a shot, right? And even if they're not, we've got enough presents to give us a decent chance. This party's got at least three kids there, not to mention the obligatory plus one or two. That's an average of five to six presents per kid, conservatively." Molly snorted. "You're really counting on the plus one being another kid, aren't you? You're so hopeless, Bill. Anyway, what's the security like?" "Two adults who are clearly distracted by a group of six-year-olds trying to eat ice cream cones without dripping anything on their white party shirts. And judging by the picture on their Facebook page, they've got at least three dogs at home, so I'm guessing they're either a very happy family or a very miserable one." Molly nodded thoughtfully. "Okay, I like our chances. What's the payout?" Bill grinned. "Well, if we get the high-end stuff – the iPhones, the video games, the Lego Mindstorms kits – we're looking at at least five grand, maybe more. But let's be real, we're probably going to get a bunch of old toys and Hand-Me-Downs we'll never be able to sell. So, maybe three grand, tops." Molly's face set in a determined expression. "Well, in that case, let's make sure we get the good stuff. I'm thinking of taking out the catering service's walk-in freezer to raise the stakes. We can threaten to put all the ice cream cones in there to melt." 
Bill groaned. "Molly, you want to turn this into a hostage situation? That's not how we get in and out quietly. Besides, what if the police show up? You do remember that time we accidentally took out the neighbor's mailman instead of the accountant?" Molly shrugged. "Well, that was a minor setback. Okay, no walk-in freezer. But let's make this interesting – loser buys dinner at the most embarrassing restaurant in the city." Bill hesitated before grinning. "Okay, you're on. But if we get caught, which we will – and I'm not saying I think we will, but hypothetically – the loser also buys bail and wears the getaway outfit for the subsequent police lineup." The usually staid conference room seemed to shimmer with mischief as the two thieves shook on their bet, silently grinning the whole way through the handshake.
id: "gen-4v6YmIfHnT96TiopHjHp2QymtRrL"
total_cost: 0.0001434
upstream_id: "11ca6319-09cb-46c3-90fa-aef846e019fb"
created_at: "2024-07-23T16:51:05.444647+00:00" (Tuesday, July 23rd, 2024 at 4:51:05 PM GMT+00:00)
model: "meta-llama/llama-3.1-8b-instruct"
app_id: 182717
streamed: false
cancelled: false
provider_name: "Fireworks"
latency: 2506
moderation_latency: (null)
generation_time: 0
finish_reason: "stop"
tokens_prompt: 102
tokens_completion: 645
native_tokens_prompt: 70
native_tokens_completion: 647
num_media_prompt: (null)
num_media_completion: (null)
origin: "https://ncbench.com/"
usage: 0.0001434
Evaluation details
Result | Evaluator | Details | Meta Data
90.4837% | Matches word count | n/a | neededClean: false, words: 490
4.9436% | Dialogue to Total Word Ratio | Ratio: 77.76%, Deviation: 7.76% | neededClean: false, wordsTotal: 490, wordsDialogue: 381
47.7137%
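The reported ratio and deviation can be reproduced from the evaluator's own counts (wordsDialogue 381, wordsTotal 490) and the 70% target in the scenario. The Python sketch below is illustrative only. The function names are hypothetical, and the quote-based word counter is an assumption about how dialogue might be detected, not NC Bench's actual evaluator. How the 90.4837% and 4.9436% scores themselves are derived is not stated on this page, so the sketch does not attempt it. It only notes that the final 47.7137% is consistent with the arithmetic mean of those two results.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: "Write 500 words with 70% dialogue"


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def dialogue_word_count(text: str) -> int:
    """Count words inside straight double quotes.

    Assumption: dialogue is whatever sits between paired '"' characters.
    The real evaluator may use a different rule.
    """
    spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in spans)


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that are dialogue."""
    return words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_RATIO) -> float:
    """Absolute distance between the observed ratio and the target."""
    return abs(ratio - target)


# Counts taken from the evaluation details above.
ratio = dialogue_ratio(381, 490)
dev = deviation(ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {dev:.2%}")
# prints "Ratio: 77.76%, Deviation: 7.76%"

# Observation only: the mean of the two evaluator results is about 47.7137.
mean_result = (90.4837 + 4.9436) / 2
```

To try the counter on raw text, pass a scene to `dialogue_word_count` and `count_words` and feed both numbers to `dialogue_ratio`. The counts it produces may differ from the reported 381 and 490 if the evaluator tokenizes or detects dialogue differently.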