Run: 2024-07-23-01J3G81SM5GYB4XC45X0R51YK8

Run Details

Model:: Llama 3.1 8B
Test:: Dialogue tags
Scenario:: Write 500 words with 70% dialogue
Duration:: 2.6s
Cost:: $0.000143
Provider:: Fireworks

Evaluation summary

Score	Evaluator	Details Link
35.00%	Matches word count	(details)
4.94%	Dialogue to Total Word Ratio	(details)
19.9729%	Overall

Model input

User/Human

You are a author of comedic heist books.

Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

Molly leaned against the counter, sipping a lukewarm coffee, as Bill stared at the layout of the conference room projected behind him. Their newest marks were a children's party catering service, renting out bouncy castles and party hats to parents who would rather not do the work themselves.

"Alright, so we've got our entry points, our mark, and our exit routes," Bill said, waving at the floor plan. "We're in, we grab the presents, and out."

Molly raised an eyebrow. "You think everyone just tosses their gifts in a heap on the floor? That's adorable. They're probably color-coded and wrapped in matching bows."

Bill shrugged. "It's worth a shot, right? And even if they're not, we've got enough presents to give us a decent chance. This party's got at least three kids there, not to mention the obligatory plus one or two. That's an average of five to six presents per kid, conservatively."

Molly snorted. "You're really counting on the plus one being another kid, aren't you? You're so hopeless, Bill. Anyway, what's the security like?"

"Two adults who are clearly distracted by a group of six-year-olds trying to eat ice cream cones without dripping anything on their white party shirts. And judging by the picture on their Facebook page, they've got at least three dogs at home, so I'm guessing they're either a very happy family or a very miserable one."

Molly nodded thoughtfully. "Okay, I like our chances. What's the payout?"

Bill grinned. "Well, if we get the high-end stuff – the iPhones, the video games, the Lego Mindstorms kits – we're looking at at least five grand, maybe more. But let's be real, we're probably going to get a bunch of old toys and Hand-Me-Downs we'll never be able to sell. So, maybe three grand, tops."

Molly's face set in a determined expression. "Well, in that case, let's make sure we get the good stuff. I'm thinking of taking out the catering service's walk-in freezer to raise the stakes. We can threaten to put all the ice cream cones in there to melt."

Bill groaned. "Molly, you want to turn this into a hostage situation? That's not how we get in and out quietly. Besides, what if the police show up? You do remember that time we accidentally took out the neighbor's mailman instead of the accountant?"

Molly shrugged. "Well, that was a minor setback. Okay, no walk-in freezer. But let's make this interesting – loser buys dinner at the most embarrassing restaurant in the city."

Bill hesitated before grinning. "Okay, you're on. But if we get caught, which we will – and I'm not saying I think we will, but hypothetically – the loser also buys bail and wears the getaway outfit for the subsequent police lineup."

The usually staid conference room seemed to shimmer with mischief as the two thieves shook on their bet, silently grinning the whole way through the handshake.

Evaluation details

Result	Evaluator	Details	Meta Data
35.00%	Matches word count	n/a	neededClean: false; words: 482
4.94%	Dialogue to Total Word Ratio	Ratio: 77.76%, Deviation: 7.76%	neededClean: false; wordsTotal: 490; wordsDialogue: 381
19.9729%	Overall
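
The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator can be reproduced from its metadata counts. The following is a minimal sketch, assuming the ratio is wordsDialogue divided by wordsTotal and the deviation is the absolute distance from the 70% target named in the scenario. The function names are hypothetical. The report does not say how a deviation is converted into the 4.94% score, so that step is not modelled. The last lines assume the overall score is an unweighted mean of the two displayed evaluator scores, which agrees with 19.9729% only to within rounding.

```python
# Counts and scores are copied from the report; the formulas are assumptions.
WORDS_TOTAL = 490
WORDS_DIALOGUE = 381
TARGET_RATIO_PCT = 70.0  # from the scenario: "70% dialogue"


def dialogue_ratio_pct(words_dialogue: int, words_total: int) -> float:
    """Share of words that are dialogue, as a percentage."""
    return 100.0 * words_dialogue / words_total


def deviation_pct(ratio_pct: float, target_pct: float) -> float:
    """Absolute distance from the target, in percentage points."""
    return abs(ratio_pct - target_pct)


ratio = dialogue_ratio_pct(WORDS_DIALOGUE, WORDS_TOTAL)
deviation = deviation_pct(ratio, TARGET_RATIO_PCT)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# Assumed: overall is the unweighted mean of the displayed evaluator scores.
overall_estimate = (35.00 + 4.94) / 2
print(f"Overall (estimated): {overall_estimate:.2f}%")
```

Note that the two evaluators count words differently (482 versus 490), so the ratio depends on which tokenisation the evaluator applies.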

id	"gen-4v6YmIfHnT96TiopHjHp2QymtRrL"
total_cost	0.0001434
upstream_id	"11ca6319-09cb-46c3-90fa-aef846e019fb"
created_at	"2024-07-23T16:51:05.444647+00:00" 7/23/2024, 4:51:05 PM
model	"meta-llama/llama-3.1-8b-instruct"
app_id	182717
streamed	false
cancelled	false
provider_name	"Fireworks"
latency	2506
moderation_latency	(null)
generation_time	0
finish_reason	"stop"
tokens_prompt	102
tokens_completion	645
native_tokens_prompt	70
native_tokens_completion	647
num_media_prompt	(null)
num_media_completion	(null)
origin	"https://ncbench.com/"
usage	0.0001434
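
The cost figure is consistent with simple per-token pricing applied to the native token counts. The sketch below is a rough check under an assumption: the flat rate of $0.20 per million tokens is not stated anywhere in the report and was chosen because it reproduces the recorded total_cost. The function name is hypothetical.

```python
# Token counts and cost are copied from the generation metadata above.
NATIVE_TOKENS_PROMPT = 70
NATIVE_TOKENS_COMPLETION = 647
REPORTED_COST_USD = 0.0001434

# Assumption: one flat rate for prompt and completion tokens (not in the report).
ASSUMED_USD_PER_MILLION_TOKENS = 0.20


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      usd_per_million: float) -> float:
    """Estimate cost from token counts under a flat per-token rate."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens * usd_per_million / 1_000_000


estimated = estimate_cost_usd(
    NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION, ASSUMED_USD_PER_MILLION_TOKENS
)
print(f"estimated: ${estimated:.7f}, reported: ${REPORTED_COST_USD:.7f}")
```

If the provider bills prompt and completion tokens at different rates, the function would need two rate parameters instead of one.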