NC Bench
Model: GPT-4.1
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Run ID: 2025-04-14-01JRTSW1KWRQF9XHFE8QNNHT5F
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In the cramped backroom of Big Sal’s Pizza Emporium, Molly crouched over a hand-drawn map crusted in marinara stains. Bill, squat and wiry, eyed the crayon-sketched territory with the suspicion reserved for bad knock-knock jokes.
“So, here’s the plan,” Molly said, tapping a green blob labeled ‘Bouncy Castle.’ “We go in through the kitchen—”
“In through the kitchen?” Bill hissed, flattening himself against a pile of flour sacks. “That’s insane. There are always, like, three teenagers making dough back there.”
“No one looks twice at pizza delivery people,” Molly retorted. “We grab two pizza boxes, slap on these hats, and boom: we’re invisible.”
Bill regarded the neon-red ‘SAL’S’ cap as if it had crawled out from under the fridge. “‘Invisible’? The last time you said that, I ended up riding a pony covered in glitter paint and six-year-olds.”
“Which worked,” Molly pointed out. “The distraction let us puncture the piñata and nab the cash.”
Bill squinted at the map. “That’s not the same. Presents are bigger. And we’re up against…parents.”
Molly slid a VHS tape across the table. “I’ve cased the joint. Parents sit in the cake zone, sipping decaf and wondering why they ever had children. The presents table”—she circled it in blue—“is right by the Bouncy Castle. If we time it for the ‘magic show,’ all eyes are front, hands are sticky, and our getaway route is clear.”
Bill let out a slow, pizza-scented sigh. “I don’t like magic. Nothing good ever happens during magic.”
“We’re not staying for the grand finale,” Molly grinned. “Once we snag the loot, we slip past the ball pit, out the emergency exit, and into Sal’s delivery van.”
Bill picked at a pepperoni. “How much loot are we talking? I don’t need another sack of dollar-store Legos.”
She grinned. “Rumor has it, the birthday kid’s grandma sprung for a remote-control drone and someone’s bringing an actual PlayStation.”
Bill’s eyes widened. “You’re not full of anchovies?”
“Cross my heart. ‘Big Jake’ himself said so.”
He hesitated, glancing at the biggest obstacle on Molly’s map: a stick figure labeled ‘Aunt Marge—Spots Everything.’ “How do we get past Marge?” he whispered.
“Distraction,” Molly said smoothly. “You, my friend, are going to start a cheese fight.”
His mouth fell open. “You want me to hurl mozzarella at a bunch of kids?”
“No, Bill. You let one of the kids nail you with cheese. You yell ‘Food fight!’ and the rest writes itself.”
He groaned. “I knew it’d end with cheese.”
Molly reached for his shoulder. “Look, you’re the best distraction in the business. No one flings food like you.”
“My reputation precedes me,” Bill muttered.
“That’s settled then,” Molly stood, stretching. “We synchronize watches. 3:15, on the broccoli minute. In and out.”
Bill peered at his watch, which was crusted with what might have been pineapple. “Fine. But if I get hit with birthday cake again, I’m switching to jewel heists.”
Molly grinned, tucking the crayon map into her jacket. “You say that every time.”
Bill hugged a pizza box. “This better be worth it.”
She winked. “Present and accounted for, partner.”
As they slinked out, the scent of cheese and conspiracy trailed in their wake. The party—and the presents—awaited.
id: "gen-1744656271-mNb8GOfBUBD9reRMB1kU"
upstream_id: "chatcmpl-BMJ5DVi875bYT8ZnBWTwzIdb5Cwke"
total_cost: 0.006398
cache_discount: (null)
provider_name: "OpenAI"
created_at: "2025-04-14T18:44:41.23548+00:00" (Monday, April 14th, 2025 at 6:44:41 PM GMT+00:00)
model: "openai/gpt-4.1-2025-04-14"
app_id: 182717
streamed: true
cancelled: false
latency: 430
moderation_latency: 81
generation_time: 8890
tokens_prompt: 68
tokens_completion: 782
native_tokens_prompt: 67
native_tokens_completion: 783
native_tokens_reasoning: 0
num_media_prompt: (null)
num_media_completion: (null)
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.006398
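The metadata above is enough for two quick sanity checks: generation throughput, and whether `total_cost` is consistent with per-token pricing. The sketch below is illustrative only. It assumes `generation_time` is reported in milliseconds and `total_cost` in US dollars, and it plugs in prices of $2.00 per million prompt tokens and $8.00 per million completion tokens. Neither the units nor the prices are stated on this page, and the function names are hypothetical.

```python
# Values copied from the run metadata.
native_tokens_prompt = 67
native_tokens_completion = 783
generation_time_ms = 8890      # assumption: milliseconds
reported_total_cost = 0.006398  # assumption: US dollars

# Assumption: per-million-token prices, not given anywhere on this page.
PROMPT_USD_PER_MILLION = 2.00
COMPLETION_USD_PER_MILLION = 8.00


def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Average completion throughput over the generation window."""
    return tokens / (elapsed_ms / 1000.0)


def estimated_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost implied by the assumed per-million-token prices."""
    return (prompt_tokens * PROMPT_USD_PER_MILLION
            + completion_tokens * COMPLETION_USD_PER_MILLION) / 1_000_000


throughput = tokens_per_second(native_tokens_completion, generation_time_ms)
cost = estimated_cost(native_tokens_prompt, native_tokens_completion)

print(f"throughput: {throughput:.1f} tokens/s")
print(f"estimated cost: {cost:.6f} (reported: {reported_total_cost:.6f})")
```

Under these assumptions the estimate uses the native token counts (67 and 783) rather than the normalized ones (68 and 782), since the provider presumably bills on its own tokenizer.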
Evaluation details
| Result | Evaluator | Details | Meta Data |
| --- | --- | --- | --- |
| 0.0000% | Matches word count | n/a | neededClean: false; words: 546 |
| 0.0000% | Dialogue to Total Word Ratio | Ratio: 65.75%, Deviation: 35.75% | neededClean: false; wordsTotal: 546; wordsDialogue: 359 |
0.0000%
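The ratio and deviation reported above follow directly from `wordsDialogue` and `wordsTotal`. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It is not NC Bench's evaluator: it assumes the target is the 30% from the scenario, that deviation is the absolute difference in percentage points, that dialogue is any text between curly double quotes, and that words are whitespace-separated. All helper names are hypothetical, and a different tokenization could give different counts.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Count words inside curly double quotes (assumed definition of dialogue)."""
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int, target_pct: float):
    """Return (dialogue ratio in %, absolute deviation from target in points)."""
    ratio = 100.0 * words_dialogue / words_total
    return ratio, abs(ratio - target_pct)


# Arithmetic check against the counts reported in the Meta Data column.
ratio, deviation = ratio_and_deviation(359, 546, 30.0)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# Counting check on a short excerpt in the style of the response.
sample = "“So, here’s the plan,” Molly said. “We go in through the kitchen.”"
print(dialogue_words(sample), "of", count_words(sample), "words are dialogue")
```

With the reported counts, 359 / 546 gives 65.75%, which is 35.75 points above the 30% target and matches the Details value.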