NC Bench
Model: GPT-4.1
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Run ID: 2025-04-14-01JRTSSC65RC2NXNTVWD3BYNVM
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly pressed herself against the wall of Tina Trumble’s Inflatable Wonderland, peeking around the corner at a six-foot-tall giraffe made of balloons. Bill crouched next to her, clutching a rolled-up blueprint of the venue—scribbled with circles, arrows, and what appeared to be mustard. “Remind me again, why are we targeting children’s parties? Didn’t we agree we’d do jewel heists?” Bill whispered, his voice itchy with anxiety. Molly flashed him a grin. “You ever tried fencing a diamond in this town? Too much heat. But party favors? You can sell a brand-new Lego set faster than a guy named Vinny can say ‘receipt.’ Look.” She pointed through the window at a pile of gifts stacked higher than a four-year-old. “That’s three hundred quid in chocolates alone.” Bill frowned as a man in a clown suit juggled rubber chickens near the gift table. “I hate clowns. You ever see their eyes? There’s nothing behind them. Just pure circus.” “The plan, Bill. Focus.” He unrolled the blueprint on the grass. “We go in as entertainers. There’s the magician slot—some bloke named Sir Sparkle lost his keys or something. We swap in, do a few tricks, then nab the goods.” Molly’s eyes narrowed. “Magic? I don’t even know card tricks. Last time I tried, I set my sleeve on fire.” “Relax. I picked up a few bits off YouTube.” He fanned a deck of cards, dropping half in the process. “Ta-da!” She scooped the cards up. “Your magic might need actual magic. But we can improvise, right? Worst case, you distract ‘em with your juggling and I slip the presents into the laundry bins by the kitchen exit.” Bill pursed his lips. “Laundry bins?” “Party gets messy. Trust me, I’ve worked this demographic.” Molly grinned at a memory. “Besides, who’s gonna notice a mountain of gift-wrapped boxes going out among the dirty napkins?” He glanced through the window again, watching the children shrieking as they pinned a tail on a suspiciously grumpy looking donkey. “What about the kids? Kids have… senses. 
They can sniff out guilt.” “Not if we’re in disguise. Let me see your smile.” Molly squinted at him. Bill attempted a grin; he looked less joyful, more constipated. “We’ll work on it,” she muttered. “Okay, let’s synchronize. We walk in at three p.m., ten-minute magic routine, five-minute ‘juggling duel’—while you drop a smoke bomb, I stuff presents in the bin, kitchen door out, and back here before they’re blowing out the candles.” Bill’s hands trembled as he picked a bit of tinsel from his pocket. “And what if some sugar-crazed kid gets in the way?” “Bring extra balloon animals. Kids are easily distracted by something that can be popped.” A tiny dog barked from somewhere inside. Bill flinched. “You sure you want to do this? I mean, it’s not exactly The Italian Job… It’s more like The Sesame Street Job.” Molly pressed a plastic clown nose into his palm. “Bill, you want to pay the rent or not? Stick close. If you see a clown, just smile and do not—under any circumstances—accept any pies.” Bill straightened, looking at her with admiration and terror. “Let’s do it then.” He took a deep breath. “For Lego and profit.” Molly grinned. “That’s the spirit. Let’s go make some birthday wishes come true—for us.”
id: "gen-1744656183-31AsxVycuV4KLCKzFiPh"
upstream_id: "chatcmpl-BMJ3oGLS7Md8xi2UVoeRYsKmhDHiT"
total_cost: 0.006286
cache_discount: (null)
provider_name: "OpenAI"
created_at: "2025-04-14T18:43:18.345648+00:00" (Monday, April 14th, 2025 at 6:43:18 PM GMT+00:00)
model: "openai/gpt-4.1-2025-04-14"
app_id: 182717
streamed: true
cancelled: false
latency: 348
moderation_latency: 99
generation_time: 13238
tokens_prompt: 68
tokens_completion: 768
native_tokens_prompt: 67
native_tokens_completion: 769
native_tokens_reasoning: 0
num_media_prompt: (null)
num_media_completion: (null)
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.006286
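As a rough sanity check on the run metadata, the reported cost and generation time can be related to the native token counts. The sketch below is illustrative only: the per-token rates (2.00 and 8.00 USD per million prompt and completion tokens) and the assumption that generation_time is in milliseconds are not stated on this page, and the function names are hypothetical.

```python
# Values copied from the run metadata above.
native_tokens_prompt = 67
native_tokens_completion = 769
generation_time = 13238   # ASSUMPTION: milliseconds
total_cost = 0.006286     # USD, as reported

# ASSUMPTION: per-token rates in USD; not given on this page.
PROMPT_RATE = 2.00 / 1_000_000
COMPLETION_RATE = 8.00 / 1_000_000


def estimate_cost(prompt_tokens, completion_tokens):
    """Estimate the cost of one request from its token counts."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE


def tokens_per_second(completion_tokens, time_ms):
    """Average completion throughput over the generation time."""
    return completion_tokens / (time_ms / 1000)


estimated = estimate_cost(native_tokens_prompt, native_tokens_completion)
throughput = tokens_per_second(native_tokens_completion, generation_time)

print(f"estimated cost: {estimated:.6f} USD (reported: {total_cost})")
print(f"throughput: {throughput:.1f} tokens/s")
```

Under these assumed rates the estimate agrees with the reported total_cost, which suggests the cost was billed on the native token counts rather than the normalized ones.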
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false; words: 563
0.0000% | Dialogue to Total Word Ratio | Ratio: 62.52%, Deviation: 32.52% | neededClean: false; wordsTotal: 563; wordsDialogue: 352
0.0000%
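The ratio and deviation figures in the evaluation follow from the reported counts: 352 dialogue words out of 563 total is 62.52%, which is 32.52 percentage points away from the 30% target. The following is a minimal sketch of how such a metric could be computed, assuming dialogue means the words enclosed in curly double quotation marks. It is not NC Bench's actual evaluator; the function names and the quote-matching rule are hypothetical.

```python
import re


def dialogue_ratio(text):
    """Return (dialogue_words, total_words, ratio) for a passage.

    ASSUMPTION: dialogue is whatever sits between “ and ”.
    """
    total = len(text.split())
    spoken = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(segment.split()) for segment in spoken)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation(ratio, target=0.30):
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


# Counts reported in the evaluation above.
reported_ratio = 352 / 563
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation(reported_ratio):.2%}")

# Small usage example on a made-up passage.
sample = "“Focus, Bill,” Molly said. “We go in at three.”"
print(dialogue_ratio(sample))
```

A quote-based counter like this treats every word inside the quotation marks as dialogue, so a scene that is mostly conversation, like the response above, lands well over a 30% target.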