NC Bench
Run: 2026-02-16-01KHKAS4MY13XQ0S0MYR61RKYM
Run Details
Model: GPT-5.1
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 26.7s
Cost: $0.017253
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The van idled across the street from the glittery suburban house, its windows pulsing with pink light and the faint thump of bubblegum pop. Balloons bobbed on the mailbox like a cheery crime deterrent. Molly flipped down the sun visor, studying the crude crayon map taped to it. “All right, Bill, walk me through it again. Where’s the treasure hoard?” “In the living room, under the unicorn banner,” Bill said, squinting at the house. “Right between the bouncy castle and the face-painting station. High traffic area. High frosting area.” “Unicorn banner, bouncy castle, frosting,” Molly muttered. “You’ve just described my personal hell.” “It’s a children’s party, not a demon summit,” Bill said. “How bad can it be?” “Last time we hit a kid’s party, I took a Capri-Sun to the eye,” Molly said. “I still see tiny straws when I close my eyes.” “Focus,” Bill said. “We’re professionals. We go in, we blend, we extract the presents, we go out. In, blend, extract, out.” “Professionals don’t wear this,” Molly said, flicking the elastic of her paper birthday hat. “I look like an accountant at gunpoint.” “You look festive,” Bill said. “It’s called deep cover.” “It’s called humiliating,” Molly said. “And why am I the one with the glittery ‘Birthday Princess’ sash?” “Because you have resting ‘I deserve presents’ face,” Bill said. “Lean into your strengths.” Molly peered through the windshield. “Explain again why we’re stealing from six-year-olds instead of, say, a bank?” “Banks have cameras, armed guards, and accountants with no sense of humor,” Bill said. “Children have jelly fingers and parents running on three hours of sleep. We choose the softer target.” “Right, because nothing says ‘soft target’ like thirty sugar-high goblins in party hats,” Molly said. “We’re going to be torn apart by kindergarteners like a pack of wild Chihuahuas.” “Not if we stick to the plan,” Bill said. “You go in first, as the party entertainer.” “I juggle lockpicks, not balloon animals,” Molly said. 
“What exactly am I entertaining them with, my crippling anxiety?” “Your charisma,” Bill said. “And the fact that you brought a piñata.” Molly looked at the duct-taped paper-mâché llama on the van floor. “Bill, this piñata looks like it was assembled in witness protection.” “It’s a distraction device,” Bill said. “The moment you hang it up, the kids will focus on beating candy out of it. While they go feral, I slip in, locate the present stack, and initiate extraction.” “Do you hear yourself?” Molly said. “You’re planning to weaponize candy.” “It’s called leveraging available resources,” Bill said. “Besides, I’ve done worse with a cupcake tower and a bubble machine.” “What about the parents?” Molly said. “Someone’s going to notice their loot mountain shrinking.” “That’s why you’re stalling them,” Bill said. “Do close-up magic. Parents love close-up magic.” “I know exactly one trick,” Molly said. “It’s called ‘your wallet is missing, sir.’” “Perfect,” Bill said. “Just don’t actually steal from the parents. That’s unprofessional. Today is about presents. Focus on our thematic niche.” Molly exhaled, fogging the windshield. “Okay, say you get to the mountain of gifts. Then what? You can’t just walk out holding a bike and a Barbie Dream House.” “That’s why we brought the gift bags,” Bill said, shaking a stack of suspiciously large polka-dot bags. “We repack everything. To them, it just looks like we’re helpful relatives loading the car.” “What helpful relative shows up at a kid’s party with no child?” Molly said. “We’re the only adults here without a small screaming hostage.” “We have an imaginary one,” Bill said. “His name is Dylan. He’s shy. He’s ‘in the bathroom’ for the duration of the heist.” “Dylan sounds suspicious,” Molly said. “Dylan sounds like he vapes and lies about his age.” “Dylan is six and likes dinosaurs and STEM toys,” Bill said firmly. “Know your fake nephew.” Molly pinched the bridge of her nose. 
“If anyone asks where Dylan’s parents are?” “Divorced,” Bill said immediately. “You’re the cool aunt, I’m the uncle who overcompensates with expensive gifts that, ironically, we have just stolen.” “Full emotional backstory, nice touch,” Molly said. “All right, timeline?” “Five minutes for you to win over the kids, three minutes for piñata chaos,” Bill said. “Four minutes for present extraction, two minutes for loading the van. Fourteen minutes total.” “Fourteen minutes,” Molly said. “Bill, I can’t even decide what to order at a restaurant in fourteen minutes.” “Well, luckily, this isn’t a restaurant,” Bill said. “It’s a tactical incursion into a frosting-heavy environment.” “Speaking of frosting,” Molly said, squinting at the window, “is that cake or structural foam on that thing?” “That is a three-tier rainbow monstrosity with edible glitter,” Bill said. “Do not get distracted by the cake. Cake is how they get you.” “Children or cops?” Molly said. “Who’s ‘they’ in this scenario?” “Both,” Bill said. “Anyone under four feet is basically undercover law enforcement with jam on their badge.” Molly stared at the house for a long moment. “We are going to kidnap joy,” she said. “We are going to relocate joy,” Bill corrected. “To a secure off-site location with excellent resale value.” “We’re going to get hexed by every Disney princess under that roof,” Molly said. “On the bright side,” Bill said, reaching for the door handle, “if we go down for this, nobody’s going to believe the arrest report.” Molly adjusted her sash, sighed, and opened her door. “Fine,” she said. “Let’s go ruin a childhood.”
created_at
"2026-02-16T13:38:58.91344+00:00"
2/16/2026, 1:38:58 PM
model
"openai/gpt-5.1-20251113"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
3191
moderation_latency
(null)
generation_time
26634
tokens_prompt
77
tokens_completion
1896
native_tokens_prompt
66
native_tokens_completion
1717
native_tokens_completion_images
0
native_tokens_reasoning
391
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0172525
router
(null)
provider_responses
0
id
"resp_06073777bb11d92e0169931dd7cdec8194a1840b951a1efb8d"
status
200
is_byok
false
latency
251
endpoint_id
"764eb97f-8bab-4326-b29b-7a8799b00a70"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5.1-20251113"
api_type
"completions"
id
"gen-1771249111-T97nVqDJ4JSL0dcNXIpP"
upstream_id
"resp_06073777bb11d92e0169931dd7cdec8194a1840b951a1efb8d"
total_cost
0.0172525
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false; words: 920
6.5117% | Dialogue to Total Word Ratio | Ratio: 77.39%, Deviation: 7.39% | neededClean: false; wordsTotal: 920; wordsDialogue: 712
3.2559%
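The ratio and deviation reported for "Dialogue to Total Word Ratio" follow from the two word counts in the meta data and the 70% target in the scenario: 712 / 920 is about 77.39%, which is 7.39 points above 70%. The page does not show how the evaluator counts dialogue words or how it turns the deviation into a score, so the sketch below is only an illustration. It assumes dialogue is the text between curly double quotes and that words are whitespace-separated; the names `dialogue_stats` and `ratio_and_deviation` are hypothetical and not part of NC Bench. It does not attempt to reproduce the 6.5117% result.

```python
import re

TARGET_RATIO = 0.70  # taken from the scenario: "70% dialogue"


def ratio_and_deviation(words_dialogue, words_total, target=TARGET_RATIO):
    """Return (ratio, absolute deviation from target) as fractions."""
    ratio = words_dialogue / words_total if words_total else 0.0
    return ratio, abs(ratio - target)


def dialogue_stats(text, target=TARGET_RATIO):
    """Count dialogue words in a scene.

    Assumption: dialogue is whatever sits between a curly opening quote
    and the next curly closing quote; words are split on whitespace.
    The real evaluator may count differently.
    """
    spans = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(span.split()) for span in spans)
    words_total = len(text.split())
    ratio, deviation = ratio_and_deviation(words_dialogue, words_total, target)
    return {
        "words_total": words_total,
        "words_dialogue": words_dialogue,
        "ratio": ratio,
        "deviation": deviation,
    }


if __name__ == "__main__":
    # Counts reported in the evaluation meta data for this run.
    ratio, deviation = ratio_and_deviation(712, 920)
    print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

Separating the arithmetic from the counting keeps the part that can be checked against the reported figures independent of the guess about how dialogue is detected.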