NC Bench
Run: 2026-02-16-01KHK9KM7M4DVF8ZJWFWJWGC8E
Run Details
Model: GPT-5.1
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 50.7s
Cost: $0.012632
Provider: OpenAI
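The cost shown in the run details can be sanity-checked against the native token counts listed in this run's metadata (native_tokens_prompt 66, native_tokens_completion 1255, total_cost 0.0126325). The sketch below is a rough check, not the provider's or router's billing logic: the per-million-token prices (1.25 USD for input, 10.00 USD for output) are assumptions that do not appear on this page, and estimate_cost is a hypothetical helper. Under those assumed prices the estimate agrees with the reported total.

```python
# Values copied from this run's metadata.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 1255
REPORTED_TOTAL_COST = 0.0126325  # USD

# ASSUMPTION: per-million-token prices in USD; not stated on this page.
ASSUMED_INPUT_PRICE_PER_M = 1.25
ASSUMED_OUTPUT_PRICE_PER_M = 10.00


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  in_price_per_m: float, out_price_per_m: float) -> float:
    """Estimate cost in USD from token counts and per-million-token prices."""
    weighted = prompt_tokens * in_price_per_m + completion_tokens * out_price_per_m
    return weighted / 1_000_000


estimated = estimate_cost(
    NATIVE_TOKENS_PROMPT,
    NATIVE_TOKENS_COMPLETION,
    ASSUMED_INPUT_PRICE_PER_M,
    ASSUMED_OUTPUT_PRICE_PER_M,
)
print(f"estimated ${estimated:.7f} vs reported ${REPORTED_TOTAL_COST:.7f}")
```

If the assumed prices are wrong for this model or date, only the two ASSUMED_ constants need to change.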
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Tell me again why we’re robbing a six-year-old,” Bill whispered, peering over the dashboard at the balloon-infested front lawn.
“We’re not robbing the six-year-old,” Molly said. “We’re liberating the presents from the tyranny of birthday capitalism.”
“That tyrant is currently wearing a paper crown and Spider-Man pajamas.”
“Exactly. No one that sticky should have that much power. Hand me the binoculars.”
“You brought binoculars to a clown-themed party?”
“I bring binoculars everywhere. What do you see?”
“A bouncy castle, three parents asleep in lawn chairs, a magician arguing with a squirrel, and a cake shaped like a dinosaur with an inferiority complex.”
“Perfect,” Molly said. “Chaos, sugar, and sleep-deprived adults. This is what professionals call an opportunity.”
“Professionals also call this ‘a new low.’”
“Bill, focus. Presents are usually stacked near the cake or under some decorative monstrosity. I’m betting that glitter explosion by the patio is the gift table.”
“Or a portal to another dimension.”
“Either way, we’re walking out richer.”
“Do you realize,” Bill said, “that every box in there contains socks, educational puzzles, and things that make noise?”
“Noise-making devices sell like gold in the black market of revenge-aunties. Now, timeline. When does the chaos spike?”
“Based on my advanced research—”
“You mean your childhood?”
“—yes, based on my traumatic formative years, the peak insanity hits right after the magician pulls something disappointing out of a hat.”
“Like a rabbit?”
“Like taxes, Molly. Kids expect dragons now. If he produces anything with fur and a mortgage, they riot.”
“Riot equals distraction,” Molly said. “Distraction equals we stroll in, grab the loot, stroll out. Easy.”
“Define ‘stroll,’ because those kids are moving like caffeinated piranhas.”
“We blend,” Molly said. “We smile, we wave, we say we’re from the entertainment agency.”
“What entertainment agency?”
“The… Fun Unit.”
“The Fun Unit?”
“You have a better acronym?”
“Fun Resource Entertainment Department?”
“So we’re from FRED,” Molly said. “Beautiful. You’re Fred, I’m… also Fred. We’re a Fred-based initiative.”
“Parents are never going to buy that.”
“Have you met parents at a kids’ party, Bill? They’d believe in a tax-deductible unicorn if it meant five minutes of silence.”
“What’s our cover act, then, Ms. Fun Unit?”
“Gift Quality Control,” Molly said promptly. “We ‘test’ the toys for safety.”
“By removing them from the premises.”
“Precisely. It’s a public service.”
“And if the birthday kid cries?”
“We tell him crying is how you level up in life. Builds character.”
“That’s the darkest thing you’ve ever said.”
“Character is important,” Molly insisted. “Okay, look sharp. The magician’s about to start. See the hat?”
“I also see a rabbit trying to chew its way to freedom.”
“If that rabbit escapes, we’ve got thirty seconds before pandemonium.”
“So on your signal, we just… walk in?”
“Stride with purpose,” Molly said. “No one questions people who stride. It’s a known law of nature.”
“And what if someone actually does question us?”
“You let me handle the talking. You handle the looking apologetic.”
“I was born apologetic.”
“Exactly why you’re my partner. Ready?”
“No. Are we doing it anyway?”
“Of course.”
“Fine,” Bill sighed. “But if I get hit with a piñata stick again, I’m unionizing.”
“You’re already in a union,” Molly said. “The International Guild of Poor Life Choices.”
“I want dental this time.”
“You have all your teeth.”
“For now.”
“On three,” Molly said. “One… two…”
The magician pulled the rabbit from the hat. It promptly leaped onto his face.
“Three,” Molly breathed. “Showtime, Fred.”
“Please stop calling me Fred.”
“Too late, Fred. Smile. We are joy incarnate.”
“We are crime incarnate.”
“Crime can have nice packaging. Now, shoulders back, chin up. You’re here to improve these children’s lives.”
“By stealing their stuff.”
“By sparing them from educational socks. Trust the mission.”
“I can’t believe this is my job,” Bill muttered as they opened the van doors.
“It’s not a job,” Molly said. “It’s a calling.”
“A calling to rob a dinosaur cake.”
“Leave the cake,” Molly said. “We’re not monsters.”
created_at: "2026-02-16T13:18:53.419906+00:00" (2/16/2026, 1:18:53 PM)
model: "openai/gpt-5.1-20251113"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 7125
moderation_latency: (null)
generation_time: 50164
tokens_prompt: 77
tokens_completion: 1403
native_tokens_prompt: 66
native_tokens_completion: 1255
native_tokens_completion_images: 0
native_tokens_reasoning: 285
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0126325
router: (null)
provider_responses:
  0:
    id: "resp_0371c7416b5743ad016993190a90748190b59b153f4999f295"
    status: 200
    is_byok: false
    latency: 160
    endpoint_id: "764eb97f-8bab-4326-b29b-7a8799b00a70"
    provider_name: "OpenAI"
    model_permaslug: "openai/gpt-5.1-20251113"
api_type: "completions"
id: "gen-1771247882-KyeLKiWNsLsk0O10Ojy1"
upstream_id: "resp_0371c7416b5743ad016993190a90748190b59b153f4999f295"
total_cost: 0.0126325
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false; words: 675
0.0000% | Dialogue to Total Word Ratio | Ratio: 90.37%, Deviation: 20.37% | neededClean: false; wordsTotal: 675; wordsDialogue: 610
0.0000%
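The dialogue-ratio figures above reduce to simple arithmetic on word counts: 610 dialogue words out of 675 total, compared against the 70% target from the scenario. A minimal sketch follows, assuming that dialogue is whatever sits between curly double quotes and that words are whitespace-separated; the benchmark's actual counting rules are not shown on this page, so dialogue_stats and ratio_from_counts are hypothetical helpers rather than the evaluator's code.

```python
import re


def dialogue_stats(text: str, target: float = 0.70) -> dict:
    """Count words inside curly double quotes versus all words (assumed method)."""
    words_total = len(text.split())
    # Everything between an opening “ and the next closing ” counts as dialogue.
    spoken = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(segment.split()) for segment in spoken)
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


def ratio_from_counts(words_dialogue: int, words_total: int,
                      target: float = 0.70) -> tuple:
    """Return (ratio, absolute deviation from target) for given word counts."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Tiny illustrative sample, not taken from the evaluated response.
sample = "“Ready?” Molly said. “No,” Bill sighed."
stats = dialogue_stats(sample)

# Counts reported in the evaluation details.
ratio, deviation = ratio_from_counts(610, 675)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

Run on the reported counts, this prints the same ratio and deviation as the Details column. The quote-based counter may not reproduce 610 exactly on the full response, since the evaluator's tokenization is unknown.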