NC Bench
Run: 2026-02-16-01KHKABS67XAVDWRW3GSCPZP8A
Run Details
Model: GPT-5.1
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 26.3s
Cost: $0.014122
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly pressed her nose to the bakery window, studying the chaos across the street where balloons bobbed like surveillance drones over the community center. “We’re really doing this?” Bill muttered, clutching a paper bag that used to hold donuts and now just held grease stains. “You want to rob a bunch of six-year-olds.” “Correct,” Molly said. “Except technically, we’re liberating the presents from the tyranny of tiny ungrateful hands.” “That’s… worse,” Bill said. “That’s morally worse.” “It’s a rich kid’s party,” she said. “Look at those balloons. That’s helium money. Those are trust-fund balloons.” Across the street, parents unloaded gift bags from SUVs while children in superhero capes shrieked with sugar-fueled joy. “Helium isn’t a moral barometer,” Bill said. “Also, I think one of those kids just ate a cupcake wrapper.” “Exactly,” Molly said. “Do you really want to support a system that rewards that kind of behavior with more presents?” “I’d like to support a system where we don’t get tackled by a clown,” Bill said. “The clown is our alibi,” Molly said. “You’re the clown.” Bill blinked. “I did not agree to clown work.” “You’re already halfway there,” she said. “You panic, you trip over things, and your car honks every time you lock it. All we need is a wig.” “Not wearing a wig,” Bill said. “Wigs are a gateway drug to full-time clowning.” Molly dragged him away from the window and into the narrow alley between the bakery and the florist. “Focus,” she said. “Operation: Cradle of Greed begins in thirty minutes.” “That’s the name?” he asked. “Couldn’t we call it, I don’t know, Operation: Don’t Get Murdered by PTA Moms?” “Fine,” she said. “Operation: Pastel Thunder. Happy?” “That’s… disturbingly accurate,” he admitted, glancing at the cotton-candy-colored decorations. “Walk me through this again, but slowly, like I’m you.” “You wish,” she said. 
“Step one: You go in as Bubbles the Clown—” “Absolutely not.” “—entertaining the kids, causing mild chaos, nothing litigious. Step two: I slip in through the kitchen door with the caterers.” “You’re five-foot-two and allergic to aprons,” he said. “They’ll spot you immediately.” “I’ve been practicing my caterer face,” she said, and arranged her features into an expression of exhausted, menu-specific hatred. “See? I look like I’ve been yelled at about gluten since dawn.” “Okay, that’s disturbingly convincing,” Bill admitted. “Once inside,” Molly continued, “you direct the kids outside for games. I locate the Gift Mountain, sweep everything into the cart, and roll it out the back.” Bill squinted. “A cart?” “Laundry cart,” she said. “Classic. Nobody questions a woman with dirty tablecloths and deep emotional exhaustion.” “What about security?” he asked. “Parents, staff, sugar-crazed tiny witnesses—” “Parents will be on their phones, arguing with someone named ‘Chad’ about the cake,” she said. “Staff will be guarding the bar. Children trust clowns. That’s literally your whole job: be distractingly tragic.” “I don’t know how to clown,” he protested. “I can juggle exactly zero things.” “Perfect,” she said. “Just try to juggle and fail. Kids love failure. It’s inspiring.” “Won’t they cry?” he asked. “Then you do balloon animals,” she said. “I can’t do balloon animals,” he said. “Last time I tried, it exploded and took my dignity with it.” “Good,” she said. “You’re not making balloon animals; you’re making balloon narratives. ‘This one represents regret.’ Very avant-garde.” “That sounds like how we get sued by an art critic,” he said. A passing car honked, and they flattened against the bricks as though guilt were visibly radiating off them. “Look,” Molly said quietly, “that pile of presents is worth more than my car. I checked the registry.” “There’s a registry for a six-year-old’s birthday?” Bill asked weakly. 
“Branded scooter, miniature drone, child-size espresso machine,” Molly recited. “No six-year-old needs espresso. We’re basically public safety.” “What if one of the kids sees you with the cart?” he asked. “Then I say, ‘I work here, these are broken, go eat frosting,’” she said. “They’ll wander off like distracted raccoons.” “And if the parents see me?” he asked. “What do I, the clown, do?” “You cry,” she said. “Clown tears neutralize adult suspicion. It’s science.” Bill sighed, eyeing the building as shrieks of laughter leaked out through the open doors. “I can’t believe we’re doing this,” he said. “My criminal record is going to say ‘present theft.’” “Relax,” Molly said, straightening his collar. “Worst case, we donate half to charity and call ourselves chaotic-good.” “And best case?” he asked. “Best case,” she said, grinning, “we retire on black-market Lego money.” He swallowed. “Fine. But if a single balloon pops near me, I’m out.” “Deal,” she said. “Now let’s go crash a birthday.”
created_at: "2026-02-16T13:31:40.710526+00:00" (2/16/2026, 1:31:40 PM)
model: "openai/gpt-5.1-20251113"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 4339
moderation_latency: (null)
generation_time: 26197
tokens_prompt: 77
tokens_completion: 1681
native_tokens_prompt: 66
native_tokens_completion: 1404
native_tokens_completion_images: 0
native_tokens_reasoning: 246
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0141225
router: (null)
provider_responses:
  0:
    id: "resp_05c4a3aa8d6533700169931c2212a88193bca346bc2ec2e277"
    status: 200
    is_byok: false
    latency: 166
    endpoint_id: "764eb97f-8bab-4326-b29b-7a8799b00a70"
    provider_name: "OpenAI"
    model_permaslug: "openai/gpt-5.1-20251113"
api_type: "completions"
id: "gen-1771248674-aIt6bFnqukjMZyZNf10h"
upstream_id: "resp_05c4a3aa8d6533700169931c2212a88193bca346bc2ec2e277"
total_cost: 0.0141225
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
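The reported cost can be sanity-checked against the native token counts. The sketch below is illustrative only: the per-million-token rates are an assumption (the run record does not state pricing), and `estimate_cost` is a hypothetical helper, not part of NC Bench or any provider API. Under those assumed rates the arithmetic reproduces the recorded total_cost of 0.0141225. The throughput figure divides completion tokens by generation_time, assuming that field is in milliseconds.

```python
# Values copied from the run metadata.
native_tokens_prompt = 66
native_tokens_completion = 1404
generation_time_ms = 26197  # assumed to be milliseconds

# ASSUMPTION: list prices in USD per million tokens; not stated in the run record.
ASSUMED_INPUT_USD_PER_M = 1.25
ASSUMED_OUTPUT_USD_PER_M = 10.00


def estimate_cost(prompt_tokens, completion_tokens,
                  in_rate=ASSUMED_INPUT_USD_PER_M,
                  out_rate=ASSUMED_OUTPUT_USD_PER_M):
    """Hypothetical helper: token counts times per-million rates, in USD."""
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000


cost = estimate_cost(native_tokens_prompt, native_tokens_completion)
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${cost:.7f}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

If the estimate and the recorded total_cost diverge for another run, the assumed rates are the first thing to re-check.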
Evaluation details

Evaluator: Matches word count
Result: 0.0000%
Details: n/a
Meta Data: neededClean: false, words: 793

Evaluator: Dialogue to Total Word Ratio
Result: 81.7904%
Details: Ratio: 72.01%, Deviation: 2.01%
Meta Data: neededClean: false, wordsTotal: 793, wordsDialogue: 571
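The ratio and deviation reported by the dialogue evaluator follow directly from the two word counts and the 70% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. The counting rule (whitespace-separated words inside curly double quotes) and the `dialogue_stats` function are assumptions for illustration; the evaluator's actual tokenization and its mapping from deviation to the 81.7904% result are not documented in this run.

```python
import re


def dialogue_stats(text, target=0.70):
    """Hypothetical sketch: count whitespace-separated words inside curly quotes.

    Returns (words_total, words_dialogue, ratio, deviation from target).
    """
    spoken = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(segment.split()) for segment in spoken)
    words_total = len(text.split())
    ratio = words_dialogue / words_total if words_total else 0.0
    return words_total, words_dialogue, ratio, abs(ratio - target)


# Reproduce the reported figures from the recorded counts.
reported_ratio = 571 / 793
reported_deviation = abs(reported_ratio - 0.70)
print(f"{reported_ratio:.2%}")      # 72.01%
print(f"{reported_deviation:.2%}")  # 2.01%

# Small usage example on two lines from the scene.
sample = "“We’re really doing this?” Bill muttered. “Correct,” Molly said."
print(dialogue_stats(sample))
```

Counting by quoted spans treats a dialogue tag such as "Bill muttered" as narration, which matches the intent of a "dialogue tags" test, but a different rule could shift the ratio by a few points.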
40.8952%
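The final figure is consistent with an unweighted mean of the two evaluator results. That reading is an inference from the numbers, not something the page states; the short check below only confirms the arithmetic.

```python
# Evaluator results copied from the run page, in percent.
evaluator_results = [0.0000, 81.7904]

# ASSUMPTION: the overall score is the unweighted mean of the evaluator results.
overall = sum(evaluator_results) / len(evaluator_results)
print(f"{overall:.4f}%")  # 40.8952%
```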