NC Bench
Run: 2026-02-16-01KHK9P2MHPB633Y27YSSPW1YR
Run Details
Model: GPT-5.1
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 29.4s
Cost: $0.013203
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“We’re really doing this, then? Robbing a birthday party?” Molly flicked a sprinkle off the blueprint like it insulted her. “Technically,” Bill said, “we’re liberating underappreciated consumer goods from minors who can’t even spell ‘appreciated.’” “They can spell ‘scream,’ though. They’ll weaponize that. Have you ever heard twenty sugar-high eight-year-olds hit the same note?” “I did a job once at a trampoline park,” Bill said. “I still wake up at night hearing the squeak of foam cubes. This’ll be easier.” “You say that now,” Molly muttered. “Run it again.” “Fine.” Bill tapped the floor plan of the community center. “The party is in Multipurpose Room B. Our entrance is the side door next to the dumpsters.” “Romantic.” “At sixteen hundred hours, the clown arrives,” Bill went on. “Parents distracted. Children hypnotized by balloon animals. That’s our window.” “Question,” Molly said. “What if the clown is actually good at their job?” “No such thing as a good clown.” “My therapist would like a word.” “Focus,” Bill said. “Step one: we walk in like we belong there.” “How does anyone ‘belong’ at a kids’ party while over thirty and not holding a diaper bag?” “We’re the—uh—‘interactive gift experience coordinators.’” “That is not a job.” “It is if we say it confidently.” “Bill, you once told a security guard you were a ‘stapler inspector.’” “He let me through, didn’t he?” “He also made you test every stapler on the floor for twenty minutes.” “Occupational hazard,” Bill said. “Anyway. We blend in, we locate the presents table, we execute the switch.” Molly frowned. “Explain the switch again. Slowly. And pretend I’m not willing to go to prison because you mis-labeled a tote.” “Two identical gift bags,” Bill said, holding them up. “Bag A: real presents. 
Bag B: decoys—tissues, empty boxes, a sweater my aunt knitted that personally offended me.” “So we swipe Bag A, leave Bag B, and no one notices until the kids are home crying into a crocheted insult.” “By that time,” Bill said, “we’re gone. Hearts light, pockets heavy, moral compasses… lightly magnetized.” “And the bouncer?” Molly asked. “There’s always a bouncer. Dads with folded arms. Moms with ‘I run the PTA and the world’ energy.” “That’s why we have disguises.” Bill pushed a package toward her. “Put on the tiara.” “I’m not wearing glitter into a crime.” “It’s not glitter,” Bill said. “It’s operational camouflage.” “It says ‘Birthday Princess.’” “Exactly. No one suspects the Birthday Princess.” “They will when they realize the actual birthday girl is four feet shorter and crying.” “You’re thinking too small,” Bill said. “You’re royalty. You’ve simply… promoted yourself.” Molly sighed, then jammed the tiara on. “If this thing leaves sparkles in my hair, I’m stealing you a conscience and making you use it.” He opened another bag. “And I will be… the magician.” “You can’t do magic.” “I can disappear with valuable objects,” Bill said. “That’s basically the same degree.” “What’s your magician name, then? ‘The Great Liability’?” “‘Billusion.’” “I refuse to participate in this.” “You already put on the tiara.” “Under protest.” “Protest denied,” Bill said. “Okay. Last issue: cake time. That’s our riskiest moment. All eyes on the candles. We move during the song.” “You want to rob a child in the middle of ‘Happy Birthday.’ Who hurt you?” “My childhood,” Bill said. “No one ever stole my presents. I grew up soft.” Molly studied the blueprint, then the gift bags, then her reflection in the plastic tiara. “Fine. We go in, we smile, we coordinate the interactive gift experience, we don’t traumatize anyone with lasting therapy bills.” “Define ‘lasting.’” “Bill.” “Okay, okay. Minimum therapy.” She snorted. 
“Let’s go disappoint some children—quietly, efficiently, and with outstanding teamwork.” “That,” Bill said, “is the most beautiful crime toast I’ve ever heard.”
created_at: "2026-02-16T13:19:52.916917+00:00" (2/16/2026, 1:19:52 PM)
model: "openai/gpt-5.1-20251113"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 5040
moderation_latency: (null)
generation_time: 29325
tokens_prompt: 77
tokens_completion: 1443
native_tokens_prompt: 66
native_tokens_completion: 1312
native_tokens_completion_images: 0
native_tokens_reasoning: 384
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0132025
router: (null)
provider_responses:
  0:
    id: "resp_011d3ead6a6b110b016993195aea8881978b9cb7618656fb97"
    status: 200
    is_byok: false
    latency: 235
    endpoint_id: "764eb97f-8bab-4326-b29b-7a8799b00a70"
    provider_name: "OpenAI"
    model_permaslug: "openai/gpt-5.1-20251113"
    api_type: "completions"
id: "gen-1771247962-pkPtvi7ImNcOkae0kqkF"
upstream_id: "resp_011d3ead6a6b110b016993195aea8881978b9cb7618656fb97"
total_cost: 0.0132025
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false, words: 633
0.0000% | Dialogue to Total Word Ratio | Ratio: 85.47%, Deviation: 35.47% | neededClean: false, wordsTotal: 633, wordsDialogue: 541
0.0000%
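The dialogue-ratio figures can be checked against the meta data. The ratio is wordsDialogue / wordsTotal = 541 / 633 ≈ 85.47%, and the deviation is that value's distance from the 50% target. The sketch below is a minimal illustration and makes two assumptions: "dialogue" means words enclosed in curly double quotes, and words are whitespace-delimited tokens. The function names are hypothetical, and NC Bench's actual evaluator may tokenize or count differently.

```python
import re

# Assumption: a dialogue span is any text between curly double quotes “ and ”.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-delimited tokens (an assumed tokenizer, not NC Bench's)."""
    return len(text.split())


def ratio_from_counts(words_dialogue: int, words_total: int,
                      target: float = 0.5) -> tuple[float, float]:
    """Return (ratio, deviation), where deviation is the absolute distance from target."""
    ratio = words_dialogue / words_total if words_total else 0.0
    return ratio, abs(ratio - target)


def dialogue_ratio(text: str, target: float = 0.5) -> tuple[float, float]:
    """Count quoted words and total words in text, then compute (ratio, deviation)."""
    total = count_words(text)
    dialogue = sum(count_words(span) for span in DIALOGUE_RE.findall(text))
    return ratio_from_counts(dialogue, total, target)


if __name__ == "__main__":
    # Reproduce the reported row from its meta data (wordsDialogue=541, wordsTotal=633).
    ratio, deviation = ratio_from_counts(541, 633)
    print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")  # Ratio: 85.47%, Deviation: 35.47%
```

Under these assumptions, the story's 541 dialogue words out of 633 overshoot the 50% target by about 35 percentage points. This is consistent with the reported deviation.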