NC Bench
Run: 2026-02-16-01KHKA002B31XXAN7K3CDRK16E
Run Details
Model: GPT-5.1
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 36.8s
Cost: $0.013513
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly peered through the dusty blinds of the cramped storage unit. “You know,” she said, “when I pictured my life of crime, I didn’t think it would involve glitter, helium balloons, and a piñata shaped like a vegan unicorn.” Bill unfolded a crinkled invitation covered in dinosaurs. “Focus. This is a high‑value target. Twelve children, statistically each with at least one overcompensating relative. That’s an avalanche of presents.” “Right, but it’s a children’s party,” Molly said. “They’ll cry, Bill. Tiny, sticky faces full of betrayal. I’m not emotionally prepared for that.” “They’ll cry anyway,” Bill said. “Someone always cries. Balloons pop. Cake falls. Joey realizes friendship isn’t legally enforceable. We’re just…adding a subplot.” Molly straightened. “Run it again. Entry plan.” Bill pointed to a crude diagram on a pizza box. “Four p.m., the clown arrives. That’s you.” “I am not a clown, I am a performance artist in crisis,” she said. “And that wig smells like a hamster graveyard.” “Your crisis is how we get in. Parents are distracted. I follow behind with the magic kit.” She squinted. “Yes, the ‘magic kit’ that rattles suspiciously like bolt cutters, duct tape, and my deep regret.” Bill beamed. “Exactly. We mingle. You juggle something nonlethal. I scout the present table.” “You mean that mountain of commercialized affection by the window,” Molly said. “Where’s your exit?” Bill tapped the back door. “Kitchen. Sliding glass. Two locks, both insultingly optimistic. I slip out, grab the presents, load them in the laundry basket. Nobody suspects the laundry basket.” “Bill, they’re middle‑class parents,” Molly said. “They suspect everything. They Google side effects of oxygen.” “Which is why we have misdirection,” he said. “At four thirty, I trigger Operation Sugarquake.” She groaned. “Is that the thing with the Mentos and the two‑liter cola?” “Two two‑liters,” he corrected. “I drop them behind the bouncy castle. Massive soda geyser. 
Children shrieking with delight, parents shrieking about the deposit. You wheel the presents out like a stressed aunt taking gifts to the car.” “You want me to look like a stressed aunt?” Molly said. “I can’t, I moisturize.” “Fine, a stressed yet luminous aunt,” Bill said. “Point is, nobody questions the person rescuing the loot from the chaos.” She paced, clown shoes squeaking ominously. “What if a kid follows me? They’re like raccoons with emotional needs.” “You deploy Phase Two,” Bill said. “You say, ‘Help me, sweetie, these go in the car for the birthday boy’s next surprise.’ Boom. Co‑opted. They’re accomplices and legally implicated.” “That is not how the law works,” Molly said. “But I respect the confidence.” He folded the invitation carefully. “Then we drive away, redistribute the gifts on the Thieves Without Borders exchange, and use the cash to fund more important crimes. Like my student loans.” “Bill, we are not fencing Lego to pay off your degree in comparative puppetry.” “Then why did I even go to college?” he muttered. She sighed. “Fine. But I’m keeping any art supplies. Criminal empire or not, I have hobbies.” “Deal,” he said. “One more thing: if anyone asks why the clown left with the presents, you say—” “I say, ‘It’s part of the show, kids,’” Molly said. “And then vanish in a puff of poor life choices.” Bill grinned. “Now you’re thinking like a professional.” She picked up the red nose, rolling it between her fingers. “Professional what, exactly?” He opened the storage unit door, letting in a slice of afternoon light. “The kind of professionals who can walk into a room full of sugar‑wired children,” he said, “and come out richer, still alive, and only mildly traumatized.” Molly shoved the nose on with a resigned huff. “All right. Let’s go ruin some childhood memories.” “Optimism,” Bill said, locking the unit behind them. “We’re just…reallocating joy.”
created_at: "2026-02-16T13:25:25.285357+00:00" (2/16/2026, 1:25:25 PM)
model: "openai/gpt-5.1-20251113"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 11759
moderation_latency: (null)
generation_time: 36759
tokens_prompt: 77
tokens_completion: 1121
native_tokens_prompt: 66
native_tokens_completion: 1343
native_tokens_completion_images: 0
native_tokens_reasoning: 421
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0135125
router: (null)
provider_responses:
0:
id: "resp_049e265d67c7517e0169931a9fe95c81949eb115233224ef45"
status: 200
is_byok: false
latency: 176
endpoint_id: "764eb97f-8bab-4326-b29b-7a8799b00a70"
provider_name: "OpenAI"
model_permaslug: "openai/gpt-5.1-20251113"
api_type: "completions"
id: "gen-1771248287-BSv1exSqnhsg0O0J3lIi"
upstream_id: "resp_049e265d67c7517e0169931a9fe95c81949eb115233224ef45"
total_cost: 0.0135125
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
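The reported cost of 0.0135125 is consistent with simple per-token billing on the native token counts. The sketch below is a minimal check under stated assumptions: the rates of $1.25 per million input tokens and $10.00 per million output tokens are not given on this page, and reasoning tokens are assumed to be already included in native_tokens_completion. The function name estimate_cost is hypothetical.

```python
# Assumed rates in USD per one million tokens; NOT stated on this page.
INPUT_RATE_PER_M = 1.25
OUTPUT_RATE_PER_M = 10.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request from native token counts.

    Assumes reasoning tokens are already counted in completion_tokens
    and that no cache discount applies.
    """
    input_cost = prompt_tokens * INPUT_RATE_PER_M / 1_000_000
    output_cost = completion_tokens * OUTPUT_RATE_PER_M / 1_000_000
    return input_cost + output_cost


# Values taken from this run's metadata:
# native_tokens_prompt = 66, native_tokens_completion = 1343
cost = estimate_cost(66, 1343)
print(f"{cost:.7f}")
```

Under these assumed rates, the estimate agrees with the usage and total_cost fields recorded for the run.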
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false; words: 627
1.1769% | Dialogue to Total Word Ratio | Ratio: 79.43%, Deviation: 9.43% | neededClean: false; wordsTotal: 627; wordsDialogue: 498
0.5884%
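The dialogue ratio and deviation reported above follow directly from the two word counts (498 dialogue words out of 627 total, against the 70% target in the scenario). The sketch below reproduces that arithmetic and shows one plausible way to obtain the counts. The counting method is an assumption, not the benchmark's confirmed evaluator: it treats whitespace-separated tokens as words and anything inside curly double quotes as dialogue. The names ratio_and_deviation and count_words are hypothetical.

```python
import re


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target_pct: float = 70.0) -> tuple[float, float]:
    """Return (ratio %, absolute deviation from target %), rounded to 2 places."""
    ratio_pct = 100.0 * words_dialogue / words_total
    return round(ratio_pct, 2), round(abs(ratio_pct - target_pct), 2)


def count_words(text: str) -> tuple[int, int]:
    """Return (dialogue words, total words) for a passage.

    Assumed method: words are whitespace-separated tokens, and dialogue is
    any text between an opening and closing curly double quote.
    """
    words_total = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(span.split()) for span in quoted_spans)
    return words_dialogue, words_total


# Counts taken from the evaluation metadata for this run.
print(ratio_and_deviation(498, 627))  # (79.43, 9.43)

# Small illustrative passage (not the benchmark's test data).
sample = "Molly sighed. “Run it again.” Bill nodded."
print(count_words(sample))  # (3, 7)
```

How the evaluator converts the deviation into the percentage score in the Result column is not shown on this page, so the sketch stops at the ratio and deviation.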