NC Bench
Run: 2026-02-16-01KHK989SFARY7NHSS66F0DAV8
Run Details
Model:
GPT-5.2
Test:
Dialogue tags
Scenario:
Write 500 words with 50% dialogue
Duration:
39.4s
Cost:
$0.021059
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind a hydrangea bush that smelled like soap and disappointment. Bill lurked beside her, wearing a black beanie that made his ears look like offended dumplings. From the yard came the unmistakable sounds of a children’s party: squeals, a tinny pop song, and someone yelling about balloons as if they were a controlled substance.
“This is it,” Molly whispered. “Operation Gift Grab.”
Bill squinted at the pastel chaos. “It’s called a party, Molly. They’re not guarding state secrets.”
“They’re guarding presents,” she said. “Which are basically state secrets to eight-year-olds.”
Bill adjusted his beanie. “Just to confirm, we’re stealing all the presents.”
“All,” Molly said. “Every shiny box. Every crinkly bag. Even the weird ones shaped like dinosaurs.”
Bill’s face twitched. “But… why would anyone wrap something shaped like a dinosaur?”
“Children love dinosaurs,” Molly hissed.
“Children also love their parents,” Bill whispered back. “Doesn’t mean I understand it.”
Molly peeked through the leaves. “Look. Table by the fence. Pile of gifts. It’s like they want us to have them.”
“It’s like they want the birthday kid to have them,” Bill said. “Different pronouns. Important.”
Molly stared at him. “Don’t you start getting ethical on me now. Not when we’re ten yards from a mountain of potential profit.”
“Potential profit?” Bill asked. “What’s inside? A thousand tiny plastic bricks? A singing toothbrush? Socks with little sharks on them?”
“Sharks sell,” Molly said.
“I once tried to fence a plush unicorn,” Bill muttered. “The guy offered me five dollars and a lecture on childhood joy.”
Molly’s eyes narrowed. “Bill, focus. We go in, we act natural, we leave with… unnatural amounts of loot.”
“Define ‘act natural,’” Bill said. “Because last time you tried to act natural, you complimented a security guard’s ‘excellent security vibes.’”
“That was flirting,” Molly said.
“With the concept of security,” Bill replied.
Molly took a breath. “Okay. New plan. We blend in as parents.”
Bill looked down at himself. “Do I look like a parent?”
“You look like someone who’s been yelled at by a parent,” Molly said. “That’s close enough. I’ll do most of the talking.”
“I don’t like this,” Bill said, watching a clown in the yard attempt a cartwheel and fail in slow motion. “There’s a clown.”
“So?” Molly asked.
“There’s always a clown,” Bill said. “Clowns ruin everything. They see too much.”
Molly considered that. “Fine. We blend in as… hired entertainment.”
Bill blinked. “We are not entertaining.”
Molly pointed at the clown now lying on the grass. “The bar is underground.”
Bill hesitated. “What would our act be?”
“Distraction,” Molly said. “You juggle. I… do something mysterious.”
“I can’t juggle,” Bill said.
“You can’t steal either, and here we are,” Molly replied.
Bill leaned in. “Molly, this is a kids’ party. What if we take all the presents and it’s, like, a bunch of books and puzzles and wholesome stuff? We’d be stealing… personal growth.”
Molly’s mouth opened, then closed. “Okay, that sounded awful when you said it.”
“And what if one of the gifts is, I don’t know, a handmade card? Are we gonna pawn sentiment?”
Molly watched a kid sprint past with a paper crown, shrieking with pure, unfiltered joy. For a moment, her face softened in a way that alarmed Bill.
“Don’t do that,” Bill whispered. “Don’t have feelings. Feelings are loud.”
“I’m thinking,” Molly said.
“Thinking leads to feelings,” Bill said. “It’s a gateway activity.”
Molly sighed. “All right. Revised objective.”
Bill perked up. “We’re not doing it?”
“We’re doing it,” Molly said quickly, then hesitated. “But… not like that.”
Bill waited.
Molly nodded toward the gift pile. “We steal one thing.”
“One?” Bill repeated, scandalized. “Molly, that’s not a heist. That’s a misunderstanding.”
“One thing,” she insisted. “We take the clown.”
Bill stared. “The clown is not a present.”
“Then we return him,” Molly said. “To wherever clowns come from. It’s a public service.”
Bill snorted despite himself. “You want to kidnap a clown out of compassion.”
“Out of strategy,” Molly said. “If we remove the clown, the party improves. People will thank us. Maybe they’ll offer us cake.”
Bill swallowed. “Cake is… pretty good.”
Molly straightened her shoulders. “We’re going in.”
Bill took one last look at the pile of wrapped joy. “If we end up doing the right thing by accident again, I’m quitting.”
Molly smiled grimly. “Relax. It’ll still feel wrong. We’ll just be wrong in a helpful direction.”
created_at
"2026-02-16T13:12:31.241467+00:00"
2/16/2026, 1:12:31 PM
model
"openai/gpt-5.2-20251211"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
10972
moderation_latency
(null)
generation_time
39356
tokens_prompt
77
tokens_completion
1477
native_tokens_prompt
66
native_tokens_completion
1496
native_tokens_completion_images
0
native_tokens_reasoning
407
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0210595
router
(null)
provider_responses
0
id
"resp_0a711128b3096566016993179769248195af94ec33d4566f52"
status
200
is_byok
false
latency
140
endpoint_id
"f00142c2-6a93-49ce-9e36-5593b904ce3b"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5.2-20251211"
api_type
"completions"
id
"gen-1771247511-xSec9jZ0xkH48bJFdFKS"
upstream_id
"resp_0a711128b3096566016993179769248195af94ec33d4566f52"
total_cost
0.0210595
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
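The timing and token fields above can be cross-checked against the "Duration" shown in Run Details. The following is a minimal sketch, not part of the benchmark itself; it assumes generation_time is reported in milliseconds (which is consistent with 39356 matching the 39.4s duration) and that native_tokens_completion is the right numerator for a throughput figure. The dictionary name meta is hypothetical.

```python
# Values copied from the Meta fields of this run.
meta = {
    "generation_time": 39356,          # assumed to be milliseconds
    "native_tokens_completion": 1496,
    "total_cost": 0.0210595,           # USD, as reported
}

# Convert the generation time to seconds to compare with "Duration: 39.4s".
seconds = meta["generation_time"] / 1000

# Rough completion throughput over the whole generation window.
tokens_per_second = meta["native_tokens_completion"] / seconds

print(f"duration: {seconds:.1f}s")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
print(f"cost: ${meta['total_cost']:.6f}")
```

The throughput figure is only an approximation, since the window may include time before the first token was produced.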
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false, words: 739
0.0005% | Dialogue to Total Word Ratio | Ratio: 65.63%, Deviation: 15.63% | neededClean: false, wordsTotal: 739, wordsDialogue: 485
0.0002%
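The reported ratio and deviation follow directly from the word counts: 485 dialogue words out of 739 total, measured against the 50% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It is an illustration under stated assumptions, not the evaluator's actual code: the function names are hypothetical, dialogue is assumed to be any text between double quotes (curly or straight), and words are assumed to be whitespace-separated, so counts on the real response may differ from the evaluator's.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a prose passage.

    Assumption: dialogue is any span enclosed in double quotes,
    and a word is any whitespace-separated token.
    """
    spans = re.findall(r'[“"]([^”"]*)[”"]', text)
    dialogue_words = sum(len(span.split()) for span in spans)
    total_words = len(text.split())
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio: float, target: float = 0.5) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Reproduce the reported figures from the counts in the table.
reported_ratio = 485 / 739
print(f"{reported_ratio:.2%}")                         # 65.63%
print(f"{deviation_from_target(reported_ratio):.2%}")  # 15.63%

# Word-count overshoot relative to the 500-word target.
print(f"{(739 - 500) / 500:.2%}")                      # 47.80%

# Small usage example on a two-sentence sample.
sample = "“Hi there,” she said. “Go.”"
print(dialogue_ratio(sample))
```

How these measurements map to the percentages in the Result column is not stated on the page, so the sketch stops at the ratio and deviation.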