NC Bench
Run: 2026-02-16-01KHKA1NPFJQQBZN4RC73T9S88
Run Details
Model:
GPT-5.2
Test:
Dialogue tags
Scenario:
Write 500 words with 50% dialogue
Duration:
2m
Cost:
$0.110086
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
"Okay, Bill," Molly whispered, crouched behind the hedge. "We hit the party at two-thirty. The sugar crash begins at three. Presents are our window." Bill squinted through his binoculars. "There are clowns." "Clowns are just thieves with makeup," Molly said. "Stay focused." "They have a bouncy castle," Bill murmured. "That’s basically a moat." Molly tapped the sketchpad on her knee. "We go in as entertainment. You juggle. I do balloon animals." "I can’t juggle." "You can drop things rhythmically. That counts." Bill sniffed. "What about the parents?" Parents are easy. Compliment their hummus and they’ll confess their bank codes. He shifted, and a twig snapped under his shoe. A nearby chihuahua barked like a smoke alarm with opinions. "Shh," Molly hissed. "We are shadows." "I’m more of a smudge," Bill said. Molly pointed at the driveway. See the gift table by the sliding door? Thirty-two bags. Two bikes. One suspiciously large box shaped like a drone. Bill licked his lips. "I always wanted a drone." "You always wanted a printer too, and look how that ended." "It ended in paper jams and tears." "Exactly. Now costumes." Molly rummaged in her backpack and produced two party hats and a plastic tiara. "Disguise is confidence with glitter." Bill held the tiara up. "I’m not wearing this." "You’re wearing it," she said. "If anyone asks, you’re the Birthday Princess’s uncle from Sweden." "My accent sounds like a lawnmower choking." "Perfect. Swedes love lawn care." Bill watched a dad carry out a tray of cupcakes. "The security is… frosting-based." "Don’t underestimate sprinkles," Molly said. "They’re edible shrapnel." He frowned. "What’s the exit plan?" We blend out with the departing children. They’re already trained to clutch bags and flee. Bill chewed his thumbnail. "Won’t anyone notice two adults leaving with forty presents?" We’ll stagger them. You leave with the big items, because you look like a man with back problems. "I do have back problems." 
"Method acting," she said. I leave with the gift bags, because I have the face of a woman who’d like to speak to the manager. Bill’s eyes widened. "You do." Molly plucked a strand of tinsel from his ear. "Remember: smile. Say ‘Aww.’ Make your eyes go shiny." "My eyes only go watery," Bill said. "Then think of your childhood," Molly said. "Or taxes." He took a deep breath. What if the kid opens the presents early? "Then we improvise," Molly said. "We steal the cake. Nobody guards cake." Bill hesitated. "Is this… too mean?" Molly stared at the banner, then at the pile of perfectly wrapped loot. "Bill, we’re not monsters. We’re professionals." "Professionals who rob eight-year-olds," he said. "We’re redistributing joy," she replied. "From the overgifted to the underfunded. Us." Bill nodded slowly. "Okay. On three?" "On three," Molly said. "One. Two. Three." They both stayed crouched. Bill whispered, "I thought you meant go on three." "I meant commit emotionally," Molly said. "Now go. And for the love of crime, don’t eat the piñata."
created_at
"2026-02-16T13:27:43.203309+00:00"
2/16/2026, 1:27:43 PM
model
"openai/gpt-5.2-20251211"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
4726
moderation_latency
(null)
generation_time
119883
tokens_prompt
77
tokens_completion
2225
native_tokens_prompt
66
native_tokens_completion
7855
native_tokens_completion_images
0
native_tokens_reasoning
7114
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.1100855
router
(null)
provider_responses
0
id
"resp_07d1723a8281f1c80169931ad6d2ec81969e09ece6808083b3"
status
200
is_byok
false
latency
167
endpoint_id
"f00142c2-6a93-49ce-9e36-5593b904ce3b"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5.2-20251211"
api_type
"completions"
id
"gen-1771248342-l81VwcJP4ruc8RUyy8g9"
upstream_id
"resp_07d1723a8281f1c80169931ad6d2ec81969e09ece6808083b3"
total_cost
0.1100855
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
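The token and timing fields above can be cross-checked with a little arithmetic. The sketch below rests on two assumptions that the page does not state: that `native_tokens_reasoning` is counted inside `native_tokens_completion`, and that `generation_time` is in milliseconds (which would agree with the 2m duration shown in the run details). The variable names are illustrative only.

```python
# Values copied from the run metadata above.
native_tokens_completion = 7855
native_tokens_reasoning = 7114
generation_time_ms = 119883  # assumed to be milliseconds

# Assumption: reasoning tokens are a subset of completion tokens,
# so the remainder is the visible (non-reasoning) output.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion

generation_seconds = generation_time_ms / 1000
tokens_per_second = native_tokens_completion / generation_seconds

print(f"visible output tokens: {visible_tokens}")
print(f"reasoning share of completion: {reasoning_share:.1%}")
print(f"generation time: {generation_seconds:.1f} s")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under those assumptions, roughly nine in ten completion tokens went to reasoning rather than to the 501-word scene itself.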
Evaluation details
Result | Evaluator | Details | Meta Data
99.9990% | Matches word count | n/a | neededClean: false; words: 501
97.5893% | Dialogue to Total Word Ratio | Ratio: 50.70%, Deviation: 0.70% | neededClean: false; wordsTotal: 501; wordsDialogue: 254
98.7942%
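The reported ratio and deviation follow directly from the word counts in the evaluation metadata (254 dialogue words out of 501). Below is a minimal sketch of how such a ratio might be measured and checked. The function `dialogue_ratio` and its rule of treating text between straight double quotes as dialogue are assumptions made for illustration, not the benchmark's actual evaluator. The page also does not say how a deviation maps to a percentage score, so that step is left out.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a passage.

    Hypothetical rule: anything between a pair of straight double
    quotes counts as dialogue; words are split on whitespace.
    """
    spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in spans)
    total_words = len(text.split())
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Reproduce the reported figures from the metadata counts.
reported_ratio_pct = round(254 / 501 * 100, 2)          # ratio as a percentage
reported_deviation_pct = round(abs(reported_ratio_pct - 50.0), 2)

# The final score on the page is consistent with the mean of the two
# evaluator scores, though the page does not state how it is derived.
mean_score = (99.9990 + 97.5893) / 2

sample = '"Okay, Bill," Molly whispered. "We go at two."'
print(dialogue_ratio(sample))
print(reported_ratio_pct, reported_deviation_pct, mean_score)
```

A quote-based counter like this would miss any spoken line that lacks quotation marks, so its counts may differ from the evaluator's on the same text.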