NC Bench
Run: 2026-02-16-01KHKAS7B64XRM1Y1EQ47HW2Y2
Run Details
Model: GPT-5.1
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 1m 24s
Cost: $0.067233
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly pressed her nose against the bakery window overlooking the community center. Balloons bobbed above the building like a weather forecast for chaos. “Look at that,” she said. “Tiny humans. Mountains of unguarded loot.”
“Presents,” Bill corrected. “We call them presents until they’re in the van.”
“They’ll be in the van in, what, an hour?” Molly asked.
“Forty-five minutes if the magician bombs.” Bill unfolded a crumpled flyer. “Two p.m., ‘Timmy’s Superhero Birthday Blast.’ Capes, cake, and consumerism. It’s beautiful.”
“Superhero party,” Molly mused. “So they’ll all believe they have powers?”
“Exactly,” Bill said. “If anyone sees us, they’ll blame Laser Boy or Glitter Girl.”
Molly peered across the street. “Is that a bouncy castle?”
“Tragically, yes,” Bill said. “One misstep and we’re launched into the sky like criminal confetti.”
“They’re kids,” Molly said. “Their main skill is sticking fingers into things labeled ‘don’t touch.’”
“That’s also our main skill,” Bill pointed out. “Professional kinship. Now, entry plan.”
“Front door,” Molly said. “We walk in like we belong.”
“As what?” Bill asked. “Aggressively underdressed relatives?”
“Party entertainers,” Molly said. “We juggle. We smile. We relocate gifts.”
“You can’t juggle,” Bill reminded her.
“I can drop things repeatedly with flair,” Molly said. “Children love slapstick.”
Bill rummaged in his bag. “I got disguises. Behold: clown wigs.”
Molly recoiled. “Absolutely not. I’m a thief, not a nightmare.”
“You robbed a bank in a pineapple costume,” Bill said. “This is a step up.”
“That was conceptual,” Molly said. “Fruit-based social commentary.”
“Clowns are traditional,” Bill insisted. “Parents see clowns, they assume someone else hired us.”
“Clowns are what you see in court right before sentencing,” Molly muttered. “Fine. But I’m a tasteful clown.”
“There is no tasteful clown,” Bill said. “You’ll be Giggles. I’ll be Mr. Birthday.”
“I refuse to answer to Giggles,” Molly said. “Call me… Cashmere.”
“You’re a clown named Cashmere?” Bill asked. “That sounds like a perfume for emotional damage.”
Molly sighed. “Okay, fine, Giggles. Where are the presents?”
Bill pointed. “Under that big superhero banner.”
“The one that literally says ‘PRESENTS’?” Molly asked. “Subtle.”
“They’re six,” Bill said. “If it isn’t labeled, they cry.”
“Children cry anyway,” Molly said. “It’s their cardio.”
“Good,” Bill said. “Crying is noisy. Noise is cover.”
“How do we time it?” Molly asked.
“Cake moment,” Bill said. “Lights go off, everyone sings, sugar fogs their minds. We swoop.”
“And if some kid sees us dragging a wagon of gifts?” Molly asked.
“We compliment their cape and say we’re taking them to the Secret Superhero Vault,” Bill said. “Kids love vaults.”
Molly hesitated. “What if Timmy cries when he opens nothing?”
“Timmy will cry regardless,” Bill said. “Because someone will give him socks shaped like moral lessons.”
“So we’re preventing that,” Molly said. “Robin Hoods of plastic.”
“Exactly,” Bill said. “We redistribute cheap toys to needy thieves.”
Molly adjusted the wig with a grimace. “If this plan fails, I’m haunting you.”
“In that wig,” Bill said, “you’ll haunt me either way.”
She straightened, squared her shoulders, and nodded toward the community center. “Come on, Mr. Birthday,” Molly said. “Let’s go make some children slightly less fortunate.”
created_at: "2026-02-16T13:39:58.963247+00:00" (2/16/2026, 1:39:58 PM)
model: "openai/gpt-5.1-20251113"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3913
moderation_latency: (null)
generation_time: 84013
tokens_prompt: 77
tokens_completion: 2052
native_tokens_prompt: 66
native_tokens_completion: 6715
native_tokens_completion_images: 0
native_tokens_reasoning: 5891
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0672325
router: (null)
provider_responses[0]:
  id: "resp_04c1ed3f1cfa78e00169931dda8e448190a5cb736a555cdfe7"
  status: 200
  is_byok: false
  latency: 141
  endpoint_id: "764eb97f-8bab-4326-b29b-7a8799b00a70"
  provider_name: "OpenAI"
  model_permaslug: "openai/gpt-5.1-20251113"
  api_type: "completions"
id: "gen-1771249114-2hWacpJT1cIEPzRU5fy1"
upstream_id: "resp_04c1ed3f1cfa78e00169931dda8e448190a5cb736a555cdfe7"
total_cost: 0.0672325
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
14.3014% | Matches word count | n/a | neededClean: false; words: 521
33.6992% | Dialogue to Total Word Ratio | Ratio: 74.66%, Deviation: 4.66% | neededClean: false; wordsTotal: 521; wordsDialogue: 389
24.0003%