NC Bench
Run: 2026-02-16-01KHK9K7AJCN8V8M5XE66H8K19
Run Details
Model:
GPT-5.1
Test:
Dialogue tags
Scenario:
Write 500 words with 30% dialogue
Duration:
44.0s
Cost:
$0.019223
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The surveillance van was technically just Bill’s mom’s minivan with a pair of binoculars on the dashboard and a “LIVE LAUGH LOVE” decal that greatly diminished any sense of menace. Molly sat behind the wheel, watching the inflatable unicorn in the yard sway accusingly in the breeze. “You understand we're about to rob a herd of toddlers, right?” she asked, tapping the steering wheel. Bill peered through the binoculars as if expecting laser grids instead of bunting and a bubble machine. “Technically, we're redistributing resources from the sugar-rich to the poor.” “The poor being you, me, and your questionable gaming habit.” “Hey, those loot boxes aren't going to open themselves, Molly.” On the lawn, a woman in a glittery sash that read MOMAGER arranged a table of color-coded cupcakes while a banner screamed HAPPY 6TH, KAYLA. The presents table was groaning under a rainbow of wrapping paper. From here, Molly could practically hear their crinkly, expensive secrets. “Focus. Party starts at three, clown arrives at three fifteen.” Bill checked his watch, then the clown icon Molly had drawn on their “blueprint,” which was, in fact, a stolen coloring sheet. “I'm still emotionally processing that the clown outranks us operationally.” “He controls balloon distribution; that makes him high command.” They had cased banks, museums, even a casino with its own moat and a very alert swan. None of those targets had unnerved Molly as much as the memory of a toddler with frosting strength and no moral compass. “So we stick to the plan: infiltrate, acquire, exfil, diabetes.” “There is no diabetes phase, Bill, only cardio-related consequences.” He pushed his magician’s cape aside to reach for the duffel bag at his feet. It contained rope, duct tape, a collapsible wagon, and an emergency juice box because Molly had insisted on contingency planning. “Fine, we steal, we run, we question our life choices.” Molly adjusted her top hat in the rearview mirror. 
The reflection of the unicorn bobbed over her shoulder, judgmental and pink. “Masks on at two fifty-eight; we go in as magicians.” Bill’s own hat was slightly crushed, his eyeliner wing unintentionally smoky. “If anyone asks, I'm the assistant who tragically lost the rabbit.” “You lost the rabbit because you were emotionally unavailable, obviously.” He snorted, then watched as a stream of tiny guests spilled onto the lawn, each clutching a gift bag like a small, sticky briefcase. Somewhere inside, Katy Perry began to compete with a blender. “Do we at least let the kids keep one present?” “Yes. The educational one. We're thieves, Bill, not complete monsters.” A gust of wind sent a paper plate skidding across the sidewalk like a white, wobbly omen. Molly started the engine, the minivan trembling with the effort. “Remember,” she said, eyeballing the chaos ahead, “children are short, unpredictable, and often armed with glitter. Eyes up, hands quick, no sudden movements.” Bill swallowed. “And if we’re caught?” Molly shifted into drive. “Then we do the scariest trick of all,” she said. “We volunteer for the clown’s audience participation.”
created_at
"2026-02-16T13:18:33.498122+00:00"
2/16/2026, 1:18:33 PM
model
"openai/gpt-5.1-20251113"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
22554
moderation_latency
(null)
generation_time
44013
tokens_prompt
77
tokens_completion
983
native_tokens_prompt
66
native_tokens_completion
1914
native_tokens_completion_images
0
native_tokens_reasoning
1233
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0192225
router
(null)
provider_responses
0
id
"resp_0e6c3f4f25995d5701699318fd57f08196be83ad5fa31b271f"
status
200
is_byok
false
latency
147
endpoint_id
"764eb97f-8bab-4326-b29b-7a8799b00a70"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5.1-20251113"
api_type
"completions"
id
"gen-1771247869-4gHE9TYhyyBxieFuK2OS"
upstream_id
"resp_0e6c3f4f25995d5701699318fd57f08196be83ad5fa31b271f"
total_cost
0.0192225
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
93.6496% | Matches word count | n/a | neededClean: false; words: 509
4.3529% | Dialogue to Total Word Ratio | Ratio: 37.92%, Deviation: 7.92% | neededClean: false; wordsTotal: 509; wordsDialogue: 193
49.0012%
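The dialogue-ratio figures above can be reproduced from the reported word counts. The following is a minimal sketch, not NC Bench's actual evaluator: the function names, the quote-matching rule (dialogue is any text between double quotes, curly or straight), and whitespace-based word counting are all assumptions made for illustration. The 30% target comes from the scenario. How the evaluator turns a deviation into its Result percentage is not stated on this page, so the sketch does not attempt it.

```python
import re

# Assumption: dialogue is any text enclosed in double quotes (curly or straight).
DIALOGUE_RE = re.compile(r'“([^”]*)”|"([^"]*)"')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def ratio_stats(words_dialogue: int, words_total: int, target_pct: float = 30.0) -> dict:
    """Return the dialogue ratio and its absolute deviation from the target, in percent."""
    ratio = 100.0 * words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": round(ratio, 2),
        "deviation": round(abs(ratio - target_pct), 2),
    }


def dialogue_stats(text: str, target_pct: float = 30.0) -> dict:
    """Hypothetical helper: extract quoted spans from a scene and compute the stats."""
    spans = [curly or straight for curly, straight in DIALOGUE_RE.findall(text)]
    words_dialogue = sum(count_words(span) for span in spans)
    return ratio_stats(words_dialogue, count_words(text), target_pct)


# Counts reported for this run: 193 dialogue words out of 509 total.
reported = ratio_stats(193, 509)
print(reported["ratio"], reported["deviation"])

# Observation only: the final percentage on the page is consistent with the
# mean of the two evaluator results; the page does not confirm this is how
# it is computed.
mean_of_results = (93.6496 + 4.3529) / 2
print(mean_of_results)
```

With the reported counts, the sketch yields a ratio of 37.92% and a deviation of 7.92 percentage points from the 30% target, matching the Details column. Running `dialogue_stats` on the raw response may give slightly different counts if the real evaluator tokenizes or cleans text differently.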