NC Bench
Run: 2026-03-18-01KM0A0XZ6C5ME5M1QKAQPBG4W
Run Details
Model: GPT-5.4 Mini (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 7.3s
Cost: $0.004226
Provider: OpenAI
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
0.0000% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread a crinkled party flyer across the kitchen table like a general unrolling a battle map. “There it is,” she whispered. “Number twelve, Beech Lane. A children’s party. Cupcakes. Balloons. Organized chaos.” Bill peered over her shoulder, squinting at the paper. “Why are children’s parties always listed with exclamation points? Do they think punctuation makes them safer?” “It makes them harder to respect,” Molly said. “Which is perfect for us.” Bill sat back and folded his arms. “I still think stealing presents from children is a bit below our usual standard.” Molly raised an eyebrow. “Our usual standard includes robbing rich people who can afford to be robbed. This is the same, except smaller chairs.” Bill tapped the flyer. “What exactly is the angle?” “The angle,” Molly said, “is that adults at children’s parties are exhausted, overcaffeinated, and emotionally compromised. They are juggling balloons, wet wipes, and a bouncy castle they didn’t read the manual for. No one notices a couple of harmless-looking guests walking out with a pile of gift bags.” Bill frowned. “Harmless-looking? You?” “I can look harmless.” “You look like you’d negotiate a hostage exchange with a stapler.” Molly grinned. “Thank you. That’s the nicest thing you’ve said all week.” He leaned forward. “How many presents are we talking?” Molly pulled a notebook from her coat. “According to the source, at least twenty-seven. Remote-control drone, seven building sets, three identical teddy bears, one extremely expensive train set, and something called a ‘magical emotion robot.’” Bill blinked. “Emotion robot?” “Apparently it reassures children when they are upset.” “Does it work on adults?” “Only if adults are simple.” Bill snorted. “So, what’s the plan, precisely?” “Phase one: arrive separately, dressed as responsible people.” Bill looked down at his cardigan. “I do have a cardigan.” “Exactly. 
You’ll look like a man who can be trusted near a punch bowl.” “And you?” “I’ll be the aunt.” “You don’t look like an aunt.” “I will wear a scarf and speak in a lower register.” Bill nodded slowly. “Right. And phase two?” “Phase two: make the organizers think we’re there for the children’s entertainment.” Bill stared at her. “What entertainment?” Molly reached into her bag and placed two collapsible origami rabbits on the table. “I found them in a drawer.” Bill examined the paper rabbits. “Those are appalling.” “They’re unforgettable.” “They’re the kind of unforgettable that gets you remembered in a police sketch.” “Only if the police are very artistic.” He laughed, despite himself. “And phase three?” “Phase three is the extraction. While everyone’s distracted by sugar, noise, and whatever demented game involves passing a parcel to the tune of a kazoo, we collect the presents, load them into the laundry hamper, and walk out through the side gate.” Bill was quiet for a moment. “That feels too easy.” Molly pointed at him. “That is how you know it’s dangerous.” He drummed his fingers on the table. “What if the children notice?” Molly shrugged. “Children are famously unreliable witnesses.” “That is a terrible sentence.” “It’s also true.” Bill looked at the flyer again. “What about the birthday child?” Molly softened. “We leave that one a present.” Bill squinted. “We are stealing all the presents.” “Exactly. One small gesture keeps the moral compass spinning.” He sighed. “You are impossible.” “And yet,” Molly said, folding the flyer, “you’re still here.” Bill stood, smoothing his cardigan. “Fine. But if we’re caught, I’m saying you forced me into it.” Molly tucked the notebook into her coat. “If we’re caught, we were never here.” He opened the door, then paused. “Molly?” “Yes?” He glanced back at the table, at the little paper rabbits, at the ridiculous plan. “If there’s cake, we’re not stealing the cake.” Molly smiled. “Bill, we are thieves, not animals.”
created_at
"2026-03-18T11:06:49.716Z"
3/18/2026, 11:06:49 AM
model
"openai/gpt-5.4-mini-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
2788
moderation_latency
(null)
generation_time
7074
tokens_prompt
77
tokens_completion
1275
native_tokens_prompt
66
native_tokens_completion
928
native_tokens_completion_images
0
native_tokens_reasoning
42
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0042255
router
(null)
provider_responses
0
endpoint_id
"9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id
"resp_0d8ebf40231c92550169ba8749cec48196a88fb58d1caa1ace"
is_byok
false
latency
239
model_permaslug
"openai/gpt-5.4-mini-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773832009-AMs5bcDYeha3WI3zJMJp"
upstream_id
"resp_0d8ebf40231c92550169ba8749cec48196a88fb58d1caa1ace"
total_cost
0.0042255
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 626
0.00% | Dialogue to Total Word Ratio | Ratio: 72.66%, Deviation: 22.66% | neededClean: false; wordsTotal: 629; wordsDialogue: 457
0.0000%
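The reported ratio and deviation follow directly from the word counts in the meta data: 457 dialogue words out of 629 total, measured against the 50% target from the scenario. The sketch below reproduces that arithmetic. The helper names (`count_words`, `dialogue_words`, `ratio_and_deviation`) are hypothetical, and the counting rules (whitespace-separated words, dialogue taken as text between curly quotes) are assumptions. The benchmark's own tokenizer is not shown on this page, and the two evaluators report different totals (626 and 629), so its rules evidently differ somewhere. How the deviation is turned into the 0.00% score is also not shown, so it is not modeled here.

```python
import re


def count_words(text: str) -> int:
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split())


def dialogue_words(text: str) -> int:
    # Assumption: dialogue is whatever sits between curly quotes,
    # as in the model response above.
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int, target: float = 0.5):
    # Ratio of dialogue words to all words, and its absolute distance
    # from the target share (50% in this scenario).
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Counts taken from the evaluation meta data on this page.
ratio, deviation = ratio_and_deviation(457, 629)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 72.66%, Deviation: 22.66%"
```

Running `dialogue_words` and `count_words` over the full response would give counts under these simplified rules, which may not match the evaluator's 457 and 629 exactly.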