NC Bench
Run: 2026-03-18-01KM07B6GHS25QBZNECH1FN019
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 6.1s
Cost: $0.004950
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 72.52%
Overall: 36.2579%
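The Overall figure is consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers shown here, not something the page states, so the sketch below only checks the arithmetic under that assumption. The variable names are illustrative.

```python
# Scores as displayed in the evaluation summary, in percent.
scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 72.52,
}
overall_reported = 36.2579

# Assumption: Overall is the plain average of the evaluator scores.
overall_estimate = sum(scores.values()) / len(scores)

# The displayed scores are rounded, so compare with a small tolerance.
agrees = abs(overall_estimate - overall_reported) < 0.01
print(f"estimate={overall_estimate:.2f}%, reported={overall_reported}%, agrees={agrees}")
```

The small gap between the estimate and the reported value would be explained by the per-evaluator scores being rounded to two decimals for display.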
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly pressed her face to the rain-streaked window of the van and stared at the brightly lit house across the street. “There. Blue balloons. Clown banner. One excellent opportunity.” Bill adjusted his cheap silk gloves. “You say that like it’s a bank vault.” “It is, if the bank vault were full of dinosaurs made of wrapping paper.” Bill squinted. “How many presents are we expecting?” “According to our intelligence,” Molly said, tapping a crumpled flyer, “twenty-six.” Bill whistled. “That’s a lot of childhood joy to carry off.” “Try not to say it like you’re proud of yourself.” “I am proud of myself. This is a sophisticated operation.” Molly looked at him. “You brought a whistle.” “For emergencies.” “What emergencies?” “I don’t know yet. That’s what makes them emergencies.” She took a long breath. “Listen carefully, Bill. We go in through the side gate, past the inflatable giraffe, and into the conservatory. The party is in the garden. Parents distracted, children sugared to the point of collapse. We move quickly.” Bill nodded. “In and out. Clean. Elegant.” “Exactly.” “And the presents are in—” “Under the craft table.” Bill frowned. “Under the craft table?” “Why are you surprised?” “Because that’s the kind of hiding place invented by someone who has never met a thief.” “They’re parents, Bill. Their security is emotional.” He looked impressed. “You are really good at this.” “I know.” “Should we have disguises?” Molly held up two paper party hats. “We do.” Bill took one and examined it. “This is less disguise and more confession.” “You want glamour, date a magician.” Bill put the hat on anyway. “Do I at least get a fake mustache?” “You already have one. It’s just bad.” He reached up and touched his upper lip. “This is a strong look.” “This is a look that says, ‘I have a history of poor decisions.’” “Perfect. The children will never suspect me.” Molly pointed toward the house. “Focus. We need the gift bags first. 
If anyone asks, we’re the entertainment.” Bill brightened. “I can do entertainment.” “No, Bill. We are not letting you do entertainment.” “I’ve got a routine.” “You’ve got a condition.” “I’ve got a scarf and a deck of cards.” “You once accidentally hypnotized yourself.” “That was one time.” “You saluted a lamp for forty minutes.” “The lamp had presence.” Molly pinched the bridge of her nose. “Here’s the plan. I distract the host. You wheel in the trolley.” Bill glanced at the shopping trolley in the back of the van, its wheels painted gold. “This trolley looks expensive.” “It was when I stole it.” “Charming. What do I distract the host with?” “Say you’re from catering.” “I don’t know anything about catering.” “Perfect. Confidence is ninety percent of catering.” “And the other ten?” “Confused smiling.” Bill leaned closer. “What if the children follow us?” “They won’t. We’ll bring the bowl of party rings.” “That feels morally dubious.” “We are stealing all the presents, Bill.” “I know, but party rings are a bridge too far.” Molly opened the door and the cold night rushed in. “You wanted a beautiful crime. This is it.” Bill peered at the house again. “I’ve never robbed a child’s birthday before.” “Neither have I.” He blinked. “That doesn’t reassure me.” “It should. It means we’re being ambitious.” He picked up the trolley handle. “If this goes wrong, I want it known I was against the clown.” “There is no clown.” “There should be. It would explain the fear in my soul.” Molly smiled thinly. “Ready?” Bill took a deep breath. “For the record, I think this is a terrible idea.” “Yes.” “And I think whoever invented gift wrap should be in prison.” “Also yes.” “And if there is cake, I’m having some.” “Absolutely not.” He opened the side gate and whispered, “Then let’s commit the worst behaved burglary in suburban history.” Molly slipped in beside him, lifted her hat, and said, “Quietly, Bill. 
We’re professionals.” From the garden came the roar of children, the glow of fairy lights, and the unmistakable sound of a parent saying, “Who let the raccoon in?” Bill froze. “Did they say raccoon?” Molly stared toward the hedges. “Bill.” “Yes?” “Tell me you did not bring the whistle.” He slowly raised it. “Define bring.”
created_at: "2026-03-18T10:20:00.413Z" (3/18/2026, 10:20:00 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1529
moderation_latency: (null)
generation_time: 5929
tokens_prompt: 77
tokens_completion: 1322
native_tokens_prompt: 66
native_tokens_completion: 1089
native_tokens_completion_images: 0
native_tokens_reasoning: 102
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00495
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_0e24e4f05196d1720169ba7c5077e08194b8f04d4eb3232979"
    is_byok: false
    latency: 183
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773829200-0SwpaUcKZd3IFxqf6PFV"
upstream_id: "resp_0e24e4f05196d1720169ba7c5077e08194b8f04d4eb3232979"
total_cost: 0.00495
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 0.00%
  Details: n/a
  Meta Data: neededClean: false, words: 709
Dialogue to Total Word Ratio: 72.52%
  Details: Ratio: 72.54%, Deviation: 2.54%
  Meta Data: neededClean: false, wordsTotal: 710, wordsDialogue: 515
Overall: 36.2579%
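The reported ratio and deviation can be reproduced from wordsDialogue and wordsTotal, taking the 70% target from the scenario. The sketch below does that arithmetic and also shows one plausible way to count dialogue words, by treating text inside curly double quotes as dialogue. The function names and the quote-based counting rule are assumptions for illustration; the page does not describe how the evaluator actually tokenizes or scores, and the sketch does not attempt to reproduce the 72.52% score.

```python
import re

def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of dialogue words, as a percentage of all words."""
    return 100.0 * words_dialogue / words_total

def deviation_from_target(ratio_pct: float, target_pct: float = 70.0) -> float:
    """Absolute distance, in percentage points, from the target ratio."""
    return abs(ratio_pct - target_pct)

def count_dialogue_words(text: str) -> tuple[int, int]:
    """Hypothetical counter: words inside curly double quotes count as dialogue.

    Returns (dialogue_words, total_words), splitting on whitespace.
    """
    spoken = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(segment.split()) for segment in spoken)
    total = len(text.split())
    return dialogue, total

# Values taken from the evaluation details for this run.
ratio = dialogue_ratio(515, 710)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# A one-line sample from the model response.
sample = "Bill whistled. “That’s a lot of childhood joy to carry off.”"
print(count_dialogue_words(sample))
```

Whitespace splitting is only one possible word definition. The two evaluators report 709 and 710 total words for the same response, which suggests they do not share a single counting rule.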