NC Bench
Run: 2026-03-18-01KM09NTGTYMK4HRFM9708NW6W
Run Details
Model: GPT-5.4 Mini (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 5.7s
Cost: $0.004833
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 64.42%
Overall: 32.2110%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Molly, before we go any further, I need to know one thing,” Bill whispered, peering through the bakery’s fogged window at the brightly lit community hall across the street. “Are you absolutely certain this is a children’s party?”
“Bill,” Molly said, adjusting the black knit cap over her hair, “there are balloons the size of beach balls, a clown in the parking lot, and at least three small humans currently glued to the glass like terrified goldfish. Yes, it’s a children’s party.”
“I just wanted to rule out a very festive funeral.”
“Why would a funeral have a bouncy castle?”
Bill frowned. “Modern trends.”
Molly pressed her face closer to the window. “Look at the table by the stage. Cupcakes. Juice boxes. And presents. So many presents.”
“Big ones?”
“Wheeled ones.”
Bill gave a slow, reverent nod. “That means batteries.”
“That means expensive batteries,” Molly said. “And expensive batteries mean expensive toys, and expensive toys mean rich parents.”
“Or at least parents with dangerous optimism.”
“Bill, focus. We are not here to judge parenting. We are here to relieve it.”
“Such a noble profession,” he said. “You ever think about how our resume sounds?”
Molly ticked them off on her fingers. “Burglary, petty theft, burglary with snacks, trespassing, one accidental citizenship test fraud—”
“That was one time.”
“And now this.”
Bill leaned back against the bakery wall. “Stealing presents from a child’s birthday party feels... low.”
Molly turned to him, offended. “Low? Bill, we once stole a violin from an opera singer while disguised as ushers.”
“That violin was in a locked case.”
“Yes, and tonight those presents are in plain sight, wrapped in cheerful paper with ribbons. Far more vulnerable.”
“That is not how morality works.”
“No, but it is how opportunity works.”
Bill squinted at the hall. “What’s the plan, exactly?”
Molly smiled with the confidence of someone who had already ignored several red flags. “We go in through the side door during the cake song.”
“The cake song?”
“You know. When everyone sings, the adults are distracted, the kids are covered in frosting, and the noise levels become legally questionable.”
“You’ve really studied the social dynamics of third-grade celebration culture.”
“I read a lot.”
“You read instruction manuals for locks.”
“Still counts.”
Bill rubbed his hands together, though not from excitement so much as preemptive guilt. “How many presents?”
“Enough to make this worthwhile.”
“That is not a number.”
“It is in our world.”
“And then?”
“And then we leave before the magician finishes the rabbit trick.”
“Why?”
“Because magicians notice things.”
Bill nodded again. “True. They always look like they know about your taxes.”
Molly glanced at the clown by the parking lot. “You see that guy?”
“The clown? Yes, tragically.”
“He’s on the phone.”
“So?”
“So I think he’s security.”
Bill stared. “A clown security guard?”
“Children’s parties have evolved.”
“Next you’ll tell me the balloon animals are wired.”
Molly lowered her voice. “The ones near the gift table might be.”
Bill recoiled. “You’re joking.”
“Am I?”
He considered the hall, the clown, the squeals, the cake, the towering heap of presents waiting under a banner that read HAPPY BIRTHDAY, SUPERCOPTER! in glittery letters. Finally he sighed. “I hate that this is a good idea.”
Molly clapped him on the shoulder. “That’s the spirit.”
“No, that’s the problem.”
“Bill, listen to me. We go in, stay calm, act like we belong there.”
“We’re wearing black gloves.”
“Lots of dads wear black gloves.”
“Not indoors.”
“Then smile more.”
“I do not have a smile appropriate for a children’s party.”
“Yes, you do. It’s the one you use when you’re lying to a customs officer.”
Bill winced. “That was not my best afternoon.”
“Tonight will be better. We blend in, we take the presents, we vanish.”
“And if someone asks why we’re carrying six wrapped boxes?”
Molly grinned. “We say we’re the entertainment.”
Bill blinked. “What kind?”
“The kind that leaves early.” She nudged him toward the side alley. “Ready?”
He stared once more at the hall, at the frosting, the balloons, the chaotic innocence of it all. Then he muttered, “I can’t believe I’m saying this, but yes.”
Molly’s eyes gleamed. “Good. And Bill?”
“Yeah?”
“Try not to look like a man about to rob a child.”
He swallowed. “No promises.”
“Excellent,” she said. “That’s the attitude of a professional.”
created_at: "2026-03-18T11:00:45.735Z" (3/18/2026, 11:00:45 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 2411
moderation_latency: (null)
generation_time: 5374
tokens_prompt: 77
tokens_completion: 1473
native_tokens_prompt: 66
native_tokens_completion: 1063
native_tokens_completion_images: 0
native_tokens_reasoning: 46
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004833
router: (null)
provider_responses: 0
endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id: "resp_08545b8d093dc2410169ba85ddd410819693223b342079a991"
is_byok: false
latency: 241
model_permaslug: "openai/gpt-5.4-mini-20260317"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773831645-13SIvubimFJJf37AQzf6"
upstream_id: "resp_08545b8d093dc2410169ba85ddd410819693223b342079a991"
total_cost: 0.004833
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 0.00% (details: n/a; metadata: neededClean false, words 724)
Dialogue to Total Word Ratio: 64.42% (details: Ratio 72.97%, Deviation 2.97%; metadata: neededClean false, wordsTotal 725, wordsDialogue 529)
Overall: 32.2110%
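The ratio and deviation figures in the dialogue evaluator's row can be reproduced from the reported word counts. The sketch below is a minimal, hypothetical reconstruction that rests on three assumptions. First, the ratio is wordsDialogue / wordsTotal. Second, the deviation is the absolute distance from the 70% target named in the scenario. Third, the overall score is the unweighted mean of the two evaluator scores, which is consistent with (0.00% + 64.42%) / 2 ≈ 32.21%. The page does not say how the 64.42% score is derived from the deviation, so that value is copied as given. The helper names (`dialogue_ratio`, `deviation`, `overall`) are illustrative and not NC Bench's code.

```python
# Hypothetical reconstruction of the arithmetic shown on this run page.
# NC Bench's actual evaluator code is not shown; definitions are assumptions.

TARGET_DIALOGUE_RATIO = 0.70  # "70% dialogue" from the scenario


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that appear inside dialogue."""
    return words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_DIALOGUE_RATIO) -> float:
    """Assumed definition: absolute distance between ratio and target."""
    return abs(ratio - target)


def overall(scores: list[float]) -> float:
    """Assumed aggregation: unweighted mean of the evaluator scores."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    ratio = dialogue_ratio(529, 725)  # wordsDialogue, wordsTotal from the page
    print(f"Ratio: {ratio:.2%}, Deviation: {deviation(ratio):.2%}")
    # The 64.42% score is copied from the page; its formula is undocumented.
    print(f"Overall: {overall([0.0, 0.6442]):.2%}")
```

Run as a script, it prints `Ratio: 72.97%, Deviation: 2.97%` and `Overall: 32.21%`. These match the rounded values on the page. The page shows 32.2110%, which suggests the underlying dialogue score carries more precision than the displayed 64.42%.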