NC Bench
Run: 2026-03-18-01KM07VEF6NZS7E53HY1NWRDZT
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 6.3s
Cost: $0.005485
Provider: OpenAI
Evaluation summary
Score    Evaluator
0.00%    Matches word count
0.00%    Dialogue to Total Word Ratio
0.0000%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly flattened the party invitation on the kitchen table and stared at the glittery cartoon balloons as if they had personally insulted her. “I cannot believe,” she said, “that the target is called Sunny Smiles Fun Zone.” Bill looked up from polishing a very unnecessary pair of gloves. “That name is a trap. No place with that much joy is clean. It’s either a children’s party or a cult.” “It’s a children’s party,” Molly said. “Birthday of some trust-fund toddler named Hugo or Noodle or whatever. Plenty of presents. Tiny guests. Overworked adults. A perfect score.” Bill gave a slow nod. “You make burglary sound festive.” “It is festive. We’re basically doing after-hours Santa with boundaries.” Bill snorted. “Santa doesn’t wear black and carry bolt cutters.” “We’re not carrying bolt cutters,” Molly said. “We’re carrying a toolbox and a strong sense of entitlement.” He leaned over the invitation. “There’s a clown.” “A clown?” “Yes. It says here, ‘special entertainment by Bobo the Amazing.’” Molly winced. “I hate clowns.” “You hate accountants.” “I also hate clowns. Clowns are accountants for the soul.” Bill folded the invitation carefully. “Fine. Let’s review. We enter during the party, when everyone’s distracted by sugar and balloons. We mingle.” “Mingle?” Molly repeated. “You?” “I can mingle.” “You once introduced yourself to a buffet and asked if it was married.” “It had a ring of onions.” Molly pressed her fingers to her forehead. “Listen. We don’t need to be charming. We need to be invisible.” “Invisible,” Bill said. “In a room full of children?” “Yes.” He frowned. “Children are alarmingly observant. One of them once identified me as ‘the man who smells like soup’ and I was wearing cologne.” “Then don’t smell like soup.” “I’m not sure that’s a controllable trait.” Molly slid a hand-drawn floor plan across the table. “The presents are stacked in the back room beside the giant fake castle. 
According to my contact, the parents will be busy singing a song called ‘Happy Happy Sparkle Day’ for at least six minutes.” Bill squinted at the sketch. “You have a contact at a child’s party?” “I have a contact everywhere. That’s why I’m alive.” He tapped the back room. “And what’s our exit?” Molly smiled. “Through the catering door.” Bill looked up. “You planned this.” “I planned around you,” she said. “It’s a skill.” He placed both hands on the table. “I just want to be clear: we are stealing gifts from children.” “We are relieving the children of excess gift-based privilege.” Bill blinked. “That’s the most villainous sentence I’ve ever heard.” “Thank you.” “And what exactly are we doing with the presents?” “Selling them. Separately, ideally. We are not competent enough to move fifty identical toy drones without being noticed by a man named Darren in a party hat.” Bill straightened. “I object to the assumption we are not competent.” Molly raised an eyebrow. “Bill, you tried to rob a bakery last month and left with six lemon tarts and a plant.” “The plant was in the way.” “The plant was on the counter.” “It was a bold counter-plant. I respect that.” She laughed despite herself, then reached for her coat. “In and out. No drama.” Bill put on his gloves with ceremonial gravity. “Molly, every job we’ve ever done has involved drama.” “Not this one.” A tiny silence settled over the room. Then Bill said, “There will be a piñata.” Molly closed her eyes. “Of course there will.” “And children hit piñatas with sticks.” “Yes.” “And we are stealing from a room full of stick-having children.” She opened her eyes and grabbed the invitation. “We are not stealing from the children. We’re stealing from the concept of birthday excess.” Bill smiled. “That sounds worse.” “Less guilty, though.” He stood, slipping the gloves into his pocket. “All right, then. Let’s go rob the concepts.” Molly headed for the door and paused with a wicked grin. 
“And Bill?” “Yeah?” “If anyone asks, we’re a singing duo.” He grimaced. “I don’t sing.” “You will tonight.” “Why?” “Because if we’re caught, I want the police to suffer.”
created_at: "2026-03-18T10:28:52.853Z" (3/18/2026, 10:28:52 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3087
moderation_latency: (null)
generation_time: 6036
tokens_prompt: 77
tokens_completion: 1362
native_tokens_prompt: 66
native_tokens_completion: 1208
native_tokens_completion_images: 0
native_tokens_reasoning: 261
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0054855
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_05d02a3d0649d4c40169ba7e64e7e48196aa386549b1aa33ef"
    is_byok: false
    latency: 176
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773829732-36apDu2Z48P7LlDk1QZM"
upstream_id: "resp_05d02a3d0649d4c40169ba7e64e7e48196aa386549b1aa33ef"
total_cost: 0.0054855
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result   Evaluator                     Details                            Meta Data
0.00%    Matches word count            n/a                                neededClean: false, words: 681
0.00%    Dialogue to Total Word Ratio  Ratio: 74.38%, Deviation: 24.38%   neededClean: false, wordsTotal: 687, wordsDialogue: 511
0.0000%
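The Ratio and Deviation figures reported for the Dialogue to Total Word Ratio evaluator can be reproduced from the wordsDialogue and wordsTotal values with simple arithmetic. The sketch below is illustrative only: the function names, the whitespace-based word split, and the rule of treating double-quoted text as dialogue are assumptions, not NC Bench's actual evaluator, and the mapping from deviation to the final score is not shown in this report, so it is left out.

```python
import re

# Assumption: dialogue is any text enclosed in curly or straight double quotes.
DIALOGUE_RE = re.compile(r'“([^”]*)”|"([^"]*)"')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target_ratio: float = 0.5) -> tuple[float, float]:
    """Return (dialogue ratio, absolute deviation from the target ratio)."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target_ratio)


def dialogue_stats(text: str, target_ratio: float = 0.5) -> dict:
    """Hypothetical helper: derive the counts from raw text, then the ratio."""
    words_dialogue = sum(
        count_words(m.group(1) or m.group(2) or "")
        for m in DIALOGUE_RE.finditer(text)
    )
    words_total = count_words(text)
    ratio, deviation = ratio_and_deviation(words_dialogue, words_total, target_ratio)
    return {
        "words_dialogue": words_dialogue,
        "words_total": words_total,
        "ratio": ratio,
        "deviation": deviation,
    }


# Values taken from the evaluation details of this run.
ratio, deviation = ratio_and_deviation(511, 687)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

With the reported counts (511 dialogue words out of 687), this gives a ratio of about 74.38% and a deviation of about 24.38 percentage points from the 50% target, matching the Details column. The Matches word count evaluator reports 681 words against 687 here, which suggests the two evaluators tokenize slightly differently; this sketch does not attempt to reproduce that difference.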