NC Bench
Run: 2026-03-18-01KM0AG2TVXZAH5ZTHAYR1WJTF
Run Details
Model:
GPT-5.4 Mini (Reasoning)
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
6.3s
Cost:
$0.005157
Provider:
OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 2.78%
Overall: 1.3891%
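A dialogue-to-total word ratio like the one scored above could be measured roughly as follows. This is a minimal sketch, not NC Bench's evaluator: the names `QUOTED`, `count_words`, and `dialogue_ratio` are hypothetical, and it assumes that dialogue means any text inside double quotes (curly or straight) and that words are whitespace-separated tokens.

```python
import re

# Assumption: dialogue is whatever sits between double quotes, curly or straight.
# Apostrophes (’) are not treated as quote delimiters.
QUOTED = re.compile(r'“([^”]*)”|"([^"]*)"')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Return the fraction of words that fall inside quoted spans."""
    total = count_words(text)
    if total == 0:
        return 0.0
    # findall yields (curly_match, straight_match); one of the two is empty.
    spoken = sum(count_words(a or b) for a, b in QUOTED.findall(text))
    return spoken / total


sample = '“Molly,” Bill whispered, “tell me again why.”'
print(f"{dialogue_ratio(sample):.2%}")
```

In the sample, five of the seven words are quoted. A real evaluator may tokenize differently, which would explain small differences in reported word counts.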
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Molly,” Bill whispered, crouched behind a hedge, “tell me again why we’re dressed like underpaid magicians?” “Because, Bill, everyone trusts a magician,” Molly whispered back, adjusting a fake top hat that was already leaning dangerously to the left. “And because if you say ‘undercover burglars’ in front of a children’s party, the balloon animals will turn on you.” “You say that like the balloon animals are loyal to us.” “They’re not loyal. They’re judgmental.” Bill peered through the hedge. “That’s a lot of balloons.” “That,” Molly said, “is the largest collection of helium-based optimism I have ever seen.” “Do we still need to do this?” “Might I remind you that we are six hundred pounds in debt to a man who collects vintage spoons and has a temper problem?” Bill winced. “When you say it out loud, it sounds less like a financial situation and more like a cry for help.” “It is both.” He straightened his fake bow tie. “All right. Fine. We get in, we take the presents, we leave.” “Precisely.” “A simple children’s-party present theft.” “Yes.” “Nothing could possibly go wrong.” Molly gave him a look. “Bill, if things go wrong, I will be the one doing the thinking. You will be the one carrying the loot and pretending not to look suspicious.” “I can look suspicious?” “You can look guilty to a houseplant.” “Harsh.” “Accurate.” A burst of shrieking laughter rose from the garden. Bill flinched. “Why are there so many children?” “Because it’s a birthday party.” “I know what a birthday party is, Molly. I’m asking why it has the energy of a minor festival.” “Because whoever planned it hired a clown, a DJ, and apparently a fog machine.” Bill stared. “A fog machine?” “Bill, there is no practical reason for a fog machine at a child’s party.” “And yet there it is.” “And yet there it is,” Molly echoed. “Which means we use the fog to our advantage.” He turned to her slowly. 
“How exactly do we use the fog to our advantage?” “We walk in like we belong there.” “We do not belong there.” “Confidence, Bill. That’s the whole con.” “You say that like it’s easy.” “It is easy if you stop looking like a man who’s hiding a raccoon in his trousers.” “I am not hiding a raccoon.” “Then why are you walking like that?” Bill muttered, “Because I’m nervous.” “Perfect. Nervous is normal. Nervous parents, nervous relatives, nervous people who definitely should not be near a pile of wrapped gifts and a confused magician.” “Do you think the magician will notice us?” “If he doesn’t, he is not worth the title.” Bill swallowed. “And the cake?” “What about it?” “It’s huge.” “It’s a children’s birthday cake, Bill. They build those cakes like small monuments to sugar and danger.” “Should we steal the cake too?” “No.” “Why not?” “Because we are criminals, not savages.” He nodded gravely. “Right. Presents only.” “Presents only.” Bill craned his neck again. “Okay, I can see the table. There’s a mountain of gifts.” “That’s our mountain.” “Is it weird that I’m emotionally invested already?” “It’s not weird. It’s efficient. The more attached you are, the faster you move.” A child squealed nearby. Bill grabbed Molly’s sleeve. “One just looked at us.” Molly smiled at the garden with saintly warmth. “Then smile back like a person who has never stolen anything in their life.” Bill did as instructed, though he looked as if his face had forgotten how. The child waved. Bill waved back. “Did I do that right?” “You looked like a guilty uncle at a wedding.” “Better than a raccoon?” “Marginally.” A loud voice boomed from inside the house. “All right, everyone! Present time in five minutes!” Bill’s eyes widened. “Five minutes? That’s not enough time.” “It’s plenty of time.” “For what?” “For us to get in, blend, wait for the singing, and lift the gifts during the chaos.” Bill glanced at the hedge, the lawn, the streamers, the clown, the fog, the mountain of presents. 
“You make criminal behavior sound so tidy.” “I practice.” He took a breath. “Molly?” “Yes, Bill?” “If this works, I want a smaller future.” “Aren’t we all?” She clapped him once on the shoulder. “Now come on. Smile like you’re here for the party.” Bill squared his shoulders. “I hate how good you are at this.” “I know,” Molly said, already stepping toward the gate. “That’s why you’re carrying the big bags.”
created_at
"2026-03-18T11:15:06.214Z"
3/18/2026, 11:15:06 AM
model
"openai/gpt-5.4-mini-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1424
moderation_latency
(null)
generation_time
6128
tokens_prompt
77
tokens_completion
1345
native_tokens_prompt
66
native_tokens_completion
1135
native_tokens_completion_images
0
native_tokens_reasoning
108
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.005157
router
(null)
provider_responses
0
endpoint_id
"9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id
"resp_072937bf0534fbe10169ba893a49ec8196b69e76f277c492ca"
is_byok
false
latency
251
model_permaslug
"openai/gpt-5.4-mini-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773832506-JCPQ6vPaMs4nCsqdYDoK"
upstream_id
"resp_072937bf0534fbe10169ba893a49ec8196b69e76f277c492ca"
total_cost
0.005157
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Matches word count: 0.00%
Details: n/a
Meta Data: neededClean = false, words = 742
Dialogue to Total Word Ratio: 2.78%
Details: Ratio: 78.47%, Deviation: 8.47%
Meta Data: neededClean = false, wordsTotal = 743, wordsDialogue = 583
Overall: 1.3891%
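The reported ratio and deviation can be reproduced from the metadata above. The sketch below assumes the 70% target from the scenario and treats the deviation as the absolute difference from that target. The page does not state how the 2.78% score is derived from the deviation, so that step is not reproduced. The overall figure is consistent, to rounding, with an unweighted mean of the two evaluator scores, but that is an inference rather than something the page confirms.

```python
# Figures copied from the evaluation details above.
words_total = 743
words_dialogue = 583
target = 0.70  # from the scenario: "Write 500 words with 70% dialogue"

ratio = words_dialogue / words_total
deviation = abs(ratio - target)  # assumption: absolute distance from the target
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Assumption: "Overall" is the unweighted mean of the evaluator scores.
scores = [0.00, 2.78]  # percent, as displayed (rounded)
overall = sum(scores) / len(scores)
print(f"Overall (approx.): {overall:.2f}%")
```

The small gap between 1.39% here and the reported 1.3891% would follow from the dialogue score being rounded to 2.78% for display.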