NC Bench
Run: 2026-03-18-01KM08BSXBC49GDJHAPA86YPAQ
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 6.2s
Cost: $0.005166
Provider: OpenAI
Evaluation summary
Score      Evaluator
0.00%      Matches word count
50.07%     Dialogue to Total Word Ratio
25.0360%   Overall
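The Overall score appears consistent with an unweighted mean of the two evaluator scores: (0.00 + 50.07) / 2 = 25.035, which is within display rounding of the reported 25.0360% (the ratio score is shown to only two decimals). The page does not state the combination rule, so the minimal Python sketch below simply assumes equal weights; `overall_score` is a hypothetical helper, not part of NC Bench.

```python
from statistics import fmean


def overall_score(evaluator_scores: list[float]) -> float:
    """Assumed combination rule: unweighted mean of per-evaluator scores (in percent)."""
    return fmean(evaluator_scores)


# Displayed scores: Matches word count = 0.00%, Dialogue to Total Word Ratio = 50.07%.
print(round(overall_score([0.00, 50.07]), 3))  # 25.035
```

If the benchmark weighted its evaluators unequally, the mean above would not match; the close agreement with 25.0360% is the only evidence for equal weighting.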
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Tell me again why we’re casing a children’s party,” Molly said, peering through the rain-streaked window of the van.
“Because,” Bill said, tugging his gloves tighter, “birthday parties are the last socially acceptable places where people leave a mountain of valuables unattended in a room full of sugar-crazed witnesses.”
“That’s a terrible sentence.”
“It’s a perfect sentence. It’s got logic, menace, and a little frosting.”
Molly snorted. “We are not stealing frosting.”
“Not with that attitude.”
She folded her arms. “We are stealing presents. That is what you said. All the presents.”
“Correct.”
“All of them?”
“Every last one.”
“Even the giant stuffed dinosaur?”
“Especially the giant stuffed dinosaur. You can’t leave an item like that in civilian hands. It has too much swagger.”
Molly leaned closer to the window and squinted at the house across the street. “There are clowns.”
Bill winced. “Nobody said anything about clowns.”
“On the flyer you said there’d be a magician.”
“A magician is a man with a decent suit and a nervous hobby. A clown is a threat with shoes.”
From inside the house came a burst of shrieking laughter, followed by what sounded like a plastic trumpet collapsing under pressure.
Molly said, “We are not stealing from children.”
Bill blinked at her. “We are absolutely stealing from children. Very temporarily. It’s called ‘redistribution’ when I say it.”
“That is not what redistribution means.”
“It is in my heart.”
“You have a deeply untrustworthy heart.”
“Thank you. Anyway, the plan is simple. We go in, we blend in, we identify the gift table, we execute the lift, and we exit before anyone notices the missing tricycle, the unwrapped Lego tower, or that ominous box that’s probably a drum kit.”
Molly stared at him. “That’s your plan? Blend in?”
Bill held up a paper hat with tiny rockets on it. “I have obtained headgear.”
“That’s a party hat.”
“Yes.”
“You bought a party hat.”
“I invested in camouflage.”
“You look like a disappointed elf.”
“Exactly. No one suspects a disappointed elf.”
Molly rubbed her forehead. “And how, exactly, are we supposed to get a haul out of a room full of parents?”
Bill smiled. “I’ve thought about that.”
“That sentence never comforts me.”
“I’ve rented a party cart.”
“A what?”
He pointed to the back of the van. “Open the doors.”
Molly hesitated, then did. A folded catering cart sat inside, draped in bright wrapping paper and tied with a bow the size of a surrender flag. She stared. “You wrapped the trolley.”
“Subtlety,” Bill said. “We wheel in the cart as if it belongs there. Under the paper, we stash the gifts. Under the gifts, we stash the smaller gifts. Under the smaller gifts, we stash my emergency biscuits.”
“Your emergency biscuits?”
“You cannot commit crimes on an empty stomach.”
“You’re impossible.”
“I’m prepared.”
A woman in pearls was visible through the front window of the house, directing a cluster of small children toward a table covered in cake.
Molly said, “What if they ask who we are?”
Bill straightened his borrowed waistcoat. “You are Aunt Marge.”
“I am not Aunt Marge.”
“You are tonight. I am Uncle Felix.”
“Uncle Felix sounds like a man banned from museums.”
“It’s a strong name.”
“It’s a suspicious name.”
“Fine. You’re the magician’s assistant.”
“I don’t know any magic.”
“Neither does the magician, if we’re lucky.”
Molly groaned. “If we get caught, I’m blaming your hat.”
“If we get caught,” Bill said, “we stick to the story: we were hired to move the gifts for a surprise donation to the orphanage.”
“There is no orphanage.”
“There will be, emotionally.”
She looked at him for a long moment. “You really are committed to making this worse.”
“I prefer ‘more interesting.’”
From inside, someone began singing off-key Happy Birthday.
Bill lifted the cart handle with a flourish. “Shall we rob the toddling upper crust?”
Molly sighed, took the other handle, and muttered, “If there’s a clown involved, I’m quitting.”
“Fair,” Bill said. “But if there’s a clown involved, try to look natural.”
They rolled the cart toward the brightly lit house as another burst of laughter rose from inside, and Bill whispered, “Remember, confidence is everything.”
Molly whispered back, “I hate that you’re right.”
“Then tonight,” Bill said, “let’s make a fortune out of favors and fear.”
“And presents,” Molly said. “Mostly presents.”
They reached the front gate, where a hand-painted sign read, PLEASE LEAVE YOUR SHOES AT THE DOOR.
Bill squinted. “That’s odd.”
Molly stared at it, then at the cart, then at him.
“What now?” he asked.
She gave him a wicked grin. “Now we take the shoes too.”
created_at: "2026-03-18T10:37:48.856Z" (3/18/2026, 10:37:48 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 2515
moderation_latency: (null)
generation_time: 5858
tokens_prompt: 77
tokens_completion: 1513
native_tokens_prompt: 66
native_tokens_completion: 1137
native_tokens_completion_images: 0
native_tokens_reasoning: 63
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.005166
router: (null)
provider_responses: 0
endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id: "resp_0bbf252fdca2e50e0169ba807ce9cc8194b0518f52f5eba74d"
is_byok: false
latency: 167
model_permaslug: "openai/gpt-5.4-mini-20260317"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830268-xo1EHHA8dRfkHfm04B5W"
upstream_id: "resp_0bbf252fdca2e50e0169ba807ce9cc8194b0518f52f5eba74d"
total_cost: 0.005166
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean = false, words = 773
Dialogue to Total Word Ratio
  Result: 50.07%
  Details: Ratio: 66.28%, Deviation: 3.72%
  Meta Data: neededClean = false, wordsTotal = 777, wordsDialogue = 515
Overall
  Result: 25.0360%
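The ratio evaluator's Details line follows from its Meta Data: 515 dialogue words out of 777 total gives 66.28%, which is 3.72 percentage points short of the 70% target in the scenario. How the evaluator turns that deviation into its 50.07% score is not shown on this page, so the Python sketch below reproduces only the ratio and deviation. The `count_words` helper, which treats any text between curly double quotes as dialogue and splits on whitespace, is a hypothetical stand-in; it is not the benchmark's counter and is not guaranteed to reproduce the 515/777 figures.

```python
import re

# Hypothetical rule (not the benchmark's): dialogue is any text between curly double quotes.
DIALOGUE_SPAN = re.compile(r"“([^”]*)”")


def count_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words), splitting on whitespace."""
    total = len(text.split())
    dialogue = sum(len(span.split()) for span in DIALOGUE_SPAN.findall(text))
    return dialogue, total


def dialogue_ratio(
    dialogue_words: int, total_words: int, target_pct: float = 70.0
) -> tuple[float, float]:
    """Return (ratio_pct, deviation_pct); deviation is the absolute gap to the target."""
    ratio_pct = 100.0 * dialogue_words / total_words
    return ratio_pct, abs(target_pct - ratio_pct)


# Counts reported in the Meta Data: wordsDialogue = 515, wordsTotal = 777.
ratio, deviation = dialogue_ratio(515, 777)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# Ratio: 66.28%, Deviation: 3.72%

# The hypothetical counter applied to the response's opening sentence.
sample = "“Tell me again why we’re casing a children’s party,” Molly said."
print(count_words(sample))  # (9, 11)
```

Whitespace splitting keeps punctuation attached to words, so a real counter may differ in how it handles dashes, quoted single words, or the `neededClean` step; the two evaluators on this page already disagree slightly on the total (773 vs 777 words).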