NC Bench
Run: 2026-03-18-01KM07TGH06FDJBT42C4V7ZS7Z
Run Details
Model: GPT-5.4 Mini (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 5.6s
Cost: $0.004797
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 5.42%
Overall: 2.7104%
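The Overall figure appears consistent with an unweighted mean of the two evaluator scores: (0.00% + 5.42%) / 2 ≈ 2.71%. The sketch below checks that reading. It is an inference from the numbers on this page, not a documented NC Bench formula, and the unrounded dialogue score of 5.4208% is an assumption (twice the reported Overall).

```python
# Hypothetical reconstruction: the Overall score on this page appears to be the
# unweighted mean of the per-evaluator scores. This is an assumption, not a
# documented NC Bench formula.
scores = {
    "Matches word count": 0.0,
    # Displayed rounded as 5.42%; 5.4208 is inferred as 2 x 2.7104 (assumption).
    "Dialogue to Total Word Ratio": 5.4208,
}


def overall_score(per_evaluator: dict[str, float]) -> float:
    """Return the unweighted mean of the evaluator scores, in percent."""
    return sum(per_evaluator.values()) / len(per_evaluator)


print(f"{overall_score(scores):.4f}%")
```

If the benchmark weighted its evaluators differently, the mean would not line up this exactly with the reported 2.7104%.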
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Bill, you’re walking like a man with a secret,” Molly said, tugging him behind the hedge. “I am a man with a secret,” Bill whispered. “Two secrets, actually. One, the party is in that house. Two, I’m standing in a bush and I hate it.” “Stop whining. We’re professionals.” “We’re in matching black turtlenecks and borrowed bicycle helmets.” “Exactly. It says ‘professional’ in a playful way.” Bill peered through the hedge at the bright house next door, where a banner reading HAPPY BIRTHDAY, SUNSHINE! hung over the porch. “There are balloons.” “Yes, Bill. It’s a children’s party.” “I know it’s a children’s party. I’m saying there are balloons. Balloons are suspicious.” “Everything is suspicious to you.” “I was suspicious of a clown once. Turned out to be a very normal accountant, but still.” Molly adjusted the tiny flashlight clipped to her collar. “Listen carefully. According to my research, the presents are stacked in the sunroom.” “Your research?” “I asked the caterer.” “You asked the caterer?” “In a vague way. She thought I was a cousin.” Bill blinked. “You’re not a cousin of anyone.” “I have the face of someone who could be related to anyone.” “That’s not a skill.” “It got us free lemonade.” Bill looked back at the house. “How many children are inside?” “Enough to make our lives difficult.” “That’s not a number.” “It is if you’re sensible.” Bill folded his arms. “I’m not comfortable stealing gifts from children.” Molly stared at him. “Bill.” “What?” “We are not stealing from children. We are reassigning assets.” “That’s stealing with paperwork.” “No, it’s art.” “You call everything art if it has a getaway involved.” “Because it does.” Bill sighed. “Why did the client want children’s presents anyway?” Molly lowered her voice. “Because the client is the sort of fellow who collects novelty spoons and has enemies in every postcode.” “That’s not a reason.” “It’s a motivation. Different thing.” A burst of laughter floated from the open windows. Molly crouched lower. 
“Keep your head down. The host’s mother is doing that thing where adults clap too hard at nonsense.” Bill squinted. “What nonsense?” “I don’t know. Probably a cake shaped like a duck.” “I would like a duck cake.” “We are not here for the duck cake.” “I know. But I’m saying if one were accidentally available, I would not object.” Molly nudged him. “Focus. At 3:15, the magician distracts the front room with a rabbit.” “A rabbit?” “Allegedly.” “A rabbit is not a distraction. A rabbit is a rabbit.” “Bill, children are the easiest audience in the world.” “They’re also loud.” “Which is why we use the noise. While everyone crowds around the magician, we slip into the sunroom, collect the gifts, and exit through the side gate.” “Collect the gifts.” “Yes.” “Not ‘snatch the loot.’ Not ‘nab the treasure.’” “Those phrases attract bad energy.” Bill looked at her. “You’re superstitious about theft now?” “I’m professional about theft. There’s a difference.” He rubbed his hands together. “What if a child sees us?” Molly smiled. “Then we say we’re the present inspectors.” “That is terrible.” “That’s why it works.” Bill nodded reluctantly. “And if they ask for identification?” “I have a laminated badge.” “You do not.” “I do now. I made one this morning.” “How?” “Office supplies and arrogance.” Bill leaned closer. “Molly.” “Yes?” “Are we absolutely sure this is the right house?” Molly checked the numbers on the mailbox, then the banner, then the row of toy cars on the lawn. “Bill, if there were a more obvious house for children’s presents, I’d like to see it.” He exhaled. “All right. Fine. We’ll do it your way.” “Wonderful.” “One question.” “Only one?” “Why is there a goat in the backyard?” Molly frowned. “That,” she said, “was not on the briefing.” From inside the house came a shout, then a cheer, then the unmistakable sound of wrapping paper tearing. Bill stared. “Should we be worried?” Molly grinned, already moving toward the side gate. 
“No, Bill.” “Why not?” “Because if the children are opening presents now, it means the magician is late.” Bill followed, whispering, “That’s your plan?” Molly glanced over her shoulder. “My plan, Bill, is to be faster than a confused goat.”
Response metadata
created_at: "2026-03-18T10:28:22.186Z" (3/18/2026, 10:28:22 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3107
moderation_latency: (null)
generation_time: 5335
tokens_prompt: 77
tokens_completion: 1458
native_tokens_prompt: 66
native_tokens_completion: 1055
native_tokens_completion_images: 0
native_tokens_reasoning: 56
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004797
router: (null)
provider_responses[0]:
  endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
  id: "resp_0ad8b6a7202ba05e0169ba7e463f888194939c0834efc90951"
  is_byok: false
  latency: 202
  model_permaslug: "openai/gpt-5.4-mini-20260317"
  provider_name: "OpenAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773829702-WUnzYhRUMKr6JGu8CBlb"
upstream_id: "resp_0ad8b6a7202ba05e0169ba7e463f888194939c0834efc90951"
total_cost: 0.004797
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 0.00%
  Details: n/a
  neededClean: false
  words: 702
Dialogue to Total Word Ratio: 5.42%
  Details: Ratio: 77.64%, Deviation: 7.64%
  neededClean: false
  wordsTotal: 702
  wordsDialogue: 545
Overall: 2.7104%
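The ratio and deviation reported above can be reproduced from the evaluator's word counts. The following is a minimal sketch. It assumes the ratio is dialogue words divided by total words, and that the deviation is the absolute difference from the prompt's 70% target, measured in percentage points. The `count_words` helper and its rule of treating text inside curly double quotes as dialogue are hypothetical. They are not NC Bench's evaluator, which may tokenize differently. The sketch does not try to reproduce the 5.42% score, because the scoring curve that maps the deviation to a score is not shown on this page.

```python
import re

# Counts reported by the evaluators for this run.
WORDS_TOTAL = 702
WORDS_DIALOGUE = 545
TARGET_WORDS = 500            # from the prompt: "a 500 word scene"
TARGET_DIALOGUE_RATIO = 0.70  # from the prompt: "70% words of dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that fall inside dialogue."""
    return words_dialogue / words_total


def deviation_points(ratio: float, target: float) -> float:
    """Absolute distance from the target ratio, in percentage points."""
    return abs(ratio - target) * 100


# Hypothetical counter: treats text between curly double quotes as dialogue
# and splits words on whitespace. The real evaluator may differ.
QUOTED = re.compile(r"\u201c(.*?)\u201d", re.DOTALL)


def count_words(text: str) -> tuple[int, int]:
    """Return (total words, dialogue words) for a passage."""
    total = len(text.split())
    dialogue = sum(len(span.split()) for span in QUOTED.findall(text))
    return total, dialogue


ratio = dialogue_ratio(WORDS_DIALOGUE, WORDS_TOTAL)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation_points(ratio, TARGET_DIALOGUE_RATIO):.2f}%")
print(f"Length overshoot: {WORDS_TOTAL - TARGET_WORDS} words ({WORDS_TOTAL / TARGET_WORDS - 1:.1%})")
print(count_words("\u201cStop whining. We\u2019re professionals.\u201d Molly said."))
```

With the reported counts, this prints a ratio of 77.64% and a deviation of 7.64%, matching the evaluator. It also shows a 202-word (40.4%) overshoot of the 500-word target, which is consistent with the 0.00% word-count score.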