NC Bench
Run: 2026-03-18-01KM07EYZSTJ7MG3M1H3Q3RZYS
Run Details
Model: GPT-5.4 Mini
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 6.1s
Cost: $0.004730
Provider: OpenAI
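The reported Duration appears consistent with two values recorded in the response metadata for this run: the first latency value (404) plus generation_time (5681). The sketch below checks that arithmetic. It assumes both fields are in milliseconds and that Duration is their sum rounded to one decimal place; the page states neither, and the variable names are illustrative.

```python
# Values copied from the response metadata of this run.
# Assumption: both are expressed in milliseconds.
latency_ms = 404
generation_time_ms = 5681

# Assumption: the displayed Duration is latency + generation time, in seconds.
total_s = (latency_ms + generation_time_ms) / 1000

print(f"{total_s:.3f}s raw, {round(total_s, 1)}s rounded")
```

If the assumption holds, the rounded value matches the 6.1s shown in the run details.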
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 98.14%
Overall: 49.0716%
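The overall score looks like the unweighted mean of the two evaluator scores. The sketch below reproduces it under that assumption; the page does not state the aggregation rule, and the dictionary is only an illustrative container for the displayed values.

```python
from statistics import mean

# Evaluator scores as displayed in the evaluation summary (percent).
scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 98.14,
}

# Assumption: overall = unweighted mean of the evaluator scores.
overall = mean(scores.values())

print(f"Overall: {overall:.2f}%")
```

The small gap to the reported 49.0716% would be explained by the ratio score being displayed rounded to two decimals.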
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Tell me again why we’re robbing a children’s party,” Bill said, staring at the lace curtains like they might accuse him. “Because,” Molly whispered, flattening herself behind a hedge, “it’s the safest place in town.” “A room full of six-year-olds, sugar, and balloons is your idea of safe?” “Yes. Everyone’s distracted, the security is nonexistent, and the loot is concentrated in one place.” “The loot being presents.” “Yes.” Bill frowned. “You make crime sound like community service.” “Lower your voice.” “I’m whispering.” “You’re whispering with emotion.” He ducked as a small confetti cannon went off somewhere inside the house. “What exactly are we stealing?” “Anything with a ribbon. Preferably electronic. Best resale value.” “You’re stealing from children, Molly.” “We’re borrowing from the future generation of consumers.” “That’s worse.” “Bill, focus. There’s a back entrance, a buffet table, and a mountain of gift bags in the living room. In and out.” “In and out,” Bill repeated. “That’s what you said about the wedding.” “And we got out fine.” “We were chased by a saxophone player.” “He had a vendetta.” Bill peeked around the hedge. “I can see a clown.” “That’s not a clown.” “It has red shoes, a wig, and a face like a collapsed cake.” “That’s the birthday magician.” “Why does the magician have a broom?” Molly squinted. “To make the children believe in consequences.” Bill swallowed. “I hate this plan.” “You hate every plan until the money appears.” “I like plans with less glue and fewer cupcakes.” “Bill, are you softening?” “I’m developing instincts.” “You’re developing cowardice.” “Same thing, from a distance.” Molly nudged him. “Listen. We wait until the singing starts. When everyone’s distracted, we slip inside, collect the presents, and load them into the van.” “What if the parents notice?” “They’re adults. They only notice things when it affects their parking.” Bill glanced at the driveway. 
“There’s a bouncer.” “That’s Derek.” “You know the bouncer?” “I know everyone at this party. I sold Derek a counterfeit wristwatch last summer.” “Good for you.” “He still owes me for the battery.” “You’re planning a heist on the basis of a battery debt?” “Crime is mostly networking.” The front door opened, and a wave of children’s shrieks spilled into the garden. Bill winced. “That sound is pure chaos.” “That,” Molly said, eyes bright, “is opportunity.” A little boy in a pirate hat ran past carrying a spoon like a sword. “Mom! The cake’s on fire!” “It’s a themed cake,” Molly whispered. “Distraction within distraction.” Bill watched a parent sprint toward the kitchen. “I think we should leave.” “Absolutely not. We’re going in.” “You’re insane.” “I’m efficient.” They crouched low and darted along the fence line toward the back door. Molly produced a tiny tool kit from her coat. “You carry that everywhere?” “You don’t?” “I carry regrets.” Molly knelt by the lock. “Hold this flashlight.” “I thought you said we’d wait for the singing.” “We’re improvising.” “You always improvise when there’s a lock.” “There’s always a lock.” The door clicked. Bill blinked. “That was too easy.” Molly grinned. “Children’s party, Bill. No one expects sophistication.” From inside, a chorus began: “Happy birthday to you—” Bill winced again. “I feel guilty.” “Good. Guilt keeps you quick.” They slipped into a hallway piled with party hats and stray shoes. Bill whispered, “How many presents are we talking?” “Enough to make the van sag.” “And if there are name tags?” “We’ll re-gift strategically.” “That’s not a word.” “It is in our line of work.” A child rounded the corner, saw them, and froze with a cupcake in one hand. Bill froze too. “Hello.” The child looked at Molly’s tool kit. “Are you the entertainment?” Molly smiled. “Yes.” Bill turned slowly. “We are?” “Absolutely,” Molly said, without missing a beat. “Interactive magic.” The child pointed at the tool kit. 
“Can you make my brother disappear?” Bill said, “Only on weekdays.” Molly laughed, a little too loudly. “Show starts in the living room. Very exclusive. Bring everyone you know.” The child gasped and ran off shouting, “The entertainers are here!” Bill stared at Molly. “Did you just recruit a witness?” “I prefer the term ‘advance marketing.’ Now come on.” “Why is your face like that?” “What face?” “The one that says this is either genius or a felony.” Molly adjusted her gloves and headed toward the staircase. “Bill, my dear, in our profession those are just different departments.”
created_at: "2026-03-18T10:22:03.781Z" (3/18/2026, 10:22:03 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 404
moderation_latency: (null)
generation_time: 5681
tokens_prompt: 77
tokens_completion: 1443
native_tokens_prompt: 66
native_tokens_completion: 1040
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0047295
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_097f2667261392d20169ba7ccbdc8481959702984ba7e2a2c2"
    is_byok: false
    latency: 182
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773829323-oepdccuwl5VuWB8QbSpi"
upstream_id: "resp_097f2667261392d20169ba7ccbdc8481959702984ba7e2a2c2"
total_cost: 0.0047295
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 733

Dialogue to Total Word Ratio
Result: 98.14%
Details: Ratio: 69.39%, Deviation: 0.61%
Meta Data: neededClean: false, wordsTotal: 735, wordsDialogue: 510

Overall: 49.0716%
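The ratio and deviation in the details follow directly from the reported counts (510 dialogue words out of 735). The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It assumes dialogue is whatever sits between curly double quotes and that words are whitespace-separated tokens; the evaluator's actual counting rules and its mapping from deviation to the 98.14% score are not shown on the page, so neither is modeled here. All function names are hypothetical.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: 70% dialogue


def count_words(text: str) -> int:
    # Assumption: a word is any whitespace-separated token.
    return len(text.split())


def dialogue_words(text: str) -> int:
    # Assumption: dialogue is the text enclosed in curly double quotes.
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(dialogue: int, total: int, target: float = TARGET_RATIO):
    # Ratio of dialogue words to all words, and its absolute gap to the target.
    ratio = dialogue / total
    return ratio, abs(ratio - target)


# Counts reported in the evaluation details for this run.
ratio, deviation = ratio_and_deviation(510, 735)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Small usage example on two lines taken from the response.
sample = "“Lower your voice.” “I’m whispering.” Bill frowned."
print(dialogue_words(sample), count_words(sample))
```

Note that the two evaluators report slightly different totals (733 and 735 words), which suggests they tokenize the response differently; a simple whitespace split like the one above may not match either exactly.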