NC Bench
Run: 2026-03-09-01KK9QRAEGWCB9HBKBEDZ96M4A
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 32.5s
Cost: $0.017400
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.33% | Dialogue to Total Word Ratio
0.1647% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread a crayon-scribbled invitation across the café table as if it were a military map. Bill leaned in so far his hat dipped into the sugar bowl. “Read it again,” he said. “‘Come celebrate Timmy turning six, Saturday, two o’clock, games, cake, magician, bring your biggest smile.’” “Magician,” Bill said darkly. “I don’t like competition.” “It’s not competition if he pulls scarves out of a hat and you pull remote-control dinosaurs out of a gift pile.” Bill sat back, offended. “I have standards, Molly. I am a professional. I prefer the term redistribution specialist.” “You once stole a fruit basket from a chiropractor.” “It was artisanal.” Molly tapped the invitation. “Focus. Twenty children, twenty presents, maybe more if there’s divorced grandparents trying to outspend each other.” Bill’s eyes gleamed. “Batteries, dolls, toy trucks, educational tablets nobody wants, and at least one mysterious box making clunking noises.” “The good stuff goes fast in the secondhand market.” “The excellent stuff stays with me.” “Bill.” “What? Sentiment is important in our line of work.” She took out a tiny notebook. “We need disguises.” “I’m not wearing the clown suit again.” “You honked every time you sat down.” “That was a manufacturing defect.” “You told three children you were emotionally available for balloon animals.” “I was improvising.” Molly considered him. “Fine. We go as entertainers.” Bill brightened. “Magicians.” “No.” “Why not?” “Because you can’t do magic.” “I made a landlord disappear.” “You hid in his wardrobe for six hours and frightened him into moving.” “Still counts.” She flipped a page. “Face painter and assistant.” Bill frowned. “I don’t want to paint tigers on sticky children.” “You won’t. You’ll carry the supplies, look harmless, and keep an eye on the present table.” “I look harmless now.” “You look like a man banned from several aquariums.” “That was one aquarium, and the octopus and I had an understanding.” Molly ignored that. “Timing is everything. 
When the cake comes out, all eyes go to the candles. Then the singing. Then the chaos of sugar.” “And we lift the haul.” “We relocate the haul.” “Into what?” She nodded toward the stroller parked by the next table. “One of those.” Bill squinted. “A pram?” “A giant baby carriage. Empty blanket on top, presents underneath.” Bill whistled. “That’s low.” “We’re stealing from a children’s party, Bill. Our moral elevator left the building.” He folded his arms. “I still have lines.” “You stole a wishing well.” “It was full of coins and false hope.” Molly lowered her voice. “There is one complication.” “There’s always one complication.” “The birthday boy’s mother is vigilant. Former school crossing guard. Reflexes like a striking heron.” Bill nodded gravely. “I respect the crossing people. They fear nothing.” “So we need a distraction.” Bill smiled slowly. “I can do distractions.” “The last time you did a distraction, you released twelve pigeons in a library.” “And everyone remembered it.” “Mostly because one of them stole a librarian’s glasses.” “They’re clever birds.” “No pigeons.” Bill drummed his fingers. “What about a puppet show that goes wrong?” “How would it go wrong?” “I don’t know. One puppet accuses another of tax fraud. Tensions escalate.” Molly stared at him. “You have a gift for nonsense.” “It’s why we work so well together.” She softened, just slightly. “We work well because I plan and you carry heavy things.” “And because, deep down, you enjoy my company.” “Deep down, I enjoy your upper-body strength.” Bill put a hand to his chest. “Romance.” “Listen carefully. We enter at one forty-five. You smile at parents. Do not flirt with anyone in a paper crown.” “No promises.” “Absolute promises. We locate presents, identify exits, and wait for the cake.” “And if the magician is good?” “Then we applaud politely while committing larceny.” Bill grinned. “You always did make crime sound classy.” Molly stood and tucked the invitation into her coat. “It isn’t classy. 
It’s rent.” Bill rose after her. “One question.” “What now?” “If there’s a piñata, are we accounting for collateral candy?” Molly paused at the door. “Bill.” “Yes?” “We are absolutely accounting for collateral candy.” He beamed. “Now that is a proper operation.”
created_at: "2026-03-09T16:44:15.705Z" (3/9/2026, 4:44:15 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 4349
moderation_latency: (null)
generation_time: 32439
tokens_prompt: 77
tokens_completion: 1343
native_tokens_prompt: 66
native_tokens_completion: 1149
native_tokens_completion_images: 0
native_tokens_reasoning: 176
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0174
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_012056995f76e2460169aef8dfc80c81939806e0d3a5920e99"
    is_byok: false
    latency: 156
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773074655-HsQZTF4AOTibHb905U2A"
upstream_id: "resp_012056995f76e2460169aef8dfc80c81939806e0d3a5920e99"
total_cost: 0.0174
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 690
0.33% | Dialogue to Total Word Ratio | Ratio: 80.69%, Deviation: 10.69% | neededClean: false, wordsTotal: 694, wordsDialogue: 560
0.1647% | Overall
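
The ratio and deviation reported by the "Dialogue to Total Word Ratio" evaluator can be reproduced from the recorded counts (wordsDialogue 560, wordsTotal 694) and the 70% target in the prompt. The sketch below is not the benchmark's actual evaluator: the function names, the whitespace-based word splitting, and the rule that dialogue is whatever sits between curly double quotes are all assumptions made for illustration. How the deviation is turned into the 0.33% score is not shown on this page, so the sketch does not attempt it.

```python
import re

# Target taken from the prompt: "70% words of dialogue".
TARGET_RATIO = 0.70


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed tokenization)."""
    return len(text.split())


def dialogue_stats(text: str) -> dict:
    """Count words inside curly double quotes versus words overall.

    Assumption: dialogue is delimited by “ and ”, as in the model
    response above. Nested single quotes stay inside the span.
    """
    spans = re.findall(r"“([^”]*)”", text)
    return {
        "words_dialogue": sum(count_words(s) for s in spans),
        "words_total": count_words(text),
    }


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_RATIO) -> tuple:
    """Return (dialogue ratio, absolute deviation from the target)."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Counts taken from the evaluator's Meta Data for this run.
ratio, deviation = ratio_and_deviation(560, 694)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 80.69%, Deviation: 10.69%"
```

Running `dialogue_stats` on the full response may not return exactly 560 and 694, since the evaluator's own tokenization and quote handling are unknown; the two evaluators on this page already disagree slightly on the total (690 versus 694 words).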