NC Bench
Run: 2026-03-09-01KK9QH18WK1VW3E547EEB1F17
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 31.4s
Cost: $0.017850
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.01%
Overall: 0.0048%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly unrolled a hand-drawn map across the café table, narrowly missing a puddle of tea and Bill’s elbow. “Right,” she said. “Target: one suburban semi-detached fortress, balloons at the gate, bouncy castle in the garden, and approximately twenty sugared children running perimeter patrol.”
Bill squinted. “You’ve drawn the trampoline as a shark.”
“That is tactical symbolism.”
“It’s got teeth.”
“It’s a hazard, Bill. Keep up.”
He tapped the map. “And this red X?”
“The present table.”
“The holy mountain.”
“The glittering summit.”
“The cardboard Alps of plastic dinosaurs, dolls, and educational nonsense nobody wants.”
“Children want it.”
“Children want to lick windows. Their judgment is unreliable.”
Molly sat back with the expression of a woman presenting a masterpiece to a committee of one idiot. “Now. We enter at fourteen hundred hours.”
“Two in the afternoon is a bold hour for crime.”
“It’s not crime, Bill. It’s redistribution.”
“From children.”
“From excess. No child needs seventeen presents.”
“One of them might be a telescope.”
“Then they can look at the stars with the memory of having almost had a telescope.”
Bill rubbed his chin. “We need disguises.”
“I’ve considered that,” said Molly. “Option one: entertainers.”
Bill blinked. “You want me to be a clown?”
“I want you to be useful. The clowning would be incidental.”
“I have dignity.”
“You once hid in a florist’s van under a mountain of carnations.”
“And emerged fragrant and misunderstood.”
“Exactly. Option two: catering staff.”
Bill considered it. “I can carry sausage rolls with conviction.”
“You can barely carry a thought with conviction.”
“That’s hurtful, Molly.”
“That’s strategy, Bill. If I lower your confidence now, you’ll be less likely to improvise later.”
He leaned in. “Counterproposal. We go in as magicians.”
She stared at him. “Why?”
“Because nobody questions a magician. Things vanish around them all the time.”
“That,” Molly admitted, “is the first intelligent thing you’ve said since Tuesday.”
“Thank you. I’ve been saving one.”
Molly adjusted the map. “Fine. We arrive with a folding table, a black cape, and that dreadful wand you stole from the novelty shop.”
“I didn’t steal it. I liberated it from retail captivity.”
“You tucked it into your sleeve and ran.”
“Very fast liberation.”
She pointed to the sketched hallway. “Parents cluster here, pretending not to drink prosecco from paper cups. Children swarm here, by the cake. The presents are in the dining room. We create a distraction in the garden.”
Bill grinned. “Rabbit from a hat?”
“We don’t have a rabbit.”
“We have your cousin Keith.”
“I am not pulling Keith out of anything. Last time he bit a vicar.”
“What’s the distraction then?”
Molly’s smile turned sly. “You announce a grand illusion: The Amazing Floating Birthday Child.”
Bill went pale. “We are not levitating a child.”
“Of course not. We are pretending to set up the levitation, which draws every parent with a camera into the garden. While they gather, I sweep the presents into the catering trolley.”
“There’s a catering trolley now?”
“There has to be, Bill. Try to think like a professional.”
“I am thinking like a professional. A professional who doesn’t want to be chased by eight-year-olds with party horns.”
“They’ll be in the garden.”
“One always stays behind. There’s always a suspicious one. Glasses. Juice-stained mouth. Sees through lies.”
“Then you distract that one with balloon animals.”
“I can only make snakes.”
“Children are idiots. Call it a giraffe.”
Bill nodded slowly. “And exfiltration?”
“We roll the trolley to the van, drive away, and sort the loot by market value and batteries included.”
He folded his arms. “Molly, I must ask the difficult question.”
“Yes?”
“What if the cake looks nice?”
She sighed. “We take one slice. We are thieves, not barbarians.”
Bill placed a hand over his heart. “That’s why I follow you.”
“No, you follow me because you once tried to rob a vending machine with a flute.”
“It nearly worked.”
“It played you off in B minor.”
Bill looked at the map again and smiled. “Magicians, then.”
“Magicians,” said Molly, rolling it up. “Now finish your tea. We’ve got a birthday to ruin politely.”
created_at: "2026-03-09T16:40:16.932Z" (3/9/2026, 4:40:16 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 6864
moderation_latency: (null)
generation_time: 31330
tokens_prompt: 77
tokens_completion: 1393
native_tokens_prompt: 66
native_tokens_completion: 1179
native_tokens_completion_images: 0
native_tokens_reasoning: 236
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01785
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0cf6a513bfca956c0169aef7f1061c8190904fa5a3917fe9e7"
    is_byok: false
    latency: 263
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773074416-wiV42JMF4NtrwL2noTPj"
upstream_id: "resp_0cf6a513bfca956c0169aef7f1061c8190904fa5a3917fe9e7"
total_cost: 0.01785
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 0.00%
  Details: n/a
  neededClean: false
  words: 684
Dialogue to Total Word Ratio: 0.01%
  Details: Ratio: 83.60%, Deviation: 13.60%
  neededClean: false
  wordsTotal: 689
  wordsDialogue: 576
Overall: 0.0048%
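The ratio evaluator's reported figures follow directly from its word counts: 576 dialogue words out of 689 total is about 83.60% dialogue, which is 13.60 percentage points above the 70% target in the prompt. The Python sketch below reproduces that arithmetic and, under stated assumptions, shows one way such counts might be produced. NC Bench's actual evaluator is not shown on this page; the quote-matching rule (text inside curly double quotes), the whitespace tokenizer, the reading of "Deviation" as an absolute difference in percentage points, and the function names `count_words`, `dialogue_ratio`, and `deviation_points` are all hypothetical.

```python
import re

# Hypothetical sketch, not NC Bench's implementation.
# Assumption: dialogue is any text inside curly double quotes (“ ... ”).
QUOTE_RE = re.compile(r"“(.*?)”", re.DOTALL)


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed tokenizer)."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Fraction of all words that fall inside quoted dialogue."""
    total = count_words(text)
    if total == 0:
        return 0.0
    dialogue = sum(count_words(span) for span in QUOTE_RE.findall(text))
    return dialogue / total


def deviation_points(ratio: float, target: float = 0.70) -> float:
    """Absolute distance from the target ratio, in percentage points (assumed)."""
    return abs(ratio - target) * 100


# Recompute the page's figures from its reported counts.
words_total, words_dialogue = 689, 576
ratio = words_dialogue / words_total
print(f"Ratio: {ratio:.2%}, Deviation: {deviation_points(ratio):.2f}%")
```

Different tokenization rules can shift the totals: the two evaluators on this page report 684 and 689 words for the same response, so a naive counter like this one may not reproduce either number exactly.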