NC Bench
Run: 2026-03-18-01KM07D26XMQR19HFGNJQ7D94C
Run Details
Model: GPT-5.4 Mini
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 5.9s
Cost: $0.004257
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly crouched behind the hedge with a clipboard, a flashlight, and the look of a woman who had once been banned from three bakeries and still considered that a compliment. Across the garden, through a gap in the fence, a bright banner read HAPPY BIRTHDAY, OSCAR! in letters so cheerful they felt almost threatening. Bill peered over her shoulder. “You’re taking notes?” “I’m planning,” Molly whispered. “You’re drawing a skull with a party hat.” “It’s a tactical skull.” Bill adjusted the black gloves on his hands, then immediately got them caught on a thorn. He hissed and yanked free. “So this is the target?” “This is the target,” Molly said. “The house, the garden, the presents, the sugar-fueled chaos. We get in during the singing, out during the cake, and nobody notices a thing.” Bill stared through the fence at the cluster of children in paper crowns. One was already crying because a balloon had looked at him funny. “Those are children, Molly. Tiny, sticky children.” “Yes.” “They’re armed with noisemakers.” “Yes.” “And that clown by the patio has somehow offended me personally.” Molly squinted. “That’s not a clown. That’s a magician.” “That’s worse.” A burst of shrieking applause rose from the party, followed by a deep male voice attempting to sing and failing catastrophically. Molly checked her watch. “Good. The parents are distracted. The gifts are stacked in the conservatory. We take the satchels, clear the ribbon barrier, and walk out like we belong there.” Bill frowned. “Walk out carrying fifteen wrapped boxes?” “Seventeen.” Bill gave her a long look. “You do understand ‘belong there’ is a bit compromised by the carrying of stolen birthdays.” Molly tapped the clipboard. “We are not stealing birthdays. We are stealing presents. There’s a moral distinction.” “Is there?” “Yes. Birthdays are intangible. Presents are in boxes.” Bill considered this. 
“That is the stupidest thing you’ve ever said, and I once heard you explain parking fines as a conspiracy.” From inside the house came a shout, then the thud of a smaller child running full-speed into a table. The adults produced the sound of people pretending not to panic. Molly smiled. “Listen to that. They’re entering the window of maximum confusion.” “Your windows are always terrible.” “And yet they keep us employed.” Bill pulled a tiny pair of bolt cutters from his coat. “Fine. But if we’re questioned, I want it on record that I objected to stealing from children.” “You objected loudly.” “Loudly, morally, and with excellent posture.” Molly tucked the clipboard under her arm. “You’re the lookout.” “I hate being the lookout.” “You hate every role unless it involves opening things with a crowbar.” “I’m versatile.” “Bill, you once tried to crowbar a jam jar.” “It was a difficult jar.” They edged along the hedge toward a side gate decorated with streamers and a sign that read PLEASE REMEMBER TO SMILE. Bill pointed at it. “That’s not advice. That’s a threat.” Molly slipped a hairpin from her coat and knelt by the lock. “Focus.” “I am focused.” “You’re looking at the cake table.” “I’m monitoring frosting levels.” The lock clicked. Molly eased the gate open an inch. The sound from the party swelled: piano music, shrieks, a chorus of children shouting “Again! Again!” Bill winced. “Molly,” he whispered, “I just want to say, if we survive this, I’m choosing our next job.” “That’s never happened.” “It’s happening now. I’m thinking of something less psychologically damaging. Maybe a bank. Maybe a museum. Possibly a locked skip.” Molly smiled as she slipped into the garden. “Bill, darling, we are professionals.” He followed her, ducking past a wobbling scarecrow holding a balloon. 
“We’re prowling into a seven-year-old’s birthday with theft bags.” “Professionals with a niche market.” “And what’s the plan if a child sees us?” Molly straightened her gloves and glanced toward the conservatory, where the presents gleamed in towers of ribbon and paper. “We tell them we’re the surprise entertainment.” Bill blinked. “What entertainment?” She produced a handful of glitter from her pocket and let it catch the light. “Exactly.”
created_at: "2026-03-18T10:21:01.545Z"
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 529
moderation_latency: (null)
generation_time: 5668
tokens_prompt: 77
tokens_completion: 1216
native_tokens_prompt: 66
native_tokens_completion: 935
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004257
router: (null)
provider_responses: 0
endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id: "resp_0a4e99c19008c8e40169ba7c8da1648193b4181b4893855c88"
is_byok: false
latency: 208
model_permaslug: "openai/gpt-5.4-mini-20260317"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773829261-Z5DJVzQilugl2JIrt4Zk"
upstream_id: "resp_0a4e99c19008c8e40169ba7c8da1648193b4181b4893855c88"
total_cost: 0.004257
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta data: neededClean = false, words = 676
Dialogue to Total Word Ratio
  Result: 0.00%
  Details: Ratio: 52.65%, Deviation: 22.65%
  Meta data: neededClean = false, wordsTotal = 680, wordsDialogue = 358
Overall: 0.0000%
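The ratio evaluator's figures follow from the recorded counts: 358 dialogue words out of 680 total is about 52.65%, which is 22.65 percentage points above the 30% target. The sketch below is hypothetical. NC Bench's evaluator code is not part of this report, so the helper names (`count_words`, `dialogue_ratio`, `deviation`), the whitespace word splitting, and the rule that dialogue is any text between curly double quotes are all assumptions made for illustration.

```python
import re

# Hypothetical helpers: NC Bench's actual evaluator is not shown in this report.
# Assumption: dialogue is any text enclosed in curly double quotes (U+201C ... U+201D).
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simplifying assumption)."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Fraction of all words that fall inside curly-quoted dialogue."""
    total = count_words(text)
    if total == 0:
        return 0.0
    dialogue = sum(count_words(span) for span in DIALOGUE_RE.findall(text))
    return dialogue / total


def deviation(ratio: float, target: float) -> float:
    """Assumption: absolute distance from the target, in percentage points."""
    return abs(ratio - target) * 100


# Reproduce the reported figures from the recorded counts (358 of 680 words).
reported_ratio = 358 / 680
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation(reported_ratio, 0.30):.2f}%")

# Apply the quote-based rule to a short excerpt from the model response.
sample = "Bill peered over her shoulder. “You’re taking notes?” “I’m planning,” Molly whispered."
print(f"{dialogue_ratio(sample):.2%}")
```

With the recorded counts, the first line prints `Ratio: 52.65%, Deviation: 22.65%`, matching the report. For the excerpt, 5 of its 12 whitespace-separated words sit inside quotes, so the second line prints `41.67%`.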