NC Bench
Run: 2026-03-06-01KK1GE2CWD0H85KX8JT72PH2B
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 26.2s
Cost: $0.013350
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0000% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread a hand-drawn map across the sticky café table, pinning one corner with a salt shaker and the other with Bill’s untouched custard tart. Rain freckled the window behind them, and somewhere near the counter a coffee machine coughed like an old lookout with emphysema. “Right,” said Molly, tapping the map. “Objective: every present at the Wainscott twins’ birthday party.” Bill peered at the crayon rectangles she’d drawn. “You’ve labeled the bouncy castle ‘possible death trampoline.’” “Risk assessment,” Molly said. “Professionalism matters.” “It’s a children’s party.” “Exactly. Never trust a room where the chairs are too small and everyone’s sticky.” Bill leaned back, eyeing the tart as if considering a final meal. “I still think this is morally wobbly.” “You steal luxury watches from men named Crispin.” “That’s aspirational theft. This is nicking dinosaur wrapping paper from six-year-olds.” “Seven-year-olds,” Molly corrected. “And we are not stealing joy. We are intercepting goods before they’re opened. Philosophically, they remain in a state of gift ambiguity.” Bill blinked. “You made that up.” “On the bus.” She slid over a second sheet: timings, arrows, and one alarming sketch of a clown with Xs for eyes. “Who’s that?” Bill asked. “Entertainment. Bobo Sparklepants. Neutralize him and the room loses command structure.” “It’s a clown making balloon swords.” “A clown controls mood, attention, and crowd flow. He’s air traffic control in giant shoes.” Bill rubbed his temples. “Please tell me there isn’t a dog.” “There is a dog.” He groaned. “There’s always a dog.” “Miniature schnauzer. Name: Pickles. Temperament: suspicious patriot.” Bill studied the map. “Back gate, kitchen door, gift table by the conservatory. Why not just go in as guests?” Molly gave him a pitying look. 
“Because, Bill, we are in our thirties and neither of us can convincingly discuss Bluey for longer than forty seconds.” “I’ve done research.” “You called Peppa Pig ‘the pork one.’” He pointed at the map. “Fine. We enter through the garden during pass-the-parcel.” “No, during the magician.” “There’s a magician too?” Molly nodded darkly. “Mr. Fizzlebottom. Waistcoat, wand, probable tax issues.” “Too many performers,” Bill muttered. “This house is basically Glastonbury for minors.” She leaned in. “Listen. While Fizzlebottom produces a dove or sets fire to a hanky, all eyes front. You slip into the conservatory, load the presents into the laundry hamper, and wheel them to the side gate.” “What are you doing?” “Creating a diversion.” “With what?” Molly smiled. “Parents.” Bill looked wary. “That smile means paperwork or flames.” “Neither. I simply whisper to three mothers that Denise was heard saying the party bags looked cheap.” He stared. “That’s monstrous.” “It’s efficient.” Outside, thunder rolled. Inside, Bill finally picked up the custard tart and took a doomed bite. “And if Pickles spots me?” “Pocket sausages.” “If the clown spots me?” “Tell him there’s a better-paying gig in the driveway.” “And if a child spots me wheeling off thirty presents?” Molly folded the map with crisp finality. “You say, ‘Special delivery for the birthday committee,’ and keep walking with purpose. People will believe anything if you look annoyed enough.” Bill chewed slowly. “I hate how often that works.” She stood, buttoning her coat. “Good. Bring dark clothes, soft soles, and a believable smile.” Bill rose too, lifting the hamper checklist like a man accepting a prison sentence. “Molly?” “Yes?” “If we pull this off, what exactly are we?” She considered. “Entrepreneurs.” He sighed. “I was afraid you’d say that.” Molly tossed coins on the table and headed for the door. “Come on, Bill. Let’s go rob a piñata-adjacent economy.”
created_at: "2026-03-06T12:02:24.293Z" (3/6/2026, 12:02:24 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 406
moderation_latency: (null)
generation_time: 26149
tokens_prompt: 77
tokens_completion: 1133
native_tokens_prompt: 66
native_tokens_completion: 879
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01335
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0d8f667a74e97d610169aac2505a6881908a7a0357569db8a0"
    is_byok: false
    latency: 153
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772798544-BRQ9e9VR5R7CqWPLHtvz"
upstream_id: "resp_0d8f667a74e97d610169aac2505a6881908a7a0357569db8a0"
total_cost: 0.01335
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 592
0.00% | Dialogue to Total Word Ratio | Ratio: 67.45%, Deviation: 17.45% | neededClean: false, wordsTotal: 599, wordsDialogue: 404
0.0000% | Overall | |
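
The ratio and deviation reported by the Dialogue to Total Word Ratio evaluator can be reproduced from the wordsDialogue and wordsTotal counts. The sketch below is only that arithmetic, not the evaluator's implementation. The function names are hypothetical, and two things are assumed: that the 50% target comes from the scenario, and that the deviation is the absolute difference in percentage points. It also shows how far the 592-word count overshoots the 500 words requested. How either evaluator turns these figures into a 0.00% score is not shown in this report, so no scoring rule is modeled.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Dialogue words as a percentage of all words (hypothetical helper)."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(value: float, target: float) -> float:
    """Absolute distance from the target, in the same units as the inputs."""
    return abs(value - target)


# Counts taken from the Meta Data column of this run.
ratio = dialogue_ratio(words_dialogue=404, words_total=599)

# Assumption: the target is the 50% dialogue share named in the scenario.
ratio_deviation = deviation_from_target(ratio, target=50.0)
print(f"Ratio: {ratio:.2f}%, Deviation: {ratio_deviation:.2f}%")
# prints "Ratio: 67.45%, Deviation: 17.45%"

# Word count: 592 words measured against the 500 requested in the prompt.
word_overshoot = deviation_from_target(592, target=500)
print(f"Word count overshoot: {word_overshoot:.0f} words "
      f"({100.0 * word_overshoot / 500:.1f}% of target)")
# prints "Word count overshoot: 92 words (18.4% of target)"
```

The two evaluators report slightly different totals (592 and 599 words), so they presumably tokenize the response differently. The sketch takes each count as given rather than recounting the text.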