NC Bench
Run: 2026-03-18-01KM0A4QR8KK4WHXNY1TF2BCFE
Run Details
Model: GPT-5.4 Mini (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 6.4s
Cost: $0.004842
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 80.11%
Overall: 40.0553%
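The Overall value is consistent with an unweighted mean of the two evaluator scores: (0.00% + 80.11%) / 2 ≈ 40.055%. The extra digit shown (40.0553%) would then imply a dialogue score of about 80.1106% before rounding. The page does not state how Overall is computed, so the sketch below assumes equal weighting. The function name `overall_score` is hypothetical.

```python
from statistics import fmean

def overall_score(scores: dict[str, float]) -> float:
    """Hypothetical: unweighted mean of per-evaluator scores, in percent."""
    return fmean(scores.values())

scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 80.11,  # displayed value, rounded to 2 decimals
}
print(f"{overall_score(scores):.3f}%")
```

If NC Bench weights evaluators differently, this check would no longer match. The equal-weight assumption is simply the one that agrees with the numbers shown for this run.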
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Tell me again,” Molly whispered, peering through the rain-streaked window of the community hall, “why the children’s party is the target.”
“Because,” Bill whispered back, tugging his cap lower, “when people leave presents unattended, they’re practically asking for trouble.”
“They’re asking for cake, Bill. The party is for a seven-year-old.”
“And seven-year-olds,” he said with grave certainty, “have terrible operational security.”
Molly snorted. “You mean they’re sticky and loud.”
“I mean,” Bill said, producing a folded blueprint from his coat, “they label everything. See? ‘For Ava.’ ‘For Leo.’ ‘Do Not Shake.’ That last one is usually a challenge.”
Molly leaned closer. “This is the dumbest thing we’ve ever done.”
Bill brightened. “You say that like there’s competition.”
“There is competition. The museum fiasco.”
“The museum fiasco was elegant.”
“You stole a fake sword and got chased by a docent with a flashlight shaped like a whale.”
“Exactly. Unique.”
She folded her arms. “Explain to me how we’re meant to get inside a room full of sugar-powered children and one suspicious clown.”
“First,” Bill said, tapping the blueprint, “we arrive as entertainers.”
Molly stared. “We?”
“Yes. I’ll be the magician.”
“You can’t do magic.”
“I can do one trick.”
“You can make a coin disappear.”
Bill nodded. “That counts.”
“It counts as theft if you’re in my wallet.”
“Then you’ll be the balloon artist.”
Molly blinked. “I can’t make balloon animals.”
“You can barely make balloon shapes.”
“I can make a very convincing sausage.”
“The children will love that.”
“They absolutely will not.”
Bill grinned. “Fine. Then you’re the assistant.”
“To the fake magician?”
“To the magician,” he said. “Always the magician. Assistants get applause and fewer questions.”
Molly glanced toward the hall again. “How many presents are we talking?”
Bill squinted through the glass. “At least twenty-four. Maybe twenty-five if someone’s aunt is overcompensating.”
“That many?”
“It’s a milestone birthday.”
“Seven is not a milestone.”
“For a thief, every pile is a milestone.”
She rubbed her temples. “And where exactly do we put twenty-five wrapped boxes?”
Bill patted the duffel bag at his side. “In here, after we gently redistribute them.”
“Gently.”
“We’re not animals, Molly.”
“We’re thieves.”
“Polite thieves.”
She laughed under her breath. “You’re impossible.”
“And yet,” he said, with a little bow, “you keep coming to work with me.”
“Because you’re the only person I know who can make a crime sound like community service.”
Bill beamed. “I practice.”
A burst of music and shrieking laughter spilled out when the hall door opened, then closed again. Molly flinched. “That clown is going to notice us.”
Bill checked his watch. “The clown won’t notice anything until after the balloon sword competition.”
“There’s a balloon sword competition?”
Bill looked offended. “Molly, this is a very organized party.”
“And our plan is to mingle, distract, and steal presents while children hunt for treasure in the corner?”
“Basically.”
“That’s awful.”
“It’s efficient.”
She glanced at him sideways. “You really did draw arrows on a map of a birthday party.”
“I like to be prepared.”
“You’ve labeled the cupcake table ‘soft obstacle.’”
“It is soft. And an obstacle.”
Molly’s mouth twitched. “What if they ask who we are?”
Bill straightened his tie, which was somehow already crooked. “You said that last time, and I told you the truth.”
“And what was the truth?”
“That we were freelance enrichment consultants.”
“Right,” she said. “That’s what got us chased out of the petting zoo.”
“The goats were hostile.”
“They were hungry.”
He sighed. “Fine. If anyone asks, we’re with the entertainment.”
“And if anyone asks what entertainment?”
Bill gestured at her. “You do that face.”
“My face?”
“The one that says, ‘I’m a professional, don’t look directly at me.’”
Molly laughed, then lowered her voice. “Bill.”
“Yeah?”
“If this goes wrong, I’m blaming you.”
“If this goes right,” he said, “you’ll still blame me.”
“Probably.”
He offered his arm toward the door. “Ready to rob a child’s birthday party with style?”
Molly took a breath, then slipped on her brightest fake smile. “No.”
Bill nodded. “Perfect. Confidence is overrated.”
They started toward the entrance, rain ticking on the pavement behind them, while inside the hall a chorus of children shouted, “Presents! Presents!” and Bill murmured, almost reverently, “Listen to that, Molly. Opportunity.”
created_at: "2026-03-18T11:08:54.421Z" (3/18/2026, 11:08:54 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 763
moderation_latency: (null)
generation_time: 6179
tokens_prompt: 77
tokens_completion: 1409
native_tokens_prompt: 66
native_tokens_completion: 1065
native_tokens_completion_images: 0
native_tokens_reasoning: 40
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004842
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_0c85801c064a26e20169ba87c684388196a3f8151f800705e0"
    is_byok: false
    latency: 220
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773832134-0QehkQtNtPrr1q5vrNzL"
upstream_id: "resp_0c85801c064a26e20169ba87c684388196a3f8151f800705e0"
total_cost: 0.004842
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 0.00%
  Details: n/a
  Meta Data: neededClean = false, words = 708
Dialogue to Total Word Ratio: 80.11%
  Details: Ratio: 72.11%, Deviation: 2.11%
  Meta Data: neededClean = false, wordsTotal = 717, wordsDialogue = 517
Overall: 40.0553%
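The dialogue figures follow from the reported counts. The ratio is dialogue words divided by total words, 517 / 717 ≈ 72.11%, and the deviation is its distance from the 70% target, 2.11 points. The word-count evaluator's 708 words against a 500-word target is a 41.6% overshoot, which this run scores at 0.00%. The page does not give the tolerance behind that score, nor how a 2.11-point deviation maps to 80.11%, so the sketch below covers only the arithmetic. It also includes a naive, hypothetical counter that treats text inside curly double quotes as dialogue and splits words on whitespace. NC Bench's actual tokenization is not documented here, and the two evaluators report different totals (708 vs 717) for the same response. `count_words`, `dialogue_word_ratio`, `ratio_and_deviation`, and `relative_deviation` are illustrative names, not NC Bench's API.

```python
import re

# Hypothetical rule: dialogue is any text between curly double quotes “ … ”.
DIALOGUE = re.compile(r"“(.*?)”", re.DOTALL)

def count_words(text: str) -> int:
    """Naive word count: whitespace-separated tokens."""
    return len(text.split())

def dialogue_word_ratio(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) under the naive rules above."""
    dialogue = sum(count_words(m) for m in DIALOGUE.findall(text))
    return dialogue, count_words(text)

def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target: float = 0.70) -> tuple[float, float]:
    """Dialogue share of all words, and its absolute distance from the target."""
    ratio = dialogue_words / total_words
    return ratio, abs(ratio - target)

def relative_deviation(actual: int, target: int) -> float:
    """How far a word count misses its target, as a fraction of the target."""
    return abs(actual - target) / target

# Counts reported for this run.
ratio, deviation = ratio_and_deviation(517, 717)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")  # Ratio: 72.11%, Deviation: 2.11%
print(f"Word count deviation: {relative_deviation(708, 500):.1%}")  # 41.6%

# Naive counting on a short sample: 4 dialogue words out of 6 tokens.
print(dialogue_word_ratio("“Tell me again,” Molly whispered. “Why?”"))  # (4, 6)
```

Splitting on whitespace keeps the quote marks attached to the first and last words of each line, which is fine for counting. A real evaluator would also have to decide how to handle nested single quotes such as ‘soft obstacle.’ and unmatched quotes.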