NC Bench
Run: 2026-03-18-01KM0957WYR3DQ8D8F23DS903G
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 7.2s
Cost: $0.004919
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
5.31% | Dialogue to Total Word Ratio
2.6574% | Overall
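The overall figure looks consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers shown, not something the page states. A minimal Python sketch under that assumption, where `overall_score` is a hypothetical helper and not part of NC Bench:

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores, in percent.

    Assumption: the benchmark averages evaluator scores equally.
    The page does not document its aggregation rule.
    """
    if not scores:
        raise ValueError("at least one evaluator score is required")
    return sum(scores) / len(scores)


# Scores as displayed in the summary (rounded to two decimals on the page).
displayed = [0.00, 5.31]
print(f"{overall_score(displayed):.3f}%")
```

With the rounded scores this gives about 2.655%. The reported 2.6574% would follow from the same mean if the unrounded dialogue score were close to 5.3148%.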
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly flattened the map of the community hall on the sticky kitchen table and tapped the glittery rectangle with a pencil. “There. The Children’s Winter Spectacular. Forty-seven gifts, three inflatable snowmen, and one stressed-out magician.” Bill peered over her shoulder. “You forgot the face painter.” “I didn’t forget. I’m choosing to ignore her. She has eyes like a hawk and a brush like a lawsuit.” Bill nodded solemnly. “So. We steal all the presents, disappear into the festive night, and live like royalty.” “Not ‘like royalty,’ Bill. We are not buying crowns. We are buying train tickets, plain clothes, and a sensible amount of cheese.” He frowned. “Royals eat cheese.” “Then we’ll live like very practical royals.” Bill sat back and scratched his chin. “I still don’t understand why a children’s party is the perfect target.” Molly gave him a look. “Because everyone will be distracted by sugar, balloons, and children doing the conga with a papier-mâché reindeer.” “Fair.” “The parents will be taking photographs. The volunteers will be refereeing arguments over who got the blue cupcake. And every present will be stacked under one enormous tree like a neon invitation.” “Neon invitation,” Bill repeated. “That sounds almost poetic.” “I’m planning a crime, not a sonnet.” “That’s what all the best sonnets are about, isn’t it?” Molly pinched the bridge of her nose. “Listen carefully. We go in dressed as lost entertainment professionals.” Bill brightened. “I could be a clown.” “No.” “A magician’s assistant?” “No.” “A man with a clipboard?” Molly paused. “Actually, yes. That’s terrifying.” Bill sat up straighter. “I can do administrative menace.” “You certainly can. I’ll be the event inspector.” “What event?” “Exactly.” Bill looked impressed. “Clever. Very official. Very untrustworthy.” “Thank you. You’ll carry the clipboard and a roll of masking tape.” “For what?” “For looking busy. Nobody questions a person carrying tape.” Bill rubbed his hands together. 
“And then?” “And then, when the party reaches maximum chaos, you create a diversion.” “I like this part already.” “I know you do. You will spill the bucket of plastic snow in the hallway, announce that the building is ‘experiencing a winter-related emergency,’ and loudly insist that everyone remain calm.” Bill blinked. “That’s my masterpiece?” “You’re better in performance than in planning.” He took offense for precisely two seconds. “And what do you do while I’m becoming a legend?” “I smile at the parents, thank the magician for his sacrifice, and wheel the presents out on the catering trolley.” Bill leaned forward. “You’re telling me the whole load fits on a catering trolley?” “I’m telling you the party hall has one trolley, two doors, and no one ever remembers where the trolley came from.” “Hmm.” He squinted at the map. “What about the ringing of the alarm?” “There is no alarm.” “No alarm at a children’s party?” Molly shrugged. “The only security is a bored teenager named Owen and a badge that says Volunteer, Not Paid.” Bill grinned. “Owen sounds vulnerable.” “Owen sounds like he’ll accept any explanation involving cake.” “Then we’re safe.” “No,” Molly said, folding the map with crisp precision, “we’re invisible.” Bill stood and straightened his coat. “I do enjoy when you talk like that. Makes me feel sleek.” “You look like a damp pigeon.” “A dangerous damp pigeon.” “A very loud one.” He held out his hand. “Partners?” Molly shook it. “Partners.” Bill smiled. “One question.” “What now?” “Are we taking the stuffed bear shaped like a unicorn?” Molly considered it. “Only if it’s holding a gift card.” “You are the soul of greed.” “I am the soul of planning.” He glanced toward the window, where snow drifted past the glass like torn confetti. “Tomorrow, then.” “Tomorrow,” Molly agreed. Bill grinned. “Children’s party. No alarm. Masking tape. Administrative menace.” “And if anyone asks what we’re doing?” He lifted the clipboard like a sacred relic. 
“We’re here to support the arts.” Molly snorted. “Bill, you’re a disgrace.” “And yet,” he said, pocketing the tape, “somehow, the perfect accomplice.”
created_at
"2026-03-18T10:51:42.379Z"
3/18/2026, 10:51:42 AM
model
"openai/gpt-5.4-mini-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1398
moderation_latency
(null)
generation_time
6919
tokens_prompt
77
tokens_completion
1271
native_tokens_prompt
66
native_tokens_completion
1082
native_tokens_completion_images
0
native_tokens_reasoning
145
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0049185
router
(null)
provider_responses
0
endpoint_id
"9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id
"resp_05fa3b8098698e180169ba83be717c8197b03b05bf6af0f620"
is_byok
false
latency
185
model_permaslug
"openai/gpt-5.4-mini-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773831102-aDW6178isdWLsslXWOmc"
upstream_id
"resp_05fa3b8098698e180169ba83be717c8197b03b05bf6af0f620"
total_cost
0.0049185
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 663
5.31% | Dialogue to Total Word Ratio | Ratio: 77.66%, Deviation: 7.66% | neededClean: false, wordsTotal: 667, wordsDialogue: 518
2.6574% | Overall
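The ratio and deviation in the details can be reproduced from the reported word counts. The sketch below is a rough Python approximation, not the evaluator's actual code. It assumes dialogue is any text inside double quotes, that words are whitespace-separated tokens, and that deviation is the absolute distance from the 70% target. The names `dialogue_ratio` and `deviation_from_target` are hypothetical.

```python
import re

# Assumption: dialogue is text enclosed in curly or straight double quotes.
DIALOGUE_RE = re.compile(r'“([^”]*)”|"([^"]*)"')


def count_words(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_ratio(text):
    """Share of words that fall inside quotation marks."""
    spoken = " ".join(a or b for a, b in DIALOGUE_RE.findall(text))
    total = count_words(text)
    return count_words(spoken) / total if total else 0.0


def deviation_from_target(ratio, target=0.70):
    """Absolute distance from the target ratio (assumed definition)."""
    return abs(ratio - target)


# Reproduce the reported figures from the counts in the meta data.
ratio = 518 / 667  # wordsDialogue / wordsTotal
print(f"Ratio: {ratio:.2%}, Deviation: {deviation_from_target(ratio):.2%}")
# prints "Ratio: 77.66%, Deviation: 7.66%"
```

The page does not show how the 7.66% deviation maps to the 5.31% score, so that step is left out. The two evaluators also report different totals (663 and 667 words), which suggests they count words differently.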