NC Bench
Run: 2026-03-18-01KM0A5N0X2C77D5FAX52AXBTK
Run Details
Model: GPT-5.4 Mini (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 7.3s
Cost: $0.004599
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Molly, tell me again why the target is a children’s party,” Bill whispered, crouched behind a hedge in a rented clown nose. “Because, Bill,” Molly said, adjusting her glittery gloves, “nobody watches the presents at a children’s party. They’re too busy watching children eat cake like tiny raccoons.” “First of all, that’s horrifying. Second, I don’t like that you called this a target.” “It is a target.” “It is a bouncy castle with streamers.” “A bouncy castle with streamers containing at least thirty-seven wrapped gifts.” “Thirty-seven?” “Maybe thirty-eight. The magician brought one.” “The magician brought a gift?” “Apparently he’s also the birthday boy’s uncle.” Bill peeked over the hedge. “There are balloons everywhere. I feel judged by them.” “Stay focused. We go in through the side gate, smile like we belong, and then we blend in.” “I don’t blend in.” “You’re wearing a bow tie made of pipe cleaners.” “I’m trying to respect the environment.” Molly rolled her eyes. “Listen carefully. We are not stealing from children. We are liberating merchandise from an undersecured festive environment.” “That sounds worse.” “It sounds professional.” “It sounds like prison with cupcakes.” “Bill.” “What?” “Do you want the new roof patched before winter or do you want to freeze like a decorative corpse?” “I want both, but I have been informed by life that choices are for other people.” “Then stop complaining and watch the parents.” Bill squinted toward the backyard. “Why are the parents in a circle whispering?” “Because they’re the real security. Kids don’t notice anything, but a mother can detect movement in a wrapped box from fifty yards.” “You said this was easy.” “I said this was easier than the museum job.” “The museum job had lasers.” “This has punch. Similar principle.” “Children are not a principle.” “They are today.” A shriek of laughter burst from the yard. Bill flinched. “One of them just tackled a cake.” “Good. Distraction.” “Good? 
That child is inside the cake.” “Even better. It’s camouflage.” “That’s not camouflage.” “It is if everyone is too shocked to look at the gift table.” Bill rubbed his hands together. “All right, I’m in. What’s the play?” “Simple. You carry the folding chair.” “Why do I carry the folding chair?” “Because you look like a man who owns a folding chair.” “I don’t know whether to be offended or impressed.” “Both later. You walk in with the chair, mutter something about ‘extra seating,’ and I trail behind with the cupcake tray.” “We have a cupcake tray?” “We do now.” “Where did you get a cupcake tray?” “From the neighbor’s porch. Don’t make this sentimental.” “You stole from Mrs. Alvarez?” “I borrowed. Permanently.” Bill sighed. “And then?” “Then you set the chair by the gift table, I start apologizing to the adults for being ‘the cousin who’s always late,’ and while they’re distracted, I switch the actual presents into the laundry basket under the tray table.” Bill blinked. “You have a laundry basket?” “I have range.” “That is disturbingly efficient.” “Thank you.” “You’re welcome? No, wait, I didn’t mean—” “Bill.” “Sorry.” “Can you, for once, keep your face calm?” “My face is naturally trusting.” “That’s not a face.” “It’s my brand.” Molly checked her watch. “We have ten minutes before they sing.” “That’s ominous.” “It’s a birthday song, not a thunderstorm.” “I’ve heard your humming. I’m not taking chances.” “Do you want in or not?” Bill stared at the house, then at the floating balloon arch, then at Molly. “I want out, but I want the roof more.” “That’s the spirit.” “And if a child asks what we’re doing?” “You say, ‘I’m with the entertainment.’” “I am not entertainment.” “With that nose, Bill, you are the whole circus.” He groaned. “Fine. But if we get caught, I’m blaming the magician.” “The magician has an alibi.” “He’s a magician.” “Exactly.” Molly straightened her gloves and stepped toward the gate. “Ready?” Bill hefted the folding chair. “No.” “Good. 
You’re learning.” “And Molly?” “What?” “If those presents are all socks and plastic dinosaurs, I’m leaving you to answer for this alone.” She grinned. “Bill, if they’re all socks and plastic dinosaurs, we’re stealing from children with excellent taste.”
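The page reports dialogue and total word counts but does not say how dialogue is identified. As a minimal sketch, assuming dialogue is exactly the text between curly double quotes (“ ”) and a word is any whitespace-delimited token, the counts could be approximated as below. `dialogue_stats` and `count_words` are hypothetical helpers, not NC Bench's evaluator; the two recorded totals for this response (694 and 696 words) already differ, so the real tokenization evidently handles some details differently.

```python
import re

# Assumption: dialogue is whatever sits between curly double quotes “ and ”.
# Nested single quotes (‘ ’) and apostrophes (’) stay inside the match.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-delimited tokens (a simplifying assumption)."""
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a prose scene."""
    dialogue_words = sum(count_words(quote) for quote in DIALOGUE_RE.findall(text))
    return dialogue_words, count_words(text)


if __name__ == "__main__":
    sample = "“Bill.” “What?” Bill squinted toward the backyard."
    print(dialogue_stats(sample))  # (2, 7)
```

Quoted punctuation such as “Bill.” counts as one word under this rule, which is one of several places where a real evaluator might choose differently.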
Meta
created_at: "2026-03-18T11:09:24.394Z" (3/18/2026, 11:09:24 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 742
moderation_latency: (null)
generation_time: 7010
tokens_prompt: 77
tokens_completion: 1360
native_tokens_prompt: 66
native_tokens_completion: 1011
native_tokens_completion_images: 0
native_tokens_reasoning: 34
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004599
router: (null)
provider_responses[0]:
  endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
  id: "resp_0a0dadfb8e2a95660169ba87e474108194b2250e7016983e34"
  is_byok: false
  latency: 194
  model_permaslug: "openai/gpt-5.4-mini-20260317"
  provider_name: "OpenAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773832164-SRxjkaUmjj1fKVcZiL3C"
upstream_id: "resp_0a0dadfb8e2a95660169ba87e474108194b2250e7016983e34"
total_cost: 0.004599
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 0.00%
  Details: n/a
  neededClean: false
  words: 694
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio: 87.64%, Deviation: 17.64%
  neededClean: false
  wordsTotal: 696
  wordsDialogue: 610
Overall: 0.0000%
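The reported ratio and deviation follow directly from the recorded counts: 610 / 696 ≈ 0.8764, and the deviation is the absolute gap from the 70% target in percentage points (87.64 - 70 = 17.64), not a relative error. The sketch below reproduces those figures and the word-count overshoot (694 words against the 500 requested). The function name `ratio_and_deviation` and its `target` default are illustrative assumptions; how NC Bench maps these numbers to the 0.00% scores is not shown on this page.

```python
def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = 0.70) -> tuple[float, float]:
    """Dialogue share of all words and its absolute gap from the target share."""
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target)  # percentage points, not a relative error
    return ratio, deviation


# Values taken from the evaluation details above.
ratio, deviation = ratio_and_deviation(610, 696)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 87.64%, Deviation: 17.64%

# Word-count overshoot relative to the 500-word request.
overshoot = (694 - 500) / 500
print(f"Word count overshoot: {overshoot:.1%}")
# Word count overshoot: 38.8%
```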