NC Bench
Run: 2026-03-18-01KM08T0JVRJED1W7MPZFENKER
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 6.3s
Cost: $0.004666
Provider: OpenAI
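As a rough sanity check on the Cost figure, the sketch below recomputes it from the native token counts recorded in the run metadata (66 prompt tokens, 1026 completion tokens). The per-million-token rates are assumptions, chosen only because they reproduce the reported 0.0046665; this page does not state the actual pricing. `estimate_cost` is a hypothetical helper, and the sketch further assumes that reasoning tokens are already included in the completion count.

```python
import math

# ASSUMED rates in USD per million tokens; not stated anywhere on this page.
ASSUMED_INPUT_USD_PER_M = 0.75
ASSUMED_OUTPUT_USD_PER_M = 4.50


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from token counts and the assumed rates."""
    micro_usd = (
        prompt_tokens * ASSUMED_INPUT_USD_PER_M
        + completion_tokens * ASSUMED_OUTPUT_USD_PER_M
    )
    return micro_usd / 1_000_000


# native_tokens_prompt and native_tokens_completion from the run metadata.
estimated = estimate_cost(prompt_tokens=66, completion_tokens=1026)
reported = 0.0046665  # total_cost from the run metadata

print(f"estimated={estimated:.7f} reported={reported:.7f}")
print("match:", math.isclose(estimated, reported, rel_tol=1e-9))
```

If the real rates differ, the agreement is a coincidence of this particular token split, so treat the result as a plausibility check rather than a billing formula.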
Evaluation summary
Score    Evaluator
0.00%    Matches word count
3.64%    Dialogue to Total Word Ratio
1.8211%  Overall
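The Overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below illustrates that reading; the averaging rule is an assumption inferred from the numbers, not something this page documents. The small gap between the computed 1.82 and the reported 1.8211 would be explained if the displayed 3.64% is itself a rounded value.

```python
from statistics import mean

# Evaluator scores as displayed in the summary, in percent.
evaluator_scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 3.64,
}

# ASSUMPTION: Overall is the plain arithmetic mean of the evaluator scores.
overall = mean(evaluator_scores.values())
reported_overall = 1.8211

print(f"computed={overall:.4f} reported={reported_overall:.4f}")
print(f"difference={abs(overall - reported_overall):.4f}")
```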
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly spread the party invite across the kitchen table like it was a treasure map and not a card with a cartoon dinosaur holding balloons. “Okay, Bill, let’s review the target.”
Bill peered at it. “I’m already offended by the font.”
“The birthday boy is six.”
“So?”
“So the presents are likely to be stacked near the cake.”
Bill nodded solemnly. “A classic security flaw. Inexperienced hosts always overestimate the goodwill of children.”
Molly snorted. “We’re not discussing goodwill. We’re discussing loot.”
“Right. Loot.” Bill leaned in. “How many gifts are we talking?”
“Based on the RSVPs, at least twenty-four. Maybe more if Aunt Linda shows up. Aunt Linda never comes to a party without a blender, a knitted thing, and opinions.”
Bill rubbed his hands together. “Beautiful. A mountain of wrapping paper. The poor thing will be like a tiny festive bank vault.”
Molly tapped the invitation. “And the parents have a bouncy castle in the yard.”
“Security nightmare.”
“It’s not a fortress.”
“It’s still a bouncy castle, Molly. Have you ever tried to move quietly through one? You sink. You bounce. You apologize to the foam.”
“We are not going through the bouncy castle.”
“Good. I was worried that was your plan.”
“No, Bill, my plan is simple. We attend as a couple of harmless, slightly underdressed relatives, blend in, wait for the opening of presents, and then—”
“Wait,” Bill said. “We’re pretending to be relatives?”
“Would you rather be clowns?”
He stared at her. “No.”
“Exactly.”
“I just want to note,” he said, pointing at her, “that last time you said ‘simple,’ we ended up in a petting zoo during a mayoral fundraiser.”
“And we got out with the cheese board.”
“The goat ate my cufflink.”
“A price worth paying.”
Bill folded his arms. “Fine. But what’s the actual extraction method?”
Molly smiled the smile he usually saw right before a bad idea became a worse one. “We bring a gift bag.”
“A gift bag,” he repeated, suspicious.
“Yes. An enormous one.”
“How enormous?”
“Big enough to hold all the presents.”
Bill blinked. “That’s not a gift bag. That’s a seduction pouch for a magician.”
“It will be festive.”
“It will be criminally optimistic.”
“We’ll set it near the pile and gradually relocate gifts into it.”
“Gradually,” he said. “Meaning while everyone is distracted?”
“Yes.”
“With what?”
Molly shrugged. “Cake. Loud singing. Someone inevitably crying. The usual chaos of childhood.”
Bill considered this. “I do enjoy a crying child as cover.”
“You say that like it’s normal.”
“In our line of work? It practically is.”
Molly folded the invite and tucked it into her coat. “There’s also a face-painting station.”
Bill’s eyes lit up. “Now you’re speaking my language.”
“No.”
“Imagine it,” he said. “We arrive with painted cheeks, maybe a pirate theme. Nobody suspects pirates at a children’s party.”
“Everyone suspects pirates at a children’s party.”
“Then we’ll be bad pirates. Pirates who brought a casserole.”
Molly laughed despite herself. “You cannot be trusted with a casserole.”
“I can absolutely be trusted with a casserole.”
“You once used lasagna as a shield.”
“It was a tactical lasagna.”
She stood, slinging her bag over her shoulder. “Listen. We go in. We smile. We compliment the balloons. We compliment the birthday cake. We compliment the host’s life choices, even if they include a clown with a saxophone.”
Bill grimaced. “If there’s a clown with a saxophone, I’m leaving.”
“You’re not leaving. You’re holding the bag.”
He gasped. “Me?”
“You have the broadest shoulders.”
“They’re not broad, they’re haunted.”
Molly headed for the door. “Get your gloves.”
“I don’t own gloves.”
“Then borrow some. And Bill?”
“Yeah?”
“If this works, we retire.”
He followed her into the hall. “And if it doesn’t?”
She grinned. “Then we’re banned from children’s parties forever.”
Bill thought about that, then sighed happily. “Best possible outcome, really.”
created_at: "2026-03-18T10:45:34.441Z" (3/18/2026, 10:45:34 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1409
moderation_latency: (null)
generation_time: 6130
tokens_prompt: 77
tokens_completion: 1212
native_tokens_prompt: 66
native_tokens_completion: 1026
native_tokens_completion_images: 0
native_tokens_reasoning: 114
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0046665
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_0163c84047bb13680169ba824e8390819490d6947ded42e382"
    is_byok: false
    latency: 206
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830734-ZhXRny9AKtEdMO07pnRN"
upstream_id: "resp_0163c84047bb13680169ba824e8390819490d6947ded42e382"
total_cost: 0.0046665
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result   Evaluator                     Details                          Meta Data
0.00%    Matches word count            n/a                              neededClean: false; words: 643
3.64%    Dialogue to Total Word Ratio  Ratio: 78.14%, Deviation: 8.14%  neededClean: false; wordsTotal: 645; wordsDialogue: 504
1.8211%  Overall
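The Ratio and Deviation values follow directly from the reported counts: 504 dialogue words out of 645 total is 78.14%, which is 8.14 percentage points above the 70% target. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words, by treating everything between curly quotes as dialogue. `dialogue_ratio` is a hypothetical helper; the evaluator's real tokenization and quote handling are not shown on this page, so it may not reproduce the exact 504 and 645 counts.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for text using curly quotes.

    ASSUMPTION: words are whitespace-separated and dialogue is any text
    enclosed in “ ... ”. The benchmark's own rules may differ.
    """
    total_words = len(text.split())
    # Non-greedy match of each quoted span, allowing it to cross line breaks.
    spoken = re.findall(r"“(.*?)”", text, flags=re.DOTALL)
    dialogue_words = sum(len(chunk.split()) for chunk in spoken)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Small excerpt from the response, used only to exercise the helper.
sample = "Molly tapped the invitation. “Get your gloves.” “I don’t own gloves.”"
sample_dialogue, sample_total, sample_ratio = dialogue_ratio(sample)

# Arithmetic on the counts reported by the evaluator.
TARGET_PERCENT = 70.0
reported_ratio = round(504 / 645 * 100, 2)
reported_deviation = round(abs(504 / 645 * 100 - TARGET_PERCENT), 2)

print(sample_dialogue, sample_total)
print(reported_ratio, reported_deviation)
```

The two evaluators report slightly different totals (643 and 645 words), which suggests they count words differently. A reimplementation should therefore be compared against each evaluator separately.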