NC Bench
Run: 2026-03-18-01KM08FVMKJ5A2VNF8DHPFEBQX
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 5.3s
Cost: $0.004806
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0000% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread a crinkled party invitation across the table like it was a treasure map. “Right,” she said, squinting at the glitter. “At exactly four o’clock, we glide in, mingle with the parents, and take the presents.” Bill peered at the invite. “It says, ‘Bring a smile and wear socks if you must.’ That’s not a security system.” “It is if the hostess is a monster,” Molly said. “Also, it says no shoes in the house.” Bill frowned. “You’re kidding.” “I never kid about footwear-based vulnerability.” He took a sip of tea and made a face. “This is terrible tea.” “That’s because you’re nervous.” “I’m not nervous. I’m morally underwhelmed.” Molly leaned back in her chair. “You said you wanted a simple job.” “I said I wanted a simple job that didn’t involve crying children and balloons.” “Balloons are fine.” “They are not fine. They’re loud, they float, and one pop and suddenly everyone’s looking at the nearest suspicious man.” Molly pointed at him. “That’s why you’re bringing the clown nose.” Bill blinked. “The what?” “The clown nose. You know, for blending in.” “I refuse to blend in as a clown.” “Then what’s your plan, Bill? Dress as the grumpy uncle?” “I could be a magician.” Molly snorted. “You can’t even make your own rent disappear.” He folded his arms. “Maybe I don’t want to steal presents from children.” “Oh, don’t be dramatic. They’re not children’s presents yet. They’re just presents.” Bill opened his mouth, then shut it again. “That is the worst sentence you’ve ever said.” “I’m full of bad sentences,” Molly said. “It’s part of my charm.” He studied her for a moment. “Why this party anyway?” Molly smiled, all sharp edges. “Because everyone in that neighborhood leaves their doors unlocked when there’s cake involved. It’s not the presents. 
It’s the principle.” “The principle of what?” “The principle of being faster than the rich.” “That sounds less like a plan and more like a slogan.” “Everything sounds like a slogan if you say it with enough confidence.” Bill rubbed his temples. “Let me get this straight. We walk into a six-year-old’s birthday, act casual, and somehow leave with a mountain of wrapping paper and ethical regrets?” “That’s the spirit.” “I hate your spirit.” “You’ll miss it when it’s gone.” He stared at her. “And what exactly are we doing with the loot?” Molly shrugged. “Selling it, obviously.” “To whom? Other villains?” “To people with birthdays coming up. We’re not monsters.” Bill laughed despite himself. “That’s your defense? We’re thieves with standards?” “High standards,” she said. “We don’t steal from churches, hospitals, or anyone wearing tiny party hats.” “That’s generous.” “It’s called having a code.” “A code with remarkably low moral altitude.” Molly stood and tossed the invitation into her coat pocket. “Are you in or not?” Bill sighed long and theatrical, the sigh of a man already regretting all his life choices. “If I say no, you’ll do it alone and somehow blame me for the emotional fallout.” “I absolutely will.” “And if I say yes?” “You get half.” “Half of stolen toys?” “Bill, think bigger. There’s a mountain of gift cards out there.” He raised an eyebrow. “Now that is more criminal.” Molly grinned. “There you are.” He rose, straightening his collar. “Fine. But if any child asks why I’m near the cake, I’m telling them I’m the new aunt.” “Perfect,” Molly said. “And if anyone asks why I’m here, I’m the children’s financial advisor.” Bill looked at her. She looked back. Then, almost at once, they both started laughing, because no matter how terrible the plan was, it was still the kind of terrible plan that only made sense if you were already too deep into it to turn back.
created_at: "2026-03-18T10:40:01.7Z" (3/18/2026, 10:40:01 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1362
moderation_latency: (null)
generation_time: 5007
tokens_prompt: 77
tokens_completion: 1133
native_tokens_prompt: 66
native_tokens_completion: 1057
native_tokens_completion_images: 0
native_tokens_reasoning: 194
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004806
router: (null)
provider_responses
0
endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id: "resp_01e918ecc003fe9d0169ba8101c3b48195bdc0a30eb81b461c"
is_byok: false
latency: 183
model_permaslug: "openai/gpt-5.4-mini-20260317"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830401-miww0ntb5HDUr2zzIw2z"
upstream_id: "resp_01e918ecc003fe9d0169ba8101c3b48195bdc0a30eb81b461c"
total_cost: 0.004806
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 624
0.00% | Dialogue to Total Word Ratio | Ratio: 71.29%, Deviation: 21.29% | neededClean: false, wordsTotal: 627, wordsDialogue: 447
0.0000% | Overall
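The reported ratio and deviation follow from the metadata counts and the 50% dialogue target in the scenario. The sketch below reproduces that arithmetic in Python. It is an illustration only, not NC Bench's evaluator: the function names, the curly-quote rule for spotting dialogue, and the whitespace word split are all assumptions, so its counts on the full response may differ from the reported 627 and 447. How a deviation is turned into the 0.00% score is not shown on this page and is not modeled here.

```python
import re

TARGET_WORDS = 500             # from the scenario: "Write 500 words"
TARGET_DIALOGUE_RATIO = 0.50   # from the scenario: "50% dialogue"


def dialogue_stats(text: str) -> dict:
    """Count total words and words inside curly double quotes.

    Assumption: a "word" is any whitespace-separated token, and dialogue
    is whatever sits between an opening and a closing curly quote.
    """
    total = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return {"wordsTotal": total, "wordsDialogue": dialogue}


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_DIALOGUE_RATIO):
    """Return the dialogue share and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Counts reported in the evaluation details for this run.
ratio, deviation = ratio_and_deviation(447, 627)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Word-count overshoot, using the 624 words reported by "Matches word count".
overshoot = (624 - TARGET_WORDS) / TARGET_WORDS
print(f"Word count over target by {overshoot:.1%}")
```

With the reported counts, 447 of 627 words are dialogue, which gives the 71.29% ratio and the 21.29-point deviation shown above. The response also runs 124 words (24.8%) past the 500-word target.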