NC Bench
Run: 2026-03-18-01KM07FZ5CTHH5AGHMSZ5YXF94
Run Details
Model: GPT-5.4 Mini
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 6.0s
Cost: $0.004356
Provider: OpenAI
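As a rough sanity check on the duration, completion throughput can be estimated from two fields in the run metadata further down: generation_time (5857) and native_tokens_completion (957). This is only a sketch: it assumes generation_time is in milliseconds, which the page does not state but which is consistent with the 6.0s duration shown here.

```python
# Values copied from the run metadata on this page.
generation_time_ms = 5857        # assumption: the unit is milliseconds
native_tokens_completion = 957   # provider-native completion tokens

# Completion tokens produced per second of generation time.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"{tokens_per_second:.1f} tokens/s")
```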
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 11.36%
Overall: 5.6793%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the bakery van, peering at the brightly lit community hall where a banner read HAPPY BIRTHDAY, ZIGGY! in letters that looked as though they had been cut out by an overly cheerful committee. Bill adjusted his black gloves with the solemnity of a priest preparing for confession. “I still say this is a bad idea.” Molly snorted. “It’s a children’s party, Bill. The security level is usually ‘forgot the cake knife in the kitchen.’” “That’s not a security level, that’s a crime waiting to happen.” “A crime is what we’re here for,” Molly whispered, checking the clipboard in her pocket. “And this one is elegant. We go in, mingle, identify the presents, and take them.” Bill looked aghast. “We are not mingling. We are burglars, not parish volunteers.” “We are whatever gets us through the door.” “And if someone asks why we’re there?” Molly gave him a grin. “You’re the clown.” Bill stared. “I am absolutely not the clown.” “You’re right,” Molly said. “The clown is a profession. You’re more of a clown-shaped consequence.” From inside the hall came a squeal of delight and the thump of a tiny, relentless drum. Bill winced. “There’s music.” “Yes.” “There are children.” “Yes.” “There’s a bouncy castle.” Molly checked the scene through the window. “A small one. Inflatable. Beige. It’s an insult to joy.” Bill leaned in and immediately pulled back. “I can see six presents by the cake table. Possibly seven. One of them is in shiny paper with rockets. That’s definitely a valuable item.” “See?” Molly said. “Focus on the objective.” “The objective is madness.” “The objective is loot.” “The objective is why society invented fences.” Molly patted his shoulder. “Bill, listen to me. The parents are distracted, the entertainer is twisting a balloon swan into what may be a felony, and the presents are stacked in a corner like offerings to a very small, very loud god.” “And what if the children notice?” Molly shrugged. 
“Then we become invisible.” “We are not invisible.” “We are emotionally invisible.” Bill squinted at her. “That’s not a thing.” “It is if you commit to it.” A shriek from inside made them both flinch. A child, presumably the birthday boy, had discovered a pinata shaped like a dragon. Its papier-mâché face hung near the ceiling with the grim dignity of a doomed monarch. Bill swallowed. “How exactly do you propose we carry out the theft?” Molly drew two tote bags from beneath the van. One was covered in cartoons of puppies; the other said WORLD’S BEST AUNT in sequins. Bill took the latter and recoiled. “I’m not carrying that.” “You’re carrying it because no one suspects a man with a sparkly aunt bag.” “I suspect a man with a sparkly aunt bag.” “Then you’re on the right team.” Bill peered through the window again. “That table is guarded.” “By a plate of cucumber sandwiches.” “And a woman with a clipboard.” Molly’s eyes narrowed. “That’s the real danger.” “Yes! Finally, we agree.” “I can handle the clipboard woman,” Molly said. “You handle the cake.” “Why would I handle the cake?” “Because while everyone screams about frosting, I slip the presents into the bags.” Bill blinked. “That is not a plan. That is a sugar emergency.” “It’s inspired.” “It’s criminal.” “It’s both. That’s the beauty.” A burst of laughter rolled from the hall, followed by the host’s voice calling, “Who wants to play Musical Chairs?” Molly straightened. “Perfect. They’re all moving.” Bill took a deep breath, then another, as if trying to inhale enough courage to last the evening. “If we do this, I want it on record that I hate you.” Molly smiled sweetly. “You adore me. Now put on the party hats.” “I said I hate you.” “Exactly. That’s the spirit. Hat first, conscience later.” Bill groaned, but he followed when Molly stepped out from behind the van and marched toward the hall with the confidence of a woman carrying a very bad idea to term. 
Behind them, the birthday music swelled, the children cheered, and the presents waited in their colorful paper shells, blissfully unaware that two of the least festive criminals in the city were about to ruin a perfectly decent afternoon.
created_at: "2026-03-18T10:22:36.728Z" (3/18/2026, 10:22:36 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 357
moderation_latency: (null)
generation_time: 5857
tokens_prompt: 77
tokens_completion: 1230
native_tokens_prompt: 66
native_tokens_completion: 957
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004356
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_0cf05ff754181d440169ba7cecc8b08196803811b6110fe135"
    is_byok: false
    latency: 161
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773829356-iUCg1XOwrNtZLwJ58kiC"
upstream_id: "resp_0cf05ff754181d440169ba7cecc8b08196803811b6110fe135"
total_cost: 0.004356
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean: false, words: 703
Dialogue to Total Word Ratio
  Result: 11.36%
  Details: Ratio: 56.60%, Deviation: 6.60%
  Meta Data: neededClean: false, wordsTotal: 705, wordsDialogue: 399
Overall: 5.6793%
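The reported ratio and deviation follow directly from the word counts in the evaluation details. The sketch below reproduces that arithmetic, taking the 50% target from the scenario. The overall figure is also consistent with an unweighted mean of the two evaluator scores, but that is an assumption, since the page does not say how the overall score is combined. The page also gives no formula for turning a 6.60% deviation into the 11.36% score, so that step is not reproduced.

```python
# Values copied from the evaluation details on this page.
words_total = 705
words_dialogue = 399
target_ratio = 0.50  # from the scenario: "50% dialogue"

# Share of words that are dialogue, and its distance from the target.
ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Assumption: the overall score is the unweighted mean of the evaluator scores.
word_count_score = 0.00   # "Matches word count", in percent
dialogue_score = 11.36    # "Dialogue to Total Word Ratio", in percent (rounded)
overall = (word_count_score + dialogue_score) / 2
print(f"Overall (assumed mean): {overall:.2f}%")
```

Because the dialogue score is shown rounded to two decimals, the mean computed here (5.68%) agrees with the reported 5.6793% only to within rounding.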