NC Bench
Run: 2026-03-18-01KM0AJCZYRJE5VEEECQNVMSED
Run Details
Model: GPT-5.4 Mini
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 4.5s
Cost: $0.003609
Provider: OpenAI
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
46.09% | Dialogue to Total Word Ratio | (details)
23.0428% | Overall
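The overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below illustrates that reading; the aggregation rule and the function name `overall_score` are assumptions, since the page does not state how the overall score is computed.

```python
from statistics import fmean


def overall_score(evaluator_scores):
    """Return the unweighted mean of per-evaluator scores, in percent.

    Assumption: the benchmark averages evaluator scores with equal weight.
    The page does not document the aggregation rule.
    """
    return fmean(evaluator_scores)


# Scores as displayed in the summary. They are rounded to two decimals,
# so the result only approximates the reported 23.0428%.
approx_overall = overall_score([0.00, 46.09])
print(f"Approximate overall: {approx_overall:.4f}%")
```

The small gap between this result and the reported value would be explained by the dialogue-ratio score being rounded for display.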
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly glanced through the rain-spattered window at the brightly lit house. “There it is,” she whispered. “The castle of sugar, ribbons, and screaming.” Bill adjusted his black gloves. “You say that like it’s not a nightmare.” “It is a nightmare,” Molly said. “A profitable one.” Bill peered at the balloon arch. “You’re really sure the presents are inside?” “Bill, listen to me carefully. Children do not throw parties like this without a mountain of loot. There will be toys, games, gift bags, and probably at least one elaborate electric dinosaur.” “Fine,” he said. “But I want it known I hate this plan.” “You hate every plan.” “I have standards.” “You once tried to rob a bakery because the croissants looked arrogant.” “They did.” Molly pulled a folded map from her coat. “We go in through the side gate, past the bouncy castle, through the kitchen, and out with the presents before anyone notices.” Bill frowned. “Why would the presents be in the kitchen?” “Because adults put things wherever they stop thinking.” “That’s fair,” he muttered. “What about security?” Molly pointed to a man in a glittery hat nodding along to music at the edge of the lawn. “That,” she said, “is the security.” Bill squinted. “That’s Uncle Darren.” “Exactly. A man like that can be distracted by a juice box.” Bill sighed. “You have a very low opinion of family gatherings.” “I have a healthy respect for chaos.” A shriek came from inside the house, followed by the sound of a trumpet being played badly. Bill winced. “That’s a child’s party all right.” Molly smiled. “Listen to the atmosphere. It’s practically begging for an elegant criminal intervention.” “Elegant?” Bill said. “We’re wearing black clothes and hiding behind a hedge.” “Elegance is a state of mind.” “It’s also not crouching in a wet rosebush.” “Lower your voice.” “I’m whispering.” “You’re whispering aggressively.” He leaned closer. “If this goes wrong, we’re getting tackled by eight-year-olds with party hats.” Molly gave him a look. 
“Then don’t let it go wrong.” Bill scratched at his mask. “And if they ask why we’re there?” “We are caterers.” “We are not.” “We are in black.” “So is grief.” “Bill.” “Fine, fine. What’s the signal?” Molly raised one finger. “When the clown starts the magic show, we move.” Bill stiffened. “There’s a clown?” “Apparently.” “In what universe is that reassuring?” “In this one, because nobody trusts a clown enough to watch the gift table.” Bill swallowed. “I hate clowns.” “I know.” “I hate children’s parties.” “I know.” “I hate your plans.” “I know.” He looked back at the glowing windows. “Do you really think there’ll be something worth stealing?” Molly’s grin sharpened. “Bill, this is not about worth. This is about volume. Every guest brought a present, and every present is probably wrapped in shiny paper with a little tag. We take the lot, vanish into the night, and tomorrow some very upset parents argue about dinosaurs and dollhouses.” Bill considered that. “You do make crime sound festive.” “It is festive.” “That is deeply concerning.” She nudged him toward the gate. “Come on. On my mark.” Bill followed, muttering, “If a six-year-old chases me, I’m quitting.” “A six-year-old won’t catch you.” “Then what will?” Molly looked toward the house, where the clown burst into view juggling oranges. “The moral implications,” she said. “Now move.”
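One way a dialogue-ratio evaluator could count words in a response like the one above is to treat everything between curly quotation marks as dialogue and split on whitespace. This is a minimal sketch under that assumption. The names `DIALOGUE_PATTERN` and `dialogue_stats` are hypothetical, and the benchmark's actual tokenization is not shown on this page, so the counts may differ from the reported ones.

```python
import re

# Assumption: dialogue is any run of text enclosed in curly double quotes.
DIALOGUE_PATTERN = re.compile(r"“([^”]*)”")


def count_words(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_stats(text):
    """Return (dialogue_words, total_words) for a passage of prose."""
    spoken = " ".join(DIALOGUE_PATTERN.findall(text))
    return count_words(spoken), count_words(text)


sample = "“There it is,” she whispered. “A profitable one.”"
dialogue_words, total_words = dialogue_stats(sample)
print(dialogue_words, total_words)
```

Splitting on whitespace keeps punctuation attached to words, which is sufficient for counting but would need adjusting if the evaluator strips punctuation or handles straight quotes.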
created_at
"2026-03-18T11:16:22.156Z"
3/18/2026, 11:16:22 AM
model
"openai/gpt-5.4-mini-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
475
moderation_latency
(null)
generation_time
4330
tokens_prompt
77
tokens_completion
1048
native_tokens_prompt
66
native_tokens_completion
791
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.003609
router
(null)
provider_responses
0
endpoint_id
"9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id
"resp_0a8a2d9315731b140169ba898638e08195aca7e28285cc632c"
is_byok
false
latency
268
model_permaslug
"openai/gpt-5.4-mini-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773832582-KhKuWFYmbxkmTM9RDht5"
upstream_id
"resp_0a8a2d9315731b140169ba898638e08195aca7e28285cc632c"
total_cost
0.003609
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 557
46.09% | Dialogue to Total Word Ratio | Ratio: 73.94%, Deviation: 3.94% | neededClean: false, wordsTotal: 564, wordsDialogue: 417
23.0428% | Overall
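The reported ratio and deviation can be checked from the word counts in the metadata. The sketch below assumes the deviation is the absolute difference from the 70% target stated in the scenario; the function names are illustrative. How the 46.09% score is derived from that deviation is not shown on the page, so it is not reproduced here.

```python
def dialogue_ratio(words_dialogue, words_total):
    """Share of dialogue words in the response, in percent."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct, target_pct=70.0):
    """Absolute distance from the target ratio (assumed to be 70%)."""
    return abs(ratio_pct - target_pct)


# Counts taken from the evaluator metadata: wordsDialogue and wordsTotal.
ratio = dialogue_ratio(417, 564)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 73.94%, Deviation: 3.94%"
```

Note that the two evaluators report different totals (557 and 564 words), which suggests they count words slightly differently.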