NC Bench
Run: 2026-03-06-01KK1HXS6X3CWEWK7VX0GZPX9R
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 28.0s
Cost: $0.015270
Provider: OpenAI
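As a rough reading of the duration above, the sketch below estimates generation throughput. It takes `generation_time` (27948) and `native_tokens_completion` (1007) from the run metadata recorded further down this page, and it assumes `generation_time` is in milliseconds, which is consistent with the 28.0s duration but is not stated on the page. The variable names are illustrative only.

```python
# Values copied from this run's metadata; units are an assumption (see above).
generation_time_ms = 27948       # assumed milliseconds, matches Duration: 28.0s
native_tokens_completion = 1007  # completion tokens as counted by the provider

generation_time_s = generation_time_ms / 1000
tokens_per_second = native_tokens_completion / generation_time_s

print(f"{tokens_per_second:.1f} tokens/s")
```

This works out to roughly 36 completion tokens per second for this single run; it says nothing about typical throughput for the model.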
Evaluation summary
Score    Evaluator
0.00%    Matches word count
15.60%   Dialogue to Total Word Ratio
7.8011%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Tell me again,” said Bill, peering through the hedge at the bunting, “why we’re robbing a six-year-old’s birthday instead of a bank like respectable criminals?” “Because,” said Molly, adjusting her cap, “banks have guards, lasers, cameras, and men named Trevor who take their jobs seriously. Children’s parties have balloon arches, distracted parents, and a magician in a waistcoat sweating into a rabbit.” “I still feel odd about it.” “We’re not stealing from the child,” Molly said. “We’re intercepting the gifts before they form an emotional attachment. It’s practically administrative.” “Administrative theft.” “The least violent kind.” Bill squinted at the garden. “There’s a clown.” “That’s not a clown. That’s the uncle everyone tolerates for the sake of potato salad.” “He’s making balloon swords.” “Exactly. Amateur.” A shriek of laughter burst from the lawn. Children stamped past in paper hats while a woman carried a cake shaped like a dinosaur suffering from architectural stress. Bill winced. “Look at them. Tiny, sticky innocents.” “Sticky, yes. Innocent, debatable. Last week one of them hit me with a juice box because I wouldn’t tell him where clouds go at night.” “So what’s the plan?” Molly unfolded a hand-drawn map. “We enter through the side gate at precisely the moment they sing. Nobody hears anything during communal singing; social shame drowns all alarm. You go to the gift table, I create a diversion.” “What sort of diversion?” “I haven’t decided. Possibly shouting, ‘The pony’s loose.’” “There is no pony.” “There doesn’t have to be. The idea of a loose pony is stronger than the reality.” Bill nodded reluctantly. “And the presents?” “Into the large black bin bags.” “We’re stealing gifts in bin bags. That feels on the nose.” “It’s called honesty in branding.” He looked at the pile of wrapped boxes visible through the conservatory doors. “What do you think’s in them?” “Plastic misery. Battery-powered noise. Craft kits with glitter that migrates for decades. 
We’re doing those parents a favor.” “You’re saying this is a community service?” “I’m saying if we find anything decent, like a tablet or a bicycle voucher, destiny wanted us to have it.” Bill rubbed his hands. “I do like destiny.” “Everyone likes destiny when it sounds like theft with paperwork.” From inside the garden came the first wobbly notes of “Happy Birthday.” Molly straightened. “There. The anthem of opportunity.” Bill took a breath. “Right. In, bag, out.” “In, bag, out,” she agreed. “No chatting, no browsing, and absolutely no sampling the buffet.” “Why do you always say that like I’m the problem?” “Because once you stole a cheese board and got caught discussing brie with the hostess.” “It was an exceptional brie.” “It was your downfall.” They slipped through the gate and crouched behind a paddling pool. The singing grew louder, less accurate. “Go,” whispered Molly. Bill darted to the gift table under a banner reading MAX TURNS 6!!! “Good grief,” he muttered. “There are loads.” “Fewer observations, more larceny,” hissed Molly. He began stuffing presents into a bag. “Heavy one. Could be a train set.” “Take it.” “Soft one. Pajamas.” “Leave it. We have standards.” Molly popped up and shouted, “Loose pony! Loose pony by the sausage rolls!” The effect was immediate. Parents gasped. A grandfather seized a croquet mallet. The uncle with balloon swords yelled, “I knew this day would come!” Bill stared. “That worked absurdly well.” “People yearn for drama,” Molly said, sweeping the remaining boxes into her bag. “Now move.” A small voice behind them said, “Those are my presents.” They froze. A little boy in a dinosaur hat stood with his hands on his hips. Bill swallowed. “Hello.” “You’re bad guys,” the boy said. Molly smiled tightly. “We prefer ‘freelance redistributors.’” The boy considered this. “Will you take my Aunt Fiona too?” Bill blinked. “Pardon?” “She sings loudly and gives socks.” Molly hoisted the bags. 
“Tempting, but we’re traveling light.” The boy sighed. “Then leave the big red one. That’s the remote-control shark.” Bill looked at Molly. Molly looked at Bill. “Administrative theft,” Bill murmured. “Oh, fine,” said Molly, setting down the red box. “We’re criminals, not barbarians.” Then they vaulted the hedge and ran, pursued by outraged adults, one imaginary pony, and the unmistakable sound of Aunt Fiona beginning a second verse.
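The evaluators for this run report word totals and a dialogue word count for the response above, but the page does not show how those counts are produced. The following is a minimal sketch of one plausible approach, assuming words are whitespace-delimited and that dialogue is any text between curly double quotes. The function name `dialogue_stats` is hypothetical, and the benchmark's real tokenization rules may differ (its two evaluators report 707 and 712 total words for the same text).

```python
import re

def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) using whitespace-delimited words."""
    total_words = len(text.split())
    # Assumption: text between a left and a right curly double quote is dialogue.
    spoken = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in spoken)
    return dialogue_words, total_words

sample = "“Go,” whispered Molly. “Now move.”"
d, t = dialogue_stats(sample)
print(d, t, f"{d / t:.0%}")  # 3 5 60%
```

Counting only the text inside the quotes means attributions such as "whispered Molly" count toward the total but not toward dialogue, which is why heavy use of dialogue tags pulls the ratio down.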
created_at: "2026-03-06T12:28:27.749Z" (3/6/2026, 12:28:27 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 450
moderation_latency: (null)
generation_time: 27948
tokens_prompt: 77
tokens_completion: 1324
native_tokens_prompt: 66
native_tokens_completion: 1007
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01527
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_06e1c9fcd89ba38b0169aac86bce0081978bbc690d2ccfdeb1"
    is_byok: false
    latency: 142
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772800107-LfXcPNNRJJdEXHRg2H4D"
upstream_id: "resp_06e1c9fcd89ba38b0169aac86bce0081978bbc690d2ccfdeb1"
total_cost: 0.01527
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result   Evaluator                     Details                          Meta Data
0.00%    Matches word count            n/a                              neededClean: false; words: 707
15.60%   Dialogue to Total Word Ratio  Ratio: 63.90%, Deviation: 6.10%  neededClean: false; wordsTotal: 712; wordsDialogue: 455
7.8011%  Overall
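The ratio and deviation reported for the dialogue evaluator can be reproduced from its word counts. The sketch below assumes the deviation is the absolute difference between the 70% target from the scenario and the measured ratio, which matches the reported figures. The function name is hypothetical, and the page does not show how the 15.60% score is derived from the deviation, so that step is not reproduced here.

```python
def dialogue_ratio_report(words_dialogue: int, words_total: int, target: float = 0.70):
    """Return (ratio, deviation) for a dialogue share against a target share."""
    ratio = words_dialogue / words_total
    deviation = abs(target - ratio)
    return ratio, deviation

# wordsDialogue and wordsTotal as reported for this run.
ratio, deviation = dialogue_ratio_report(455, 712)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 63.90%, Deviation: 6.10%

# Inference, not stated on the page: the overall figure is close to the plain
# average of the two evaluator scores.
overall = (0.00 + 15.60) / 2  # 7.80, versus the reported 7.8011%
```

The small gap between 7.80 and 7.8011 would be explained if the overall were averaged from unrounded evaluator scores, but that is an inference from the displayed numbers.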