NC Bench
Run: 2026-03-18-01KM08V6E3MHTQM4MDPNGQPZ45
Run Details
Model: GPT-5.4 Mini
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 5.7s
Cost: $0.004253
Provider: OpenAI
Evaluation summary
Score       Evaluator
0.00%       Matches word count
95.07%      Dialogue to Total Word Ratio
47.5341%    Overall
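The Overall figure looks like the unweighted mean of the two evaluator scores: (0.00% + 95.07%) / 2 ≈ 47.535%, which agrees with the reported 47.5341% once you allow for the dialogue score being rounded to 95.07% for display. The page does not document how scores are aggregated, so the sketch below treats the unweighted mean as an assumption and only checks that it is consistent with the reported value.

```python
from statistics import mean

# Per-evaluator scores as displayed on this page (rounded to two decimals).
scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 95.07,
}

# Assumption: "Overall" is the unweighted mean of the evaluator scores.
# NC Bench does not state this rule on the page.
overall = mean(scores.values())

# Reported Overall from the summary table. The small gap is explained by
# the dialogue score having been rounded to 95.07% for display.
reported_overall = 47.5341
print(f"computed {overall:.3f}%, reported {reported_overall}%")
```

Under this assumption, the failed word-count check alone roughly halves the final score.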
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Bill, tell me again why we’re robbing a children’s party,” Molly whispered, crouched beside a hedge with a pair of binoculars and the moral flexibility of a crow.

“Because, Molly,” Bill said, adjusting his black gloves with unnecessary drama, “the presents are unattended, the crowd is distracted, and the birthday cake smells expensive.”

“You say ‘birthday cake’ like it’s currency.”

“In this business,” Bill said, “it is.”

Molly lowered the binoculars and stared at the brightly decorated house ahead of them. A banner over the porch read HAPPY BIRTHDAY, SWEETIE! in letters so cheerful they felt aggressive.

“It’s a bit much,” she said. “The balloons alone look like they’d squeal if you touched them.”

“That’s why we’re professionals,” Bill replied. “We move silently, we blend in, we take the goods, and we exit before anyone realizes little Trevor’s sixth birthday has been financially ruined.”

“Trevor,” Molly repeated. “Do we know it’s Trevor?”

“We know the cake says Trevor, the invitations say Trevor, and the child in the cowboy hat yelling at a clown has the spiritual energy of a Trevor.”

Molly sighed. “Fine. What’s the plan?”

Bill produced a folded map from inside his coat. “Front door is for guests. Side gate is guarded by a Labrador with trust issues. Back window is open.”

“You checked the dog?”

“I bribed it with bacon.”

“You bribed a dog with bacon?”

Bill frowned. “Molly, please. I’m trying to maintain an atmosphere of competence.”

“You’re wearing two different shoes.”

“Exactly,” he said. “Inconspicuous.”

They watched a stream of parents carrying wrapped gifts toward the house. One man entered holding a giant box shaped like a race car. A woman passed with a present wrapped in paper covered in dinosaurs and what looked suspiciously like glittering skulls.

Molly narrowed her eyes. “We could maybe wait until the magician starts. Everyone loves a magician.”

“That’s the problem,” Bill said. “When everyone loves the magician, nobody watches the hallway.”

A shriek of laughter floated from inside. Then came the sound of a child declaring, “Mine! Mine! Mine!” with the kind of conviction usually reserved for tyrants and cats.

Bill nodded toward the back alley. “Ready?”

“I was born ready,” Molly said, “which I believe is also when my mistakes started.”

They slipped along the fence, past a plastic flamingo with a party hat, and reached the open kitchen window. Bill peered in. A mountain of presents crowded the far room, stacked beneath streamers and a cardboard cutout of a smiling dinosaur wearing sunglasses.

“Look at that,” he whispered. “A gift goldmine.”

“You said that like a pirate who failed economics.”

“I’m multi-talented.”

Molly climbed through first, landing lightly on the tile. “If anyone asks, we’re here for the entertainment.”

Bill followed and nearly knocked over a tray of cupcakes. He caught it with one hand and bowed to no one in particular. “See? Grace.”

“Steady,” Molly hissed. “We need the gifts, not a misdemeanor involving buttercream.”

From the next room came the booming voice of a father shouting, “No running near the punch bowl!” and a chorus of children immediately running near the punch bowl.

Bill grabbed a sack from under the sink. “We load fast, we load quiet, we leave by the pantry door.”

Molly eyed the pile. “And if the kids see us?”

“Then we smile.”

“We’re stealing their presents, Bill.”

“Exactly. Smile apologetically.”

She took the sack. “This is the worst job we’ve ever had.”

Bill began stuffing boxes inside. “Not even close. Remember the garden gnome casino?”

“That was at least aimed at adults.”

“Adults are just children with debt,” he said.

Molly snorted despite herself and hauled another stack of gifts toward the bag. “If this ends with a chase through a bouncy castle, I’m blaming you.”

Bill looked up as footsteps approached the kitchen door. He froze, sack in hand, and whispered, “Molly.”

“What?”

“We may have company.”

A tiny voice sang from the hallway, “Mommy says nobody’s allowed in the gift room until cake!”

Molly and Bill exchanged a glance, then smiled the identical smiles of people who had absolutely no business being in a children’s party and even less intention of leaving empty-handed.
Response metadata
created_at: "2026-03-18T10:46:13.202Z" (3/18/2026, 10:46:13 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 306
moderation_latency: (null)
generation_time: 5328
tokens_prompt: 77
tokens_completion: 1222
native_tokens_prompt: 66
native_tokens_completion: 934
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0042525
router: (null)
provider_responses[0]:
  endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
  id: "resp_0112f6aab91b0df90169ba8275458881948955ee698e29075e"
  is_byok: false
  latency: 200
  model_permaslug: "openai/gpt-5.4-mini-20260317"
  provider_name: "OpenAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830773-YPe6MP3CXTD8z0oCgNLE"
upstream_id: "resp_0112f6aab91b0df90169ba8275458881948955ee698e29075e"
total_cost: 0.0042525
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details

Matches word count
  Result: 0.00%
  Details: n/a
  Meta data: neededClean: false, words: 695

Dialogue to Total Word Ratio
  Result: 95.07%
  Details: Ratio: 48.99%, Deviation: 1.01%
  Meta data: neededClean: false, wordsTotal: 696, wordsDialogue: 341

Overall: 47.5341%
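The Ratio and Deviation in the dialogue row follow directly from the reported counts: 341 / 696 ≈ 48.99% dialogue, which is 1.01 percentage points short of the 50% target set in the prompt. The sketch below reproduces that arithmetic. The helpers `count_words` and `count_dialogue_words` are hypothetical illustrations, not NC Bench's evaluator: they assume whitespace tokenisation and treat everything inside curly double quotes as dialogue. They are not expected to reproduce 341 and 696 exactly on the full response.

```python
import re

# Counts reported in the "Dialogue to Total Word Ratio" row.
WORDS_TOTAL = 696
WORDS_DIALOGUE = 341
TARGET_RATIO = 0.50  # the prompt asks for 50% of words to be dialogue


def dialogue_stats(words_dialogue: int, words_total: int, target: float):
    """Return (ratio, deviation) as fractions of the total word count."""
    ratio = words_dialogue / words_total
    deviation = abs(target - ratio)
    return ratio, deviation


def count_words(text: str) -> int:
    # Hypothetical rule: split on whitespace. The evaluator's tokenisation
    # is not documented on this page.
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    # Hypothetical rule: any text between curly double quotes is dialogue.
    spans = re.findall(r"“(.*?)”", text, flags=re.DOTALL)
    return sum(count_words(span) for span in spans)


ratio, deviation = dialogue_stats(WORDS_DIALOGUE, WORDS_TOTAL, TARGET_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Usage on one line of the model response: 5 of its 7 words are dialogue.
SAMPLE = "“In this business,” Bill said, “it is.”"
print(count_dialogue_words(SAMPLE), "of", count_words(SAMPLE), "words are dialogue")
```

A quote-delimited counter like this is simple but fragile: nested single quotes (such as ‘birthday cake’ in the response) stay inside the outer span, and an unclosed quote would drop that speech from the count entirely.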