NC Bench
Run: 2026-03-18-01KM0AP2NEHP1P6KSNTQYNNB0D
Run Details
Model: GPT-5.4 Mini
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 5.0s
Cost: $0.004320
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.01%
Overall: 0.0032%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly crouched beside the hedge with a pair of binoculars that were, in her opinion, too expensive to be associated with crime. Across the lawn, a banner reading HAPPY BIRTHDAY, CASSIE! flapped in the breeze like it had already heard something terrible and was trying to leave. The house beyond was glowing with paper lanterns, laughter, and enough helium balloons to make the whole place look mildly insubordinate. Bill adjusted the black knit cap over his ears and whispered, “You’re sure this is the one?” “The address is correct,” Molly said. “The security is weak, the parents are distracted, and the gift table is in the conservatory.” Bill squinted. “How do you know all that?” “I read the invitation.” “You got invited?” “No,” Molly said. “I stole one from the recycling bin.” Bill nodded appreciatively. “That’s the kind of long-term thinking that made you a professional.” “It also made me a man the PTA once referred to as ‘the woman with the clipboard,’” Molly said. “We have a three-minute window when the magician starts the rabbit act. Everyone will be looking left.” Bill frowned. “Why would the magician start with a rabbit?” “Because it is a children’s party and he is, apparently, a monster.” From inside the house came a roar of delighted shrieking, followed by the shrill voice of a child announcing that the cake had been “exploded on purpose.” Bill winced. “I don’t like this,” he said. “Stealing from adults is cleaner. More dignity. Less frosting.” Molly lowered the binoculars. “We are not stealing from children. We are stealing from their parents, who have replaced responsibility with shiny wrapping paper and a playlist called ‘Tiny Dancers.’” “That does make it sound better.” “It is better.” They watched as a harried father in a paper crown staggered past the window carrying a tray of cupcakes like he was evacuating a hospital. Bill pointed. “That guy looks scared.” “He’s been assigned balloon duty,” Molly said. 
“No one survives balloon duty emotionally.” She rose into a crouch and checked her watch. “On my mark, we go through the side gate, past the hedges, and straight to the conservatory. No talking, no sudden movement, no heroics.” Bill placed a hand on his chest. “When have I ever done heroics?” “You once tried to rescue a falcon from a farmers’ market.” “It was trapped in a wicker basket.” “It was a decorative pheasant centerpiece.” Bill looked offended. “It still had the eyes of an animal.” Molly took a breath. “Mark.” They slipped through the gate and along the side of the house, moving with the exaggerated solemnity of people who had once practiced being villains in a mirror. The kitchen window was open, and from inside came the smell of sugar, buttercream, and somebody’s complete inability to count. A woman shouted, “Who took the scissors?” and several children screamed “Not me” with the unified innocence of a crime syndicate. At the conservatory door, Molly paused. The gift table stood in the center under a paper star, stacked with bright rectangles and impossible bows. Bill whispered, “That’s a lot of loot.” “Presents,” Molly corrected. “Loot sounds more honest.” Molly produced a slim tool from her sleeve and set to work on the latch. “Once we load the bags, we leave quietly.” “Quietly,” Bill echoed. “Of course.” The latch clicked. Then, from behind them, a tiny voice said, “Excuse me, are you the entertainment?” Molly and Bill turned slowly. A small girl in a princess dress stood on the patio holding a cupcake with both hands. Her face was smudged with chocolate. She stared at them with the grave curiosity of a magistrate. Bill smiled too widely. “Yes?” The girl narrowed her eyes. “You’re not the balloon man.” Molly recovered first. “No, dear. We’re the surprise.” The girl considered this. “For Cassie?” “Exactly,” Molly said. Bill gave a solemn nod. 
“Big surprise.” The girl looked at the tool in Molly’s hand, then at their black clothes, then at the gift table. “Are you stealing the presents?” Molly and Bill froze. After a long beat, Bill said, “Only the embarrassing ones.” The girl ate frosting from her thumb. “Good,” she said. “Take the doll that cries. It’s awful.”
Meta
created_at: "2026-03-18T11:18:22.65Z"
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 409
moderation_latency: (null)
generation_time: 4780
tokens_prompt: 77
tokens_completion: 1205
native_tokens_prompt: 66
native_tokens_completion: 949
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00432
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_01a44985f5d7a53f0169ba89feb4808193ae96bb63fd7b034d"
    is_byok: false
    latency: 191
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773832702-JHIi9LZRfkWVPD2heo2s"
upstream_id: "resp_01a44985f5d7a53f0169ba89feb4808193ae96bb63fd7b034d"
total_cost: 0.00432
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 702
Dialogue to Total Word Ratio
Result: 0.01%
Details: Ratio: 43.89%, Deviation: 13.89%
Meta Data: neededClean: false, wordsTotal: 704, wordsDialogue: 309
Overall
Result: 0.0032%
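The reported ratio and deviation can be reproduced from the metadata counts (309 dialogue words out of 704 total, against the scenario's 30% target). The sketch below is illustrative only: the function names are hypothetical, the deviation is assumed to be the absolute difference from the target, and the quote-based word counter is an assumption about how dialogue might be detected, not the benchmark's actual evaluator. How the 0.01% score is derived from the deviation is not stated on this page, so it is not modeled here.

```python
import re

TARGET_RATIO = 0.30  # from the scenario: "Write 500 words with 30% dialogue"


def dialogue_stats(words_dialogue: int, words_total: int,
                   target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return (ratio, deviation), both as fractions of 1.

    Assumption: deviation is the absolute distance from the target ratio.
    """
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target)
    return ratio, deviation


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage.

    Assumption: dialogue is whatever sits between curly double quotes,
    and words are whitespace-separated tokens. The real evaluator may
    tokenize differently.
    """
    spoken = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(segment.split()) for segment in spoken)
    words_total = len(text.split())
    return words_dialogue, words_total


# Counts taken from the evaluation metadata above.
ratio, deviation = dialogue_stats(309, 704)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

With the counts from this run, the printed values agree with the evaluator's "Ratio: 43.89%, Deviation: 13.89%". Running `count_dialogue_words` on the response text is not guaranteed to return exactly 309 and 704, since the evaluator's tokenization rules are unknown.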