NC Bench
Run: 2026-03-06-01KK1HSS8ZXXR9C42J4F9Y5C7Z
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 27.4s
Cost: $0.014190
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
5.89% | Dialogue to Total Word Ratio
2.9445% | Overall
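The overall figure looks like an unweighted mean of the two evaluator scores. The page does not state how it is aggregated, so the sketch below is only a consistency check under that assumption. The names `evaluator_scores`, `overall_score`, and `ROUNDING_TOLERANCE` are hypothetical and not part of NC Bench.

```python
from statistics import mean

# Scores as displayed on the page (percent, rounded to two decimals).
evaluator_scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 5.89,
}
REPORTED_OVERALL = 2.9445

# Each displayed score may be off by up to 0.005 due to rounding,
# so the mean of the displayed values may be off by the same amount.
ROUNDING_TOLERANCE = 0.005


def overall_score(scores: dict[str, float]) -> float:
    """Assumed aggregation: plain arithmetic mean of the evaluator scores."""
    return mean(scores.values())


estimate = overall_score(evaluator_scores)
is_consistent = abs(estimate - REPORTED_OVERALL) <= ROUNDING_TOLERANCE
print(f"estimated overall: {estimate:.4f}, consistent with report: {is_consistent}")
```

Under this assumption the displayed scores agree with the reported overall to within rounding, which suggests (but does not prove) that the unrounded dialogue score is slightly below 5.89.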
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the privet hedge with the solemn concentration of a safecracker and the knees of a woman who had made several bad life choices. Across the street, number twelve blazed with pastel bunting, paper lanterns, and the kind of cheerful chaos that suggested at least three children had already eaten icing directly off the tablecloth. Bill lay beside her in a raincoat that made him look less like a master thief and more like a depressed geography teacher. “You told me this was a quiet job,” Molly hissed. “It is quiet,” Bill whispered back. “For a children’s party.” “There’s a pony.” “That’s one pony. I’ve robbed through worse.” “You’ve never robbed through livestock.” “I once escaped through a wedding with a goose under each arm.” Molly glanced at him. “That was not a robbery. That was confusion.” “Confusion is nature’s burglary.” From the house came a shriek of delight, then the unmistakable thud of small feet stampeding toward sugar. Molly unfolded a hand-drawn map of the premises, sketched on the back of a takeaway menu and stained with what she hoped was gravy. “Right,” she said. “Presents table is in the conservatory. Cake in the dining room. Face painting in the lounge. Magician in the garden.” Bill frowned. “I don’t like a magician on site.” “He’s for the children, Bill, not counterintelligence.” “That’s what they want you to think. Nobody in a velvet cape is innocent.” “We go in as caterers, grab the gift haul, out through the side gate, over the fence, and into your van.” “My van says ‘Balloon Bonanza’ on the side.” “That helps.” “It also has six hundred balloons in the back.” “Then we float away if cornered.” Bill considered this. “Good contingency.” They rose and adjusted their disguises: paper hats, black polo shirts, and expressions of mild food-service despair. Molly lifted a tray of sausage rolls as if she had been born carrying disappointment on silver. 
“If anyone asks,” she said, “you’re Barry.” “Why am I Barry?” “Because you look like a Barry.” “That is profiling.” “Can you spell canapé?” “No.” “Then you’re Barry.” They crossed the street with brisk professional misery. At the gate, a woman in a sequined cardigan intercepted them, clutching a clipboard and radiating maternal command. “Oh good, you’re late,” she said. “Put those in the kitchen. And someone needs to help Kevin with the piñata.” Bill smiled weakly. “I’m more of a logistics man.” “Nonsense. Everyone’s hands are full.” As she swept away, Molly murmured, “See? Invisible.” “Incompetent people always are.” Inside, the house roared with noise. Children in pirate hats careered between adults carrying bowls, ribbons, and low expectations. In the conservatory sat the presents: a glorious mound of wrapped boxes, gift bags, and glittering parcels piled like tribute to a tiny tyrant. Bill’s eyes shone. “That’s magnificent.” “Focus,” Molly said. “We need sacks.” A small boy appeared at Bill’s elbow, face painted as a tiger and already sweating orange. “Are you the clown?” Bill stiffened. “Do I look like joy?” The boy thought about it. “No. You look like my uncle after tax season.” “Fair.” Molly snatched two tablecloths from a sideboard. “Bundle and move.” They swept presents into improvised sacks with the speed of practiced opportunists. A dollhouse vanished. A telescope. Something making robot noises. Bill hefted a suspiciously heavy parcel. “What’s in this one, bricks?” “Probably educational,” Molly said. “Take it.” Then a voice boomed behind them. “Ah-ha! Assistants!” They turned. The magician stood in the doorway, mustache waxed to knife points, cape glinting, rabbit tucked under one arm like a loaded threat. “I beg your pardon?” Molly said. “I need volunteers for the vanishing act.” Bill tightened his grip on the bulging cloth. “Unfortunate timing.” The magician smiled. “Nonsense. Children adore suspense.” Molly gave Bill a look. 
“Can you knock him out with a telescope?” “Professionally? Yes.” The rabbit stared at them with deep moral disappointment as the first child rounded the corner and screamed, “Mum! The caterers are stealing my dinosaur!”
created_at: "2026-03-06T12:26:16.744Z" (3/6/2026, 12:26:16 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 677
moderation_latency: (null)
generation_time: 27282
tokens_prompt: 77
tokens_completion: 1218
native_tokens_prompt: 66
native_tokens_completion: 935
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01419
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0a54c3c7abd2df2f0169aac7e8e3e08197aa583e904a1553d2"
    is_byok: false
    latency: 341
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799976-O8pkrMr0dKKjFW14mV0X"
upstream_id: "resp_0a54c3c7abd2df2f0169aac7e8e3e08197aa583e904a1553d2"
total_cost: 0.01419
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 669
5.89% | Dialogue to Total Word Ratio | Ratio: 42.47%, Deviation: 7.53% | neededClean: false, wordsTotal: 671, wordsDialogue: 285
2.9445% | Overall
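The ratio and deviation reported for the dialogue evaluator can be reproduced from the two word counts and the 50% target in the scenario. The sketch below shows that arithmetic, plus one plausible way to count dialogue words by treating text inside curly double quotes as dialogue. The page does not describe the evaluator's tokenization, so `count_dialogue_words` is a hypothetical helper, not NC Bench's method, and the mapping from deviation to the 5.89% score is not attempted here.

```python
import re

TARGET_RATIO = 0.50  # from the scenario: 50% of words should be dialogue


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that are dialogue."""
    return words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_RATIO) -> float:
    """Absolute distance between the observed ratio and the target."""
    return abs(ratio - target)


def count_dialogue_words(text: str) -> int:
    """Hypothetical counter: whitespace-separated words inside curly double quotes."""
    quoted_spans = re.findall(r"“([^”]*)”", text)
    return sum(len(span.split()) for span in quoted_spans)


# Counts taken from the evaluation details (wordsDialogue, wordsTotal).
ratio = dialogue_ratio(285, 671)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation(ratio):.2%}")
# prints "Ratio: 42.47%, Deviation: 7.53%"
```

Note that the two evaluators report slightly different totals (669 and 671 words), which suggests they split words differently. A counter like the one above would not necessarily match either figure exactly.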