NC Bench
Run: 2026-03-06-01KK1GTAVHBKAJ20T74XWAMP31
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 27.2s
Cost: $0.014490
Provider: OpenAI
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
27.74% | Dialogue to Total Word Ratio | (details)
13.8699% | Overall
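The overall figure looks consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers shown here, not a documented NC Bench formula, so the following Python sketch should be read as an assumption; the variable names are illustrative only.

```python
# Evaluator scores as displayed in the summary, in percent.
scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 27.74,
}

# Assumption: the overall score is the unweighted mean of the evaluator scores.
overall = sum(scores.values()) / len(scores)

print(f"{overall:.2f}%")  # prints "13.87%"
```

Under this assumption, the displayed 13.8699% would imply that the dialogue score is held at roughly 27.7398% internally and rounded to 27.74% for display.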
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill crouched behind the hedge of Number Twelve, staring at a back garden festooned with balloons, banners, and a pink plastic castle that looked insultingly easy to climb. “It’s a fortress of sugar,” Bill whispered. “It’s a children’s party in Surbiton, not the Tower of London,” Molly said, adjusting the black wool cap that made her look less like a criminal mastermind and more like an unemployed mime. Bill peered through the leaves. “I count twelve small civilians, one clown, two exhausted adults, and a grandmother with the eyes of a prison governor.” “The grandmother is the pivot,” Molly said. “Never trust anyone in orthopedic sandals.” A shriek rose from the garden as a magician produced a coin from a child’s ear. Bill flinched. “Hostile environment.” “Focus. Presents are stacked on the table by the conservatory. Bright paper, ribbons, gift tags, no visible guard rotation except the grandmother.” “And the clown.” “The clown is ornamental.” Bill frowned. “You always underestimate professionals in face paint.” Molly unfolded a hand-drawn plan on the damp grass. It was mostly arrows, circles, and one sketch of a cake labeled TOO MUCH. “We enter through the side gate when they sing. Noise cover, emotional distraction, adults occupied filming on phones they don’t know how to silence.” “And then?” “You drift to the drinks table disguised as a parent.” “I don’t look like a parent.” “You look exactly like a father who forgot the child’s age and bought the wrong wrapping paper.” Bill considered that. “Cruel but fair.” “I’ll cut behind the bouncy castle, reach the gift table, and load the sack.” “We have a sack?” Molly held up a folded blue IKEA bag. “We have ambition.” A little boy in a pirate hat raced past the hedge and stopped dead, staring through the branches. Molly and Bill froze. “Why are you in the bush?” the boy asked. Bill smiled in panic. “Birdwatching.” “At a party?” “Rare species,” Molly added smoothly. “The Lesser Spotted Magician. 
Very elusive.” The boy squinted. “You look suspicious.” “So do pirates,” said Molly. He accepted this and ran off, immediately distracted by a bubble machine. Bill exhaled shakily. “Nearly got done in by a five-year-old.” “They’re the sharpest age,” Molly said. “No taxes, all instinct.” From the garden came the first wavering notes of Happy Birthday. “That’s our cue,” Bill said. “No,” Molly hissed. “Second verse. First verse is reconnaissance. Second verse is action. Third verse is chaos.” “There’s a third verse?” “In families like this, there always is.” They slipped along the hedge to the side gate. Through it, the garden blazed with pastel anarchy. A woman in a paper crown was lighting candles while another parent filmed vertically and gave contradictory instructions to no one. Bill straightened his jacket. “Right. I’m a father. I’m disappointed in modern educational standards. I say things like ‘screen time.’” “Good. Keep your shoulders slumped and your soul extinguished.” He entered first, carrying a paper plate Molly had stolen from a bin to complete the look. Instantly, another dad nodded at him with the dead-eyed fraternity of men trapped beside folding chairs. Bill nodded back, appalled by how natural it felt. Molly darted behind the bouncy castle as the children howled through the second verse with the tuneless conviction of the recently sugared. She reached the gift table and began sweeping parcels into the blue bag. Bill sidled to the drinks table and murmured, “Problem. Grandmother on the move. Twelve o’clock. Fast walker.” “How fast?” “Church-hall competitive.” Molly zipped the bag. “Delay her.” “With what?” “Conversation. The elderly love war stories and opinions about bins.” Grandmother bore down on him, formidable as weather. Bill blurted, “Terrible what they’ve done with recycling, isn’t it?” She stopped. “At last,” she said, gripping his sleeve. “Someone sensible.” Molly hefted the bulging bag and headed for the gate. 
Then the clown turned, saw her, and shouted, “Oi! Those are the presents!” Every child in the garden looked up at once. Molly smiled brightly. “Game!” she cried. “Who can catch the birthday thief?” Bill tore free of Grandmother. “Run,” he said. The children screamed with delight and charged after them like tiny, vengeful cavalry.
created_at
"2026-03-06T12:09:06.17Z"
3/6/2026, 12:09:06 PM
model
"openai/gpt-5.4-20260305"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
474
moderation_latency
(null)
generation_time
27124
tokens_prompt
77
tokens_completion
1250
native_tokens_prompt
66
native_tokens_completion
955
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.01449
router
(null)
provider_responses
0
endpoint_id
"9ff5625c-403f-4d7f-b895-58ac7295062c"
id
"resp_0a8d9cdcef4f6b7c0169aac3e23aa481938e92a228bf8dd86e"
is_byok
false
latency
195
model_permaslug
"openai/gpt-5.4-20260305"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1772798946-bTV3iXrmDdVjsfizSeCI"
upstream_id
"resp_0a8d9cdcef4f6b7c0169aac3e23aa481938e92a228bf8dd86e"
total_cost
0.01449
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 697
27.74% | Dialogue to Total Word Ratio | Ratio: 44.94%, Deviation: 5.06% | neededClean: false; wordsTotal: 701; wordsDialogue: 315
13.8699%
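The reported ratio and deviation can be reproduced from the word counts in the evaluator metadata, as the Python sketch below shows. Two things in it are assumptions rather than NC Bench behavior: the hypothetical `count_dialogue` helper treats anything between curly double quotes as dialogue and splits words on whitespace, and the deviation is taken as the absolute distance from the 50% target. The two evaluators report slightly different totals (697 and 701 words), which suggests the real tokenization differs between them. How a 5.06% deviation maps to the 27.74% score is not shown on this page and is not modeled here.

```python
import re

TARGET_RATIO = 50.0  # percent, from the scenario "50% dialogue"
TARGET_WORDS = 500   # from the scenario "Write 500 words"


def count_dialogue(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage.

    Assumption: dialogue is the text between curly double quotes and words
    are whitespace-separated; the real evaluator may count differently.
    """
    spoken = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(part.split()) for part in spoken)
    return dialogue_words, len(text.split())


# Reproduce the reported ratio and deviation from the metadata counts.
words_dialogue, words_total = 315, 701
ratio = 100.0 * words_dialogue / words_total
deviation = abs(TARGET_RATIO - ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# Word-count overshoot, using the word-count evaluator's own total of 697.
overshoot = 100.0 * (697 - TARGET_WORDS) / TARGET_WORDS
print(f"Overshoot: {overshoot:.1f}%")

# Try the hypothetical counter on the first spoken line of the response.
sample = "“It’s a fortress of sugar,” Bill whispered."
print(count_dialogue(sample))
```

The first print statement yields "Ratio: 44.94%, Deviation: 5.06%", matching the Details column. The response runs about 39% over the requested 500 words, which is in line with the 0.00% word-count score.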