NC Bench
Run: 2026-03-06-01KK1GPYXESTG9QEVWH5490BW9
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 23.5s
Cost: $0.012180
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 84.44%
Overall: 42.2178%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge in a velvet blazer the color of bad decisions, studying the church hall across the lane. Bunting flapped. Balloons bobbed. Through the windows came the sugar-bright chaos of a magician warming up and children already shrieking as if someone had announced free ponies and no bedtime ever again. Bill, beside her, unfolded a hand-drawn map that looked less like a plan and more like a nervous octopus. “Tell me again,” said Molly, “why we are robbing a seven-year-old’s birthday party.” “Because,” Bill whispered, tapping the map with criminal dignity, “children receive an alarming amount of high-end merchandise for doing absolutely nothing.” “They survive being seven.” “So do pigeons. No one gives them remote-control helicopters.” Molly peered through the hedge. “I count twenty-three gift bags on the table. Metallic paper. Good ribbons. One suspiciously large box.” “The large box is our north star,” Bill said. “That’s where ambition lives.” “Aren’t you troubled,” Molly asked, “by the ethics?” “Constantly. But mostly by the wrapping.” A parent in a paper crown hurried past the window carrying orange squash like a man transporting unstable chemicals. Bill lowered himself deeper into the shrubbery. “We go in disguised,” he said. “Blend with the crowd. Smile at authority. Exit with the goods before cake.” “What are we disguised as?” Bill looked pleased. “Entertainers.” Molly slowly turned to him. “We are wanted in three counties.” “Precisely why no one expects us to do balloon animals.” “I can’t make a balloon animal.” “You can make a balloon regret.” She took the clown wig from his bag and held it as if it were a dead rat wearing optimism. “If I put this on, Bill, our friendship becomes an allegation.” “You said you wanted subtle.” “This is not subtle. This is a cry for intervention.” Across the road, a child in a pirate hat pressed his face to the window, then vanished. Molly exhaled. “Fine,” she said. “Talk me through it.” Bill straightened. 
“We enter through the side door during the magician’s second routine. Historically, that’s when parental vigilance collapses. You distract the adults—” “With what?” “Light conversation. A practical fire. Improvise. I slip to the presents table, transfer all gifts into the catering trolley waiting by the kitchen, and wheel them out the rear exit.” “There’s a catering trolley?” “There will be if confidence works.” Molly narrowed her eyes. “Confidence is not a vehicle.” “It is in this country.” Music thumped from inside: a children’s song played at a volume normally used to disperse rioters. Bill tucked in his shirt and adjusted a bow tie patterned with tiny ducks. “If we’re caught,” Molly said, “I’m telling them you’re from the agency.” “If we’re caught,” Bill replied, “cry and mention allergies. Adults retreat from paperwork.” She put on the wig. It sat on her head like an insult. “How do I look?” “Like a woman with a tragic backstory and access to face paint.” “Good. I’ll channel grief.” They crossed the lane with the brisk false purpose of people who had never once paid taxes honestly. At the side door Molly paused, hand on the handle. “One last question,” she murmured. “What do we do if the children notice?” Bill smiled, all nerve and nonsense. “Molly, children notice everything. That’s why we move fast and wave constantly.” Inside, a hundred balloons shivered at once, and from the hall came the roar of a room discovering cake was imminent. Molly opened the door. “Showtime,” she said. “Party time,” said Bill. “Don’t ever say that again.” “Professionally noted.”
created_at: "2026-03-06T12:07:15.639Z" (3/6/2026, 12:07:15 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 444
moderation_latency: (null)
generation_time: 23479
tokens_prompt: 77
tokens_completion: 1052
native_tokens_prompt: 66
native_tokens_completion: 801
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01218
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0ec778e24037f8ad0169aac373b160819783b6c656ea88272a"
    is_byok: false
    latency: 154
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772798835-Hj2jkMEgvM84X7nNAYcW"
upstream_id: "resp_0ec778e24037f8ad0169aac373b160819783b6c656ea88272a"
total_cost: 0.01218
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
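The token counts, generation time, and cost in the metadata above can be cross-checked with a little arithmetic. The sketch below is only illustrative: the two per-million-token rates are assumptions that do not appear anywhere on this page (they were chosen because they happen to reproduce the reported total), and generation_time is assumed to be in milliseconds because 23479 lines up with the 23.5s duration shown in the run details.

```python
# Values copied from the run metadata.
native_tokens_prompt = 66
native_tokens_completion = 801
generation_time_ms = 23479  # assumed to be milliseconds (matches the 23.5s duration)
reported_cost = 0.01218

# ASSUMPTION: these rates are not stated in the source. They are hypothetical
# values that happen to reproduce the reported cost.
assumed_input_usd_per_million = 2.50
assumed_output_usd_per_million = 15.00

estimated_cost = (
    native_tokens_prompt * assumed_input_usd_per_million
    + native_tokens_completion * assumed_output_usd_per_million
) / 1_000_000

# Output throughput in native completion tokens per second.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"Estimated cost: ${estimated_cost:.6f} (reported: ${reported_cost:.6f})")
print(f"Throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions almost all of the cost comes from the 801 completion tokens; the 66 prompt tokens contribute a fraction of a cent.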
Evaluation details
Evaluator: Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 591

Evaluator: Dialogue to Total Word Ratio
Result: 84.44%
Details: Ratio: 51.84%, Deviation: 1.84%
Meta Data: neededClean: false, wordsTotal: 598, wordsDialogue: 310

Overall: 42.2178%
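The reported ratio and deviation follow directly from the evaluator's word counts. The sketch below recomputes them. It also checks two things the page does not state explicitly, so treat them as assumptions: that the overall score is the unweighted mean of the two evaluator scores, and that the word-count target is the 500 words named in the scenario. How each evaluator turns its measurement into a percentage score is not shown on this page and is not reconstructed here.

```python
# Counts copied from the evaluator metadata.
words_total = 598
words_dialogue = 310
target_ratio = 0.50  # the scenario asks for 50% dialogue

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 51.84%, Deviation: 1.84%"

# ASSUMPTION: the word-count evaluator compares against the 500 words
# requested in the scenario.
words_counted = 591
target_words = 500
overshoot = (words_counted - target_words) / target_words
print(f"Word count overshoot: {overshoot:.1%}")

# ASSUMPTION: the overall score is the unweighted mean of the evaluator scores.
scores = [0.00, 84.44]
overall = sum(scores) / len(scores)
print(f"Overall: {overall:.2f}%")
```

The mean of the rounded scores comes out at 42.22%, which agrees with the reported 42.2178% to within rounding. The two evaluators also report slightly different totals (591 and 598 words), which suggests they count words differently.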