NC Bench
Run: 2026-03-06-01KK1HCN90E9RBRF8V2WGB992D
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 28.1s
Cost: $0.013545
Provider: OpenAI
Evaluation summary
Score     Evaluator
0.00%     Matches word count
2.35%     Dialogue to Total Word Ratio
1.1737%   Overall
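The page does not say how the Overall score is derived. One reading that fits the figures is an unweighted mean of the two evaluator scores. This is an assumption, not NC Bench's documented method. Taking the word-count score as exactly 0, the mean works out to 1.1737% only if the unrounded dialogue-ratio score is about 2.3474%, which rounds to the displayed 2.35%. A minimal check in Python (the helper names are hypothetical):

```python
from statistics import fmean

# Assumption: Overall is the unweighted mean of per-evaluator scores (in percent).
# The run page does not document this; the reported figures merely fit it.
def overall_score(scores: list[float]) -> float:
    return fmean(scores)

def implied_second_score(overall: float, first: float) -> float:
    """Back-solve the other score of a two-evaluator mean (hypothetical helper)."""
    return 2 * overall - first

word_count_score = 0.00  # assumed to be exactly 0, not merely rounded to 0.00%
implied_ratio_score = implied_second_score(1.1737, word_count_score)
print(round(implied_ratio_score, 4))  # 2.3474
print(round(implied_ratio_score, 2))  # 2.35, matching the summary
print(round(overall_score([word_count_score, implied_ratio_score]), 4))  # 1.1737
```

If the benchmark weights evaluators differently, the back-solved value would change accordingly.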
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“I’m telling you, Bill, a children’s party is the perfect target,” Molly said, flattening a hand-drawn map across the café table between the sugar dispenser and a damp ring left by someone else’s tea. “No one expects criminal enterprise near a bouncy castle.” “No one respectable, maybe,” Bill muttered. “I do. I hear the phrase children’s party and immediately expect chaos, screaming, and a man dressed as a pirate with the eyes of a hostage.” “Exactly. Camouflage.” “Camouflage is khaki,” Bill said. “This is balloons and jam.” Molly tapped the map. “Observe. Three o’clock: magician arrives. Three-fifteen: cake unveiled. Three-seventeen: every adult in attendance turns into a photographer with no peripheral vision. During that window, we lift the presents.” Bill stared at the drawing. “Why is the house shaped like a potato?” “That is architectural shorthand.” “It’s a potato with windows.” “It’s a detached potato,” Molly said. “Focus. The gifts are stacked in the conservatory.” “The what?” “The glass room.” “Ah. The greenhouse for cushions.” Molly leaned in. “French doors here. Side gate here. Escape route through the alley, over Mrs. Wetherby’s wall, into the car.” Bill frowned. “Mrs. Wetherby keeps geese.” “She keeps one goose.” “It has the spirit of seven.” “That goose and I have an understanding.” “That’s what worries me. Last time you said that about an animal, I spent twenty minutes in a hedge while a dachshund judged me.” A waitress set down their coffees. Molly smiled sweetly until she had gone. “We need disguises,” Bill said. “We cannot simply stroll into a child’s birthday looking like the tax audit of doom.” “We go in as entertainers.” Bill recoiled. “No.” “Yes.” “I refuse to balloon anything.” “You won’t have to. You’ll be my assistant.” “I’d rather be a pallbearer.” Molly ignored him. “I’ll do close-up magic. You carry props, look mournful, and say as little as possible.” “I can do mournful,” Bill admitted. “It’s one of my natural gifts.” “Good. 
We arrive with confidence, a large case, and vague accusations that the agency failed to mention the stairs.” Bill rubbed his forehead. “Suppose someone asks us to perform.” “Then I perform.” “Can you do magic?” “I can steal a wristwatch while explaining misdirection. That’s basically wizardry.” He sipped his coffee. “What about the presents? Children notice presents.” “Children notice sugar. Presents are decorative until the appropriate howling phase.” “And if one of them sees us?” Molly smiled. “Then we say, ‘Special delivery inspection.’” “That is not a thing.” “It is if you say it with a clipboard.” Bill considered this. “Do we have a clipboard?” “We will by noon.” Outside, rain slicked the pavement, turning the streetlamps milky in the window. Molly folded the potato-house map with priestly care. “There is one ethical concern,” Bill said. “Only one? We’re maturing.” “We are stealing from children.” “We are stealing from affluent parents who overcompensate with imported wrapping paper.” “That is a disturbingly smooth distinction.” “It’s accurate. Besides, children never remember who gave them what. Half those gifts could vanish and they’d blame an uncle.” Bill exhaled through his nose. “What’s the haul?” “Remote-control car, three dolls, one suspiciously enormous dinosaur, art supplies, two board games, and something labeled Deluxe Princess Fortress.” Bill looked up. “What on earth do we want with a Deluxe Princess Fortress?” Molly’s expression turned grave. “Bill, don’t be naïve. We fence the car, the dolls, the dinosaur. We keep the fortress.” For the first time all morning, Bill smiled. “With the little towers?” “With the little towers,” Molly said. He lifted his cup. “To larceny.” “To elegance,” Molly corrected. “To stealing from a party where someone will definitely vomit on a clown.” Molly clinked his cup. “And to getting out before the piñata. I have standards.”
created_at: "2026-03-06T12:19:06.665Z" (3/6/2026, 12:19:06 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 507
moderation_latency: (null)
generation_time: 28043
tokens_prompt: 77
tokens_completion: 1187
native_tokens_prompt: 66
native_tokens_completion: 892
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.013545
router: (null)
provider_responses: 0
endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
id: "resp_01717f79941e19b60169aac63abd98819484fc92820b161bca"
is_byok: false
latency: 159
model_permaslug: "openai/gpt-5.4-20260305"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799546-4MswHZph7UnmxxaFKvIm"
upstream_id: "resp_01717f79941e19b60169aac63abd98819484fc92820b161bca"
total_cost: 0.013545
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details

Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean: false, words: 623

Dialogue to Total Word Ratio
  Result: 2.35%
  Details: Ratio: 78.66%, Deviation: 8.66%
  Meta Data: neededClean: false, wordsTotal: 628, wordsDialogue: 494

Overall: 1.1737%
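The dialogue evaluator's figures follow from its metadata: 494 / 628 ≈ 0.7866, i.e. 78.66% dialogue, which is 8.66 percentage points above the 70% target set in the prompt. The sketch below shows one plausible way such counts could be produced: it counts whitespace-separated words inside double quotes, curly or straight. The quote pattern, the tokenizer, and all function names are assumptions, not NC Bench's evaluator. The two evaluators on this page already disagree on the total (623 vs. 628 words), so exact counts depend on tokenization choices.

```python
import re

# Assumed definition of a dialogue span: text between a pair of double quotes,
# curly (“ ”) or straight ("). NC Bench's actual rules are not shown on this page.
DIALOGUE_SPAN = re.compile(r'[“"]([^”"]*)[”"]')

def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed notion of 'word')."""
    return len(text.split())

def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (total words, dialogue words, dialogue share of total)."""
    total = count_words(text)
    dialogue = sum(count_words(m.group(1)) for m in DIALOGUE_SPAN.finditer(text))
    return total, dialogue, (dialogue / total if total else 0.0)

def deviation_points(ratio: float, target: float = 0.70) -> float:
    """Absolute gap between achieved and target ratio, in percentage points."""
    return abs(ratio - target) * 100

# Reproduce the reported figures from the evaluator's metadata.
reported = 494 / 628
print(f"{reported:.2%}")                    # 78.66%
print(f"{deviation_points(reported):.2f}")  # 8.66

# Apply the sketch to one line of the generated scene.
sample = "“Camouflage is khaki,” Bill said. “This is balloons and jam.”"
print(dialogue_ratio(sample))               # (10, 8, 0.8)
```

Attributions such as "Bill said." fall outside the quotes and count only toward the total, which is what pushes a dialogue-heavy scene below 100%.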