NC Bench
Run: 2026-03-06-01KK1GXFV2Y6AD7K1XCFN7NS9R
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 30.9s
Cost: $0.015210
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 7.93%
Overall: 3.9667%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Tell me again,” Molly said, peering over the park bench as if the swings might be wired, “why exactly are we robbing a birthday party?” “Because,” Bill said, offended by the question, “children are inefficient custodians of luxury goods.” “They’re six.” “Exactly. No inventories. No serial numbers. No meaningful upper-body strength.” Molly pinched the bridge of her nose. “You make villainy sound like office work.” “It is office work,” Bill said. “With better hats.” Across the street, a suburban house bristled with balloons. A banner reading HAPPY BIRTHDAY, OLIVER sagged in the middle like it had lost the will to celebrate. Parents drifted in and out carrying foil trays and gift bags. Molly squinted. “That’s a lot of witnesses.” “Witnesses under forty inches tall,” Bill said. “Half of them can’t pronounce ‘suspicious.’” “They can scream it.” Bill unfolded a hand-drawn map from his pocket. It was on the back of a pizza menu and appeared to involve three arrows, a cupcake, and the word NO DOG? written several times. “Right,” he said. “Entry through the side gate at two-fifteen, during the magician.” “There’s a magician?” “There’s always a magician. Or a clown. Either way, a man in formal wear distracting a patio.” “I hate variables in greasepaint.” “He’s not a variable. He’s an asset. Everyone watches him pull scarves from his sleeve, we remove the presents from the table, load them into the laundry hamper, and depart through the alley.” Molly stared at the map. “You’ve drawn the hamper larger than the car.” “It’s a conceptual hamper.” “And this cupcake?” “That marks the gift table.” “It looks more like a volcano.” “That’s because the icing ran.” Molly leaned back. “Suppose a parent spots us.” Bill waved a hand. “Then we improvise. You say, ‘We’re with entertainment.’” “In these clothes?” They were dressed in black from neck to shoe, Bill in a turtleneck he believed made him look European, Molly in a practical jacket with too many pockets and not enough patience. 
“Yes,” Bill said. “You look like a balloon artist who’s seen things.” “I look like I mug balloon artists.” “Even better. Gravitas.” A shriek of laughter drifted from the yard. Then a child’s voice yelled, “He made a sword!” Bill checked his watch. “See? Magician phase.” “That sounds more like clown behavior.” “Genre is fluid.” Molly crossed her arms. “I still don’t love stealing from children.” “We are not stealing from children,” Bill said. “We are intercepting gifts before emotional attachment forms.” “That is the sentence of a man who should not be near a toy store.” Bill tapped the map. “Focus. We need roles. I’m retrieval. You’re lookout.” “No.” “No?” “I’m retrieval. You fold under pressure.” “I do not fold.” “At the aquarium job, you apologized to a stingray.” “It looked judgmental.” “It was a fish.” “It was an atmosphere.” Molly stood and adjusted her jacket. “I take the table. You handle the gate, the car, and any unforeseen grandmothers.” “Grandmothers are your department. They like you.” “They don’t like me. They assume I know where the napkins are.” “Useful assumption.” A sudden bark erupted from the yard. Both of them froze. “You wrote NO DOG? three times,” Molly said. “Question marks indicate ongoing inquiry.” “That is a dog.” “It may be decorative.” A golden retriever bounded into view wearing a paper crown. Molly sighed. “That dog is more official than we are.” Bill folded the map with brittle dignity. “Fine. Revised plan. We infiltrate as pet wranglers.” “Why would a birthday party need pet wranglers?” “Because,” Bill said, already rising, “nothing in suburbia survives success. Once a dog has a hat, society is improvising.” Molly looked at the house, the presents stacked high in bright paper towers, the children bouncing like dropped marbles, and the crowned retriever patrolling the lawn like a tiny hairy sheriff. “You know,” she said, “for once I think the job may be beneath us.” Bill considered this. 
“Counterpoint: I saw at least three remote-control cars and a telescope.” “That does tempt me.” “And a large rectangular box with the posture of a game console.” Molly took a breath, then smiled despite herself. “All right. But if the magician makes me disappear, I’m taking it personally.” “That’s the spirit,” Bill said. “Now walk casually.” “We’re dressed like burglars.” “Then walk theatrically.”
created_at: "2026-03-06T12:10:49.578Z"
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 491
moderation_latency: (null)
generation_time: 30840
tokens_prompt: 77
tokens_completion: 1331
native_tokens_prompt: 66
native_tokens_completion: 1003
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01521
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_02db03c1af1a4c190169aac449a2b0819689e09defb56ebde8"
    is_byok: false
    latency: 128
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799049-vAfGymhu6LOg6UIkLn2D"
upstream_id: "resp_02db03c1af1a4c190169aac449a2b0819689e09defb56ebde8"
total_cost: 0.01521
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details

Evaluator: Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 718

Evaluator: Dialogue to Total Word Ratio
Result: 7.93%
Details: Ratio: 62.88%, Deviation: 7.12%
Meta Data: neededClean: false, wordsTotal: 722, wordsDialogue: 454

Overall: 3.9667%
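
The ratio and deviation reported by the Dialogue to Total Word Ratio evaluator can be reproduced from its meta data. The sketch below is illustrative only: the function names are hypothetical, the 70% target is taken from the scenario description, and the deviation is assumed to be the absolute difference from that target. How the evaluator turns a deviation into the 7.93% score is not shown on this page, so that step is left out.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of all words (hypothetical helper)."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 70.0) -> float:
    """Absolute distance from the target ratio, in percentage points.

    Assumption: the evaluator measures deviation as |target - ratio|.
    """
    return abs(target_pct - ratio_pct)


# Values copied from the evaluator's meta data for this run.
ratio = dialogue_ratio(words_dialogue=454, words_total=722)
deviation = deviation_from_target(ratio)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

With the recorded counts this prints `Ratio: 62.88%, Deviation: 7.12%`, matching the Details field. The Overall figure of 3.9667% is close to the mean of the two evaluator scores, which suggests, but does not confirm, that it is a simple average.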