NC Bench
Run: 2026-03-06-01KK1HAQ56X6GDHFD2T2MMS9PV
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 30.3s
Cost: $0.015045
Provider: OpenAI
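The Duration above appears to match the generation_time value in the response metadata further down the page (30259, assumed here to be milliseconds). Dividing the native completion token count by that duration gives a rough throughput figure. A minimal Python sketch of the arithmetic, using only numbers from this page; the unit assumption and the variable names are illustrative and not part of NC Bench:

```python
# Figures copied from this run's page.
GENERATION_TIME_MS = 30259      # generation_time (assumed to be milliseconds)
NATIVE_TOKENS_COMPLETION = 992  # native_tokens_completion

# Convert to seconds; rounds to the 30.3s shown under Duration.
duration_s = GENERATION_TIME_MS / 1000

# Rough decoding throughput over the whole generation.
tokens_per_second = NATIVE_TOKENS_COMPLETION / duration_s

print(f"Duration: {duration_s:.1f}s")
print(f"Throughput: {tokens_per_second:.1f} tokens/s")
```

The throughput figure is an average over the full request, so it would understate the steady-state rate if generation_time includes time before the first token.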
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 88.64%
Overall: 44.3200%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Tell me again,” said Molly, crouched over the kitchen table with a pencil clenched between her teeth, “why we are robbing a birthday party.” “Because,” said Bill, holding up a crayon drawing he’d stolen from the invitation, “children receive astonishing presents. Remote-control cars, tablets, cash in cards, tiny golden jewelry from overgenerous aunts. It’s a diversified portfolio with cake.” “That,” Molly said, “is the ugliest sentence ever spoken.” “It’s not ugly if it’s profitable.” Molly snatched the drawing. “This pirate is wearing two eye patches.” “Exactly. These are not observant people.” Rain tapped the window. Between them sat the battle plan: a hand-drawn map of the community hall, three juice boxes, and a plate of biscuits Bill had labeled Operational Rations in blue icing. “I’m serious,” Molly said. “A children’s party is chaos wrapped in screaming. There’ll be parents.” “Distracted parents.” “There’ll be balloons.” “That’s not a security feature.” “It is if you hate sudden noises.” Bill leaned back in his chair. “Listen. We enter as entertainers.” Molly stared at him. “No.” “Clowns are invisible.” “Clowns are memorable. If a clown steals your toaster, you mention the clown.” “Fine. Face painters.” “Bill, you can’t paint a wall.” “I can paint a tiger.” “You painted a tiger once. It looked like a haunted sofa.” Bill pointed at the map. “Then simple infiltration. We pose as Uncle Terry and Aunt Linda.” “Do those exist?” “They do now.” “Children can smell a fake adult.” “Only if the fake adult is afraid.” Molly folded her arms. “I am not being outwitted by a seven-year-old in a paper crown.” “You say that now.” She tapped the invitation. “Whose party is this, exactly?” “Little Oliver Finch. Turning six. The hall opens at one. Magic show at two. Cake at three. Present mountain assembled near the stage from one-fifteen onward.” “Present mountain?” “That’s what they called it.” “That’s obscene,” Molly muttered. 
“There are children with one sock and a satsuma, and this Oliver has a mountain.” Bill grinned. “Now you’re morally invested.” “I’m not morally invested. I’m professionally offended.” He slid a list across the table. “We need roles. I distract. You lift.” “Why do you distract?” “Because I’ve got warmth.” “You’ve got volume.” “Children love me.” “Dogs distrust you on sight.” “That’s unrelated.” Molly studied the list. “What’s ‘Operation Pass the Parcel’?” Bill looked pleased. “A timed extraction. While the music’s on, everyone circles. When it stops, confusion. In confusion, we move gifts out the fire exit.” “With what? Our dazzling upper-body strength?” “With a trolley.” “You own no trolley.” “I can borrow one from my cousin.” “Your cousin steals scaffolding for sport.” “He’s community-minded.” She rubbed her temple. “Suppose a parent asks which child we belong to.” “We laugh sadly and say, ‘They all do, in a way.’” “That is psychotic.” “Then we say we’re from the bakery.” “In leather gloves?” “Festive leather gloves.” Molly looked at the map again, and despite herself, a smile crept in. “How many presents do you think there are?” Bill lowered his voice reverently. “At least thirty.” “Thirty,” she echoed. “Maybe more. There’s a bouncy castle. People don’t rent inflatables for modest gift expectations.” She sighed. “All right. But no clowning, no face paint, and if a child asks me to watch a card trick, we abort.” Bill thrust out his hand. “Partners?” She shook it. “Partners.” He beamed. “Tomorrow, we relieve one over-celebrated six-year-old of the burden of abundance.” Molly stood and pocketed the invitation. “And tonight?” “Tonight,” Bill said, reaching for a biscuit, “we practice looking trustworthy.” She took one too. “That’ll take longer than stealing the presents.” “Maybe,” Bill said. “But if anyone asks, we’re excellent with children.” From somewhere downstairs came a crash, followed by Bill’s cat yowling like a police siren. Molly raised an eyebrow. 
“Your own cat filed a complaint.” Bill listened a moment, then shrugged. “Good. Keeps my instincts sharp.” “Bill?” “Yes?” “If this ends with me running through a church car park carrying a sack of Lego, I’m leaving the country.” He bit into the biscuit and nodded thoughtfully. “Reasonable.”
created_at: "2026-03-06T12:18:03.054Z"
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 464
moderation_latency: (null)
generation_time: 30259
tokens_prompt: 77
tokens_completion: 1329
native_tokens_prompt: 66
native_tokens_completion: 992
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.015045
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_058de7bae9cb75f50169aac5fb1fbc8194800efac11a225e11"
    is_byok: false
    latency: 162
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799483-IrPODQW38qlMuMZ7zr6v"
upstream_id: "resp_058de7bae9cb75f50169aac5fb1fbc8194800efac11a225e11"
total_cost: 0.015045
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean: false; words: 679
Dialogue to Total Word Ratio
  Result: 88.64%
  Details: Ratio: 71.55%, Deviation: 1.55%
  Meta Data: neededClean: false; wordsTotal: 689; wordsDialogue: 493
Overall: 44.3200%
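The reported ratio and deviation can be reproduced from the word counts, and the overall figure is consistent with an unweighted mean of the two evaluator scores. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, the 70% target is taken from the scenario text, and the averaging rule is an assumption that happens to match the numbers rather than a documented NC Bench formula. How a 1.55% deviation maps to the 88.64% score is not stated on this page, so it is not reproduced here.

```python
# Figures copied from the evaluation details on this page.
WORDS_DIALOGUE = 493
WORDS_TOTAL = 689
TARGET_RATIO = 0.70  # from the scenario: "70% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that fall inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def deviation_points(ratio: float, target: float) -> float:
    """Absolute distance from the target, in percentage points."""
    return abs(ratio - target) * 100


def overall_score(scores: list[float]) -> float:
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    return sum(scores) / len(scores)


ratio = dialogue_ratio(WORDS_DIALOGUE, WORDS_TOTAL)
deviation = deviation_points(ratio, TARGET_RATIO)
overall = overall_score([0.00, 88.64])

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2f}%")
print(f"Overall: {overall:.2f}%")
```

Note that the two evaluators report different totals (679 words versus 689), so they presumably tokenize the response differently; the sketch uses the ratio evaluator's own counts.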