NC Bench
Run: 2026-03-06-01KK1HWRBXQN814YF2H3WJNWJC
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 34.4s
Cost: $0.014040
Provider: OpenAI
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
22.59% | Dialogue to Total Word Ratio | (details)
11.2957% | Overall |
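The overall figure is consistent with an unweighted mean of the two evaluator scores. That aggregation is inferred from the reported numbers rather than documented on this page, so the sketch below treats it as an assumption. The function name `overall_score` is hypothetical.

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores, in percent.

    Assumption: NC Bench averages the evaluators equally. This is
    inferred from the reported values, not confirmed by the source.
    """
    return sum(scores) / len(scores)


# Rounded scores as displayed in the summary table.
reported = [0.00, 22.59]
print(f"{overall_score(reported):.3f}%")
```

The displayed overall value (11.2957%) carries more digits than the rounded inputs, which suggests the site averages unrounded evaluator scores.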
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Tell me again,” said Molly, crouched over the kitchen table, “why we are robbing a birthday party instead of, say, a bank. Banks have money. Children have finger paint and a terrifying relationship with sugar.” “Presents,” Bill said, tapping a hand-drawn map with the solemnity of a general invading Belgium. “Concentrated valuables in decorative paper. Portable. Stackable. Emotionally under-guarded.” “Emotionally under-guarded,” Molly repeated. “Marvelous. And what are we fencing? A toy dinosaur and three jumpers knitted by an aunt named Sue?” “You lack vision,” said Bill. “There’ll be gadgets. Game consoles. Remote-control cars. Possibly a bicycle. Rich parents compete. They don’t love their children, Mol, they outsource affection in ribbon.” Molly leaned back, eyeing the map. It was drawn on the back of a pizza leaflet and featured several arrows, a skull, and a note that simply read BEWARE DOG? in suspiciously cheerful capitals. “And this,” she said, “is the house?” “That,” Bill said proudly, “is the fortress.” “It’s detached.” “Detached houses are the worst sort. Means confidence.” Outside, rain tapped the window like a nervous accomplice. Inside, the flat smelled of damp wool and burnt toast. Bill had set out the tools of the trade: black gloves, a torch, a coil of rope, and, for reasons known only to him, a kazoo. Molly pointed at it. “No.” “It’s for misdirection.” “It’s for unemployment.” Bill ignored her. “Now, the party starts at two. Clown at three. Cake at four. That gives us a narrow insertion window during peak frosting.” “Peak frosting.” “Yes. Everyone’s distracted. Parents taking photographs, children screaming, one uncle trying to start a barbecue in the rain because his masculinity demands suffering. We slip in through the side gate.” “And then?” “We blend.” “With children?” “With adults,” Bill said. “We pose as entertainers.” Molly stared. “I’m sorry. 
I need to hear you say that again because I want to be certain I’m leaving this profession for the correct reasons.” “We pose as entertainers.” “You’re six foot two and look like a debt collector who learned magic in prison.” “That’s hurtful. Also useful.” “And what am I?” “A face painter.” Molly pinched the bridge of her nose. “Bill, I can barely paint a wall.” “You don’t need talent. You need confidence and a butterfly stencil.” “What are you?” Bill spread his hands. “Balloon artist.” “You can’t tie your own shoelaces under pressure.” “I’ve been practicing.” He lifted a long pink balloon from his pocket and began twisting. It squeaked alarmingly, bulged in the middle, and exploded with the violence of a tiny gunshot. Both of them jumped. After a moment Molly said, “Well. The children will certainly remember your work.” Bill tossed the rubber remains aside. “Prototype.” She bent over the map again. “Where are the presents?” “Dining room. Back left. There’s a big table. We enter, mingle, locate the stash, transfer gifts into bin bags, and exit through the kitchen before anyone notices.” “You just said ‘bin bags’ about birthday presents. Even for us, that’s bleak.” “Laundry sacks, then. Classier.” “And if someone asks who invited us?” Bill smiled with the reckless confidence of a man whose best ideas usually ended in running. “We say, ‘Gary did.’ There’s always a Gary. No one knows exactly who he is, but he’s somehow involved.” Molly laughed despite herself. “That is infuriatingly true.” “Thank you.” “And if there’s no Gary?” “Then we invent administrative confusion. Parents fear two things: choking hazards and being rude in front of other parents. We weaponize manners.” Molly was quiet for a moment, listening to the rain, the old radiator clanking like loose cutlery. “You know,” she said, “there is something especially wicked about stealing from children.” Bill considered that. “Counterpoint: they’re small and can’t chase us far.” She looked at him, then snorted. 
“You are going to hell in a very silly hat.” “Maybe,” he said, folding the map. “But if all goes well, I’ll get there with a new bicycle.”
created_at: "2026-03-06T12:27:54.116Z" (3/6/2026, 12:27:54 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 406
moderation_latency: (null)
generation_time: 34376
tokens_prompt: 77
tokens_completion: 1193
native_tokens_prompt: 66
native_tokens_completion: 925
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01404
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_029a6d01ff2111ad0169aac84a2c3081948b08df8fdce8141d"
    is_byok: false
    latency: 122
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772800074-bUk4O6JP8TJANS3k4ZLW"
upstream_id: "resp_029a6d01ff2111ad0169aac84a2c3081948b08df8fdce8141d"
total_cost: 0.01404
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 656
22.59% | Dialogue to Total Word Ratio | Ratio: 64.55%, Deviation: 5.45% | neededClean: false; wordsTotal: 660; wordsDialogue: 426
11.2957% | Overall | |
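The reported ratio and deviation can be reproduced from the word counts in the meta data: 426 dialogue words out of 660 total, measured against the 70% target from the scenario. The sketch below also shows one plausible way to count dialogue words by collecting text between curly double quotes. The evaluator's actual tokenization and its mapping from deviation to the 22.59% score are not given on this page, so the helper names and the quote-matching rule are assumptions (the two evaluators even report different totals, 656 and 660).

```python
import re

TARGET_RATIO = 70.0  # percent, taken from the scenario "70% dialogue"


def count_words(text):
    """Count whitespace-separated tokens (assumed tokenization)."""
    return len(text.split())


def dialogue_words(text):
    """Count words inside curly double quotes (assumed dialogue rule)."""
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue, words_total, target=TARGET_RATIO):
    """Return (dialogue ratio, absolute deviation from target), in percent."""
    ratio = 100.0 * words_dialogue / words_total
    return ratio, abs(target - ratio)


# Counts as reported in the evaluation meta data.
ratio, deviation = ratio_and_deviation(426, 660)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 64.55%, Deviation: 5.45%"
```

As a usage example, `dialogue_words('“Hello there,” said Bill. “Go.”')` counts three dialogue words out of five total under these assumptions.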