NC Bench
Run: 2026-03-09-01KK9QD0ASRBKTWCPNHBEZAFVN
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 31.0s
Cost: $0.016500
Provider: OpenAI
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
26.63% | Dialogue to Total Word Ratio | (details)
13.3128% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Tell me again,” Molly said, peering over the garden wall, “why we’re robbing a seven-year-old.” “We are not robbing a seven-year-old,” Bill whispered, offended. “We are intercepting an inefficiently guarded concentration of luxury goods wrapped in festive paper.” “In a bouncy castle.” “Exactly. No one suspects crime near a giraffe-shaped slide.” Molly adjusted her cap and looked into the garden where balloons bobbed over a troop of shrieking children. “I count twelve small humans, one clown, two exhausted parents, and a magician who appears to be losing a fight with a dove.” Bill nodded gravely. “The dove improves our odds.” “The clown worsens them.” “He’s on stilts. If he gives chase, he’ll fold like a deckchair.” Molly sighed. “You know, in my youth, thieves stole jewels.” “In your youth,” Bill said, “you stole a church raffle ham.” “That ham was abandoned by Providence.” A child’s scream cut through the air, followed by delighted applause. “What was that?” Bill asked. “Pin the tail on the donkey.” “No,” Molly said after another beat. “That was the sound of a grandfather standing on a paper plate.” Bill rubbed his hands together. “Right. Let’s review. We enter as entertainment.” “Dangerous. You juggle like a man trying to swat bees.” “I wasn’t suggesting juggling. I was suggesting dignity. I carry the speaker, you carry the face paint box, we claim to be late additions.” “You carrying a speaker suggests theft before you’ve even started.” “Fine. New angle. We’re parents.” Molly looked at him. “Whose?” Bill looked back. “Temporary.” “Bill, you are sixty and dressed like a substitute geography teacher. No one will believe you’ve produced a child.” “I could be a grandfather.” “You’d have to show affection.” He frowned. “Fair point.” Another burst of cheering rose from inside the garden. Molly stood on tiptoe. “Present table’s by the patio doors. Good location if you’re a child. 
Terrible location if you’re a criminal.” “Which is why,” Bill said, tapping his nose, “we create a distraction.” “With what?” He brightened. “Pony.” “You have not got a pony.” “I know a man with access to one.” “At this speed the child will be thirty before we collect the loot.” Bill considered. “Then cake.” “You intend to distract them with their own cake?” “No, with the possibility of catastrophe involving cake. Works every time. Humans are moths around icing.” Molly gave him a long look. “That is the first intelligent thing you’ve said all week.” “It’s only Tuesday.” “And yet you’ve had time to suggest disguising us as twins.” “It was a bold idea.” “It was a bald idea. You wanted us both in the same wig.” Bill straightened his tie. “Listen. When the cake comes out, every adult turns sentimental and every child turns feral. In that window, I sweep the presents into the laundry basket by the shed.” “There is a laundry basket by the shed?” “There is if destiny loves professionals.” Molly glanced again. “There is, in fact, a laundry basket by the shed.” Bill smiled modestly. “I notice details.” “You once failed to notice a policeman.” “He was crouching.” “That’s what they do, Bill.” From the party came the wheeze of a microphone. A woman announced, “Time for the birthday song!” Bill inhaled sharply. “This is it. The sacred interval.” Molly caught his sleeve. “Wait. What’s our exit?” “Garden gate, alley, van, gone.” “The van with ‘Pete’s Plumbing’ written on the side?” “People trust plumbing.” “They do not trust you doing plumbing.” “I’m not doing plumbing. I’m transporting joy.” “You are stealing toy dinosaurs from children.” “Rescuing,” Bill corrected. “From a life of sticky fingers and neglect.” Molly snorted. “You are absurd.” “And yet,” Bill said, as the first notes of the birthday song drifted over the wall, “you’re still here.” She smiled despite herself. “Only because I’ve always wanted to rob somewhere with a piñata.” Bill offered his arm. “After you. 
Let us be elegant.” Inside the garden a child yelled, “Mum, the clown’s on fire!” Molly blinked. “That wasn’t us.” Bill grinned. “Then fortune has sent an opening.” They vaulted the wall together with all the grace of dropped furniture.
created_at
"2026-03-09T16:38:04.897Z"
3/9/2026, 4:38:04 PM
model
"openai/gpt-5.4-20260305"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
3275
moderation_latency
(null)
generation_time
30903
tokens_prompt
77
tokens_completion
1280
native_tokens_prompt
66
native_tokens_completion
1089
native_tokens_completion_images
0
native_tokens_reasoning
124
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0165
router
(null)
provider_responses
0
endpoint_id
"9ff5625c-403f-4d7f-b895-58ac7295062c"
id
"resp_02c842b6732b83a10169aef76cf9148194b73517bdf22e17b7"
is_byok
false
latency
185
model_permaslug
"openai/gpt-5.4-20260305"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773074284-uBRiitbwGG2vWLzcE95G"
upstream_id
"resp_02c842b6732b83a10169aef76cf9148194b73517bdf22e17b7"
total_cost
0.0165
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 687
26.63% | Dialogue to Total Word Ratio | Ratio: 75.14%, Deviation: 5.14% | neededClean: false, wordsTotal: 692, wordsDialogue: 520
13.3128% | Overall
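The reported "Ratio: 75.14%, Deviation: 5.14%" can be reproduced from the wordsDialogue and wordsTotal values with the minimal sketch below. The helper names, the whitespace-based word splitting, and the rule that only text inside curly double quotes counts as dialogue are all assumptions made for illustration. The evaluator's actual tokenization is not shown on this page, and neither is the mapping from deviation to the 26.63% score, so the sketch does not attempt that mapping.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def dialogue_word_count(text: str) -> int:
    """Count words inside curly double quotes (assumed dialogue definition)."""
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int, target: float = 0.70):
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Values taken from the evaluation details: wordsDialogue=520, wordsTotal=692.
ratio, deviation = ratio_and_deviation(520, 692)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

With the reported counts this prints the same ratio and deviation as the details row. The Overall figure of 13.3128% appears consistent with an unweighted mean of the two evaluator scores (0.00% and an unrounded value near 26.63%), though the page does not state how it is aggregated.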