NC Bench
Run: 2026-03-06-01KK1HGNKRBYWM7V15C01CBJ0G
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 27.2s
Cost: $0.014730
Provider: OpenAI
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
40.46% | Dialogue to Total Word Ratio | (details)
20.2294% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill crouched behind the privet hedge of Number Twelve, where the garden fairy lights twinkled with the sort of cheerful menace that only a children’s party could achieve. Beyond the gate came shrieks, the boing of a rented castle, and the shrill command of a mother who had said “indoor voice” so many times it had become abstract poetry. Bill adjusted his flat cap and peered through the leaves. “I still say we’re hitting the wrong demographic. Tiny people are unpredictable.” “They’re distracted by sugar and face paint,” Molly said, consulting a hand-drawn map on the back of a bus timetable. “This is a clean operation. Fifteen children enter. Fifteen children scream near a magician. We slip in, remove the presents, and vanish before anyone notices little Tarquin has been materially downsized.” Bill frowned. “His name is Tarquin?” “It is on the banner. Tarquin Turns Six. The alliteration alone tells you this family has no situational awareness.” In the garden, a clown on stilts folded in half to pass under a gazebo and emerged looking spiritually diminished. A table groaned under wrapped boxes in metallic paper, each one a little brick of middle-class optimism. Bill licked his lips. “There must be a games console in there. Maybe a telescope. Rich children get educational presents so their parents can feel superior.” “Focus,” Molly said. She tugged at the strings of her disguise: a peach cardigan, sensible shoes, and a name sticker reading AUNTIE PAM. “We are not shopping. We are liberating.” Bill looked down at his own costume: a balloon artist’s apron and a portable pump. “And I’m still not convinced this was the best role for me.” “You know three balloon shapes.” “I know one balloon shape and two accidents.” “Good enough. If challenged, make a dog.” Bill inhaled shakily. 
“What if it becomes a submarine again?” “Then tell them it’s modern.” They slipped through the side gate with the smooth confidence of people who had never once been invited anywhere honestly. No one stopped them. In a children’s party, adults moved with the haunted vacancy of wartime medics. Molly picked up a tray of sausage rolls and crossed the lawn as if she’d owned the place since the reign of Edward VII. Bill followed, immediately cornered by a girl with glitter eyebrows. “Can you make a unicorn?” He glanced at Molly in panic. She smiled brightly and mouthed, Improvise. “I can make,” Bill said, pumping furiously, “a horse that believes in itself.” The girl accepted the lumpy balloon creature with the grave generosity of youth and trotted off. Molly had reached the present table. “Right,” she murmured as Bill joined her. “I’ll bag, you block.” “With what?” “Your body. It’s what nature gave you.” He spread himself casually in front of the pile while Molly slid gifts into an enormous floral tote bag. A robot kit vanished. A boxed scooter disappeared. Something suspiciously expensive in embossed paper followed. Bill smiled at passing parents with the sickly warmth of a man impersonating wholesomeness under duress. “Lovely weather for it,” he said to no one in particular. Then a small boy in a paper crown appeared at Molly’s elbow. The birthday child. Tarquin himself, sticky with jam and authority. “That,” he said, pointing to the tote, “is my present bag.” Molly did not blink. “Quality control.” Tarquin narrowed his eyes. “You’re not my auntie.” “No,” said Bill quickly, twisting a balloon into a shape that looked like administrative panic, “she’s from the present inspection board.” Tarquin considered this. “There’s a board?” “Oh, absolutely,” Molly said. “Terrible year for counterfeit dinosaurs.” The boy glanced at his mountain of loot, suddenly protective. “I did ask for a T. rex.” “Then you’ve done the right thing,” Molly said, lifting the bulging bag. 
“We’ll just test these for safety.” At that exact moment the clown, still emotionally compromised, tripped over a croquet hoop and crashed into the bouncy castle. It deflated with a long despairing wheeze. Children screamed with delight. Parents stampeded. In the chaos, Molly seized Bill by the sleeve. “Exit, stage left.” Bill sprinted after her through the gate. “Did we get everything?” Molly looked into the bag bouncing against her hip and grinned. “No. Better. We got the good stuff.” From inside came Tarquin’s outraged cry. “Mum! The government took my presents!”
created_at: "2026-03-06T12:21:18.079Z" (3/6/2026, 12:21:18 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 236
moderation_latency: (null)
generation_time: 27102
tokens_prompt: 77
tokens_completion: 1230
native_tokens_prompt: 66
native_tokens_completion: 971
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01473
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0682ca965ebbe5140169aac6be25208196970873e319a00e0b"
    is_byok: false
    latency: 171
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799678-ZHlFdRlN2ygLaR60azmc"
upstream_id: "resp_0682ca965ebbe5140169aac6be25208196970873e319a00e0b"
total_cost: 0.01473
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 722
40.46% | Dialogue to Total Word Ratio | Ratio: 34.25%, Deviation: 4.25% | neededClean: false, wordsTotal: 724, wordsDialogue: 248
20.2294% | Overall | |
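
The reported ratio and deviation can be checked from the wordsDialogue and wordsTotal values. The sketch below is not the benchmark's evaluator code. The function names are hypothetical, and two things are assumptions: that the 30% target comes from the scenario text, and that the overall figure is a plain mean of the two evaluator scores (that reading is only inferred from the numbers matching). How a 4.25% deviation maps to the 40.46% score is not stated on the page, so it is not reproduced here.

```python
# Hypothetical helpers for re-deriving the figures shown in the evaluation details.

def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of dialogue words in the response, as a percentage."""
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 30.0) -> float:
    """Absolute distance from the target ratio (30% assumed from the scenario)."""
    return abs(ratio_pct - target_pct)


# Values copied from the "Dialogue to Total Word Ratio" row.
ratio = dialogue_ratio(words_dialogue=248, words_total=724)
deviation = deviation_from_target(ratio)

# Assumption: the overall score is the unweighted mean of the two evaluator scores.
overall = (0.00 + 40.46) / 2

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")   # Ratio: 34.25%, Deviation: 4.25%
print(f"Mean of evaluator scores: {overall:.2f}%")            # Mean of evaluator scores: 20.23%
```

The mean of the rounded scores, 20.23%, agrees with the reported 20.2294% to two decimal places. The small difference is consistent with the page averaging unrounded scores.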