NC Bench
Run: 2026-03-06-01KK1GMCC3H5E2FDW8DH8K3114
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 29.3s
Cost: $0.016200
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
1.92% | Dialogue to Total Word Ratio
0.9609% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Tell me again,” said Molly, flattening the crinkled invitation on the café table, “why we are robbing a seven-year-old.” Bill leaned back, offended in principle. “Not robbing a seven-year-old. Reallocating assets from an excessively gift-rich environment.” “It says here, ‘Bouncy castle, magician, pony rides, and a chocolate fountain.’” Molly tapped the card. “This child is basically a tiny duke.” “Exactly,” said Bill. “No one needs that many presents before they’ve learned long division.” Molly sipped her tea. “And your moral compass feels comfortable because?” “Because,” Bill said, counting on his fingers, “one: spoiled children build character through adversity. Two: wrapped presents are just loot with ribbons. Three: I already bought a clown wig.” “That’s not morality, Bill. That’s sunk cost.” He produced a notebook from inside his jacket. “Observe the plan.” “I’m already worried by your use of the word observe.” “Phase one,” Bill said, opening the notebook. “Insertion.” “Through the front gate?” “Disguised as entertainers.” Molly stared at him. “You want us to be clowns.” “I want us to be invisible,” Bill corrected. “At a children’s party, no one questions bad tailoring and emotional instability if there are balloons attached.” “I question it constantly.” “Phase two: reconnaissance. We identify the present table.” “Likely the table groaning under the weight of toy dinosaurs and educational lies.” “Precisely. Phase three: extraction.” “With what? A sack marked SWAG?” Bill smiled. “A gift van.” “That is not a thing.” “It is if we put magnets on the side and write ‘Happy Parcel Logistics.’ People believe any vehicle with a logo.” Molly rubbed her temple. “What about the parents?” “Distracted by the magician.” “What magician?” “You,” Bill said. There was a long pause. “I beg your pardon?” “You’re good with cards.” “I know one card trick, Bill. It is less a trick and more a statistical coincidence.” “Children are idiots,” he said. 
“You make the queen appear in your sleeve, they’ll scream like you’ve parted the Red Sea.” “And what do you do while I’m astonishing the underdeveloped?” “I’m the clown.” “Of course you are.” “I mingle. I balloon-animal. I map adult sightlines. Then, at the key moment, I announce a game.” “What game?” “Pass the Parcel,” Bill said triumphantly. Molly blinked. “That is appalling.” “It’s elegant.” “It’s theft themed around theft.” “It’s thematic,” he said. “I gather all the children in one place, blindfold the birthday boy—” “No.” “—lightly blindfold—” “No, Bill.” “Fine. Distract him with a kazoo. Meanwhile, I wheel out the presents under the pretense of enhancing suspense.” “Who wheels thirty presents without suspicion?” “The clown.” “Actually,” Molly admitted, “fair.” Bill leaned forward. “You see it now.” “I see prison with bunting.” Rain ticked against the window. Bill turned the invitation over and sketched a rectangle. “House here. Garden here. Bouncy castle here.” “You’ve drawn the bouncy castle larger than the house.” “It’s a tactical concern. Those things squeak. Hard to sneak past.” “What about the pony rides?” Bill froze. “There are ponies?” “There are ponies.” “Ponies complicate everything.” “How?” “They look at you,” Bill said darkly. “Like tiny police.” Molly laughed despite herself. “So what’s the contingency if the clown is intercepted by horse cops?” “Then we invoke phase four.” “There’s a phase four?” “There is now. Cake incident.” She narrowed her eyes. “Define incident.” “A dramatic but non-lethal collapse of the cake table. Buttercream chaos. Adults panic. Children howl. We depart.” “With the presents.” “With the presents.” Molly looked again at the invitation, with its cheerful balloons and impossible abundance. “This is a terrible plan.” Bill nodded. 
“The best ones are.” “And if we’re caught?” “We say we’re performance artists making a statement about consumerism.” “At a child’s birthday?” “Especially at a child’s birthday.” Molly set down her teacup. “I refuse to be a magician.” “Agreed.” “I will be face painting.” Bill considered. “Can you paint a tiger?” “I can paint despair with whiskers.” “Good enough,” he said, and tore the invitation neatly in half. “Saturday, two o’clock. Wear practical shoes and a smile that suggests recent police contact.” Molly stood, pulling on her gloves. “Bill.” “Yes?” “If you make me ride a pony during the escape, I’ll hand you to the tiny duke myself.” Bill rose with grave dignity. “Molly, I would never endanger a professional relationship with equestrian nonsense.” She headed for the door. “That means yes.” “It means maybe a small pony,” he called after her. “For speed.”
created_at: "2026-03-06T12:05:51.114Z" (3/6/2026, 12:05:51 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 420
moderation_latency: (null)
generation_time: 29291
tokens_prompt: 77
tokens_completion: 1474
native_tokens_prompt: 66
native_tokens_completion: 1069
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0162
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_07a966144f01d6580169aac31f2d2881909bd808e90863a30e"
    is_byok: false
    latency: 179
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772798751-Juzw8O0XchvMxdynnp7W"
upstream_id: "resp_07a966144f01d6580169aac31f2d2881909bd808e90863a30e"
total_cost: 0.0162
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 732
1.92% | Dialogue to Total Word Ratio | Ratio: 78.89%, Deviation: 8.89% | neededClean: false, wordsTotal: 739, wordsDialogue: 583
0.9609% | Overall
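
The ratio and deviation reported by the "Dialogue to Total Word Ratio" evaluator can be reproduced from its metadata. The sketch below is an illustration only: the function name `dialogue_ratio` is hypothetical, the 70% target is taken from the scenario text, and it assumes the deviation is the absolute difference between the measured ratio and that target. How the evaluator counts words and how it converts the deviation into the 1.92% score is not shown on this page, so neither is modelled here.

```python
def dialogue_ratio(words_dialogue: int, words_total: int, target: float = 0.70):
    """Return (ratio, deviation) for a dialogue word count.

    Assumption: deviation is |ratio - target|, with the target taken
    from the scenario ("70% dialogue"). This is not the evaluator's code.
    """
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target)
    return ratio, deviation


# Values from the evaluation metadata: wordsDialogue = 583, wordsTotal = 739.
ratio, deviation = dialogue_ratio(583, 739)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 78.89%, Deviation: 8.89%"
```

Under these assumptions the output matches the figures in the Details column, which suggests the response overshot the requested dialogue share by roughly nine percentage points.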