NC Bench
Run: 2026-03-09-01KK9QP92NQ4S3ZBEV07YKJM4D
Run Details
Model:
GPT-5.4 (Reasoning)
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
35.4s
Cost:
$0.017745
Provider:
OpenAI
Evaluation summary
Matches word count:
0.00%
Dialogue to Total Word Ratio:
55.60%
Overall:
27.7977%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly crouched behind the garden wall, peering through a gap in the ivy at a sea of balloons, paper hats, and sugar-crazed children orbiting a trampoline like tiny deranged planets. Bill, beside her, adjusted a pair of opera glasses he’d stolen from a widow and never quite learned to use. “You see the table?” Molly whispered. “I see seventeen tables. One’s got sausage rolls, one’s got juice boxes, and one appears to be a shrine to a cartoon rabbit in a waistcoat.” “The present table, Bill.” “Oh. Yes. Under the gazebo. Blue wrapping paper, dinosaur wrapping paper, pink wrapping paper, one with little rockets. Honestly, it looks less like a birthday and more like a hostage situation for toys.” Molly grinned. “Beautiful, isn’t it?” “Beautiful is a cathedral at sunset. That is six months’ rent in battery-operated plastic.” A child shrieked with joy. Bill flinched. “Good Lord. Are they all meant to make that noise?” “They’re children. It’s their burglar alarm setting.” Bill lowered the glasses. “I still don’t like it.” “You don’t like anything.” “I like quiet pubs, cash in plain envelopes, and crimes where the witnesses have weaker memories.” Molly pointed toward the clown making balloon animals by the fence. “That’s our opening.” “The clown?” “The clown.” “I hate every word in this conversation.” “We wait until he starts the magic show. All eyes on him, all ears bleeding, and we slip in through the side gate.” “In what disguise?” “In these.” Molly held up two lurid party-shop costumes from a duffel bag: one pirate, one princess. Bill stared. “You expect me to commit felony theft dressed as a princess?” “I expect you to show flexibility.” “I had flexibility once. Prison took it out of me.” “You’re the princess.” “Why am I the princess?” “Because I’m not shaving my beard onto a fake pirate face.” Bill looked wounded. “That is a very private medical detail to weaponize.” Molly shoved the tiara into his hands. “Focus. 
We mingle, we smile, we say we’re from the entertainment company—” “We don’t look like entertainers.” “You especially don’t, but that’s where confidence comes in.” “My confidence left when you said ‘tiara.’” A little girl in the garden smacked a piñata with unearned fury. Bill watched the children scramble for sweets like piranhas in party shoes. “Animals,” he muttered. “Exactly. Chaos. During cake, we stack the presents into the inflatable castle trolley.” Bill turned slowly. “The what?” Molly nodded toward a bouncy castle shaped like a dragon. “Deflates into a carry bag. I bribed the delivery man to leave the trolley behind.” “You bribed the delivery man?” “With cupcakes.” “That man sold out his profession for frosting?” “Everyone has a price, Bill.” “Mine used to be higher than baked goods.” “Inflation.” He sighed and pulled the princess dress over his shirt. “If I’m arrested in tulle, I’m giving the police your real birthday.” “That’s fair.” He jammed the tiara on. “And what if someone asks whose side we’re on?” “Whose side?” “At children’s parties, apparently there are sides. Mums ask things like, ‘Are you with Oliver or with Poppy?’ One wrong answer and they look at you like you’ve brought plague.” Molly considered this. “Say you’re with catering.” “I don’t know anything about catering.” “You know how to carry a tray and look disappointed. That’s half of catering.” Bill peeked again through the ivy. “There’s a dog.” “So?” “It’s wearing a neckerchief. A dog in formalwear notices things.” “Then don’t act suspicious.” Bill blinked at her from under the tiara. “Molly, I am a middle-aged man dressed as a fairytale tax fraud. Suspicious is doing heavy lifting.” The clown honked a horn. Children swarmed toward him in a shrieking tide. Molly rose. “Showtime.” Bill clutched the skirt as they headed for the gate. “If I die here, bury me face down so the world can kiss my—” “Smile,” Molly hissed. A mother spotted them and beamed. 
“Oh wonderful, more entertainers!” Bill’s face froze in a ghastly grin. “That’s right,” he said in a strangled falsetto. “I’m Princess… Billow.” Molly swept into a bow. “And I’m Captain Sparklebeard.” The mother clapped. “Perfect timing! Can either of you make balloon swords?” Molly and Bill stared at her. Then Molly said, “Madam, we can make anything disappear.”
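The scenario asks for 70% of the words to be dialogue, so a response like the one above has to be split into spoken and unspoken words before it can be scored. The page does not show how the evaluator does this. The sketch below is only one plausible approach, assuming dialogue is any text enclosed in curly double quotes and that words are whitespace-delimited; the names `DIALOGUE_SPAN`, `count_words`, and `dialogue_ratio` are hypothetical.

```python
import re

# Assumption: dialogue is whatever sits between curly double quotes.
# Nested single quotes (e.g. ‘tiara.’) stay inside the enclosing span.
DIALOGUE_SPAN = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-delimited tokens (the real evaluator may tokenize differently)."""
    return len(text.split())


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue words, total words, dialogue / total)."""
    total = count_words(text)
    spoken = sum(count_words(span) for span in DIALOGUE_SPAN.findall(text))
    ratio = spoken / total if total else 0.0
    return spoken, total, ratio


# Two short sentences taken from the response above.
sample = "Molly rose. “Showtime.” Bill clutched the skirt. “Smile,” Molly hissed."
spoken, total, ratio = dialogue_ratio(sample)
print(spoken, total, f"{ratio:.2%}")
```

A different quoting convention (straight quotes, unclosed quotes at a paragraph break) or a different tokenizer would give different counts, which may be one reason the two evaluators on this page report slightly different totals.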
created_at: "2026-03-09T16:43:08.767Z" (3/9/2026, 4:43:08 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3452
moderation_latency: (null)
generation_time: 35296
tokens_prompt: 77
tokens_completion: 1299
native_tokens_prompt: 66
native_tokens_completion: 1172
native_tokens_completion_images: 0
native_tokens_reasoning: 160
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.017745
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0c49a9f895df9abe0169aef89cdbd88197a793faac52d32a6f"
    is_byok: false
    latency: 170
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773074588-4JQW7fQTn7fDtH4yRh2U"
upstream_id: "resp_0c49a9f895df9abe0169aef89cdbd88197a793faac52d32a6f"
total_cost: 0.017745
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
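The metadata above is enough for a rough sanity check of the reported cost and speed. The sketch below makes several assumptions that the page does not confirm: the per-million-token rates are hypothetical values chosen because they reproduce the reported total, `generation_time` is taken to be in milliseconds, and `native_tokens_completion` is taken to include the reasoning tokens.

```python
# Values copied from the run metadata above.
native_tokens_prompt = 66
native_tokens_completion = 1172
generation_time_ms = 35296  # assumption: milliseconds (the page shows a 35.4s duration)
reported_cost = 0.017745

# Hypothetical rates in USD per million tokens. These are NOT stated on the
# page; they are assumed only because they reproduce the reported total.
ASSUMED_INPUT_RATE = 2.50
ASSUMED_OUTPUT_RATE = 15.00

estimated_cost = (
    native_tokens_prompt * ASSUMED_INPUT_RATE
    + native_tokens_completion * ASSUMED_OUTPUT_RATE
) / 1_000_000

# Completion throughput over the generation window.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${estimated_cost:.6f} (reported: ${reported_cost:.6f})")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions the estimate matches the reported figure to six decimal places, but other rate combinations could produce the same total, so the match should not be read as confirmation of the actual pricing.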
Evaluation details
Evaluator: Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 711

Evaluator: Dialogue to Total Word Ratio
Result: 55.60%
Details: Ratio: 66.57%, Deviation: 3.43%
Meta Data: neededClean: false, wordsTotal: 715, wordsDialogue: 476

Overall: 27.7977%
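The ratio and deviation reported above follow directly from the word counts and the 70% target in the scenario. The sketch below reproduces that arithmetic. It also checks one assumption the page does not state: that the overall figure is the plain mean of the two evaluator scores. How a 3.43% deviation maps to a 55.60% score is not shown on the page and is not modeled here.

```python
TARGET_RATIO = 0.70  # from the scenario: 70% of words should be dialogue

# Counts reported in the evaluator meta data.
words_dialogue = 476
words_total = 715

ratio = words_dialogue / words_total
deviation = abs(TARGET_RATIO - ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Assumption: the overall score is the unweighted mean of the evaluator scores.
scores = [0.00, 55.60]  # percentages as displayed (rounded) on the page
overall = sum(scores) / len(scores)
print(f"Overall (from rounded scores): {overall:.2f}%")
```

The mean of the rounded scores comes to 27.80%, close to the displayed 27.7977%. The small gap would be explained if the second score were about 55.5954% before rounding, which is an inference rather than something the page reports.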