NC Bench
Run: 2026-03-06-01KK1GD4WWDB11DS8QYGF2RE31
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 31.1s
Cost: $0.014385
Provider: OpenAI
Evaluation summary
Score      Evaluator
0.00%      Matches word count
4.10%      Dialogue to Total Word Ratio
2.0524%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Tell me again,” said Molly, peering over the dashboard at the balloon-arch swaying across the street, “why we are robbing a seven-year-old.” “We are not robbing a seven-year-old,” Bill said. “We are liberating a concentrated pile of boxed luxury from the oppressive regime of a suburban birthday.” “That sentence has too many syllables to be legal.” “It’s called professionalism. Look at that driveway. Twelve bicycles, three scooters, and one inflatable dinosaur wearing sunglasses. That party is financed.” Molly adjusted her cap. “Presents for children are usually sticky, noisy, and powered by batteries too small to live.” “Sometimes,” Bill said, “they’re game consoles, tablets, and gift cards tucked inside tasteful envelopes by emotionally unavailable uncles.” “Ah,” said Molly. “The uncle market. Ruthless.” Bill unfolded a hand-drawn map. “Entry point: side gate. Cover story: hired entertainers delayed by traffic.” “We don’t have costumes.” Bill reached into the back seat and produced two tote bags. “I have a magician’s cape and a fox onesie.” Molly stared. “Which one is for me?” “The cape says ‘Mystico.’ The onesie has a tail.” “You bought one adult fox onesie and one cape with rhinestones?” “I bought what was available at a discount party warehouse next to a carpet place. Great urgency rarely gets ideal tailoring.” Molly sighed. “Fine. I’m not wearing the tail.” “You’re absolutely wearing the tail. It distracts from your burglary posture.” “My burglary posture is elegant.” “Your burglary posture looks like a librarian sneaking out with a chandelier.” Across the street, children shrieked with the kind of joy that made responsible adults tired. A woman in a paper crown carried a cake toward the garden. “That must be the mother,” Molly said. “Target presents likely stacked near the dining room.” “Or under the snack table,” Bill said. “Parents love putting expensive things under a folding table covered in cartoon sharks. 
It gives them a false sense of camouflage.” Molly pulled on the fox hood. “If I get jam on this, I’m billing you.” “You can’t bill your partner.” “I can emotionally invoice you for years.” They crossed the street with the brisk confidence of people who hoped no one would ask follow-up questions. At the gate, Bill swept the cape over one shoulder. “Remember,” he murmured, “smile like you belong.” “I always smile like I belong,” Molly said. “I have the face of a woman returning soup at a hotel.” A father intercepted them by the patio. “Oh, thank goodness! Are you the entertainers?” Bill bowed. “Mystico, at your service. And this”—he gestured to Molly—“is Professor Whiskers.” Molly froze for half a blink. “Doctor Whiskers,” she corrected. “I didn’t spend eight years at clown medical school to be demoted in a garden.” The father blinked. “Right. Great. The kids are feral. Can you start immediately?” “Can we?” Bill cried. “Sir, containing chaos is our ministry.” He swept toward the lawn, where twenty children turned with predatory delight. Molly hissed, “Our plan did not include performance.” “Adapt,” Bill whispered through a grin. “I’ll misdirect. You locate the presents.” A child tugged Molly’s tail. “Are you a fox or a dog?” “I’m a cautionary tale,” Molly said. Bill clapped. “Children! Who wants to see a vanishing trick?” A roar answered him. “That bought us two minutes,” he muttered. “Maybe three if I set something harmless on fire.” Molly slipped through the patio doors. Inside, the dining room glittered with wrapping paper and bows. A mountain of presents sat beside a banner that read HAPPY BIRTHDAY OLLIE. Into her earpiece, Bill said, “Status?” “Jackpot,” Molly whispered. “Also, someone has gifted this child a drum kit. I suddenly feel morally justified.” “Take the lot.” She grabbed the biggest sack she could find—a laundry bag decorated with smiling planets—and began shoveling boxes into it. 
Then a small voice behind her said, “Why are you stealing my presents?” Molly turned. A boy in a crown stood in the doorway, holding a juice box like a tiny prosecutor. She smiled. “Counteroffer: I’m stress-testing your birthday.” From outside came Bill’s shout: “Good news, everyone! The rabbit has escaped!”
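The Dialogue to Total Word Ratio evaluator needs a count of words inside quotation marks and a count of all words. The page does not show how NC Bench performs this count, so the following is only a minimal sketch of one plausible approach: it assumes dialogue is delimited by typographic quotes (“ and ”) and that words are whitespace-separated tokens. The function name `count_dialogue_words` is hypothetical, and this method is not guaranteed to reproduce the counts reported for this run.

```python
import re

def count_dialogue_words(text):
    """Return (dialogue_words, total_words) for text using curly double quotes.

    Assumption: dialogue is everything between an opening “ and the next closing ”.
    This is an illustrative heuristic, not NC Bench's actual evaluator.
    """
    # Collect every span enclosed in typographic double quotes.
    spans = re.findall(r"“([^”]*)”", text)
    # Count whitespace-separated tokens inside the quoted spans.
    dialogue_words = sum(len(span.split()) for span in spans)
    # Count whitespace-separated tokens in the whole text.
    total_words = len(text.split())
    return dialogue_words, total_words

sample = "“Tell me again,” said Molly, “why we are here.”"
dialogue, total = count_dialogue_words(sample)
print(dialogue, total)
```

On the short sample, the quoted spans hold 7 words out of 9 in total. Different tokenization rules (for example, how dashes or punctuation attached to quotes are handled) would shift these counts slightly, which may be why the two evaluators on this page report different totals.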
created_at: "2026-03-06T12:01:54.084Z" (3/6/2026, 12:01:54 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 714
moderation_latency: (null)
generation_time: 31046
tokens_prompt: 77
tokens_completion: 1231
native_tokens_prompt: 66
native_tokens_completion: 948
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.014385
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_02ec4975ee76b9fc0169aac23245148195bffd76f2108b689b"
    is_byok: false
    latency: 380
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772798514-pLBOqc0qejLELtXUSZoL"
upstream_id: "resp_02ec4975ee76b9fc0169aac23245148195bffd76f2108b689b"
total_cost: 0.014385
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
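The metadata above is enough to estimate generation throughput for this run. The sketch below assumes `generation_time` is in milliseconds, which is consistent with the 31.1s Duration shown in the run details, and it treats `native_tokens_completion` and `tokens_completion` as two different tokenizations of the same output. The helper name `tokens_per_second` is hypothetical.

```python
# Values copied from the run metadata on this page.
generation_time_ms = 31046       # assumed milliseconds (Duration is listed as 31.1s)
native_tokens_completion = 948   # provider-native completion token count
tokens_completion = 1231         # normalized completion token count

def tokens_per_second(tokens, elapsed_ms):
    """Average output rate over the whole generation window."""
    return tokens / (elapsed_ms / 1000.0)

native_tps = tokens_per_second(native_tokens_completion, generation_time_ms)
normalized_tps = tokens_per_second(tokens_completion, generation_time_ms)

print(f"native: {native_tps:.2f} tok/s, normalized: {normalized_tps:.2f} tok/s")
```

These are averages over the full generation time, so they include any time to first token and say nothing about how the rate varied during streaming.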
Evaluation details
Result     Evaluator                      Details                            Meta Data
0.00%      Matches word count             n/a                                neededClean: false; words: 679
4.10%      Dialogue to Total Word Ratio   Ratio: 62.01%, Deviation: 7.99%    neededClean: false; wordsTotal: 687; wordsDialogue: 426
2.0524%    Overall
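The reported ratio and deviation can be checked directly from the word counts given above (wordsDialogue 426, wordsTotal 687) against the 70% target in the scenario. The sketch below reproduces that arithmetic. The page does not document how the 4.10% evaluator score is derived from the deviation, so that step is not modeled. Treating the overall score as the plain mean of the two evaluator scores is an assumption; it gives 2.05%, which is consistent with the reported 2.0524% if the ratio score is unrounded internally. The function name `dialogue_ratio_report` is hypothetical.

```python
def dialogue_ratio_report(words_dialogue, words_total, target_pct=70.0):
    """Return (ratio_pct, deviation_pct), each rounded to two decimals."""
    ratio_pct = 100.0 * words_dialogue / words_total
    # Deviation is taken as the absolute distance from the target percentage.
    deviation_pct = abs(target_pct - ratio_pct)
    return round(ratio_pct, 2), round(deviation_pct, 2)

ratio, deviation = dialogue_ratio_report(426, 687)
print(ratio, deviation)  # 62.01 7.99

# Assumption: the overall score is the unweighted mean of the evaluator scores.
evaluator_scores = [0.00, 4.10]
overall = sum(evaluator_scores) / len(evaluator_scores)
print(round(overall, 2))  # 2.05
```

The Matches word count score of 0.00% follows from the response length: 679 counted words against a 500 word request.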