NC Bench
Run: 2026-03-09-01KKAC4H7G70XGWY7CWFPCDASD
Run Details
Model:
GPT-5.4 (Reasoning, Low)
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
25.0s
Cost:
$0.014490
Provider:
OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 99.86%
Overall: 49.9321%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly flattened the crayon-scribbled invitation on the café table with the seriousness of a general reviewing battle plans. Across from her, Bill adjusted the tiny paper crown he’d stolen from a bakery display and squinted at the map she’d drawn in purple glitter pen. “Tell me again,” Bill said, “why we’re robbing a seven-year-old?” “We are not robbing a seven-year-old,” Molly said. “We are intercepting assets in a high-sugar environment.” “The assets being six board games, three water pistols, one stuffed octopus, and whatever a ‘Craft Slime Mega Bucket’ is.” “Limited edition,” Molly said. “And expensive.” Bill leaned back. “I’m just saying, morally, this feels murky.” “You stole a brass telescope from a lighthouse.” “That was nautical. This is birthdayical.” Molly tapped the invitation. “Observe. Saturday, two o’clock, Sunny Nook Community Hall. Theme: pirates. Parents distracted by cupcakes, children distracted by screaming, presents stacked on the side table under a banner that says, very helpfully, PRESENTS.” Bill nodded despite himself. “That is bold signage.” “Exactly. This is not a vault. This is a trust exercise with balloons.” “What about security?” “Face painter by the entrance. One exhausted father on music duty. Possible grandmother near the sausage rolls. The dog is the only wildcard.” “There’s a dog?” “Small. Judging by the paw print on the RSVP, either a terrier or an ink catastrophe.” Bill rubbed his chin. “Dogs can be bought.” “With what?” “Sausage.” Molly smiled. “Good. You handle canine diplomacy. I’ll secure the package zone.” “And our disguises?” She slid two outfits from a shopping bag. Bill held up a striped shirt and a foam pirate hat with a skull wearing a bow. “I look ridiculous,” he said. “You look festive. 
There’s a difference.” “And you’re going as what?” “Balloon artist.” “You can’t make balloons.” “I can make a sword, a poodle if the child has poor standards, and something that accidentally resembles a submarine.” Bill considered this. “Actually, failure might help. Adults trust incompetence.” “Exactly. Meanwhile, you’re ‘Captain Bongo,’ a children’s entertainer of vague origin.” “I don’t know any tricks.” “You know one.” Bill brightened. “Coin behind the ear.” “Perfect. There will be dozens of ears.” He lowered his voice. “Extraction?” “Gift sacks loaded into the bouncy castle blower crate by the side exit.” “That seems conspicuous.” “Nothing is conspicuous at a children’s party. A man on stilts could explode and half the parents would just ask if he’s with the magician.” Bill looked around the café. “And if a child asks me to sing?” “Fake a tragic ukulele injury.” “If the birthday boy recognizes me?” “Have you met the birthday boy?” “No.” “Then he won’t.” Bill pointed to the invitation. “His name is Oliver. Children named Oliver are observant.” “Children named Oliver are usually sticky,” Molly said. “Stickiness slows deduction.” A waitress passed, giving them a wary look. Bill lowered the pirate hat. “I still think we need a contingency.” “We have three. If challenged by adults, we say we’re from the entertainment company. If challenged by children, we ask where the treasure is. They’ll scatter themselves. If challenged by the dog, sausage.” Bill exhaled. “You really thought this through.” Molly sat back with the serene confidence of a woman who had once stolen a chandelier during a fire drill. “Bill, I have cased christenings, infiltrated weddings, and escaped a petting zoo under active goose pursuit. This is a room full of short people wearing eye patches made of felt. We’ll be in and out before the cake knife appears.” Bill slowly grinned. “All right. We do it.” “We do it quietly.” “In a pirate hat.” “In a pirate hat,” Molly agreed. He raised his coffee cup. 
“To the perfect crime.” She clinked it with hers. “To loot, laughter, and leaving before the pinata creates witnesses.” From the shopping bag, the foam skull seemed to grin at them both, as if even it understood that no fortress on earth was less defensible than a children’s party with a buffet.
created_at
"2026-03-09T22:40:27.386Z"
3/9/2026, 10:40:27 PM
model
"openai/gpt-5.4-20260305"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
2282
moderation_latency
(null)
generation_time
24942
tokens_prompt
77
tokens_completion
1333
native_tokens_prompt
66
native_tokens_completion
955
native_tokens_completion_images
0
native_tokens_reasoning
37
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.01449
router
(null)
provider_responses
0
endpoint_id
"9ff5625c-403f-4d7f-b895-58ac7295062c"
id
"resp_02d1a3f458c86cfe0169af4c5b75ac8196ba5f8d90cd14945a"
is_byok
false
latency
160
model_permaslug
"openai/gpt-5.4-20260305"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773096027-094Ls7agm8nwumWAqa7R"
upstream_id
"resp_02d1a3f458c86cfe0169af4c5b75ac8196ba5f8d90cd14945a"
total_cost
0.01449
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
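The metadata above is enough for a quick sanity check on throughput and cost. The following is a minimal sketch that only re-derives figures from values reported on this page (generation_time, native_tokens_completion, total_cost, and the 661-word count from the evaluation). The variable names are illustrative, and the page does not say whether the benchmark computes these derived figures itself.

```python
# Values copied from the run metadata and evaluation on this page.
generation_time_ms = 24942        # "generation_time"
native_tokens_completion = 955    # "native_tokens_completion"
total_cost_usd = 0.01449          # "total_cost"
words = 661                       # "words" from the word-count evaluator

# Derived figures (illustrative; not reported by the page itself).
generation_seconds = generation_time_ms / 1000
tokens_per_second = native_tokens_completion / generation_seconds
cost_per_100_words = total_cost_usd / words * 100

print(f"Generation time: {generation_seconds:.3f} s")
print(f"Throughput: {tokens_per_second:.2f} native completion tokens/s")
print(f"Cost per 100 words: ${cost_per_100_words:.5f}")
```

The generation time of roughly 24.9 s is close to, but not identical to, the 25.0s Duration shown under Run Details, so the two values are presumably measured at different points.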
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean false; words 661
Dialogue to Total Word Ratio
Result: 99.86%
Details: Ratio: 70.16%, Deviation: 0.16%
Meta Data: neededClean false; wordsTotal 667; wordsDialogue 468
Overall: 49.9321%
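The reported ratio and deviation can be reproduced from the counts above. The sketch below is an assumption about how such an evaluator might work, not the benchmark's actual implementation: the function names, the quote-matching regular expression, and the whitespace-based word count are all hypothetical. The page does not state how a deviation maps to a score, nor what tolerance the word-count evaluator applies, so neither is modeled here. The overall figure of 49.9321% is consistent with a plain average of the two evaluator scores, which is also treated as an assumption.

```python
import re

TARGET_RATIO = 0.70   # "70% dialogue" from the scenario
TARGET_WORDS = 500    # "Write 500 words" from the scenario


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (hypothetical word definition)."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Count words inside curly or straight double quotes (hypothetical rule)."""
    spans = re.findall(r'“([^”]*)”|"([^"]*)"', text)
    return sum(count_words(curly or straight) for curly, straight in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_RATIO):
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Counts reported in the evaluation details: wordsDialogue and wordsTotal.
ratio, deviation = ratio_and_deviation(468, 667)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 70.16%, Deviation: 0.16%"

# The word-count evaluator reports 661 words against a 500-word target.
overshoot = (661 - TARGET_WORDS) / TARGET_WORDS
print(f"Word count overshoot: {overshoot:.1%}")

# Assumption: overall is the mean of the two evaluator scores, so the
# unrounded ratio score implied by the overall figure is 2 * overall - 0.00.
implied_ratio_score = 2 * 49.9321 - 0.0
print(f"Implied ratio score: {implied_ratio_score:.4f}%")

# Tiny usage example of the counting helpers on a made-up sentence.
sample = 'Molly sighed. “We do it quietly.” Bill nodded.'
print(count_dialogue_words(sample), count_words(sample))
```

Note that the two evaluators report different totals for the same response (661 versus 667 words), so they presumably tokenize the text differently. A single whitespace split as used here would not reproduce both numbers.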