NC Bench
Run: 2026-03-06-01KK1HN937D5ZCCVF1K5DYR979
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 25.4s
Cost: $0.014160
Provider: OpenAI
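The duration and cost above can be related to the token counts recorded in this run's metadata (`generation_time` of 25344 ms, 66 native prompt tokens, 933 native completion tokens, `total_cost` of 0.01416). The sketch below derives an approximate output throughput and checks the cost against per-token rates. The two rate constants are assumptions, not values shown on this page; they were chosen only because they happen to reproduce the reported total, so treat them as illustrative.

```python
import math

# Values copied from this run's metadata.
generation_time_ms = 25344
native_tokens_prompt = 66
native_tokens_completion = 933
total_cost = 0.01416

# Output throughput over the generation window.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)
print(f"{tokens_per_second:.1f} tokens/s")

# ASSUMPTION: hypothetical USD rates per million tokens. They are not stated
# anywhere in the run report and are used here only for a consistency check.
ASSUMED_INPUT_USD_PER_MTOK = 2.50
ASSUMED_OUTPUT_USD_PER_MTOK = 15.00

estimated_cost = (
    native_tokens_prompt * ASSUMED_INPUT_USD_PER_MTOK
    + native_tokens_completion * ASSUMED_OUTPUT_USD_PER_MTOK
) / 1_000_000
print(f"${estimated_cost:.6f}")

# True when the assumed rates reproduce the reported total.
cost_matches = math.isclose(estimated_cost, total_cost, rel_tol=1e-9)
print(cost_matches)
```

Under these assumed rates almost all of the cost comes from the completion tokens, which is one reason an over-length response is also a more expensive one.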
Evaluation summary
Evaluator: Matches word count, Score: 0.00%
Evaluator: Dialogue to Total Word Ratio, Score: 0.00%
Overall score: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly stood beneath the dripping awning of a shut pet shop, studying the suburban house across the street with the stern concentration of a general and the posture of a woman pretending not to be soaked through. The place blazed with balloons, bunting, and a handmade banner that sagged in the middle like it had lost faith halfway through spelling HAPPY BIRTHDAY, OSCAR! Through the front window came the faint chaos of shrieking children, the thump of party music, and one heroic adult trying to sound cheerful over both.
Bill joined her carrying a paper cup of tea he had somehow acquired from nowhere respectable. His wool cap was pulled low, making him look less like a criminal mastermind and more like a handyman on the run from his own invoice.
“That,” Molly said, pointing with two fingers, “is a fortress disguised as poor taste.”
Bill squinted. “It’s got a bouncy castle.”
“Exactly. Diversion, noise cover, and a tripping hazard.”
He sipped. “I still think stealing presents from a children’s party sounds morally iffy.”
Molly turned to him. “We are thieves, Bill.”
“Yes, but usually from people who own at least one grandfather clock.”
“Children’s gifts are easier to fence. Half those boxes will be electronics, designer trainers, limited-edition plastic rubbish adults queue overnight to buy. We take the lot, and by Monday some collector in Croydon is paying triple for a robot dinosaur that also teaches coding.”
Bill considered this. “I did like the dinosaur part.”
Molly unfolded a hand-drawn map from inside her coat. It had coffee stains, arrows, and one corner torn off where she had apparently become too emotional during planning. “The parents are all in the back garden pretending prosecco counts as supervision. The children are in the lounge and garden in rotating waves of sugar frenzy. Presents are stacked in the dining room under the window.”
“How do you know that?”
“I went in dressed as a balloon artist.”
Bill blinked. “You can make balloon animals?”
“I can make fear and a surprisingly convincing swordfish.”
A pack of children thundered past the hallway window inside, one wearing a paper crown and the face of a tiny emperor drunk on icing. Bill watched them with the air of a man nearing an ethical cliff edge in slippery shoes. “And our entrance?”
“Side gate. Catering van’s due in five minutes with extra ice. We go in carrying these.” She tapped two folded aprons in her bag. “Kitchen help. Invisible people in every household.”
Bill nodded slowly. “And then?”
“And then,” Molly said, “we smile sadly, as if underpaid, collect the gifts into bin liners, and leave through the gate while everyone sings to a cake.”
“That’s cold.”
“It’s timing.”
A woman inside opened the front door and shouted, “Has anyone seen the clown?”
Molly and Bill exchanged a look. Bill said, “Please tell me that isn’t us.”
“New plan,” said Molly instantly. “You’re the clown.”
“I beg your pardon.”
“You’ve got the proportions for tragedy.”
“I refuse to honk for criminal purposes.”
She peered at him critically. “Actually, no. Too sinister. You’ll frighten them into order, which is useful in schools but risky here.”
The front door slammed. From the garden came a chorus of parents attempting a game they did not understand.
Bill finished his tea and dropped the cup into a hedge with professional shame. “What if,” he said, “we cause a distraction bigger than the cake song?”
Molly raised an eyebrow. “Go on.”
“The bouncy castle. We untether it. It drifts.”
She stared at him, impressed despite herself. “That is the stupidest thing I’ve ever heard.”
“Thank you.”
“It might also be perfect.”
A beat passed as both imagined a large inflatable giraffe-patterned structure sailing majestically over detached housing.
Bill grinned. “While every adult in the county chases airborne litigation, we’re out the gate with the presents.”
Molly folded the map. “For a man with no principles, you do occasionally contribute.”
He offered her one of the aprons. “After you.”
She took it, eyes fixed on the glowing house. “Remember, Bill: no improvising, no souvenirs, and absolutely no eating party food.”
He looked wounded. “I’m a professional.”
From inside came the sudden musical wail of a hired entertainer dying on stage.
Molly smiled. “Showtime,” she said.
created_at: "2026-03-06T12:23:49.102Z" (3/6/2026, 12:23:49 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 523
moderation_latency: (null)
generation_time: 25344
tokens_prompt: 77
tokens_completion: 1218
native_tokens_prompt: 66
native_tokens_completion: 933
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01416
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_09bcf506d6585ead0169aac7552ae88190b56184dce06f0653"
    is_byok: false
    latency: 260
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799829-FHddtXX5H7tl3QqHmTbe"
upstream_id: "resp_09bcf506d6585ead0169aac7552ae88190b56184dce06f0653"
total_cost: 0.01416
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Evaluator: Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 713
Evaluator: Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 49.02%, Deviation: 19.02%
Meta Data: neededClean: false, wordsTotal: 716, wordsDialogue: 351
Overall: 0.0000%
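The reported ratio and deviation follow directly from the word counts in the evaluator's metadata: 351 dialogue words out of 716 total, against the 30% target from the scenario. The sketch below reproduces that arithmetic and adds a hypothetical counter that treats text inside curly double quotes as dialogue and splits words on whitespace. The benchmark's actual tokenization and scoring rule are not shown on this page, so `count_dialogue_words` is an assumption and may not yield exactly 351 and 716 on the response above.

```python
import re

TARGET_RATIO = 0.30  # from the scenario: "30% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that are dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words).

    ASSUMPTION: dialogue is whatever sits between curly double quotes, and
    words are whitespace-separated. The real evaluator may count differently.
    """
    total = len(text.split())
    spoken = sum(len(span.split()) for span in re.findall(r"“([^”]*)”", text))
    return spoken, total


# Reproduce the figures reported by the evaluator.
ratio = dialogue_ratio(351, 716)
deviation = abs(ratio - TARGET_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 49.02%, Deviation: 19.02%"

# Small usage example for the hypothetical counter.
sample = "“Go on,” said Molly. “Now.”"
print(count_dialogue_words(sample))  # prints (3, 5)
```

A deviation of about 19 percentage points from the target is consistent with the 0.00% score for this evaluator, although the threshold that maps deviation to score is not stated in the report.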