NC Bench
Run: 2026-03-18-01KM0AMJ35X3R93VJT07JV78CN
Run Details
Model:
GPT-5.4 Mini
Test:
Dialogue tags
Scenario:
Write 500 words with 30% dialogue
Duration:
6.3s
Cost:
$0.004383
Provider:
OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread a faded map across the hood of the van, pinning it down with a bag of stolen marbles and a wind-up clown that had long since lost one eye. Bill leaned over her shoulder, chewing a licorice whip like it was a tactical cigar, and squinted at the bright little house at the end of the lane. It was already shaking with music, shrieks, and the occasional burst of laughter that made Molly’s teeth hurt. “Tell me again,” Bill said, “why the richest haul of the month is apparently six-year-olds’ birthday loot.” “Because,” Molly said, tracing the route with a finger, “parents are sentimental, children are greedy, and gift paper is a thin disguise over a shocking amount of electronics, cash, and gift cards. Also, I have bills.” Bill nodded gravely. “A noble cause. We’re Robin Hoods with criminal records and poor impulse control.” “At least one of us has a sense of history,” Molly said. “We go in through the side gate, past the balloon arch, through the conservatory, and out the back with the presents before anyone notices.” Bill frowned. “You make it sound simple.” “It is simple.” “That’s what you said about the museum.” “That was simple too, until you touched the laser grid.” “I thought it was festive string lights.” Molly gave him a look that could have peeled paint. “If you see a red beam at waist height, Bill, it is not a suggestion.” A small child in a pirate hat ran past the van carrying a blue balloon and an alarming amount of cake on his sleeve. Bill ducked instinctively, then straightened and tugged at his collar. “Are we sure this is ethical?” he asked. Molly snorted. “We are stealing presents from a child’s party, Bill.” “Yes, but what if they’re for a child?” “They are for a child.” “Oh.” He considered this. “That does make it a bit worse.” “It makes it exact,” Molly said. “Children don’t count their gifts. Adults don’t track all the envelopes. Everything is glittering confusion. We exploit the confusion.” Bill brightened. 
“I do like exploiting confusion.” “Of course you do. You married it for three months.” He grinned despite himself. “Best woman I ever stole from.” “You stole her bicycle.” “Exactly. Romance.” Molly checked the watch on her wrist. “We’ve got seven minutes until cake, ten until the magician starts, and fifteen until the inflatable castle gets punctured. That’s our window.” Bill peered toward the house. “Why is there a magician at a child’s party?” “Because parents like to outsource chaos.” “Fair.” She handed him a small silver badge on a ribbon. “Wear this. It says STAFF.” Bill held it up. “This says BAST.” “It says STAFF if you stop squinting.” “Oh.” He pinned it to his jacket. “Elegant. Will this help us blend in?” “No, but it will help us look like we belong near a tray of juice boxes, which is the same thing.” He tested the badge in the van’s mirror and adjusted his expression into one of officious competence. It was terrible. He looked like a man who would misplace a clipboard and blame the weather. Molly slipped a black bag over her shoulder. “Remember: no noise, no grabbing, no heroics.” Bill blinked. “That’s a very strict list.” “It’s one rule repeated three ways.” He nodded. “Right. No noise, no grabbing, no heroics.” She opened the van door. The noise from the party rolled out at them in a warm, sugary wave. Somewhere inside, a child yelled, “Mine!” with the raw conviction of a tiny emperor. Molly smiled. “Ready?” Bill hauled himself upright and flexed his fingers. “Born ready.” “Liar.” “A trained liar,” he said. “Now let’s go rob a birthday.” They walked up the path with the grave, purposeful stride of caterers arriving too late. As they reached the gate, a woman in a sparkly headband turned and called, “You’re late!” Bill put on his nicest smile. “Terribly sorry. We had a minor issue with the cake.” Molly glanced at him. 
“What cake?” He whispered back, “The issue is that I want some.” And with that, they slipped into the party, where the presents waited in a shining mountain beneath a banner that read HAPPY BIRTHDAY, and no one yet had the faintest idea how quickly happiness could be transported out the back door.
created_at
"2026-03-18T11:17:32.915Z"
model
"openai/gpt-5.4-mini-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
419
moderation_latency
(null)
generation_time
6204
tokens_prompt
77
tokens_completion
1201
native_tokens_prompt
66
native_tokens_completion
963
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.004383
router
(null)
provider_responses
0
endpoint_id
"9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id
"resp_004cfcfd0750313f0169ba89ccfa6481908333ef83ba6eb9c2"
is_byok
false
latency
216
model_permaslug
"openai/gpt-5.4-mini-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773832652-joxFV3N3EYoAMWWBwoDr"
upstream_id
"resp_004cfcfd0750313f0169ba89ccfa6481908333ef83ba6eb9c2"
total_cost
0.004383
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta data: neededClean = false, words = 723
Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 50.55%, Deviation: 20.55%
Meta data: neededClean = false, wordsTotal = 726, wordsDialogue = 367
Overall
Result: 0.0000%
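The reported ratio and deviation can be reproduced from the metadata counts. The following is a minimal sketch, not the benchmark's actual evaluator: the function names are hypothetical, the quote-matching regex is an assumption about how dialogue might be detected, and the 500-word and 30% targets are taken from the prompt. The page does not state the thresholds that turn a deviation into a score, so none are modeled here.

```python
import re

# Assumed targets, taken from the prompt text (500 words, 30% dialogue).
TARGET_WORDS = 500
TARGET_DIALOGUE_RATIO = 0.30


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Count words inside straight or curly double quotes (assumed heuristic)."""
    spans = re.findall(r'[“"]([^”"]*)[”"]', text)
    return sum(count_words(span) for span in spans)


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that fall inside dialogue."""
    return words_dialogue / words_total if words_total else 0.0


def ratio_deviation(ratio: float, target: float = TARGET_DIALOGUE_RATIO) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


def word_count_overshoot(words: int, target: int = TARGET_WORDS) -> float:
    """Relative difference between the measured and requested word counts."""
    return (words - target) / target


# Counts reported in the evaluation metadata for this run.
ratio = dialogue_ratio(367, 726)
deviation = ratio_deviation(ratio)
overshoot = word_count_overshoot(723)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
print(f"Word count overshoot: {overshoot:.1%}")
```

With the counts from this run, the ratio is 367 / 726 and the deviation is its distance from the 30% target, matching the figures in the Details field. The overshoot helper only shows how far 723 words is from the requested 500.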