NC Bench
Run: 2026-03-18-01KM08WMDSVZM738D8KW81X2YK
Run Details
Model: GPT-5.4 Mini
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 6.7s
Cost: $0.004387
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0001% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly crouched behind the bakery truck, peering through the rain-streaked window at the brightly lit community hall. Balloons pressed against the glass like cheerful little spies. Inside, a banner read HAPPY BIRTHDAY, TILLY! in glittering pink letters, and from somewhere beyond it came the shriek of a kazoo and the delighted roar of children. Bill adjusted the collar of his too-small black jacket. “You’re sure this is a good idea?” “It’s a terrible idea,” Molly said. “That’s what makes it beautiful.” “I don’t know if beautiful is the word I’d use for stealing presents from a seven-year-old.” “They’re not from a seven-year-old. They’re for a seven-year-old. Big difference.” Bill squinted through the wet glass. “How many presents are we talking about?” “Enough to make a clean getaway worth it.” “That’s not an answer.” “It’s a range.” He sighed. “I hate when you say ‘range.’ It always means more boxes than the last time.” Molly smiled, checking the straps of her duffel bag. “Relax. We go in with the delivery crew, blend with the chaos, and leave with the goods before anyone notices.” Bill looked at the clown parking beside the entrance. “You say ‘blend’ like there’s a chance of that happening.” “There’s always a chance. What’s wrong with your confidence tonight?” “My confidence is fine. My nerves are doing the talking.” “Then tell your nerves to whisper.” A gust of wind rattled the truck door. Bill flinched. “And if someone asks who we are?” “We’re here to collect the extra party supplies.” “From a party that is actively happening?” “Yes.” “That sounds suspicious.” “It sounds official if you say it quickly.” Bill rubbed his hands together. “I still think taking presents from children is low.” Molly gave him a look. “We’re not taking from children. We’re taking from a pile of wrapping paper and hopeful relatives.” “Hopeful relatives?” “Parents. Grandparents. The aunt who definitely brought the giant stuffed giraffe to show off.” Bill peered back inside. 
“There is, in fact, a giant stuffed giraffe.” “See? You’re already learning.” “Learning to feel guilty, maybe.” Molly nudged him with an elbow. “Bill, focus. The hall is packed. The adults are distracted, the kids are sugar-powered, and the presents are stacked near the stage. We only need ten minutes.” “Ten minutes is how long it takes children to form a mob.” “Then we don’t get caught by the mob.” “That’s your strategy?” “It’s a strong strategy.” He looked at her. “You ever have a plan that didn’t sound like it was written by a raccoon?” “Only the successful ones.” Bill snorted despite himself, then glanced toward the side door where a paper sign announced PRESENT DROP-OFFS THIS WAY. “Fine. We go in, we grab the presents, and we leave before the birthday song ends.” Molly patted the side of her bag. “That’s the spirit.” “And if there’s a piñata?” “We do not acknowledge the piñata.” “What if somebody hands us cake?” “We refuse politely.” “What if the clown recognizes us?” Molly shrugged. “Then we commit to the bit.” Bill stared at her. “What bit?” “The one where we’re concerned relatives retrieving donations for a literacy program.” “You made that up on the spot.” “I’m very talented.” He took a breath and straightened his jacket. “I can’t believe I’m saying this, but I’d rather be in a bank.” “Banks have lasers and security guards.” “This party has a balloon arch shaped like a unicorn.” “Exactly. Far more dangerous.” From inside came the sound of children chanting, “Open it! Open it! Open it!” followed by an adult voice saying, “One at a time, sweetheart.” Bill winced. “That’s going to make this harder.” Molly grinned and held out her hand. “Come on, partner. Let’s relieve these people of their burdens.” Bill took her hand, then immediately regretted the phrase. “You really need better wording.” “Later. 
Now we move.” They slipped from the truck and headed for the side entrance, rain tapping softly on the presents waiting inside, each wrapped box bright with ribbons and trust, while Molly whispered, “Remember, cheerful faces.” Bill forced a grin. “Mine always looks like a man planning a felony.” “Perfect,” Molly said. “You fit right in.”
created_at: "2026-03-18T10:47:00.295Z"
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 944
moderation_latency: (null)
generation_time: 6378
tokens_prompt: 77
tokens_completion: 1259
native_tokens_prompt: 66
native_tokens_completion: 964
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0043875
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_0ce97c9672d7c1790169ba82a45cb08196ab0989a99be30d04"
    is_byok: false
    latency: 198
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830820-wZM71tuLDjB0Y2TIK2Sv"
upstream_id: "resp_0ce97c9672d7c1790169ba82a45cb08196ab0989a99be30d04"
total_cost: 0.0043875
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 695
0.00% | Dialogue to Total Word Ratio | Ratio: 66.52%, Deviation: 16.52% | neededClean: false, wordsTotal: 705, wordsDialogue: 469
0.0001%
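The figures reported by the evaluators can be checked with simple arithmetic. The sketch below reproduces the ratio and deviation from wordsDialogue and wordsTotal, and the overshoot against the 500-word target. It assumes the deviation is the absolute difference from the 50% target, which matches the reported 16.52% but is not confirmed by the page. The function names are hypothetical, and the page does not say how a deviation is converted into a score, so that step is left out.

```python
def dialogue_ratio_pct(words_dialogue: int, words_total: int) -> float:
    """Dialogue words as a percentage of all words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_pct(ratio_pct: float, target_pct: float) -> float:
    """Absolute distance from the target ratio (an assumption, see lead-in)."""
    return abs(ratio_pct - target_pct)


def overshoot_pct(words: int, target_words: int) -> float:
    """How far the word count exceeds the target, as a percentage of it."""
    return 100.0 * (words - target_words) / target_words


# Values copied from the evaluation details above.
ratio = dialogue_ratio_pct(words_dialogue=469, words_total=705)
deviation = deviation_pct(ratio, target_pct=50.0)
overshoot = overshoot_pct(words=695, target_words=500)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")  # Ratio: 66.52%, Deviation: 16.52%
print(f"Word count overshoot: {overshoot:.1f}%")            # Word count overshoot: 39.0%
```

The two evaluators report different totals (695 and 705 words), so each calculation uses the count from its own row.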