NC Bench
Run: 2026-03-18-01KM0ABXW5SRHRE8WGHHJTM0PH
Run Details
Model:
GPT-5.4 Mini
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
5.3s
Cost:
$0.004491
Provider:
OpenAI
Evaluation summary
Matches word count:
0.00%
Dialogue to Total Word Ratio:
76.74%
Overall:
38.3695%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“You’re late,” Molly said, not looking up from the map spread across the hood of the van. Bill ducked under a dangling ribbon of fake ivy and closed the door with his heel. “I’m not late. I’m fashionably synchronized with catastrophe.” “You’re twenty minutes late.” “I hit traffic.” “In the alley behind a cake shop?” “There was a parade.” “There was not a parade.” “A lone man with a trumpet and a very determined grin.” Molly tapped the map. “Focus. We have one shot tonight.” Bill peered at the scribbled floor plan. “A children’s party. How hard can it be?” Molly gave him a flat look. “You’ve met children, right?” “I’ve been to a dentist.” “That’s not the same.” “It is if they’re handing out balloons.” Molly pointed at the warehouse photo. “The party is in the community center. Two entrances, one kitchen door, one main hall. Presents piled by the stage.” Bill leaned in. “That’s an unsettling amount of wrapping paper.” “They told the parents to label everything.” “So the gifts are organized.” “Yes.” “That makes stealing them rude and efficient.” Molly smiled thinly. “Exactly.” Bill rubbed his hands together. “What’s the plan?” “We go in as entertainers.” Bill blinked. “Entertainers?” “Yes. You’re the magician.” “I am not a magician.” “You own a cape.” “That was for weather.” “You also own a top hat.” “I inherited it.” “You rehearsed the disappearing rabbit trick for six weeks.” “I was trying to help a rabbit.” “You were trying to get a rabbit into a tiny box.” Bill frowned. “It was a novelty box.” “It was a shoebox.” “Well, it was still novelty to me.” Molly folded the map. “You do the magic act. I’ll handle the sound system. Once the parents are distracted, we load the gifts into the laundry bins.” Bill squinted. “Laundry bins?” “Borrowed from the daycare next door.” “You stole laundry bins before the heist?” “I borrowed them.” “From a daycare.” “Temporarily.” “That’s theft with a smile.” “It’s logistics.” Bill nodded slowly. 
“And the children?” “What about them?” “They’re the main obstacle.” Molly glanced toward the community center, where tiny voices and shrieks of laughter leaked through the walls. “We keep them busy.” “With what?” “With you.” “Me?” “You’re charismatic.” “I’m suspiciously available.” “You can juggle.” “I can barely carry soup.” “You’ll juggle oranges.” “I don’t have oranges.” “I brought oranges.” Bill stared at her. “You brought fruit.” “Children like fruit.” “Children like sugar and chaos.” “Then we’ll offer both. It’s a party.” Bill looked toward the lit windows. “You really think we can walk in, distract a room full of six-year-olds, and leave with every present?” Molly shrugged. “We’re thieves, Bill. Ambition is our brand.” He grinned despite himself. “Fine. But if one child looks at me with those huge guilty eyes and asks me to guess a card, I’m folding.” “You won’t fold.” “I fold under pressure.” “Then don’t think of it as stealing presents.” Bill raised an eyebrow. “What should I think of it as?” Molly lifted the van door and motioned him inside. “Relocation.” He climbed in, then paused. “That sounds so much cleaner than it is.” “That’s why it works.” “And if we get caught?” Molly started the engine. “We deny everything.” Bill settled into the passenger seat. “What if they have security?” “It’s a children’s party.” “Exactly. Children are security.” She glanced at him. “You’re nervous.” “I’m not nervous.” “You’re chewing your own collar.” “I’m tasting success.” Molly laughed once, sharp and brief. “Good. Because in ten minutes, you’re the Great Billini, and I’m the woman with a clipboard.” “A clipboard?” “You need authority at a children’s party.” Bill sighed. “I hate how prepared you are.” “I hate how easy this is going to be.” From inside the community center came a sudden burst of singing, followed by delighted screaming. Bill winced. “That’s either a game or a small riot.” Molly pulled on her gloves. “Same thing. 
Ready?” He straightened his cape, trying for dignity and landing somewhere near panic. “If this goes wrong, I want it on record that I objected to the fruit.” Molly opened the van door and flashed him a grin. “Too late, Bill. We’re going in for the presents.”
created_at
"2026-03-18T11:12:50.068Z"
3/18/2026, 11:12:50 AM
model
"openai/gpt-5.4-mini-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
387
moderation_latency
(null)
generation_time
5053
tokens_prompt
77
tokens_completion
1368
native_tokens_prompt
66
native_tokens_completion
987
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.004491
router
(null)
provider_responses
0
endpoint_id
"9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id
"resp_06ae32db3a4260f70169ba88b21f348196951cf0e5a6e44055"
is_byok
false
latency
184
model_permaslug
"openai/gpt-5.4-mini-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773832370-BhXZT7zXQ8ckLx3TE9c3"
upstream_id
"resp_06ae32db3a4260f70169ba88b21f348196951cf0e5a6e44055"
total_cost
0.004491
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Evaluator:
Matches word count
Result:
0.00%
Details:
n/a
Meta Data:
neededClean: false, words: 702
Evaluator:
Dialogue to Total Word Ratio
Result:
76.74%
Details:
Ratio: 72.30%, Deviation: 2.30%
Meta Data:
neededClean: false, wordsTotal: 704, wordsDialogue: 509
Overall:
38.3695%
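The reported ratio and deviation can be reproduced from the wordsDialogue and wordsTotal counts listed above. The following is a minimal sketch, not the benchmark's own evaluator code. The 70% target comes from the scenario text. Treating the overall score as the unweighted mean of the two evaluator scores is an assumption that happens to agree with the displayed 38.3695% to within rounding. The page does not show how the 2.30% deviation maps to the 76.74% evaluator score, so that step is not modeled here.

```python
# Values copied from the evaluation details on this page.
TARGET_RATIO = 0.70   # from the scenario: "Write 500 words with 70% dialogue"
words_dialogue = 509  # wordsDialogue
words_total = 704     # wordsTotal


def dialogue_ratio(dialogue_words: int, total_words: int) -> float:
    """Return the fraction of words that are dialogue."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return dialogue_words / total_words


ratio = dialogue_ratio(words_dialogue, words_total)
deviation = abs(ratio - TARGET_RATIO)

# Assumption (not stated on the page): the overall score is the
# unweighted mean of the per-evaluator scores, in percent.
evaluator_scores = [0.00, 76.74]
overall = sum(evaluator_scores) / len(evaluator_scores)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
print(f"Overall (assumed mean): {overall:.2f}%")
```

Under these assumptions the ratio and deviation match the Details column, and the assumed mean rounds to the same value as the displayed overall score.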