NC Bench
Run: 2026-02-16-01KHKANG87KXHP07YG9CQ1GCJK
Run Details
Model: GPT-5 Mini
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 1m 26s
Cost: $0.018132
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly perched on the windowsill like a cat who had just stolen someone else's tuna, glitter on her sleeve and a ridiculous grin. Bill fiddled with a glittery party horn as if it were a stethoscope, peering through cheap curtains at the cluster of shoes and balloons below. They were professional in an unprofessional way: two careers of tiny cons and big regrets, aiming for a children's birthday. The plan smelled faintly of napkins and frosting. The backyard was a minefield of paper crowns and a piñata that looked like a mythological fruit. They had rehearsed scripts for locksmiths and distracting aunties, but improvisation would be required when a sugar-high mob of five-year-olds arrived. A heist always had one rule for them: leave no evidence, especially glitter, which clung like guilt. They traded looks that meant mischief and logistics, and then they spoke, as they always did, in loud whispers. "Molly: Okay, we're doing this tonight. Birthday cake, balloons, seven dozen wrapped treasures and one suspiciously ornate unicorn piñata." "Bill: Do you always catalog the loot like it's a library?" "Molly: Habit. Lists keep my nerves from glittering too loudly. Also, that unicorn piñata is probably full of chocolate and tiny plastic swords." "Bill: Tiny plastic swords are the currency of playgrounds. We need maps. Where's the table with presents?" "Molly: Against the fence, under the banner that says 'Happy Birthday, Molly'—not our Molly. Say it with me: diversion, sleight, exit." "Bill: Diversion how? I can juggle cupcakes. I refuse to juggle cupcakes." "Molly: You promise never to juggle cupcakes again? Fine. We'll do the classic—balloon attack. Release a flock of balloons; parents chase, chaos ensues." "Bill: Parents chasing balloons is both believable and cinematic. Who releases the balloons?" "Molly: You. You're charming. Also, bring glitter-free gloves. Last time you sneezed and sprinkled evidence for weeks." "Bill: I sneezed once. That sneeze led to an entire bakery investigation." "Molly: Exactly. No sneezing. Also, we need an aunt." "Bill: An aunt?" "Molly: Yes, an aunt who cries over everything. She will spot a 'lost' ring and start a melodrama that draws humans away from the gift pile." "Bill: Where do we get an aunt?" "Molly: We borrow Mr. Patterson's tan terrier costume from the community theater. The terrier plays an emotional companion and the costume comes with an imaginary aunt." "Bill: Terrible plan. Also practical. What's my role while you extract the pile?" "Molly: You are the distraction and the wheelbarrow. You wheel, I lift—under twenty seconds per box." "Bill: Wheelbarrow? Do we have a wheelbarrow?" "Molly: We will have a collapsible cart disguised as a decorative present. Remember the accordion trick?" "Bill: Yes, but last time your 'decorative present' unfurled into a cloud of confetti and caused an intervention." "Molly: That intervention is why we rehearse. This time we rehearse with polite neighbors. We practice faces." "Bill: Faces?" "Molly: Timing is everything. Password is 'Cake or cash.'" "Bill: Exit plan?" "Molly: Back gate, compost, over fence, run." "Bill: Gloves, cart."
created_at: "2026-02-16T13:37:58.891793+00:00" (2/16/2026, 1:37:58 PM)
model: "openai/gpt-5-mini-2025-08-07"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 2533
moderation_latency: (null)
generation_time: 85405
tokens_prompt: 77
tokens_completion: 2018
native_tokens_prompt: 66
native_tokens_completion: 9058
native_tokens_completion_images: 0
native_tokens_reasoning: 8320
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0181325
router: (null)
provider_responses: 0
id: "resp_0b0e4e81d48f08160169931d60a38881908b34840650333e50"
status: 200
is_byok: false
latency: 152
endpoint_id: "c4f66d01-20b0-4c27-a225-438ea22fda43"
provider_name: "OpenAI"
model_permaslug: "openai/gpt-5-mini-2025-08-07"
api_type: "completions"
id: "gen-1771248992-wHtBCfvLjT57oiIgbRhf"
upstream_id: "resp_0b0e4e81d48f08160169931d60a38881908b34840650333e50"
total_cost: 0.0181325
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
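The token and cost fields in the metadata can be cross-checked with a little arithmetic. The sketch below is illustrative only: the two per-million-token prices are assumptions that do not appear on this page (they were chosen because they reproduce the reported total_cost), and it also assumes that generation_time is in milliseconds and that native_tokens_reasoning is a subset of native_tokens_completion.

```python
# Values copied from the run metadata.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 9058
NATIVE_TOKENS_REASONING = 8320
GENERATION_TIME_MS = 85405  # ASSUMPTION: unit is milliseconds

# ASSUMPTION: USD prices per million tokens; not stated on this page.
ASSUMED_INPUT_PRICE_PER_M = 0.25
ASSUMED_OUTPUT_PRICE_PER_M = 2.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a run from native token counts and assumed prices."""
    input_cost = prompt_tokens * ASSUMED_INPUT_PRICE_PER_M
    output_cost = completion_tokens * ASSUMED_OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / 1_000_000


cost = estimate_cost(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)

# ASSUMPTION: reasoning tokens are counted inside the completion total.
visible_tokens = NATIVE_TOKENS_COMPLETION - NATIVE_TOKENS_REASONING
reasoning_share = NATIVE_TOKENS_REASONING / NATIVE_TOKENS_COMPLETION

# Throughput over the generation phase, in native tokens per second.
tokens_per_second = NATIVE_TOKENS_COMPLETION / (GENERATION_TIME_MS / 1000)

print(f"estimated cost:   ${cost:.7f}")
print(f"visible tokens:   {visible_tokens}")
print(f"reasoning share:  {reasoning_share:.1%}")
print(f"throughput:       {tokens_per_second:.1f} tokens/s")
```

Under these assumptions the estimate agrees with the reported total_cost of 0.0181325, and most of the completion budget went to reasoning tokens rather than to the visible scene.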
Evaluation details
Result | Evaluator | Details | Meta Data
97.6276% | Matches word count | n/a | neededClean: false, words: 507
99.8426% | Dialogue to Total Word Ratio | Ratio: 69.82%, Deviation: 0.18% | neededClean: false, wordsTotal: 507, wordsDialogue: 354
98.7351%
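The dialogue figures above follow from simple counts. The sketch below shows one way such a ratio could be measured, under loud assumptions: the page does not document the evaluator's tokenization or quote handling, so `dialogue_ratio` is a hypothetical helper that splits on whitespace and treats everything inside straight double quotes (speaker labels included) as dialogue. It then checks the reported numbers against each other. The final check only shows that 98.7351% is consistent with an unweighted mean of the two evaluator scores, not that the benchmark computes it that way.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for straight-quoted dialogue.

    ASSUMPTION: words are whitespace-separated and dialogue is any text
    between a pair of straight double quotes.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


# Tiny usage example on made-up text.
sample = 'Molly grinned. "We go tonight." Bill sighed. "Fine."'
sample_dialogue, sample_total, sample_ratio = dialogue_ratio(sample)

# Cross-check of the figures reported in the evaluation details.
WORDS_DIALOGUE = 354
WORDS_TOTAL = 507
TARGET_PERCENT = 70.0

reported_ratio = round(WORDS_DIALOGUE / WORDS_TOTAL * 100, 2)
reported_deviation = round(abs(TARGET_PERCENT - WORDS_DIALOGUE / WORDS_TOTAL * 100), 2)

# Unweighted mean of the two evaluator scores (a consistency check only).
mean_score = (97.6276 + 99.8426) / 2

print(sample_dialogue, sample_total, sample_ratio)
print(reported_ratio, reported_deviation, round(mean_score, 4))
```

A real evaluator would likely need to handle curly quotes, nested single quotes, and unbalanced quotation marks, all of which this sketch ignores.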