NC Bench
Run: 2026-02-16-01KHK9KVCW89PKEZQGMTWTM98H
Run Details
Model:
GPT-5
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
2m 30s
Cost:
$0.081462
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Rain freckled the Party Palace windows, a soft drumming that made the plastic banners shiver, while Molly and Bill crouched over a crumpled flyer on a food court table. The ice cream kiosk hissed beside them. Children in glitter shoes ricocheted past, sugar-drunk meteors, and somewhere, a DJ tested a siren that politely apologized afterward. "We’re really going to crash a birthday and liberate every bow-covered rectangle in sight?" "Language, Bill; we are performing a socially adjacent gift migration under controlled whimsy." "I brought courage and antacids, but my soul still squeaks like wet balloons." "Squeaks translate as festive enthusiasm; if anyone asks, you’re just auditioning for the confetti." "What if a mother eyes me and detects felony through sheer maternal sonar?" "Smile like an apologetic cupcake and say you’re here to assist in ribbon logistics." "Ribbon logistics sounds like my college minor, right between Panic Studies and Avoidance." "Perfect; your expertise qualifies you to compliment packages until they enter our portable universe." "And how does a package enter our portable universe without, you know, gravity noticing?" "We enact benevolent pageantry: exaggerated wows, a kazoo solo, then casual relocation toward destiny." "You keep saying destiny like it’s a duffel bag with excellent posture." "Destiny loves posture; so do reception committees, which we shall not encounter if we sparkle." Molly flipped the flyer, revealing a maze of cartoon arrows promising Bounce Zone Euphoria and Cupcake Hour. Bill drummed nervous rhythms on the table with a plastic spoon. A balloon bop sounded nearby, like a muffled idea popping before it could introduce itself to anyone. "Do we actually sparkle, or is this metaphorical twinkle endorsed by management?" "Metaphorical with backup stickers; if cornered, we present Hello, I’m Helpful badges." "If they ask who sent Helpful, I’ll panic and say destiny again, louder." 
"Louder sells confidence; also, hold something vaguely official, perhaps a clipboard wearing glitter." "I knew glitter was dangerous; it turns any noun into a plausible authority." "Exactly; if questioned, we murmur 'inventory' and squint at the horizon of possibility." "This still feels like robbing joy from tiny shoes and their bewildered laces." "We return some later by anonymous mail; think of us as emotional boomerangs." "Anonymous mail requires handwriting, and my handwriting looks like a chicken lament." "Perfect camouflage; nobody suspects the penitent poultry." "If there’s a magician, we move during the gasp; otherwise, cake song?" "Yes; applause fogs attention, frosting fogs ethics, and we drift like benevolent helium." "What about a lookout with the power of stern squinting and a Bluetooth?" "We counter-squint and compliment their shirt as if rescuing it from self-doubt." "I am not morally prepared to weaponize compliments, Molly." "You’re not weaponizing; you’re moisturising the atmosphere with upscale kindness." "Fine; I’ll moisturise, you migrate, destiny postures, everybody inhales confetti." They stood, rehearsing smiles that could pass as charitable emergencies. Outside the Party Palace doors, rain brightened the pavement to a mirror; in it they watched themselves become brighter: two reflections, resolute, walking toward music like permission.
created_at
"2026-02-16T13:20:40.742174+00:00"
2/16/2026, 1:20:40 PM
model
"openai/gpt-5-2025-08-07"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
16825
moderation_latency
(null)
generation_time
149920
tokens_prompt
77
tokens_completion
2352
native_tokens_prompt
66
native_tokens_completion
8138
native_tokens_completion_images
0
native_tokens_reasoning
7424
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0814625
router
(null)
provider_responses
0
id
"resp_094d585b3ebe7d390169931911fa948197a65f4e6528fdb8e6"
status
200
is_byok
false
latency
178
endpoint_id
"7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5-2025-08-07"
api_type
"completions"
id
"gen-1771247889-8SUfQqtlr6ktJRb8Yb2g"
upstream_id
"resp_094d585b3ebe7d390169931911fa948197a65f4e6528fdb8e6"
total_cost
0.0814625
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
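The token and cost fields above can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not the billing logic used by the provider or by NC Bench: the per-million-token prices (1.25 USD input, 10.00 USD output) are assumptions that do not appear on this page, as is the idea that reasoning tokens are billed at the output rate. The token counts are the native_tokens_* values reported above.

```python
from decimal import Decimal

# Values copied from the run metadata above.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 8138   # includes reasoning tokens
NATIVE_TOKENS_REASONING = 7424
REPORTED_TOTAL_COST = Decimal("0.0814625")

# ASSUMPTION: list prices in USD per one million tokens; not stated on this page.
ASSUMED_PRICE_INPUT_PER_M = Decimal("1.25")
ASSUMED_PRICE_OUTPUT_PER_M = Decimal("10.00")
ONE_MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate cost in USD, billing all completion tokens at the output rate."""
    input_cost = Decimal(prompt_tokens) * ASSUMED_PRICE_INPUT_PER_M / ONE_MILLION
    output_cost = Decimal(completion_tokens) * ASSUMED_PRICE_OUTPUT_PER_M / ONE_MILLION
    return input_cost + output_cost


estimated_cost = estimate_cost(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)

# Completion tokens that were not spent on reasoning.
visible_tokens = NATIVE_TOKENS_COMPLETION - NATIVE_TOKENS_REASONING
reasoning_share = NATIVE_TOKENS_REASONING / NATIVE_TOKENS_COMPLETION

print(f"estimated cost:  {estimated_cost} USD (reported: {REPORTED_TOTAL_COST})")
print(f"visible tokens:  {visible_tokens}")
print(f"reasoning share: {reasoning_share:.1%}")
```

Under these assumed prices the estimate agrees with the reported total_cost, and most of the completion tokens turn out to be reasoning tokens rather than visible text.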
Evaluation details
Result | Evaluator | Details | Meta Data
99.7443% | Matches word count | n/a | neededClean: false; words: 504
70.9659% | Dialogue to Total Word Ratio | Ratio: 72.62%, Deviation: 2.62% | neededClean: false; wordsTotal: 504; wordsDialogue: 366
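The ratio and deviation reported for the dialogue evaluator follow directly from the two word counts. The sketch below reproduces that arithmetic and adds a hypothetical helper, count_dialogue_words, that counts words inside straight double quotes. The helper is an assumption about how such a count could be made, not NC Bench's actual evaluator, and the mapping from deviation to the percentage score is not described on this page, so it is not modeled here.

```python
import re

TARGET_DIALOGUE_PERCENT = 70.0  # from the scenario: "70% dialogue"


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for text using straight double quotes.

    Hypothetical helper: words are whitespace-separated tokens, and dialogue
    is whatever sits between a pair of '"' characters.
    """
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    total_words = len(text.split())
    return dialogue_words, total_words


def ratio_and_deviation(dialogue_words: int, total_words: int) -> tuple[float, float]:
    """Return (ratio %, absolute deviation from the target in percentage points)."""
    ratio = dialogue_words / total_words * 100
    deviation = abs(ratio - TARGET_DIALOGUE_PERCENT)
    return round(ratio, 2), round(deviation, 2)


# Counts reported in the Meta Data above.
reported_ratio, reported_deviation = ratio_and_deviation(366, 504)
print(reported_ratio, reported_deviation)

# Small usage example on a made-up sentence.
sample = 'She said, "Hello there Bill." Then left.'
sample_counts = count_dialogue_words(sample)
print(sample_counts)
```

With the reported counts (366 dialogue words out of 504) this gives a ratio of 72.62% and a deviation of 2.62 percentage points, matching the Details shown for this evaluator.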
85.3551%