NC Bench
Run: 2026-02-16-01KHKA01N6EE7P7MQ0CDBG55ZV
Run Details
Model: GPT-5
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 2m 3s
Cost: $0.051092
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly and Bill hunched over a kiddie map of the community center, surrounded by glitter, ribbon shrapnel, and a suspiciously cheerful inflatable unicorn. It wheezed. "Okay, Bill, we are not committing grand larceny; we are relocating joy to a better home, namely ours, and we’ll invoice the universe later. Tonight." "Relocating joy sounds noble, but children are tiny chaos engines, Molly; we need a plan that survives cake, confetti, and a rogue mascot stampede. scenario." Molly tapped the unicorn’s snout, which sighed again. She grinned the way raccoons probably dream. Bill checked his wristwatch like it owed him apologies today. "Here’s the beauty: we blend with the festivities by being helpful, the sort of volunteers who alphabetize piñatas and offer tactical guidance on musical chairs." "Helpful until someone asks us to actually help, at which point my moral compass starts tap-dancing, and not even in time with the birthday anthem." A sugary chorus leaked from the gym next door. Streamers trembled like nervous cobras. Somewhere, a kazoo practiced villainy. Bill lifted the map, upside down. "Relax, we execute during peak chaos, a precise ninety-second window I’m calling Operation Glitter Eclipse, when even the chaperones forget their names and dietary restrictions." "Peak chaos, sure, but what’s our camouflage, specifically, besides your radiant confidence and my ability to look like I’m supposed to carry clipboards at birthdays." The unicorn listed toward them like a juror. Molly unzipped a tote, producing sequined aprons and name tags reading HELPER, which winked under fluorescent lights. "Authenticity matters; we wield trash bags, offer extra napkins, nod knowingly at frosting disasters, then escort gifts to a safer, less sticky environment far away." "I adore euphemisms that sound like yoga poses. But contingencies, please; what if someone small, loud, and jam-fingered asks where we’re taking Great-Aunt Gertrude’s blender?" 
Molly considered, then shrugged with theatrical innocence. Bill twirled a pom-pom stick he absolutely hadn’t borrowed. The gym doors wheezed, releasing confetti like impatient punctuation. "We speak fluent Clipboard, Bill; we say inventory, logistics, and audit, and suddenly everyone backs away respectfully, unsure whether numbers bite or simply lecture forever." "Numbers do bite; I once owed interest to a stack of coupons. Still, I like it. Do we reward ourselves with cupcakes as professional development?" Beyond the wall, a chant swelled: cake, cake, cake. Molly and Bill shared a conspiratorial smile that might have earned detention in three counties already. "Focus, scholar; once the candles ignite, we initiate our benevolent relocation, glide like considerate ghosts, and remember: never run, always briskly purpose-walk with jazz-hands confidence." "If we get caught, we apologize poetically, return everything, and found a nonprofit for rehabilitating ribbon. Also, I cry on command; it’s disarming to toddlers." "Excellent; your tears, my spreadsheets. Remember, Bill, we are not villains; we’re gift relocation consultants, bravely preventing regifting tragedies and socks being mistaken for excitement." "And if anyone asks why we’re leaving early, we say there’s a quota on delight transportation, and we’ve reached it. Union rules. Take that, universe."
created_at
"2026-02-16T13:26:53.173087+00:00"
2/16/2026, 1:26:53 PM
model
"openai/gpt-5-2025-08-07"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
9992
moderation_latency
(null)
generation_time
122957
tokens_prompt
77
tokens_completion
1906
native_tokens_prompt
66
native_tokens_completion
5101
native_tokens_completion_images
0
native_tokens_reasoning
4352
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0510925
router
(null)
provider_responses
0
id
"resp_000451d22c2b88300169931aa186788196a9648b4cf140792e"
status
200
is_byok
false
latency
168
endpoint_id
"7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5-2025-08-07"
api_type
"completions"
id
"gen-1771248289-izlJAoRtUekXr8tTzWbG"
upstream_id
"resp_000451d22c2b88300169931aa186788196a9648b4cf140792e"
total_cost
0.0510925
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
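The token and timing fields above can be cross-checked with a little arithmetic. The following is a minimal sketch, not the benchmark's own code. It assumes that native_tokens_reasoning is counted inside native_tokens_completion and that generation_time is in milliseconds; neither is stated on this page, although the second is consistent with the reported duration of 2m 3s. The helper name format_duration is hypothetical.

```python
# Values copied from the run metadata above.
native_tokens_completion = 5101
native_tokens_reasoning = 4352
generation_time_ms = 122957  # assumption: the unit is milliseconds

# Assumption: reasoning tokens are a subset of the completion tokens,
# so the remainder is the text that actually appears in the response.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion


def format_duration(ms: int) -> str:
    """Render a millisecond count as 'Xm Ys', rounded to the nearest second."""
    total_seconds = round(ms / 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds}s"


print(visible_tokens)
print(f"{reasoning_share:.1%}")
print(format_duration(generation_time_ms))
```

Under those assumptions, most of the completion budget went to reasoning rather than to the roughly 500-word scene itself.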
Evaluation details
Result | Evaluator | Details | Meta Data
97.6276% | Matches word count | n/a | neededClean: false, words: 507
99.7649% | Dialogue to Total Word Ratio | Ratio: 70.22%, Deviation: 0.22% | neededClean: false, wordsTotal: 507, wordsDialogue: 356
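The reported ratio and deviation follow directly from the two word counts. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words, by treating everything between straight double quotes as dialogue. The counting rule, the function names, and the 70% default target are assumptions for illustration; the page does not show how the evaluator tokenizes text or how it turns the deviation into the 99.7649% score.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio_percent).

    Assumption: dialogue is any text between straight double quotes,
    and a word is any whitespace-separated token.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = 100 * dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation(ratio_percent: float, target_percent: float = 70.0) -> float:
    """Absolute distance from the target ratio, in percentage points."""
    return abs(ratio_percent - target_percent)


# Counts copied from the evaluator metadata above.
reported_ratio = 100 * 356 / 507
print(f"Ratio: {reported_ratio:.2f}%, Deviation: {deviation(reported_ratio):.2f}%")

# Small illustrative sample, not taken from the run.
sample = 'Molly sighed. "We go at noon." Bill nodded. "Fine by me."'
print(dialogue_ratio(sample))
```

A different quoting convention (curly quotes, for example) or a different word splitter would shift the counts slightly, so this should be read as a consistency check rather than a reimplementation.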
98.6963%