NC Bench
Run: 2026-02-16-01KHKAYXJH9EGKA2WS8YJZNHYJ
Run Details
Model:
GPT-5 Mini
Test:
Dialogue tags
Scenario:
Write 500 words with 30% dialogue
Duration:
1m 7s
Cost:
$0.012925
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
They crouched behind a hydrangea, the party lights bleeding through the leaves like soft fluorescent confetti. Molly tucked a strand of fairy lights into her pocket and pretended to adjust an invisible necklace, which was her subtle signal that she had checked the back gate. Bill balanced a backpack that rattled faintly with items whose usefulness could not legally be advertised. Around them a flock of parents hummed like a dangerous choir, eyes blurring into a single watchful mass. A train of toddlers escorted a piñata by rope, their shrieks filling the yard with honest chaos. Molly: "Third-floor balcony, pinata-light, and a hallway full of glitter. We in or out?" Bill: "We're in. But what's the getaway? I don't want to end up explaining to a police officer why I used a spoon to open gifts." Molly: "Operation Sugar Rush: small, silent, candy-motivated. We wear cake bags and carry folding scissors." Bill: "Folding scissors?" Molly: "Multi-purpose. Prongs to unlock ribbons, teeth for opening tape, also excellent for dramatic gestures." Bill: "Dramatic gestures are my specialty. What about kids?" Molly: "We distract with a puppet sloth named Reginald. I can do sad eyes." Bill: "You and sad eyes are contraband." Molly: "We have twenty minutes between pinata pops and cake collapse. Timing is everything." Bill: "I brought a list of presents to prioritize. Toys, then cash envelopes, then suspiciously healthy snacks." Molly: "What about the glitter?" Bill: "We embrace it. Glitter is our calling card." Molly: "Deal." Molly checked her watch: a child's birthday hour is a narrow window, and in the world of sticky frosting and nap-time fatigue, miracles happened if you timed them right. Her plan was not cruel, she kept telling herself; presents are transient, and she only had a soft rebellious streak against consumerism—plus a fondness for sparkly tissue paper. 
Bill, who believed every perfect crime needed an element of theater, grinned with the kind of confidence that suggested he had read one too many caper novels. He had a grocery list and a conscience on layaway. They rehearsed movements like dancers practicing a ridiculous duet: one distracts, one slips through, both exit with less noise than a falling cupcake. The night smelled of lemon-scented hand sanitizer and animal crackers. Somewhere a toddler tried on a superhero cape and immediately insisted he could fly. It made Molly smile. It made Bill calculate trajectories. The neighborhood was a shallow pool of suburban enchantment; they were two fish in it, disguised as shadows. Their mission required precision, absurdity, and a puppet sloth. Molly and Bill traded glances, and a plan that had been scribbled on a napkin the week before began to snap into focus like velcro. Molly fished a crumpled children's party hat from her pocket and placed it on Bill's head as a test run of ridiculousness. He sighed theatrically and saluted with a cupcake napkin. Nearby, a dog considered stealing the whole scene and wisely decided a squeaky toy was the better prize and clearly amused.
created_at
"2026-02-16T13:42:48.506878+00:00"
2/16/2026, 1:42:48 PM
model
"openai/gpt-5-mini-2025-08-07"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
34096
moderation_latency
(null)
generation_time
65902
tokens_prompt
77
tokens_completion
1944
native_tokens_prompt
66
native_tokens_completion
6454
native_tokens_completion_images
0
native_tokens_reasoning
5760
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0129245
router
(null)
provider_responses
0
id
"resp_0964348bed651c640169931e952e6c81939d32ba7ff2c31461"
status
200
is_byok
false
latency
145
endpoint_id
"c4f66d01-20b0-4c27-a225-438ea22fda43"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5-mini-2025-08-07"
api_type
"completions"
id
"gen-1771249301-q4dIOXoAlhE9ARkZTB3r"
upstream_id
"resp_0964348bed651c640169931e952e6c81939d32ba7ff2c31461"
total_cost
0.0129245
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
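The cost figure can be sanity-checked against the native token counts listed above. The sketch below is a rough reconstruction, not the benchmark's own billing logic: the per-million-token rates are assumptions (they do not appear on this page), and `estimate_cost` is a hypothetical helper. Under those assumed rates the arithmetic happens to reproduce the reported `total_cost`.

```python
# Assumed USD rates per million tokens. These are NOT stated on the page;
# they are illustrative values that happen to reproduce the reported total.
ASSUMED_INPUT_USD_PER_MTOK = 0.25
ASSUMED_OUTPUT_USD_PER_MTOK = 2.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Hypothetical helper: estimate cost in USD from native token counts."""
    input_cost = prompt_tokens * ASSUMED_INPUT_USD_PER_MTOK / 1_000_000
    output_cost = completion_tokens * ASSUMED_OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


# Values copied from the run metadata above.
native_tokens_prompt = 66
native_tokens_completion = 6454
native_tokens_reasoning = 5760
reported_total_cost = 0.0129245

estimated = estimate_cost(native_tokens_prompt, native_tokens_completion)

# Share of completion tokens reported as reasoning tokens
# (assuming the completion count includes them).
reasoning_share = native_tokens_reasoning / native_tokens_completion

print(f"estimated cost: ${estimated:.7f} (reported: ${reported_total_cost})")
print(f"reasoning share of completion tokens: {reasoning_share:.1%}")
```

If the completion count does include reasoning tokens, most of the cost of this run comes from reasoning rather than from the roughly 500 words of visible prose.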
Evaluation details
Result | Evaluator | Details | Meta Data
97.6276% | Matches word count | n/a | neededClean: false; words: 507
78.6893% | Dialogue to Total Word Ratio | Ratio: 27.81%, Deviation: 2.19% | neededClean: false; wordsTotal: 507; wordsDialogue: 141
88.1585%
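The reported ratio and deviation follow directly from `wordsDialogue` and `wordsTotal`. The sketch below reproduces that arithmetic and shows one plausible way dialogue words could be counted. The quote-matching rule in `dialogue_words` is an assumption, not the evaluator's documented method, and the page does not say how the ratio is converted into the 78.6893% score. The last figure on the page is consistent with an unweighted mean of the two evaluator scores, which is also an inference rather than something the page states.

```python
import re

TARGET_RATIO = 0.30  # from the scenario: "30% dialogue"


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Count words inside straight double quotes (assumed rule, hypothetical)."""
    quoted_spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in quoted_spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(target - ratio)


# Values copied from the evaluation metadata above.
ratio, deviation = ratio_and_deviation(141, 507)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 27.81%, Deviation: 2.19%"

# Unweighted mean of the two evaluator scores (an inferred relationship).
mean_score = (97.6276 + 78.6893) / 2
print(f"mean of evaluator scores: {mean_score:.4f}%")

# Small illustrative sample (not the benchmark text) for the word counters.
sample = 'Molly whispered. "We in or out?" Bill nodded. "We\'re in."'
print(dialogue_words(sample), "of", count_words(sample), "words are dialogue")
```

A counter like this depends heavily on how quotation marks are written; curly quotes or unquoted speech would need a different rule.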