NC Bench
Run: 2026-02-16-01KHKB5ED8RWD0TG28THJ9Z90Q
Run Details
Model: GPT-5 Mini
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 1m 29s
Cost: $0.008346
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The alley smelled of leftover cake, rain, and a hint of mischief. Molly perched on the fire escape, a silhouette in a red coat, scanning the apartment windows. Bill crouched below, knees tucked, twirling a paper party hat between two fingers. They had rehearsed this night in a dozen ways, each version sillier than the last, because laughter made the danger taste less like fear. He tapped the hat to his forehead, as if blessing the plan. Streetlight flickered and a dog barked twice, a metronome for their timing. Molly mouthed numbers, the guest list, and rewound logistics in her head. The party would be upstairs, small living room packed with cousins, sugar, balloons, and a suspiciously glittery pile of presents. Their objective was simple: remove every wrapped box, leave behind only an apologetic note and perhaps a rubber duck. Bill blew air through his teeth. He was the smooth part of the duo, practiced at pockets and pockets' pockets, while Molly was the spark, quick with improvisation and sticky fingers. They traded a look, the kind thieves know, a catalog of jokes and contingencies that fit their low budget expertise. Outside, a party balloon popped like confetti in their eardrums, and the building hummed with waiting. Molly tapped her watch, grinned, and whispered that the cake had a candle, the perfect timed distraction. Bill nodded, rehearsing his role. They were ridiculous, and they liked it, which made the plan feel almost moral. Tonight they would be professionals, at play. "Molly: Okay, the cake moment is minute three. We blow candles, kids scream, grownups clap, and we are a human wave." "Bill: Human wave, huh? I practiced the wave. On stairs it looks like a suspiciously enthusiastic earthquake." "Molly: You're in charge of making the clown balloon animals. Distract with papier mache sorrow and misplaced hats." "Bill: Clown balloon animals? My speciality. I can make a poodle that steals watches and a giraffe that recites nursery rhymes." "Molly: Good. I'll take the presents nearest the windows. You take the stack by the couch, pretend to fix the stereo, and then slip them into the duffel." "Bill: Slip them into the duffel? Which duffel? The one with the fuchsia zipper or the one that squeaks like a pigeon?" "Molly: The silent duffel. The squeaker draws suspicion and also shame. Nobody needs to know we replaced the corner gift with a brick of chocolate." "Bill: Solid. For escape, we use the fog machine from the attic. Nothing says 'we're gone' like a theatrical fog drifting down the stairs." "Molly: And if someone notices the missing gifts?" "Bill: We leave a note: 'Santas on strike. Taking unpaid overtime.' And maybe one of your terrible cookies as an apology." "Molly: My cookies are not terrible, they are avant garde. But okay, one cookie, two at most." "Bill: Timing: Cake blow at three, fog at four, exit stage left at five. We'll be gone before the piñata is questioned." "Molly: Rubber ducks and fake mustaches."
created_at: "2026-02-16T13:46:44.137264+00:00" (2/16/2026, 1:46:44 PM)
model: "openai/gpt-5-mini-2025-08-07"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 11978
moderation_latency: (null)
generation_time: 87624
tokens_prompt: 77
tokens_completion: 2255
native_tokens_prompt: 66
native_tokens_completion: 4165
native_tokens_completion_images: 0
native_tokens_reasoning: 3456
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0083465
router: (null)
provider_responses:
  0:
    id: "resp_08ec8a31b0a6494a0169931f6aff1081908ca9a801b3585cb2"
    status: 200
    is_byok: false
    latency: 153
    endpoint_id: "c4f66d01-20b0-4c27-a225-438ea22fda43"
    provider_name: "OpenAI"
    model_permaslug: "openai/gpt-5-mini-2025-08-07"
api_type: "completions"
id: "gen-1771249514-evfzpAiE7T7SiY8bNdqY"
upstream_id: "resp_08ec8a31b0a6494a0169931f6aff1081908ca9a801b3585cb2"
total_cost: 0.0083465
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
100.0000% | Matches word count | n/a | neededClean: false; words: 500
71.3195% | Dialogue to Total Word Ratio | Ratio: 52.60%, Deviation: 2.60% | neededClean: false; wordsTotal: 500; wordsDialogue: 263
85.6598% (the mean of the two results above)
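The page reports the dialogue ratio without showing how it is computed. The sketch below is one plausible reading, not NC Bench's actual evaluator: it assumes dialogue words are the whitespace-separated tokens inside straight double quotes, and that the deviation is the absolute distance from the 50% target. The helpers dialogue_ratio and format_ratio are hypothetical, and the 71.3195% score itself is not modeled because its formula is not shown.

```python
import re

TARGET = 0.5  # the scenario asks for 50% dialogue


def dialogue_ratio(text: str, target: float = TARGET) -> tuple[int, int, float, float]:
    """Hypothetical counter: words inside straight double quotes count as dialogue."""
    total_words = len(text.split())
    # Each match is the text between a pair of straight double quotes (assumption).
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return total_words, dialogue_words, ratio, abs(ratio - target)


def format_ratio(words_dialogue: int, words_total: int, target: float = TARGET) -> str:
    """Render counts in the same style as the Details column."""
    ratio = words_dialogue / words_total
    return f"Ratio: {ratio:.2%}, Deviation: {abs(ratio - target):.2%}"


if __name__ == "__main__":
    # Counts taken from the Meta Data column for this run.
    print(format_ratio(263, 500))  # Ratio: 52.60%, Deviation: 2.60%
    # A short made-up line: 9 words in total, 5 of them inside the quotes.
    print(dialogue_ratio('Bill said "slip them into the duffel" and grinned.'))
```

With the reported counts, 263 / 500 = 0.526, which gives the 52.60% ratio and the 2.60% deviation from the 50% target. A naive counter like this one would also treat the speaker labels the model placed inside its quotes (such as "Molly:") as dialogue words.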