NC Bench
Run: 2026-02-16-01KHK9Q3DENXHAQQ4T9V2DNMA7
Run Details
Model:
GPT-5
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
1m 25s
Cost:
$0.053512
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The alley behind the community center smelled like frosting and helium. Molly and Bill crouched behind an overflowing bin of deflated balloons, poring over a hand-drawn diagram on the dull side of wrapping paper. Someone had practiced their alphabet with jelly; the sticky A's and B's clung to Molly's elbow every time she pointed. A glittery tiara had become their paperweight. A cupcake, previously sat upon by Bill, trembled on a railing like a crime-scene witness. From inside, muffled squeals rose and fell with the bounce house, and the sound of a kazoo orchestra made the streetlights seem uncertain. Molly twanged a party blower like a measuring tape. Bill wore a conical hat the way a condemned statue wears a pigeon defiantly. "All right, Bill, mission parameters: liberate the gifts, respect the cake, avoid clowns." "Avoid clowns? That's our entire childhood coping mechanism walking in big shoes." "Clowns smell fear and unpaid invoices; we bring neither, just charm and ribbon." "And a wagon." "A wagon is a metaphor that also carries thirty-seven wrapped metaphors." "I practiced my innocent face; it looks like I swallowed a doorknob." "Perfect, children respect architecture; you'll pass as a sculpture of poor choices." "What about the dragon at the door, the mother with the guest list?" "We arrive as late replacement magicians; your hat already screams prestigious rabbit landlord." "I cannot produce a rabbit; I can produce a panic and a kazoo." "The trick is misdirection; no one tracks parcels when confetti follows union guidelines." "Do confetti have a union? Are we crossing picket lines of sparkles?" "We honor them by leaving a tip jar and excellent exit choreography." "How do we start without starting a fire or a birthday existential crisis?" "We knock, we bow, we promise wonders, we produce an inflatable bewilderment." "Then I toot the kazoo in A minor and everyone ages three years." 
"Meanwhile, you redirect attention to the cake with a heartfelt frosting soliloquy." "I have strong opinions about buttercream; I can filibuster until naptime." "Excellent; I shall wheel the wagon past the applause like a shy moon." "What if a toddler latches onto the axle like an adorable barnacle?" "We adopt them briefly, return them promptly, and earn goodwill and stickers." "What about the piƱata; it's shaped like an ethical dilemma with glitter." "If it bursts, candy covers our tracks; if it doesn't, we do." "I can juggle two gifts and a conscience, but not three." "Two is plenty; the conscience is recyclable and the wagon is hungry." "If someone asks our names, do we say Mister and Missus Surprise?" "We say Professional Helpers; it sounds generous and smells faintly of receipt." "You always smell of receipt; it's your cologne, Eau de Alibi." "Focus, maestro; timing is everything, like when to inhale around balloons." "I once dated a balloon; it ended with a pop and therapy." "May our relationships today be strictly transactional and filled with bows." "Then onward, partner; let us dignify chaos with manners and wheel bearings."
created_at
"2026-02-16T13:21:21.982447+00:00"
2/16/2026, 1:21:21 PM
model
"openai/gpt-5-2025-08-07"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
17879
moderation_latency
(null)
generation_time
84847
tokens_prompt
77
tokens_completion
1791
native_tokens_prompt
66
native_tokens_completion
5343
native_tokens_completion_images
0
native_tokens_reasoning
4672
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0535125
router
(null)
provider_responses
0
id
"resp_008200f009cdd48f016993197c6ac081948e55ddc70e808b4e"
status
200
is_byok
false
latency
170
endpoint_id
"7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5-2025-08-07"
api_type
"completions"
id
"gen-1771247996-1jPUWkSRJbxUiRZFXgF2"
upstream_id
"resp_008200f009cdd48f016993197c6ac081948e55ddc70e808b4e"
total_cost
0.0535125
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result: 99.9190%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false; words: 503

Result: 23.9307%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 75.35%, Deviation: 5.35%
Meta Data: neededClean: false; wordsTotal: 503; wordsDialogue: 379
61.9249%
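The ratio and deviation reported by the dialogue evaluator follow directly from the two word counts and the 70% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. The counting rule (whitespace-separated words inside straight double quotes) and all function names are assumptions; the page does not say how the evaluator tokenizes text or how it turns the deviation into a percentage score.

```python
import re


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage.

    Assumption: a word is any whitespace-separated token, and dialogue is
    whatever sits between a pair of straight double quotes.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    return dialogue_words, total_words


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target: float = 0.70) -> tuple[float, float]:
    """Return (ratio, absolute deviation from target) as fractions."""
    ratio = dialogue_words / total_words if total_words else 0.0
    return ratio, abs(ratio - target)


# Counts reported in the evaluation details above.
reported_ratio, reported_deviation = ratio_and_deviation(379, 503)
print(f"ratio: {reported_ratio:.2%}, deviation: {reported_deviation:.2%}")

# A tiny sample to exercise the counting rule.
sample = 'Bill sighed. "And a wagon." Molly nodded.'
sample_dialogue, sample_total = count_dialogue_words(sample)

# Unweighted mean of the two evaluator results shown above.
mean_result = (99.9190 + 23.9307) / 2
```

The final figure of 61.9249% is consistent with a plain average of the two evaluator results, although the page does not label it as such.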