NC Bench
Run: 2026-02-16-01KHKABYANGATF5Q7PXB23D9A2
Run Details
Model: GPT-5
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 1m 18s
Cost: $0.052582
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
They crouched behind a hedge shaped like a suspicious rabbit, watching a bouncy castle wobble in slow motion. Streamers trembled on the porch as a chorus of sugar-fueled shrieks rose and fell like sirens operated by squirrels. Bill twirled a map that was actually a restaurant menu and nodded gravely at the dessert section. Molly checked her bag: kazoos, a rubber chicken, a receipt labeled “Philanthropy,” and three granola bars for plausible fitness. Somewhere inside, a magician argued with a balloon pump and lost. The presents glittered under a banner that mispelled birthday twice, which felt like permission from fate. "We are not robbing toddlers; we are liberating festive opportunities." "Right, a humanitarian heist, with sprinkles, confetti, and plausible deniability." "Disguises first: glitter beards, volunteer badges, and emotional support llamas." "I call logistics: balloons distract, cupcakes stall adults, we waltz." "The presents go under Operation Borrow; capital O, temporary custody." "Contingencies include kazoo solos, decoy piñatas, and emergency nap time." "What about guards aged five, wielding frosting, stickers, and sincerity?" "We counter with dinosaur roars, bubble wrap armor, and fingerpaint." "Schedule says cake at two; we synchronize watches and appetites." "If conscience rings, we donate upgrades: kazoos and premium crayons."
created_at: "2026-02-16T13:32:38.553591+00:00" (2/16/2026, 1:32:38 PM)
model: "openai/gpt-5-2025-08-07"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 9540
moderation_latency: (null)
generation_time: 78282
tokens_prompt: 77
tokens_completion: 1190
native_tokens_prompt: 66
native_tokens_completion: 5250
native_tokens_completion_images: 0
native_tokens_reasoning: 4928
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0525825
router: (null)
provider_responses
0
id: "resp_096589effaa530bb0169931c2752548195bb35dee3a354d33e"
status: 200
is_byok: false
latency: 164
endpoint_id: "7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
provider_name: "OpenAI"
model_permaslug: "openai/gpt-5-2025-08-07"
api_type: "completions"
id: "gen-1771248679-k9aVWOp3QSPEBm4UO2g8"
upstream_id: "resp_096589effaa530bb0169931c2752548195bb35dee3a354d33e"
total_cost: 0.0525825
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
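The token counts and the usage figure in the metadata above can be related with a little arithmetic. The sketch below is illustrative only: the per-million-token rates are an assumption (this run record does not state them), and the helper name `estimate_cost` is hypothetical. With these assumed rates the result happens to equal the reported usage of 0.0525825; the same counts also show how much of the completion was reasoning rather than visible text.

```python
from decimal import Decimal

# Token counts copied from the run metadata.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 5250
NATIVE_TOKENS_REASONING = 4928

# ASSUMPTION: USD rates per million tokens. These are not part of the
# run record; they are illustrative values chosen for this sketch.
ASSUMED_INPUT_RATE = Decimal("1.25")
ASSUMED_OUTPUT_RATE = Decimal("10")
MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the run cost in USD from native token counts (hypothetical helper)."""
    return (
        prompt_tokens * ASSUMED_INPUT_RATE
        + completion_tokens * ASSUMED_OUTPUT_RATE
    ) / MILLION


estimated = estimate_cost(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)

# Completion tokens that were not reasoning, i.e. the visible answer.
visible_tokens = NATIVE_TOKENS_COMPLETION - NATIVE_TOKENS_REASONING
reasoning_share = NATIVE_TOKENS_REASONING / NATIVE_TOKENS_COMPLETION

print(f"estimated cost (USD): {estimated}")
print(f"visible completion tokens: {visible_tokens}")
print(f"reasoning share of completion: {reasoning_share:.1%}")
```

`Decimal` is used so that the small per-token amounts are summed without binary floating-point rounding.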
Evaluation details
Result | Evaluator | Details | Meta Data
99.9990% | Matches word count | n/a | neededClean: false, words: 201
100.0000% | Dialogue to Total Word Ratio | Ratio: 50.25%, Deviation: 0.25% | neededClean: false, wordsTotal: 201, wordsDialogue: 101
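The Dialogue to Total Word Ratio figures above follow from the two word counts: 101 dialogue words out of 201 total is about 50.25%, which is 0.25 percentage points from the 50% target. A minimal sketch of how such a ratio could be computed is shown below. It is not the benchmark's actual evaluator: the function name `dialogue_ratio` is hypothetical, and it assumes that dialogue is any text inside straight double quotes and that words are whitespace-separated tokens.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a scene.

    ASSUMPTION: dialogue is text enclosed in straight double quotes,
    and a word is any whitespace-separated token.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Small made-up sample, not taken from the run.
sample = 'Bill nodded. "We go at two." Molly sighed.'
spoken, total, ratio = dialogue_ratio(sample)
print(spoken, total, ratio)

# Reproduce the reported figures from the counts in the evaluation row.
reported_ratio_pct = round(101 / 201 * 100, 2)
reported_deviation_pct = round(abs(101 / 201 * 100 - 50), 2)
print(reported_ratio_pct, reported_deviation_pct)
```

Under this quoting rule, curly quotes such as those around “Philanthropy,” in the scene would not count as dialogue; a real evaluator may treat them differently.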
99.9995%