NC Bench
Run: 2026-02-16-01KHKB08Q87T06Z2NJYWF74J4M
Run Details
Model: GPT-5
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 2m 50s
Cost: $0.111073
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Rain misted the cul-de-sac where balloons bobbed like moons over an inflatable castle. Molly peered from the van window with scholar’s intensity, nose smudging glass, while Bill adjusted a thrifted blazer three sizes too small. Streamers jittered across a lawn like nervous snakes. Somewhere, toddlers sang atonally about cake. "You sure about the blazer? You look like an usher." Molly whispered. "It says, 'Authority,' but also, 'I can't lift my arms.'" Bill preened. Across the street, a caterer in a unicorn apron ferried a cake so large it needed diplomatic plates. Parents clustered beneath a tent, sipping coffee with the seriousness of surgeons. By the bouncy castle, a mountain of wrapped gifts reached critical altitude. The mountain glittered as though it had swallowed a rainbow and kept the receipt. "Behold the summit. Our mission is festive mountaineering." Molly said. "Do we have a sherpa, or just your paperclips?" Bill asked. "Paperclips are democratic lockpicks and great for sandwich emergencies." Molly plucked a clip from her sleeve. "I brought a balloon, two glitter bombs, and a marvelous lying smile." Bill produced a deflated giraffe. It wheezed out like a very tired accordion. Molly unfolded a crayon-drawn map stolen from her nephew’s backpack: crooked arrows, a stick dragon labeled ‘Piñata,’ X’s marking everywhere a toddler had previously cried. The legend read, in purple, Please do not sit in the guacamole again. "Entrance is through the front, obviously, because we are legends." Molly declared. "Correction: We are glamorous rumors with receipts." Bill tapped his too-small shoes. Thunder grumbled a parental warning. Somewhere a piñata twitched in fear. A clown, already resigned to chaos, rehearsed a smile that could survive frosting. "Timing is crucial. We move during Freeze Dance." Molly checked her watch. "My dignity died when I mistook fondant for soap." Bill confessed. "Backup plan: blame the magician. Everyone suspects a man with scarves." 
Molly grinned. Music thumped, a sugar-fueled anthem. Children erupted from the house like popcorn attempting flight. Gift wrap flashed in the sun, each square an invitation, each bow an audacious dare. Bill swallowed, which sounded suspiciously like gift bags rubbing together. "Is it wrong if the presents haven’t experienced love yet?" Bill wondered. "It’s redistribution of glitter. We’re Robin Hood in sequins." Molly adjusted her beanie. "Operation Borrowed Birthday begins with the very first spilled juice." Bill raised a finger. As if on cue, a cup performed a graceful swan dive. Parents bowed to the altar of napkins. The air filled with the shriek of kazoo liberty. "Ready to dance like chairs don’t exist tonight, Bill?" Molly breathed. "I was born chairless. Lead on, Duchess of Mischief." Bill rolled his shoulders. They stepped from the van as confetti began to fall, not from the sky, but from a rogue party cannon with excellent comedic timing. Somewhere a magician sneezed and produced a rabbit that judged them quietly. The gift mountain gleamed, naive and doomed, and the two thieves joined the stampede disguised as grown-ups who had definitely been invited.
created_at: "2026-02-16T13:45:18.323008+00:00" (2/16/2026, 1:45:18 PM)
model: "openai/gpt-5-2025-08-07"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 7874
moderation_latency: (null)
generation_time: 169259
tokens_prompt: 77
tokens_completion: 2363
native_tokens_prompt: 66
native_tokens_completion: 11099
native_tokens_completion_images: 0
native_tokens_reasoning: 10368
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.1110725
router: (null)
provider_responses
0
id: "resp_04161cf3c46ebd8c0169931ec1584481908b0a7d82534c4592"
status: 200
is_byok: false
latency: 168
endpoint_id: "7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
provider_name: "OpenAI"
model_permaslug: "openai/gpt-5-2025-08-07"
api_type: "completions"
id: "gen-1771249345-VKY0GJbDqI7FVgvQVu5m"
upstream_id: "resp_04161cf3c46ebd8c0169931ec1584481908b0a7d82534c4592"
total_cost: 0.1110725
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
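The token counts and the reported cost in the metadata above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the per-million-token prices are an assumption (they do not appear on this page), and it also assumes that the reasoning tokens are included in native_tokens_completion. Under those assumptions the estimate lands on the reported total_cost.

```python
# Values copied from the run metadata above.
native_tokens_prompt = 66
native_tokens_completion = 11099
native_tokens_reasoning = 10368
reported_total_cost = 0.1110725

# ASSUMPTION: list prices in USD per million tokens; not stated on this page.
ASSUMED_INPUT_USD_PER_M = 1.25
ASSUMED_OUTPUT_USD_PER_M = 10.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from native token counts."""
    micro_usd = (prompt_tokens * ASSUMED_INPUT_USD_PER_M
                 + completion_tokens * ASSUMED_OUTPUT_USD_PER_M)
    return micro_usd / 1_000_000


estimated = estimate_cost(native_tokens_prompt, native_tokens_completion)

# ASSUMPTION: reasoning tokens are a subset of the completion tokens.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion

print(f"estimated cost:  ${estimated:.7f}")
print(f"reported cost:   ${reported_total_cost:.7f}")
print(f"visible tokens:  {visible_tokens}")
print(f"reasoning share: {reasoning_share:.1%}")
```

If the assumptions hold, roughly 93% of the billed completion tokens were spent on reasoning rather than on the scene itself, which would explain why a 500-word answer cost about eleven cents.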
Evaluation details
Result | Evaluator | Details | Meta Data
98.7124% | Matches word count | n/a | neededClean: false, words: 506
90.3720% | Dialogue to Total Word Ratio | Ratio: 31.42%, Deviation: 1.42% | neededClean: false, wordsTotal: 506, wordsDialogue: 159
94.5422%
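The ratio and deviation reported in the evaluation details follow directly from the two word counts. The sketch below reproduces that arithmetic and shows one possible way to count dialogue words. The evaluator's actual counting method is not shown on this page, so treating everything between straight double quotes as dialogue is an assumption, and the function names are hypothetical. The final check only observes that the last percentage is numerically consistent with the mean of the two evaluator scores; the page does not say how it is computed.

```python
import re

TARGET_RATIO = 0.30  # from the scenario: "30% dialogue"


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a scene.

    ASSUMPTION: dialogue is whatever sits between straight double quotes,
    and words are whitespace-separated tokens.
    """
    total_words = len(text.split())
    spoken = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(segment.split()) for segment in spoken)
    return dialogue_words, total_words


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = dialogue_words / total_words
    return ratio, abs(ratio - target)


# Word counts copied from the evaluation meta data.
ratio, deviation = ratio_and_deviation(159, 506)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 31.42%, Deviation: 1.42%

# Tiny usage example on a made-up sentence (not from the run).
sample = 'Molly sighed. "Grab the cake." Bill nodded.'
sample_counts = dialogue_stats(sample)
print(sample_counts)  # (3, 7)

# The two evaluator scores average to the final figure on the page.
mean_score = (98.7124 + 90.3720) / 2
print(f"{mean_score:.4f}%")  # 94.5422%
```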