NC Bench
Run: 2026-02-16-01KHKA8RP765T4CVH4DYPRSH6V
Run Details
Model: GPT-5
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 2m 4s
Cost: $0.076003
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The community center smelled like frosting and panic, which was admirable, considering the balloons had not started squealing. Streamers draped the doorway like exhausted snakes. At dusk. On a folding table, a sign read HAPPY BIRTHDAY, which felt more like a threat than a promise. Molly adjusted a ridiculous paper crown she had liberated from a previous mistake, and Bill clicked a pen as if it were a detonator and not a coupon he intended to wave with authority. They studied the map they had drawn on a napkin, a syrup ring marking the treasure. Outside, minivans circled like cautious whales. Inside, a cake waited, and somewhere under it, a hoard of brightly wrapped gifts. Molly said, "We enter as party planners, demand a rehearsal of the surprise, then relocate every present into our rehearsal space, also known as the getaway stroller." Bill said, "We will be caught immediately unless we appear cheerful, so I propose forced whimsy, exaggerated bows, and a unicycle I cannot ride but can befriend." Molly said, "The unicycle is management material, but we need cover; balloons create chaos, confetti conceals movement, and a very loud kazoo can pass for official ceremony." Bill said, "I brought name tags that say ADULT SUPERVISION, which is bold branding, and a clipboard covered in stickers, because stickers emit authority at all times." Molly said, "The presents are stacked beneath the banner and guarded by frosting, which I respect; we must pretend to transport them somewhere safer for the cake." Bill said, "I will announce a safety evacuation of glitter, scientifically credible, and then lead a solemn procession, three steps per giggle, into our designated rehearsal hallway." Molly said, "Meanwhile I will consult the cake about its feelings, draw attention, and casually swap the gift mountain with decoy boxes labeled Science, Responsibility, and Naptime." 
Bill said, "Parents will thank us for introducing Naptime, and children will cheer for Science; Responsibility might draw boos, which is fair, because it looks like homework." Molly said, "Once the stroller is heroically heavy, we pivot, salute the inflatable dinosaur, and blend into the parking lot by performing the ancient Car-Alarm dance ritual." Bill said, "If anyone questions us, I shall produce this receipt for seventeen ounces of sprinkles, nod gravely, and whisper, logistics, which dissolves ninety percent of curiosity." Behind the door, the thump of tiny sneakers rose like distant thunder. A chorus rehearsed the ancient anthem of chaos about bathrooms. Someone tested a whoopee cushion with scholarly rigor. Bill tucked the clipboard under his arm and practiced his smile, the one with responsible molar. Molly squared her paper crown until it declared its candidacy. The map folded upon itself like a guilty bat and dived into her pocket. They checked the stroller, which was actually a repurposed wagon draped in tulle and false confidence. Confetti winks glittered from their sleeves. The door handle rattled. They nodded together, thieves who preferred applause to alarms, and braced to debut their questionable service: professional adult supervision.
created_at
"2026-02-16T13:31:39.451271+00:00"
2/16/2026, 1:31:39 PM
model
"openai/gpt-5-2025-08-07"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
22655
moderation_latency
(null)
generation_time
123456
tokens_prompt
77
tokens_completion
1659
native_tokens_prompt
66
native_tokens_completion
7592
native_tokens_completion_images
0
native_tokens_reasoning
6912
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0760025
router
(null)
provider_responses
0
id
"resp_022972848b7336db0169931bbf4c4081938fe4bc98a74cf643"
status
200
is_byok
false
latency
194
endpoint_id
"7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5-2025-08-07"
api_type
"completions"
id
"gen-1771248575-QbBfSzoU6RZP2Isr3IKm"
upstream_id
"resp_022972848b7336db0169931bbf4c4081938fe4bc98a74cf643"
total_cost
0.0760025
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
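The usage figure of 0.0760025 reported above can be reproduced from the native token counts. The sketch below is a rough check, not the billing logic of the provider or router: the two per-million-token prices are assumptions (they do not appear in this run record), as is the idea that reasoning tokens are billed as part of native_tokens_completion. The function and constant names are hypothetical.

```python
from decimal import Decimal

# ASSUMPTION: USD prices per one million tokens. These rates are not part of
# the run record; they are chosen because they reproduce the reported usage.
ASSUMED_PROMPT_PRICE_PER_M = Decimal("1.25")
ASSUMED_COMPLETION_PRICE_PER_M = Decimal("10.00")


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the cost in USD from native token counts (hypothetical helper)."""
    million = Decimal(1_000_000)
    return (
        prompt_tokens * ASSUMED_PROMPT_PRICE_PER_M
        + completion_tokens * ASSUMED_COMPLETION_PRICE_PER_M
    ) / million


# Values taken from the run record above.
native_tokens_prompt = 66
native_tokens_completion = 7592   # assumed to include the reasoning tokens
native_tokens_reasoning = 6912

estimated = estimate_cost(native_tokens_prompt, native_tokens_completion)
visible_completion_tokens = native_tokens_completion - native_tokens_reasoning

print(f"estimated cost: ${estimated}")
print(f"completion tokens outside reasoning: {visible_completion_tokens}")
```

Under these assumptions the estimate equals the reported usage, and 680 of the 7592 completion tokens fall outside the reasoning budget. Decimal is used instead of float so that the comparison with the reported figure is exact.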
Evaluation details
Result | Evaluator | Details | Meta Data
99.9990% | Matches word count | n/a | neededClean: false; words: 501
99.9502% | Dialogue to Total Word Ratio | Ratio: 50.10%, Deviation: 0.10% | neededClean: false; wordsTotal: 501; wordsDialogue: 251
99.9746%
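The ratio and deviation in the Dialogue to Total Word Ratio row follow from the two word counts: 251 / 501 is about 50.10%, which is 0.10 percentage points from the 50% target. The sketch below shows one way such counts could be obtained. It is not the evaluator's actual implementation: it assumes dialogue is any text between straight double quotes and that words are whitespace-separated tokens, and the function names are hypothetical. How the percentage score in the Result column is derived from the deviation is not stated in the record, so it is not modeled here.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def dialogue_stats(text: str, target_ratio: float = 0.5) -> dict:
    """Measure how much of a scene is dialogue (hypothetical helper).

    ASSUMPTION: dialogue is whatever sits between straight double quotes.
    """
    spoken = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(count_words(segment) for segment in spoken)
    words_total = count_words(text)
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_ratio),
    }


def ratio_from_counts(words_dialogue: int, words_total: int,
                      target_ratio: float = 0.5) -> tuple:
    """Return (ratio %, deviation in percentage points), rounded to 2 places."""
    ratio_pct = 100 * words_dialogue / words_total
    return round(ratio_pct, 2), round(abs(ratio_pct - 100 * target_ratio), 2)


# Small illustrative sample, not the scene from this run.
sample = 'Molly said, "We go now." Bill nodded.'
stats = dialogue_stats(sample)

# Counts reported in the Meta Data column above.
reported = ratio_from_counts(251, 501)
```

With the reported counts, ratio_from_counts gives a ratio of 50.1 and a deviation of 0.1, which agrees with the Details column. A different tokenizer or quote convention (for example curly quotes) would shift the counts slightly, so results from this sketch may not match the evaluator word for word.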