NC Bench
Run: 2026-02-16-01KHK9Y9X2AEKACXKR5VD12RAA
Run Details
Model:
GPT-5
Test:
Dialogue tags
Scenario:
Write 500 words with 30% dialogue
Duration:
2m 25s
Cost:
$0.065723
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The community center on Maple Street pulsed like a beehive dipped in frosting, and Molly laid out blueprints on a pizza box while Bill tested the squeak of a rubber clown shoe. "Think of it as redistributing joy, but with better inventory control systems." Molly said. He eyed the kiddie banner flapping over the double doors; a cartoon dragon breathed soap bubbles that drifted across their stakeout car like glittery land mines. "I prefer jobs without confetti cannons, Mol. Last time I smelled unicorns." Bill said. Molly clicked a stopwatch, more for drama than timing, and produced a laminated schedule titled Operation Candle Melt, featuring stickers, color codes, and an ominous cupcake icon. "The invitation says two piñatas, three clowns, and a magician named Petite Steve." Molly said. Bill squinted at the map like it owed him money, nodding at arrows that led from the gift mountain to a side exit protected only by a cardboard castle and optimism. "Great, we’ll blend in as assistants and liberate packages during the finale elegantly." Bill said. He tugged on his borrowed vest, sequins protesting with every move; Molly uncapped a marker and began labeling rolling duffels with fake names like Aunt Mystery. "Elegance stops where sheet cake begins; we raid before candles get lit." Molly said. Through the window, a balloon arch arched harder than necessary, and a DJ sound-checked a playlist that seemed to be ninety percent bass and ten percent sugar. "What about the dragon mascot? That thing breathes bubble solution chemical warfare." Bill said. Molly flipped to a contingency page featuring doodled flames and a tiny extinguisher; beside it, she’d clipped a coupon for cupcakes, because morale and frosting are cousins anyway. "I bribed it with gummy worms. The handler thinks it’s a fundraiser for literacy." Molly said. 
Bill considered this, then pulled a compact mirror to practice his non-threatening smile; the mirror cracked a little under the strain but rallied with plucky resilience. "We are promoting literacy, specifically the fine print on return policies." Bill said. He rifled through a tote of disguises: a quiver of face brushes, googly eyes, rubber duck decals, and a foam tiara labeled Security that seemed sarcastic by design. "Blueprint: enter as balloon sculptors, exit as legends dragging rolling duffels stuffed triumphantly." Molly said. A child in a cape sprinted past the hedges, pursued by a parent wielding wet wipes with the haunted focus of a surgeon; both ignored the getaway van. "I’m allergic to balloons. They squeak like crime confessing itself to grandmothers at brunch." Bill said. Molly tossed him a palette and a foldable stool. He caught both with the reflexes of a raccoon near a compost bin and pretended it was elegance. "Fine, you’re the face painter. I’ll manage logistics and dramatic reveals myself." Molly said. "I can draw raccoons. Symbolism feels on-brand for moral ambiguity and snacks." Bill said. They bumped fists, thieves consecrating a plan with hand sanitizer, rolled toward destiny, which had juice boxes and nap time.
created_at
"2026-02-16T13:26:18.321194+00:00"
2/16/2026, 1:26:18 PM
model
"openai/gpt-5-2025-08-07"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
7519
moderation_latency
(null)
generation_time
145135
tokens_prompt
77
tokens_completion
1986
native_tokens_prompt
66
native_tokens_completion
6564
native_tokens_completion_images
0
native_tokens_reasoning
5888
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0657225
router
(null)
provider_responses
0
id
"resp_000988d9392a4d340169931a6878ac81948eaad010f3e2f7ac"
status
200
is_byok
false
latency
179
endpoint_id
"7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5-2025-08-07"
api_type
"completions"
id
"gen-1771248232-y4J1oEtRrL06YprxU08i"
upstream_id
"resp_000988d9392a4d340169931a6878ac81948eaad010f3e2f7ac"
total_cost
0.0657225
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
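The reported cost can be cross-checked against the native token counts. The sketch below is a rough check, not the site's billing logic: the per-token rates ($1.25 per million input tokens, $10.00 per million output tokens) are assumptions that do not appear on this page, and the function name `estimate_cost` is hypothetical. It also assumes the 5888 reasoning tokens are already included in the 6564 native completion tokens and billed at the output rate.

```python
# Assumed rates (not stated on this page): USD per token.
INPUT_USD_PER_TOKEN = 1.25 / 1_000_000
OUTPUT_USD_PER_TOKEN = 10.00 / 1_000_000


def estimate_cost(native_prompt_tokens: int, native_completion_tokens: int) -> float:
    """Estimate the run cost in USD from native token counts.

    Reasoning tokens are assumed to be part of native_completion_tokens,
    so they are not added separately.
    """
    return (
        native_prompt_tokens * INPUT_USD_PER_TOKEN
        + native_completion_tokens * OUTPUT_USD_PER_TOKEN
    )


# Values taken from the metadata above.
cost = estimate_cost(native_prompt_tokens=66, native_completion_tokens=6564)
print(f"{cost:.7f}")
```

Under these assumed rates the estimate agrees with the `usage` and `total_cost` values listed above, which suggests nearly all of the cost comes from completion tokens, most of them reasoning tokens.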
Evaluation details
Result | Evaluator | Details | Meta Data
99.9190% | Matches word count | n/a | neededClean: false; words: 503
99.9980% | Dialogue to Total Word Ratio | Ratio: 30.02%, Deviation: 0.02% | neededClean: false; wordsTotal: 503; wordsDialogue: 151
99.9585%
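The Dialogue to Total Word Ratio row reports 151 dialogue words out of 503 total. A minimal sketch of how such a ratio could be computed follows. The evaluator's actual tokenization and quote handling are not shown on this page, so the whitespace splitting, the treatment of text between double quotes as dialogue, and the function name `dialogue_ratio` are all assumptions.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a prose passage.

    Assumption: dialogue is any text between double quotes, and words
    are whitespace-separated tokens. The real evaluator may differ.
    """
    # Treat curly double quotes the same as straight ones.
    normalized = text.replace("\u201c", '"').replace("\u201d", '"')
    total_words = len(normalized.split())
    quoted_spans = re.findall(r'"([^"]*)"', normalized)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


sample = '"Grab the gifts now," Molly said. Bill nodded slowly.'
dialogue_words, total_words, ratio = dialogue_ratio(sample)
print(dialogue_words, total_words, f"{ratio:.2%}")

# Reproducing the reported percentage from the reported counts:
reported_ratio_percent = round(151 / 503 * 100, 2)
reported_deviation = round(abs(reported_ratio_percent - 30.0), 2)
print(reported_ratio_percent, reported_deviation)
```

The last two lines only redo the arithmetic on the counts in the table (151 / 503); they do not re-count the words of the model response.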