NC Bench
Run: 2026-02-16-01KHKAGWH3G99HPRNA9J9JXJ22
Run Details
Model:
GPT-5
Test:
Dialogue tags
Scenario:
Write 200 words with 10% dialogue
Duration:
1m 59s
Cost:
$0.057103
Provider:
OpenAI
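The cost and duration figures can be cross-checked against the native token counts and `generation_time` recorded in this run's response metadata. The sketch below is a rough accounting under stated assumptions: the per-million-token rates are not given anywhere on this page and are hypothetical inputs, and `generation_time` is assumed to be in milliseconds.

```python
# Values copied from this run's response metadata.
native_tokens_prompt = 66
native_tokens_completion = 5702
native_tokens_reasoning = 5376
generation_time_ms = 117525  # ASSUMPTION: the unit is milliseconds
reported_usage = 0.0571025

# ASSUMPTION: USD rates per million tokens; not stated on this page.
ASSUMED_INPUT_RATE = 1.25
ASSUMED_OUTPUT_RATE = 10.00

# Completion tokens that were not reasoning tokens.
visible_tokens = native_tokens_completion - native_tokens_reasoning  # 326

# Fraction of the completion spent on reasoning.
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Cost estimate with every completion token billed at the output rate.
estimated_cost = (
    native_tokens_prompt * ASSUMED_INPUT_RATE
    + native_tokens_completion * ASSUMED_OUTPUT_RATE
) / 1_000_000

# Average generation speed over the whole completion.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"visible tokens:    {visible_tokens}")
print(f"reasoning share:   {reasoning_share:.1%}")
print(f"estimated cost:    ${estimated_cost:.7f} (reported: ${reported_usage})")
print(f"tokens per second: {tokens_per_second:.1f}")
```

Under these assumed rates the estimate agrees with the reported `usage` value, which would be consistent with reasoning tokens being billed as output tokens. That reading depends entirely on the assumed rates and should not be taken as confirmed pricing.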
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The community center squatted under balloons like a suspect pretending to be cheerful. Molly sketched a battle plan on a napkin sticky with box casualties, mapping exits, gift tables, and the inflatable unicorn named Sir Neighs-a-Lot. Bill scoped the perimeter through a window painted with finger suns, noting the cupcake minefield and a clown dabbing. "Molly, we hit the bouncy castle at midnight," Bill whispered. "Focus. Wrapping paper rustles like alarms," she said. "Disguises?" "Clowns." "Terrifying." "Grab bows, quietly." Between them lay a duffel of morally complicated sacks, glitter-proof gloves, and a squeaker for plausible innocence. A DJ rehearsed Baby Shark; shadows did the choreography of regret. Molly checked her watch, a prize from a cereal box sweepstakes; it ticked with the gravity of confetti. They rehearsed the silent handshake: tap, tap, jazz hands. Bill had practiced crawling under tables; he could recite the underside of a folding chair. The security team consisted of toddlers hopped up on fondant and one grandmother who could smell mischief across counties. Plan: infiltrate as party entertainers, misdirect with balloon giraffes, liberate parcels, exfiltrate via piñata confusion. Contingency: surrender a decoy gift labeled For Grandma (contains kale). The night tasted like frosting and fear.
created_at
"2026-02-16T13:36:00.102805+00:00"
2/16/2026, 1:36:00 PM
model
"openai/gpt-5-2025-08-07"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
16702
moderation_latency
(null)
generation_time
117525
tokens_prompt
77
tokens_completion
949
native_tokens_prompt
66
native_tokens_completion
5702
native_tokens_completion_images
0
native_tokens_reasoning
5376
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0571025
router
(null)
provider_responses
0
id
"resp_0a3511b4456021b60169931cc95c0c81938612eb56289b0852"
status
200
is_byok
false
latency
181
endpoint_id
"7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5-2025-08-07"
api_type
"completions"
id
"gen-1771248841-nhAJMKr5xFh3mfXYqhFc"
upstream_id
"resp_0a3511b4456021b60169931cc95c0c81938612eb56289b0852"
total_cost
0.0571025
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
99.9190% | Matches word count | n/a | neededClean: false, words: 203
100.0000% | Dialogue to Total Word Ratio | Ratio: 9.85%, Deviation: 0.15% | neededClean: false, wordsTotal: 203, wordsDialogue: 20
99.9595%
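The Dialogue to Total Word Ratio figures follow from the two recorded counts: 20 dialogue words out of 203 total words against the 10% target in the scenario. The evaluator's actual counting method is not shown on this page, so the sketch below is only one plausible approach. It assumes that words are whitespace-separated tokens and that dialogue is anything between straight double quotes; `dialogue_ratio` is a hypothetical helper name.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    ASSUMPTION: words are whitespace-separated tokens and dialogue is the
    text between straight double quotes. The real evaluator may differ.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Small illustrative sample (not the full scene).
sample = 'Bill whispered, "Grab bows, quietly." Molly nodded.'
sample_result = dialogue_ratio(sample)  # 3 dialogue words out of 7

# Reproducing the reported figures from the recorded counts.
target_percent = 10.0
ratio_percent = 20 / 203 * 100
deviation_percent = abs(target_percent - ratio_percent)
print(f"Ratio: {ratio_percent:.2f}%, Deviation: {deviation_percent:.2f}%")
```

With the recorded counts this prints the same ratio and deviation as the Details column. How the deviation maps to the percentage score is not documented here and is left out of the sketch.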