NC Bench
Run: 2026-02-16-01KHKAFDZF9J1X9Q5NV0ZFWZK0
Run Details
Model: GPT-5
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 1m 7s
Cost: $0.038863
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Rain tickled the neon marquee of the community hall as two shadows hunched beneath a bouncy-castle repair tarp. Molly flicked confetti off her sleeve, a constellation mapping out every bad idea she’d ever loved. Bill balanced a cupcake reconnaissance kit on his knee, frosting forming diplomatic peaks. Inside, a DJ tested siren-squeal kazoo, and balloons patrolled like jellyfish. The banner screamed HAPPY BIRTHDAY in fonts so cheerful it hurt. Molly squinted at the gift table, a mountain range with bow-shaped weather systems. Bill breathed like a yogi trying not to sneeze. The night smelled like sugar, rubber, and sticky consequences. "Molly: We blend in, grab the mountain, and moonwalk out." "Bill: Blend in how? I don’t speak balloon." "Molly: Face paint, smiles, and the sacred art of casual gift-lifting." "Bill: Casual? Those bows look like tiny alarm octopi." "Molly: Relax. We’re guests from the Department of Surprise." "Bill: Do we have badges?" "Molly: We have confidence and a recycle bin cape." "Bill: I call dibs on the cape. It rustles with authority." "Molly: Timing’s key: cake frenzy, loud chorus, strategic scoot." "Bill: I’m allergic to choruses." "Molly: Then hum and carry three boxes." "Bill: What if a toddler interrogates me?"
created_at: "2026-02-16T13:34:21.88741+00:00" (2/16/2026, 1:34:21 PM)
model: "openai/gpt-5-2025-08-07"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 63967
moderation_latency: (null)
generation_time: 67412
tokens_prompt: 77
tokens_completion: 328
native_tokens_prompt: 66
native_tokens_completion: 3878
native_tokens_completion_images: 0
native_tokens_reasoning: 3520
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0388625
router: (null)
provider_responses:
  0:
    id: "resp_05a36f6c35cdacf50169931c99aa188197ae19f69618817789"
    status: 200
    is_byok: false
    latency: 155
    endpoint_id: "7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
    provider_name: "OpenAI"
    model_permaslug: "openai/gpt-5-2025-08-07"
api_type: "completions"
id: "gen-1771248793-avB2P14uanxZJUZO0uku"
upstream_id: "resp_05a36f6c35cdacf50169931c99aa188197ae19f69618817789"
total_cost: 0.0388625
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
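The token and cost fields above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the per-million-token rates are an assumption (the run record does not state them), as are the readings that generation_time is in milliseconds and that native_tokens_reasoning is counted inside native_tokens_completion. Under those assumptions the estimate reproduces the reported usage figure.

```python
# Values copied from the run metadata above.
native_tokens_prompt = 66
native_tokens_completion = 3878
native_tokens_reasoning = 3520
generation_time_ms = 67412  # ASSUMPTION: milliseconds (consistent with "1m 7s")
reported_cost = 0.0388625

# ASSUMPTION: USD rates per million tokens; not stated anywhere in the run record.
ASSUMED_INPUT_USD_PER_M = 1.25
ASSUMED_OUTPUT_USD_PER_M = 10.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from native token counts."""
    return (
        prompt_tokens * ASSUMED_INPUT_USD_PER_M
        + completion_tokens * ASSUMED_OUTPUT_USD_PER_M
    ) / 1_000_000


estimated_cost = estimate_cost(native_tokens_prompt, native_tokens_completion)

# ASSUMPTION: reasoning tokens are a subset of the completion tokens.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost:    ${estimated_cost:.7f} (reported ${reported_cost})")
print(f"visible tokens:    {visible_tokens}")
print(f"reasoning share:   {reasoning_share:.1%}")
print(f"throughput:        {tokens_per_second:.1f} tokens/s")
```

If the assumptions hold, most of the completion tokens, and therefore most of the cost, went to reasoning rather than to the 204-word scene itself.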
Evaluation details
Result | Evaluator | Details | Meta Data
99.7443% | Matches word count | n/a | neededClean: false, words: 204
99.9994% | Dialogue to Total Word Ratio | Ratio: 49.51%, Deviation: 0.49% | neededClean: false, wordsTotal: 204, wordsDialogue: 101
99.8719%
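The ratio and deviation reported by the dialogue evaluator follow directly from the recorded word counts. The sketch below reproduces that arithmetic and adds a hypothetical counter, count_dialogue_words, that treats whitespace-separated words inside straight double quotes as dialogue. That counting rule and the 50% target constant are assumptions for illustration; the benchmark's actual evaluator and scoring function are not shown in this report.

```python
import re

TARGET_DIALOGUE_PCT = 50.0  # ASSUMPTION: taken from the scenario "50% dialogue"


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a scene.

    Hypothetical rule: any whitespace-separated word inside a pair of
    straight double quotes counts as dialogue.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


# Counts recorded in the evaluation metadata above.
words_dialogue = 101
words_total = 204

ratio_pct = words_dialogue / words_total * 100
deviation_pct = abs(ratio_pct - TARGET_DIALOGUE_PCT)

# The final figure appears consistent with the mean of the two evaluator scores.
mean_score = (99.7443 + 99.9994) / 2

print(f"Ratio: {ratio_pct:.2f}%, Deviation: {deviation_pct:.2f}%")
print(f"Mean of evaluator scores: {mean_score:.5f}%")
```

As a quick usage check, count_dialogue_words('He ran. "Go now," she said.') returns (2, 6): two quoted words out of six in total.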