NC Bench
Run: 2026-02-16-01KHKAE3M75PKN9NDBVBABYCKZ
Run Details
Model: GPT-5
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 53.4s
Cost: $0.034603
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly: Observe the battlefield, Bill—balloons, frosting, and unguarded loot wrapped like tiny signal flares. Bill: Operation Jingle Swipe commences at cake o’clock. Molly: It's a dinosaur cake; we strike when the tail is sliced. Bill: Distraction plan? Molly: I juggle cupcakes and shout, "The magician lost a rabbit!" Bill: Bold. Sticky, but bold. Molly: Meanwhile, you wheel in the gift wagon. Bill: We have a wagon? Molly: Yes. It's a stroller with swagger. Bill: If questioned by toddlers? Molly: Speak fluent awe. Say, "Wow, is that a glow-in-the-dark sticker?" Then point elsewhere. Bill: Morally, is this low? Molly: We’re height-adjusting our ethics. Bill: Contingencies? Molly: Balloon animal smoke screen. Twist five giraffes, release, instant visual chaos. Bill: What about parents? Molly: Complimentary napkins. No adult resists unexpected napkins. Bill: I fear frosting prints. Molly: Wear sprinkle-proof gloves; fashion meets function. Bill: Exit route? Molly: Pinata evacuation. We blend into candy-based jubilation. Bill: And if they sing Happy Birthday mid-heist? Molly: We harmonize, then steal on the applause. Bill: Final inventory objective? Molly: Presents labeled "From Aunt Linda"—statistically socks. Leave them. Bill: Conscience clause? Molly: We leave a thank-you balloon. Shaped like remorse. Bill: Remorse floats. Socks don't. Helium improves morale.
created_at: "2026-02-16T13:33:24.052136+00:00" (2/16/2026, 1:33:24 PM)
model: "openai/gpt-5-2025-08-07"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 42586
moderation_latency: (null)
generation_time: 53031
tokens_prompt: 77
tokens_completion: 777
native_tokens_prompt: 66
native_tokens_completion: 3452
native_tokens_completion_images: 0
native_tokens_reasoning: 3072
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0346025
router: (null)
provider_responses: 0
id: "resp_066dba898940bbaf0169931c6e77dc81958805ae9fe195a14e"
status: 200
is_byok: false
latency: 238
endpoint_id: "7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
provider_name: "OpenAI"
model_permaslug: "openai/gpt-5-2025-08-07"
api_type: "completions"
id: "gen-1771248750-rtJpZiXO3ZKg0wUyuKiH"
upstream_id: "resp_066dba898940bbaf0169931c6e77dc81958805ae9fe195a14e"
total_cost: 0.0346025
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Evaluator: Matches word count
Result: 90.4837%
Details: n/a
Meta Data: neededClean: false, words: 210

Evaluator: Dialogue to Total Word Ratio
Result: 0.0000%
Details: Ratio: 8.10%, Deviation: 81.90%
Meta Data: neededClean: false, wordsTotal: 210, wordsDialogue: 17
45.2419%
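The evaluation figures can be reproduced with a few lines of arithmetic. The sketch below is an inference from this single run, not the benchmark's documented method. It assumes that only text inside straight double quotes counts as dialogue, that hyphenated compounds are split into separate words, and that the word count score decays as exp(-|words - 200| / 100). All function names and the `scale` parameter are hypothetical.

```python
import math
import re

# Hypothetical tokenizer: runs of letters, digits, and apostrophes.
# Hyphens split words, so "glow-in-the-dark" counts as four words.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def count_words(text: str) -> int:
    """Number of word tokens in text under the assumed tokenizer."""
    return len(WORD_RE.findall(text))


def dialogue_words(text: str) -> int:
    """Words found inside straight double quotes (assumed dialogue rule)."""
    return sum(count_words(span) for span in re.findall(r'"([^"]*)"', text))


def ratio_and_deviation(dialogue: int, total: int, target_pct: float = 90.0):
    """Dialogue share in percent and its absolute distance from the target."""
    ratio_pct = 100.0 * dialogue / total
    return ratio_pct, abs(target_pct - ratio_pct)


def word_count_score(words: int, target: int = 200, scale: float = 100.0) -> float:
    """Assumed exponential decay score in percent; inferred, not documented."""
    return 100.0 * math.exp(-abs(words - target) / scale)


if __name__ == "__main__":
    # The three quoted phrases that appear in the model response.
    quoted = (
        '"The magician lost a rabbit!" '
        '"Wow, is that a glow-in-the-dark sticker?" '
        '"From Aunt Linda"'
    )
    print(dialogue_words(quoted))  # 17
    ratio, deviation = ratio_and_deviation(17, 210)
    print(f"{ratio:.2f}% {deviation:.2f}%")
    score = word_count_score(210)
    print(f"{score:.4f}% {(score + 0.0) / 2:.4f}%")
```

If these assumptions hold, the low ratio follows from the script-style "Molly: ... Bill: ..." layout: the response is almost entirely spoken lines, but only the three phrases in quotation marks are counted as dialogue. The final figure of 45.2419% is consistent with the mean of the two evaluator results.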