NC Bench
Run: 2026-02-16-01KHK97V9EPKQPZ6RA6DE77PYS
Run Details
Model:
GPT-5.1
Test:
Dialogue tags
Scenario:
Write 200 words with 90% dialogue
Duration:
50.0s
Cost:
$0.018693
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Molly, I can't believe we're robbing a six-year-old's birthday party." "Calm down, Bill. Rich six-year-old. Those gift bags are basically ATMs with glitter." "There's a unicorn bouncy castle. That's not security, that's a trap for my dignity." "Dignity doesn't pay rent. Presents do. Now, schedule: clown shows at three, magician at three-thirty, we slip in during the balloon-animal chaos." "Explain how we don't get spotted while hauling a sack of Paw Patrol merchandise out the door." "Simple. We bring our own sack of decoy presents, swap, and vanish." "So we're… reverse Santas." "Exactly. Anti-Clauses." "I didn't grow up dreaming I'd become the Grinch's unpaid intern." "Consider it character development. Did you print the fake 'Adult Supervision Inspector' badges?" "Yes, but they say 'Supversion.' The printer jammed and I panicked." "Perfect. No one reads badges at a sugar high." "Fine. What about the kids? They see everything. They're like tiny drunk detectives." "That’s why we bring camouflage." "You bought camouflage for a suburban birthday?" Molly pulled out two oversized mascot heads from the duffel. "We go in as backup entertainment: Princess Llama and Sir Dance-a-lot." Bill stared. "I quit." "You dance, I loot." "…I'm billing overtime." "And I demand cake hazard."
created_at
"2026-02-16T13:12:26.816203+00:00"
model
"openai/gpt-5.1-20251113"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
6901
moderation_latency
(null)
generation_time
49861
tokens_prompt
77
tokens_completion
993
native_tokens_prompt
66
native_tokens_completion
1861
native_tokens_completion_images
0
native_tokens_reasoning
1557
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0186925
router
(null)
provider_responses
0
id
"resp_0e1aa7c32b58a672016993178890488193886bfd0a2707277a"
status
200
is_byok
false
latency
158
endpoint_id
"764eb97f-8bab-4326-b29b-7a8799b00a70"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5.1-20251113"
api_type
"completions"
id
"gen-1771247496-OMKFbxi6pYFDfVPufoau"
upstream_id
"resp_0e1aa7c32b58a672016993178890488193886bfd0a2707277a"
total_cost
0.0186925
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
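The token and cost fields above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the per-million-token rates are assumptions that do not appear anywhere in this run record, and treating reasoning tokens as a subset of `native_tokens_completion` is also an assumption about how the provider reports usage.

```python
# Figures copied from the run metadata above.
native_tokens_prompt = 66
native_tokens_completion = 1861
native_tokens_reasoning = 1557
generation_time_ms = 49861
reported_total_cost = 0.0186925

# ASSUMPTION: USD rates per million tokens; not stated in this run record.
ASSUMED_INPUT_RATE = 1.25
ASSUMED_OUTPUT_RATE = 10.00

# ASSUMPTION: reasoning tokens are counted inside the completion total,
# so the remainder approximates the tokens of visible output.
visible_tokens = native_tokens_completion - native_tokens_reasoning

# Rough throughput over the whole generation window.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# Cost estimate under the assumed rates.
estimated_cost = (
    native_tokens_prompt * ASSUMED_INPUT_RATE
    + native_tokens_completion * ASSUMED_OUTPUT_RATE
) / 1_000_000

print(f"visible tokens: {visible_tokens}")
print(f"throughput: {tokens_per_second:.2f} tokens/s")
print(f"estimated cost: ${estimated_cost:.7f} (reported: ${reported_total_cost:.7f})")
```

Under these assumed rates the estimate matches the reported `total_cost`, which suggests, but does not confirm, that reasoning tokens are billed as output tokens.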
Evaluation details
Result | Evaluator | Details | Meta Data
93.6496% | Matches word count | n/a | neededClean: false; words: 209
96.7652% | Dialogue to Total Word Ratio | Ratio: 94.26%, Deviation: 4.26% | neededClean: false; wordsTotal: 209; wordsDialogue: 197
95.2074%
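The ratio and deviation reported by the dialogue evaluator follow directly from its two word counts. The sketch below shows one way such a check might be computed. The function `dialogue_stats` is hypothetical and is not the benchmark's actual evaluator: it assumes that dialogue means text between double quotes and that words are whitespace-separated, so it may count differently from NC Bench. How each percentage score is derived from the deviation is not stated in the source and is not modeled here.

```python
import re


def dialogue_stats(text: str, target_ratio: float = 0.90) -> dict:
    """Hypothetical helper: count quoted words against all words."""
    # Treat curly double quotes the same as straight ones.
    normalized = text.replace("\u201c", '"').replace("\u201d", '"')
    words_total = len(normalized.split())
    # Every span between a pair of double quotes is counted as dialogue.
    spans = re.findall(r'"([^"]*)"', normalized)
    words_dialogue = sum(len(span.split()) for span in spans)
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_ratio),
    }


# Small example taken from the end of the response above.
sample = '"You dance, I loot." Bill stared. "I quit."'
stats = dialogue_stats(sample)

# Cross-check of the reported figures: 197 dialogue words out of 209.
reported_ratio_pct = round(197 / 209 * 100, 2)           # 94.26
reported_deviation_pct = round(197 / 209 * 100 - 90, 2)  # 4.26

# The final 95.2074% equals the arithmetic mean of the two evaluator results;
# whether the benchmark computes it this way is not stated in the source.
mean_of_results = round((93.6496 + 96.7652) / 2, 4)      # 95.2074
```

On the short sample, 6 of 8 words fall inside quotes, giving a ratio of 0.75.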