NC Bench
Run: 2026-02-16-01KHK9QC3P2JVPE4FR6TPWDX6K
Run Details
Model:
GPT-5.1
Test:
Dialogue tags
Scenario:
Write 200 words with 50% dialogue
Duration:
35.1s
Cost:
$0.023113
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly and Bill crouched behind the inflatable castle, faces smeared with camouflage icing. Balloons bobbed above the fence like confused surveillance drones, squeaking with each suspicious breeze. Molly checked her watch, a pink plastic unicorn blinking midnight over and over. Bill clutched a crumpled map of the backyard, annotated with glittery stickers and ominous arrows. From the open patio doors drifted the chaotic soundtrack of small humans sugared beyond reason. Somewhere inside, a piñata trembled in existential terror. The scent of cake and impending moral compromise hung thick in the suburban night. Molly adjusted her ridiculous sparkly cat-ear mask nervously. "Molly, this is the worst thing we've ever planned," Bill muttered. "Worse than the time you tried to pickpocket a magician's assistant during the sawing trick?" "She still sends me death threats made of glitter," Bill said. "Focus, Bill. Timeline: cake is served at six thirty, sugar-coma meltdown by seven, parents vulnerable by seven fifteen." "And we slip in disguised as exhausted, underpaid entertainers," Bill said. "I brought balloon animals and a fake clown license," Molly said proudly. "Do not flash that at the police again," Bill hissed. "Relax. We grab the presents, blame Santa, and moonwalk out." "What could?"
created_at
"2026-02-16T13:20:40.503779+00:00"
2/16/2026, 1:20:40 PM
model
"openai/gpt-5.1-20251113"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
12373
moderation_latency
(null)
generation_time
34899
tokens_prompt
77
tokens_completion
1081
native_tokens_prompt
66
native_tokens_completion
2303
native_tokens_completion_images
0
native_tokens_reasoning
2024
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0231125
router
(null)
provider_responses
0
id
"resp_0066a29578aafcc30169931985517c8193bda1f9360fe1fe56"
status
200
is_byok
false
latency
190
endpoint_id
"764eb97f-8bab-4326-b29b-7a8799b00a70"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5.1-20251113"
api_type
"completions"
id
"gen-1771248005-7TsvqUYVD5laVoU9881J"
upstream_id
"resp_0066a29578aafcc30169931985517c8193bda1f9360fe1fe56"
total_cost
0.0231125
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
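The reported cost can be cross-checked against the native token counts. The sketch below is only an illustration: the per-million-token rates are assumptions that do not appear in this record, and the reading that native_tokens_completion includes the reasoning tokens is inferred from the numbers rather than stated.

```python
# Figures copied from the run metadata above.
native_tokens_prompt = 66
native_tokens_completion = 2303   # appears to include the reasoning tokens
native_tokens_reasoning = 2024
reported_cost = 0.0231125

# ASSUMPTION: hypothetical per-million-token rates in USD, not given in the record.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 10.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost in USD from token counts and the assumed rates."""
    input_cost = prompt_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = completion_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost


estimated_cost = estimate_cost(native_tokens_prompt, native_tokens_completion)

# Completion tokens left over once the reasoning tokens are subtracted.
visible_tokens = native_tokens_completion - native_tokens_reasoning

print(f"estimated cost: ${estimated_cost:.7f} (reported: ${reported_cost:.7f})")
print(f"non-reasoning completion tokens: {visible_tokens}")
```

Under these assumed rates the estimate agrees with the reported total, and most of the completion tokens are reasoning tokens rather than the visible 200-word scene.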
Evaluation details
Result | Evaluator | Details | Meta Data
99.9840% | Matches word count | n/a | neededClean: false; words: 202
91.5820% | Dialogue to Total Word Ratio | Ratio: 44.55%, Deviation: 5.45% | neededClean: false; wordsTotal: 202; wordsDialogue: 90
95.7830%
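The ratio and deviation figures above follow directly from the word counts. The following is a minimal sketch, not the benchmark's actual evaluator: the helper dialogue_ratio is hypothetical, it assumes dialogue is whatever sits between straight double quotes, and it makes no attempt to reproduce how the percentage scores are derived from the deviation.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for straight-quoted dialogue.

    Hypothetical helper: words are whitespace-separated tokens, and dialogue
    is any text between a pair of straight double quotes.
    """
    total = len(text.split())
    spoken = sum(len(chunk.split()) for chunk in re.findall(r'"([^"]*)"', text))
    return spoken, total, (spoken / total if total else 0.0)


# Word counts copied from the evaluation metadata above.
words_dialogue, words_total = 90, 202
target_pct = 50.0  # the scenario asks for 50% dialogue

ratio_pct = 100 * words_dialogue / words_total
deviation_pct = abs(target_pct - ratio_pct)

# Mean of the two evaluator results listed above.
mean_score = (99.9840 + 91.5820) / 2

# Small usage example on a made-up sentence (not taken from the run).
sample = 'Bill sighed. "Focus, Bill," Molly said.'
sample_counts = dialogue_ratio(sample)

print(f"ratio: {ratio_pct:.2f}%, deviation: {deviation_pct:.2f}%")
print(f"mean of evaluator scores: {mean_score:.4f}%")
print(f"sample counts: {sample_counts[0]} of {sample_counts[1]} words are dialogue")
```

The mean of the two evaluator results equals the final 95.7830% figure, which suggests, but does not confirm, that the overall score is a simple average.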