NC Bench
Run: 2026-02-16-01KHKA60XJBPQDASG8S6RA6EZ0
Run Details
Model:
GPT-5.1
Test:
Dialogue tags
Scenario:
Write 200 words with 10% dialogue
Duration:
43.9s
Cost:
$0.025433
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In the shadow of the inflatable castle, Molly crouched behind a hedge, studying the battlefield of balloons and frosting. Bill lay beside her, clutching binoculars obviously stolen from a much more competent criminal. Beyond the fence, children screamed with the weaponized joy of unlimited sugar, circling a mountainous altar of wrapped boxes. Streamers fluttered like warning flags nobody could read. Molly adjusted her backpack, where lockpicks nestled between glitter glue sticks and a cartoon unicorn thermos, stolen earlier. She’d disguised their getaway car as a bouncy-castle delivery van, complete with wobbling plastic giraffe bolted to the roof. Bill checked his watch, a waterproof model in case of unexpected sprinklers or morally outraged parents wielding garden hoses. "Observation: tiny humans, no security," Molly whispered. "Except that clown," Bill muttered. "Clown's nothing; focus on presents," she said. "Fine." He sighed. A gust of wind rattled the paper plates, sending cartoon dinosaurs skittering across the grass like fleeing witnesses. From the speakers, bubblegum pop blared instructions about clapping hands and turning around, which doubled as a security lullaby. Molly tapped her blueprint: a crayon drawing of the backyard annotated with arrows, escape routes, and one ominous unicorn. She set her timer for chaos.
created_at
"2026-02-16T13:28:49.498171+00:00"
2/16/2026, 1:28:49 PM
model
"openai/gpt-5.1-20251113"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
4229
moderation_latency
(null)
generation_time
43518
tokens_prompt
77
tokens_completion
1101
native_tokens_prompt
66
native_tokens_completion
2535
native_tokens_completion_images
0
native_tokens_reasoning
2255
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0254325
router
(null)
provider_responses
0
id
"resp_0f995be5d99cfb1b0169931b656360819096362d0caf831cff"
status
200
is_byok
false
latency
142
endpoint_id
"764eb97f-8bab-4326-b29b-7a8799b00a70"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5.1-20251113"
api_type
"completions"
id
"gen-1771248485-iTxzWWIoYJGHE7vX5cFi"
upstream_id
"resp_0f995be5d99cfb1b0169931b656360819096362d0caf831cff"
total_cost
0.0254325
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
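The token and cost fields above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the per-million-token rates are assumptions (they are not part of the run record), chosen because they reproduce the reported total_cost, and the function names are hypothetical. It also shows how many native completion tokens remain once reasoning tokens are subtracted.

```python
# Values copied from the run metadata above.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 2535
NATIVE_TOKENS_REASONING = 2255
REPORTED_TOTAL_COST = 0.0254325

# ASSUMPTION: USD rates per one million tokens. These do not appear in the
# run record; they are illustrative values that match the reported cost.
ASSUMED_INPUT_RATE_PER_M = 1.25
ASSUMED_OUTPUT_RATE_PER_M = 10.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from native token counts."""
    micro_usd = (prompt_tokens * ASSUMED_INPUT_RATE_PER_M
                 + completion_tokens * ASSUMED_OUTPUT_RATE_PER_M)
    return micro_usd / 1_000_000


def visible_completion_tokens(completion_tokens: int, reasoning_tokens: int) -> int:
    """Completion tokens left after removing the reasoning tokens."""
    return completion_tokens - reasoning_tokens


if __name__ == "__main__":
    cost = estimate_cost(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)
    visible = visible_completion_tokens(NATIVE_TOKENS_COMPLETION,
                                        NATIVE_TOKENS_REASONING)
    print(f"estimated cost: ${cost:.7f} (reported: ${REPORTED_TOTAL_COST})")
    print(f"non-reasoning completion tokens: {visible}")
```

Under these assumed rates, nearly all of the cost comes from completion tokens, most of which are reasoning tokens rather than the visible 200-word scene.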
Evaluation details
Result | Evaluator | Details | Meta Data
99.9990% | Matches word count | n/a | neededClean: false; words: 201
99.1553% | Dialogue to Total Word Ratio | Ratio: 6.97%, Deviation: 3.03% | neededClean: false; wordsTotal: 201; wordsDialogue: 14
99.5772%
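The ratio and deviation reported by the dialogue evaluator follow from the two word counts (14 dialogue words out of 201, against a 10% target). A minimal sketch of that calculation is shown here, assuming dialogue is whatever sits between double quotation marks and that words are whitespace-separated; the benchmark's actual tokenisation and the way it converts the deviation into a percentage score are not documented in this page, so the function names and rules here are hypothetical.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated words (an assumed word definition)."""
    return len(text.split())


def dialogue_stats(text: str, target_ratio: float = 0.10) -> dict:
    """Measure how much of a passage is quoted dialogue.

    ASSUMPTION: dialogue is any span enclosed in double quotes; curly
    quotes are normalised to straight quotes first.
    """
    normalized = text.replace("\u201c", '"').replace("\u201d", '"')
    spans = re.findall(r'"([^"]*)"', normalized)
    words_dialogue = sum(count_words(span) for span in spans)
    words_total = count_words(normalized)
    return {"wordsTotal": words_total,
            "wordsDialogue": words_dialogue,
            **ratio_from_counts(words_dialogue, words_total, target_ratio)}


def ratio_from_counts(words_dialogue: int, words_total: int,
                      target_ratio: float = 0.10) -> dict:
    """Return ratio and absolute deviation from the target, in percent."""
    ratio = words_dialogue / words_total if words_total else 0.0
    return {"ratioPercent": round(ratio * 100, 2),
            "deviationPercent": round(abs(target_ratio - ratio) * 100, 2)}


if __name__ == "__main__":
    # Dialogue lines taken from the model response in this run.
    excerpt = ('"Observation: tiny humans, no security," Molly whispered. '
               '"Except that clown," Bill muttered. '
               '"Clown\'s nothing; focus on presents," she said. '
               '"Fine." He sighed.')
    print(dialogue_stats(excerpt))
    # Counts reported by the evaluator for the full scene.
    print(ratio_from_counts(14, 201))
```

Applied to the reported counts, this gives the same 6.97% ratio and 3.03% deviation shown in the evaluator details, and the four quoted lines in the response account for all 14 dialogue words.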