NC Bench
Run: 2026-02-16-01KHK9XMPN4A4PVS579FT5W9JN
Run Details
Model:
GPT-5.1
Test:
Dialogue tags
Scenario:
Write 200 words with 90% dialogue
Duration:
51.6s
Cost:
$0.027432
Provider:
OpenAI
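The cost above can be cross-checked against the token counts recorded in the response metadata further down (native_tokens_prompt 66, native_tokens_completion 2735, native_tokens_reasoning 2446). The following is a minimal sketch, not the benchmark's own accounting: the per-million-token rates are an assumption (they are not stated anywhere on this page), and the function names are hypothetical. Under those assumed rates the arithmetic reproduces the recorded usage value of 0.0274325.

```python
from decimal import Decimal

# Token counts copied from the response metadata of this run.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 2735
NATIVE_TOKENS_REASONING = 2446

# ASSUMPTION: USD rates per million tokens. These are not given on the page.
INPUT_USD_PER_MTOK = Decimal("1.25")
OUTPUT_USD_PER_MTOK = Decimal("10")


def visible_completion_tokens(completion: int, reasoning: int) -> int:
    """Completion tokens left over once reasoning tokens are subtracted."""
    return completion - reasoning


def estimate_cost(prompt: int, completion: int) -> Decimal:
    """Estimated cost in USD; reasoning tokens are billed as completion tokens here."""
    total = prompt * INPUT_USD_PER_MTOK + completion * OUTPUT_USD_PER_MTOK
    return total / Decimal(1_000_000)


visible = visible_completion_tokens(NATIVE_TOKENS_COMPLETION, NATIVE_TOKENS_REASONING)
reasoning_share = 100 * NATIVE_TOKENS_REASONING / NATIVE_TOKENS_COMPLETION
cost = estimate_cost(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)

print(f"visible completion tokens: {visible}")
print(f"reasoning share of completion: {reasoning_share:.1f}%")
print(f"estimated cost: ${cost}")
```

Most of the completion budget in this run went to reasoning tokens rather than to the visible scene, which is why a 179-word answer still accounts for nearly all of the cost.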
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
"Molly, I feel morally conflicted about robbing a children's party," Bill whispered. "Bill, the children will be fine; they'll get character development," Molly replied. "Character development doesn't fit in a gift bag." "Exactly why we're taking the gift bags. Minimalist upbringing." "What about the clown?" "We bribe the clown with cake." "And the parents?" "Decoys. You wear the dragon costume, breathe glitter, create panic." "I can't breathe glitter; it clumps in my sinuses." "Then cough dramatically. They'll think it's immersive theater." "Suppose a kid cries on me." "Free camouflage. No one tackles the sticky dragon." "Where do I park the getaway car?" "Next to the bouncy castle. If we get chased, we bounce to freedom." "That sounds like cardio." "It's crime, Bill, not Pilates." "Fine. Timeline?" "At 'Happy Birthday' we applaud, grab the mountain of presents, shout 'educational audit,' and roll out." "Educat— why?" "Adults freeze when bureaucracy appears." "And if someone films us?" "We go viral, publish a self-help book: 'The Life-Changing Magic of Other People's Gifts.'" "I still feel bad." "Steal faster; you'll feel less."
created_at
"2026-02-16T13:24:22.727181+00:00"
2/16/2026, 1:24:22 PM
model
"openai/gpt-5.1-20251113"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
4881
moderation_latency
(null)
generation_time
51249
tokens_prompt
77
tokens_completion
1066
native_tokens_prompt
66
native_tokens_completion
2735
native_tokens_completion_images
0
native_tokens_reasoning
2446
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0274325
router
(null)
provider_responses
0
id
"resp_0e9d1f59863f33110169931a52ba048195908c013c51cc1986"
status
200
is_byok
false
latency
160
endpoint_id
"764eb97f-8bab-4326-b29b-7a8799b00a70"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5.1-20251113"
api_type
"completions"
id
"gen-1771248210-AN2kN8jJkpvhUmCQbXdR"
upstream_id
"resp_0e9d1f59863f33110169931a52ba048195908c013c51cc1986"
total_cost
0.0274325
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
14.3014% | Matches word count | n/a | neededClean: false; words: 179
69.5156% | Dialogue to Total Word Ratio | Ratio: 97.77%, Deviation: 7.77% | neededClean: false; wordsTotal: 179; wordsDialogue: 175
41.9085%
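The ratio and deviation reported for the dialogue evaluator follow directly from the recorded word counts (175 dialogue words out of 179, against the 90% target in the scenario). Below is a minimal sketch of that arithmetic. The word-counting rules are an assumption (whitespace-separated tokens, dialogue delimited by straight double quotes), the function names are hypothetical, and the mapping from deviation to the 69.5156% result is not shown on the page, so it is not reproduced here. The final 41.9085% does coincide with the unweighted mean of the two evaluator results; whether the site computes it that way is also an assumption.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed definition of a word)."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Count words that fall inside straight double quotes."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in spans)


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Dialogue share of all words, as a percentage."""
    return 100.0 * words_dialogue / words_total


def deviation(ratio_pct: float, target_pct: float = 90.0) -> float:
    """Absolute distance from the target dialogue share, in percentage points."""
    return abs(ratio_pct - target_pct)


# Counts recorded in the evaluator's metadata for this run.
ratio = dialogue_ratio(words_dialogue=175, words_total=179)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation(ratio):.2f}%")

# Counting check on the opening sentence of the response.
sample = '"Molly, I feel morally conflicted about robbing a children\'s party," Bill whispered.'
print(count_dialogue_words(sample), "of", count_words(sample), "words are dialogue")

# Unweighted mean of the two evaluator results.
mean_result = (14.3014 + 69.5156) / 2
print(f"Mean of evaluator results: {mean_result:.4f}%")
```

Only four words in the response sit outside quotation marks ("Bill whispered" and "Molly replied"), so the scene overshoots the 90% dialogue target while also falling short of the 200-word length.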