NC Bench
Run: 2026-02-16-01KHKATA8GEGRZ3AQRRW3FMA4A
Run Details
Model: GPT-5.2
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 48.7s
Cost: $0.040421
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Behind the bouncy castle, Molly and Bill hid in plain sight. "Rule one, we blend in. Smile like you understand Paw Patrol." "I practiced. Look: 'Woof, justice.'" "Stop. We’re here for loot, not canine law." "How many presents?" "Thirty-seven, plus a suspiciously large box labeled 'For Chloe’s pony.'" "We can’t steal a pony." "We’re not stealing the pony. We’re stealing the box. Might be a pony costume. Might be a real pony. Either way, it’s value." "What’s the entry plan?" "You’re the clown. I’m the magician. We distract, then vanish the gifts." "I can’t juggle." "Neither can clowns. Just throw things and apologize." "And the parents?" "Offer them a coupon for free face paint. Nobody can resist corporate nonsense." "What about the birthday kid?" "I’ll ask her to pick a card. She’ll pick 'Debt,' and I’ll pull 'Profit' from her ear." "Where do we stash everything?" "In the piñata. They’ll beat it later. We’re outsourcing the getaway." "When do we move?" "When they sing. That song lasts forever. We steal on the second 'cha-cha-cha.'" "And if anyone asks, we’re collecting donations for the Orphans of Unopened Toys today." "Also, Bill, don’t eat the cupcakes; frosting fingerprints are a rookie’s confession."
created_at
"2026-02-16T13:39:59.55977+00:00"
model
"openai/gpt-5.2-20251211"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
33071
moderation_latency
(null)
generation_time
48496
tokens_prompt
77
tokens_completion
976
native_tokens_prompt
66
native_tokens_completion
2879
native_tokens_completion_images
0
native_tokens_reasoning
2574
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0404215
router
(null)
provider_responses
0
id
"resp_06a41fdaf2c796400169931dfe584c81949bc73cdcd46fcb2e"
status
200
is_byok
false
latency
210
endpoint_id
"f00142c2-6a93-49ce-9e36-5593b904ce3b"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5.2-20251211"
api_type
"completions"
id
"gen-1771249150-WUW0bBBY6lrk5yqSEK0T"
upstream_id
"resp_06a41fdaf2c796400169931dfe584c81949bc73cdcd46fcb2e"
total_cost
0.0404215
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
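The token counts above explain most of the cost and duration: the bulk of the completion tokens are reasoning tokens, not visible text. The sketch below derives a few figures from the metadata. It assumes `generation_time` is in milliseconds, and the per-million-token rates are an assumption that does not appear on this page; they are included only because they happen to reproduce the reported `usage` value.

```python
# Values copied from the run metadata above.
native_tokens_prompt = 66
native_tokens_completion = 2879
native_tokens_reasoning = 2574
generation_time_ms = 48496  # assumption: the unit is milliseconds
reported_cost = 0.0404215   # the "usage" / "total_cost" field

# Assumed USD rates per million tokens; NOT stated anywhere on this page.
ASSUMED_INPUT_RATE = 1.75
ASSUMED_OUTPUT_RATE = 14.00

# Completion tokens that were not spent on reasoning.
visible_tokens = native_tokens_completion - native_tokens_reasoning

# Fraction of the completion budget consumed by reasoning.
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Overall generation throughput, reasoning tokens included.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# Cost estimate under the assumed rates.
estimated_cost = (
    native_tokens_prompt * ASSUMED_INPUT_RATE
    + native_tokens_completion * ASSUMED_OUTPUT_RATE
) / 1_000_000

print(f"visible tokens:    {visible_tokens}")
print(f"reasoning share:   {reasoning_share:.1%}")
print(f"tokens per second: {tokens_per_second:.1f}")
print(f"estimated cost:    ${estimated_cost:.7f} (reported ${reported_cost})")
```

If the assumed rates are wrong for this provider, only `estimated_cost` changes; the token breakdown depends solely on the fields shown in the metadata.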
Evaluation details
Result: 99.9190%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false, words: 203

Result: 95.6906%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 94.58%, Deviation: 4.58%
Meta Data: neededClean: false, wordsTotal: 203, wordsDialogue: 192
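The ratio and deviation reported by the dialogue evaluator follow directly from the two word counts and the 90% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. The function `dialogue_ratio` is hypothetical: it assumes dialogue is whatever sits between straight double quotes and that words are whitespace-separated, which may differ from the benchmark's own evaluator. How the 95.6906% result is derived from the deviation is not shown on this page, so it is not reproduced here.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a scene.

    Hypothetical helper: treats text between straight double quotes as
    dialogue and counts whitespace-separated words.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


# Arithmetic using the counts reported in the evaluation details.
words_dialogue = 192
words_total = 203
target_percent = 90.0  # from the scenario: "90% dialogue"

ratio_percent = 100 * words_dialogue / words_total
deviation_percent = abs(ratio_percent - target_percent)

print(f"Ratio: {ratio_percent:.2f}%, Deviation: {deviation_percent:.2f}%")

# Small illustrative input (not the model's response).
sample = 'Molly grinned. "We move at noon." Bill nodded. "Fine."'
sample_dialogue, sample_total = dialogue_ratio(sample)
print(sample_dialogue, sample_total)
```

Running the helper on the actual response may not yield exactly 192 and 203, since the evaluator's tokenization and quote handling are not documented here.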
97.8048%