NC Bench
Run: 2026-03-06-01KK1J912BT232WXQFP5EKZ6G6
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 9.1s
Cost: $0.005145
Provider: OpenAI
Evaluation summary
Score | Evaluator
86.38% | Matches word count
36.79% | Dialogue to Total Word Ratio
61.5841% | Overall
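The overall figure is numerically consistent with an unweighted mean of the two evaluator scores. The sketch below checks that arithmetic. The averaging rule is an assumption inferred from the displayed numbers, not something this page states, and `overall_score` is a hypothetical helper name.

```python
from statistics import mean

# Evaluator scores as displayed in the summary, in percent.
scores = {
    "Matches word count": 86.38,
    "Dialogue to Total Word Ratio": 36.79,
}

def overall_score(evaluator_scores):
    """Unweighted mean of the evaluator scores (assumed aggregation rule)."""
    return mean(list(evaluator_scores.values()))

overall = overall_score(scores)

# The displayed scores are rounded to two decimals, so compare with a tolerance.
matches_reported = abs(overall - 61.5841) < 0.01
print(overall, matches_reported)
```

The small gap between the recomputed value and 61.5841% would be explained by the per-evaluator scores being rounded for display.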
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Bill, stop jingling. You sound like a haunted ice-cream van.” “It’s not jingling, Molly, it’s tactical equipment.” “It’s six party whistles and a yo-yo.” “The yo-yo is for misdirection.” “You are the misdirection. Listen: we go in smiling, leave carrying every present, and nobody cries until at least bedtime.” “Harsh. Efficient. I like it. What’s the route?” “Through the side gate, past the inflatable castle, under the banner, straight to the gift table.” “The one guarded by six-year-olds with sticky hands?” “Exactly. Sticky hands are slow hands.” “What about the clown?” “Bribed.” “With what?” “A coupon for balloon upgrades and my respect.” “Dangerous generosity, Molly. And disguises?” “You’re a magician.” “I hate magicians. Smug sleeve people.” “You already own the cape.” “It’s a blackout cape.” “It has stars on it.” “Blackout stars.” “Fine. I’m the face painter.” “You can’t paint faces.” “I don’t need to. I need to say, ‘Eyes closed,’ then pivot to presents.” “And if someone asks why your moustache is crooked?” “I’ll say it’s a tiger.” “Good. Escape plan?” “Bouncy castle collapse, controlled panic, bicycles waiting by the hedge.” “My bicycle squeaks.” “Then pedal in rhythm. It’ll sound festive.” “Molly?” “Yes?” “This is our lowest job.” “Bill, they got three hoverboards and a karaoke machine. Show some ambition.”
Meta
created_at: "2026-03-06T12:34:36.242Z" (3/6/2026, 12:34:36 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 568
moderation_latency: (null)
generation_time: 9061
tokens_prompt: 77
tokens_completion: 451
native_tokens_prompt: 66
native_tokens_completion: 332
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.005145
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0726f47d9af381ea0169aac9dc4d2481959c53e2f2f5b5ff5f"
    is_byok: false
    latency: 186
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772800476-9eWWWgyuuC0JhRAOfmMC"
upstream_id: "resp_0726f47d9af381ea0169aac9dc4d2481959c53e2f2f5b5ff5f"
total_cost: 0.005145
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
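The reported total_cost can be related to the native token counts with simple per-token arithmetic. The sketch below is illustrative only: the two rates are hypothetical values, not taken from this page, chosen because they happen to reproduce the reported total. The real billing rule may differ.

```python
# Native token counts reported in the metadata above.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 332

# ASSUMPTION: hypothetical prices in USD per million tokens.
# These are not stated anywhere in this run report.
ASSUMED_INPUT_USD_PER_MILLION = 2.50
ASSUMED_OUTPUT_USD_PER_MILLION = 15.00

def estimate_cost(prompt_tokens, completion_tokens,
                  input_rate=ASSUMED_INPUT_USD_PER_MILLION,
                  output_rate=ASSUMED_OUTPUT_USD_PER_MILLION):
    """Return the estimated cost in USD for one request."""
    return (prompt_tokens * input_rate
            + completion_tokens * output_rate) / 1_000_000

estimated = estimate_cost(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)
print(f"estimated: ${estimated:.6f}  reported: $0.005145")
```

Under these assumed rates, almost all of the cost comes from the completion tokens, which is why short prompts with long responses dominate the per-run cost in a test like this one.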
Evaluation details
Result | Evaluator | Details | Meta Data
86.38% | Matches word count | n/a | neededClean: false; words: 211
36.79% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false; wordsTotal: 216; wordsDialogue: 216
61.5841% | Overall | |
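The ratio and deviation reported for the dialogue evaluator follow directly from wordsDialogue, wordsTotal, and the 90% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way to measure the ratio from raw text. The whitespace tokenization and the curly-quote matching are assumptions, and the function names are hypothetical. The benchmark's own counting rules are not shown on this page, and the two evaluators report different word counts (211 and 216), so they may not tokenize the same way.

```python
import re

TARGET_RATIO = 0.90  # from the scenario: 90% dialogue

def count_words(text):
    # ASSUMPTION: a word is any whitespace-separated token.
    return len(text.split())

def dialogue_ratio(text):
    """Share of words that fall inside curly double quotes."""
    quoted = re.findall(r"“(.*?)”", text, flags=re.DOTALL)
    words_dialogue = sum(count_words(q) for q in quoted)
    words_total = count_words(text)
    return words_dialogue / words_total if words_total else 0.0

def deviation(ratio, target=TARGET_RATIO):
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)

# Figures reported for this run: 216 dialogue words out of 216 total.
reported_ratio = 216 / 216
reported_deviation = deviation(reported_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {reported_deviation:.2%}")

# A tiny made-up sample with narration, for comparison.
sample_ratio = dialogue_ratio("“Hi there,” said Bill.")
```

Because the response contains no narration at all, every word counts as dialogue, and the run overshoots the 90% target by the full 10 points.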