NC Bench
Run: 2026-03-09-01KKAC1GD1CJP84GDPFHPQ40P7
Run Details
Model: GPT-5.4 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 11.7s
Cost: $0.005445
Provider: OpenAI
Evaluation summary
Score | Evaluator
51.93% | Matches word count
99.99% | Dialogue to Total Word Ratio
75.9593% | Overall
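The overall score above appears consistent with an unweighted mean of the two evaluator scores. The sketch below checks that arithmetic; the averaging rule is an assumption inferred from the displayed numbers, not something this page states, and the dictionary name is hypothetical.

```python
from statistics import fmean

# Evaluator scores as displayed on this page (percent, already rounded).
evaluator_scores = {
    "Matches word count": 51.93,
    "Dialogue to Total Word Ratio": 99.99,
}

# Assumption: the overall score is the plain arithmetic mean of the
# evaluator scores. The page does not document the aggregation rule.
overall = fmean(evaluator_scores.values())

print(f"Overall (assumed mean): {overall:.2f}%")
```

The result agrees with the reported 75.9593% to within the rounding of the displayed inputs, which suggests the site averages unrounded scores.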
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Bill, this is our classiest job yet.” “Robbing a clown-themed birthday party is classy?” “Compared with the ferret smuggling, absolutely.” “I still have bite marks shaped like Wales.” “Focus. Seven o’clock, the magician arrives. Seven-oh-five, every child stares at the rabbit. We slip through the patio doors, sweep the gift table, and vanish.” “With thirty brightly wrapped boxes and a plastic pony?” “The pony is optional. The box with the gold ribbon is not. Rich parents always use gold ribbon.” “Or they’re tasteless.” “Tasteless people buy expensive toys. We respect that.” Bill adjusted his tiny paper hat. “Why are we disguised as uncles?” “Because no one questions an uncle wearing regret and barbecue sauce.” “I’m not wearing regret. I’m wearing suspicion.” “Good. Suspicion looks natural on you.” “And what’s your role?” “I’m Auntie Molly from Croydon. I kiss cheeks, misname children, and complain about parking.” “That is disturbingly convincing.” From beyond the hedge came shrieking and a burst of party music. “That’s our cue,” Molly whispered. “One concern.” “You only get one.” “What if the children notice the presents moving away?” “Then smile and say, ‘The birthday boy requested a surprise redistribution.’” “That’s terrible.” “It’s confident. Confidence is nine-tenths of crime.” “And the other tenth?” “Balloon animals. Now straighten your hat, Uncle Bill. We’re going in.”
created_at: "2026-03-09T22:38:48.235Z" (3/9/2026, 10:38:48 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3063
moderation_latency: (null)
generation_time: 11638
tokens_prompt: 77
tokens_completion: 553
native_tokens_prompt: 66
native_tokens_completion: 352
native_tokens_completion_images: 0
native_tokens_reasoning: 34
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.005445
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_03c1722867978bf20169af4bf852dc8190bf997efbc81828a4"
    is_byok: false
    latency: 236
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773095928-R6mylRrAOqaa4z3uU6cR"
upstream_id: "resp_03c1722867978bf20169af4bf852dc8190bf997efbc81828a4"
total_cost: 0.005445
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
51.93% | Matches word count | n/a | neededClean: false, words: 216
99.99% | Dialogue to Total Word Ratio | Ratio: 90.91%, Deviation: 0.91% | neededClean: false, wordsTotal: 220, wordsDialogue: 200
75.9593% | Overall | |
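The reported ratio and deviation follow directly from the word counts in the metadata: 200 dialogue words out of 220 total, measured against the 90% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. The function names and the quote-matching rule are assumptions for illustration; the evaluator's actual tokenization and its mapping from deviation to the 99.99% score are not given on this page.

```python
import re

# Assumption: dialogue is any text enclosed in curly double quotes.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simplification)."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Fraction of all words that fall inside quoted dialogue."""
    total = count_words(text)
    if total == 0:
        return 0.0
    spoken = sum(count_words(span) for span in DIALOGUE_RE.findall(text))
    return spoken / total


def deviation_from_target(ratio: float, target: float = 0.90) -> float:
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


# Reproduce the reported figures from the counts in the metadata.
reported_ratio = 200 / 220  # wordsDialogue / wordsTotal
reported_deviation = deviation_from_target(reported_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {reported_deviation:.2%}")

# Small usage example on one line taken from the model response.
sample = "“That’s our cue,” Molly whispered."
print(f"Sample ratio: {dialogue_ratio(sample):.0%}")
```

Counting by whitespace is only one choice; the two evaluators on this page report different totals for the same response (216 and 220 words), so their counting rules evidently differ.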