NC Bench
Run: 2026-03-18-01KM09XNG2Z83M2NF76WF1GC3B
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 5.1s
Cost: $0.002246
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.21% | Matches word count
68.97% | Dialogue to Total Word Ratio
34.5925% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Molly, are you sure this is the right house?” “Bill, I am looking at a lawn full of balloons, a banner that says Happy Seventh, and a clown crying in a hedge. Yes, this is the right house.” “I hate clowns.” “That’s not relevant.” “It feels relevant.” “Bill, we are not here for emotional growth. We are here for presents.” “Right. All the presents.” “All of them.” “Even the giant wrapped tricycle?” “Especially the giant wrapped tricycle.” Bill peered through the window. “There are a lot of children.” “Of course there are. It’s a children’s party.” “And a magician.” “A bad magician.” “He just made a rabbit disappear.” “No, Bill, that rabbit is behind the cake eating frosting.” “Oh. He is bad.” “Mildly encouraging, though. Everyone’s distracted.” “I see the parents by the punch bowl.” “Then keep your head down and your conscience lower.” “My conscience is already sitting.” “Good. We go in through the back, swap the presents into our sacks, and leave before anyone sings.” “What if they start singing?” “Then we move faster.” “What if a child asks us our names?” “You are Uncle Bill from the Funny Furniture Store.” “I’m not saying that.” “Then you’re the man with the emergency balloon insurance.” “That’s worse.” “And I’m Aunt Molly, here to check the stuffing.” “Why does everything you say sound illegal?” “Because, Bill, it is.”
created_at: "2026-03-18T11:05:02.736Z" (3/18/2026, 11:05:02 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 4054
moderation_latency: (null)
generation_time: 4419
tokens_prompt: 77
tokens_completion: 571
native_tokens_prompt: 66
native_tokens_completion: 488
native_tokens_completion_images: 0
native_tokens_reasoning: 172
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0022455
router: (null)
provider_responses
0
endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id: "resp_00cc181146b55c650169ba86deca0081958194db96d47da9e7"
is_byok: false
latency: 187
model_permaslug: "openai/gpt-5.4-mini-20260317"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773831902-mJd0fEgjHrjlHjQrkK0B"
upstream_id: "resp_00cc181146b55c650169ba86deca0081958194db96d47da9e7"
total_cost: 0.0022455
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.21% | Matches word count | n/a | neededClean: false, words: 228
68.97% | Dialogue to Total Word Ratio | Ratio: 97.81%, Deviation: 7.81% | neededClean: false, wordsTotal: 228, wordsDialogue: 223
34.5925% | Overall
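The reported ratio and deviation can be reproduced from the metadata in the evaluation details. The sketch below is an illustration, not the benchmark's actual evaluator: the function name `dialogue_ratio` is hypothetical, the 90% and 200-word targets come from the scenario, and the averaging step is only an assumption that happens to match the displayed overall score to within rounding. How the evaluators turn a deviation into a score (for example 7.81% into 68.97%) is not stated on this page and is not modeled here.

```python
# Hypothetical helper: share of dialogue words, as a percentage.
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    return 100.0 * words_dialogue / words_total

# Values taken from the evaluation details.
WORDS_TOTAL = 228
WORDS_DIALOGUE = 223

# Targets taken from the scenario ("200 words with 90% dialogue").
TARGET_RATIO = 90.0
TARGET_WORDS = 200

ratio = dialogue_ratio(WORDS_DIALOGUE, WORDS_TOTAL)
deviation = abs(ratio - TARGET_RATIO)

# Relative overshoot of the word count target.
word_count_error = 100.0 * (WORDS_TOTAL - TARGET_WORDS) / TARGET_WORDS

# Assumption: the overall score is the plain mean of the two evaluator scores.
# The displayed scores are rounded, so this only approximates 34.5925%.
overall_estimate = (0.21 + 68.97) / 2

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
print(f"Word count error: {word_count_error:.1f}%")
print(f"Estimated overall: {overall_estimate:.2f}%")
```

The response overshoots both targets: it is 28 words longer than requested and almost entirely dialogue, which is consistent with the low scores on both evaluators.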