NC Bench
Run: 2026-03-18-01KM07KW9FJHC6J8QJ4G22GPGB
Run Details
Model: GPT-5.4 Mini (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 3.5s
Cost: $0.001647
Provider: OpenAI
Evaluation summary
Score | Evaluator
27.17% | Matches word count
98.90% | Dialogue to Total Word Ratio
63.0336% | Overall
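The Overall figure is consistent with an unweighted mean of the two evaluator scores. This is an inference from the numbers shown on this page, not a documented NC Bench formula. A minimal Python sketch, using the rounded scores as displayed (so it can only match the reported 63.0336% to about two decimal places):

```python
from statistics import mean

# Evaluator scores as displayed on this page (rounded to two decimals).
scores = {
    "Matches word count": 27.17,
    "Dialogue to Total Word Ratio": 98.90,
}

# Assumption: Overall is the plain arithmetic mean of the evaluator scores.
overall = mean(scores.values())

print(f"Overall (estimated): {overall:.3f}%")
```

The small gap between this estimate and the reported value would be explained by the evaluators carrying more decimal places than the page displays.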
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Molly,” whispered Bill, “are you sure this is the right house?” “It has balloons, streamers, and a giant banner that says Happy Birthday, so unless the neighbors are throwing a very confusing divorce, yes.” “Good point. Now, remind me why we’re stealing presents from a children’s party?” “Because the client said the loot was easy, cheerful, and unguarded.” “Your client also once paid us in coupon books.” “And yet I still feel he understands value.” Bill peered through the hedge. “I see a clown.” “That’s not a clown.” “It has a red nose.” “That’s Mrs. Dalton’s husband.” “Oh. He does look like a clown.” “Might be useful.” “Useful how?” “If he asks questions, distract him with balloons. If he asks more questions, give him cake.” “And if he asks where the presents went?” “Molly smiled. “We say the gifts were taken for a surprise.” “A surprise for who?” “For us.” Bill grinned. “I like your style. Quietly in, quietly out?” “Exactly. No fuss, no screams, no glitter.” “Why no glitter?” “Because glitter is the fingerprint of chaos.” Bill nodded solemnly. “And the presents?” “We load them into the van.” “Then?” “Then we become the world’s worst party guests.” Bill sighed. “Molly?” “Yes?” “If there’s a pony, I’m stealing the pony too.” “You always say that.” “And I’m always right.”
created_at: "2026-03-18T10:24:44.858Z" (3/18/2026, 10:24:44 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 2618
moderation_latency: (null)
generation_time: 3264
tokens_prompt: 77
tokens_completion: 431
native_tokens_prompt: 66
native_tokens_completion: 355
native_tokens_completion_images: 0
native_tokens_reasoning: 38
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.001647
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_04146d293daf02330169ba7d6cf1308193889bf2bf9c0f5e01"
    is_byok: false
    latency: 319
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773829484-Fs7oK4KUJpJTIS1C5HlG"
upstream_id: "resp_04146d293daf02330169ba7d6cf1308193889bf2bf9c0f5e01"
total_cost: 0.001647
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
27.17% | Matches word count | n/a | neededClean: false; words: 219
98.90% | Dialogue to Total Word Ratio | Ratio: 86.76%, Deviation: 3.24% | neededClean: false; wordsTotal: 219; wordsDialogue: 190
63.0336% | Overall | |
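The Ratio and Deviation values reported for the Dialogue to Total Word Ratio evaluator can be reproduced from the word counts in its metadata. The sketch below is illustrative only: it assumes the deviation is the absolute gap between the 90% target in the scenario and the measured ratio, and the variable names are hypothetical rather than taken from NC Bench. How the deviation maps to the 98.90% score is not shown on this page.

```python
# Word counts copied from the evaluation details above.
words_total = 219
words_dialogue = 190

# Target share of dialogue, taken from the scenario ("90% dialogue").
target_ratio = 0.90

# Share of all words that fall inside dialogue.
ratio = words_dialogue / words_total

# Assumption: deviation is the absolute gap between target and measured ratio.
deviation = abs(target_ratio - ratio)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 86.76%, Deviation: 3.24%"
```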