NC Bench
Run: 2026-03-18-01KM088JY2PNGCGMNQFWRT74YG
Run Details
Model: GPT-5.4 Mini (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 3.1s
Cost: $0.001827
Provider: OpenAI
Evaluation summary
Score      Evaluator
60.28%     Matches word count
18.52%     Dialogue to Total Word Ratio
39.3986%   Overall
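The overall figure appears consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers shown here, not a documented formula, so the following Python sketch is only a plausibility check. The `scores` dictionary and variable names are illustrative, and the inputs are the rounded percentages from the summary.

```python
# Rounded evaluator scores as displayed in the summary (percent).
scores = {
    "Matches word count": 60.28,
    "Dialogue to Total Word Ratio": 18.52,
}

# Assumption: the overall score is a plain unweighted mean of the evaluators.
overall = sum(scores.values()) / len(scores)

print(f"Overall (from rounded scores): {overall:.2f}%")
```

The result from rounded inputs is about 39.40%, close to the reported 39.3986%. The small gap would be expected if the page averages the unrounded scores.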
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Bill, tell me again why we’re crouched behind a hedge by a bouncy castle,” Molly whispered. “Because, darling, that is where the riches are,” Bill said, peering at a paper banner reading HAPPY BIRTHDAY, LEO. “See that table? Mountains of presents.” “You call pirate Lego riches?” “Absolutely. One child’s plastic kingdom is another thief’s retirement fund.” Molly adjusted her black gloves. “We are awful people.” “We are misunderstood logistics specialists.” A shriek of laughter floated over the fence. “Will the children notice?” Molly asked. Bill frowned. “Not if we blend in.” “With what?” “With enthusiasm. You carry a balloon, I’ll carry a slice of cake. Nobody suspects confectionery.” Molly looked at the clown at the gate. “And the clown?” “Dead giveaway.” “So what’s the plan?” “Simple. I charm the parents, you distract the magician, then we roll the presents into the laundry cart by the garage.” “Laundry cart?” “Borrowed from the venue. I’m calling it recycling.” Molly grinned despite herself. “And if anyone asks?” Bill straightened his tie. “We’re the aunt and uncle from accounting.” “You don’t look like an uncle.” “Neither do you, but the children’s party is about to learn a painful lesson in operational excellence.” “Bill?” “Yes?” “If there’s a bouncing dinosaur, I’m stealing that too.” “Now you’re thinking like a professional.”
created_at: "2026-03-18T10:36:03.408Z" (3/18/2026, 10:36:03 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 2418
moderation_latency: (null)
generation_time: 2818
tokens_prompt: 77
tokens_completion: 543
native_tokens_prompt: 66
native_tokens_completion: 395
native_tokens_completion_images: 0
native_tokens_reasoning: 81
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.001827
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_07fd91bd7f7b4e6e0169ba8013774481958f72eb500aa5690c"
    is_byok: false
    latency: 222
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830163-QhA28fODdX303nVx39xu"
upstream_id: "resp_07fd91bd7f7b4e6e0169ba8013774481958f72eb500aa5690c"
total_cost: 0.001827
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result     Evaluator                      Details                            Meta Data
60.28%     Matches word count             n/a                                neededClean: false, words: 215
18.52%     Dialogue to Total Word Ratio   Ratio: 78.60%, Deviation: 11.40%   neededClean: false, wordsTotal: 215, wordsDialogue: 169
39.3986%   Overall
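The ratio and deviation reported for the dialogue evaluator can be reproduced from `wordsDialogue` and `wordsTotal`, taking the 90% target from the scenario. The Python sketch below does that arithmetic. It also includes a hypothetical `count_dialogue_words` helper that counts words inside curly double quotes. The evaluator's real tokenization and quote handling are not shown on this page, so that helper is an assumption, as are all function names. The mapping from an 11.40% deviation to the 18.52% score is also not shown and is not modeled here.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated words (assumed word definition)."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Count words inside curly double quotes (hypothetical helper)."""
    spans = re.findall(r"“([^”]*)”", text)
    return sum(len(span.split()) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target_pct: float = 90.0) -> tuple[float, float]:
    """Return the dialogue ratio and its absolute distance from the target, in percent."""
    ratio = 100.0 * words_dialogue / words_total
    return ratio, abs(target_pct - ratio)


# Values taken from the evaluator metadata for this run.
ratio, deviation = ratio_and_deviation(169, 215)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 78.60%, Deviation: 11.40%"

# Small illustration of the counting helpers on a fragment of the response.
sample = "“Bill?” “Yes?” Molly grinned despite herself."
print(count_dialogue_words(sample), count_words(sample))  # prints "2 6"
```

Attribution text such as "Molly whispered" and "Bill said" counts toward the total but not toward dialogue, which is how a scene that reads as mostly speech can still fall short of a 90% target.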