NC Bench
Run: 2026-03-18-01KM09EP4FP196XC3NPF2JC666
Run Details
Model: GPT-5.4 Mini (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 3.3s
Cost: $0.001683
Provider: OpenAI
Evaluation summary
Score | Evaluator
93.65% | Matches word count
0.00% | Dialogue to Total Word Ratio
46.8248% | Overall
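The page does not state how the Overall figure is derived. The numbers are consistent with an unweighted mean of the two evaluator scores, so the sketch below checks that reading. The function name `overall_score` is hypothetical, and the averaging rule is an assumption, not something NC Bench documents here.

```python
def overall_score(scores: list[float]) -> float:
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    return sum(scores) / len(scores)

# Scores as displayed on the page, in percent.
overall = overall_score([93.65, 0.00])
print(f"{overall:.3f}%")
```

The result differs from the reported 46.8248% only in the fourth decimal place, which would be explained if the displayed 93.65% is itself a rounded value.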
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly flattened herself behind the bakery van and watched the children’s party glittering in the community hall, where balloons bobbed like cheerful spies. Inside, a clown was twisting a giraffe from a balloon, and somewhere a tiny voice was singing off-key about cake. Bill adjusted his black gloves, which were theatrical, impractical, and somehow made him look like a nervous penguin. “Remind me,” he whispered, “why are we robbing a room full of toddlers?” Molly checked her list, which read: presents, bonus ribbons, possibly a regrettable amount of glitter. “Because the presents are stacked by the stage,” she said. “Easy access. No alarms. Only one security guard.” Bill peered through the window. The guard was an elderly man asleep in a folding chair, holding a paper plate on his chest like a medal. “That’s not a guard,” Bill said. “That’s Mr. Nap.” Molly grinned. “Exactly. We go in as entertainers.” Bill blinked. “I don’t know any magic.” “Neither do the children,” Molly said. “We’re safe.” She lifted a pile of gift bags from their trunk. “You’re the juggler.” Bill stared at the bowling pins. “These are soup cans.” “Then be an inventive juggler,” Molly replied, and together they marched toward the music, smiling like professionals with absolutely no shame.
created_at
"2026-03-18T10:56:51.869Z"
3/18/2026, 10:56:51 AM
model
"openai/gpt-5.4-mini-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
2459
moderation_latency
(null)
generation_time
2843
tokens_prompt
77
tokens_completion
458
native_tokens_prompt
66
native_tokens_completion
363
native_tokens_completion_images
0
native_tokens_reasoning
67
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.001683
router
(null)
provider_responses
0
endpoint_id
"9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id
"resp_003a292d80d1ae930169ba84f3ec5c8190b1601a67725494d3"
is_byok
false
latency
216
model_permaslug
"openai/gpt-5.4-mini-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773831411-Z4f3umkX1jsNR465IOVH"
upstream_id
"resp_003a292d80d1ae930169ba84f3ec5c8190b1601a67725494d3"
total_cost
0.001683
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
93.65% | Matches word count | n/a | neededClean: false, words: 209
0.00% | Dialogue to Total Word Ratio | Ratio: 30.00%, Deviation: 20.00% | neededClean: false, wordsTotal: 210, wordsDialogue: 63
46.8248% | Overall
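The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator can be reproduced from its metadata: 63 dialogue words out of 210 total, against the scenario's 10% target. The sketch below is one way to do that arithmetic, plus a simple way to count words inside curly double quotes. The function names, the whitespace tokenization, and the quote-matching rule are all assumptions. The evaluator's actual counting method is not shown on the page.

```python
import re

def count_dialogue_words(text: str) -> int:
    """Count whitespace-separated words inside curly double quotes (assumed rule)."""
    spans = re.findall(r"“([^”]*)”", text)
    return sum(len(span.split()) for span in spans)

def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target_ratio: float) -> tuple[float, float]:
    """Return the dialogue share and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target_ratio)

# Values taken from the evaluator metadata; 0.10 is the scenario's 10% target.
ratio, deviation = ratio_and_deviation(63, 210, 0.10)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

With these inputs the dialogue share is three times the requested 10%, which fits the evaluator's 0.00% score even though the word count is close to the 200-word target.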