NC Bench
Run: 2026-03-18-01KM08E5D0RF5YCBQ147QE0M0Z
Run Details
Model: GPT-5.4 Nano
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 2.1s
Cost: $0.000427
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0014%
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly slid along the hallway like a tasteful shadow, wearing a fake mustache that looked offended by the job. Bill crouched beside the stolen delivery cart, studying the door keypad with the focus of a detective and the luck of a goldfish. “Okay,” Molly whispered, adjusting her crowbar-shaped key. “We go in, we grab presents, we go out before anyone asks our names.” Bill held up a printed floor map, mostly blank except for a hand-drawn lollipop and an angry stick figure labeled Toddler Guardian. “I don’t see a way around the cupcakes,” Bill murmured. Molly pointed at the ceiling vent. “That’s our way. Also, the cupcakes are probably a decoy. Like an alarm, but edible.” Bill snorted and pressed his ear to the door. A muffled cheer floated through, followed by the thump of bouncing feet and the squeal of someone learning how to blow bubbles. “If we’re caught,” Bill said, “I’m claiming we’re surprise magicians.” Molly grinned. “Perfect. I’ll be the magician who accidentally drops the entire sack of joy.” Bill checked his gloves. “Do we really have to steal all of them?” “Yes,” Molly hissed, “every single shiny little box. It’s the only acceptable business plan.” Bill exhaled. “Fine. Quick in, quick out.” Molly nodded toward the vent. “On three.” “One,” Bill whispered. “Two,” Molly said. “Th—” The doorbell chirped, bright and cheerful, like it had just learned manners.
Meta
created_at: "2026-03-18T10:39:06.154Z" (3/18/2026, 10:39:06 AM)
model: "openai/gpt-5.4-nano-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 485
moderation_latency: (null)
generation_time: 1913
tokens_prompt: 77
tokens_completion: 376
native_tokens_prompt: 66
native_tokens_completion: 331
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00042695
router: (null)
provider_responses:
  0:
    endpoint_id: "0c835f2e-c18d-4e8c-b245-e1e3bd08b97f"
    id: "resp_01b0d8c59444ed3f0169ba80ca37108197bf83fa82bba5f9af"
    is_byok: false
    latency: 213
    model_permaslug: "openai/gpt-5.4-nano-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830346-mURnB7bAZoaWJLdOAyYI"
upstream_id: "resp_01b0d8c59444ed3f0169ba80ca37108197bf83fa82bba5f9af"
total_cost: 0.00042695
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 0.00%
  Details: n/a
  Meta data: neededClean: false; words: 232
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio: 38.46%, Deviation: 28.46%
  Meta data: neededClean: false; wordsTotal: 234; wordsDialogue: 90
Overall: 0.0014%
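The reported ratio and deviation follow from the word counts: 90 dialogue words out of 234 total is 38.46%, which is 28.46 percentage points above the 10% target. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It is not NC Bench's evaluator: the names `dialogue_stats` and `ratio_and_deviation`, the quote-matching regex, and whitespace tokenization are all assumptions made for illustration. The two evaluators report different totals for the same response (232 and 234), so the benchmark's own tokenization evidently differs between them, and this sketch may not match either exactly.

```python
import re

# Assumption: dialogue is any text enclosed in curly or straight double quotes.
DIALOGUE_RE = re.compile(r'“([^”]*)”|"([^"]*)"')


def count_words(text: str) -> int:
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split())


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = 0.10) -> tuple[float, float]:
    """Return (ratio %, deviation from target in percentage points)."""
    ratio = words_dialogue / words_total if words_total else 0.0
    return round(ratio * 100, 2), round(abs(ratio - target) * 100, 2)


def dialogue_stats(text: str, target: float = 0.10) -> dict:
    """Count total and quoted words in a scene (hypothetical helper)."""
    total = count_words(text)
    # findall yields (curly, straight) tuples; exactly one side is non-empty.
    dialogue = sum(count_words(a or b) for a, b in DIALOGUE_RE.findall(text))
    ratio, deviation = ratio_and_deviation(dialogue, total, target)
    return {"wordsTotal": total, "wordsDialogue": dialogue,
            "ratio": ratio, "deviation": deviation}


# Reproduce the figures reported on this page from its own word counts.
print(ratio_and_deviation(90, 234))  # (38.46, 28.46)
```

Calling `dialogue_stats` on the model response text would give a comparable breakdown, though small differences in how hyphenated words or punctuation are tokenized can shift the counts by a word or two.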