NC Bench
Run: 2026-03-18-01KM09NPV01H352X1ZVHF156T5
Run Details
Model: GPT-5.4 Nano
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 2.1s
Cost: $0.000476
Provider: OpenAI
Evaluation summary
Score     Evaluator
0.00%     Matches word count
94.96%    Dialogue to Total Word Ratio
47.4817%  Overall
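The Overall figure appears consistent with an unweighted arithmetic mean of the two evaluator scores: (0.00% + 94.96%) / 2 ≈ 47.48%, and the reported 47.4817% implies an unrounded ratio score of about 94.9634%. The sketch below assumes equal weighting. The benchmark's actual aggregation rule is not stated on this page, and `overall_score` is a hypothetical helper.

```python
def overall_score(scores: list[float]) -> float:
    """Unweighted mean of per-evaluator scores (assumed aggregation rule)."""
    if not scores:
        raise ValueError("need at least one evaluator score")
    return sum(scores) / len(scores)


# Scores in percent. 94.9634 is inferred from the reported overall,
# not shown directly on the page (which rounds it to 94.96).
scores = {"Matches word count": 0.0, "Dialogue to Total Word Ratio": 94.9634}
print(f"{overall_score(list(scores.values())):.4f}%")
```

With equal weights, a single failing evaluator halves the overall score, which is why a strong dialogue ratio still yields an overall result below 50%.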
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Molly, check the frosting budget on the door—if it’s too thick, we’ll trigger the alarm,” Bill whispered.
“It’s a children’s party, Bill. The only thing thick is their sugar tolerance.”
“Don’t underestimate sugar. Sugar makes people inventive,” Molly said, adjusting her tiny headlamp.
“Inventive like tap-dancing guards?”
“Inventive like screaming ‘OH NO, THE CAKE!’ and forgetting we exist.”
Bill grinned. “Okay, Plan A: we melt into the crowd.”
“We don’t melt,” Molly corrected. “We blend. Like… stealth pudding.”
“Pudding doesn’t have a felony license.”
“Neither do you, yet here we are.”
Bill held up a roll of gift wrap. “I brought replacements. If we swap, we don’t cause trauma.”
“Trauma is the point,” Molly said, then sighed. “Fine, light trauma. Minimal tears. Mostly cash-out.”
Bill glanced at the yard. “The presents are right behind the mascot bounce house.”
“Perfect,” Molly said. “Bouncy equals distraction. Kids will stare at it like it’s the moon.”
“And parents will take photos instead of looking for criminals.”
“Yes. Say it louder so the universe agrees.”
“Fine. Loud universe.”
Molly whispered, “When the clock hits three, you go for the big pile.”
“And you?”
“I handle the smaller stuff.”
“Like what, toothbrushes?”
“No, Bill—like the ‘ages 3–5’ toys. Those are the real prizes.”
“Because they’re cute?”
“Because parents hide money in the batteries.”
Bill blinked. “That’s… disturbingly smart.”
“Now stop talking and start stealing.”
“Aye aye, professional stealth pudding.”
Response metadata
created_at: "2026-03-18T11:00:41.966Z" (3/18/2026, 11:00:41 AM)
model: "openai/gpt-5.4-nano-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 469
moderation_latency: (null)
generation_time: 1901
tokens_prompt: 77
tokens_completion: 433
native_tokens_prompt: 66
native_tokens_completion: 370
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0004757
router: (null)
provider_responses[0]:
  endpoint_id: "0c835f2e-c18d-4e8c-b245-e1e3bd08b97f"
  id: "resp_069ed445c58cc90f0169ba85da0610819483b8c8bca562f300"
  is_byok: false
  latency: 201
  model_permaslug: "openai/gpt-5.4-nano-20260317"
  provider_name: "OpenAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773831641-GM3uGxfGSXATckJ3yBsD"
upstream_id: "resp_069ed445c58cc90f0169ba85da0610819483b8c8bca562f300"
total_cost: 0.0004757
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta data: neededClean = false, words = 235
Dialogue to Total Word Ratio
  Result: 94.96%
  Details: Ratio: 85.23%, Deviation: 4.77%
  Meta data: neededClean = false, wordsTotal = 237, wordsDialogue = 202
Overall: 47.4817%
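The Ratio and Deviation values follow from the meta data: 202 / 237 ≈ 85.23% dialogue, which is 4.77 percentage points below the 90% target in the scenario. How the evaluator separates dialogue from narration is not documented on this page. The sketch below assumes a simple rule (words inside curly double quotes count as dialogue, and a word is any whitespace-separated token), so its counts on the full scene may differ from the 237/202 reported above. The mapping from deviation to the 94.96% score is also not stated, so the sketch stops at the deviation. `dialogue_ratio` and `deviation_points` are hypothetical helpers.

```python
import re

# Assumption: dialogue is any text between curly double quotes “ and ”.
DIALOGUE_RE = re.compile(r"“(.*?)”", re.DOTALL)


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simplifying assumption)."""
    return len(text.split())


def dialogue_ratio(scene: str) -> tuple[int, int, float]:
    """Return (dialogue words, total words, dialogue share) for a scene."""
    total = count_words(scene)
    dialogue = sum(count_words(m) for m in DIALOGUE_RE.findall(scene))
    return dialogue, total, (dialogue / total if total else 0.0)


def deviation_points(ratio: float, target: float = 0.90) -> float:
    """Absolute gap between the ratio and the target, in percentage points."""
    return abs(ratio - target) * 100


# Reported meta data for this run: wordsDialogue = 202, wordsTotal = 237.
ratio = 202 / 237
print(f"Ratio: {ratio:.2%}, Deviation: {deviation_points(ratio):.2f}%")

# Toy example of the assumed counting rule on a short exchange.
sample = "“Now stop talking,” Molly said. “Aye aye.”"
print(dialogue_ratio(sample))
```

Counting whole tokens keeps the sketch simple; a real evaluator might also need to handle attribution inside quotes, nested single quotes such as ‘ages 3–5’, or punctuation-only tokens, any of which would shift the totals.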