NC Bench
Run: 2026-03-18-01KM08ZF984QX865SN1G7H0FZQ
Run Details
Model: GPT-5.4 Nano (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 4.2s
Cost: $0.000564
Provider: OpenAI
Evaluation summary
Matches word count: 51.93%
Dialogue to Total Word Ratio: 19.63%
Overall: 35.7755%
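The overall figure is consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the following Python sketch treats the mean as an assumption; `overall_score` is a hypothetical helper name, not part of NC Bench.

```python
from statistics import mean

# Evaluator scores as displayed on the page (rounded to two decimals).
scores = {
    "Matches word count": 51.93,
    "Dialogue to Total Word Ratio": 19.63,
}


def overall_score(evaluator_scores):
    """ASSUMPTION: the overall score is the unweighted mean of all evaluators."""
    return mean(list(evaluator_scores.values()))


overall = overall_score(scores)
print(f"{overall:.2f}%")
```

The result agrees with the reported 35.7755% to within rounding. The small gap is what one would expect if the site averages unrounded evaluator scores.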
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Molly, tell me we’re not seriously doing this,” Bill whispered, eyeing the inflatable slide. “We’re absolutely seriously doing this,” Molly said. “Step one: be cute. Step two: be invisible. Step three: steal Christmas.” “Step three feels suspicious,” Bill said. “It’s not suspicious if we do it politely,” Molly replied. “How do you steal politely?” “We ask the presents if they’d like to come with us,” Molly said. Bill snorted. “Presents don’t talk.” “Not out loud,” Molly said. “But I bet they squeak when you lift them.” Bill leaned closer to the door. “What’s the plan if someone yells ‘Where are the presents?’” “Then we scream back, ‘Right here!’” Molly said. “Like… really loudly?” “Like enthusiastic aunts,” Molly replied. “Smile, wave, and claim we’re ‘present wranglers.’” Bill swallowed. “We’re present wranglers?” “Professionally,” Molly said. “You juggle. I carry. Together we become unstoppable.” “You’re calling this unstoppable?” Bill asked. “It’s unstoppable,” Molly insisted. “Also, you’re wearing the cat hat. Nobody suspects a man in a cat hat.” “I feel like I’m about to get adopted,” Bill murmured. “That’s the spirit,” Molly said. “Now, timing: when the birthday song starts, we move.” “And if the clown notices?” “We compliment the clown,” Molly said. “Then we distract it with glitter.” Bill blinked. “Glitter?” “Trust me,” Molly said. “Glitter solves everything.”
created_at: "2026-03-18T10:48:33.335Z" (3/18/2026, 10:48:33 AM)
model: "openai/gpt-5.4-nano-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3242
moderation_latency: (null)
generation_time: 3985
tokens_prompt: 77
tokens_completion: 538
native_tokens_prompt: 66
native_tokens_completion: 441
native_tokens_completion_images: 0
native_tokens_reasoning: 76
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00056445
router: (null)
provider_responses: 0
endpoint_id: "0c835f2e-c18d-4e8c-b245-e1e3bd08b97f"
id: "resp_06beb8affb0cd81c0169ba830164388193a7c0e2daa4eb595a"
is_byok: false
latency: 256
model_permaslug: "openai/gpt-5.4-nano-20260317"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830913-arUjiB6jFZiNEgZ0lpWz"
upstream_id: "resp_06beb8affb0cd81c0169ba830164388193a7c0e2daa4eb595a"
total_cost: 0.00056445
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
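The reported usage can be related to the native token counts with simple per-token pricing arithmetic. In the Python sketch below, the per-million-token rates are assumptions chosen because they reproduce the reported figure; the page does not list them. `estimate_cost` is a hypothetical helper.

```python
# Token counts and cost as reported in the run metadata.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 441  # the arithmetic suggests this includes the 76 reasoning tokens
REPORTED_USAGE = 0.00056445

# ASSUMED rates in USD per million tokens (not stated on the page).
ASSUMED_INPUT_RATE = 0.20
ASSUMED_OUTPUT_RATE = 1.25


def estimate_cost(prompt_tokens, completion_tokens, input_rate, output_rate):
    """Return the cost in USD given token counts and per-million-token rates."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000


estimate = estimate_cost(
    NATIVE_TOKENS_PROMPT,
    NATIVE_TOKENS_COMPLETION,
    ASSUMED_INPUT_RATE,
    ASSUMED_OUTPUT_RATE,
)
print(f"estimated: ${estimate:.8f}, reported: ${REPORTED_USAGE:.8f}")
```

Under these assumed rates the estimate matches the reported usage of 0.00056445, which the run details round to $0.000564.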
Evaluation details
Matches word count: 51.93%
Details: n/a
Meta data: neededClean: false, words: 216
Dialogue to Total Word Ratio: 19.63%
Details: Ratio: 78.70%, Deviation: 11.30%
Meta data: neededClean: false, wordsTotal: 216, wordsDialogue: 170
Overall: 35.7755%
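The ratio and deviation in the details follow from the reported word counts and the 90% target in the scenario. The Python sketch below reproduces that arithmetic and adds one possible way to count dialogue words, treating text inside double quotes as dialogue. The evaluator's real tokenization is not shown on the page, so `dialogue_stats` is a hypothetical approximation and may not yield the same counts.

```python
import re

TARGET_RATIO = 0.90  # from the scenario: "90% dialogue"


def dialogue_stats(text):
    """Count words inside curly or straight double quotes versus all words.

    ASSUMPTION: words are whitespace-separated and dialogue is any quoted span.
    """
    quoted = re.findall(r'“([^”]*)”|"([^"]*)"', text)
    dialogue_words = sum(len((curly or straight).split()) for curly, straight in quoted)
    total_words = len(text.split())
    return dialogue_words, total_words


def ratio_and_deviation(dialogue_words, total_words, target=TARGET_RATIO):
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = dialogue_words / total_words
    return ratio, abs(target - ratio)


# Counts reported in the evaluation meta data.
ratio, deviation = ratio_and_deviation(170, 216)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

With the reported counts this prints a ratio of 78.70% and a deviation of 11.30%, matching the details above. How the evaluator converts that deviation into the 19.63% score is not shown on the page.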