NC Bench
Run: 2026-03-18-01KM079AMHT9NVJ4S0XSC1G1C9
Run Details
Model: GPT-5.4 Nano (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 34.9s
Cost: $0.005831
Provider: OpenAI
Evaluation summary
Score       Evaluator
86.38%      Matches word count
86.05%      Dialogue to Total Word Ratio
86.2128%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill crouched behind the birthday balloons. “Bill, quiet. We’re stealing the good stuff from the kids’ party.” “Quiet? The bounce house is louder than my childhood.” “Exactly—blend in. You’re wearing the llama hat; I’m the polite balloon cloud.” “You can’t be a balloon cloud. You’re a woman with plans.” “Plans are balloons. Inflate them with theft.” “Uh-oh. Host’s dad just yelled, ‘Anyone have allergies?’” “We do. Allergies to boring decorations.” “I brought the present sack.” “Name it something cute so you don’t sound like a criminal in line: ‘Giggle Bag.’” “Giggle Bag it is. What’s the target?” “Every gift under the table, plus the ones people ‘ooh’ at.” “And the unwrapped pile?” “Only if it’s still wrapped in its innocence.” “How do we grab them without triggering the toy-inator?” “We use the distraction: you spill punch.” “I can’t spill punch. I’m a professional.” “Perfect. Spill it professionally.” “On three?” “On three. One—walk like we belong. Two—smile. Three—steal, then exit as gently as a kindergarten fart.” “What about the magician? I stole from a circus; the rabbit filed a complaint.” “No rabbits today. Only toddlers. If anyone asks, we’re ‘party helpers’ with wishes.” “And if the presents start screaming?” “Then you whisper, ‘Be brave, it’s a nap.’”
created_at
"2026-03-18T10:18:59.308Z"
3/18/2026, 10:18:59 AM
model
"openai/gpt-5.4-nano-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
33937
moderation_latency
(null)
generation_time
34330
tokens_prompt
77
tokens_completion
1370
native_tokens_prompt
66
native_tokens_completion
4654
native_tokens_completion_images
0
native_tokens_reasoning
4309
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0058307
router
(null)
provider_responses
0
endpoint_id
"0c835f2e-c18d-4e8c-b245-e1e3bd08b97f"
id
"resp_0c0a6f1763ee78a40169ba7c1387f48194bd16ea5c9d370aa9"
is_byok
false
latency
200
model_permaslug
"openai/gpt-5.4-nano-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773829139-e5RJWtjqLzDfknaufzVs"
upstream_id
"resp_0c0a6f1763ee78a40169ba7c1387f48194bd16ea5c9d370aa9"
total_cost
0.0058307
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result      Evaluator                       Details                            Meta Data
86.38%      Matches word count              n/a                                neededClean: false, words: 211
86.05%      Dialogue to Total Word Ratio    Ratio: 96.23%, Deviation: 6.23%    neededClean: false, wordsTotal: 212, wordsDialogue: 204
86.2128%    Overall
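The reported ratio and deviation follow directly from the word counts in the evaluation details (204 dialogue words out of 212, against a 90% target). The sketch below reproduces that arithmetic and shows one possible way to count dialogue words. It is not NC Bench's evaluator: the function dialogue_ratio, the whitespace tokenization, and the curly-quote matching are all assumptions. The mapping from deviation to the 86.05% score is not documented on this page, so it is not reproduced here.

```python
import re

TARGET_RATIO_PCT = 90.0  # from the scenario: "90% dialogue"


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for curly-quoted dialogue.

    Assumption: words are whitespace-separated and dialogue is any text
    between a left and a right double curly quote.
    """
    total = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = (dialogue / total) if total else 0.0
    return dialogue, total, ratio


# Reproduce the reported figures from the evaluator's own counts.
words_dialogue, words_total = 204, 212
reported_ratio_pct = words_dialogue / words_total * 100
reported_deviation_pct = abs(reported_ratio_pct - TARGET_RATIO_PCT)
print(f"Ratio: {reported_ratio_pct:.2f}%, Deviation: {reported_deviation_pct:.2f}%")

# Small usage example on a made-up fragment.
sample = "Molly crouched. “Bill, quiet.” “Quiet?”"
print(dialogue_ratio(sample))
```

Two side observations, both hedged: the word-count evaluator reports 211 words while the ratio evaluator reports 212, which suggests the two count words slightly differently; and the overall 86.2128% is close to the unweighted mean of the two rounded scores (86.215%), which would be consistent with averaging the unrounded values.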