NC Bench
Run: 2026-03-18-01KM08S3R9WV8640RYAVWGCERW
Run Details
Model: GPT-5.4 Nano (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 21.8s
Cost: $0.003361
Provider: OpenAI
Evaluation summary
Score      Evaluator
100.00%    Matches word count
100.00%    Dialogue to Total Word Ratio
99.9995%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Bill adjusted his fake mustache in the mirror of a parked minivan, as if facial hair could disguise intent. Molly crouched beside him, studying the inflatable castle through binoculars. “Tonight, we take every present.” The party’s music leaked from the house like cupcake batter—sweet, sticky, and louder than their courage. Two balloon animals bobbed near the door, guarding it with squeaky neutrality. Molly slid a wafer-thin lock-pick into a pocket, then checked the timer on her wristwatch: fifteen minutes until the magician began. Bill whispered, “Shh—kids cry, not alarms.” He held a glitter sack to blend with the wrapping paper. Inside, parents laughed, and the smell of frosting made everyone brave. Molly glanced at the sidewalk, where a trail of confetti glittered like breadcrumbs from an evil fairy. “We’ll be in and out before frosting.” The back window was slightly ajar, the perfect invitation for criminals with etiquette. They climbed in through the gap, silent except for Bill’s knee complaining like an old pirate. “Gloves on, hearts off.” Molly moved first, tapping each present tag with a little adhesive scanner, collecting names, ages, and chocolate varieties. The living room became a polite crime scene: stolen gifts, no casualties so far.
created_at
"2026-03-18T10:45:04.919Z"
3/18/2026, 10:45:04 AM
model
"openai/gpt-5.4-nano-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
21002
moderation_latency
(null)
generation_time
21288
tokens_prompt
77
tokens_completion
1112
native_tokens_prompt
66
native_tokens_completion
2678
native_tokens_completion_images
0
native_tokens_reasoning
2405
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0033607
router
(null)
provider_responses
0
endpoint_id
"0c835f2e-c18d-4e8c-b245-e1e3bd08b97f"
id
"resp_068250f67abaf0570169ba8230fbd481959489031942b1ad2d"
is_byok
false
latency
195
model_permaslug
"openai/gpt-5.4-nano-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773830704-cUELIJc7WcubmF1ZuPNC"
upstream_id
"resp_068250f67abaf0570169ba8230fbd481959489031942b1ad2d"
total_cost
0.0033607
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result    Evaluator                      Details                          Meta Data
100.00%   Matches word count             n/a                              neededClean: false, words: 201
100.00%   Dialogue to Total Word Ratio   Ratio: 9.85%, Deviation: 0.15%   neededClean: false, wordsTotal: 203, wordsDialogue: 20
99.9995%
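
The Dialogue to Total Word Ratio figures above (Ratio: 9.85%, Deviation: 0.15%) follow from the reported counts of 20 dialogue words out of 203 total words, measured against the 10% target in the scenario. The sketch below shows one way such a check could be computed. It is not the evaluator's actual implementation: the function names, the whitespace tokenization, and the rule that text between double quotes counts as dialogue are all assumptions made for illustration, and a different tokenizer may yield different counts (the word count evaluator, for instance, reports 201 words for the same response).

```python
import re

# Hypothetical helper: the tokenization and quote-matching rules are assumptions,
# not the benchmark's documented behavior.
def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) using whitespace tokenization."""
    total = len(text.split())
    # Treat any span enclosed in curly or straight double quotes as dialogue.
    spans = re.findall(r'[“"]([^”"]*)[”"]', text)
    dialogue = sum(len(span.split()) for span in spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_from_target(ratio: float, target: float = 0.10) -> float:
    """Absolute difference between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Reproduce the reported figures from the counts listed in the meta data.
reported_ratio = 20 / 203
ratio_percent = round(reported_ratio * 100, 2)
deviation_percent = round(deviation_from_target(reported_ratio) * 100, 2)
print(f"Ratio: {ratio_percent}%, Deviation: {deviation_percent}%")
```

Run against the reported counts, this yields a ratio of 9.85% and a deviation of 0.15%, matching the details row. How the evaluator maps that deviation to a 100.00% score is not stated on the page, so the sketch stops at the deviation.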