NC Bench
Run: 2026-03-18-01KM09MN9E6JC3ZXB3TF0BBEQX
Run Details
Model: GPT-5.4 Nano (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 2m 16s
Cost: $0.030229
Provider: OpenAI
Evaluation summary
Matches word count: 100.00%
Dialogue to Total Word Ratio: 10.28%
Overall: 55.1375%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Night pooled in the driveway as Molly and Bill hunched under the streetlamp’s weak halo. Across the lawn, a children’s party roared with taped-on laughter, flashing lights, and the smell of frosting hiding under rain-soaked air. Gift bags formed a towering city near the tree, each one promising paper cuts and tiny grudges. Molly held up a roll of masking tape like it was holy scripture. Bill adjusted a borrowed “Clown Helper” name tag that kept peeling off his sweat. “Ready?” Molly asked. “As ready as a criminal can be when the target is… candy,” Bill said. Molly nodded toward the doorway. “We don’t fight. We charm. I’ll do the volunteer check-in.” Bill sighed. “You mean the part where you pretend to ‘verify allergies’?” “Exactly,” Molly said. “Kids love paperwork. Adults hate it. We steal while they’re busy.” Bill leaned closer. “And my job?” Molly’s eyes twinkled. “You carry the ‘gift return cart.’ It sounds official, therefore it’s unstoppable.” Bill frowned at the hand truck. “It’s called a hand truck.” “It’s a cart,” Molly corrected. “People will assume wheels equal responsibility.” From inside, a birthday host shouted for everyone to wash up. Chairs squeaked. Bill whispered, “What if they ask for our names?” “They’ll get names,” Molly said. “Yours is Uncle Bill. Mine is Aunt Molly. We’re very trustworthy relatives who ‘just happened’ to bring supplies.” Bill pocketed the badge; Molly smoothed her grin. They crept to the porch steps. Molly rang the bell with the solemn patience of a tax collector. A moment later, the door flew open, revealing a smiling parent in a dinosaur hoodie. “Hi! We’re here to help,” Molly announced brightly. Bill stepped forward. “We’re with the gift return program.” The parent beamed. “Oh! We didn’t order anything.” Molly laughed. “That’s because it’s a surprise donation. For safety, we need the presents staged before opening.” Bill’s hand truck rattled as he rolled it toward the tree, “Just for a quick repositioning,” he added. 
The parent blinked once, then nodded. “Okay, wow, you two are efficient.” Molly reached behind a candy-striped bag and unhooked it from a ribbon like a surgeon. Bill swallowed. “Do we take all of them?” “We take the ones that look expensive,” Molly said, tossing the bag into the duffel labeled PARTY SUPPLIES. “The rest go back. Raccoon charity.” Bill pointed at a small box. “What if there’s a bike in there?” Molly paused. “Then we steal it carefully.” Bill said, “No opening, right?” Molly replied, “Only relocating. Paper stays sacred.” Molly winked. “We do it slow enough to seem normal.” Behind them, music swelled and a child shrieked, “Is this the magic part?” The parent waved, distracted. Molly grabbed three more presents in quick succession. “Go, go, go.” Bill’s Santa hat slid sideways. “If I’m Uncle Bill, why do I keep getting de-shaped?” Molly didn’t look up. “Because family doesn’t fit. Now move.” By the time the parent returned to the party, the tree looked bare, like it was relocated. For now.
created_at: "2026-03-18T11:00:07.612Z" (3/18/2026, 11:00:07 AM)
model: "openai/gpt-5.4-nano-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 134985
moderation_latency: (null)
generation_time: 136023
tokens_prompt: 77
tokens_completion: 3385
native_tokens_prompt: 66
native_tokens_completion: 24173
native_tokens_completion_images: 0
native_tokens_reasoning: 23454
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.03022945
router: (null)
provider_responses: 0
endpoint_id: "0c835f2e-c18d-4e8c-b245-e1e3bd08b97f"
id: "resp_0ef5b1cd985564e20169ba85b7aaec8197a27103ccdfdc1bf6"
is_byok: false
latency: 215
model_permaslug: "openai/gpt-5.4-nano-20260317"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773831607-JjMxVCEXcvEagghEi5YK"
upstream_id: "resp_0ef5b1cd985564e20169ba85b7aaec8197a27103ccdfdc1bf6"
total_cost: 0.03022945
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
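The token and timing fields above can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the page does not state: that native_tokens_reasoning is counted inside native_tokens_completion, and that generation_time is in milliseconds (which would be consistent with the reported Duration of 2m 16s). The function names are hypothetical helpers, not part of any NC Bench or provider API.

```python
def visible_completion_tokens(native_completion: int, native_reasoning: int) -> int:
    """Completion tokens left after subtracting reasoning tokens.

    Assumption: reasoning tokens are a subset of the native completion count.
    """
    return native_completion - native_reasoning


def reasoning_share(native_completion: int, native_reasoning: int) -> float:
    """Fraction of native completion tokens spent on reasoning."""
    return native_reasoning / native_completion


def format_duration_ms(ms: int) -> str:
    """Render a duration as 'Xm Ys', assuming the input is in milliseconds."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}m {seconds}s"


if __name__ == "__main__":
    # Values copied from the metadata of this run.
    print(visible_completion_tokens(24173, 23454))
    print(f"{reasoning_share(24173, 23454):.2%}")
    print(format_duration_ms(136023))
```

Under those assumptions, nearly all of the completion budget for this run went to reasoning rather than to the visible scene text.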
Evaluation details
Matches word count
Result: 100.00%
Details: n/a
Meta Data: neededClean = false, words = 500
Dialogue to Total Word Ratio
Result: 10.28%
Details: Ratio: 43.25%, Deviation: 6.75%
Meta Data: neededClean = false, wordsTotal = 504, wordsDialogue = 218
Overall: 55.1375%
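The ratio and deviation figures in the evaluation details can be reproduced from the reported word counts. The sketch below is a minimal illustration, not the benchmark's actual evaluator: the function names are hypothetical, the 50% target is taken from the scenario, and the overall score is assumed to be a plain mean of the two evaluator scores (the displayed 55.1375% is consistent with averaging 100.00% and an unrounded 10.275%). How the evaluator maps a 6.75% deviation to a 10.28% score is not shown on the page, so that step is left out.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that fall inside dialogue."""
    return words_dialogue / words_total


def deviation_from_target(ratio: float, target: float = 0.5) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


def overall_score(scores: list[float]) -> float:
    """Assumption: the overall score is the unweighted mean of evaluator scores."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Word counts copied from the evaluation details of this run.
    ratio = dialogue_ratio(218, 504)
    print(f"Ratio: {ratio:.2%}")
    print(f"Deviation: {deviation_from_target(ratio):.2%}")
    # 10.275 is an inferred unrounded value, not a figure shown on the page.
    print(f"Overall: {overall_score([100.0, 10.275]):.4f}%")
```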