NC Bench
Run: 2026-03-18-01KM0AC4NBSWH42CGHSHA0W4DX
Run Details
Model: GPT-5.4 Nano (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 3.0s
Cost: $0.000741
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 91.76%
Overall: 45.8806%
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge like a professional shrub impersonator, scanning the backyard through a gap in the leaves. Bill knelt beside her, pressing an earbud to the air as if the party might whisper a schedule directly into his skull. From inside, squeals of laughter bounced off lawn chairs and the inflatable slide leaned like it was trying not to listen. “Tonight, no squeaky toys—just presents.” Molly held up two fingers, then a third, as if counting was a crime. Bill nodded solemnly, the way people do right before they definitely don’t die. “They’ve got a gift table near the doorway,” Bill murmured, pointing with his elbow. “Guarded by… sugar.” He grinned and rubbed his hands together, which made a soft crinkling sound from his stolen snack bag. “And we take the shiny ones first.” Molly slid a tiny flashlight between her teeth and checked the string lights. One blinked out, like the house itself had winked at them. Their plan was elegant: Bill would distract the party by loudly “finding” the lost piñata, while Molly would glide in, grab every present with ribbon so loud it practically sang, and stack them into the backup stroller. A balloon bobbed past, tethered by a string trailing right across the entryway like a tripwire. Bill stared at it, then at Molly. “If it pops,” he said, “we’ll need a whole new disguise.” Molly sighed. “Fine. We’ll blame the balloon.” Then she adjusted her glove and slipped through the shadow of the open door.
created_at: "2026-03-18T11:12:57.017Z" (3/18/2026, 11:12:57 AM)
model: "openai/gpt-5.4-nano-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 2004
moderation_latency: (null)
generation_time: 2805
tokens_prompt: 77
tokens_completion: 395
native_tokens_prompt: 66
native_tokens_completion: 582
native_tokens_completion_images: 0
native_tokens_reasoning: 249
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0007407
router: (null)
provider_responses:
  0:
    endpoint_id: "0c835f2e-c18d-4e8c-b245-e1e3bd08b97f"
    id: "resp_09b197f6fb2364250169ba88b911848194b7c192a0d667db72"
    is_byok: false
    latency: 181
    model_permaslug: "openai/gpt-5.4-nano-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773832377-DmaPq9Ukyg6hlm25L4sR"
upstream_id: "resp_09b197f6fb2364250169ba88b911848194b7c192a0d667db72"
total_cost: 0.0007407
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
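The token and timing fields above can be combined into two derived figures. The following is a minimal sketch, not part of the benchmark itself. It assumes that generation_time is reported in milliseconds (which would be consistent with the 3.0s duration shown in the run details) and that native_tokens_reasoning is counted inside native_tokens_completion. Neither assumption is confirmed by the page, and the function names are hypothetical.

```python
# Values copied from the run metadata.
GENERATION_TIME_MS = 2805          # assumed to be milliseconds
NATIVE_TOKENS_COMPLETION = 582     # assumed to include reasoning tokens
NATIVE_TOKENS_REASONING = 249


def visible_completion_tokens(total: int, reasoning: int) -> int:
    """Tokens left for the visible answer once reasoning tokens are subtracted."""
    return total - reasoning


def tokens_per_second(tokens: int, elapsed_ms: int) -> float:
    """Average generation throughput over the elapsed time."""
    return tokens / (elapsed_ms / 1000.0)


visible = visible_completion_tokens(NATIVE_TOKENS_COMPLETION, NATIVE_TOKENS_REASONING)
throughput = tokens_per_second(NATIVE_TOKENS_COMPLETION, GENERATION_TIME_MS)

print(f"Visible completion tokens: {visible}")
print(f"Throughput: {throughput:.1f} tokens/s")
```

Under these assumptions, a little under half of the native completion tokens were spent on reasoning rather than on the scene itself.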
Evaluation details
Matches word count: 0.00%
  Details: n/a
  Meta Data: neededClean: false, words: 253
Dialogue to Total Word Ratio: 91.76%
  Details: Ratio: 15.42%, Deviation: 5.42%
  Meta Data: neededClean: false, wordsTotal: 253, wordsDialogue: 39
Overall: 45.8806%
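The figures in the evaluation details above are consistent with simple arithmetic, sketched below. This is an illustration under stated assumptions, not the benchmark's actual evaluator code: it assumes the ratio is dialogue words divided by total words, the deviation is the absolute gap from the 10% target named in the prompt, and the overall score is the unweighted mean of the two evaluator scores. How the 91.76% ratio score is derived from the deviation is not shown on the page, so that value is taken as given. All function names are hypothetical.

```python
def dialogue_ratio_pct(words_dialogue: int, words_total: int) -> float:
    """Share of dialogue words in the whole text, as a percentage."""
    return 100.0 * words_dialogue / words_total


def deviation_pct(ratio_pct: float, target_pct: float) -> float:
    """Absolute distance between the measured ratio and the target, in points."""
    return abs(ratio_pct - target_pct)


def overall_pct(scores: list[float]) -> float:
    """Unweighted mean of evaluator scores (an assumption about the aggregate)."""
    return sum(scores) / len(scores)


# Values copied from the evaluation details.
ratio = dialogue_ratio_pct(words_dialogue=39, words_total=253)
deviation = deviation_pct(ratio, target_pct=10.0)
overall = overall_pct([0.00, 91.76])

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")  # Ratio: 15.42%, Deviation: 5.42%
print(f"Overall from rounded scores: {overall:.2f}%")       # 45.88%
```

The mean of the rounded scores is 45.88%, which agrees with the reported 45.8806% to two decimal places; the extra digits presumably come from the unrounded ratio score.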