NC Bench
Run: 2026-03-06-01KK1JB6QBZ8XFY53PN6ANNZ5G
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 11.4s
Cost: $0.005130
Provider: OpenAI
Evaluation summary
Score     Evaluator                     Details Link
0.21%     Matches word count            (details)
85.15%    Dialogue to Total Word Ratio  (details)
42.6826%  Overall
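The Overall figure looks consistent with an unweighted mean of the two evaluator scores. The sketch below checks that arithmetic under that assumption; the function name `overall_score` is hypothetical, and the benchmark's actual aggregation may differ (for example, it may average unrounded scores, which would explain the extra decimals in 42.6826%).

```python
def overall_score(scores: list[float]) -> float:
    """Unweighted mean of evaluator scores, in percent (assumed aggregation)."""
    return sum(scores) / len(scores)


# Rounded evaluator scores as shown in the summary table.
overall = overall_score([0.21, 85.15])
print(f"{overall:.2f}%")  # 42.68%
```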
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge, studying the inflatable castle bobbing over the garden fence like a cheerful surveillance balloon. Bill unfolded a crayon-smeared invitation he'd stolen from a coat pocket and squinted as if decoding military plans. “Ten children, one clown, two exhausted parents,” Molly whispered. “Frankly, Bill, it’s less a party and more a security collapse with cake.” Bill adjusted his flat cap. “I don’t like the clown. Clowns notice things. Tiny shoes, giant intuition.” “The clown is making balloon giraffes. No professional detective does giraffes.” A shriek rose from the garden, followed by the fizzy pop of a burst balloon and wild applause. Molly smiled grimly. “We go in as entertainers,” Bill said. “I juggle badly, you do that ventriloquist thing.” “My ventriloquist thing terrifies vicars and one horse. We need subtlety.” “Subtlety left when you suggested stealing from children.” “Borrowing,” Molly corrected. “Temporarily. Until we sell everything.” Bill peered through the slats. “Presents are stacked by the patio doors. Bright paper, ribbons, minimal adult supervision. It’s practically a charity drive for us.” Molly produced two party hats from her pocket. “Blend in, grab the loot, retreat through the side gate.” Bill took a hat reluctantly. “If anyone asks, I’m Uncle Kevin.” “You don’t look like an Uncle Kevin.” “Nobody does until destiny calls,” Bill muttered, and together they marched toward the sound of party games.
created_at: "2026-03-06T12:35:47.57Z" (3/6/2026, 12:35:47 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 434
moderation_latency: (null)
generation_time: 11388
tokens_prompt: 77
tokens_completion: 416
native_tokens_prompt: 66
native_tokens_completion: 331
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00513
router: (null)
provider_responses: 0
endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
id: "resp_066a3d157008a40a0169aaca23a04c8196a60d42953a71b06a"
is_byok: false
latency: 150
model_permaslug: "openai/gpt-5.4-20260305"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772800547-I99KdxZUy2faXD80SmOm"
upstream_id: "resp_066a3d157008a40a0169aaca23a04c8196a60d42953a71b06a"
total_cost: 0.00513
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
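The reported usage of 0.00513 can be related to the native token counts (66 prompt, 331 completion) with simple per-token pricing. The sketch below is illustrative only: the per-million-token rates of $2.50 (input) and $15.00 (output) are assumptions that do not appear in this report, chosen because they happen to reproduce the reported figure, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Cost in USD given token counts and per-million-token rates."""
    return (prompt_tokens * usd_per_m_in
            + completion_tokens * usd_per_m_out) / 1_000_000


# Token counts come from the report; the rates are ASSUMED, not sourced.
cost = estimate_cost(66, 331, usd_per_m_in=2.50, usd_per_m_out=15.00)
print(f"${cost:.6f}")  # $0.005130
```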
Evaluation details
Result    Evaluator                     Details                          Meta Data
0.21%     Matches word count            n/a                              neededClean: false, words: 228
85.15%    Dialogue to Total Word Ratio  Ratio: 56.33%, Deviation: 6.33%  neededClean: false, wordsTotal: 229, wordsDialogue: 129
42.6826%  Overall
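The ratio and deviation reported for the dialogue evaluator follow directly from the word counts: 129 dialogue words out of 229 total, measured against the 50% target. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words (text inside curly double quotes). Both function names are hypothetical, the counting rule is an assumption rather than the evaluator's actual tokenizer, and the mapping from deviation to the 85.15% score is not stated in the report, so it is not modeled here.

```python
import re


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words), splitting on whitespace.

    Dialogue is assumed to be anything between curly double quotes.
    """
    spoken = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(chunk.split()) for chunk in spoken)
    words_total = len(text.split())
    return words_dialogue, words_total


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = 0.5) -> tuple[float, float]:
    """Dialogue ratio and absolute deviation from target, both in percent."""
    ratio = words_dialogue / words_total
    return round(ratio * 100, 2), round(abs(ratio - target) * 100, 2)


# Counts taken from the Meta Data column above.
print(ratio_and_deviation(129, 229))  # (56.33, 6.33)
```

Running `count_dialogue_words` on the model response may not give exactly 129 and 229, since the evaluator's own word-splitting rules are not documented in this report.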