NC Bench
Run: 2026-03-06-01KK1HZY6A97ACKHS1680GFFK0
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 9.9s
Cost: $0.004755
Provider: OpenAI
Evaluation summary
Matches word count: 2.01%
Dialogue to Total Word Ratio: 90.67%
Overall: 46.3394%
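The overall figure appears to be the plain average of the two evaluator scores. The sketch below checks that arithmetic; treating the overall score as an unweighted mean is an assumption, since the page does not state how it is aggregated. The small gap between 46.34% and the reported 46.3394% would be consistent with the per-evaluator scores being rounded for display.

```python
# Scores as displayed in the evaluation summary (rounded to two decimals).
scores = {
    "Matches word count": 2.01,
    "Dialogue to Total Word Ratio": 90.67,
}

# Assumption: the overall score is an unweighted mean of the evaluator scores.
overall = sum(scores.values()) / len(scores)
print(f"{overall:.2f}%")  # 46.34%
```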
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill crouched behind the hedge of number twelve, staring at a garden gone rabid with bunting. Balloons bobbed like surveillance devices. A papier-mâché dinosaur guarded the gate with the dead-eyed confidence of hired security. Beyond it, children in cone hats shrieked around a trestle table sagging under cakes, juice cartons, and a mountainous pile of presents wrapped in enough glitter paper to blind an honest man. Bill adjusted his flat cap and studied the scene with criminal gravity. “Tiny guests, big haul.” “Focus,” Molly muttered. “Parents are the real police.” They had robbed jewellers, a racecourse, and once, disastrously, a magician, but this felt riskier. Children were erratic. They ran in packs. They asked direct questions. Worse, they noticed things. Molly, in a cardigan stolen from a lost property box, had decided they would pass as entertainers. Bill, wearing a crimson clown bow tie that made him look like a depressed tomato salesman, had argued for plumbers, then vicars, then surrender. A woman emerged carrying a tray of sausage rolls, and both thieves ducked so sharply the hedge shivered. Somewhere inside the garden, a little voice began crying over a burst balloon with the despair of opera. Molly grinned. Chaos: the old friend. By the slide, the present mountain gleamed, unattended for one glorious second, like treasure left foolishly in a dragon’s lunchroom.
created_at: "2026-03-06T12:29:38.386Z" (3/6/2026, 12:29:38 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 401
moderation_latency: (null)
generation_time: 9823
tokens_prompt: 77
tokens_completion: 367
native_tokens_prompt: 66
native_tokens_completion: 306
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004755
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_071970a2e0a7a47a0169aac8b2720c8195858ee688bc233de8"
    is_byok: false
    latency: 141
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772800178-SpGlZuWfP8uHZG4VYhiP"
upstream_id: "resp_071970a2e0a7a47a0169aac8b2720c8195858ee688bc233de8"
total_cost: 0.004755
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
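The metadata above is enough for a rough throughput and cost sanity check. The sketch below assumes generation_time is in milliseconds and uses hypothetical per-million-token rates; neither the unit nor the rates appear in this run's data, and the rates were chosen only because they reproduce the reported total_cost from the native token counts.

```python
# Values copied from the run metadata.
native_prompt_tokens = 66
native_completion_tokens = 306
generation_time_ms = 9823      # assumption: the field is in milliseconds
total_cost_usd = 0.004755

# Output throughput over the generation window.
tokens_per_second = native_completion_tokens / (generation_time_ms / 1000)
print(f"{tokens_per_second:.1f} tokens/s")  # 31.2 tokens/s

# Hypothetical rates (USD per million tokens), not taken from the source.
ASSUMED_INPUT_USD_PER_M = 2.50
ASSUMED_OUTPUT_USD_PER_M = 15.00

estimated_cost = (
    native_prompt_tokens * ASSUMED_INPUT_USD_PER_M
    + native_completion_tokens * ASSUMED_OUTPUT_USD_PER_M
) / 1_000_000
print(f"${estimated_cost:.6f}")  # $0.004755
```

Under these assumed rates, the cost is dominated by completion tokens, which fits a short prompt followed by a roughly 200-word response.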
Evaluation details
Matches word count
Result: 2.01%
Details: n/a
neededClean: false
words: 225
Dialogue to Total Word Ratio
Result: 90.67%
Details: Ratio: 4.41%, Deviation: 5.59%
neededClean: false
wordsTotal: 227
wordsDialogue: 10
Overall
Result: 46.3394%
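The ratio and deviation reported for the dialogue evaluator follow directly from the word counts: 10 dialogue words out of 227 total, against the 10% target in the scenario. A minimal sketch of that calculation is below. The function names and the quote-matching rule are assumptions for illustration, not the benchmark's actual evaluator, and the sketch does not try to reproduce how the 90.67% score is derived from the deviation, which the page does not specify.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio_percent) for a passage."""
    total_words = len(text.split())
    # Assumption: dialogue is any text between curly or straight double quotes.
    spans = re.findall(r'“([^”]*)”|"([^"]*)"', text)
    dialogue_words = sum(len((curly or straight).split()) for curly, straight in spans)
    ratio = 100.0 * dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation(ratio_percent: float, target_percent: float = 10.0) -> float:
    """Absolute distance, in percentage points, from the target ratio."""
    return abs(target_percent - ratio_percent)


# Reproduce the reported figures from the metadata counts.
ratio = 100.0 * 10 / 227
print(round(ratio, 2), round(deviation(ratio), 2))  # 4.41 5.59

# Small usage example on a short excerpt of the response.
sample = 'Bill nodded. “Tiny guests, big haul.” “Focus,” Molly muttered.'
print(dialogue_ratio(sample)[:2])  # (5, 9)
```

Word splitting here is plain whitespace tokenization, so totals may differ slightly from the evaluator's own counts (the two evaluators above already report 225 and 227 words for the same response).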