NC Bench
Run: 2026-03-06-01KK1GZ906AS0ENRZ0V8DS18JE
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 9.1s
Cost: $0.004860
Provider: OpenAI
Evaluation summary
Score | Evaluator
86.38% | Matches word count
36.79% | Dialogue to Total Word Ratio
61.5841% | Overall
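The Overall figure is consistent with an unweighted mean of the two evaluator scores: (86.38 + 36.79) / 2 = 61.585, which matches the reported 61.5841% to within the rounding of the displayed inputs. The sketch below assumes that aggregation rule; the page itself does not state how NC Bench combines evaluator scores, and `overall_score` is a hypothetical helper.

```python
def overall_score(scores: list[float]) -> float:
    """Unweighted mean of per-evaluator scores (assumed aggregation rule)."""
    if not scores:
        raise ValueError("need at least one evaluator score")
    return sum(scores) / len(scores)


# Displayed scores from the evaluation summary, in percent (rounded to 2 dp).
word_count_score = 86.38
dialogue_ratio_score = 36.79

overall = overall_score([word_count_score, dialogue_ratio_score])
print(f"{overall:.4f}%")  # prints 61.5850%
```

The small gap between 61.5850% and the reported 61.5841% is what you would expect if the page averaged unrounded scores and only rounded them for display.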
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Molly, tell me again why we’re robbing a children’s party.” “Because, Bill, nobody suspects two adults in paper hats.” “I suspect us, and I’m one of us.” “Relax. It’s elegant. Balloons, chaos, weak perimeter security.” “It’s a bouncy castle, not Fort Knox.” “Exactly. Their guard is down. Emotionally and architecturally.” “And we’re stealing presents from children.” “We’re liberating boxed goods from minors with no resale discipline.” “That is the grimmest sentence I’ve heard all week.” “Listen. You go in as the magician.” “I don’t do magic.” “You know three card tricks.” “I know one card trick, and it ends with me losing a card.” “Perfect. Low expectations are camouflage. I’ll be the face painter.” “You can’t paint faces.” “I can paint tigers badly. Children love tigers badly.” “What about the parents?” “Distractible. Give them coffee, mention school admissions, they’ll turn on each other.” “And when do we take the presents?” “During the cake. All eyes on sugar and open flame.” “What if the birthday child notices?” “We compliment the child’s crown, create a false sense of monarchy.” “That’s your plan? Flattery and sponge cake?” “Bill, every empire falls to flattery and sponge cake.” “I hate how convincing that sounds.” “Good. Now practice your magician voice.” “Abra-cadaver?” “No. Less felony, more whimsy.”
created_at: "2026-03-06T12:11:48.11Z" (3/6/2026, 12:11:48 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 410
moderation_latency: (null)
generation_time: 9027
tokens_prompt: 77
tokens_completion: 432
native_tokens_prompt: 66
native_tokens_completion: 313
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00486
router: (null)
provider_responses[0]:
  endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
  id: "resp_0ccb276feac2c68e0169aac4842e9c81909bcca76d42fbd7bd"
  is_byok: false
  latency: 159
  model_permaslug: "openai/gpt-5.4-20260305"
  provider_name: "OpenAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799108-z3u4E3182A7iFGnFD3IK"
upstream_id: "resp_0ccb276feac2c68e0169aac4842e9c81909bcca76d42fbd7bd"
total_cost: 0.00486
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
86.38% | Matches word count | n/a | neededClean: false, words: 211
36.79% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false, wordsTotal: 211, wordsDialogue: 211
61.5841% | Overall
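The metadata columns indicate how the two metrics are derived: the word-count evaluator counted 211 words against the 200-word target, and the ratio evaluator found all 211 words inside dialogue (wordsDialogue / wordsTotal = 211 / 211 = 100.00%), a deviation of 10.00 percentage points from the requested 90%. The following is a minimal sketch of that computation, assuming words are whitespace-separated tokens and that dialogue is any text between curly double quotes (“ ”). NC Bench's actual tokenization and quote handling are not documented on this page, the helper names are hypothetical, and the sketch does not attempt the mapping from deviation to the 36.79% score.

```python
import re

# Assumption: dialogue is any span between curly double quotes “ ... ”.
DIALOGUE_SPAN = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def dialogue_stats(text: str, target_ratio: float = 0.90) -> dict:
    """Return total words, dialogue words, dialogue ratio, and deviation from target."""
    words_total = count_words(text)
    # Sum the word counts of every quoted span.
    words_dialogue = sum(count_words(span) for span in DIALOGUE_SPAN.findall(text))
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_ratio),
    }


# A short sample with one line of narration, so the ratio falls below 100%.
sample = (
    "“Molly, tell me again why we’re robbing a children’s party.” "
    "Bill frowned. "
    "“Because, Bill, nobody suspects two adults in paper hats.”"
)
stats = dialogue_stats(sample)
print(
    f"{stats['wordsTotal']} words, {stats['wordsDialogue']} in dialogue, "
    f"ratio {stats['ratio']:.2%}, deviation {stats['deviation']:.2%}"
)  # prints: 21 words, 19 in dialogue, ratio 90.48%, deviation 0.48%
```

Applied to the model response above, where every word sits inside quotes, this rule would also give a 100% ratio and a 10-point deviation; whether the total comes out at exactly 211 depends on how tokens such as “Abra-cadaver?” are counted.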