NC Bench
Run: 2026-03-09-01KKAC4CZGJ2A3VC83471SWWNT
Run Details
Model: GPT-5.4 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 8.6s
Cost: $0.005145
Provider: OpenAI
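The reported cost can be sanity-checked against the native token counts listed in the response metadata (66 prompt tokens, 332 completion tokens). The sketch below is illustrative only: the per-million-token rates are assumptions that do not appear anywhere on this page, chosen because they reproduce the reported figure, and the completion count is assumed to already include the 33 reasoning tokens.

```python
# Native token counts taken from the response metadata on this page.
native_tokens_prompt = 66
native_tokens_completion = 332  # assumed to include the 33 reasoning tokens

# ASSUMPTION: hypothetical rates in USD per million tokens.
# These are not stated on the page; they are picked to match the reported cost.
PROMPT_RATE_PER_M = 2.50
COMPLETION_RATE_PER_M = 15.00

cost = (
    native_tokens_prompt * PROMPT_RATE_PER_M
    + native_tokens_completion * COMPLETION_RATE_PER_M
) / 1_000_000

print(f"${cost:.6f}")
```

Under these assumed rates the result agrees with the $0.005145 shown above. Almost all of the cost comes from completion tokens.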
Evaluation summary
Score     Evaluator
99.98%    Matches word count
66.12%    Dialogue to Total Word Ratio
83.0529%  Overall
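The overall score is consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the averaging below is an assumption. The inputs are the rounded values as displayed, which is why the result matches 83.0529% only to two decimals.

```python
# Evaluator scores as displayed on the page (rounded to two decimals).
scores = {
    "Matches word count": 99.98,
    "Dialogue to Total Word Ratio": 66.12,
}

# ASSUMPTION: the overall score is the unweighted mean of the evaluator scores.
overall = sum(scores.values()) / len(scores)

print(f"{overall:.2f}%")
```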
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Bill, this is our classiest job yet,” Molly whispered. “Silk masks, soft shoes, and a tactical knowledge of balloon animals.” “Children’s parties are chaos with frosting,” Bill said. “We slip in during the magician, fill the sacks, and vanish before the parents finish saying, ‘Who wants juice?’” “Good. I refuse to wrestle a clown again.” “That clown started it.” “You stole his unicycle.” “I borrowed his getaway vehicle. Focus. Presents by the cake table, yes?” “Under the giant banner that says HAPPY SEVENTH, which is helpful, because subtlety is dead.” “Any obstacles?” “Three mothers with hawk eyes, one grandfather who was allegedly in the navy, and a terrier in a bow tie.” “I can handle the navy. The dog worries me.” “The dog’s name is Princess Sprinkles.” “That worries me more.” “We enter as entertainers.” “Can you juggle?” “No, but I can look disappointed in people, and that passes for authority.” “I’ve got a puppet.” “Bill, your puppet is terrifying.” “Exactly. Children stare, adults apologize, and no one watches the presents.” “What about the birthday child?” “Bribed with an extra slice of cake.” “Efficient.” “When do we move?” “After the piñata.” “Why after?” “Because during the piñata, Bill, everyone is already armed.”
created_at: "2026-03-09T22:40:23.033Z"
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 2312
moderation_latency: (null)
generation_time: 8540
tokens_prompt: 77
tokens_completion: 504
native_tokens_prompt: 66
native_tokens_completion: 332
native_tokens_completion_images: 0
native_tokens_reasoning: 33
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.005145
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_00874fa70c45a4720169af4c571a1c8193bbde8e6745c35e2a"
    is_byok: false
    latency: 231
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773096023-eisEbLMsrvmSTDYLWJSB"
upstream_id: "resp_00874fa70c45a4720169af4c571a1c8193bbde8e6745c35e2a"
total_cost: 0.005145
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result    Evaluator                     Details                          Meta Data
99.98%    Matches word count            n/a                              neededClean: false, words: 202
66.12%    Dialogue to Total Word Ratio  Ratio: 98.02%, Deviation: 8.02%  neededClean: false, wordsTotal: 202, wordsDialogue: 198
83.0529%  Overall
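The ratio and deviation in the details follow directly from the reported word counts: 198 dialogue words out of 202 total, measured against the 90% target in the scenario. The sketch below reproduces that arithmetic and shows one possible way to count dialogue words. The `dialogue_ratio` helper is hypothetical: it assumes dialogue is whatever sits between curly double quotes and that words are whitespace-separated, which may differ from the evaluator's actual method. How the 8.02% deviation is mapped to the 66.12% score is not stated on the page and is not modeled here.

```python
import re

TARGET_RATIO = 0.90  # from the scenario: 90% dialogue


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    ASSUMPTION: dialogue is the text inside curly double quotes and
    words are split on whitespace. This is a hypothetical helper.
    """
    total = len(text.split())
    spoken = sum(len(m.split()) for m in re.findall(r"“([^”]*)”", text))
    return spoken, total, spoken / total


# Arithmetic on the counts reported in the evaluation details.
reported_ratio = 198 / 202
deviation = abs(reported_ratio - TARGET_RATIO)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation:.2%}")

# Small usage example on a two-line excerpt in the same quoting style.
sample = "“Can you juggle?” Molly whispered. “No, but I can look disappointed.”"
spoken, total, ratio = dialogue_ratio(sample)
print(spoken, total)
```

In the model response, the only words outside quotation marks are the two dialogue tags ("Molly whispered." and "Bill said."), which accounts for the four-word gap between 202 and 198.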