NC Bench
Run: 2026-03-09-01KKAC3VV8YFB2PC8R2RZWMB37
Run Details
Model:
GPT-5.4 (Reasoning, Low)
Test:
Dialogue tags
Scenario:
Write 200 words with 10% dialogue
Duration:
34.6s
Cost:
$0.040005
Provider:
OpenAI
Evaluation summary
Score
Evaluator
100.00%
Matches word count
99.90%
Dialogue to Total Word Ratio
99.9515%
Overall
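The page does not say how the overall score is aggregated. A minimal sketch, assuming it is the unweighted mean of the two evaluator scores, shows that this reading is consistent with the displayed figures: it implies a ratio score of about 99.903%, which rounds to the 99.90% shown.

```python
# Assumption: overall = unweighted mean of the two evaluator scores.
# The aggregation rule is not stated on the page.
word_count_score = 100.00   # "Matches word count"
overall = 99.9515           # "Overall"

# Under that assumption, the unrounded ratio score implied by the overall is:
implied_ratio_score = 2 * overall - word_count_score

print(f"{implied_ratio_score:.3f}")  # three decimals
print(f"{implied_ratio_score:.2f}")  # two decimals, as displayed in the summary
```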
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly studied the bungalow from behind a hedge shaped like a rabbit. Balloons bobbed above the fence, and the shrieks of seven-year-olds rose and fell like a fire alarm. On the patio, a clown made a balloon sword, then accidentally dueled himself with it. Bill adjusted his cap and peered through binoculars he’d stolen from a whale-watching tour. “Presents by the cake, three big boxes, red bows, easy lift,” Molly whispered. “Excellent. Distract clown, dodge dove, avoid piñata, and absolutely no mingling with toddlers,” Bill said. She rolled out a crayon-colored blueprint on an upturned wheelbarrow. The route was simple: through the side gate, across the bouncy castle during Happy Birthday, then a left past the magician and his suspicious dove. Bill’s job was distraction. He had dressed as a parent, which mainly meant wearing beige and looking tired. Molly, in a giant cupcake costume, would blend in among the hired nonsense. A small dog in a party hat trotted over carrying a streamer and growled. Bill froze. Molly flicked him a sausage roll from her pocket, and the beast accepted the bribe with the solemnity of a customs officer. Beyond the hedge, candles were lit. It was time now.
created_at
"2026-03-09T22:40:05.49Z"
model
"openai/gpt-5.4-20260305"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
32363
moderation_latency
(null)
generation_time
34470
tokens_prompt
77
tokens_completion
1080
native_tokens_prompt
66
native_tokens_completion
2656
native_tokens_completion_images
0
native_tokens_reasoning
2383
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.040005
router
(null)
provider_responses
0
endpoint_id
"9ff5625c-403f-4d7f-b895-58ac7295062c"
id
"resp_087b1bb098598a800169af4c458de08196befc7eb61d28f40c"
is_byok
false
latency
190
model_permaslug
"openai/gpt-5.4-20260305"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773096005-4L72GgRcrwNaoTVxuu0r"
upstream_id
"resp_087b1bb098598a800169af4c458de08196befc7eb61d28f40c"
total_cost
0.040005
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
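The token and cost fields above can be cross-checked with a little arithmetic. The sketch below is illustrative only. It assumes that native_tokens_reasoning is included in native_tokens_completion, that generation_time is in milliseconds, and that the per-million-token rates are $2.50 (input) and $15.00 (output). None of these is stated on the page; the rates were chosen because they reproduce the reported total_cost.

```python
# Values copied from the run metadata.
native_tokens_prompt = 66
native_tokens_completion = 2656
native_tokens_reasoning = 2383
total_cost = 0.040005
generation_time_ms = 34470  # assumed to be milliseconds

# Hypothetical USD rates per million tokens; not stated on the page.
ASSUMED_INPUT_RATE = 2.50
ASSUMED_OUTPUT_RATE = 15.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate cost in USD from native token counts and the assumed rates."""
    micro_usd = (prompt_tokens * ASSUMED_INPUT_RATE
                 + completion_tokens * ASSUMED_OUTPUT_RATE)
    return micro_usd / 1_000_000


# Assumption: reasoning tokens are a subset of completion tokens.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${estimate_cost(native_tokens_prompt, native_tokens_completion):.6f}")
print(f"visible tokens: {visible_tokens}")
print(f"reasoning share: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions, roughly nine in ten billed completion tokens went to reasoning rather than to the 200-word scene itself.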
Evaluation details
Result
Evaluator
Details
Meta Data
100.00%
Matches word count
n/a
neededClean
false
words
200
99.90%
Dialogue to Total Word Ratio
Ratio: 11.76%, Deviation: 1.76%
neededClean
false
wordsTotal
204
wordsDialogue
24
99.9515%
Overall
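The two evaluators report different totals for the same response (200 words versus 204). The response contains three hyphenated compounds (seven-year-olds, whale-watching, crayon-colored), so one reading that accounts for the gap is that the ratio evaluator splits on hyphens while the word-count evaluator splits on whitespace only. The sketch below illustrates that reading. The function names and the tokenization rules are assumptions, not the benchmark's actual implementation.

```python
import re


def count_words(text: str, split_hyphens: bool = False) -> int:
    """Count whitespace-separated words, optionally splitting hyphenated compounds."""
    if split_hyphens:
        text = text.replace("-", " ")
    return len(text.split())


def dialogue_spans(text: str) -> list[str]:
    """Return the text found between curly double quotes."""
    return re.findall(r"“([^”]*)”", text)


def dialogue_ratio(text: str) -> float:
    """Share of words inside quotes, with hyphens split (assumed rule)."""
    total = count_words(text, split_hyphens=True)
    spoken = sum(count_words(s, split_hyphens=True) for s in dialogue_spans(text))
    return spoken / total


# Short illustrative sample, not the full response.
sample = "Bill peered at seven-year-olds. “Easy lift,” Molly whispered."
print(count_words(sample), count_words(sample, split_hyphens=True))
print(dialogue_ratio(sample))

# Figures reported by the evaluator: 24 dialogue words out of 204.
reported_ratio = 24 / 204
deviation = abs(reported_ratio - 0.10)  # the scenario targets 10% dialogue
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation:.2%}")
```

The reported deviation is the absolute difference from the 10% target in percentage points. How the evaluator maps that deviation to the 99.90% score is not stated on the page.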