NC Bench
Run: 2026-03-09-01KK9QCPBDHB8WH7YWAB9VN56P
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 39.6s
Cost: $0.038895
Provider: OpenAI
Evaluation summary
Score | Evaluator
100.00% | Matches word count
100.00% | Dialogue to Total Word Ratio
100.0000% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the inflatable castle, notebook open, goggles fogged by strawberry-scented machine smoke. The garden beyond twinkled with paper lanterns, melted icing, and the doomed confidence of adults who thought a hired magician could control twelve sugar-loaded children. Bill lay beside her in the petunias, dressed in burglar black except for a glittery party hat he had borrowed from the gift table and called tactical disguise. On the patio, a mountain of presents towered beside the cake like a brightly wrapped vault protected by juice boxes and one suspicious golden retriever. She reviewed the operation with military solemnity. Enter during the rabbit trick. Cross behind the piñata. Distract the dog with cocktail sausages. Load everything into the red wagon parked near the hedge. Exit through the side gate before any parent remembered how to count. “Bill, plan only: gifts, gate, wagon, gone before cake appears.” “Molly, if balloons attack, I’m surrendering and taking the clown.” Inside, children shrieked, a whistle blew, and something sticky hit a window. Molly smiled. At last, chaos was providing professional cover. Nearby, a toddler in a pirate cape marched past the hedge, saluted the darkness, and announced mutiny to a baffled lawn flamingo twice.
created_at: "2026-03-09T16:37:54.677Z" (3/9/2026, 4:37:54 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 37148
moderation_latency: (null)
generation_time: 39500
tokens_prompt: 77
tokens_completion: 868
native_tokens_prompt: 66
native_tokens_completion: 2582
native_tokens_completion_images: 0
native_tokens_reasoning: 2310
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.038895
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0d1575656cfe007f0169aef762be2c8197a455db1b6958d08b"
    is_byok: false
    latency: 176
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773074274-SoBQZH3MYFmGRVU8qD6V"
upstream_id: "resp_0d1575656cfe007f0169aef762be2c8197a455db1b6958d08b"
total_cost: 0.038895
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
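The token and timing fields in the metadata can be combined into a few derived figures. The sketch below is illustrative only and rests on two assumptions that the page does not state: that native_tokens_reasoning is included in native_tokens_completion, and that generation_time is in milliseconds (which is consistent with the 39.6s duration shown in the run details). The function name and the returned keys are hypothetical.

```python
def summarize_tokens(native_completion: int, native_reasoning: int,
                     generation_time_ms: int) -> dict:
    """Derive simple figures from the run metadata.

    Assumes reasoning tokens are counted inside the completion total and
    that generation_time is expressed in milliseconds.
    """
    visible = native_completion - native_reasoning
    reasoning_share = native_reasoning / native_completion
    tokens_per_second = native_completion / (generation_time_ms / 1000)
    return {
        "visible_tokens": visible,
        "reasoning_share": reasoning_share,
        "tokens_per_second": tokens_per_second,
    }


# Values copied from the metadata of this run.
summary = summarize_tokens(native_completion=2582,
                           native_reasoning=2310,
                           generation_time_ms=39500)
print(summary)
```

Under those assumptions, most of the completion budget for this run went to reasoning rather than to the 200-word scene itself.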
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false, words: 200
100.00% | Dialogue to Total Word Ratio | Ratio: 9.90%, Deviation: 0.10% | neededClean: false, wordsTotal: 202, wordsDialogue: 20
100.0000%
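The reported ratio and deviation follow from the two word counts in the evaluation details (20 dialogue words out of 202 total, against the 10% target in the scenario). The sketch below reproduces that arithmetic and adds a simple quote-based counter. Both functions are hypothetical illustrations, not the benchmark's actual evaluator: its tokenization and cleaning rules are not shown on this page, and the counter here assumes dialogue is whatever sits between double quotes and that words are whitespace-separated.

```python
import re


def ratio_report(words_dialogue: int, words_total: int,
                 target_pct: float = 10.0) -> tuple[float, float]:
    """Return (ratio %, absolute deviation from target in percentage points)."""
    ratio_pct = 100.0 * words_dialogue / words_total
    deviation = abs(ratio_pct - target_pct)
    return round(ratio_pct, 2), round(deviation, 2)


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Count (dialogue words, total words).

    Assumption: dialogue is text enclosed in curly or straight double
    quotes, and words are split on whitespace.
    """
    quoted = re.findall(r'[“"]([^”"]*)[”"]', text)
    dialogue = sum(len(q.split()) for q in quoted)
    total = len(text.split())
    return dialogue, total


# Counts copied from the evaluation details above.
print(ratio_report(20, 202))

# A small made-up sentence to exercise the counter.
print(count_dialogue_words('She said, “go now” and left.'))
```

Note that the two evaluators report different totals (200 and 202 words), so a counter like this one should not be expected to match either figure exactly without knowing the rules each evaluator applies.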