NC Bench
Run: 2026-03-09-01KKABVYETB3ARN1X4VQ0JW0DP
Run Details
Model: GPT-5.4 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 14.9s
Cost: $0.006825
Provider: OpenAI
Evaluation summary
Score | Evaluator
75.16% | Matches word count
100.00% | Dialogue to Total Word Ratio
87.5778% | Overall
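The Overall figure is consistent with an unweighted mean of the two evaluator scores. This is an inference from the displayed numbers, not something the run page states, and the function name `overall_score` is made up for illustration. A minimal sketch of that arithmetic:

```python
def overall_score(scores: list[float]) -> float:
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    return sum(scores) / len(scores)


# Scores as displayed in the evaluation summary, in percent.
displayed = overall_score([75.16, 100.00])
print(round(displayed, 2))

# Working backwards from the displayed Overall value suggests the
# word-count score is rounded for display.
implied_word_count_score = 2 * 87.5778 - 100.00
print(round(implied_word_count_score, 4))
```

Under this assumption the mean of the displayed scores is 87.58, and the extra digits in 87.5778% imply an unrounded word-count score of about 75.1556%.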
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Bill, stop calling it a burglary,” said Molly. “It’s a gift redistribution event.” “Fine,” Bill said. “A festive, highly targeted redistribution event involving balloons.” “And cake,” Molly said. “Never underestimate cake. Parents get sentimental around sheet cake.” “I’ve got the blueprint,” Bill said. “Bouncy castle here, magician here, distracted adults clustered near the coffee urn like exhausted wildebeest.” “Excellent. Entry?” “We pose as entertainers.” Molly sighed. “You juggle one orange and a grievance.” “I also do a convincing pirate voice.” “This is a princess party.” “Then I’m a princess with range.” “No. We go in as the face painters.” “Can you paint faces?” “I can paint panic. Close enough.” Bill leaned over the table. “Gift table by the window. Big haul. Dolls, dinosaurs, suspiciously expensive electronics from competitive uncles.” “Exit route?” “Through the kitchen, past the uncle with the camera, over the garden wall.” “Over?” “Through, if it’s the Hendersons. Their fence gives up easily.” “What about the children?” Molly asked. “We create a diversion.” “No fake fire.” “I was going to say treasure hunt.” “That’s better.” “I hide chocolate coins.” “You mean steal chocolate coins.” “I’m trying to stay in theme.” Molly grinned. “Masks?” “Animal balloons.” “Vehicle?” “Ice cream van.” “Bill, that’s brilliant.” “I know,” Bill said. “Crime is easier with sprinkles.”
created_at: "2026-03-09T22:35:46.243Z" (3/9/2026, 10:35:46 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3588
moderation_latency: (null)
generation_time: 14520
tokens_prompt: 77
tokens_completion: 465
native_tokens_prompt: 66
native_tokens_completion: 444
native_tokens_completion_images: 0
native_tokens_reasoning: 123
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.006825
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0c49f3ea0cbacb6b0169af4b42690881959c46ddac58f3c6f2"
    is_byok: false
    latency: 350
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773095746-5Vb8cMP1fU1sLD2aGRiY"
upstream_id: "resp_0c49f3ea0cbacb6b0169af4b42690881959c46ddac58f3c6f2"
total_cost: 0.006825
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
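The token counts, generation time, and cost in the metadata can be cross-checked with a little arithmetic. The sketch below rests on two assumptions that the run page does not confirm: `generation_time` is in milliseconds, and the per-million-token rates are hypothetical values chosen only because they reproduce the reported cost.

```python
# Values copied from the run metadata.
native_tokens_prompt = 66
native_tokens_completion = 444
generation_time_ms = 14520  # ASSUMPTION: the unit is milliseconds
reported_cost_usd = 0.006825

# HYPOTHETICAL rates in USD per million tokens; not stated in the run page.
ASSUMED_INPUT_USD_PER_M = 2.50
ASSUMED_OUTPUT_USD_PER_M = 15.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate cost in USD from token counts and the assumed rates."""
    micro_usd = (prompt_tokens * ASSUMED_INPUT_USD_PER_M
                 + completion_tokens * ASSUMED_OUTPUT_USD_PER_M)
    return micro_usd / 1_000_000


estimated = estimate_cost(native_tokens_prompt, native_tokens_completion)
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${estimated:.6f} (reported: ${reported_cost_usd:.6f})")
print(f"throughput: {tokens_per_second:.1f} completion tokens/s")
```

If the reported cost were billed on different rates or on the non-native token counts (77 and 465), the estimate would differ, so treat the match as a plausibility check rather than confirmation of the pricing.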
Evaluation details
Result | Evaluator | Details | Meta Data
75.16% | Matches word count | n/a | neededClean: false, words: 213
100.00% | Dialogue to Total Word Ratio | Ratio: 90.14%, Deviation: 0.14% | neededClean: false, wordsTotal: 213, wordsDialogue: 192
87.5778% | Overall
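The ratio and deviation reported for the dialogue evaluator follow directly from the two word counts: 192 dialogue words out of 213 total, measured against the 90% target in the prompt. The sketch below shows one way such a count could be made; treating text between curly quotes as dialogue and splitting on whitespace are assumptions, and the benchmark's actual evaluator may count differently.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for curly-quoted dialogue."""
    total_words = len(text.split())
    # Everything between an opening “ and the next closing ” counts as dialogue.
    quoted = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(q.split()) for q in quoted)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Small illustrative sample, not the full model response.
sample = "“Masks?” “Animal balloons.” Molly grinned."
print(dialogue_ratio(sample))

# Counts reported in the evaluation details.
reported_ratio = 192 / 213
deviation = abs(reported_ratio - 0.90)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation:.2%}")
```

With the reported counts this reproduces the "Ratio: 90.14%, Deviation: 0.14%" line. How the deviation maps to the 100.00% score, and how the 213-word count maps to 75.16%, is not shown on the page and is left out here.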