NC Bench
Run: 2026-03-18-01KM087GV1ZN6Z60745450GK2B
Run Details
Model: GPT-5.4 Mini (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 2.3s
Cost: $0.001611
Provider: OpenAI
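As a rough reading of the run's speed, output throughput can be estimated from the native_tokens_completion and generation_time values listed in the metadata further down. The sketch assumes generation_time is in milliseconds, which the page does not state, and the helper name is hypothetical.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Hypothetical helper: output tokens per second of generation time."""
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000)


# Values copied from the run metadata on this page.
native_tokens_completion = 347
generation_time = 2131  # assumed to be milliseconds

rate = tokens_per_second(native_tokens_completion, generation_time)
print(f"{rate:.1f} tokens/s")
```

The same helper could be fed tokens_completion (338) instead, if the normalized token count is preferred over the provider's native count.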
Evaluation summary
Score      Evaluator
97.63%     Matches word count
0.46%      Dialogue to Total Word Ratio
49.0415%   Overall
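The Overall figure appears consistent with a plain average of the two evaluator scores. That is an inference from the numbers shown, not something the page states. The sketch below checks it, with the rounding tolerance as a stated assumption.

```python
# Scores as displayed in the summary (already rounded to two decimals).
scores = {
    "Matches word count": 97.63,
    "Dialogue to Total Word Ratio": 0.46,
}
reported_overall = 49.0415

# Assumption: Overall is the unweighted mean of the evaluator scores.
mean_score = sum(scores.values()) / len(scores)

# Each displayed score may be off by up to 0.005 because of rounding,
# so the mean of the displayed values can differ from the reported
# overall by about the same amount.
consistent = abs(mean_score - reported_overall) <= 0.005

print(f"mean of displayed scores: {mean_score:.4f}")
print(f"consistent with reported overall: {consistent}")
```

If the benchmark weights its evaluators differently, the check would need those weights. Nothing on this page indicates any.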
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the bakery dumpster, consulting a grease-stained map of the cul-de-sac while Bill adjusted his bow tie in the reflection of a spoon. From the street came the muffled shrieks of a children’s party, punctuated by balloons popping like tiny gunshots and the desperate trumpeting of a clown horn. “The presents are in the sunroom,” Molly whispered. “Past the cupcakes, left of the bouncy castle, under the banner that says BRAVO, BIRTHDAY CHAMP.” Bill squinted. “That banner is humiliating.” Molly slid a glittery headband over her cap. “We’re professionals.” “We’re dressed as magicians,” Bill said. “Exactly. No one suspects the man in velvet.” They waited until the parents turned toward the karaoke machine, where a man in a pirate hat was singing a lovesick version of The Wheels on the Bus. Molly nodded. Bill produced a clipboard, a fake smile, and the confidence of a raccoon with a grant proposal. They crossed the lawn, accepted two sticky paper plates, and vanished into the house with the solemnity of museum curators. Inside, the presents stood stacked like a colorful fortress: rockets, dolls, a bicycle with streamers. Bill rubbed his hands together. “Shall we?” Molly grinned. “Take only the wrapped ones. It’s rude to steal practical gifts.”
created_at: "2026-03-18T10:35:28.491Z" (3/18/2026, 10:35:28 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1512
moderation_latency: (null)
generation_time: 2131
tokens_prompt: 77
tokens_completion: 338
native_tokens_prompt: 66
native_tokens_completion: 347
native_tokens_completion_images: 0
native_tokens_reasoning: 55
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.001611
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_07abf4804ad3bbaa0169ba7ff08ca4819489bd84149a10743e"
    is_byok: false
    latency: 199
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830128-XfaauA5kjogfpTyKBS0y"
upstream_id: "resp_07abf4804ad3bbaa0169ba7ff08ca4819489bd84149a10743e"
total_cost: 0.001611
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result     Evaluator                      Details                            Meta Data
97.63%     Matches word count             n/a                                neededClean: false, words: 207
0.46%      Dialogue to Total Word Ratio   Ratio: 25.24%, Deviation: 15.24%   neededClean: false, wordsTotal: 210, wordsDialogue: 53
49.0415%   Overall
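The ratio and deviation reported for the dialogue evaluator follow from its word counts: 53 dialogue words out of 210 gives 25.24%, and the distance from the 10% target in the scenario gives 15.24%. A minimal sketch of that arithmetic follows, together with one plausible way to count dialogue words (text inside curly double quotes, split on whitespace). The counting rule and the function names are assumptions, since the page does not describe the evaluator's tokenization, and the mapping from deviation to the 0.46% score is not shown on the page, so it is not reproduced here.

```python
import re


def count_dialogue_words(text: str) -> int:
    """Count whitespace-separated words inside curly double quotes (assumed rule)."""
    spoken = re.findall(r"“([^”]*)”", text)
    return sum(len(span.split()) for span in spoken)


def ratio_and_deviation(words_dialogue: int, words_total: int, target: float):
    """Return (ratio, deviation) in percent, rounded to two decimals.

    Deviation is taken as the absolute distance from the target (an assumption).
    """
    ratio = 100 * words_dialogue / words_total
    return round(ratio, 2), round(abs(ratio - target), 2)


# Counts as reported by the evaluator; the 10% target comes from the scenario.
ratio, deviation = ratio_and_deviation(53, 210, target=10.0)
print(ratio, deviation)

# Small illustration of the counting rule on one sentence pair from the response.
sample = "Bill rubbed his hands together. “Shall we?” Molly grinned."
print(count_dialogue_words(sample))
```

Note that the two evaluators report different totals (207 and 210 words), which suggests they do not share a word-splitting rule. The whitespace split used here is only one possibility.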