NC Bench
Run: 2026-03-18-01KM086NNKMGQYZK55621JKDNP
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 43.7s
Cost: $0.043304
Provider: OpenAI
Evaluation summary
Score | Evaluator
100.00% | Matches word count
98.51% | Dialogue to Total Word Ratio
99.2553% | Overall
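The Overall figure is consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the sketch below treats the simple average as an assumption; the function name `overall_score` is hypothetical.

```python
def overall_score(scores: list[float]) -> float:
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    if not scores:
        raise ValueError("at least one evaluator score is required")
    return sum(scores) / len(scores)


# Scores as displayed in the summary table (rounded to two decimals).
displayed = [100.00, 98.51]
print(round(overall_score(displayed), 4))  # 99.255
```

The reported 99.2553% carries more digits than the displayed 98.51%, which suggests the average is taken over unrounded evaluator scores.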
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
At dusk, the Higgins’ backyard flickered with balloons and joy. “Those presents are ripe for the taking, Bill, surely, tonight.” Molly crouched behind a hedge, polishing her lock picks slowly. “I hate children’s parties; they smell like sugar and subpoenas.” Bill adjusted his bowtie, which featured a smiling cartoon raccoon. “Stay calm, and remember: we are entertainers, not criminals, tonight.” From the house came shrieks, a kazoo solo, and disputes. “Good. Chaos means adults stop counting gift bags, usually, altogether.” Molly handed him a sack big enough to swallow optimism. “And the children? We leave them the wrapping paper kindly.” Bill grinned, picturing the getaway cart filled with teddy bears. “You do love your souvenirs. That’s why I keep you.” A trumpet blasted somewhere inside; a parent yelled for frosting. “Focus, Bill. After this, we can buy better disguises, finally.” They tiptoed past a cupcake tower guarded by one dog. “If this goes badly, I was never your accomplice, officially.” Molly reached the gift table, where silver paper glittered invitingly. “Pick the biggest boxes. If we’re lucky, they’re toy trains.” At the back gate, thieves froze as the clown sneezed. “Run, Molly. I just noticed the clown is your cousin.”
created_at: "2026-03-18T10:35:00.67Z" (3/18/2026, 10:35:00 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 43179
moderation_latency: (null)
generation_time: 43473
tokens_prompt: 77
tokens_completion: 1433
native_tokens_prompt: 66
native_tokens_completion: 9612
native_tokens_completion_images: 0
native_tokens_reasoning: 9322
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0433035
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_09ddf5d093f4affc0169ba7fd4bbac81938b9edc0d2aced484"
    is_byok: false
    latency: 196
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830100-WTeNnaoslPslMgNgkXOt"
upstream_id: "resp_09ddf5d093f4affc0169ba7fd4bbac81938b9edc0d2aced484"
total_cost: 0.0433035
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false, words: 200
98.51% | Dialogue to Total Word Ratio | Ratio: 53.50%, Deviation: 3.50% | neededClean: false, wordsTotal: 200, wordsDialogue: 107
99.2553% | Overall
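The ratio and deviation reported above follow from the word counts: 107 dialogue words out of 200 is 53.50%, which is 3.50 percentage points from the 50% target. A minimal sketch of that arithmetic follows, with a rough counter for quoted words. The benchmark's own tokenization and quote-matching rules are not shown on this page, so `dialogue_stats` is a hypothetical approximation and may not reproduce the reported count of 107; how the deviation maps to the 98.51% score is likewise not stated and is not modeled here.

```python
import re


def ratio_from_counts(words_dialogue: int, words_total: int,
                      target: float = 0.5) -> tuple[float, float]:
    """Return (ratio, absolute deviation from target) for given word counts."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


def dialogue_stats(text: str, target: float = 0.5) -> dict:
    """Approximate dialogue share: words inside curly or straight double quotes.

    Assumption: words are whitespace-separated and dialogue is whatever sits
    between a matching pair of double quotes.
    """
    total = len(text.split())
    quoted = re.findall(r'“([^”]*)”|"([^"]*)"', text)
    dialogue = sum(len((curly or straight).split()) for curly, straight in quoted)
    ratio = dialogue / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


# Reproduce the reported figures from the counts in the Meta Data column.
ratio, deviation = ratio_from_counts(107, 200)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 53.50%, Deviation: 3.50%
```

For example, `dialogue_stats('“Run, Molly.” Bill froze at the gate.')` counts 2 dialogue words out of 7 in total.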