NC Bench
Run: 2026-03-18-01KM082DYSK46KYSBEWBYC8MQ7
Run Details
Model: GPT-5.4 Mini (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 2.2s
Cost: $0.001476
Provider: OpenAI
Evaluation summary
Matches word count: 86.38%
Dialogue to Total Word Ratio: 0.01%
Overall: 43.1945%
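The Overall figure sits very close to the plain average of the two evaluator scores. The sketch below checks that consistency. It assumes the overall score is an unweighted mean and that the per-evaluator percentages shown here are rounded for display. The page does not state either, and the helper name `unweighted_mean` is hypothetical.

```python
# Assumption: "Overall" is an unweighted mean of the evaluator scores.
# The page does not document the aggregation; this only checks consistency.
scores = {
    "Matches word count": 86.38,
    "Dialogue to Total Word Ratio": 0.01,
}
reported_overall = 43.1945


def unweighted_mean(values):
    """Arithmetic mean of an iterable of numbers (hypothetical helper)."""
    values = list(values)
    return sum(values) / len(values)


mean_score = unweighted_mean(scores.values())
gap = abs(mean_score - reported_overall)
print(f"mean={mean_score:.4f} reported={reported_overall} gap={gap:.4f}")
```

The small remaining gap is what one would expect if the displayed scores were rounded to two decimals before being shown.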
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the bakery van, watching the suburban garden glow with paper lanterns and unreasonable optimism. Inside, a banner declared HAPPY BIRTHDAY, and somewhere a trumpet of children’s laughter rose like a badly tuned alarm. Bill adjusted his black gloves, which were mostly for effect. He was also wearing a clown nose, because he believed in commitment. “Tell me again why we’re doing this,” he whispered. “Because,” Molly said, peering through the hedge, “the presents are unguarded, the adults are distracted, and the host hired a magician who can’t even keep a rabbit from escaping.” Bill nodded toward the pile of wrapped boxes near the cake table. A bicycle helmet. A tower of books. At least one suspiciously large parcel shaped like a spaceship. “That one’s mine,” he said. “You don’t know that.” “I can feel it.” A child squealed as a balloon popped, and every parent turned simultaneously toward the disaster. Molly smirked. “That’s our cue.” “And if they ask why we’re here?” She lifted a clipboard. “Party entertainment survey. We’re from the council.” Bill grinned, revealing a missing tooth and dangerous confidence. Together they marched through the gate, blending into the chaos with the ease of men who had once stolen a wedding cake and blamed the weather.
created_at: "2026-03-18T10:32:41.702Z" (3/18/2026, 10:32:41 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1042
moderation_latency: (null)
generation_time: 2030
tokens_prompt: 77
tokens_completion: 358
native_tokens_prompt: 66
native_tokens_completion: 317
native_tokens_completion_images: 0
native_tokens_reasoning: 36
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.001476
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_087b3b88df9e62b30169ba7f49c1448195ad60ff002258f489"
    is_byok: false
    latency: 176
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773829961-KfyXu829xSsgJcG8sr1C"
upstream_id: "resp_087b3b88df9e62b30169ba7f49c1448195ad60ff002258f489"
total_cost: 0.001476
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count
  Result: 86.38%
  Details: n/a
  Meta Data: neededClean: false, words: 211
Dialogue to Total Word Ratio
  Result: 0.01%
  Details: Ratio: 27.49%, Deviation: 17.49%
  Meta Data: neededClean: false, wordsTotal: 211, wordsDialogue: 58
Overall
  Result: 43.1945%
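The reported ratio and deviation follow directly from the word counts in the metadata and the 10% target named in the scenario. The sketch below reproduces that arithmetic. The function names are hypothetical, and the quote-based counter is only one plausible way to count dialogue words; it is not confirmed to be the evaluator's method and may not reproduce the reported count of 58. How a 17.49% deviation maps to a 0.01% score is not shown on the page and is not modeled here.

```python
import re

TARGET_RATIO = 0.10  # "10% dialogue", taken from the scenario description


def dialogue_stats(words_dialogue, words_total, target=TARGET_RATIO):
    """Return (ratio, deviation from target) as fractions."""
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target)
    return ratio, deviation


# Counts as reported in the evaluator metadata.
ratio, deviation = dialogue_stats(words_dialogue=58, words_total=211)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")


def count_dialogue_words(text):
    """Hypothetical counter: words inside straight or curly double quotes."""
    quoted = re.findall(r'[“"]([^”"]*)[”"]', text)
    return sum(len(span.split()) for span in quoted)


# Tiny usage example on a made-up sentence (not from the model response).
sample = 'He said, “Run now,” and left.'
print(count_dialogue_words(sample), "dialogue words of", len(sample.split()))
```

With 58 of 211 words in dialogue, the response carries nearly three times the requested share, which is consistent with the near-zero score on this evaluator.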