NC Bench
Run: 2026-03-18-01KM07RERCVTTR2GHMY8KA6DDF
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 11.1s
Cost: $0.010692
Provider: OpenAI
Evaluation summary
Score | Evaluator
98.71% | Matches word count
41.13% | Dialogue to Total Word Ratio
69.9201% | Overall
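The overall score shown above is consistent with a plain average of the two evaluator scores. The page does not state the aggregation rule, so the sketch below treats the unweighted mean as an assumption; `overall_score` is a hypothetical helper, not part of NC Bench.

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores, in percent (assumed aggregation rule)."""
    return sum(scores) / len(scores)


# Rounded scores as displayed on the page.
displayed = [98.71, 41.13]
print(f"{overall_score(displayed):.2f}%")
```

The displayed scores are rounded to two decimals, which would explain why their mean differs slightly from the reported 69.9201%.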
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the bakery van, eyeing the pyramid of presents outside the community hall. Inside, the children’s party pulsed with music, frosting, and a clown whose shoes looked morally confused. Bill adjusted his fake mustache and smiled like a man about to make a terrible decision beautifully. “Plan is simple,” Molly whispered. “We enter as entertainers, load the gifts onto the trolley, and vanish before anyone notices the magician has disappeared.” Bill tapped the trolley. “I’m the magician?” “You’re whatever keeps you quiet.” He peered through the window. “You really think toddlers won’t notice two suspicious adults with a folding cart and a prayer?” “They’ll be too busy eating cake.” A crash sounded, then applause. Bill nodded. “Cake. Excellent.” “At seven, distract the parents,” Molly said. “Use your card trick.” “I don’t know a card trick.” “Then improvise.” “I hate when you say that. It sounds like a hospital bill.” Molly grinned. “Relax. We’re not robbing children.” “We’re not?” “We’re liberating gifts from an unsafe environment.” Bill straightened his bow tie. “That’s the nicest criminal thing you’ve ever said.” Molly checked her watch. “Ready?” Bill swallowed. “As ready as a man can be when the loot is glittering.” “Excellent,” she said. “Try to look festive.”
created_at
"2026-03-18T10:27:14.839Z"
3/18/2026, 10:27:14 AM
model
"openai/gpt-5.4-mini-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
10686
moderation_latency
(null)
generation_time
10901
tokens_prompt
77
tokens_completion
642
native_tokens_prompt
66
native_tokens_completion
2365
native_tokens_completion_images
0
native_tokens_reasoning
2070
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.010692
router
(null)
provider_responses
0
endpoint_id
"9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id
"resp_04b8e0ee5913c8040169ba7e02e81081969bdfb5c25c8bf264"
is_byok
false
latency
199
model_permaslug
"openai/gpt-5.4-mini-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773829634-rjmA15E2xyjtqiCIfWrm"
upstream_id
"resp_04b8e0ee5913c8040169ba7e02e81081969bdfb5c25c8bf264"
total_cost
0.010692
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
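The token counts in the metadata above can be used to estimate how much of the completion was spent on reasoning rather than on the visible scene. This is a minimal sketch that assumes `native_tokens_completion` includes the reasoning tokens, which the page does not confirm; the helper names are hypothetical.

```python
# Values copied from the metadata above.
meta = {
    "native_tokens_prompt": 66,
    "native_tokens_completion": 2365,
    "native_tokens_reasoning": 2070,
    "total_cost": 0.010692,
}


def visible_completion_tokens(m):
    """Completion tokens left after subtracting reasoning tokens (assumes they are included)."""
    return m["native_tokens_completion"] - m["native_tokens_reasoning"]


def reasoning_share(m):
    """Fraction of completion tokens attributed to reasoning."""
    return m["native_tokens_reasoning"] / m["native_tokens_completion"]


print(visible_completion_tokens(meta))
print(f"{reasoning_share(meta):.1%}")
```

Under that assumption, most of the billed completion tokens went to reasoning and only a small remainder to the 206-word response.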
Evaluation details
Result | Evaluator | Details | Meta Data
98.71% | Matches word count | n/a | neededClean: false; words: 206
41.13% | Dialogue to Total Word Ratio | Ratio: 59.71%, Deviation: 9.71% | neededClean: false; wordsTotal: 206; wordsDialogue: 123
Overall: 69.9201%
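The ratio and deviation reported for the dialogue evaluator follow directly from the word counts in its metadata. The sketch below reproduces that arithmetic and also shows one possible way to count dialogue words, by taking the text inside curly quotes. The counting method and the function names are assumptions, since the page does not describe how the evaluator tokenizes or how it maps the deviation to the 41.13% score.

```python
import re


def count_words(text):
    """Count whitespace-separated words."""
    return len(text.split())


def dialogue_words(text):
    """Count words inside curly double quotes (assumed definition of dialogue)."""
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue, words_total, target=0.5):
    """Return the dialogue ratio and its absolute distance from the target ratio."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Word counts taken from the evaluator metadata above.
ratio, deviation = ratio_and_deviation(123, 206)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

With 123 dialogue words out of 206, the response overshoots the 50% target, which matches the reported ratio of 59.71% and deviation of 9.71%.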