NC Bench
Run: 2026-03-18-01KM093RWKB3RR4ZMMMFH1VTTG
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 13.2s
Cost: $0.007331
Provider: OpenAI
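The reported cost can be sanity-checked against the native token counts listed in the run metadata further down (native_tokens_prompt, native_tokens_completion, usage). The sketch below is only an illustration: the per-million-token rates are assumptions that do not appear anywhere in this report, and the function name estimate_cost_usd is hypothetical. With those assumed rates the arithmetic happens to reproduce the reported usage figure.

```python
# Token counts and reported usage copied from the run metadata in this report.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 1618  # assumed to include the 1321 reasoning tokens
REPORTED_USAGE_USD = 0.0073305

# ASSUMPTION: these per-million-token rates are not stated in the report.
ASSUMED_INPUT_USD_PER_MTOK = 0.75
ASSUMED_OUTPUT_USD_PER_MTOK = 4.50


def estimate_cost_usd(prompt_tokens, completion_tokens, input_rate, output_rate):
    """Estimate request cost in USD from token counts and per-million-token rates."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000


estimated = estimate_cost_usd(
    NATIVE_TOKENS_PROMPT,
    NATIVE_TOKENS_COMPLETION,
    ASSUMED_INPUT_USD_PER_MTOK,
    ASSUMED_OUTPUT_USD_PER_MTOK,
)
print(f"estimated: {estimated:.7f} USD, reported: {REPORTED_USAGE_USD:.7f} USD")
```

Under these assumptions almost all of the cost comes from completion tokens, most of which are reasoning tokens rather than visible prose.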
Evaluation summary
Matches word count: 99.92%
Dialogue to Total Word Ratio: 0.00%
Overall: 49.9596%
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
By dusk, Molly and Bill crouched behind the gazebo, watching the children’s party wobble under balloons, streamers, and an alarming amount of frosting. Somewhere inside the rented hall, a dozen presents sat in glittering piles, each wrapped with the kind of optimism only adults and toddlers could afford. Molly adjusted her black gloves, which were less stealthy than squeaky. “Simple job,” she whispered. “We slip in, charm the adults, and vanish with the loot.” Bill peered through the crack in the curtain. “Loot? Molly, they’re six-year-olds. We are robbing crayon-sized billionaires.” “Exactly,” she said. “Small hands, big gifts.” A clown near the punch bowl honked sadly at nobody in particular. Bill flinched. “I hate how festive this is.” Molly nodded toward the gift table, where a mountain of boxes glittered like a dragon’s idea of tax evasion. “Focus. We take the presents, exit through the kitchen, and no one notices.” Just then, a child shrieked with joy over a talking dinosaur toy, and both thieves froze. Bill sighed. “If that thing starts investigating, I’m confessing everything.” Molly grinned. “Relax. We’re professionals.” From inside came the squeal of more laughter, more tape ripping, more innocently terrible generosity. Molly squared shoulders. Tonight belonged to them.
created_at: "2026-03-18T10:50:54.24Z" (3/18/2026, 10:50:54 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 12653
moderation_latency: (null)
generation_time: 12890
tokens_prompt: 77
tokens_completion: 879
native_tokens_prompt: 66
native_tokens_completion: 1618
native_tokens_completion_images: 0
native_tokens_reasoning: 1321
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0073305
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_0b1b31c50629fa800169ba838e4c808190ae7a6121282a5522"
    is_byok: false
    latency: 181
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773831054-55x4bx5QjdM4kudnTdYg"
upstream_id: "resp_0b1b31c50629fa800169ba838e4c808190ae7a6121282a5522"
total_cost: 0.0073305
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 99.92%
  Details: n/a
  Meta Data: neededClean: false, words: 203
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio: 29.13%, Deviation: 19.13%
  Meta Data: neededClean: false, wordsTotal: 206, wordsDialogue: 60
Overall: 49.9596%
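The ratio and deviation figures follow directly from the reported word counts, and the overall score is consistent with a plain average of the two evaluator scores. The sketch below reproduces that arithmetic. It is not the benchmark's own code: the function name dialogue_ratio_pct is hypothetical, the unweighted mean is an assumption, and 99.9192 is the unrounded word-count score implied by that assumption (the page shows it as 99.92%). How a 19.13% deviation maps to a 0.00% score is not stated in the report and is not modelled here.

```python
from statistics import mean

# Values copied from the scenario and the evaluator metadata in this report.
TARGET_DIALOGUE_PCT = 10.0  # "Write 200 words with 10% dialogue"
WORDS_DIALOGUE = 60
WORDS_TOTAL = 206


def dialogue_ratio_pct(words_dialogue, words_total):
    """Return dialogue words as a percentage of total words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


ratio = dialogue_ratio_pct(WORDS_DIALOGUE, WORDS_TOTAL)
deviation = abs(ratio - TARGET_DIALOGUE_PCT)

# ASSUMPTION: overall is the unweighted mean of the evaluator scores, and
# 99.9192 is the unrounded value behind the displayed 99.92%.
evaluator_scores = [99.9192, 0.0]
overall = mean(evaluator_scores)

print(f"ratio={ratio:.2f}% deviation={deviation:.2f}% overall={overall:.4f}%")
```

Note that the two evaluators report different totals for the same response (203 words versus 206), which suggests they tokenize words differently.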