NC Bench
Run: 2026-03-18-01KM07DQKMHCFAT1G1GSJKQ3VM
Run Details
Model: GPT-5.4 Mini
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 2.2s
Cost: $0.001399
Provider: OpenAI
Evaluation summary
Score | Evaluator
14.30% | Matches word count
62.16% | Dialogue to Total Word Ratio
38.2325% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge with a stolen map of the cul-de-sac, peering at the bright bungalow where balloons bobbed over the roof like cheerful warnings. Inside, a children’s party was in full shriek, all sugar, confetti, and the tinny thump of a cartoon soundtrack. Bill adjusted his black gloves and tried not to sneeze at the smell of hot chips drifting from next door. “Cake room is in back,” Molly whispered. “Presents by the fireplace.” Bill nodded, though he was staring at the garden gnome in a paper crown. “This feels morally complicated.” “It’s robbery, Bill. It should.” A child burst out the rear door, sticky-faced and waving a plastic sword. They flattened themselves in the hydrangeas as the child charged past yelling about dragons and ketchup. From inside came a chorus of “Happy birthday!” followed by a squeal that suggested the pony had arrived. Bill swallowed. “What if they see us?” Molly pulled a roll of wrapping paper from her bag and grinned. “Then we say we’re the entertainment.” “Brilliant,” he said. “What act are we?” “The silent, highly motivated sort.” She slipped toward the gate, and Bill followed, trying to look like a responsible uncle. Behind the frosted window, a mountain of presents waited beneath a banner reading PARTY TIME, as if daring them to improve the atmosphere.
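The Dialogue to Total Word Ratio evaluator reports a dialogue word count and a total word count for this response. The page does not show how the evaluator counts, so the following is only a minimal sketch under two assumptions: dialogue is any text between curly double quotes, and a word is any whitespace-separated token. The helper name `count_dialogue_words` is hypothetical.

```python
import re

def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage.

    Assumption: dialogue is text enclosed in curly double quotes,
    and words are whitespace-separated tokens. The real evaluator
    may tokenize differently.
    """
    spoken_spans = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in spoken_spans)
    total_words = len(text.split())
    return dialogue_words, total_words

# A short excerpt from the response above.
sample = "“Brilliant,” he said. “What act are we?”"
dialogue, total = count_dialogue_words(sample)
print(dialogue, total)  # 5 7
```

Counting the quoted spans of the full response by this rule gives 41 dialogue words, which agrees with the reported wordsDialogue value. Whether the evaluator reaches that number the same way is not confirmed by the page.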
created_at: "2026-03-18T10:21:23.457Z" (3/18/2026, 10:21:23 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 440
moderation_latency: (null)
generation_time: 2086
tokens_prompt: 77
tokens_completion: 363
native_tokens_prompt: 66
native_tokens_completion: 300
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0013995
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_0e3623934b39a5ac0169ba7ca3826881909530d4f16d244329"
    is_byok: false
    latency: 218
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773829283-E23Ux4Fmd7iFgSU73RsA"
upstream_id: "resp_0e3623934b39a5ac0169ba7ca3826881909530d4f16d244329"
total_cost: 0.0013995
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
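The metadata above is enough to estimate generation throughput. The sketch below assumes that generation_time is reported in milliseconds and that native_tokens_completion is the provider's own count of output tokens; neither unit is stated on the page.

```python
# Values copied from the run metadata above.
generation_time_ms = 2086       # assumption: milliseconds
native_tokens_completion = 300  # provider-side output token count

# Output tokens per second of generation time.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)
print(f"{tokens_per_second:.1f} tokens/s")  # 143.8 tokens/s
```

This figure covers generation time only; it does not include the separately reported latency value.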
Evaluation details
Result | Evaluator | Details | Meta Data
14.30% | Matches word count | n/a | neededClean: false, words: 221
62.16% | Dialogue to Total Word Ratio | Ratio: 18.30%, Deviation: 8.30% | neededClean: false, wordsTotal: 224, wordsDialogue: 41
38.2325% | Overall | |
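The reported ratio and deviation can be reproduced from the word counts in the evaluation details, taking the 10% target from the scenario. The sketch below also checks one assumption about the overall score: that it is the unweighted mean of the two evaluator scores. The page does not state the aggregation rule, nor how the 62.16% score is derived from the deviation, so treat both as unconfirmed.

```python
# Values copied from the evaluation details above.
words_dialogue = 41
words_total = 224
target_ratio = 0.10  # from the scenario: "10% dialogue"

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 18.30%, Deviation: 8.30%

# Assumption: overall = unweighted mean of the evaluator scores.
# The displayed scores appear rounded, so only approximate
# agreement with the reported 38.2325% should be expected.
evaluator_scores = [14.30, 62.16]
overall_estimate = sum(evaluator_scores) / len(evaluator_scores)
print(f"{overall_estimate:.2f}%")  # 38.23%
```

The estimate agrees with the reported overall score to two decimal places, which is consistent with a simple average but does not prove it.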