NC Bench
Run: 2026-03-18-01KM07C1KF0SSNX9ATE77WT9AZ
Run Details
Model: GPT-5.4 Mini
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 3.6s
Cost: $0.001422
Provider: OpenAI
Evaluation summary
Score      Evaluator
75.16%     Matches word count
0.00%      Dialogue to Total Word Ratio
37.5781%   Overall
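The overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below assumes that aggregation rule, which the page does not state, and uses the rounded scores shown in the summary. `overall_score` is a hypothetical helper name, not part of NC Bench.

```python
from statistics import mean

def overall_score(scores):
    """Unweighted mean of per-evaluator scores in percent (assumed rule)."""
    return mean(scores)

# Rounded evaluator scores as displayed in the evaluation summary.
scores = [75.16, 0.00]
print(f"{overall_score(scores):.2f}%")  # prints 37.58%
```

The reported 37.5781% has more digits than the rounded inputs allow, which suggests the word-count score is stored at higher precision than the 75.16% shown.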
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedges, peering at the rented manor where balloons bumped against the windows like nervous ghosts. “That’s the place,” she whispered. “The children are inside, shrieking with sugar and trust.” Bill adjusted his gloves and frowned at the paper crown on his head. “I still don’t like the crown,” he said. “It makes me look festive.” “Good,” Molly replied. “Festive people are ignored.” From the garden, came the wobble of a bouncy castle, the tinny warble of a clown’s trumpet, and the blessed clatter of adults pretending this was fun. Through the open patio doors, towers of wrapped gifts gleamed under fairy lights, each one tagged with a child’s name in glittery marker. Bill’s eyes widened. “That’s a lot of loot for a room full of people who can’t spell ‘tomorrow.’” Molly smiled. “We go in as entertainers.” She lifted a cheap accordion from the grass. “You’re the magician’s assistant.” Bill stared. “I’m not wearing tights.” “Then be the magician.” A burst of laughter rolled across the lawn as the clown slipped on a cupcake and performed a perfect accidental somersault. “Now,” Molly breathed. They straightened their fake mustaches, smoothed their borrowed vests, and walked toward the party, looking like the worst surprise in the middle of the best birthday.
created_at: "2026-03-18T10:20:28.156Z" (3/18/2026, 10:20:28 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1773
moderation_latency: (null)
generation_time: 3416
tokens_prompt: 77
tokens_completion: 357
native_tokens_prompt: 66
native_tokens_completion: 305
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.001422
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_0ce048687549c8560169ba7c6c37148193bd086fe116e87e4a"
    is_byok: false
    latency: 1547
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773829228-veDwcXAr0HWUbfPv7Njg"
upstream_id: "resp_0ce048687549c8560169ba7c6c37148193bd086fe116e87e4a"
total_cost: 0.001422
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
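The timing and token fields above can be combined into a rough throughput figure. The sketch assumes `generation_time` and `latency` are in milliseconds. The page does not state the unit, but 3416 is close to the 3.6s duration in the run details. `tokens_per_second` is a hypothetical helper name.

```python
def tokens_per_second(completion_tokens, generation_time_ms):
    """Completion tokens produced per second of generation time."""
    return completion_tokens / (generation_time_ms / 1000)

# Values copied from the metadata: native_tokens_completion and generation_time.
# Treating generation_time as milliseconds is an assumption.
rate = tokens_per_second(305, 3416)
print(f"{rate:.1f} tokens/s")  # prints 89.3 tokens/s
```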
Evaluation details
Result     Evaluator                      Details                            Meta Data
75.16%     Matches word count             n/a                                neededClean: false, words: 213
0.00%      Dialogue to Total Word Ratio   Ratio: 28.64%, Deviation: 18.64%   neededClean: false, wordsTotal: 213, wordsDialogue: 61
37.5781%   Overall
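The ratio and deviation follow from the reported word counts: 61 dialogue words out of 213 total, measured against the 10% target in the scenario. The sketch below reproduces that arithmetic. `dialogue_stats` is a hypothetical counter that treats text inside curly double quotes as dialogue and splits on whitespace. The evaluator's actual tokenization is not shown on the page, so it may not give exactly 61 and 213 for the response above.

```python
import re

TARGET_RATIO = 0.10  # the scenario asks for 10% dialogue

def dialogue_stats(text):
    """Count whitespace-separated words inside curly double quotes and overall."""
    quoted = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(q.split()) for q in quoted)
    words_total = len(text.split())
    return words_dialogue, words_total

def ratio_and_deviation(words_dialogue, words_total, target=TARGET_RATIO):
    """Return the dialogue share and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)

# Counts reported in the Meta Data column (wordsDialogue, wordsTotal).
ratio, deviation = ratio_and_deviation(61, 213)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints Ratio: 28.64%, Deviation: 18.64%
```

How the 18.64-point deviation maps to the 0.00% score is not stated on the page, so the sketch stops at the deviation.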