NC Bench
Run: 2026-03-18-01KM0A9RBVZDEBMHA81FZVZMHA
Run Details
Model:
GPT-5.4 Mini (Reasoning, Low)
Test:
Dialogue tags
Scenario:
Write unattributed dialogue
Duration:
2.1s
Cost:
$0.001472
Provider:
OpenAI
Evaluation summary
Score:
13.53%
Evaluator:
Count dialogue tags
Model input
User/Human
You are an expert novel writer in the cozy mystery genre. Write a scene of about 200 words, with dialogue between two characters. The dialogue should be unattributed, meaning the speaker is not explicitly mentioned. The scene should be set in a small town bakery. Only return the story, nothing else.
Model response
Messages
Meta
AI
The bell over the bakery door gave its cheerful jingle just as Nora slid a tray of lemon scones into the display case. Outside, Maple Street was damp from an early drizzle, and the windows were fogged with the scent of cinnamon and sugar. “You’re baking before sunrise again?” “I’m trying to convince myself this is a charming lifestyle choice.” “It is charming. Exhausting, but charming.” Nora laughed and wiped flour from her cheek with the back of her hand. “You say that because you only have to eat the results.” “Someone has to do the important work.” The familiar voice belonged to Eli Mercer, the town librarian, who was standing by the register with a paper bag in one hand and a suspiciously curious expression in the other. “Coffee cake?” he asked. “Blueberry streusel. Still warm.” “Dangerous.” “Only to your willpower.” He opened the bag and peered inside. “Mrs. Wainwright was in here yesterday.” Nora paused. “Was she?” “She mentioned seeing something odd behind the bakery last night.” The warm hum of the kitchen seemed to thin. “Odd how?” “Like someone trying your back door.” Nora looked toward the little pane of glass in the rear hallway, where rain tapped softly against the frame. “Well,” she said, reaching for the rolling pin, “then I suppose today’s special comes with a little mystery.”
created_at
"2026-03-18T11:11:38.886Z"
3/18/2026, 11:11:38 AM
model
"openai/gpt-5.4-mini-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
767
moderation_latency
(null)
generation_time
1925
tokens_prompt
81
tokens_completion
385
native_tokens_prompt
67
native_tokens_completion
316
native_tokens_completion_images
0
native_tokens_reasoning
22
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.00147225
router
(null)
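provider_responses[0].endpoint_id
"9ee065b2-3d1c-43bc-bdd7-28af3b148282"
provider_responses[0].id
"resp_03ec5b61e5190ce60169ba886af2348194af46b10c150c2dd8"
provider_responses[0].is_byok
false
provider_responses[0].latency
206
provider_responses[0].model_permaslug
"openai/gpt-5.4-mini-20260317"
provider_responses[0].provider_name
"OpenAI"
provider_responses[0].status
200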
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773832298-y1NnVEkjaHVVJRIhu2Lz"
upstream_id
"resp_03ec5b61e5190ce60169ba886af2348194af46b10c150c2dd8"
total_cost
0.00147225
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result:
13.53%
Evaluator:
Count dialogue tags
Details:
n/a
Meta Data:
neededClean
false
dialogueTags
["he asked", "she said"]
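The evaluator's actual detection logic is not shown on this page. As a rough illustration of how tags such as "he asked" and "she said" could be picked out of a response, here is a minimal Python sketch. The verb list, the regular expression, and the function name `find_dialogue_tags` are all assumptions made for this example, and the sketch does not attempt to reproduce the 13.53% score.

```python
import re

# Hypothetical verb list; the benchmark's real list is not documented here.
SPEECH_VERBS = ("said", "asked", "replied", "whispered", "answered")

# A pronoun or capitalized name followed by a speech verb, appearing
# directly after a closing quotation mark (curly or straight).
TAG_PATTERN = re.compile(
    r"[”\"]\s*((?:he|she|they|[A-Z][a-z]+)\s+(?:"
    + "|".join(SPEECH_VERBS)
    + r"))\b"
)


def find_dialogue_tags(text: str) -> list[str]:
    """Return dialogue tags such as 'he asked' found after a closing quote."""
    return TAG_PATTERN.findall(text)


# Excerpt built from phrases in the model response above.
sample = (
    "“Coffee cake?” he asked. “Blueberry streusel. Still warm.” "
    "“Well,” she said, reaching for the rolling pin."
)
print(find_dialogue_tags(sample))
```

Anchoring the pattern to a closing quotation mark keeps narration such as "Nora laughed" or "Nora paused" from being counted, since those verbs are not in the assumed list. A real evaluator would likely need a broader verb list and handling for tags that precede the quote.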