NC Bench
Run: 2026-03-18-01KM079AMNMQE572G03ZDKMKER
Run Details
Model:
GPT-5.4 Nano (Reasoning)
Test:
Dialogue tags
Scenario:
Write unattributed dialogue
Duration:
2.3s
Cost:
$0.000391
Provider:
OpenAI
Evaluation summary
Score:
1.11%
Evaluator:
Count dialogue tags
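The implementation of the "Count dialogue tags" evaluator is not shown on this page, and neither is the formula behind the percentage score. As a rough illustration of what detecting dialogue tags could involve, here is a minimal Python sketch. It assumes a tag is a closing curly quote followed by a speaker phrase of one to three words and a speech verb. The verb list, the pattern, and the name `find_dialogue_tags` are hypothetical and are not NC Bench's code.

```python
import re

# Hypothetical, deliberately small verb list (an assumption, not the evaluator's list).
SPEECH_VERBS = ("said", "asked", "replied", "whispered", "muttered", "answered")

# Assumed shape of a tag: closing curly quote, 1-3 speaker words, then a speech verb.
TAG_PATTERN = re.compile(
    r"”\s*"                                    # closing curly quote ends the spoken line
    r"((?:[A-Za-z]+\s+){1,3}?"                 # shortest speaker phrase of 1-3 words
    r"(?:" + "|".join(SPEECH_VERBS) + r"))\b"  # speech verb that completes the tag
)


def find_dialogue_tags(text: str) -> list[str]:
    """Return each 'speaker + verb' phrase that directly follows a closing quote."""
    return [match.group(1) for match in TAG_PATTERN.finditer(text)]


excerpt = (
    "“You think it’s the flour?” Lila asked, grabbing a paper bag. "
    "“Yet here we are.” The other person leaned closer."
)
print(find_dialogue_tags(excerpt))  # prints ['Lila asked']
```

This sketch only handles curly quotes and speaker-first tags. It would miss verb-first forms such as "said Marlowe" and any speech verb outside the short list, so a real evaluator would need a broader approach.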
Model input
User/Human
You are an expert novel writer in the cozy mystery genre. Write a scene of about 200 words, with dialogue between two characters. The dialogue should be unattributed, meaning the speaker is not explicitly mentioned. The scene should be set in a small town bakery. Only return the story, nothing else.
Model response
Messages
Meta
AI
The bell over the door gave a soft jingle as a tray of cinnamon rolls steamed up between shelves of jars. Marlowe paused with a frosting knife in hand, watching the clock like it might confess. “Did you hear about the mayor’s fundraiser?” a voice asked from behind the counter, too casual for the way their eyes kept darting to the glass display. “No one’s supposed to be cutting into the good stuff until tomorrow,” Marlowe said, sliding a loaf into place with unnecessary care. “Yet here we are.” The other person leaned closer, lowering their voice. “Someone left a crumb trail from the back door to the pantry. Like they wanted to be noticed.” Marlowe’s smile stayed put. “Or like they wanted us to look in the wrong place.” “You think it’s the flour?” Lila asked, grabbing a paper bag as if it could provide answers. “Because that’s what everybody’s saying. That someone tampered with the supply.” “Flour doesn’t disappear on its own.” Marlowe wiped a smear off the counter that wasn’t there. “And cinnamon rolls don’t go missing without hands.” Lila’s gaze dropped to the empty corner where the mayor’s special batch should’ve been. “Then whose hands were in our kitchen?” Marlowe turned, inspecting the oven mitt hanging by the door—freshly twisted, not yesterday’s fold. “The ones that knew exactly when we’d be busy.”
created_at
"2026-03-18T10:18:59.37Z"
3/18/2026, 10:18:59 AM
model
"openai/gpt-5.4-nano-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
475
moderation_latency
(null)
generation_time
1629
tokens_prompt
81
tokens_completion
374
native_tokens_prompt
67
native_tokens_completion
302
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0003909
router
(null)
provider_responses
0
endpoint_id
"0c835f2e-c18d-4e8c-b245-e1e3bd08b97f"
id
"resp_013444b5f6d28e190169ba7c13aec48197a2ebf99b0165b8ec"
is_byok
false
latency
197
model_permaslug
"openai/gpt-5.4-nano-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773829139-TF4Grvb7Ta6zdYeD0tBX"
upstream_id
"resp_013444b5f6d28e190169ba7c13aec48197a2ebf99b0165b8ec"
total_cost
0.0003909
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result:
1.11%
Evaluator:
Count dialogue tags
Details:
n/a
Meta Data:
neededClean
false
dialogueTags
0: "a voice asked"
1: "Marlowe said"
2: "Lila asked"