NC Bench
Run: 2026-03-18-01KM0AABBZGQK51H8RVS3VS4V6
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 3.0s
Cost: $0.001935
Provider: OpenAI
Evaluation summary
Score      Evaluator
14.30%     Matches word count
51.82%     Dialogue to Total Word Ratio
33.0600%   Overall
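The Overall figure is consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers on this page, not a documented formula, and `overall_score` is a hypothetical helper name. A minimal check under that assumption:

```python
# Evaluator scores as shown in the summary, in percent.
scores = {
    "Matches word count": 14.30,
    "Dialogue to Total Word Ratio": 51.82,
}


def overall_score(evaluator_scores: dict[str, float]) -> float:
    """Unweighted mean of the evaluator scores (assumed aggregation)."""
    return sum(evaluator_scores.values()) / len(evaluator_scores)


print(f"{overall_score(scores):.4f}%")  # prints 33.0600%
```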
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly pressed her ear to the bakery door. “Tell me again why we’re robbing a children’s party.” Bill adjusted his tie, which was too shiny to be trustworthy. “Because, darling, the loot is colorful, plentiful, and guarded by a man in a raccoon costume.” “Ah,” Molly said. “Professional security.” “Exactly. Also, nobody expects thieves to be sentimental monsters.” “I’m not sentimental.” “You stole a cupcake from a charity gala and apologized to it.” “That cupcake looked lonely.” Bill lowered his voice. “Focus. The presents are stacked beside the bounce castle.” “Bounce castle?” “Inflatable fortress of opportunity.” Molly grinned. “You do make crime sound festive.” “I contain multitudes.” “What’s the plan?” “We enter as entertainers.” “Entertainers?” “Yes. You bring the accordion.” “I don’t own an accordion.” “You do now. I rented one under an alias.” Molly blinked. “What alias?” “Mr. Fun.” “That’s appalling.” “Thank you. We mingle, distract the parents, and when the magician makes the rabbit appear, we make the presents disappear.” “And if a child asks where the gifts went?” Bill shrugged. “We’ll say they were taken by elves.” Molly laughed. “That is the worst lie I’ve ever heard.” “Which is why it will work.” She slipped on her gloves. “Fine. But if there’s cake, we steal the cake too.” Bill opened the door. “Molly, that’s why I adore you.”
created_at: "2026-03-18T11:11:58.346Z" (3/18/2026, 11:11:58 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1593
moderation_latency: (null)
generation_time: 2819
tokens_prompt: 77
tokens_completion: 439
native_tokens_prompt: 66
native_tokens_completion: 419
native_tokens_completion_images: 0
native_tokens_reasoning: 93
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.001935
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_033a8727f7815d5f0169ba887e69c48190a9357e9f67da02aa"
    is_byok: false
    latency: 232
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773832318-PCU8UobelTTh98xqmzZ9"
upstream_id: "resp_033a8727f7815d5f0169ba887e69c48190a9357e9f67da02aa"
total_cost: 0.001935
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result     Evaluator                      Details                            Meta Data
14.30%     Matches word count             n/a                                neededClean: false, words: 221
51.82%     Dialogue to Total Word Ratio   Ratio: 81.00%, Deviation: 9.00%    neededClean: false, wordsTotal: 221, wordsDialogue: 179
33.0600%   Overall
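The ratio and deviation reported for the dialogue evaluator can be reproduced from the metadata counts: 179 dialogue words out of 221 total, measured against the 90% target in the scenario. The sketch below is illustrative only. `count_dialogue_words` is a hypothetical helper that treats text inside curly double quotes as dialogue, which may differ from how NC Bench actually counts, and the mapping from deviation to the 51.82% score is not shown on this page, so it is left out.

```python
import re


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words).

    Assumption: dialogue is whatever sits between curly double quotes,
    and words are whitespace-separated tokens.
    """
    total = len(text.split())
    spoken = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(segment.split()) for segment in spoken)
    return dialogue, total


# Counts reported in the evaluator metadata.
words_dialogue = 179
words_total = 221
target_ratio = 0.90  # from the scenario: 90% dialogue

ratio = words_dialogue / words_total
deviation = abs(target_ratio - ratio)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints Ratio: 81.00%, Deviation: 9.00%
```

For example, `count_dialogue_words('Molly sighed. “Tell me again.” Bill shrugged. “No.”')` counts 4 dialogue words out of 8 under these assumptions.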