NC Bench
Run: 2026-03-09-01KKABYJTWJ9SRDB8C9GDGDFBG
Run Details
Model: GPT-5.4 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 10.1s
Cost: $0.006045
Provider: OpenAI
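The reported cost can be cross-checked against the native token counts in the response metadata (66 prompt tokens, 392 completion tokens). The sketch below is only an arithmetic check: the per-million-token rates are hypothetical values, not stated anywhere on this page, chosen because they reproduce the reported $0.006045. It also assumes that the 74 reasoning tokens are already included in the 392 completion tokens.

```python
from decimal import Decimal

# Hypothetical USD rates per million tokens. The page does not state pricing;
# these values are assumptions that happen to reproduce the reported cost.
INPUT_RATE = Decimal("2.50")
OUTPUT_RATE = Decimal("15.00")
MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Return the estimated cost in USD for one request."""
    return (prompt_tokens * INPUT_RATE + completion_tokens * OUTPUT_RATE) / MILLION


# native_tokens_prompt = 66, native_tokens_completion = 392 (from the metadata)
cost = estimate_cost(66, 392)
print(cost == Decimal("0.006045"))
```

If the real rates differ, or reasoning tokens are billed separately, the same function can be reused with other inputs.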
Evaluation summary
Matches word count: 51.93% (details)
Dialogue to Total Word Ratio: 79.25% (details)
Overall: 65.5875%
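The Overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below assumes equal weighting, which the page does not state, and assumes that the displayed 51.93% is a rounded form of 51.925%, the value that reproduces 65.5875% exactly. The function name `overall` is illustrative.

```python
from fractions import Fraction


def overall(scores: list[str]) -> Fraction:
    """Unweighted mean of evaluator scores given as percentage strings."""
    values = [Fraction(s) for s in scores]
    return sum(values) / len(values)


# Using the scores exactly as displayed gives 65.59, not 65.5875.
displayed = overall(["51.93", "79.25"])
# Assuming the first score is 51.925 before rounding reproduces the reported value.
unrounded = overall(["51.925", "79.25"])

print(float(displayed))  # 65.59
print(float(unrounded))  # 65.5875
```

Exact rational arithmetic is used here so that the comparison is not blurred by binary floating-point rounding.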
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the inflatable castle, wearing a caterer’s apron over black burglary clothes. Bill knelt beside a hedge, clutching a clipboard and a balloon pump like they were professional equipment. “Remember,” Molly whispered, “we’re not robbing children. We’re redistributing unappreciated inventory.” Bill peered at the garden. “Inventory wrapped in dinosaurs and glitter. Morally, it’s very confusing.” “The key is confidence. We stroll in, smile at a grandmother, and locate the present table.” “I can smile,” Bill said. “At grandmothers, less so. They can smell panic and discount aftershave.” A shriek erupted from the trampoline. A clown on stilts wandered past the gate, smoking. Molly adjusted her apron. “The clown is our window. No one watches presents when a tall man makes balloon swords.” Bill frowned. “I hate this plan.” “You hate all plans.” “I liked the tunnel under the aquarium.” “That was because it involved a sandwich van.” “An essential component of strategy,” Bill muttered. “What if they ask whose child we belong to?” Molly grinned. “We say we’re with Kevin.” “There’s always a Kevin.” “Exactly.” They slipped through the gate, nodding like exhausted parents. The gift table gleamed beneath a banner reading HAPPY SEVENTH, OLIVER. Bill lifted a parcel. “Heavy. Educational, probably.” Molly filled the sack. “Move quickly. Nothing inspires pursuit like stolen Lego.”
created_at: "2026-03-09T22:37:12.423Z" (3/9/2026, 10:37:12 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3588
moderation_latency: (null)
generation_time: 9992
tokens_prompt: 77
tokens_completion: 538
native_tokens_prompt: 66
native_tokens_completion: 392
native_tokens_completion_images: 0
native_tokens_reasoning: 74
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.006045
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0abe2bd57d2e37f10169af4b9881e48190b5c7072c4fc0a593"
    is_byok: false
    latency: 170
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773095832-l83nVCX7FpSf9z9ONn03"
upstream_id: "resp_0abe2bd57d2e37f10169af4b9881e48190b5c7072c4fc0a593"
total_cost: 0.006045
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 51.93%
Details: n/a
Meta Data: neededClean = false, words = 216
Dialogue to Total Word Ratio: 79.25%
Details: Ratio: 56.94%, Deviation: 6.94%
Meta Data: neededClean = false, wordsTotal = 216, wordsDialogue = 123
Overall: 65.5875%
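The ratio and deviation reported for the dialogue evaluator follow directly from the two word counts: 123 dialogue words out of 216 total, measured against the 50% target in the scenario. The sketch below reproduces that arithmetic. The `dialogue_words` helper is a hypothetical illustration of how dialogue might be counted (words between curly double quotes, split on whitespace); the page does not describe the evaluator's actual counting rule, nor how a deviation is converted into the 79.25% score, so neither is reconstructed here.

```python
import re

TARGET_RATIO = 0.50  # the scenario asks for 50% dialogue


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Assumption: dialogue is any text between curly double quotes."""
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Values from the evaluation metadata: wordsDialogue = 123, wordsTotal = 216
ratio, deviation = ratio_and_deviation(123, 216)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 56.94%, Deviation: 6.94%
```

On this run the response also overshot the length target, with 216 words against the requested 200.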