NC Bench
Run: 2026-03-06-01KK1HXBSQDT66MT19XH8EW7AP
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 9.0s
Cost: $0.004545
Provider: OpenAI
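The Cost value ($0.004545, repeated as usage and total_cost in the response metadata) can be reproduced from the native token counts in that metadata (native_tokens_prompt = 66, native_tokens_completion = 292) if one assumes rates of $2.50 per million input tokens and $15.00 per million output tokens. Those two rates are an assumption chosen because they reproduce the recorded cost; this page does not list any prices, and the helper name below is hypothetical. A minimal sketch of the arithmetic:

```python
# Hypothetical per-million-token rates in USD. These are ASSUMPTIONS:
# the run page does not state pricing; they are picked because they
# reproduce the recorded cost exactly.
INPUT_RATE_PER_M = 2.50
OUTPUT_RATE_PER_M = 15.00


def estimate_cost(native_prompt_tokens: int, native_completion_tokens: int) -> float:
    """Estimate request cost in USD from native token counts under the assumed rates."""
    prompt_cost = native_prompt_tokens * INPUT_RATE_PER_M / 1_000_000
    completion_cost = native_completion_tokens * OUTPUT_RATE_PER_M / 1_000_000
    return prompt_cost + completion_cost


# Native token counts as recorded in the response metadata.
cost = estimate_cost(66, 292)
print(f"${cost:.6f}")  # prints $0.004545
```

Note that the normalized counts (tokens_prompt = 77, tokens_completion = 356) would not give the same figure under these rates, which is why the sketch uses the native counts.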
Evaluation summary
Score     Evaluator
2.01%     Matches word count
99.15%    Dialogue to Total Word Ratio
50.5786%  Overall
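The Overall figure is consistent with an unweighted mean of the two evaluator scores. This is an inference from the numbers, not something the page states. With the displayed (rounded) scores:

```latex
\text{Overall} \approx \frac{2.01\% + 99.15\%}{2} = \frac{101.16\%}{2} = 50.58\%
```

This agrees with the reported 50.5786% once the rounding of the two component scores to two decimal places is taken into account.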
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Rain needled the bakery awning while Molly and Bill crouched behind a stack of wobbling milk crates, studying the village hall through a fogged pair of opera glasses that had probably once belonged to someone richer and deader. Inside, balloons bobbed against the ceiling like captive moons, and a clown in huge shoes wrestled a ribbon onto a chair with the tragic dignity of a failed admiral. Molly unfolded a hand-drawn map on her knee. It was mostly arrows, crumbs, and confidence. Her eyes glittered at the sight of a mountain of brightly wrapped gifts piled beside a banner that read HAPPY SEVENTH, OSCAR, in letters so cheerful they felt personally insulting. “Cake at three, chaos at four,” she whispered. Bill adjusted his flat cap and peered through the rain. “And presents at four-oh-one.” They watched parents troop in carrying casseroles, toddlers, and the exhausted smiles of civilians entering a war zone. The hired magician arrived next, lugging a top hat and an expression that suggested rabbits had wronged him professionally. Molly tapped the window schedule she’d memorized. When the pass-the-parcel riot began, Bill would slip in dressed as a balloon man, she as an aunt from Leeds. They’d sweep the gifts into a laundry trolley and roll them out the side door. A child inside screamed with joy. Bill grinned. “Our kind of alarm.”
Response metadata
created_at: "2026-03-06T12:28:14.014Z" (3/6/2026, 12:28:14 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 377
moderation_latency: (null)
generation_time: 8928
tokens_prompt: 77
tokens_completion: 356
native_tokens_prompt: 66
native_tokens_completion: 292
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004545
router: (null)
provider_responses[0]:
  endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
  id: "resp_05b2aad28acb91d30169aac85e103481909b6d30e48803b41d"
  is_byok: false
  latency: 138
  model_permaslug: "openai/gpt-5.4-20260305"
  provider_name: "OpenAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772800094-adxWQYcGjjVkHgNS0NgD"
upstream_id: "resp_05b2aad28acb91d30169aac85e103481909b6d30e48803b41d"
total_cost: 0.004545
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 2.01%
  Details: n/a
  Meta data: neededClean: false, words: 225
Dialogue to Total Word Ratio: 99.15%
  Details: ratio 6.96%, deviation 3.04 percentage points from the 10% target
  Meta data: neededClean: false, wordsTotal: 230, wordsDialogue: 16
Overall: 50.5786%
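The ratio details follow from the meta data: 16 / 230 = 6.96%, which falls 3.04 percentage points short of the 10% target. The evaluator's counting rules are not documented here, but the reported numbers are consistent with two assumptions: dialogue is the text inside curly double quotes, and hyphens separate words. Under whitespace-only counting the response has 225 words; splitting its three hyphenated compounds (hand-drawn, four-oh-one, pass-the-parcel) adds five, giving the 230 used for the ratio. The sketch below encodes those assumptions; the function names are hypothetical, not the benchmark's code.

```python
import re

TARGET_RATIO = 0.10  # the scenario asks for 10% dialogue


def count_words(text: str, split_hyphens: bool = True) -> int:
    """Count tokens separated by whitespace and, optionally, hyphens (an assumption)."""
    separators = r"[\s\-]+" if split_hyphens else r"\s+"
    return len([token for token in re.split(separators, text) if token])


def dialogue_words(text: str) -> int:
    """Count words inside curly double quotes, assumed to mark dialogue."""
    return sum(count_words(span) for span in re.findall(r"“([^”]*)”", text))


def ratio_and_deviation(dialogue: int, total: int, target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return (ratio in %, absolute deviation from target in percentage points), rounded to 2 dp."""
    ratio = dialogue / total
    return round(ratio * 100, 2), round(abs(ratio - target) * 100, 2)


# The three quoted lines from the model response.
dialogue_only = (
    "“Cake at three, chaos at four,” "
    "“And presents at four-oh-one.” "
    "“Our kind of alarm.”"
)

d = dialogue_words(dialogue_only)
print(d)                             # 16
print(ratio_and_deviation(d, 230))   # (6.96, 3.04)
```

With whitespace-only counting, the same three lines give 14 dialogue words, so the reported value of 16 points to hyphen splitting for the dialogue count as well.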