NC Bench
Run: 2026-03-06-01KK1H7MW423R663GG6XP50971
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 10.2s
Cost: $0.004440
Provider: OpenAI
Evaluation summary
Score     Evaluator                     Details Link
60.28%    Matches word count            (details)
34.11%    Dialogue to Total Word Ratio  (details)
47.1950%  Overall
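
The page does not say how the Overall figure is derived. As a minimal sketch, assuming it is the unweighted mean of the two evaluator scores (an assumption that happens to reproduce the reported 47.1950%), the arithmetic looks like this. The function name `overall_score` is hypothetical and not part of NC Bench.

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores (assumed aggregation, not confirmed by the page)."""
    return sum(scores) / len(scores)


# Evaluator scores as reported in the summary, in percent.
evaluator_scores = [60.28, 34.11]

overall = overall_score(evaluator_scores)
print(f"{overall:.4f}%")
```

If the benchmark weights evaluators differently, only `overall_score` would need to change.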
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the garden shed, peering through a crack in the fence at the battlefield of pastel balloons, wobbling paper hats, and one alarming clown arranging cupcakes like land mines. The children shrieked with the ecstatic lawlessness of tiny tyrants. At the center of the lawn stood the prize: a banquet table stacked with presents in glossy towers, every ribbon catching the sun like a dare. Bill adjusted his black turtleneck, which in daylight made him look less like a master criminal and more like a substitute drama teacher in distress. Beside him lay their equipment: a coil of rope, a torch, two pillowcases, and a fake pony head Molly insisted was “for blending in.” “Tell me again,” Bill whispered, “why we’re robbing a six-year-old?” “Because,” Molly said, checking her watch with professional gravity, “children have no concept of perimeter security.” A magician stumbled past the hedge, sneezing glitter. Bill flinched. “We go in during the cake song, grab the lot, and vanish through the bouncy castle.” “That sentence contains three ways to die.” Molly grinned and pulled the pony head over Bill’s face. Through the eyeholes he watched a child in a pirate costume body-slam another near the jelly. For a moment, even he had to admit it: chaos was the perfect accomplice.
created_at: "2026-03-06T12:16:22.413Z" (3/6/2026, 12:16:22 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 405
moderation_latency: (null)
generation_time: 10141
tokens_prompt: 77
tokens_completion: 351
native_tokens_prompt: 66
native_tokens_completion: 285
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00444
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0ef38fe52d0facfa0169aac59677f48195a4eff7d27456d057"
    is_byok: false
    latency: 119
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799382-uMXuEUNYAuKUJvtJzlC4"
upstream_id: "resp_0ef38fe52d0facfa0169aac59677f48195a4eff7d27456d057"
total_cost: 0.00444
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
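
Two derived figures can be sketched from the metadata above. The per-token rates below are assumptions, not values stated on this page; they are chosen because they reproduce the reported total_cost from the native token counts. The throughput calculation also assumes generation_time is in milliseconds.

```python
# Values taken from the run metadata.
native_tokens_prompt = 66
native_tokens_completion = 285
generation_time_ms = 10141  # assumed to be milliseconds

# Hypothetical rates in USD per million tokens (not stated on the page).
ASSUMED_INPUT_RATE = 2.50
ASSUMED_OUTPUT_RATE = 15.00

# Cost estimate: tokens times rate, summed over prompt and completion.
estimated_cost = (
    native_tokens_prompt * ASSUMED_INPUT_RATE
    + native_tokens_completion * ASSUMED_OUTPUT_RATE
) / 1_000_000

# Throughput: completion tokens per second of generation time.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${estimated_cost:.6f}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions the estimate agrees with the reported total_cost of 0.00444. The same calculation with tokens_prompt and tokens_completion (77 and 351) would not, which suggests billing follows the native counts.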
Evaluation details
Result    Evaluator                     Details                           Meta Data
60.28%    Matches word count            n/a                               neededClean: false, words: 215
34.11%    Dialogue to Total Word Ratio  Ratio: 20.18%, Deviation: 10.18%  neededClean: false, wordsTotal: 218, wordsDialogue: 44
47.1950%  Overall
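
The ratio and deviation reported for the dialogue evaluator follow directly from its word counts and the 10% target in the scenario. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the mapping from deviation to the 34.11% score is not given on the page, so it is not reproduced here.

```python
def dialogue_ratio_percent(words_dialogue, words_total):
    """Share of dialogue words in the whole text, in percent."""
    return 100.0 * words_dialogue / words_total


TARGET_PERCENT = 10.0  # from the scenario: "10% dialogue"

# Counts reported in the evaluator's meta data.
ratio = dialogue_ratio_percent(words_dialogue=44, words_total=218)

# Deviation is the absolute distance from the target, in percentage points.
deviation = abs(ratio - TARGET_PERCENT)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

Note that this evaluator reports 218 total words while the word-count evaluator reports 215, so the two appear to tokenize the text differently.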