NC Bench
Run: 2026-03-09-01KKAC0SKNB3C24SPEX8SCDGGT
Run Details
Model:
GPT-5.4 (Reasoning, Low)
Test:
Dialogue tags
Scenario:
Write 200 words with 10% dialogue
Duration:
26.4s
Cost:
$0.030495
Provider:
OpenAI
Evaluation summary
Score
Evaluator
100.00%
Matches word count
100.00%
Dialogue to Total Word Ratio
99.9995%
Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly studied the suburban fortress from behind a hedge clipped into the shape of a dinosaur with asthma. Balloons bobbed over the fence, a hired magician unloaded rabbits, and somewhere inside, a chorus of sugar-fueled children screamed with the confidence of tiny emperors. Bill adjusted his velvet blazer, which made him look less like a master thief and more like a disappointed cruise singer. Their target was not jewels, art, or state secrets, but a mountain of brightly wrapped presents stacked beside the cake table like tribute to a very short king. "Bill, masks on. Smile cheerful like uncles, not burglars." "Molly, I can juggle. Children trust jugglers. Adults fear taxes already." He produced three oranges from his pockets and began practicing with the solemnity of a priest blessing produce. Molly checked the timetable: clown at two, cake at three, sugar crash at four. During Happy Birthday, they would drift to the gift table, load sacks disguised as pony piñatas, and vanish through the side gate before anyone noticed missing loot or suspiciously mobile ponies. It was, Molly thought, a flawless operation, provided Bill stopped dropping oranges on his shoes and the magician didn't recognize them from the incident in Brighton.
created_at
"2026-03-09T22:38:24.897Z"
3/9/2026, 10:38:24 PM
model
"openai/gpt-5.4-20260305"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
24777
moderation_latency
(null)
generation_time
26223
tokens_prompt
77
tokens_completion
696
native_tokens_prompt
66
native_tokens_completion
2022
native_tokens_completion_images
0
native_tokens_reasoning
1764
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.030495
router
(null)
provider_responses
0
endpoint_id
"9ff5625c-403f-4d7f-b895-58ac7295062c"
id
"resp_088eae72240d48490169af4be0fd888195bcaffee761e7832d"
is_byok
false
latency
179
model_permaslug
"openai/gpt-5.4-20260305"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773095904-RIIZsIcCvuSbOnLDTfxs"
upstream_id
"resp_088eae72240d48490169af4be0fd888195bcaffee761e7832d"
total_cost
0.030495
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
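The token and cost fields above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the per-million-token rates are assumptions chosen because they reproduce the reported total_cost, and they are not stated anywhere in the run record. The function name estimate_cost is likewise hypothetical.

```python
# ASSUMED rates in USD per million tokens; not part of the run record.
INPUT_RATE_PER_M = 2.50
OUTPUT_RATE_PER_M = 15.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request under the assumed rates."""
    weighted = (prompt_tokens * INPUT_RATE_PER_M
                + completion_tokens * OUTPUT_RATE_PER_M)
    return weighted / 1_000_000


# Values copied from the metadata fields of this run.
native_tokens_prompt = 66
native_tokens_completion = 2022
native_tokens_reasoning = 1764

# Completion tokens left over once reasoning tokens are subtracted.
visible_tokens = native_tokens_completion - native_tokens_reasoning
cost = estimate_cost(native_tokens_prompt, native_tokens_completion)

print(visible_tokens)   # 258
print(round(cost, 6))   # 0.030495
```

Under these assumed rates, reasoning tokens account for most of the completion tokens and therefore most of the cost, even at the low reasoning setting.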
Evaluation details
Result
Evaluator
Details
Meta Data
100.00%
Matches word count
n/a
neededClean
false
words
201
100.00%
Dialogue to Total Word Ratio
Ratio: 9.90%, Deviation: 0.10%
neededClean
false
wordsTotal
202
wordsDialogue
20
99.9995%
Overall
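The reported ratio and deviation follow directly from the wordsDialogue and wordsTotal counts. The sketch below shows one plausible way to compute them, assuming that dialogue is any text between straight double quotes and that words are whitespace-separated. It is not NC Bench's actual evaluator, and dialogue_ratio is a hypothetical helper name.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: text between pairs of straight double quotes is dialogue,
    and words are delimited by whitespace.
    """
    total_words = len(text.split())
    spoken = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in spoken)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Counts reported by the evaluator for this run.
words_dialogue = 20
words_total = 202
target_ratio = 0.10  # the scenario asks for 10% dialogue

reported_ratio = words_dialogue / words_total
deviation = abs(reported_ratio - target_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 9.90%, Deviation: 0.10%"
```

The two evaluators report different totals (201 and 202 words), which suggests they tokenize the text slightly differently. A whitespace split like the one above may not match either count exactly.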