NC Bench
Run: 2026-03-06-01KK1HJEK38JVWGA9WBF68M1Y8
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 25.1s
Cost: $0.013665
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.36% | Dialogue to Total Word Ratio
0.1777% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge like a glamorous gargoyle, peering through a gap in the leaves at the suburban battlefield beyond. Streamers flapped from the fence. A pink inflatable castle bobbed in the yard like a diabetic moon landing. Children in paper crowns careered across the grass with the sort of reckless joy only the sugar-rich and consequence-poor could manage. Near the patio doors stood a long trestle table stacked with presents in bright paper towers, each one a tiny monument to poor judgment and online shopping. Bill adjusted his cap and squinted over her shoulder. He smelled faintly of peppermint and panic. “That,” Molly whispered, pointing with the seriousness of a field marshal, “is the prize pile.” “It looks festive,” Bill said. “I don’t trust anything that festive. It’s unnatural.” “Focus. We go in during the cake. No one watches their belongings during cake. Civilisation itself briefly collapses.” Bill nodded solemnly. “Cake is the window of lawlessness.” From the garden came the shriek of a child discovering either delight or a mild head injury. Molly withdrew behind the hedge and unfolded a hand-drawn plan on the back of a takeaway menu. It showed the house, the garden, and what appeared to be a dragon near the trampoline. “Why is there a dragon?” Bill asked. “That’s the clown,” Molly said. “Symbolically.” “I hate clowns.” “You hate cardigans with zips. Your emotional range is not useful intelligence.” Bill sniffed. “I’m just saying, any grown man who paints on extra eyebrows has abandoned reason.” Molly tapped the menu. “Listen carefully. The entertainer gathers the children for the magic show at two-thirty. Parents drift toward prosecco and mutual disappointment. We enter through the side gate disguised as the balloon delivery.” Bill looked down at the bundle in his arms: twenty helium balloons in savage colors tugging against their strings like eager little traitors. 
“And if they ask who sent us?” “We say, ‘From Uncle Tony.’ Every family has an Uncle Tony. If they don’t, they’ll assume it’s the divorced one.” Bill considered this. “That’s actually very elegant.” “Thank you.” A woman in a floral jumpsuit crossed the lawn carrying a tray of orange squash. Molly watched her with narrowed eyes. “Who’s that?” Bill asked. “Commanding officer. Possibly aunt. Moves quickly, distrusts happiness. She’ll be trouble.” “She has the walk of someone who alphabetises board games.” “Exactly.” Bill leaned closer to the hedge. “How many presents do you reckon?” “Twenty-five, maybe thirty. Enough to make this afternoon worth missing lunch.” “Do we take all of them?” Molly turned to him, scandalized. “Bill. We are thieves, not critics. Of course we take all of them.” A cheer erupted from the garden. The clown had emerged, tall and ghastly, in a polka-dot suit with shoes like canoes. Even from a distance he radiated the damp menace of an underfunded theatre school. Bill swallowed. “He’s looking this way.” “He’s not looking at us. He’s scanning for vulnerable applause.” The clown produced a balloon animal and handed it to a child, who burst into tears instantly. “Promising,” Molly said. Bill fidgeted with the balloon strings. “You ever feel bad? About stealing from children?” Molly looked back at the mountain of gifts, at the shiny bows trembling in the breeze. “Briefly. Then I remember children are basically very small looters with jam on their hands.” “That does help.” “It should.” Inside the house, through the patio doors, she spotted the hallway beyond: clear route, tiled floor, one umbrella stand, no dog. Beautiful. “Right,” she said, standing. “Masks on. Smiles wide. We are the joy professionals.” Bill pulled a party hat from his pocket and placed it on his head with grim resignation. 
“If I die here, tell people I was doing surveillance.” “If you die here,” Molly said, lifting the balloons, “I’m taking your share.” “Fair.” They slipped from behind the hedge and headed for the side gate, buoyed by helium, greed, and the distant sound of a clown asking a child to pick a card like it was a threat.
created_at: "2026-03-06T12:22:16.426Z"
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 453
moderation_latency: (null)
generation_time: 25073
tokens_prompt: 77
tokens_completion: 1159
native_tokens_prompt: 66
native_tokens_completion: 900
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.013665
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_05a48b12185c5e9e0169aac6f87b888197a9458431cb5b6882"
    is_byok: false
    latency: 175
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799736-ZFrC3rdaLPEUlsSFTJM0"
upstream_id: "resp_05a48b12185c5e9e0169aac6f87b888197a9458431cb5b6882"
total_cost: 0.013665
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 672
0.36% | Dialogue to Total Word Ratio | Ratio: 40.62%, Deviation: 10.62% | neededClean: false; wordsTotal: 677; wordsDialogue: 275
0.1777% | Overall
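
The reported ratio and deviation follow directly from the two word counts (275 dialogue words out of 677 total, against the 30% target in the scenario). The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It assumes dialogue is any text enclosed in straight or curly double quotation marks and that words are whitespace-separated tokens. NC Bench's actual tokenizer and cleaning step are not shown in this run, so the function names `dialogue_ratio` and `deviation_from_target` are hypothetical, and counts on the full response may differ slightly from the reported figures.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a piece of prose.

    Assumption: dialogue is whatever sits between double quotation marks,
    and a word is any whitespace-separated token.
    """
    total = len(text.split())
    # Capture the text between an opening and a closing double quote.
    spans = re.findall(r'[“"]([^”"]*)[”"]', text)
    dialogue = sum(len(span.split()) for span in spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_from_target(ratio: float, target: float = 0.30) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Figures taken from the evaluation details of this run.
reported_ratio = 275 / 677
print(f"Ratio: {reported_ratio:.2%}")                             # Ratio: 40.62%
print(f"Deviation: {deviation_from_target(reported_ratio):.2%}")  # Deviation: 10.62%
```

How the deviation is mapped to the 0.36% score is not stated on this page, so the sketch stops at the deviation.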