NC Bench
Run: 2026-03-09-01KKACDSYWNASP5VBA25DVCTFY
Run Details
Model: GPT-5.4 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 24.9s
Cost: $0.015225
Provider: OpenAI
Evaluation summary
Score     Evaluator
0.00%     Matches word count
2.44%     Dialogue to Total Word Ratio
1.2210%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Tell me again,” said Molly, crouched behind the hedge with a pair of opera glasses, “why we are robbing a seven-year-old.” “Correction,” said Bill, offended in a whisper, “we are relieving thirty-seven inattentive adults of a pile of commercially overpackaged goods.” “In a bouncy castle.” “In a strategic distribution hub disguised as a bouncy castle.” Molly lowered the glasses. “There is a magician making balloon parrots.” “Diversionary asset.” “There is a woman dressed as a fairy painting tigers on children.” “Additional diversionary asset.” “There is a chocolate fountain, Bill.” Bill peeked through the leaves and sighed with reverence. “Yes. A dangerous variable.” The back garden blazed with bunting and shrieking. Children in paper crowns careered past a table stacked with brightly wrapped presents. A clown on stilts folded in half to get through the gate and emerged looking winded and philosophical. Molly said, “This is beneath us.” Bill adjusted his tie, which featured tiny lockpicks. “We once stole a racehorse from a duke because the duke had insulted your hat.” “He said it looked argumentative.” “It was an outrageous remark. Compared to that, this is community-level acquisition.” Molly frowned. “You do realize children remember things forever. We take these presents, we become origin stories.” Bill considered. “That is a fair ethical concern.” “And one of them has a toy sword. Look at him. He’s already seeking vengeance.” The boy in question was patrolling the perimeter with the joyless focus of a tiny customs officer. Bill unfolded a hand-drawn map. “We proceed delicately. Stage one: infiltrate as hired entertainment.” “I am not making balloon animals.” “You don’t have to. You’re the mysterious puppeteer from Prague.” “I can’t do a Prague accent.” “You can barely do your own accent.” “My accent is layered.” “Your accent is stolen from three counties and a barmaid.” Molly snatched the map. “This says ‘neutralize grandmother.’” “It means distract grandmother. 
She’s guarding the gift table like a retired admiral.” The grandmother sat in a deckchair beside the presents, handbag on lap, expression severe enough to curdle lemonade. Molly whistled. “Good posture. Former military?” “Worse. Former school secretary.” They watched a child run up, bounce once on the castle, then come off crying because another child had declared himself king. Bill tapped the map again. “While the adults are occupied with cake deployment, I wheel the presents out in the catering trolley.” “And I do what?” “You create narrative confusion.” “That’s not a job, Bill. That’s your personality.” He ignored this. “You stride in, announce there’s been a mix-up with the pony.” “There is no pony.” “Exactly. People panic beautifully around absent ponies.” Molly looked at him for a long moment. “Sometimes I think your mind was assembled out of storm debris.” “Yet you continue to work with me.” “Only because prison food made your cheekbones interesting.” A gust of wind carried the smell of sausages and frosting. Bill’s stomach growled loud enough to startle a pigeon. “That,” said Molly, “is our other problem. We can’t rob a child on an empty stomach. It’s ungentlemanly.” “And ungentleladyly,” Bill agreed. “Reckless blood sugar ruins timing.” “So first we eat.” “Reconnaissance eating.” “Two sausage rolls each.” “Three. We need strength.” Molly peered again through the glasses. “There’s a sign by the gate.” “What does it say?” “‘Welcome to Timothy’s Pirate Party. Please put presents on the treasure table.’” Bill slowly smiled. “Pirates.” “Oh no.” “Oh yes. We’re not thieves, Molly.” “We are manifestly thieves.” “Tonight, we are themed participants.” Molly rubbed her temples. “I hate when your logic puts on a hat.” Bill brightened. 
“We go in as pirates, carry off the treasure in full view, and everyone applauds the commitment to the bit.” She stared at the riot of balloons, the gift table, the hawk-eyed grandmother, the toy-sword boy, the magician, the fairy, the clown, the chocolate fountain. Then she said, “If this works, I’m taking the fountain too.” Bill grinned. “Now you’re thinking like a professional.” They rose from the hedge together, brushed leaves from their coats, and headed for the gate with the solemn confidence of idiots about to be mistaken for geniuses.
created_at: "2026-03-09T22:45:31.236Z" (3/9/2026, 10:45:31 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 2903
moderation_latency: (null)
generation_time: 24821
tokens_prompt: 77
tokens_completion: 1408
native_tokens_prompt: 66
native_tokens_completion: 1004
native_tokens_completion_images: 0
native_tokens_reasoning: 35
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.015225
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0b6d5a1e4ba142970169af4d8b4e7c8190b26fc33e73960d0c"
    is_byok: false
    latency: 144
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773096331-KN6Z6tVMG5JqREEWddHn"
upstream_id: "resp_0b6d5a1e4ba142970169af4d8b4e7c8190b26fc33e73960d0c"
total_cost: 0.015225
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean: false; words: 686
Dialogue to Total Word Ratio
  Result: 2.44%
  Details: Ratio: 61.38%, Deviation: 8.62%
  Meta Data: neededClean: false; wordsTotal: 694; wordsDialogue: 426
Overall
  Result: 1.2210%
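The Ratio and Deviation figures reported for the dialogue evaluator follow from the two word counts in its metadata. The sketch below reproduces that arithmetic and also shows one plausible way such counts could be gathered. It is not the benchmark's evaluator: the quote-matching rule, the whitespace tokenization, and every function name here are assumptions made for illustration, and the page does not say how the deviation is mapped to the 2.44% score, so that step is left out.

```python
import re

# Assumption: dialogue is whatever sits between curly double quotes, and a
# "word" is any whitespace-separated token. The benchmark's real counting
# rules are not shown on the page, so counts from this sketch need not match.
DIALOGUE_SPAN = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_word_counts(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a prose scene."""
    dialogue = sum(count_words(span) for span in DIALOGUE_SPAN.findall(text))
    return dialogue, count_words(text)


def ratio_and_deviation(dialogue: int, total: int,
                        target_pct: float = 70.0) -> tuple[float, float]:
    """Dialogue share of all words, and its absolute gap to the target, in percent."""
    ratio_pct = 100.0 * dialogue / total
    return ratio_pct, abs(target_pct - ratio_pct)


# Counts copied from the evaluator metadata: wordsDialogue and wordsTotal.
ratio, deviation = ratio_and_deviation(dialogue=426, total=694)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 61.38%, Deviation: 8.62%"
```

To try the counting step on new text, pass a scene to `dialogue_word_counts` and feed the result into `ratio_and_deviation`. The 70% target comes from the scenario, "Write 500 words with 70% dialogue".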