NC Bench
Run: 2026-03-09-01KK9Q2823H8TCM21VP755X164
Run Details
Model:
GPT-5.4 (Reasoning)
Test:
Dialogue tags
Scenario:
Write 500 words with 30% dialogue
Duration:
32.2s
Cost:
$0.016575
Provider:
OpenAI
Evaluation summary
Score    Evaluator                     Details Link
0.00%    Matches word count            (details)
0.62%    Dialogue to Total Word Ratio  (details)
0.3082%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the privet hedge with the solemnity of a general surveying a battlefield, though the battlefield in question was a back garden full of bunting, a hired magician, and a castle-shaped cake that listed gently to one side like a sinking ship. Bill, beside her, wore a black turtleneck, burglar’s gloves, and, for reasons known only to Bill, a paper cone party hat he had found in the alley.

“This,” Molly whispered, peering through the leaves, “is the softest target in the county.”

Bill adjusted the hat. “I still think the hat helps me blend in.”

“It makes you look like a suspicious elf.”

“A festive suspicious elf.”

Beyond the hedge, a pack of shrieking six-year-olds thundered across the lawn in pursuit of bubbles. A clown on stilts wobbled near the punch bowl. At the center of the chaos sat the prize: a long table stacked with brightly wrapped presents, glittering in the sun like a dragon’s hoard for people under four feet tall.

Molly rubbed her hands together. She had the lean, hungry expression of a woman who respected craftsmanship, and nothing, to her mind, displayed craftsmanship quite like a well-tied ribbon.

“Look at them,” she murmured. “Uniform shapes, high-volume wrapping, no inventory control. It’s practically an invitation.”

Bill frowned. “I’m not saying I object to stealing from children. I’m just saying children are loud, sticky, and possess the investigative stamina of terriers.”

Molly gave him a flat look. “Bill, last month you robbed a bowling alley during league night.”

“Yes, but adults become sluggish around nachos. Children run entirely on cake and accusations.”

A small boy raced past wearing a pirate hat and carrying a foam sword bigger than his torso. Bill instinctively ducked lower.

Molly extracted a folded sketch from her pocket. It was not so much a plan as several arrows pointing toward a rectangle labeled PRESENTS, followed by a triumphant exclamation mark.

“Here’s the strategy,” she said.

Bill squinted. “That’s not a strategy. That’s optimism with stationery.”

“We wait for the magician to begin his finale. Smoke, applause, confusion. Then we go in dressed as entertainers, seize the gifts, and vanish through the side gate before anyone notices.”

Bill stared. “And what entertainers are we dressed as, exactly?”

Molly reached into a canvas bag and produced two enormous mascot heads: one rabbit, one giraffe. Both had fixed smiles and thousand-yard stares.

Bill recoiled. “Good lord.”

“The rabbit is yours.”

“Why is the giraffe wearing lipstick?”

“It came that way. Don’t be prejudiced.”

He took the rabbit head between finger and thumb as if it might explode. “You know what bothers me most? Not the crime. Not the moral abyss. It’s that children can smell fear, and this rabbit smells like old carpet.”

“That’s authenticity,” Molly said. “Now listen carefully. No improvising.”

Bill put a hand to his chest. “You wound me.”

“I’m serious. Last time you improvised, we escaped in a florist’s van and spent two hours hidden under hydrangeas while you sneezed in Morse code.”

A burst of applause rose from the garden. The magician had appeared in a plum-colored cape and was flourishing a wand with the weariness of a man who had seen too many rabbits in his professional life.

Bill glanced at the present table again. “There is, I admit, something beautiful about it. All that possibility. Toy dinosaurs. Chemistry sets. Probably at least one remote-control car.”

Molly smiled, quick and sharp. “Now you’re thinking like a professional.”

He pulled on the rabbit head, and his voice emerged muffled and tragic. “If I’m mauled by a child, tell the papers I died in business attire.”

Molly fitted the giraffe over her own head, squaring her shoulders as the magician shouted for every eye in the garden.

“Bill,” she said through the giraffe’s painted grin, “on my count, we become unforgettable.”

They rose from the hedge together, one rabbit and one lipstick giraffe, and slipped toward the party with all the stealth that giant felt animals could muster.
created_at
"2026-03-09T16:32:12.365Z"
3/9/2026, 4:32:12 PM
model
"openai/gpt-5.4-20260305"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
8201
moderation_latency
(null)
generation_time
32126
tokens_prompt
77
tokens_completion
1357
native_tokens_prompt
66
native_tokens_completion
1094
native_tokens_completion_images
0
native_tokens_reasoning
207
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.016575
router
(null)
provider_responses
0
endpoint_id
"9ff5625c-403f-4d7f-b895-58ac7295062c"
id
"resp_06bbd88617d41f9d0169aef60c70748197b37ab06faddf3105"
is_byok
false
latency
169
model_permaslug
"openai/gpt-5.4-20260305"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773073932-OfbuBSYbD5ZzKhba086c"
upstream_id
"resp_06bbd88617d41f9d0169aef60c70748197b37ab06faddf3105"
total_cost
0.016575
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result   Evaluator                     Details                           Meta Data
0.00%    Matches word count            n/a                               neededClean: false, words: 668
0.62%    Dialogue to Total Word Ratio  Ratio: 40.09%, Deviation: 10.09%  neededClean: false, wordsTotal: 676, wordsDialogue: 271
0.3082%  Overall
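The reported ratio and deviation can be reproduced from the two word counts above. The sketch below is a minimal illustration, not the benchmark's evaluator: it assumes the target is the 30% named in the scenario and that the deviation is the absolute difference between the measured ratio and that target, which is consistent with the figures shown. The `dialogue_stats` helper and its curly-quote regex are hypothetical, since the page does not show how NC Bench tokenises words or detects dialogue, and the way the 0.62% score is derived from the deviation is not shown either, so it is not reproduced here.

```python
import re

# Hypothetical pattern: text between curly double quotes counts as dialogue.
# The benchmark's real dialogue-detection rules are not shown on this page.
DIALOGUE_SPAN = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed word definition)."""
    return len(text.split())


def dialogue_stats(text: str, target_ratio: float = 0.30) -> dict:
    """Count total and quoted words, then compare the ratio with the target."""
    total = count_words(text)
    dialogue = sum(count_words(span) for span in DIALOGUE_SPAN.findall(text))
    ratio = dialogue / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_ratio),
    }


def ratio_from_counts(words_dialogue: int, words_total: int,
                      target_ratio: float = 0.30) -> tuple[float, float]:
    """Return (ratio, absolute deviation from target) for given counts."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target_ratio)


# Counts taken from the evaluation details for this run.
ratio, deviation = ratio_from_counts(271, 676)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 40.09%, Deviation: 10.09%"
```

Note that the two evaluators report different totals for the same response (668 and 676 words), so a whitespace split like the one above should not be expected to match either count exactly.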