NC Bench
Run: 2026-03-09-01KK9Q9RTQCXBR10JS7NW9DCKT
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 32.2s
Cost: $0.016755
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.06% | Dialogue to Total Word Ratio
0.0291% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly crouched behind the hydrangeas with the solemn concentration of a jewel thief examining crown diamonds, though what she was actually studying was a suburban back garden bristling with balloons, paper streamers, and a banner that read HAPPY 7TH, OLIVER!!! in letters so large they seemed to be shouting directly at the police. Bill, beside her, adjusted his flat cap and peered over the fence.
The children’s party was in full swing. Tiny humans tore across the lawn in capes and cone hats, powered by frosting and poor decisions. At the center of the chaos stood the gift table: a trestle piled high with presents in glossy paper, curling ribbons, and bows the size of pigeons.
“Tell me again,” Bill whispered, “why we’re robbing a birthday party instead of somewhere with fewer minors and more emeralds?”
“Because,” Molly said, keeping her eyes on the prize, “children receive astonishingly expensive things from relatives who feel guilty. There’ll be game consoles, tablets, remote-control dinosaurs. It’s a sentimental gold mine.”
Bill frowned. “It feels wrong to steal from children.”
Molly turned to him. “Bill, we once stole a ceremonial cheese wheel from a mayor.”
“That was from a mayor,” he said. “This is from Oliver. I don’t know Oliver, but he sounds little.”
From the garden came a squeal, followed by the unmistakable thud of someone losing a sack race to physics.
Molly produced a folded diagram from her pocket. It was drawn in red pencil and smelled faintly of peppermint. “Observe. The magician arrives in eight minutes. All eyes go to the rabbit, all ears go to the dreadful patter, and we slip in through the side gate, load the presents into the laundry basket, and vanish through the alley.”
Bill studied the diagram upside down. “Why have you marked the clown as a hazard?”
“Unpredictable footwear,” Molly said. “And access to balloons. Never trust a person who can produce a sausage dog from a pocket.”
A shriek of laughter erupted from the lawn as a pirate-themed piñata was carried out like a festive hostage.
Bill tugged at his collar. “There are a lot of adults.”
“Yes, but none of them are paying attention. They’re drinking warm prosecco and discussing school admissions. In criminal terms, they are spiritually absent.”
“And the dog?”
Molly glanced toward the patio, where a bulldog in a birthday bow tie sat like a retired dockworker. “Bribed. I brought cocktail sausages.”
Bill blinked. “You packed for the dog before you packed for me.”
“You are less bitey.”
He sighed. “What if a child sees us?”
“Then we improvise. Smile vaguely, carry a clipboard, and say something bureaucratic. Adults fear clipboards.”
Bill nodded despite himself. “That’s true. I once signed for a conservatory I didn’t want because a man had a lanyard.”
A gust lifted the corner of the banner. One of the balloons popped with the gunshot drama only balloons possess. Molly took it as an omen.
“Right,” she said, standing. “Game faces.”
Bill rose more cautiously. “What sort of face is appropriate for stealing a pile of gifts from a seven-year-old?”
“The sort that suggests you’re from catering.”
They slipped through the side gate just as the magician announced himself with a cry of “Who’s ready for wonder?” which, to Molly’s mind, was exactly the sort of sentence that covered a multitude of crimes. Children stampeded toward the performance area. Adults drifted after them, applauding as if someone had cured weather. The gift table stood momentarily alone, bright and ridiculous, loaded with wrapping paper treasure.
Bill seized the laundry basket and muttered, “I’m going to be haunted by very small ghosts.”
Molly thrust a boxed present into his arms. “Move faster and they’ll never catch us.”
He obeyed, stacking gifts until the basket looked like Christmas had suffered a head injury. For one glorious moment, it worked.
Then a tiny voice behind them said, with devastating clarity, “Mum says you’re not our aunties.”
Molly and Bill froze, surrounded by glittering paper and the crushing moral authority of a child in a plastic crown.
created_at
"2026-03-09T16:36:18.91Z"
3/9/2026, 4:36:18 PM
model
"openai/gpt-5.4-20260305"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
6504
moderation_latency
(null)
generation_time
32104
tokens_prompt
77
tokens_completion
1122
native_tokens_prompt
66
native_tokens_completion
1106
native_tokens_completion_images
0
native_tokens_reasoning
211
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.016755
router
(null)
provider_responses
0
endpoint_id
"9ff5625c-403f-4d7f-b895-58ac7295062c"
id
"resp_096193796dd15ffb0169aef70313d48196b7c79534c9c36ac4"
is_byok
false
latency
250
model_permaslug
"openai/gpt-5.4-20260305"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773074178-eWTYaJHUzJxsCvEwBXdT"
upstream_id
"resp_096193796dd15ffb0169aef70313d48196b7c79534c9c36ac4"
total_cost
0.016755
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 676
0.06% | Dialogue to Total Word Ratio | Ratio: 42.21%, Deviation: 12.21% | neededClean: false, wordsTotal: 680, wordsDialogue: 287
0.0291% | Overall
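The ratio and deviation reported above can be reproduced from the listed counts (287 dialogue words out of 680, against the 30% target in the scenario). The sketch below is only an illustration under assumptions: the function names `ratio_and_deviation` and `dialogue_stats` are hypothetical, and counting dialogue as whitespace-separated words inside double quotes is a guess at the method, not NC Bench's actual evaluator. The two evaluators report 676 and 680 total words, which suggests the real counting rules differ from a plain split.

```python
import re


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target_ratio: float = 0.30) -> tuple[float, float]:
    """Return (ratio %, absolute deviation from target in percentage points)."""
    ratio = words_dialogue / words_total
    return round(ratio * 100, 2), round(abs(ratio - target_ratio) * 100, 2)


def dialogue_stats(text: str) -> dict:
    """Assumed counting rule: dialogue is any text between double quotes
    (curly or straight); words are whitespace-separated tokens."""
    quoted = re.findall(r'[“"]([^“”"]*)[”"]', text)
    words_total = len(text.split())
    words_dialogue = sum(len(q.split()) for q in quoted)
    return {"wordsTotal": words_total, "wordsDialogue": words_dialogue}


# Counts taken from the evaluation details of this run.
print(ratio_and_deviation(287, 680))  # (42.21, 12.21)
```

Under this counting rule, a quoted phrase inside narration (such as the magician's cry) is counted as dialogue too, which may or may not match the benchmark's behaviour.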