NC Bench
Run: 2026-03-09-01KKABWTV4RQP0WQ8SSFNEAE35
Run Details
Model: GPT-5.4 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 29.2s
Cost: $0.015150
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
97.71% | Dialogue to Total Word Ratio
48.8567% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Tell me again,” said Bill, peering through the hedge with the solemnity of a man evaluating a bank vault, “why we are robbing a six-year-old.” “We are not robbing a six-year-old,” Molly whispered. “We are intercepting gifts in transit from a garden full of inattentive adults.” “That is somehow worse when you phrase it fancy.” “Hush. Observe the battlefield.” Bill squinted at the back garden. Balloons bounced on the fence. A clown was making a dachshund out of what looked, to Bill, like a regretful sausage. Children tore across the lawn in shrieking loops. “I hate the clown,” Bill said. “You hate all clowns.” “No, I hate two kinds of clowns. Party clowns and clowns outside parties. That covers most of the trade.” Molly unfolded a hand-drawn plan on her knee. “Present table by the patio doors. Cake at one o’clock. Pony rides at two. Sugar crash at half past. During pass-the-parcel, every eye will be on the screaming circle, and we glide in.” “Glide,” Bill repeated, looking down at his boots. “In these?” “Confidence glides. Shoes merely accompany it.” Bill tugged at the brim of his borrowed cap. “I still say the costumes are too much.” “We’re entertainers,” said Molly. “Invisible people in plain sight. You’re ‘Captain Bubbles.’ I’m ‘Professor Pop.’” “Captain Bubbles sounds damp.” “It sounds trustworthy. Children trust bubbles. Adults trust uniforms. We have both. That is civilisation.” Bill held up the giant ring of bubble wands. “I look like a washerwoman who’s lost her calling.” “You look festive. Now, our route. We enter through the side gate with the face of underpaid professionals. We produce bubbles. We drift toward the present table. You create a diversion.” “What kind of diversion?” “The kind you are born to. Mild confusion with a note of apology.” Bill considered this. “I can trip over the clown.” “No. Too obvious.” “I can start a rumour about a loose ferret.” “Too memorable.” “A bee?” “Seasonal, but risky.” Bill brightened. 
“I could loudly ask which child is allergic to glitter.” Molly paused. “That’s excellent.” “Thank you.” “Panic adjacent, but not actionable. Very hard to prosecute.” A small boy in a paper crown sprinted past the hedge, pursued by three girls wielding foam swords. Bill watched him go. “You ever think this is low, even for us?” Molly adjusted her spectacles, which had plain glass and an air of false scholarship. “Bill, these presents will be forgotten by Wednesday. Half of them are battery-operated nonsense destined to sing in cupboards. We are relieving the nation of plastic.” “That is the first eco-crime I’ve ever committed.” “Besides,” she said, “we’re not keeping all of them. We fence the expensive ones, donate the educational toys, and I’m having that remote-control dinosaur.” “I knew there was a dinosaur.” “There is always a dinosaur.” The clown honked a horn. Bill flinched so hard the hedge rustled. “Control yourself,” Molly hissed. “He looked at me.” “He cannot see through greasepaint and despair.” Bill took a breath. “Right. We go in. I bubble. I ask about glitter. While the mothers audit the guest list and the fathers pretend they know where the EpiPens are, you bag the loot.” “Exactly.” “And if someone stops us?” Molly smiled. “Then we become indispensable.” “How?” “We say, ‘Terribly sorry, we were hired for the princess package.’ Nobody admits they forgot hiring entertainment. Pride does half the burglary in this country.” Bill nodded slowly. “You know, that’s almost beautiful.” “It is beautiful. It’s social engineering with balloons.” A roar rose from the garden as the clown produced a stream of sweets from his sleeve. Bill grimaced. “I really hate his face.” “Focus, Captain Bubbles.” Molly stood, smoothed her bright waistcoat, and lifted the empty gift sack hidden inside the prop bag. “On my mark. Smile like you charge by the hour.” Bill rose beside her. 
“If this goes wrong, I’m blaming the clown.” “If this goes right,” said Molly, pushing open the gate, “we’re having cake in the getaway van.” Bill’s expression hardened into resolve. “Now that,” he said, “is a crime I can believe in.”
created_at
"2026-03-09T22:36:15.085Z"
3/9/2026, 10:36:15 PM
model
"openai/gpt-5.4-20260305"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
2843
moderation_latency
(null)
generation_time
29094
tokens_prompt
77
tokens_completion
1337
native_tokens_prompt
66
native_tokens_completion
999
native_tokens_completion_images
0
native_tokens_reasoning
47
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.01515
router
(null)
provider_responses
0
endpoint_id
"9ff5625c-403f-4d7f-b895-58ac7295062c"
id
"resp_0dd17a06eb4c57ab0169af4b5f2e98819590710ee859e9d722"
is_byok
false
latency
213
model_permaslug
"openai/gpt-5.4-20260305"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773095775-GE4hauxc1Xl9JabxNz0B"
upstream_id
"resp_0dd17a06eb4c57ab0169af4b5f2e98819590710ee859e9d722"
total_cost
0.01515
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
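The metadata above can be turned into a rough throughput and cost figure. The sketch below is illustrative only: it assumes `generation_time` is in milliseconds (which is consistent with the 29.2s duration reported under Run Details) and that `total_cost` is in US dollars. The helper names are hypothetical and not part of NC Bench.

```python
# Values copied from the metadata on this page.
GENERATION_TIME_MS = 29094        # assumed to be milliseconds
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 999
TOTAL_COST_USD = 0.01515          # assumed to be US dollars


def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Completion throughput, given a token count and elapsed milliseconds."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return tokens / (elapsed_ms / 1000.0)


def blended_cost_per_1k(total_cost: float, total_tokens: int) -> float:
    """Cost per 1,000 tokens, averaged over prompt and completion tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost / total_tokens * 1000.0


tps = tokens_per_second(NATIVE_TOKENS_COMPLETION, GENERATION_TIME_MS)
cost_1k = blended_cost_per_1k(
    TOTAL_COST_USD, NATIVE_TOKENS_PROMPT + NATIVE_TOKENS_COMPLETION
)
print(f"{tps:.1f} completion tokens/s, ${cost_1k:.5f} per 1k tokens (blended)")
```

The blended figure mixes prompt and completion tokens, which providers typically price differently, so it is only a per-run summary rather than a price.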
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 681
97.71% | Dialogue to Total Word Ratio | Ratio: 69.32%, Deviation: 0.68% | neededClean: false, wordsTotal: 691, wordsDialogue: 479
48.8567% | Overall
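The ratio and deviation reported above follow directly from the two word counts. The sketch below reproduces that arithmetic and adds a hypothetical dialogue counter. It assumes dialogue is whatever sits between curly double quotes and that words are whitespace-separated; the page does not document how the NC Bench evaluators tokenise text or derive the 97.71% score, so `count_dialogue_words` is an illustration, not their implementation.

```python
import re

# Counts and target taken from this page.
WORDS_TOTAL = 691
WORDS_DIALOGUE = 479
TARGET_RATIO = 0.70  # scenario: "Write 500 words with 70% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that are dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Hypothetical counter: (words inside curly double quotes, all words)."""
    total = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


ratio = dialogue_ratio(WORDS_DIALOGUE, WORDS_TOTAL)
deviation = abs(TARGET_RATIO - ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Averaging the two rounded evaluator scores gives 48.855, which is close to
# the reported overall of 48.8567%. That would be consistent with a plain mean
# over unrounded scores, though the page does not state the formula.
overall_from_rounded = (0.00 + 97.71) / 2
print(f"Mean of rounded scores: {overall_from_rounded:.3f}%")
```

The two evaluators report different totals (681 words versus 691), so a reimplementation should not expect a single word-splitting rule to match both.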