NC Bench
Run: 2026-04-24-01KQ0N1GTZ535MXN3ETQNQQHJQ
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 27.3s
Cost: $0.032430
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.01% | Dialogue to Total Word Ratio
0.0034% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the invitation on the bonnet of the van as if it were a treasure map. It was pink, glittered aggressively, and smelled faintly of bubblegum. Across the top, in looping letters, it announced: Poppy’s Seventh Birthday Spectacular! Ponies! Magician! Cake Mountain! Bill squinted at it over his half-moon glasses. “Cake Mountain sounds structural.” “It sounds delicious,” Molly said. “Focus, Bill. The presents arrive between two and three. We go in at three-fifteen, during the magician’s rabbit finale.” Bill looked wounded. “I don’t like rabbits. They judge.” “Everyone judges you. You wore a balaclava to a car boot sale.” “It was chilly.” Molly tapped the invitation. “We are not stealing from children. We are liberating gifts from the tyranny of excessive wrapping paper.” “That is exactly stealing from children.” “Fine. We are stealing from children, but tastefully.” The van was parked three streets from the community hall, beneath a sycamore tree full of pigeons with criminally accurate aim. Through the open window came the faint sound of shrieking laughter, a balloon popping, and an adult saying, with increasing panic, “No, Alfie, not in the punch.” Bill unfolded a notebook. The cover read MASTERPLAN, though someone had drawn a sausage in place of the A. “Disguises,” he said. “I’m thinking entertainers. Clown?” “You cried last time you saw a clown.” “That clown had ambition.” “You’ll be the health-and-safety inspector,” Molly said. “Clipboard, frown, visible disappointment. It’s your natural state.” Bill considered this. “And you?” “Face painter.” “You can’t paint faces.” “I can paint a face.” “You once painted a cat on my cheek and it looked like a haunted potato.” “It was postmodern.” A gust of wind lifted the invitation. Molly slapped it down with the reflexes of a pickpocket and the dignity of someone who had once been chased out of a garden centre by an animatronic Santa. 
Their target, the community hall, was a squat brick building decorated with balloons, streamers, and one enormous banner reading HAPPY BIRTHDAY POPPY in letters so cheerful they seemed legally enforceable. Parents drifted in and out carrying crisps, juice boxes, and expressions of spiritual defeat. A stack of presents had already appeared on a long table near the window: shiny boxes, gift bags, one suspiciously large parcel with air holes. Bill peered at it. “What if that’s a puppy?” “Then we don’t steal that one.” “What if they’ve gift-wrapped a ferret?” “Then we especially don’t steal that one.” Molly opened a battered suitcase. Inside were their tools: a clipboard, a sash reading OFFICIAL FUN AUDITOR, several false moustaches, a face-painting kit, and a collapsible trolley shaped like a duck. Bill lifted the trolley. “Why is it a duck?” “Children trust ducks.” “Do adults?” “Adults are tired. They trust anything with wheels.” He nodded solemnly, as though this were a law of physics. From the hall came a collective gasp, followed by applause. The magician had begun. Molly checked her watch. The rhinestones around its face caught the sun and briefly blinded Bill. “Right,” she said. “We enter separately. You cause a minor administrative panic near the kitchen. I mingle, compliment the cake, load the duck, and waddle away.” “Suppose someone asks for face paint?” “Then I give them a lightning bolt.” “What if they ask for a tiger?” “Lightning bolt with whiskers.” “What if they ask for Princess Sparklehoof?” Molly paused. “Then we abandon the heist. Some crimes are too great.” Bill tucked the clipboard under his arm and practiced his frown in the van mirror. It was powerful enough to curdle custard. “You know,” he said, “Poppy’s going to be disappointed.” Molly looked again at the mountain of presents, then at the hall where a child was laughing so hard she had become a hiccuping blur. “For ten minutes,” Molly said. 
“Then we return them anonymously, keep the wrapping paper, and invoice her parents for teaching everyone a lesson about materialism.” Bill brightened. “So it’s an educational heist.” “Exactly.” “And cake?” Molly shut the suitcase with a snap. “Obviously cake.”
created_at: "2026-04-24T21:06:28.85Z" (4/24/2026, 9:06:28 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 4476
moderation_latency: (null)
generation_time: 27177
tokens_prompt: 77
tokens_completion: 1194
native_tokens_prompt: 66
native_tokens_completion: 1070
native_tokens_completion_images: 0
native_tokens_reasoning: 133
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.03243
router: (null)
provider_responses:
  0:
    endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
    id: "resp_0f20fff1dab715420169ebdb54f36881908f0e8aaf1540a31c"
    is_byok: false
    latency: 347
    model_permaslug: "openai/gpt-5.5-20260423"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777064788-BhfjU7vRLPhmq5cSktYN"
session_id: (null)
api_type: "completions"
id: "gen-1777064788-5yrmykAx7hbSqrhG8GnE"
upstream_id: "resp_0f20fff1dab715420169ebdb54f36881908f0e8aaf1540a31c"
total_cost: 0.03243
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 669
0.01% | Dialogue to Total Word Ratio | Ratio: 43.85%, Deviation: 13.85% | neededClean: false; wordsTotal: 675; wordsDialogue: 296
0.0034% | Overall
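
The reported ratio and deviation can be reproduced from the counts in the Meta Data column. The sketch below is only an illustration of that arithmetic, not the evaluator's actual code: the function names are hypothetical, the 30% and 500-word targets are taken from the scenario text, and taking the deviation as an absolute difference in percentage points is an assumption. How the evaluators turn these deviations into the 0.00%, 0.01%, and 0.0034% scores is not shown on this page and is not modeled here.

```python
# Hypothetical helpers: names and structure are assumptions, not NC Bench code.

def dialogue_ratio_percent(words_dialogue: int, words_total: int) -> float:
    """Share of dialogue words in the whole text, as a percentage."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_points(actual_percent: float, target_percent: float) -> float:
    """Absolute distance from the target, in percentage points (assumed definition)."""
    return abs(actual_percent - target_percent)


def word_count_overshoot_percent(words: int, target_words: int) -> float:
    """Relative difference between the actual and requested word count, as a percentage."""
    return 100.0 * (words - target_words) / target_words


# Counts copied from the evaluation details above.
ratio = dialogue_ratio_percent(words_dialogue=296, words_total=675)
deviation = deviation_points(ratio, target_percent=30.0)
overshoot = word_count_overshoot_percent(words=669, target_words=500)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
print(f"Word count overshoot: {overshoot:.2f}%")
```

The first printed line matches the Details column (Ratio: 43.85%, Deviation: 13.85%). The second shows that the response ran about a third longer than the 500 words requested. Note that the two evaluators report slightly different totals (669 and 675 words), which suggests they count words differently.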