NC Bench
Run: 2026-04-24-01KQ0PFBJZPBS5GZ55JH8C2GGP
Run Details
Model: GPT-5.5 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 29.2s
Cost: $0.034440
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 4.31%
Overall: 2.1532%
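The page does not say how the Overall figure is produced. The numbers are consistent with an unweighted mean of the two evaluator scores, provided the displayed 4.31% is a rounded 4.3064%. The sketch below checks that arithmetic; the mean-of-scores rule is an assumption, not NC Bench's documented method.

```python
from statistics import mean

# ASSUMPTION: "Overall" is the unweighted mean of the per-evaluator scores.
# The page does not document this; the check only shows the numbers fit.
overall = 2.1532          # percent, as displayed for Overall
word_count_score = 0.0    # percent, "Matches word count"

# Under the mean assumption, back out the unrounded dialogue-ratio score.
dialogue_score = 2 * overall - word_count_score
print(f"{dialogue_score:.4f}%")  # 4.3064%, which rounds to the displayed 4.31%

# Recombine the two scores to confirm the displayed Overall value.
print(f"{mean([word_count_score, dialogue_score]):.4f}%")  # 2.1532%
```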
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly flattened the crayon-drawn invitation on the café table as if it were a royal map. A smiling dinosaur in a party hat announced: TIMMY TURNS SEVEN! PRESENTS WELCOME! Bill squinted at it over his teacup. “That dinosaur knows too much.” “It’s a stegosaurus, Bill.” “Exactly. Armored. Suspicious.” Molly tapped the address. “Three o’clock, Saturday. Community hall. Balloons. Cake. Forty children. One distracted magician. And, most importantly, a mountain of presents.” Bill leaned back. “I don’t like it.” “You said that about the bakery job.” “The bakery job had geese.” “One goose.” “One goose with leadership qualities.” Molly sighed. “This is simple. We enter the party, blend in, collect the gifts, and leave.” “Blend in how? We are thirty-four and thirty-six.” “You’re short.” “I am not short. I’m compact with menace.” “You can wear a paper crown.” Bill stirred his tea grimly. “Children can smell fear.” “Then don’t be afraid.” “I’m afraid of glitter. It gets everywhere. Last Christmas I found some in my wallet and I hadn’t even been invited anywhere.” Molly folded her arms. “Bill, focus. Timmy’s parents will be busy. The kids will be screaming. Nobody watches the present table.” “Wrong. Grandmothers watch present tables. Grandmothers watch everything. A grandmother once saw me take one extra mint from a restaurant bowl and coughed the word ‘shame.’” “We’ll create a diversion.” “No clowns.” “I wasn’t going to say clowns.” “You were thinking clowns.” “I was thinking balloon animal stampede.” Bill paused. “Explain.” “We release a bag of balloons. Everyone looks up. We casually move the gifts.” “That’s not a stampede. That’s floating.” “Fine, a balloon animal migration.” He rubbed his chin. “What about the magician?” “What about him?” “Magicians are thieves with sleeves.” “Makes him one of us.” “Makes him competition.” Molly smiled. “We’ll compliment his hat. Magicians are weak to praise.” Bill nodded despite himself. “True. 
I once told a man his cape was dramatic and he gave me my watch back.” At the next table, a toddler dropped a spoon. Bill flinched. Molly lowered her voice. “You need confidence. Picture it: we walk in carrying a gift.” “What gift?” “Empty box.” “That’s dishonest.” “We’re thieves, Bill.” “Yes, but we have standards.” Molly stared at him. “All right, flexible standards.” “We place the empty box on the table. Later, we carry out several full boxes.” “Several? How many arms do you think I have?” “Two, last I checked.” “Then two presents. Three if one is soft.” “We are not risking everything for one plush giraffe and a puzzle.” Bill lifted a finger. “Puzzles have resale value among patient criminals.” “This is why our empire is a drawer full of foreign coins and a toaster without a lever.” He looked offended. “That toaster has character.” “It has a fork stuck in it.” “That fork is structural.” Molly leaned closer. “Bill. Presents. Focus.” He looked at the invitation again. “What if Timmy cries?” “He won’t know until later.” “That’s worse. Delayed crying has echo.” “You’re going soft.” “I’ve always been soft. That’s why I wear the big coat.” Molly’s expression faltered. “They’ll be fine. Parents exaggerate. Half the presents are educational anyway.” “Educational presents are cruel, but still.” “Since when do you care?” “Since my nephew invited me to his party and someone stole my gift.” “You brought him a potato.” “It was shaped like a duck.” “It was a potato.” “And he loved it for nearly six minutes.” Molly drummed her fingers. “So what are you saying?” Bill pushed the invitation back. “I’m saying we don’t steal from children.” “We’re thieves.” “We steal from jewelers, taxidermists, and men who say ‘crypto’ too loudly. Not children.” Molly looked out the window at a passing stroller piled with parcels. “The presents are right there.” “So is our self-respect, probably under the table.” She groaned. “Fine. New plan. 
We break in after the party and steal leftover cake.” Bill brightened. “Now that’s a crime with dignity.” “And maybe the magician’s hat.” “If he deserves it.” Molly picked up the invitation and tucked it into her coat. “Saturday, then. Cake job.” Bill raised his teacup. “To Timmy. May his gifts remain unpilfered and his cake be poorly guarded.” Molly clinked her cup against his. “And may there be no geese.” From somewhere outside came a honk. Bill went pale. “The stegosaurus has allies.”
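The evaluator's rules for what counts as dialogue are not shown on this page. As a minimal sketch, assuming dialogue is any text between curly double quotes (“ ”) and a word is a whitespace-separated token, the split between dialogue and narration in a response like the one above could be measured as follows. The function name `dialogue_stats` is hypothetical, and its counts will not necessarily match the evaluator's.

```python
import re

# HYPOTHETICAL counting rule, not NC Bench's documented method:
# dialogue = text between curly double quotes; word = whitespace-separated token.
QUOTE_RE = re.compile(r"“([^”]*)”")


def dialogue_stats(text: str) -> tuple[int, int, float]:
    """Return (total_words, dialogue_words, dialogue_ratio) for a passage."""
    total = len(text.split())
    # Count the words inside each quoted span.
    dialogue = sum(len(quote.split()) for quote in QUOTE_RE.findall(text))
    ratio = dialogue / total if total else 0.0
    return total, dialogue, ratio


sample = 'Bill squinted at it. “That dinosaur knows too much.” “It’s a stegosaurus, Bill.”'
print(dialogue_stats(sample)[:2])  # (13, 9)
```

The curly apostrophe in “It’s” is a different character from the closing quote, so it does not end the quoted span early.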
Meta
created_at: "2026-04-24T21:31:30.788Z" (4/24/2026, 9:31:30 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 4044
moderation_latency: (null)
generation_time: 29087
tokens_prompt: 77
tokens_completion: 1388
native_tokens_prompt: 66
native_tokens_completion: 1137
native_tokens_completion_images: 0
native_tokens_reasoning: 88
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.03444
router: (null)
provider_responses:
  0:
    endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
    id: "resp_05ac8333b855cf750169ebe132e4808195a9e9ae9b6d00aecd"
    is_byok: false
    latency: 257
    model_permaslug: "openai/gpt-5.5-20260423"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777066290-K0y0KLNfWA6oyFwCcHbY"
session_id: (null)
api_type: "completions"
id: "gen-1777066290-k62UR5yaiJ8iKlXCUaWC"
upstream_id: "resp_05ac8333b855cf750169ebe132e4808195a9e9ae9b6d00aecd"
total_cost: 0.03444
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta data: neededClean: false; words: 721
Dialogue to Total Word Ratio
  Result: 4.31%
  Details: Ratio: 77.93%, Deviation: 7.93%
  Meta data: neededClean: false; wordsTotal: 725; wordsDialogue: 565
Overall: 2.1532%
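The Ratio and Deviation figures in the Details column can be recomputed from the evaluator's own word counts. This sketch assumes Deviation is the absolute difference between the measured ratio and the 70% target from the scenario, in percentage points; it does not attempt to reproduce how that deviation is turned into the 4.31% score, which the page does not document.

```python
# Word counts taken from the evaluator metadata above.
words_total = 725
words_dialogue = 565
target = 0.70  # 70% dialogue, from the scenario

# ASSUMPTION: deviation = |measured ratio - target|, in percentage points.
ratio = words_dialogue / words_total
deviation = abs(ratio - target)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 77.93%, Deviation: 7.93%
```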