NC Bench
Run: 2026-04-24-01KQ0PV5TJ92S99Q89XT3E22VQ
Run Details
Model: GPT-5.5 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 33.8s
Cost: $0.037320
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
24.17% | Dialogue to Total Word Ratio
12.0846% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly unfolded the napkin like it was a treasure map, though it mostly showed a smear of jam and the words HAPPY 7TH BIRTHDAY, LUCY! in purple frosting. Bill squinted at it. “That is not a floor plan.” “It is if you have imagination.” “It’s a napkin.” “It’s intelligence gathered at great personal risk.” “You bought a cupcake from the mother.” “I infiltrated the perimeter.” “You said, ‘One cupcake, please.’” Molly tapped the frosting letters. “The party begins at two. Cake at three. Presents after cake. That gives us a window.” Bill leaned back on the park bench, which groaned like it wanted no part in the conversation. “A window to do what, exactly? Waltz into a room full of children, parents, balloons, and one sugar-crazed magician?” “Precisely.” “Precisely insane.” Molly tucked a curl under her wool hat. “We’re not stealing from the children. We’re stealing from the institution of birthday excess.” “The institution is seven.” “The presents are piled high, Bill. Mountains. Towers. Shiny bags with tissue paper exploding out the top. No child needs that many gifts.” “No child needs two criminals in the cloakroom either.” “We prefer ‘redistribution specialists.’” Bill sighed. “Last week you redistributed a garden gnome and then felt guilty and mailed it back wearing a scarf.” “He looked cold.” Across the street, the community hall gleamed with streamers. A paper dinosaur taped to the door flapped in the wind as if warning them away. Molly produced a pair of party hats from her coat. “Disguises.” Bill stared. “One has a pony on it.” “You can be whimsical.” “I am wanted in three boroughs for burglary, not whimsy.” “Then be dangerously whimsical.” He took the hat and set it on his head. The elastic snapped under his chin. “I hate how natural this feels.” “Good. We enter as late guests.” “We don’t know Lucy.” “We know enough. 
She likes dinosaurs, pink icing, and possibly ponies.” “That describes half the children in the kingdom.” Molly pointed to a sagging banner through the window. “We mingle. We applaud the magician. During the balloon animal portion, we approach the present table.” “And then?” “And then we remove the presents.” Bill waited. “That’s your entire plan?” “It has elegance.” “It has prison.” “It has simplicity.” “It has witnesses under four feet tall who remember everything and describe people like little police artists.” Molly considered this. “Fine. We create a diversion.” “No fire.” “I wasn’t going to say fire.” “You were thinking fire.” “I was thinking dramatic glitter.” “That’s just fire with ambition.” A shriek of laughter burst from the hall. Both thieves ducked instinctively behind the bench, though no one had looked their way. Bill peered over the backrest. “Listen to them. They sound happy.” “Of course they’re happy. They’re guarding loot.” “They’re eating sausage rolls.” “Loot-adjacent.” He removed the pony hat and turned it in his hands. “Molly, have we perhaps reached a low point?” She frowned. “We once robbed a mime.” “He had a diamond cufflink.” “He was very upset, silently.” “This is worse.” Molly looked again at the hall. A small girl in a green dinosaur tail ran past the window, chased by another child wearing fairy wings and a pirate eyepatch. Behind them, a tired father carried a stack of paper plates like sacred relics. Bill said, “What are we even going to do with children’s presents? Fence a stuffed giraffe? Move hot crayons on the black market?” “There’s always demand for building bricks.” “There’s always demand for not being monsters.” Molly folded the napkin slowly. “You’re getting sentimental.” “You made me wear a pony hat. It opened something.” The hall door swung wide. A woman stepped out, balancing a bin bag and a look of total defeat. “Excuse me,” she called. “Are you two the entertainers?” Molly and Bill froze. 
Bill whispered, “Say no.” Molly smiled brightly. “Yes.” Bill whispered, “Why would you say yes?” The woman sighed with relief. “Wonderful. The magician’s rabbit escaped into the kitchen, the children are mutinying, and Lucy is crying because nobody can make a dinosaur out of balloons.” Molly stood. “We can help.” “We can?” Bill said. “We can.” She jabbed him in the ribs. “Bill does a marvelous pony.” “I do not.” Ten minutes later, Bill stood in the middle of the party wearing the pony hat, twisting balloons with the terror of a man defusing bombs. Molly passed out napkins and intercepted falling cups of juice with criminal precision. Lucy sniffled, watching Bill’s lopsided balloon creation. “It’s a dinosaur,” he said. “It looks like a sausage with knees,” Lucy replied. Bill bowed. “A rare sausageosaurus.” Lucy giggled. Molly, beside the present table, looked at the shining pile. Then she looked at Bill, surrounded by delighted children, sweating under a paper hat. He mouthed, “No stealing.” She mouthed back, “Fine.” Later, they left with two cupcakes, a thank-you card, and seventeen pounds in entertainer money. Bill smiled. “Best heist yet.” Molly bit into her cupcake. “We stole the show.”
created_at: "2026-04-24T21:37:58.107Z" (4/24/2026, 9:37:58 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3152
moderation_latency: (null)
generation_time: 32760
tokens_prompt: 77
tokens_completion: 1527
native_tokens_prompt: 66
native_tokens_completion: 1233
native_tokens_completion_images: 0
native_tokens_reasoning: 84
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.03732
router: (null)
provider_responses: 0
endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
id: "resp_0544a9dc9f8d05690169ebe2b64438819490c78ec72fa9deb0"
is_byok: false
latency: 390
model_permaslug: "openai/gpt-5.5-20260423"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777066678-2CqQZnLyhuKowd8Yqrnm"
session_id: (null)
api_type: "completions"
id: "gen-1777066678-RJSXpGMGUCvpVdoZzrpR"
upstream_id: "resp_0544a9dc9f8d05690169ebe2b64438819490c78ec72fa9deb0"
total_cost: 0.03732
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 833
24.17% | Dialogue to Total Word Ratio | Ratio: 55.33%, Deviation: 5.33% | neededClean: false, wordsTotal: 835, wordsDialogue: 462
12.0846% | Overall
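The ratio and deviation reported above can be reproduced from the two word counts. The sketch below is illustrative only: `dialogue_stats` and `ratio_and_deviation` are hypothetical helpers, not the benchmark's evaluator, and they assume that dialogue means text inside curly double quotes and that words are whitespace-separated tokens. The report does not state how the 24.17% score is derived from the deviation, so that step is not reproduced. Treating the overall figure as the mean of the two evaluator scores is likewise an assumption; it gives about 12.085, close to the reported 12.0846%.

```python
import re


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words).

    Assumption: dialogue is whatever sits between curly double quotes,
    and a word is any whitespace-separated token.
    """
    spoken = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in spoken)
    total_words = len(text.split())
    return dialogue_words, total_words


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target: float = 0.5) -> tuple[float, float]:
    """Dialogue share of all words and its absolute distance from the target."""
    ratio = dialogue_words / total_words
    return ratio, abs(ratio - target)


# Counts taken from the evaluation details (wordsDialogue, wordsTotal).
ratio, deviation = ratio_and_deviation(462, 835)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Assumption: the overall score is the plain mean of the evaluator scores.
overall = (0.0 + 24.17) / 2
print(f"Overall (mean of rounded scores): {overall:.3f}%")
```

Running the counter on the model response itself may not return exactly 462 and 835, since the evaluator's tokenization and its handling of nested quotes are not documented here.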