NC Bench
Run: 2026-04-24-01KQ0N4NZXT2BH597XAEYYJPBG
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 27.5s
Cost: $0.033690
Provider: OpenAI
Evaluation summary
Score      Evaluator
0.00%      Matches word count
4.32%      Dialogue to Total Word Ratio
2.1623%    Overall
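The overall figure looks consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers shown on this page, not a documented NC Bench formula, and the function name below is hypothetical. A minimal sketch under that assumption:

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores, in percent.

    Assumed aggregation: the source page does not state how Overall is derived.
    """
    if not scores:
        raise ValueError("at least one evaluator score is required")
    return sum(scores) / len(scores)


# Evaluator scores as displayed for this run (already rounded to two decimals).
displayed_scores = [0.00, 4.32]
overall = overall_score(displayed_scores)
print(f"{overall:.2f}%")
```

With the rounded inputs this gives 2.16%, which agrees with the reported 2.1623% to the precision available; the remaining digits would come from the unrounded dialogue score, which the page does not show.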
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly unrolled the birthday-party blueprint across the bonnet of their stolen ice-cream van, which they had stolen by accident and now deeply regretted because it played “Greensleeves” whenever anyone breathed near the dashboard. Bill peered at the crayon-smeared map. “Why is the living room labelled Dragon Pit?” “Because it contains children,” Molly said. “Six-year-olds. Possibly seven. Unpredictable. Sticky. Armed with cake.” Bill swallowed. He had cracked safes, slipped diamonds from museum cases, and once pickpocketed a duchess during a sneeze, but children unsettled him. They asked direct questions. They noticed wigs. “The presents will be here,” Molly said, tapping a square marked Mountain of Joy. “Beside the bouncy castle, under the banner that says Happy Birthday, Poppy.” “Poppy,” Bill repeated. “Sounds dangerous.” “She’s turning six.” “That’s when they’re fastest.” Molly ignored him and produced two party hats from her coat. One was pink with glitter. The other had a rubber dinosaur glued to the top. Bill stared. “Absolutely not.” “Disguise is the soul of burglary.” “I am not wearing a dinosaur on my head.” “You said the same thing in Prague, and the tiara got us past three guards.” “That was different. The tiara had dignity.” Molly fitted the pink hat onto her own hair and tied the elastic beneath her chin with professional solemnity. “We enter at fourteen hundred hours with the entertainer.” “The magician?” “No. The man dressed as Captain Sprinkles.” Bill looked toward the house across the street. Balloons bobbed at the gate like bright little witnesses. A paper unicorn smiled at him from the window, which felt accusatory. “And what if Captain Sprinkles recognizes us as not being part of his crew?” Bill asked. Molly held up a clipboard. “We say we’re from Regional Sprinkle Compliance.” “That’s not a thing.” “It is if you say it with enough disappointment.” From inside the van came a soft mechanical wheeze. 
Then “Greensleeves” began, loudly, mournfully, and with the emotional range of a haunted kettle. Bill slapped the dashboard. “Hush, you creamy traitor!” The music grew louder. Molly leaned through the window and yanked out a wire. The van coughed, released a final jolly note, and went silent. “Right,” she said. “Once inside, we mingle.” “I don’t mingle. I lurk.” “You’ll mingle today. You’ll stand near the sausage rolls and say things like, Lovely weather for pass-the-parcel.” Bill practiced a smile. It looked as if he had found a beetle in his soup and was trying to be brave about it. “Less terrifying,” Molly advised. He adjusted his dinosaur hat with a sigh. “And then we take the presents?” “We remove them,” Molly said, “with grace. Into these.” She opened a laundry sack printed with cheerful ducks. Bill frowned at it. “That is a very judgmental sack.” “It was on sale.” Across the road, the front door opened. A small girl in a gold crown marched out carrying a wand. Behind her trailed a line of adults with plates, parcels, and the exhausted expressions of people negotiating with tiny royalty. Molly lowered the binoculars. “That’s Poppy.” Bill ducked behind the bonnet. “She looked straight at me.” “She looked at the van.” “The van looks guilty.” Poppy pointed at the ice-cream van. Her face lit up with devastating joy. Molly and Bill froze. “New plan,” Molly whispered. “We flee?” “We sell ice cream.” “We don’t have ice cream.” Molly opened the freezer compartment. Inside were three melted lollies, a bag of peas, and a velvet pouch of lockpicks. Bill lifted the peas. “Mint choc chip?” Children began pouring from the house, shrieking with hope. Molly straightened her party hat, squared her shoulders, and grabbed the duck sack. “Remember,” she said, “criminal confidence.” Bill looked at the approaching swarm, then at the presents glittering in the hallway beyond them. “Molly?” “Yes?” “I think the presents may be safer than we are.”
created_at: "2026-04-24T21:08:12.42Z" (4/24/2026, 9:08:12 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 7694
moderation_latency: (null)
generation_time: 27429
tokens_prompt: 77
tokens_completion: 1262
native_tokens_prompt: 66
native_tokens_completion: 1112
native_tokens_completion_images: 0
native_tokens_reasoning: 219
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.03369
router: (null)
provider_responses: 0
endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
id: "resp_00a07a9a8864267a0169ebdbbc7e7c8196b0366226f1136a28"
is_byok: false
latency: 377
model_permaslug: "openai/gpt-5.5-20260423"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777064892-7dZhut0Q7zHSzqW0hECW"
session_id: (null)
api_type: "completions"
id: "gen-1777064892-dLx7DCHRba7AzS2jVuc3"
upstream_id: "resp_00a07a9a8864267a0169ebdbbc7e7c8196b0366226f1136a28"
total_cost: 0.03369
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
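Two derived figures can be read off the metadata above: generation throughput and the share of completion tokens spent on reasoning. The sketch below assumes that generation_time is in milliseconds (27429 matches the 27.5s duration in the run details) and that native_tokens_reasoning is counted inside native_tokens_completion; neither is stated on the page, and the helper names are hypothetical.

```python
def tokens_per_second(completion_tokens, generation_time_ms):
    """Completion tokens generated per second (generation time assumed to be in ms)."""
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000.0)


def reasoning_share(reasoning_tokens, completion_tokens):
    """Fraction of completion tokens that were reasoning tokens (assumed to be a subset)."""
    if completion_tokens <= 0:
        raise ValueError("completion_tokens must be positive")
    return reasoning_tokens / completion_tokens


# Values copied from the run metadata.
throughput = tokens_per_second(1112, 27429)
share = reasoning_share(219, 1112)
print(f"{throughput:.1f} tokens/s, {share:.1%} reasoning")
```

Under those assumptions the run works out to roughly 40.5 native completion tokens per second, with about a fifth of them used for reasoning.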
Evaluation details
Result     Evaluator                      Details                            Meta Data
0.00%      Matches word count             n/a                                neededClean: false; words: 640
4.32%      Dialogue to Total Word Ratio   Ratio: 37.93%, Deviation: 7.93%    neededClean: false; wordsTotal: 646; wordsDialogue: 245
2.1623%    Overall
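The ratio and deviation reported by the dialogue evaluator follow directly from its word counts and the 30% target in the scenario. The sketch below reproduces that arithmetic; the function name is hypothetical, and the mapping from deviation to the 4.32% score is not shown on this page, so it is left out rather than guessed.

```python
TARGET_RATIO = 0.30  # from the scenario: 30% of the words should be dialogue


def dialogue_ratio(words_dialogue, words_total):
    """Share of all words that fall inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


# Counts reported in the evaluator metadata for this run.
ratio = dialogue_ratio(245, 646)
deviation = abs(ratio - TARGET_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

This yields a ratio of 37.93% and a deviation of 7.93%, matching the evaluator details. The word-count evaluator reports 640 words while this one reports 646, which suggests the two count words slightly differently.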