NC Bench
Run: 2026-04-24-01KQ0PG7T6CDFQ4RWG4SAFMCAQ
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 28.4s
Cost: $0.036090
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
34.65% | Dialogue to Total Word Ratio
17.3247% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the map across the bonnet of Bill’s van, pinning down one corner with a sausage roll and another with a plastic tiara. The map was not, strictly speaking, a map. It was a crayon drawing of the church hall made by Bill after “extensive reconnaissance,” which had consisted of him peering over the hedge while pretending to look for a lost ferret. “The presents are here,” Molly said, tapping a square labelled MOUNT GIFTMORE. Bill squinted. “That’s the buffet.” “No, the buffet is this bit with the little sausages.” “That’s the bouncy castle.” Molly frowned at the drawing. “Why did you draw the bouncy castle as sausages?” “I was hungry.” Inside the hall, music thumped and children shrieked with the specific joy of people too short to be held legally responsible for jam on curtains. Through the frosted windows came flashes of balloons, paper hats, and one exhausted adult dressed as a pirate with the hollow stare of a man reconsidering every choice since birth. Bill adjusted the collar of his borrowed magician’s cape. It had silver moons on it and smelled faintly of cabbage. “I still think clown is safer,” he said. “You cried when you saw your own reflection.” “That clown came out of nowhere.” “It was you, Bill.” “That’s what made it worse.” Molly had chosen a pink fairy godmother outfit with wings that kept catching on the van mirror. She believed in disguises that made witnesses doubt themselves. Nobody wanted to admit they’d been robbed by a fairy godmother and a nervous wizard. “The birthday boy turns seven at half past,” she said. “Cake, singing, distraction. We enter during the applause, collect the parcels, exit with dignity.” Bill nodded gravely. “And if they ask us to perform?” “You do a trick.” “I know one.” “The coin behind the ear?” “No, the one where I panic and run.” Molly sighed. A balloon drifted loose from the hall door and bobbed across the car park like a tiny, mocking moon. Bill watched it go. 
“Seems a bit rough, stealing from kids,” he muttered. “They’re children. They’ll get over it by Tuesday.” “I didn’t get over losing my wooden duck until I was twenty-nine.” “You found it in your sock drawer.” “Exactly. Years of emotional complexity wasted.” Molly softened, but only in the way a biscuit softens after being dropped in tea: briefly, and with structural concerns. She looked again at the glowing windows. A chorus of children began shouting something that might have been Happy Birthday, or possibly a small uprising. “We’re not monsters,” she said. “We’re professionals.” “Professionals steal from banks.” “Banks have alarms.” “Children have mothers.” A woman’s voice blasted from inside the hall: “Who put jelly in the piano?” Bill pointed toward the door. “See? Security’s already unstable.” Molly gathered up the crayon map, accidentally smearing Mount Giftmore into the buffet. “Fine. In and out. No improvising. No sentiment.” The hall door opened. A little girl in a paper crown stepped out, holding a wrapped box nearly as big as her torso. She stared at them. Molly froze in a fairy pose that suggested benevolence and tax fraud. Bill gave a tiny wave. “Evening.” The girl looked him up and down. “Are you the magic man?” Bill swallowed. “Depends who’s asking.” “I asked for a rabbit.” Molly leaned close to Bill. “Can you produce a rabbit?” “I can produce a receipt for this cape.” The girl considered this, then thrust the present at Molly. “Hold this. I need crisps.” She marched back inside. Molly stared at the gift in her hands. It was wrapped in bright blue paper covered with dinosaurs wearing party hats. A tag read: For Toby, from Nan. Bill looked at it, then at Molly, then at the hall. “Well,” he said quietly, “that was easy.” Molly’s fingers tightened around the present. From inside came laughter, an adult groan, and someone yelling, “The pirate’s stuck in the castle!” She tucked the gift under her arm and headed for the door. Bill blinked. 
“Molly?” “We’re returning it,” she snapped. “And stealing cake.”
created_at
"2026-04-24T21:31:59.695Z"
4/24/2026, 9:31:59 PM
model
"openai/gpt-5.5-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
9808
moderation_latency
(null)
generation_time
28275
tokens_prompt
77
tokens_completion
1301
native_tokens_prompt
66
native_tokens_completion
1192
native_tokens_completion_images
0
native_tokens_reasoning
285
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.03609
router
(null)
provider_responses
0
endpoint_id
"58e5b336-423e-430b-a2ab-8bc353f0c51b"
id
"resp_0c7b284167da13d40169ebe14fca588194b44b4732f67fd9a3"
is_byok
false
latency
355
model_permaslug
"openai/gpt-5.5-20260423"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777066319-JvkhfCNes2aoo2lUUGnf"
session_id
(null)
api_type
"completions"
id
"gen-1777066319-obdSp1fYfcKQZlf8d3Mo"
upstream_id
"resp_0c7b284167da13d40169ebe14fca588194b44b4732f67fd9a3"
total_cost
0.03609
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
response_cache_source_id
(null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 681
34.65% | Dialogue to Total Word Ratio | Ratio: 34.60%, Deviation: 4.60% | neededClean: false, wordsTotal: 682, wordsDialogue: 236
17.3247% | Overall
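The ratio and deviation reported in the evaluation details can be reproduced from the listed word counts. The sketch below is only a check of that arithmetic, not the benchmark's actual evaluator code. The variable names are hypothetical, the 30% target is taken from the scenario description, and treating the overall score as the plain mean of the two evaluator scores is an assumption that happens to agree with the reported 17.3247% up to rounding. How the 34.65% evaluator score itself is derived from the deviation is not stated on the page, so it is used here as given.

```python
# Word counts as reported in the evaluation details (not recomputed from the text).
words_dialogue = 236
words_total = 682

# Target share of dialogue, taken from the scenario ("30% dialogue").
target_ratio = 0.30

# Share of dialogue words and its distance from the target.
ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# ASSUMPTION: the overall score is the unweighted mean of the evaluator scores.
# The inputs are the rounded percentages shown on the page, so the result can
# differ from the reported 17.3247% in the last digits.
evaluator_scores = [0.00, 34.65]
overall = sum(evaluator_scores) / len(evaluator_scores)
print(f"Overall (mean of rounded scores): {overall:.4f}%")
```

Because the page shows only the rounded 34.65% score, the mean computed this way comes out slightly different from the reported overall figure in the fourth decimal place.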