NC Bench
Run: 2026-04-24-01KQ0PZ4X0F068TGMTKSV1R5KH
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 27.0s
Cost: $0.032760
Provider: OpenAI
Evaluation summary
Score      Evaluator                       Details Link
0.00%      Matches word count              (details)
11.92%     Dialogue to Total Word Ratio    (details)
5.9618%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the birthday invitation across the café table as if it were a map to buried treasure. It was pink, glittery, and shaped like a dinosaur wearing a party hat. Bill leaned over it with the solemn expression of a man reading a royal decree, though a smear of jam on his chin undermined the effect. “At two o’clock,” Molly said, tapping the card, “the presents will be assembled on the dining-room table.” Bill squinted. “How do you know they won’t be in the living room?” “Because it says ‘presents in the dining room, cake in the kitchen, bouncy castle in the garden.’ Honestly, Bill, criminals used to have to work harder.” He nodded, impressed. “Modern children are very organized.” Outside, rain tapped the window. Inside, Mrs. Puddleby’s Tea Nook smelled of scones and suspicion. Molly had chosen the corner booth because it had a view of the street and because Bill had once got trapped in the revolving door of a library while discussing a job. “We go in disguised,” Molly continued. “As what?” “Entertainers.” Bill sat up. “I can juggle.” “You can drop three oranges in sequence.” “That’s the foundation of juggling.” Molly ignored him and drew a little stick figure on the napkin. It had a cape, a hat, and a frown. “I’ll be Madam Molotov, mysterious magician. You’ll be Professor Billiam, balloon artist.” Bill’s face drained of color. “Balloons are squeaky.” “So are prison doors, probably.” He considered this. “Fair point.” Molly’s plan was simple in the way that a chandelier was simple if you ignored the wiring. They would arrive during the chaos, smile at adults, confuse children, and locate the mountain of parcels. The birthday girl, one Penelope Phipps, aged six, had apparently invited twenty-eight classmates, three cousins, a pony named Biscuit, and a boy whose mother insisted he was “spirited” because “tiny hurricane” was not socially acceptable. “The key,” Molly said, “is blending in.” Bill looked down at his black turtleneck, black gloves, and black wool cap. 
“I look festive.” “You look like a burglar.” “I am a burglar.” “Not at children’s parties. At children’s parties you are whimsical.” Bill sighed. “Whimsy itches.” The waitress passed, gave them a hard stare, and refilled Molly’s teacup with the air of someone gathering evidence. Molly smiled sweetly. Bill hid the invitation under his saucer, which immediately tipped over and flooded the table. “Smooth,” Molly said. “I was improvising.” “You were baptizing a dinosaur.” Bill mopped glitter and tea with his sleeve. “What happens after we get the presents?” Molly lowered her voice. “We take them to the van.” “The van with ‘Bill’s Honest Carpets’ on the side?” “Yes.” “We don’t sell carpets.” “That’s why nobody asks questions. If it said ‘Bill’s Suspicious Present Van,’ people might pry.” A small child at the next table turned around, stared at them, and narrowed his eyes. He wore a paper crown and had the calm menace of a tiny king considering taxation. Molly smiled. “Hello, darling.” The child pointed at Bill. “He’s got jam on his crime hat.” Bill clutched his cap. “It’s not a crime hat.” “It is now,” the child said, and returned to his bun. Molly folded the damp invitation. For the first time, doubt crept across her face like a cautious cat. “Children notice everything.” Bill shuddered. “They’re short detectives.” “And there’ll be twenty-eight of them.” “Plus Biscuit.” They sat in silence, listening to the rain and the soft clink of teacups. Then Bill brightened. “What if,” he said, “we steal the cake instead?” Molly gave him a look. “Fine,” he said. “What if we attend politely, perform badly, and leave with party bags?” Molly considered the glitter on her fingers, the accusing child, and the fact that Professor Billiam would almost certainly be defeated by a balloon poodle. At last, she tucked the invitation into her coat. “Bill,” she said, “sometimes the greatest heist is knowing when not to heist.” He nodded wisely. 
“And sometimes it’s getting free sweets.” “Exactly.” They rose, paid for the tea, and stepped into the rain, already arguing over whether whimsy required a cape.
created_at
"2026-04-24T21:40:08.24Z"
4/24/2026, 9:40:08 PM
model
"openai/gpt-5.5-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
4966
moderation_latency
(null)
generation_time
26449
tokens_prompt
77
tokens_completion
1209
native_tokens_prompt
66
native_tokens_completion
1081
native_tokens_completion_images
0
native_tokens_reasoning
123
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.03276
router
(null)
provider_responses
0
endpoint_id
"58e5b336-423e-430b-a2ab-8bc353f0c51b"
id
"resp_0388c1e63dc72ce10169ebe338d0908196af1173c52b74c7b4"
is_byok
false
latency
966
model_permaslug
"openai/gpt-5.5-20260423"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777066808-QQngdoTjkGbfsZkEPvtY"
session_id
(null)
api_type
"completions"
id
"gen-1777066808-9XNPGHghx6rpTO2ZT82z"
upstream_id
"resp_0388c1e63dc72ce10169ebe338d0908196af1173c52b74c7b4"
total_cost
0.03276
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
response_cache_source_id
(null)
Evaluation details
Result     Evaluator                       Details                            Meta Data
0.00%      Matches word count              n/a                                neededClean: false, words: 687
11.92%     Dialogue to Total Word Ratio    Ratio: 36.52%, Deviation: 6.52%    neededClean: false, wordsTotal: 690, wordsDialogue: 252
5.9618%    Overall
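The reported ratio and deviation can be reproduced from the metadata. The sketch below is a minimal illustration, not the benchmark's actual evaluator: the function name `dialogue_ratio` and the constant `TARGET_RATIO` are hypothetical, and the 30% target is taken from the scenario text. It uses `wordsTotal` (690) rather than the 687 words counted by the word-count evaluator, since that is the figure listed alongside `wordsDialogue`. How the 6.52% deviation is converted into the 11.92% score is not stated on this page, so that step is left out. The overall score of 5.9618% appears consistent with an unweighted mean of the two evaluator scores, but that is an inference from the numbers, not something the page confirms.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of total words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


# Assumption: the target comes from the scenario "Write 500 words with 30% dialogue".
TARGET_RATIO = 30.0

# Values copied from the evaluation metadata (wordsDialogue, wordsTotal).
ratio = dialogue_ratio(252, 690)
deviation = abs(ratio - TARGET_RATIO)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# Assumption: the overall score is an unweighted mean of the evaluator scores.
# With the rounded scores shown on the page this gives 5.96, close to 5.9618.
evaluator_scores = [0.00, 11.92]
overall = sum(evaluator_scores) / len(evaluator_scores)
print(f"Overall (mean of rounded scores): {overall:.2f}%")
```

Running this prints a ratio of 36.52% and a deviation of 6.52%, matching the Details column for the dialogue evaluator.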