NC Bench
Run: 2026-04-24-01KQ0PQFN05W37ETQ6ZSMJ5KRG
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 25.4s
Cost: $0.034920
Provider: OpenAI
Evaluation summary
Score    Evaluator
0.00%    Matches word count
1.07%    Dialogue to Total Word Ratio
0.5363%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the invitation across the café table like it was a bank blueprint. Glitter clung to her gloves. Bill leaned over it, frowning at a cartoon dinosaur wearing a party hat.
“Target is four o’clock,” Molly said. “Community hall. Birthday boy: Sebastian. Age six. Known associates: twenty-three sticky children and one magician.”
Bill tapped the dinosaur. “This one looks like muscle.”
“That’s the theme, Bill.”
“I don’t like dinosaurs. They’ve got a reputation.”
Molly sighed. “Our objective is simple. We enter the party, locate the present table, and depart with the goods.”
Bill looked wounded. “You make it sound easy. Children’s parties are chaos. Last time, I got trapped in a parachute game and had to pretend I was a cloud.”
“That was because you panicked.”
“I was being hunted by toddlers.”
“They were laughing.”
“That’s how toddlers hunt.”
Molly took a sip of tea and grimaced. Someone had put a lemon seed in it, which she considered a personal attack. Across the room, a barista was drawing foam hearts for honest citizens.
“We’ll need disguises,” she said.
Bill brightened. “Clowns?”
“No.”
“Balloon artists?”
“No.”
“Two normal adults who are definitely supposed to be there?”
Molly considered this. “Too suspicious.”
Bill nodded gravely. “True. Nothing more suspicious than normal adults.”
“We go as caterers. Aprons. Trays. Confidence.”
“I can carry sausage rolls,” Bill said. “I have a gift.”
“You dropped an entire wedding cake into a koi pond.”
“That pond looked hungry.”
Molly ignored him and pointed to a handwritten note on the invitation. “It says presents on the stage. That means elevated terrain.”
Bill shuddered. “Stages. People look at you on stages.”
“Only if you perform.”
“I once performed by accident.”
“You sneezed into a trumpet.”
“It got applause.”
Molly folded the invitation, then unfolded it again because the glitter had transferred to her palm and made her look festive. She hated feeling festive before a job.
“Problem one,” she said. “Parents.”
Bill counted on his fingers. “Mothers, fathers, grandparents, suspicious aunts, the one uncle who owns a camera and thinks he’s a journalist.”
“Exactly. We need a distraction.”
Bill smiled slowly. “Magician.”
Molly smiled back. “Magician.”
They sat in mutual professional silence, imagining a man in a cape being blamed for everything. Outside, rain ticked against the window like tiny impatient fingers.
“I’ll ask him to pull a rabbit from my hat,” Bill said.
“You don’t own a hat.”
“I’ll steal one first.”
“No preliminary crimes.”
“You’ve become very strict.”
“I’ve become alive and uncaught.”
Bill slumped. “Fine. What’s in the presents, anyway?”
“Unknown. Toys. Games. Electronics. Possibly socks from a practical grandmother.”
Bill’s eyes narrowed. “We leave the socks.”
“We take everything.”
“Molly, there are lines.”
She tucked the invitation into her coat. “There are no lines in heist work.”
“There are always lines. Queues, mostly. Buffets. Face painting. The toilet after children discover orange squash.”
Molly stood, gathering her bag. “Four o’clock. Don’t be late.”
Bill stood too, knocking his chair with a squeak that made three people stare. He gave them a polite wave. “Question,” he said. “What if Sebastian cries?”
Molly froze.
Bill continued, quieter. “Because he’s six. And the presents are his. And if someone stole my birthday presents when I was six, I’d have cried so hard my ears would’ve whistled.”
Molly stared at the dinosaur invitation. Its paper grin suddenly looked less like a target and more like a small person’s very big day.
“We could steal from the magician,” Bill offered.
“No.”
“The suspicious uncle?”
“No.”
“The cake?”
Molly’s expression softened. “We are not stealing from children.”
Bill exhaled in relief. “Good. Because I already bought Sebastian a present.”
“You what?”
“A dinosaur hat. For cover.”
Molly snatched the invitation from her pocket and headed for the door. “New plan. We attend the party, deliver the gift, eat one slice of cake each, and rob no one.”
Bill hurried after her. “Can I still wear the apron?”
“No.”
“The dinosaur hat?”
“Absolutely not.”
But by four o’clock, Bill wore both.
created_at: "2026-04-24T21:35:57.097Z" (4/24/2026, 9:35:57 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 4659
moderation_latency: (null)
generation_time: 25354
tokens_prompt: 77
tokens_completion: 1276
native_tokens_prompt: 66
native_tokens_completion: 1153
native_tokens_completion_images: 0
native_tokens_reasoning: 206
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.03492
router: (null)
provider_responses:
  0:
    endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
    id: "resp_0cea09d0e59b99520169ebe23d2bf8819682f24539ab444105"
    is_byok: false
    latency: 248
    model_permaslug: "openai/gpt-5.5-20260423"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777066557-WidJBE2x9xo8ku4XBBpi"
session_id: (null)
api_type: "completions"
id: "gen-1777066557-RFUcxjQ4X5p6X9cs5jxq"
upstream_id: "resp_0cea09d0e59b99520169ebe23d2bf8819682f24539ab444105"
total_cost: 0.03492
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
Evaluation details
Result   Evaluator                     Details                          Meta Data
0.00%    Matches word count            n/a                              neededClean: false, words: 671
1.07%    Dialogue to Total Word Ratio  Ratio: 59.52%, Deviation: 9.52%  neededClean: false, wordsTotal: 672, wordsDialogue: 400
0.5363%  Overall
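The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator follow from the two word counts in its metadata (wordsDialogue 400, wordsTotal 672). The Python sketch below reproduces that arithmetic. It is not NC Bench's actual evaluator: the helper names (count_words, count_dialogue_words, ratio_and_deviation), the rule of counting words inside double quotes, and the 50% target taken from the scenario text are assumptions made for illustration. How the 9.52% deviation is converted into the 1.07% score is not shown on this page, so that step is left out.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Count words inside curly or straight double quotes.

    Hypothetical rule: the real evaluator may tokenize differently.
    """
    spans = re.findall(r'“([^”]*)”|"([^"]*)"', text)
    return sum(count_words(curly or straight) for curly, straight in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = 0.5) -> tuple[float, float]:
    """Return the dialogue share and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Word counts taken from the evaluator metadata on this page.
ratio, deviation = ratio_and_deviation(words_dialogue=400, words_total=672)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 59.52%, Deviation: 9.52%"
```

With the page's counts this matches the reported details line. The response also ran to roughly 671 words against a 500-word request, which is consistent with the 0.00% word count score.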