NC Bench
Run: 2026-04-24-01KQ0NB3QWA4Q2CTDR9ZCMCVQJ
Run Details
Model: GPT-5.5
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 27.5s
Cost: $0.033900
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.01% | Dialogue to Total Word Ratio
0.0044% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly unfolded the crayon-smeared birthday invitation across the café table like it was a bank blueprint. A cartoon dinosaur in a party hat smiled up at them, unaware it had become evidence. “Target arrives at two,” Molly said. “Cake at two-thirty. Presents opened at three.” Bill leaned closer, squinting. “You got all that from glitter glue?” “I got that from being a professional.” “You got that from stealing mail.” “Borrowing mail.” “From a mailbox.” “Temporarily.” Bill stirred his tea with unnecessary suspicion. “Molly, this is a six-year-old’s birthday party.” “Exactly. Soft security. Distracted adults. High-value parcels.” “High-value? It says ‘bring a wrapped gift under fifteen pounds.’” “Do you know what toys cost now?” Bill looked toward the window, where a poodle in a sweater judged them both. “I don’t like it.” “You don’t like anything involving balloons.” “Balloons are unstable.” “So are you, but I still work with you.” He tapped the invitation. “What’s the venue?” “Community center. Jungle room.” “There’s a jungle room?” “There’s a mural and a suspicious amount of rubber flooring.” Bill groaned. “Rubber flooring means children bounce.” “That’s why we move fast.” “We are not stealing from children.” Molly paused, then lowered her voice. “We’re not stealing from children. We’re redistributing unappreciated plastic clutter before it becomes landfill.” “That is the worst moral disguise you’ve ever worn.” “It has pockets.” A waiter passed. Molly covered the invitation with a napkin. Bill tried to look innocent and accidentally saluted. “Stop saluting civilians,” she whispered. “I panicked.” “You always panic.” “I’m the cautious one.” “You once hid in a laundry basket for forty minutes because a cat made eye contact.” “It was a large cat.” Molly slid a second paper across the table: a hand-drawn floor plan with arrows, stick figures, and one alarming sketch labeled Cake Zone. “Entry through side door,” she said. “We pose as entertainers.” Bill stared. 
“Entertainers?” “You can juggle.” “I can drop three oranges in sequence.” “Children love slapstick.” “They love snacks and screaming.” “Then you’ll fit in.” He rubbed his forehead. “What are you going as?” “Magician.” “You know magic?” “I know misdirection.” “That’s just lying with jazz hands.” “Precisely.” Outside, rain began tapping the glass. Inside, the espresso machine hissed like it disapproved. Molly pointed to the plan. “Parents cluster here. Children swarm here. Presents sit here, near the dinosaur banner.” Bill studied the tiny rectangle labeled Loot Mountain. “And the exit?” “Same side door. We wheel out the presents in a prop trunk.” “Where do we get a prop trunk?” “You’re sitting on it.” Bill froze, then glanced under the table. The large floral suitcase between their feet looked less like luggage now and more like a criminal confession. “You brought equipment to brunch?” “This isn’t brunch. This is reconnaissance with muffins.” “You had two muffins.” “Research is hungry work.” Bill folded his arms. “I have objections.” “List them alphabetically.” “A: children. B: party games. C: possible clown.” “There is no clown.” “How do you know?” “Budget party.” “That doesn’t eliminate an uncle with face paint.” Molly considered this. “Acceptable risk.” “No uncle in face paint is acceptable.” She leaned in, eyes bright with the dangerous confidence that had once gotten them trapped inside a garden shed overnight. “Listen. We go in, perform ten minutes, create a distraction, load the presents, leave. Clean, quick, elegant.” “Nothing we do is elegant.” “Our intentions are elegant.” “Our intentions are felonies.” “Alleged felonies.” Bill sighed so deeply his teabag trembled. “What if the birthday child cries?” Molly’s expression softened for half a second. “We leave one present.” “One?” “The biggest-looking one.” “That’s probably socks from a grandparent.” “Then he learns resilience.” “Molly.” “What?” “This is low, even for us.” She sat back. 
The invitation glittered faintly beneath the napkin. Around them, respectable people ate toast without conspiring against kindergarteners. Finally, Molly said, “Fine. We don’t steal the presents.” Bill blinked. “Really?” “We steal the magician’s fee.” “There is no magician.” “There will be after we invoice them.” Bill stared, then slowly smiled. “So we perform badly and get paid?” “Professionally badly.” “And the children keep the presents?” “Every noisy, battery-hungry one.” He lifted his teacup. “To crime with boundaries.” Molly clinked her mug against it. “To boundaries with invoices.” At the next table, the poodle barked once through the window. Bill nodded solemnly. “Even the dog approves.” “The dog knows talent.” “The dog knows we’re banned from three malls.” “Temporary bans.” “From 2019.” Molly packed away the floor plan. “Come on. We need capes.” Bill stood, hauling the floral suitcase. “If there’s a clown, I’m resigning.” “If there’s a clown, we rob him first.”
created_at: "2026-04-24T21:11:43.107Z" (4/24/2026, 9:11:43 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 756
moderation_latency: (null)
generation_time: 27384
tokens_prompt: 77
tokens_completion: 1581
native_tokens_prompt: 66
native_tokens_completion: 1119
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0339
router: (null)
provider_responses[0]:
    endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
    id: "resp_0c53c52625a186f30169ebdc8f346481938caa1a6eed99db4e"
    is_byok: false
    latency: 293
    model_permaslug: "openai/gpt-5.5-20260423"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777065103-5SujcojfnLw1iYVlMuiK"
session_id: (null)
api_type: "completions"
id: "gen-1777065103-B52u4qM8S6oqEStY90j4"
upstream_id: "resp_0c53c52625a186f30169ebdc8f346481938caa1a6eed99db4e"
total_cost: 0.0339
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
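The generation metadata above is enough to estimate output throughput for this run. The sketch below divides native_tokens_completion (1119) by generation_time (27384). It assumes generation_time is reported in milliseconds, which the page does not state; the function name is hypothetical and not part of NC Bench.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: float) -> float:
    """Estimate output throughput from a completion token count and a duration.

    Assumption: generation_time_ms is in milliseconds (not confirmed by the source).
    """
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000.0)


# Values copied from the run metadata above.
rate = tokens_per_second(completion_tokens=1119, generation_time_ms=27384)
print(f"{rate:.1f} tokens/s")
```

Using tokens_completion (1581) instead of the native count would give a different figure, so it is worth stating which counter a reported rate is based on.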
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 771
0.01% | Dialogue to Total Word Ratio | Ratio: 63.67%, Deviation: 13.67% | neededClean: false; wordsTotal: 779; wordsDialogue: 496
0.0044% | Overall | |
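The reported ratio and deviation follow directly from the word counts: 496 dialogue words out of 779 total is 63.67%, which is 13.67 percentage points above the 50% target. The sketch below reproduces that arithmetic and shows one possible way to count dialogue words. The counting rule (whitespace-separated words inside curly quotes) and the function names are assumptions for illustration, not the evaluator's actual implementation, so it may not reproduce the 496 and 779 counts exactly. The two evaluators already report different totals (771 and 779), which suggests their counting rules differ.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: dialogue is any text between curly quotes, and a word is
    any whitespace-separated token.
    """
    total = len(text.split())
    spoken = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(segment.split()) for segment in spoken)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_from_target(ratio: float, target: float = 0.5) -> float:
    """Absolute distance between the observed ratio and the target ratio."""
    return abs(ratio - target)


# Reproduce the reported figures from the counts in the table above.
reported_ratio = 496 / 779
print(f"Ratio: {reported_ratio:.2%}, "
      f"Deviation: {deviation_from_target(reported_ratio):.2%}")

# Apply the hypothetical counter to a short excerpt of the response.
sample = "“Target arrives at two,” Molly said. “Cake at two-thirty.”"
print(dialogue_ratio(sample))
```

The scores in the Result column (0.00%, 0.01%, 0.0044%) are not derived here, because the page does not say how deviation maps to a score.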