NC Bench
Run: 2026-04-24-01KQ0PXRQ0K61EQW52JJ7Z66A8
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 30.9s
Cost: $0.035790
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.56% | Dialogue to Total Word Ratio
0.2797% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crayon-drawn map across the diner table and pinned down one curling corner with a saltshaker shaped like a chicken. “Behold,” she said. “The mansion.” Bill leaned over it. “That is a bouncy castle.” “It’s a party fortress, Bill. Show some respect.” “You’ve drawn a moat.” “That’s the paddling pool.” “And what’s this skull?” “That’s Grandma Enid.” Bill sat back. “We’re not getting past Grandma Enid.” “We are professionals.” “We once got locked inside a revolving door.” “Temporary setback.” “For forty minutes.” “Rotational sabotage,” Molly said. “Now focus. At exactly two o’clock, the children will be distracted by cake.” Bill shuddered. “Children hopped up on icing. I don’t like it. They move in packs.” “That’s why we strike during the song.” “Which song?” “The birthday song, Bill. The ancient chant of the distracted.” He tapped the map. “And the presents are here?” “Under the table with the dinosaur cloth.” “Why a dinosaur cloth?” “Because Alfie is six and has taste.” Bill frowned. “We’re stealing from a six-year-old named Alfie?” “We’re liberating merchandise from an unsecured gifting zone.” “That sounds worse.” “It sounds professional.” “It sounds like something you say before a judge.” Molly snatched up her coffee. “Do you want the velvet train set or not?” Bill’s eyes softened. “There’s a velvet train set?” “With brass buttons.” “Why does a train need buttons?” “It’s educational.” He stared at the map again. “What’s our cover?” “Entertainers.” “No.” “Yes.” “Last time you made me be a clown, a toddler called me ‘sad egg’ and I still hear it in traffic.” “This time you’ll be a pirate.” Bill considered. “Hat?” “Hat, sash, foam sword.” “Parrot?” “Budget cuts.” “No parrot, no pirate.” “You can have a balloon.” “A balloon is not a parrot.” “It can be, if you believe.” Bill narrowed his eyes. “What are you?” “Fairy godmother.” “Of course you are.” “I have wings.” “You have a tea towel with glitter.” “Wings,” Molly said firmly. 
A waitress passed, glanced at the map, and kept walking faster. Bill lowered his voice. “What about the parents?” “Subdued by small talk.” “Dangerous.” “Not if we ask about school catchment areas.” He nodded. “That’ll pin them for twenty minutes.” “Exactly. Then you perform your pirate jig.” “I don’t have a pirate jig.” “You have knees and shame. Improvise.” “And while I’m degrading myself?” “I slip behind the dinosaur table, gather the presents, and load them into the laundry basket.” “Why a laundry basket?” “No one questions laundry.” “At a children’s party?” “Especially at a children’s party. Children are sticky. Laundry is inevitable.” Bill chewed his straw. “What if Alfie sees us?” “We tell him we’re taking the presents to be inspected by Santa.” “It’s April.” “Santa has a fiscal year.” Bill sighed. “Molly, this is low, even for us.” She looked offended. “Low? This is precision work.” “It’s a kid’s birthday.” “It’s a kid with a velvet train set.” “He probably believes in goodness.” “Then this will be character building.” Bill folded his arms. “I’m not stealing all the presents.” Molly blinked. “You’re developing morals now?” “I’m developing limits.” “You stole a vicar’s bicycle.” “It was in a raffle.” “You didn’t buy a ticket.” “I left a muffin.” Molly drummed her fingers on the table. “Fine. We steal only the boring presents.” “What counts as boring?” “Socks. Educational puzzles. Anything from an aunt who says ‘practical.’” “And the velvet train set?” Bill’s face tightened with longing. Molly smiled. “We borrow it.” “No.” “For research.” “No.” “For one lap around the kitchen.” Bill looked down at the map, then up at her. “We go in, we entertain, we eat cake, we leave the presents alone.” “That’s not a heist.” “It’s a party.” Molly slumped. “We’ve fallen so far.” Bill patted her hand. “Cheer up. We can still steal all the balloons.” “Bill.” “What?” “That’s ridiculous.” He grinned. “Exactly. No one will see it coming.”
created_at: "2026-04-24T21:39:22.983Z" (4/24/2026, 9:39:22 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 5551
moderation_latency: (null)
generation_time: 30809
tokens_prompt: 77
tokens_completion: 1344
native_tokens_prompt: 66
native_tokens_completion: 1182
native_tokens_completion_images: 0
native_tokens_reasoning: 223
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.03579
router: (null)
provider_responses
0
endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
id: "resp_0f2adeda9ae626960169ebe30b194c8197a12cd28ec380d543"
is_byok: false
latency: 204
model_permaslug: "openai/gpt-5.5-20260423"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777066763-Ra8IcVERxJbGUwwJTGir"
session_id: (null)
api_type: "completions"
id: "gen-1777066763-77my4U75WQzEycmyDnvx"
upstream_id: "resp_0f2adeda9ae626960169ebe30b194c8197a12cd28ec380d543"
total_cost: 0.03579
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 648
0.56% | Dialogue to Total Word Ratio | Ratio: 80.18%, Deviation: 10.18% | neededClean: false, wordsTotal: 651, wordsDialogue: 522
0.2797% | Overall
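The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator can be reproduced from the wordsDialogue and wordsTotal counts. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the 70% target comes from the scenario text, and the quote-based counter is one plausible way to measure dialogue, not necessarily how NC Bench counts it. How the 0.56% score is derived from the deviation is not stated on this page, so it is not modelled here.

```python
import re

# Target taken from the scenario text ("70% dialogue").
TARGET_RATIO = 0.70


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return the share of words that are dialogue, as a fraction."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def deviation_points(ratio: float, target: float = TARGET_RATIO) -> float:
    """Return the absolute gap between ratio and target, in percentage points."""
    return abs(ratio - target) * 100


def count_dialogue_words(text: str) -> int:
    """Count whitespace-separated words inside curly double quotes.

    Assumption: dialogue is whatever sits between “ and ”. The real
    evaluator may tokenize or detect dialogue differently.
    """
    spans = re.findall(r"“([^”]*)”", text)
    return sum(len(span.split()) for span in spans)


# Counts reported in the Meta Data column for this run.
ratio = dialogue_ratio(522, 651)
print(f"Ratio: {ratio:.2%}")
print(f"Deviation: {deviation_points(ratio):.2f}%")
```

With the reported counts this prints a ratio of 80.18% and a deviation of 10.18%, matching the Details column. The overall figure of 0.2797% is consistent with an unweighted mean of the two evaluator scores (0.00% and roughly 0.56%), but that averaging rule is an inference from the numbers, not something the page states.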