NC Bench
Run: 2026-04-24-01KQ0NJEB0V0P7AKJGHFFS8XA9
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 24.3s
Cost: $0.032130
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.03% | Matches word count
93.66% | Dialogue to Total Word Ratio
46.8456% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly unfolded the crayon-scribbled map across the dashboard of the van. A smear of jam marked the community hall, and Bill, wearing a party hat he claimed was “reconnaissance,” squinted at it like a general before battle. “Right,” Molly said. “We go in during the cake song.” “You mean ‘Happy Birthday’?” “I don’t use brand names on jobs.” “It’s not a brand name, Molly. It’s a song.” “It’s a trap, Bill. Everyone’s distracted, everyone’s emotional, and Aunties are filming vertically. Perfect window.” Bill nodded gravely. “And the presents?” “Stacked by the inflatable unicorn, according to our source.” “Our source is a six-year-old with frosting in his ear.” “Reliable. Greedy, but reliable.” Bill adjusted his party hat. “I still think this is low, even for us.” “Low? Bill, last week you stole a wishing well.” “It was unattended.” “It was bolted to a courtyard.” “It wished to be free.” Molly tapped the map. “Focus. We enter as entertainers.” “What kind of entertainers?” “Magicians.” Bill looked down at his costume: black cape, fake mustache, trainers with squeaky soles. “Can you do magic?” “I can make rent disappear.” “That’s not a trick, that’s bookkeeping.” “You’ll be my assistant.” “I don’t fit in a box.” “You don’t have to fit in a box. You have to look mysterious and say ‘ta-da’ whenever something goes wrong.” “So constantly.” “Exactly.” Through the windscreen, children shrieked inside the hall. A balloon bounced against the window, followed by a small face pressing its nose to the glass. Bill waved. The child waved back, then vanished. Bill swallowed. “They look… cheerful.” “That’s because they haven’t seen the gift table vanish yet.” “Do we have to take all the presents?” “That’s the job.” “Who hired us again?” Molly checked her notebook. “Anonymous client. Paid in advance. Left a note that said, ‘Ruin Tyler’s party.’” Bill frowned. 
“Molly.” “What?” “Tyler is six.” “Age is no defense against having too many presents.” “What if the anonymous client is another child?” “Then we’re encouraging enterprise.” “What if it’s Tyler’s jealous cousin?” “Then we’re supporting family drama.” “What if it’s Tyler’s mum trying to avoid assembling noisy toys?” Molly paused. “That one I respect.” Bill leaned closer to the window. “There’s a tiny pirate in there sharing crisps with a dinosaur.” “Sentimental observation logged and ignored.” “The dinosaur just gave the pirate the bigger crisp.” “Bill.” “I’m saying perhaps we don’t rob children.” “We’re thieves.” “We could rob adults.” “Adults lock things.” “Children cry, Molly.” Molly looked at the hall. A chorus of small voices began the birthday song, ragged and joyful. Bill whispered, “That’s our window.” Molly didn’t move. “Molly?” “I’m thinking.” “That’s new.” “I’m thinking maybe we go in, do three terrible magic tricks, accidentally spill juice on the anonymous client if we identify them, and leave the presents alone.” Bill brightened. “A moral pivot?” “A temporary reputational experiment.” “And the fee?” “We keep it.” “Is that ethical?” “We’re not saints, Bill.” He grinned and opened the van door. “Ta-da?” Molly grabbed a bouquet of balloon animals from the back seat. One looked like a sausage having a crisis. “Ta-da,” she said. “And if anyone asks, we’re here to steal the show.”
created_at: "2026-04-24T21:15:43.335Z" (4/24/2026, 9:15:43 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 7308
moderation_latency: (null)
generation_time: 24259
tokens_prompt: 77
tokens_completion: 1065
native_tokens_prompt: 66
native_tokens_completion: 1060
native_tokens_completion_images: 0
native_tokens_reasoning: 286
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.03213
router: (null)
provider_responses:
  0:
    endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
    id: "resp_0abbedc492af00df0169ebdd7f6dc48197b3d5c8853be15dc8"
    is_byok: false
    latency: 272
    model_permaslug: "openai/gpt-5.5-20260423"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777065343-36PBZA1TBtXw3GCU2i1g"
session_id: (null)
api_type: "completions"
id: "gen-1777065343-LBkN1VHqpZddc2raeAzJ"
upstream_id: "resp_0abbedc492af00df0169ebdd7f6dc48197b3d5c8853be15dc8"
total_cost: 0.03213
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.03% | Matches word count | n/a | neededClean: false; words: 530
93.66% | Dialogue to Total Word Ratio | Ratio: 68.86%, Deviation: 1.14% | neededClean: false; wordsTotal: 533; wordsDialogue: 367
46.8456% | Overall | |
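
The reported ratio and deviation can be checked from the word counts above. The following is a minimal sketch, not the benchmark's own evaluator code: the function names are hypothetical, the 70% target is taken from the scenario, and treating the overall score as the plain mean of the two evaluator scores is an assumption that happens to agree with the displayed figures to within rounding. How the 93.66% score is derived from the 1.14% deviation is not stated on this page, so it is not recomputed here.

```python
# Hypothetical helpers for sanity-checking the figures on this page.
# They are NOT taken from NC Bench; names and formulas are assumptions.

def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of dialogue words, as a percentage of all words."""
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 70.0) -> float:
    """Absolute distance from the requested dialogue share (70% in this scenario)."""
    return abs(target_pct - ratio_pct)


def mean_score(scores: list[float]) -> float:
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    return sum(scores) / len(scores)


# Values copied from the evaluation details: wordsDialogue = 367, wordsTotal = 533.
ratio = dialogue_ratio(367, 533)
deviation = deviation_from_target(ratio)

# Displayed (rounded) evaluator scores: 0.03% and 93.66%.
overall = mean_score([0.03, 93.66])

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
print(f"Mean of displayed scores: {overall:.3f}%")
```

The first line of output reproduces the "Ratio: 68.86%, Deviation: 1.14%" detail. The mean of the displayed scores is 46.845%, slightly below the reported 46.8456%, which would be consistent with the page averaging unrounded evaluator scores.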