NC Bench
Run: 2026-04-24-01KQ0MV8SYNZEJGC39Q4KGXMBS
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 22.6s
Cost: $0.042630
Provider: OpenAI
Evaluation summary
Matches word count: 100.00%
Dialogue to Total Word Ratio: 96.46%
Overall: 98.2311%
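The page does not say how the Overall score is derived. Both reported figures fit an unweighted mean of the evaluator scores: (100.00 + 96.46) / 2 = 98.23. The reported 98.2311% is exactly the mean of 100 and 96.4622, which would display as 96.46% after rounding. A minimal sketch under that assumption (the mean rule is an inference, not documented by NC Bench):

```python
from statistics import mean

# Assumption: "Overall" is the unweighted mean of the per-evaluator scores.
evaluator_scores = {
    "Matches word count": 100.00,
    "Dialogue to Total Word Ratio": 96.46,  # as displayed, rounded to 2 decimals
}

overall = mean(evaluator_scores.values())
print(f"{overall:.2f}%")  # 98.23%, agreeing with the reported 98.2311% to two decimals
```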
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly pressed her ear to the nursery window. “Bill, report.” “Hostiles: twelve sugar-powered goblins, one clown, one cake shaped like a dinosaur, and a mountain of presents.” “Excellent. We wait until they sing.” “Why?” “Because everyone closes their eyes on the high note.” “That’s birthdays, not operas.” “Bill, confidence is ninety percent of crime.” “And the other ten?” “Snacks. Did you bring the distraction?” He lifted a balloon animal. “Behold, Gerald the Suspicious Giraffe.” “That looks like a worm with knees.” “It’s avant-garde.” “Molly, are we truly stealing from children?” “Not stealing. Temporarily relocating gifts to a surprise inspection facility.” “That sounds like my garage.” “Exactly.” “What if one contains socks?” “We leave a receipt of sympathy.” “What if one contains a puppy?” “We promote it to lookout.” “What if the clown sees us?” “We tell him we’re rival clowns.” “I don’t own clown shoes.” “You own those yellow loafers.” “Cruel but accurate.” Inside, the children cheered. “That’s the song,” Molly whispered. “Masks on.” Bill tugged down a rabbit mask. “Wait. If they open the presents, no burglary.” Molly sighed. “Fine. Plan B: steal cake.” “Finally, a victimless crime.” “And balloons,” Bill said. “They’ll float home unsupervised, right into custody.”
created_at: "2026-04-24T21:03:04.006Z" (4/24/2026, 9:03:04 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 21266
moderation_latency: (null)
generation_time: 21561
tokens_prompt: 77
tokens_completion: 892
native_tokens_prompt: 66
native_tokens_completion: 1410
native_tokens_completion_images: 0
native_tokens_reasoning: 1102
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.04263
router: (null)
provider_responses: 0
endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
id: "resp_0ed2fe1faac675b10169ebda8817948197be9951eaceed2ce5"
is_byok: false
latency: 227
model_permaslug: "openai/gpt-5.5-20260423"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777064584-jQqRYUnV1IXMmnwajuWo"
session_id: (null)
api_type: "completions"
id: "gen-1777064584-Yrm34TjdntbHzesVuiDb"
upstream_id: "resp_0ed2fe1faac675b10169ebda8817948197be9951eaceed2ce5"
total_cost: 0.04263
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
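The token fields in the generation metadata make it possible to check how much of the completion went into reasoning rather than into the scene itself. The sketch below assumes two things not stated on the page: that native_tokens_completion includes the reasoning tokens (as in OpenAI's usage accounting), and that generation_time is in milliseconds (consistent with the 22.6s duration).

```python
# Values copied from the generation metadata on this page.
native_tokens_completion = 1410
native_tokens_reasoning = 1102
generation_time_ms = 21561  # assumption: reported in milliseconds

# Assumption: reasoning tokens are counted inside native_tokens_completion,
# so the remainder is the visible scene text.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion
# Assumption: generation_time covers every completion token, reasoning included.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(visible_tokens)               # 308
print(f"{reasoning_share:.1%}")     # 78.2%
print(f"{tokens_per_second:.1f}")   # 65.4
```

Under these assumptions, roughly four out of five completion tokens were spent on reasoning before the 200-word scene was written.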
Evaluation details
Matches word count: 100.00%
Details: n/a
Meta data: neededClean: false, words: 200
Dialogue to Total Word Ratio: 96.46%
Details: Ratio: 85.64%, Deviation: 4.36%
Meta data: neededClean: false, wordsTotal: 202, wordsDialogue: 173
Overall: 98.2311%
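The page reports the dialogue counts (wordsTotal 202, wordsDialogue 173) but not how they were obtained. Below is a hypothetical counter: the helper name dialogue_ratio and its rule (words are whitespace-separated tokens, dialogue is every word inside curly double quotes) are assumptions and not NC Bench's own method, so it may not reproduce 202 and 173 exactly on the full response. The second half recomputes the ratio and the deviation from the 90% target using the reported counts. How the 4.36% deviation maps to the 96.46% score is not stated on this page.

```python
import re

# Hypothetical rule: dialogue is any text between curly double quotes.
QUOTED = re.compile(r"“([^”]*)”")

def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (total_words, dialogue_words, ratio) using whitespace-split words."""
    total = len(text.split())
    dialogue = sum(len(quote.split()) for quote in QUOTED.findall(text))
    return total, dialogue, dialogue / total if total else 0.0

sample = "Molly pressed her ear to the nursery window. “Bill, report.”"
print(dialogue_ratio(sample))  # (10, 2, 0.2)

# Reported counts from the Dialogue to Total Word Ratio metadata.
words_total, words_dialogue, target = 202, 173, 0.90
ratio = words_dialogue / words_total
deviation = target - ratio
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")  # Ratio: 85.64%, Deviation: 4.36%
```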