NC Bench
Run: 2026-04-24-01KQ0NVEMA4B4P687KKA5S7B5R
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 56.7s
Cost: $0.118470
Provider: OpenAI
Evaluation summary
Score | Evaluator
100.00% | Matches word count
99.93% | Dialogue to Total Word Ratio
99.9651% | Overall
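The Overall figure appears consistent with an unweighted mean of the two evaluator scores: (100.00 + 99.93) / 2 ≈ 99.965. The displayed 99.9651% suggests the ratio score was about 99.9302% before it was rounded to 99.93%. Below is a minimal sketch under that assumption. The averaging rule is inferred from the numbers and is not documented on this page, and 99.9302 is a back-solved value, not a reported one.

```python
from statistics import fmean

# Assumption: Overall is the unweighted mean of the per-evaluator scores.
# 99.9302 is back-solved from the displayed Overall (99.9651%); the page shows 99.93%.
scores = {
    "Matches word count": 100.0,
    "Dialogue to Total Word Ratio": 99.9302,
}

overall = fmean(scores.values())
print(f"{overall:.4f}%")  # prints: 99.9651%
```

If NC Bench weights its evaluators differently, the arithmetic mean would have to be replaced by a weighted one. With only two scores this close to 100%, the page alone cannot distinguish between the two rules.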
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly whispered, “Bill, the birthday fortress awaits.” “Please stop calling Mrs. Puddleton’s lawn a fortress.” “It has bunting. Bunting is medieval.” “It has toddlers.” “Same thing, with juice boxes.” “Remind me why we’re stealing presents from children?” “Because rich Uncle Nigel wrapped something diamond-shaped, and because you lost our rent betting on a ferret named Mortgage.” “He looked swift.” “He looked asleep.” “Fine. What’s our brilliant plan?” “We stroll in as clowns.” “You hate clowns.” “I hate competition.” “I’m dressed as a pirate.” “Pirates are sea clowns.” “The dog is staring.” “Offer him your wooden leg.” “It’s my real leg.” “Then negotiate.” A balloon popped. “Fearless, aren’t you?” “I’m allergic to surprises.” A child opened the gate. “Yes, darling. We are the emergency laughter.” “And definitely not moral complexity.” The child handed Bill a cone hat. “Great. We need judges for musical chairs.” “Bill, new objective.” “The presents?” “No. Victory.” “Against children?” “Especially children.” “Then cake?” “Then cake. No presents.” “Can I keep the hat?” “Only if you win.” “Molly, if I beat a seven-year-old, is that wrong?” “Bill, it’s character development.” “For whom?” “Whichever one cries less.” “Also, share the cake with the dog first.” “Save me cake, pirate.”
created_at: "2026-04-24T21:20:38.546Z" (4/24/2026, 9:20:38 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 54041
moderation_latency: (null)
generation_time: 56361
tokens_prompt: 77
tokens_completion: 419
native_tokens_prompt: 66
native_tokens_completion: 3938
native_tokens_completion_images: 0
native_tokens_reasoning: 3624
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.11847
router: (null)
provider_responses:
  0:
    endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
    id: "resp_0d970b46673c089b0169ebdea6a8608196b9aea8e5c6525e7f"
    is_byok: false
    latency: 268
    model_permaslug: "openai/gpt-5.5-20260423"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777065638-tuhchk48efcVXbOwsm0w"
session_id: (null)
api_type: "completions"
id: "gen-1777065638-aDTrlvDmxTVh8v7hOJj0"
upstream_id: "resp_0d970b46673c089b0169ebdea6a8608196b9aea8e5c6525e7f"
total_cost: 0.11847
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
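Most of the completion budget in this run went to hidden reasoning. The sketch below assumes that native_tokens_completion already includes native_tokens_reasoning, as OpenAI's own usage accounting counts reasoning tokens within output tokens. Under that assumption, the visible scene accounts for only the remainder. The separately reported tokens_completion (419) is not used here.

```python
# Values copied from the generation metadata above.
native_completion = 3938
native_reasoning = 3624

# Assumption: native_tokens_completion includes the reasoning tokens,
# so the visible answer is whatever is left over.
visible = native_completion - native_reasoning
reasoning_share = native_reasoning / native_completion

print(visible)                   # prints: 314
print(f"{reasoning_share:.1%}")  # prints: 92.0%
```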
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false; words: 200
99.93% | Dialogue to Total Word Ratio | Ratio: 91.63%, Deviation: 1.63% | neededClean: false; wordsTotal: 203; wordsDialogue: 186
99.9651% | Overall
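The ratio evaluator's figures can be checked by hand. 186 / 203 ≈ 0.9163, which is 1.63 percentage points above the 90% target. The sketch below reproduces that arithmetic. It also adds a hypothetical counter that treats every whitespace-separated token as a word and everything between curly double quotes (“ ”) as dialogue. NC Bench's actual counting rules are not shown on this page. The two evaluators already disagree on the total (200 vs. 203 words), so a different tokenizer would produce different counts.

```python
import re

# Hypothetical: dialogue is any text between curly double quotes.
QUOTE_SPAN = re.compile(r"“([^”]*)”")


def word_count(text: str) -> int:
    # Hypothetical rule: each whitespace-separated token counts as one word.
    return len(text.split())


def dialogue_ratio(text: str, target: float = 0.90) -> tuple[float, float]:
    """Return (ratio, deviation), both as fractions of 1, for the given text."""
    total = word_count(text)
    dialogue = sum(word_count(span) for span in QUOTE_SPAN.findall(text))
    ratio = dialogue / total if total else 0.0
    return ratio, abs(ratio - target)


# Reproduce the reported figures from the evaluator's own counts.
ratio = 186 / 203
print(f"Ratio: {ratio:.2%}, Deviation: {abs(ratio - 0.90):.2%}")
# prints: Ratio: 91.63%, Deviation: 1.63%

# Apply the hypothetical counter to the opening of the model response.
sample = 'Molly whispered, “Bill, the birthday fortress awaits.” “Please stop.”'
print(dialogue_ratio(sample))
```

On the sample, 7 of the 9 tokens fall inside quotes. The narration "Molly whispered," is the only non-dialogue text.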