NC Bench
Run: 2026-04-24-01KQ0PK246TWPDYEY3BGQGGWK5
Run Details
Model:
GPT-5.5 (Reasoning, Low)
Test:
Dialogue tags
Scenario:
Write 200 words with 10% dialogue
Duration:
18.6s
Cost:
$0.039540
Provider:
OpenAI
Evaluation summary
Score | Evaluator
20.19% | Matches word count
100.00% | Dialogue to Total Word Ratio
60.0942% | Overall
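The overall score looks consistent with an unweighted mean of the two evaluator scores. That is an inference from the displayed numbers, not something this page states, so the sketch below is only an illustration under that assumption; the function name `overall_score` is hypothetical.

```python
from statistics import fmean

def overall_score(evaluator_scores):
    """Unweighted mean of evaluator scores in percent (assumed aggregation)."""
    return fmean(evaluator_scores)

# The displayed evaluator scores are rounded to two decimals, so the
# result only approximates the 60.0942% shown for this run.
scores = [20.19, 100.00]
print(f"{overall_score(scores):.3f}%")
```

The small gap between this mean and 60.0942% would be explained by the word count score being stored with more precision than the two decimals displayed.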
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly unfolded the invitation like a map to buried treasure, except it smelled of frosting and had a dinosaur wearing a party hat on the front. Bill peered over her shoulder, crumbs from his emergency biscuit sprinkling onto the carpet. The target was young Percival Plunkett’s seventh birthday, noon, community hall, theme: pirates. Fate had gift-wrapped their opportunity in tissue paper and supervised it with balloon animals. "Molly, we stroll in as entertainers, juggle badly, then leave with loot." "Bill, you can't juggle." "Exactly. Authentic children's entertainment." They sat in the van, which was painted with the words Definitely Not Suspicious Plumbing, watching parents carry parcels inside. One box was tall enough to contain a pony or, more likely, educational disappointment. Molly adjusted her fake mustache; Bill practiced saying “ahoy” with the confidence of a man who had once been chased by a swan and called it maritime experience. Their equipment was simple: empty sacks, squeaky shoes, and a moral compass Molly had won in a Christmas cracker. It spun uselessly whenever pointed at cake. Across the street, children shrieked with joy, unaware that two criminal masterminds were about to be defeated by pass-the-parcel, sticky fingers, and one grandmother with handbag reflexes like a cobra. Bill gulped; Molly grinned, already hearing sirens in the kazoo chorus and applauding anyway loudly.
created_at
"2026-04-24T21:33:32.173Z"
4/24/2026, 9:33:32 PM
model
"openai/gpt-5.5-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
17705
moderation_latency
(null)
generation_time
18207
tokens_prompt
77
tokens_completion
472
native_tokens_prompt
66
native_tokens_completion
1307
native_tokens_completion_images
0
native_tokens_reasoning
1008
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.03954
router
(null)
provider_responses
0
endpoint_id
"58e5b336-423e-430b-a2ab-8bc353f0c51b"
id
"resp_02a3dff765ee00690169ebe1ac46888193b14394352ec67a62"
is_byok
false
latency
273
model_permaslug
"openai/gpt-5.5-20260423"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777066412-kYfcnBUFjA93ZH9Xjfiv"
session_id
(null)
api_type
"completions"
id
"gen-1777066412-NYOi0RKqwNQUzae9FSJr"
upstream_id
"resp_02a3dff765ee00690169ebe1ac46888193b14394352ec67a62"
total_cost
0.03954
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
response_cache_source_id
(null)
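The reported total_cost can be reproduced from the native token counts if billing is assumed to be 5 USD per million prompt tokens and 30 USD per million completion tokens. Those rates do not appear anywhere in this run's metadata; they are assumed values that happen to match the reported figure. The sketch also assumes that the 1008 reasoning tokens are counted inside native_tokens_completion. The function name `estimate_cost` is hypothetical.

```python
# Hypothetical rates in USD per million tokens; NOT stated in the run metadata.
ASSUMED_PROMPT_RATE = 5.0
ASSUMED_COMPLETION_RATE = 30.0

def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_rate=ASSUMED_PROMPT_RATE,
                  completion_rate=ASSUMED_COMPLETION_RATE):
    """Estimate request cost in USD from token counts and per-million rates."""
    return (prompt_tokens * prompt_rate
            + completion_tokens * completion_rate) / 1_000_000

# native_tokens_prompt = 66, native_tokens_completion = 1307
cost = estimate_cost(66, 1307)
print(f"${cost:.6f}")  # prints $0.039540
```

Under these assumptions almost all of the cost comes from completion tokens, and most of those are reasoning tokens (1008 of 1307) rather than the visible scene.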
Evaluation details
Result | Evaluator | Details | Meta Data
20.19% | Matches word count | n/a | neededClean: false, words: 220
100.00% | Dialogue to Total Word Ratio | Ratio: 9.42%, Deviation: 0.58% | neededClean: false, wordsTotal: 223, wordsDialogue: 21
60.0942% | Overall
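The ratio and deviation reported by the Dialogue to Total Word Ratio evaluator follow from wordsDialogue and wordsTotal against the 10% target in the prompt. The sketch below shows one way such numbers could be computed. It is not the evaluator's actual implementation: treating any text inside straight or curly double quotes as dialogue, and splitting words on whitespace, are both assumptions, and the names `dialogue_stats` and `count_words` are hypothetical.

```python
import re

# Matches text inside straight ("...") or curly (“...”) double quotes.
QUOTED = re.compile(r'"([^"]*)"|“([^”]*)”')

def count_words(text):
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())

def dialogue_stats(text, target_ratio=0.10):
    """Return (dialogue_words, total_words, ratio, deviation from target)."""
    total = count_words(text)
    dialogue = sum(count_words(a or b) for a, b in QUOTED.findall(text))
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio, abs(ratio - target_ratio)

# Arithmetic check against the reported metadata:
# wordsDialogue = 21, wordsTotal = 223, target = 10%.
ratio = 21 / 223
print(f"Ratio: {ratio:.2%}, Deviation: {abs(ratio - 0.10):.2%}")
# prints Ratio: 9.42%, Deviation: 0.58%
```

How the evaluator maps a 0.58% deviation to a 100.00% score is not shown on this page, so the sketch stops at the deviation.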