NC Bench
Run: 2026-04-24-01KQ0NTTJQSEEJ0B787F5BHEYH
Run Details
Model:
GPT-5.5 (Reasoning, Low)
Test:
Dialogue tags
Scenario:
Write 200 words with 10% dialogue
Duration:
19.5s
Cost:
$0.039810
Provider:
OpenAI
Evaluation summary
Matches word count:
100.00%
Dialogue to Total Word Ratio:
100.00%
Overall:
100.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge opposite the bouncy castle, wearing a penguin costume two sizes too small and the expression of a tax auditor at a cupcake buffet. Bill unfolded their map, which was actually a napkin smeared with frosting and one alarming drawing of a dragon. "Remember, we enter as magicians, distract toddlers, grab gifts, vanish before cake." Molly adjusted her flippers. "You're Professor Pudding; I'm the emotional support penguin." Across the lawn, children shrieked around a piñata shaped like a unicorn with legal representation. The present table glittered beneath balloons, a mountain of ribbons, dinosaurs, dolls, and one suspiciously adult-sized box humming softly. Bill tapped the napkin. Phase One: produce rabbit. Phase Two: rabbit refuses. Phase Three: improvisation. Molly had prepared smoke pellets, lock picks, and apologetic stickers reading SORRY ABOUT THE FELONY. Bill had brought a wand, three raisins, and confidence, which Molly considered the most dangerous tool. A grandmother glanced their way. Molly froze, then waddled magnificently, waving like royalty. The grandmother applauded. Bill whispered that the plan was working. Then the adult-sized box sneezed. From inside came a small voice demanding union rates, hazard pay, and first choice of any present containing batteries today immediately.
created_at
"2026-04-24T21:20:18.016Z"
4/24/2026, 9:20:18 PM
model
"openai/gpt-5.5-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
17141
moderation_latency
(null)
generation_time
19315
tokens_prompt
77
tokens_completion
350
native_tokens_prompt
66
native_tokens_completion
1316
native_tokens_completion_images
0
native_tokens_reasoning
1034
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.03981
router
(null)
provider_responses
0
endpoint_id
"58e5b336-423e-430b-a2ab-8bc353f0c51b"
id
"resp_08f7bef11b01b6970169ebde922370819587262937579b7c93"
is_byok
false
latency
356
model_permaslug
"openai/gpt-5.5-20260423"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777065618-vp68EljRQSgq3Mp9eR4b"
session_id
(null)
api_type
"completions"
id
"gen-1777065618-kJuICy4jPqrFVfAssOd4"
upstream_id
"resp_08f7bef11b01b6970169ebde922370819587262937579b7c93"
total_cost
0.03981
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
response_cache_source_id
(null)
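The token and timing fields above can be combined into a few derived figures. The sketch below is illustrative only and rests on assumptions the page does not state: that native_tokens_reasoning is counted inside native_tokens_completion, and that generation_time is in milliseconds (it is close to the 19.5s Duration shown in Run Details). The variable names are hypothetical.

```python
# Values copied from the metadata above.
native_tokens_completion = 1316
native_tokens_reasoning = 1034
generation_time_ms = 19315  # assumption: milliseconds

# Assumption: reasoning tokens are a subset of completion tokens,
# so the remainder is the visible (non-reasoning) output.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Rough throughput over the whole generation, reasoning included.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"visible tokens: {visible_tokens}")
print(f"reasoning share: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions, roughly four fifths of the billed completion tokens were spent on reasoning rather than on the 200-word scene itself.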
Evaluation details
Evaluator:
Matches word count
Result:
100.00%
Details:
n/a
Meta Data:
neededClean: false
words: 200
Evaluator:
Dialogue to Total Word Ratio
Result:
100.00%
Details:
Ratio: 9.90%, Deviation: 0.10%
Meta Data:
neededClean: false
wordsTotal: 202
wordsDialogue: 20
Overall:
100.0000%
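The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator follow from wordsDialogue and wordsTotal. The sketch below reproduces that arithmetic and shows one possible way to count dialogue words. It is not the benchmark's actual evaluator: the function names, whitespace tokenization, straight-quote matching, and the 10% default target (taken from the scenario) are all assumptions made for illustration.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: words are whitespace-separated and dialogue is
    anything between straight double quotes.
    """
    total = len(text.split())
    quoted = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(q.split()) for q in quoted)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation(ratio: float, target: float = 0.10) -> float:
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


# Reproduce the reported figures from wordsDialogue=20, wordsTotal=202.
reported_ratio = 20 / 202
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation(reported_ratio):.2%}")
# prints "Ratio: 9.90%, Deviation: 0.10%"
```

A different tokenizer (for example one that splits hyphenated words) would give a different total, which may be why the two evaluators report 200 and 202 words for the same response.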