NC Bench
Run: 2026-04-24-01KQ0PP8WGPDMSDG3GZ10TA1ST
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 25.5s
Cost: $0.051720
Provider: OpenAI
Evaluation summary
Score | Evaluator | Details Link
100.00% | Matches word count | (details)
100.00% | Dialogue to Total Word Ratio | (details)
99.9997% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge, wearing a sequined burglar mask and holding a clipboard shaped like a cupcake. Across the lawn, balloons bobbed over a garden full of sugared chaos. Somewhere, frosting cannons began their terrible work. “Remember,” she whispered, “we are not monsters. We take only the presents marked ‘educational,’ because no child wants those.” Bill adjusted his fake magician’s cape. “I still think the clown entrance is risky. Clowns know things.” “The clown is my cousin.” “Exactly. He once made a priest cry with a balloon dachshund.” Molly peered through binoculars. The birthday boy was attacking a piñata with legal enthusiasm. Parents cheered, unaware of the alleged masterminds by the hydrangeas. “We glide in during cake,” Molly said. “You distract them.” “With magic?” “With juggling.” “I can juggle one orange.” “That is called holding.” Bill sighed. “Fine. What if the children ask questions?” “Say you’re from the Bureau of Present Inspection.” “There is no bureau.” “There will be, if we print badges.” A terrier spotted them and began yapping. Molly froze. Bill bowed to it. “Abort?” he asked. Molly watched a grandmother unwrap a toy drum set and smiled. “No,” she said. “We save the parents first. Now.”
created_at: "2026-04-24T21:35:17.401Z" (4/24/2026, 9:35:17 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 23259
moderation_latency: (null)
generation_time: 25169
tokens_prompt: 77
tokens_completion: 355
native_tokens_prompt: 66
native_tokens_completion: 1713
native_tokens_completion_images: 0
native_tokens_reasoning: 1427
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.05172
router: (null)
provider_responses:
  0:
    endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
    id: "resp_00645a2bf679da020169ebe2157dcc819797bcca21f03e78a1"
    is_byok: false
    latency: 333
    model_permaslug: "openai/gpt-5.5-20260423"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777066517-5qAe6NffHgBQ3jqy5W0k"
session_id: (null)
api_type: "completions"
id: "gen-1777066517-KshROkufK9oWJ0lGN7l8"
upstream_id: "resp_00645a2bf679da020169ebe2157dcc819797bcca21f03e78a1"
total_cost: 0.05172
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
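The token counts in the metadata can be broken down with a little arithmetic. The sketch below is illustrative only and rests on two assumptions not stated on this page: that native_tokens_completion includes the reasoning tokens, and that generation_time is in milliseconds (which is consistent with the 25.5s duration reported above). The function name token_breakdown is hypothetical.

```python
# Values copied from the run metadata above.
meta = {
    "native_tokens_completion": 1713,
    "native_tokens_reasoning": 1427,
    "generation_time": 25169,  # assumed to be milliseconds
}


def token_breakdown(meta: dict) -> dict:
    """Split completion tokens into reasoning and visible output.

    Assumes the completion count already includes reasoning tokens.
    """
    completion = meta["native_tokens_completion"]
    reasoning = meta["native_tokens_reasoning"]
    seconds = meta["generation_time"] / 1000.0
    return {
        "visible_tokens": completion - reasoning,
        "reasoning_share": reasoning / completion,
        "tokens_per_second": completion / seconds,
    }


breakdown = token_breakdown(meta)
print(breakdown["visible_tokens"])  # 286 under the assumption above
```

Under these assumptions, most of the completion budget for this run went to reasoning rather than to the 200-word scene itself.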
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false; words: 200
100.00% | Dialogue to Total Word Ratio | Ratio: 49.50%, Deviation: 0.50% | neededClean: false; wordsTotal: 200; wordsDialogue: 99
99.9997% | Overall | |
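The ratio and deviation reported above follow directly from the word counts: 99 dialogue words out of 200 total gives 49.50%, which is 0.50 percentage points from the 50% target. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It is not the evaluator's actual implementation: treating text inside curly double quotes as dialogue, splitting words on whitespace, and the names dialogue_ratio and deviation_from_target are all assumptions made for illustration.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a passage.

    Assumption: dialogue is whatever sits between curly double quotes,
    and words are whitespace-separated tokens.
    """
    total_words = len(text.split())
    spoken = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(s.split()) for s in spoken)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio: float, target: float = 0.5) -> float:
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


# Reproduce the reported figures from the counts in the table above.
reported_ratio = 99 / 200
reported_deviation = deviation_from_target(reported_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {reported_deviation:.2%}")
```

A different tokenizer or quote-handling rule could shift the counts by a word or two, so this sketch may not reproduce the evaluator's 99/200 split exactly when run on the full response.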