NC Bench
Run: 2026-04-24-01KQ0MVHPJRNEN2SAD36RNF5YB
Run Details
Model: GPT-5.5
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 9.5s
Cost: $0.009480
Provider: OpenAI
Evaluation summary
Score      Evaluator
98.71%     Matches word count
36.79%     Dialogue to Total Word Ratio
67.7502%   Overall
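The overall figure looks like the unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so treat this as an assumption; the function name `overall_score` is hypothetical. A minimal check from the rounded values shown above:

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    if not scores:
        raise ValueError("at least one evaluator score is required")
    return sum(scores) / len(scores)

# Rounded percentages copied from the evaluation summary.
rounded_scores = [98.71, 36.79]
overall = overall_score(rounded_scores)
print(f"{overall:.2f}%")
```

The result agrees with the reported 67.7502% to two decimal places; the remaining digits presumably come from the unrounded evaluator scores, which this page does not show.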
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Bill, stop drawing the floor plan in frosting.” “It’s not frosting, Molly. It’s tactical icing.” “It’s a napkin from the bakery.” “Exactly. Perfect camouflage.” “We are not calling it Operation Cupcake Thunder.” “Fine. Operation Responsible Redistribution of Wrapped Goods.” “Bill.” “What?” “It’s a children’s party.” “So?” “So we need subtlety. Balloons, face paint, maybe a fake pony.” “I can be the pony.” “You cannot be the pony. You panic near apples.” “That was one time, and the apple looked judgmental.” “Listen. We enter as entertainers.” “I’ll juggle.” “You dropped three oranges into a soup tureen yesterday.” “The soup applauded.” “Molly, what about presents?” “We locate the gift table, create a distraction, and roll everything out in the magician’s trunk.” “Do we have a magician’s trunk?” “No, we have your laundry hamper.” “It has wheels.” “It smells like betrayal.” “Children won’t notice.” “Children notice everything. One of them once asked if my eyebrows were cousins.” “Harsh crowd.” “Exactly. We need charm.” “I’ll make balloon animals.” “You only know balloon snakes.” “Snakes are animals.” “Make them festive snakes.” “Can they wear tiny hats?” “Perfect. While they admire hats, I grab the parcels.” “And if the parents intervene?” “We smile.” “That’s your whole plan?” “No, Bill. Then we run.”
created_at: "2026-04-24T21:03:13.128Z" (4/24/2026, 9:03:13 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1091
moderation_latency: (null)
generation_time: 9321
tokens_prompt: 77
tokens_completion: 460
native_tokens_prompt: 66
native_tokens_completion: 305
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00948
router: (null)
provider_responses
0
endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
id: "resp_0c052d58c1df38c70169ebda913ee48193b692f8524003dc02"
is_byok: false
latency: 392
model_permaslug: "openai/gpt-5.5-20260423"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777064593-1lp9M8usv2oj0SLWi25X"
session_id: (null)
api_type: "completions"
id: "gen-1777064593-f6gEi2xN6QLR0Aa1Zby3"
upstream_id: "resp_0c052d58c1df38c70169ebda913ee48193b692f8524003dc02"
total_cost: 0.00948
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
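The metadata is enough for a rough throughput estimate. The sketch below assumes `generation_time` is in milliseconds (consistent with the 9.5s duration reported for the run) and uses `native_tokens_completion` rather than `tokens_completion`, since the two counts differ and the page does not say which one is billed. The helper name `tokens_per_second` is hypothetical.

```python
def tokens_per_second(completion_tokens, generation_time_ms):
    """Completion tokens generated per second of generation time."""
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000.0)

# Values copied from the metadata: native_tokens_completion and generation_time.
rate = tokens_per_second(305, 9321)
print(f"{rate:.1f} tokens/s")
```

The figure includes whatever time the provider spent before the first token, so it is a lower bound on steady-state decoding speed, not a precise measurement.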
Evaluation details
Result: 98.71%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false; words: 206

Result: 36.79%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 100.00%, Deviation: 10.00%
Meta Data: neededClean: false; wordsTotal: 206; wordsDialogue: 206

Overall: 67.7502%
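The dialogue evaluator reports a ratio of 100% against the 90% target from the scenario, giving a 10% deviation. A minimal sketch of how such a ratio could be measured follows. The evaluator's real tokenization and quote handling are not shown on this page, so whitespace splitting and the curly-quote pattern are assumptions, and `dialogue_ratio` is a hypothetical name.

```python
import re

def dialogue_ratio(text):
    """Return (dialogue_words, total_words, ratio) for a prose scene.

    Assumes dialogue is wrapped in curly double quotes and that words
    are whitespace-separated tokens.
    """
    total = len(text.split())
    spoken = re.findall(r"“[^”]*”", text)
    dialogue = sum(len(segment.split()) for segment in spoken)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio

TARGET_RATIO = 0.90  # from the scenario: 90% dialogue

# An all-dialogue excerpt, like the response in this run.
all_dialogue = "“Bill.” “What?” “It’s a children’s party.” “So?”"
d_words, t_words, ratio = dialogue_ratio(all_dialogue)
deviation = abs(ratio - TARGET_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# A line with narration lowers the ratio.
mixed = "“We smile,” Molly said."
print(dialogue_ratio(mixed))
```

A response with no narration at all overshoots the target, which is why this run loses points despite every word being dialogue. The reported 36.79% is numerically close to e^-1 (about 0.3679), which would fit an exponential penalty on the deviation, but the page does not state the scoring curve, so it is not modelled here.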