NC Bench
Run: 2026-05-31-01KSYXHY7B8648FKD3JTAW68HT
Run Details
Model: Claude Opus 4.8 (Reasoning, High)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 21.4s
Cost: $0.027990
Provider: Anthropic
Evaluation summary
Score | Evaluator
99.98% | Matches word count
0.00% | Dialogue to Total Word Ratio
49.9920% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The blueprint was, in fact, a child's drawing of a house with a smiling sun in the corner. Molly smoothed it flat against the dashboard and tapped it with authority. "This," she announced, "is the Henderson residence." Bill squinted. "That's a crayon drawing. There's a dog the size of the chimney." "Reconnaissance from our inside man." "Your nephew is six." "He's embedded deep within enemy territory." Molly rolled up the masterpiece and stuffed it into the glove box, where it joined seventeen parking tickets and a half-eaten granola bar she chose not to acknowledge. "The target is a Saturday afternoon birthday party. Theme: pirates. Twelve children, two exhausted parents, and a mountain of unopened presents stacked by the patio doors." Bill pulled a balaclava from his jacket and yanked it over his head. Molly stared at him. "What are you doing?" "Disguise." "It's ninety degrees and there will be toddlers. You look like you're about to rob a bank in Antarctica." She tugged it off him with a wet pop. "We blend in. We're guests." "Whose guests? We don't know these people." "Nobody knows anybody at a kid's party. Half of them are just neighbors who came for the cake." Molly handed him a wrapped box. It was suspiciously light. "Here. Your cover gift." Bill shook it gently. Something rattled. "What's inside?" "Emptiness. Like the soul of capitalism." She checked her watch. "We arrive, we mingle, we compliment someone's child's terrible juggling, and when the piñata comes out, every adult in that backyard turns into a security failure. That's our window." "The piñata is the distraction?" "The piñata is *everything*. Sugar-blinded children swarming a papier-mâché donkey. Total chaos. We load the presents into the minivan and vanish before anyone realizes the donkey gave its life in vain." They sat in silence for a moment, contemplating the operation's elegance. "One question," Bill said. "Shoot." "Why are we stealing presents from children?" 
Molly turned to him slowly, as though he'd asked why water was wet. "Because, Bill, the resale market for slightly-used building blocks is *criminally* underexploited. There's a man in Tucson who pays cash for action figures. No questions. No backstory. Just love and currency." "That feels morally complicated." "Everything good feels morally complicated. That's how you know it's working." She popped the car door open and stepped into the sunshine, adjusting an enormous sunhat that screamed *harmless suburban aunt.* "Now bring the gift and try to look like you enjoy children." Bill emerged clutching his hollow box, his face arranged into something between a smile and a hostage's plea. From the backyard came the sound of screaming laughter and a parent shouting about juice boxes. "And if someone asks whose side we're on?" he whispered. "Birthday boy's or the cousins'?" Molly straightened his collar with maternal precision and gave him a confident pat on the cheek. "We're on nobody's side, Bill. We're professionals." She grinned. "Now follow me, and whatever you do—do not eat the cake before we've stolen the loot."
created_at: "2026-05-31T11:43:52.055Z" (5/31/2026, 11:43:52 AM)
model: "anthropic/claude-4.8-opus-20260528"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1203
moderation_latency: (null)
generation_time: 21339
tokens_prompt: 77
tokens_completion: 825
native_tokens_prompt: 93
native_tokens_completion: 1101
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
service_tier: "default"
usage: 0.02799
router: (null)
provider_responses:
  0:
    endpoint_id: "b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
    id: "msg_01MVJzcuCggD4cKRBATpjmPq"
    is_byok: false
    latency: 1201
    model_permaslug: "anthropic/claude-4.8-opus-20260528"
    provider_name: "Anthropic"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer: (null)
request_id: "req-1780227832-pGFyGbhK3Yx7ghrafHgK"
session_id: (null)
preset_id: (null)
api_type: "completions"
id: "gen-1780227832-YF6UumFq5fkpwCB4M8pD"
upstream_id: "msg_01MVJzcuCggD4cKRBATpjmPq"
total_cost: 0.02799
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
response_cache_source_id: (null)
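The total_cost reported above can be cross-checked against the native token counts. The sketch below is only a sanity check, not the billing logic of any provider: the per-million-token prices (5 USD for input, 25 USD for output) are assumptions that do not appear anywhere on this page, chosen because they happen to reproduce the reported total. The function name is hypothetical.

```python
# Token counts copied from the run metadata above.
NATIVE_TOKENS_PROMPT = 93
NATIVE_TOKENS_COMPLETION = 1101

# ASSUMPTION: prices in USD per million tokens. These are NOT stated on the
# run page; they are illustrative values that reproduce the reported total.
ASSUMED_INPUT_USD_PER_MTOK = 5
ASSUMED_OUTPUT_USD_PER_MTOK = 25


def cost_in_microdollars(prompt_tokens: int, completion_tokens: int) -> int:
    """Return the cost in millionths of a dollar.

    tokens * (USD per million tokens) is numerically the cost in micro-USD,
    so integer arithmetic avoids floating-point rounding noise.
    """
    return (prompt_tokens * ASSUMED_INPUT_USD_PER_MTOK
            + completion_tokens * ASSUMED_OUTPUT_USD_PER_MTOK)


micro_usd = cost_in_microdollars(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)
print(f"${micro_usd / 1_000_000:.6f}")  # prints $0.027990
```

Under these assumed prices the input side contributes 465 micro-dollars and the output side 27525, so almost all of the cost comes from completion tokens.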
Evaluation details
Result | Evaluator | Details | Meta Data
99.98% | Matches word count | n/a | neededClean: false; words: 502
0.00% | Dialogue to Total Word Ratio | Ratio: 58.70%, Deviation: 28.70% | neededClean: false; wordsTotal: 506; wordsDialogue: 297
49.9920% | Overall | |
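The Ratio and Deviation figures in the evaluation details follow directly from the reported word counts and the 30% target in the scenario. The sketch below reproduces that arithmetic. It is not the evaluator's actual code: the function name is hypothetical, and how a 28.70% deviation is mapped to a 0.00% score is not stated on the page, so that step is left out. The Overall figure is consistent with an unweighted mean of the two scores only if the word count score is 99.984% before rounding, which is an assumption.

```python
TARGET_DIALOGUE_PERCENT = 30.0  # from the scenario: "30% dialogue"


def dialogue_ratio_percent(words_dialogue: int, words_total: int) -> float:
    """Share of dialogue words in the whole response, as a percentage."""
    return 100.0 * words_dialogue / words_total


# Word counts copied from the evaluator metadata.
ratio = dialogue_ratio_percent(words_dialogue=297, words_total=506)
deviation = abs(ratio - TARGET_DIALOGUE_PERCENT)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints Ratio: 58.70%, Deviation: 28.70%

# ASSUMPTION: the overall score is the plain mean of the evaluator scores,
# and 99.984 is the unrounded word count score (displayed as 99.98%).
overall = (99.984 + 0.0) / 2
print(f"Overall: {overall:.4f}%")  # prints Overall: 49.9920%
```

Read this way, the response contains roughly twice the requested share of dialogue, which is why the ratio evaluator scores it at zero even though the word count is nearly exact.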