NC Bench
Run: 2026-06-01-01KT1HGPSEB07NQ6M811V1TFNN
Run Details
Model: Claude Opus 4.8 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 20.6s
Cost: $0.029440
Provider: Anthropic
Evaluation summary
Matches word count: 97.63%
Dialogue to Total Word Ratio: 2.17%
Overall: 49.8990%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly slammed the blueprints onto the coffee table, knocking over a half-eaten bowl of cereal. "There," she said. "That's our target. Tyler Henderson's seventh birthday party." Bill leaned forward, squinting. "These are crayon drawings." "They're reconnaissance, Bill. I had my nephew sketch the layout." "Your nephew drew a dinosaur eating the sun." "Focus on the bouncy castle. That's our entry point." Bill rubbed his temples. "Molly, we used to rob banks. Armored trucks. We once stole a yacht." "And now we're stealing presents from a child. Try to keep up." She tapped the paper. "Twelve to fifteen gifts, minimum. Probably more. Kids these days are spoiled rotten." "This feels like a low point in our careers." "It's a pivot, Bill. The market's changed. Banks have lasers now. You know what children's parties have?" "Joy?" "Clowns. And clowns, my friend, are highly distractible." She pulled out a second drawing. "We go in as entertainers. You'll be the magician." Bill stared at her. "I don't know any magic." "You know how to make wallets disappear." "That's pickpocketing." "Same energy. Now, I'll handle the cake distraction at fourteen hundred hours—" "You mean two o'clock?" "Don't second-guess me, Bill." Molly stood and began pacing. "When they bring out the cake, every adult in that room locks eyes on it like it's the Mona Lisa. That's our window. You sweep the present pile, I cover the exits." Bill held up a finger. "Question. What if a kid sees us?" "Children are unreliable witnesses. Nobody believes a seven-year-old. We could rob them in broad daylight and they'd blame the dog." "That's surprisingly dark." "It's strategy." She grabbed a balloon from the couch and twisted it aggressively. "We'll need disguises. I'm thinking face paint. You're allergic to face paint." "I am, actually. How did you—" "I read your file." "I have a file?" "Everyone has a file." She tossed him the mangled balloon. "Practice your animals. 
A magician who can't make a balloon dog is a dead giveaway." Bill examined the lumpy creation. "What is this supposed to be?" "A swan." "It looks like a tumor." "Then make it a tumor-themed party, Bill, I don't care!" She took a deep breath. "Look. One score. Twelve presents, we fence them online, we're set for months. Action figures hold their value." "You've thought about this way too much." "I've thought about it the correct amount." Molly sat back down. "Now. The piñata. That's the wildcard. When it bursts, there's a candy stampede. Pure chaos. We could use that as backup cover." Bill sighed and set the balloon down. "And if it all goes wrong?" "Then we grab what we can and run. Toddlers are slow. Their parents are sleep-deprived. We have the advantage." "I genuinely hate that this makes sense." "Welcome to the new economy, partner." She extended her hand. "You in?" Bill stared at her hand, then at the crayon dinosaur eating the sun. "...Do they at least have cake for us?" "Bill. There's always cake." He shook her hand. "Fine. Let's rob some children."
created_at: "2026-06-01T12:11:12.051Z" (6/1/2026, 12:11:12 PM)
model: "anthropic/claude-4.8-opus-20260528"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1139
moderation_latency: (null)
generation_time: 20523
tokens_prompt: 77
tokens_completion: 826
native_tokens_prompt: 93
native_tokens_completion: 1159
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
service_tier: "default"
usage: 0.02944
router: (null)
provider_responses:
  0:
    endpoint_id: "b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
    id: "msg_01CKfxcQNtK8uXcJkpC3fWig"
    is_byok: false
    latency: 1139
    model_permaslug: "anthropic/claude-4.8-opus-20260528"
    provider_name: "Anthropic"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer: (null)
request_id: "req-1780315872-MjSJNkorHHExaLYctHpM"
session_id: (null)
preset_id: (null)
api_type: "completions"
id: "gen-1780315872-YDZkbhBiETM40KTPrR4M"
upstream_id: "msg_01CKfxcQNtK8uXcJkpC3fWig"
total_cost: 0.02944
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
response_cache_source_id: (null)
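The timing and token fields above can be combined into a rough output throughput figure. The sketch below is illustrative only. It assumes that generation_time and latency are reported in milliseconds (the page does not state units, though 20523 is close to the 20.6s Duration shown under Run Details), and the helper name tokens_per_second is hypothetical, not part of NC Bench or any provider API.

```python
# Values copied from the Meta fields of this run.
GENERATION_TIME_MS = 20523        # generation_time (unit assumed to be ms)
NATIVE_TOKENS_COMPLETION = 1159   # native_tokens_completion


def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Return average tokens generated per second over elapsed_ms."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return tokens / (elapsed_ms / 1000.0)


throughput = tokens_per_second(NATIVE_TOKENS_COMPLETION, GENERATION_TIME_MS)
print(f"approx. throughput: {throughput:.1f} native tokens/s")
```

Under the millisecond assumption this works out to roughly 56 native completion tokens per second. Using tokens_completion (826) instead would give a lower figure, so the choice of token count should be stated whenever runs are compared.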
Evaluation details
Matches word count: 97.63%
  Details: n/a
  neededClean: false
  words: 507
Dialogue to Total Word Ratio: 2.17%
  Details: Ratio: 78.75%, Deviation: 8.75%
  neededClean: false
  wordsTotal: 513
  wordsDialogue: 404
Overall: 49.8990%
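The reported ratio and deviation can be reproduced from the wordsDialogue and wordsTotal counts. The sketch below is a minimal check, not the benchmark's actual evaluator. The 70% target comes from the scenario text, and the unweighted mean used for the overall figure is an assumption that happens to agree with the reported value to within rounding. How the 8.75% deviation maps to the 2.17% evaluator score is not shown on the page and is not modeled here.

```python
# Values copied from the evaluation metadata of this run.
WORDS_DIALOGUE = 404
WORDS_TOTAL = 513
TARGET_RATIO = 0.70  # from the scenario: "Write 500 words with 70% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that fall inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def deviation_from_target(ratio: float, target: float) -> float:
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


ratio = dialogue_ratio(WORDS_DIALOGUE, WORDS_TOTAL)
deviation = deviation_from_target(ratio, TARGET_RATIO)

# Assumption: the overall score is the unweighted mean of the two
# evaluator scores as displayed (97.63% and 2.17%).
overall_estimate = (97.63 + 2.17) / 2

print(f"ratio={ratio:.2%} deviation={deviation:.2%} overall~{overall_estimate:.2f}%")
```

This gives a ratio of 78.75% and a deviation of 8.75%, matching the Details line. The estimated overall of about 49.90% differs from the reported 49.8990% only in the later decimals, which would be expected if the page averages unrounded scores.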