NC Bench
Run: 2026-05-31-01KSYXSC29PAEFBGBN4PGPWYCZ
Run Details
Model:
Claude Opus 4.8 (Reasoning, High)
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
20.5s
Cost:
$0.029415
Provider:
Anthropic
Evaluation summary
Score | Evaluator | Details Link
100.00% | Matches word count | (details)
0.49% | Dialogue to Total Word Ratio | (details)
50.2434% | Overall
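The overall figure looks consistent with an unweighted mean of the two evaluator scores. This is an inference from the numbers shown on this page, not a documented formula, and the function name below is hypothetical. Working backwards from 50.2434% suggests the dialogue score is closer to 0.4868% before it is rounded to 0.49% for display.

```python
# Sketch only: assumes the overall score is the plain mean of the evaluator
# scores. The page does not state the aggregation rule.

def overall_score(evaluator_scores):
    """Return the unweighted mean of per-evaluator scores (in percent)."""
    return sum(evaluator_scores) / len(evaluator_scores)

WORD_COUNT_SCORE = 100.00      # "Matches word count"
OVERALL_DISPLAYED = 50.2434    # "Overall"

# Back out the unrounded dialogue score implied by a two-way mean.
implied_dialogue_score = 2 * OVERALL_DISPLAYED - WORD_COUNT_SCORE

print(f"implied dialogue score: {implied_dialogue_score:.4f}%")
print(f"recomputed overall: {overall_score([WORD_COUNT_SCORE, implied_dialogue_score]):.4f}%")
```

If the benchmark weights evaluators differently in other tests, this check would not carry over.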
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Bill spread the crayon-drawn map across the table, smoothing its crinkled edges with reverence. "Okay, so the perimeter is guarded by what I'm calling 'The Mom Squad,'" he said. "Six of them, minimum. Armed with juice boxes and disappointment." "Disappointment isn't a weapon, Bill." "You've never met my mother." Molly leaned over the map, squinting at the stick figures. "And what's this red blob?" "That's the bouncy castle. Total blind spot. We go in through there, nobody sees us." "We're going to infiltrate a child's birthday party through a bouncy castle." "It's foolproof. Adults can't get in those things without looking ridiculous." "We're adults." "Exactly. We'll blend in by looking ridiculous. It's called camouflage." Molly pinched the bridge of her nose. "Walk me through the presents." "The present table is here, by the cake." Bill tapped a smudge. "Goldmine. We're talking action figures, gift cards, possibly an entire LEGO set still in shrink wrap. That stuff holds value, Molly. Mint condition LEGO is basically currency." "You want to fence used birthday toys." "Not used. Unopened. There's a difference and that difference is profit." She sighed. "What's the timeline?" "Cake comes out at three. That's when they're all distracted, singing that song, taking photos. We've got exactly ninety seconds while they're focused on the candles." "And how do we carry everything out?" Bill grinned and reached under the table, producing an enormous unicorn piñata. "We gut this beauty, fill it with loot, and walk right out the front door. Nobody questions a man carrying a piñata. It's invisible. It's beautiful." "That's actually not terrible." "I have moments." "Rare moments." Molly studied the map again. "What about the kids? Twenty children is a lot of witnesses." "Witnesses?" Bill scoffed. "They're four. Their testimony is worthless. Half of them still believe a fat man delivers gifts via chimney. No jury's convicting us on that." "And if a kid grabs us?" 
"Then we improvise. We pretend to be entertainers. You can do balloon animals, right?" "I can make a balloon that looks vaguely like a snake." "Perfect. A snake is an animal. We're entertainers now." Molly crossed her arms. "There's a flaw in this plan." "There's no flaw." "The flaw is that we are two grown adults stealing presents from a toddler." Bill paused, genuinely considering this. "Okay, when you say it out loud, sure, it sounds bad." "It sounds bad because it is bad." "Morally, yes. Financially? Pristine." He rolled up the map. "Look, you're either in or you're out, but I'm telling you, this is the easiest score of our careers. No vaults. No lasers. No guards with actual guns. Just frosting and weak supervision." Molly was quiet for a long moment. "The LEGO's really unopened?" "Shrink wrapped, baby." She grabbed her coat. "Fine. But I'm not going in the bouncy castle." "You absolutely are going in the bouncy castle." "Bill." "It's load-bearing to the whole operation, Molly!" She was already walking out the door, and Bill, clutching his unicorn, hurried after her.
created_at
"2026-05-31T11:47:55.606Z"
5/31/2026, 11:47:55 AM
model
"anthropic/claude-4.8-opus-20260528"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1037
moderation_latency
(null)
generation_time
20426
tokens_prompt
77
tokens_completion
838
native_tokens_prompt
93
native_tokens_completion
1158
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"end_turn"
service_tier
"default"
usage
0.029415
router
(null)
provider_responses
0
endpoint_id
"b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
id
"msg_01PN6nLdxr4maTtaqSTbEZEr"
is_byok
false
latency
1037
model_permaslug
"anthropic/claude-4.8-opus-20260528"
provider_name
"Anthropic"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer
(null)
request_id
"req-1780228075-nv0lHMeyAiBpNHdyYLwj"
session_id
(null)
preset_id
(null)
api_type
"completions"
id
"gen-1780228075-Vo1IphJSfkwOSmWhRQoA"
upstream_id
"msg_01PN6nLdxr4maTtaqSTbEZEr"
total_cost
0.029415
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Anthropic"
response_cache_source_id
(null)
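As a sanity check on the metadata above, the reported cost and the generation speed can be recomputed from the native token counts. The per-token rates below are assumptions chosen because they reproduce the reported total; they are not stated anywhere on this page. Treating `generation_time` as milliseconds is also an assumption, based on the 20.5s duration shown in the run details.

```python
# Values copied from the run metadata.
NATIVE_TOKENS_PROMPT = 93
NATIVE_TOKENS_COMPLETION = 1158
GENERATION_TIME_MS = 20426          # ASSUMPTION: unit is milliseconds
REPORTED_COST_USD = 0.029415

# ASSUMPTION: USD per million tokens. Not given in the source.
ASSUMED_INPUT_USD_PER_MTOK = 5.0
ASSUMED_OUTPUT_USD_PER_MTOK = 25.0

def estimate_cost(prompt_tokens, completion_tokens):
    """Estimate request cost in USD from token counts and the assumed rates."""
    input_cost = prompt_tokens * ASSUMED_INPUT_USD_PER_MTOK / 1_000_000
    output_cost = completion_tokens * ASSUMED_OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost

estimated_cost = estimate_cost(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)
tokens_per_second = NATIVE_TOKENS_COMPLETION / (GENERATION_TIME_MS / 1000)

print(f"estimated cost: ${estimated_cost:.6f} (reported: ${REPORTED_COST_USD:.6f})")
print(f"throughput: {tokens_per_second:.1f} completion tokens/s")
```

Under these assumptions the estimate matches the reported $0.029415, and throughput works out to roughly 57 completion tokens per second.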
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false, words: 501
0.49% | Dialogue to Total Word Ratio | Ratio: 80.32%, Deviation: 10.32% | neededClean: false, wordsTotal: 503, wordsDialogue: 404
50.2434% | Overall
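The ratio and deviation reported for the dialogue evaluator can be reproduced from `wordsTotal` and `wordsDialogue`, with the 70% target taken from the scenario. The sketch below covers only that arithmetic. How the evaluator maps a 10.32% deviation to a 0.49% score is not shown on this page, so no scoring curve is attempted here. The function name is hypothetical.

```python
TARGET_RATIO = 0.70   # from the scenario: "Write 500 words with 70% dialogue"

WORDS_TOTAL = 503     # wordsTotal from the evaluation details
WORDS_DIALOGUE = 404  # wordsDialogue from the evaluation details

def dialogue_ratio(words_dialogue, words_total):
    """Fraction of all words that fall inside dialogue."""
    return words_dialogue / words_total

ratio = dialogue_ratio(WORDS_DIALOGUE, WORDS_TOTAL)
deviation = abs(ratio - TARGET_RATIO)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 80.32%, Deviation: 10.32%
```

The response overshoots the target: about 80% of its words are dialogue rather than the requested 70%. The two evaluators also report slightly different totals (501 versus 503 words), which suggests they tokenize the text differently.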