NC Bench
Run: 2026-06-01-01KT1J20NWSTMW262XD7HDS2G5
Run Details
Model: Claude Opus 4.8 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 21.3s
Cost: $0.029640
Provider: Anthropic
Evaluation summary
Matches word count: 99.74%
Dialogue to Total Word Ratio: 0.00%
Overall: 49.8722%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crayon-drawn map across the table, smoothing out a corner sticky with what she hoped was jam. "Okay, here's the situation," she said. "Forty-seven children, one bouncy castle, and approximately three hundred wrapped presents." Bill leaned over the map, squinting. "Three hundred? For one kid?" "It's the Pemberton boy. His father owns half the city. The invitation said 'no gifts necessary,' which in rich-people language means 'bring something expensive.'" "And the security?" Molly tapped a red X near the kitchen. "One magician, one clown, and a woman they're calling 'Auntie Carol' who I believe is ex-military." "A clown," Bill repeated flatly. "I hate clowns." "Everyone hates clowns, Bill. That's not a personality." He crossed his arms. "It's a perfectly valid fear. Did you know clowns can run faster than they look? Those shoes are deceptive." "Focus. We go in dressed as entertainers. I'm a face-painter, you're a balloon-animal artist." Bill's face fell. "I don't know how to make balloon animals." "How hard can it be? Twist, squeak, done." "It's an art form, Molly! There are years of apprenticeship—" "You watched one video this morning, didn't you?" He looked away. "I made something. It was supposed to be a giraffe." "What did it become?" "A very long sausage. The dog one looked like a different, smaller sausage." Molly pinched the bridge of her nose. "Fine. You're a mime now. Mimes don't make anything." "Mimes pretend to be trapped in boxes." "Then you'll be excellent at it, because you're trapped in this plan with me. Here's the play—at exactly two o'clock, they cut the cake. Sugar high hits the kids around two-fifteen. That's our window." Bill perked up. "Chaos." "Beautiful, screaming chaos. While Auntie Carol's distracted breaking up a juice-box dispute, we load the presents into the van disguised as the pony from the petting zoo." "We're disguising a van as a pony?" "We're disguising the van's *purpose*. 
The van says 'Pony Express Party Rentals.' Nobody questions a pony van." "That's surprisingly solid." "I have my moments." She rolled up the map. "Any questions?" Bill raised a hand like a schoolchild. "What if a kid recognizes that I'm not a real mime?" "Bill, they're six. Their threat assessment is built around whether broccoli is touching their chicken nuggets. You'll be fine." "And the clown?" Molly sighed. "I'll handle the clown." "How?" "I'll offer him a job. Clowns are always one bad birthday away from a life of crime." She slung her bag over her shoulder. "Now repeat the plan back to me." Bill cleared his throat. "Two o'clock, cake. Two-fifteen, sugar meltdown. I mime silently while you paint tiny butterflies on screaming children. We load three hundred presents into a fake pony. We retire to a beach." "You added the beach." "I'm manifesting it." Molly almost smiled. "You know, for a man terrified of clowns and incapable of making a balloon dog, you've got the heart of a true professional." Bill beamed. "That was an insult, Bill." "I'm choosing to ignore that part."
created_at
"2026-06-01T12:20:39.236Z"
6/1/2026, 12:20:39 PM
model
"anthropic/claude-4.8-opus-20260528"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1001
moderation_latency
(null)
generation_time
21241
tokens_prompt
77
tokens_completion
830
native_tokens_prompt
93
native_tokens_completion
1167
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"end_turn"
service_tier
"default"
usage
0.02964
router
(null)
provider_responses
0
endpoint_id
"b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
id
"msg_01NSWDtkv3ewQUo3S3Z9mMK3"
is_byok
false
latency
1001
model_permaslug
"anthropic/claude-4.8-opus-20260528"
provider_name
"Anthropic"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer
(null)
request_id
"req-1780316439-XUDi4POcxUDVycGuQ2Gh"
session_id
(null)
preset_id
(null)
api_type
"completions"
id
"gen-1780316439-AKur7r0MOAA2IC2Mf5vm"
upstream_id
"msg_01NSWDtkv3ewQUo3S3Z9mMK3"
total_cost
0.02964
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Anthropic"
response_cache_source_id
(null)
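The reported cost and duration can be cross-checked against the token counts in the metadata above. The sketch below is a rough check, not the benchmark's own accounting: the per-token prices are assumptions (they are not stated on this page) chosen because they reproduce the reported `total_cost`, and the function names are hypothetical.

```python
# Assumed prices in USD per million tokens. These are NOT stated on this
# page; they are illustrative values that happen to reproduce total_cost.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from native token counts."""
    total_micro_usd = (prompt_tokens * INPUT_USD_PER_MTOK
                       + completion_tokens * OUTPUT_USD_PER_MTOK)
    return total_micro_usd / 1_000_000


def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Average output rate over the whole generation."""
    return completion_tokens / (generation_time_ms / 1000)


# Values copied from the metadata: native_tokens_prompt,
# native_tokens_completion, generation_time (milliseconds).
cost = estimate_cost(93, 1167)
rate = tokens_per_second(1167, 21241)

print(f"estimated cost: ${cost:.6f}")   # compare with total_cost 0.02964
print(f"output rate: {rate:.1f} tokens/s")
```

Under these assumed prices the estimate agrees with the `total_cost` field, which suggests billing is based on the native token counts (93 and 1167) rather than the normalized ones (77 and 830).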
Evaluation details
Matches word count
Result: 99.74%
Details: n/a
neededClean: false
words: 504
Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 82.62%, Deviation: 32.62%
neededClean: false
wordsTotal: 512
wordsDialogue: 423
Overall: 49.8722%
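The ratio and deviation figures follow directly from the word counts in the evaluator metadata. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. The counting rule (words inside straight double quotes) and all function names are assumptions for illustration; the page does not document how the evaluator tokenizes text or how it maps the deviation to a score.

```python
import re

TARGET_RATIO = 0.50  # from the scenario: 50% dialogue


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Count words inside straight double quotes (assumed rule)."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Counts copied from the evaluator metadata: wordsDialogue and wordsTotal.
ratio, deviation = ratio_and_deviation(423, 512)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 82.62%, Deviation: 32.62%"

# Assumption: the overall score is the unweighted mean of the two evaluators.
overall = (99.74 + 0.00) / 2
print(f"Overall: {overall:.2f}%")  # prints "Overall: 49.87%"
```

The unweighted mean is consistent with the reported overall of 49.8722%, which would imply an unrounded word-count score of about 99.7444%. For a quick check of the counting rule, `dialogue_words('Molly sighed. "I hate clowns," said Bill.')` returns 3 out of 7 total words.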