NC Bench
Run: 2026-05-31-01KSYXM92F85S31VGB29NCGGQ1
Run Details
Model: Claude Opus 4.8 (Reasoning, High)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 20.6s
Cost: $0.029215
Provider: Anthropic
Evaluation summary
Matches word count: 99.74%
Dialogue to Total Word Ratio: 0.00%
Overall: 49.8722%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the blueprint across the table, smoothing out the wrinkles with the reverence of a surgeon preparing for a transplant. Bill leaned over her shoulder, chewing a toothpick. "Right," Molly said, tapping the paper. "The Hendersons' backyard. Saturday. Two o'clock. Little Timmy's seventh birthday." "Why are we robbing a seven-year-old?" Bill asked. "Because seven-year-olds get the good stuff now. Tablets. Drones. Gift cards." She circled a square on the diagram. "This is the present table. Our target." Bill squinted. "That's a folding card table with a tablecloth." "It's a fortress, Bill. Guarded by parents, sugar-fueled children, and at least one clown." "I hate clowns." "Everyone hates clowns. That's why he's the most dangerous element." Molly slid a photograph across the table. "Meet Bubbles. Hired entertainment. He does balloon animals and, according to his website, 'gentle magic.'" "Gentle magic," Bill repeated. "What's our cover?" "You're the face painter." Bill straightened up. "I can't paint." "You don't have to be good. You have to be slow. Every minute you keep a kid in that chair is a minute the parents aren't watching the table." She handed him a smock. "I'll be the aunt nobody recognizes. Every family's got one. I'll mingle, compliment the potato salad, and wait for the piñata." "Why the piñata?" "Because, Bill, when that piñata bursts, chaos descends. Thirty children diving for candy. Total parental panic. That's our window. Forty-five seconds, maybe a minute, where every adult eye is on the screaming mob." She grinned. "We load the presents into the gift bags we brought, walk out the side gate, and vanish." Bill frowned. "What about the clown?" "What about him?" "He's gonna notice two adults hauling away the loot." Molly leaned back, steepling her fingers. "That's where you come in. The moment the piñata breaks, you challenge Bubbles to a balloon-animal contest." "A what?" "Engage his ego. Clowns are deeply insecure. 
He won't be able to resist proving he's the superior entertainer. He'll be making giraffes while we make our exit." Bill stared at her. "This is the worst plan I've ever heard." "It's flawless." "We're stealing toys from a baby." "He's seven, Bill. He's practically a teenager. And besides," she added, "think of the resale value. Untouched, brand-new, still in the boxes. We could clear three hundred easy." Bill sat down heavily. "Three hundred dollars. For a felony. At a birthday party." "You said you wanted to retire." "To a beach. Not to prison." Molly rolled up the blueprint with a decisive snap. "Get cold feet now, you'll regret it forever. This is the heist of the season." "It's a children's party." "The Hendersons spared no expense. There's a bounce house, Bill. A bounce house." She slung her bag over her shoulder. "Now, are you in, or do I find a partner who isn't afraid of a man named Bubbles?" Bill sighed, picked up the smock, and held it against his chest like a man trying it on for the last time. "Fine," he muttered. "But I'm not painting butterflies."
created_at: "2026-05-31T11:45:08.7Z" (5/31/2026, 11:45:08 AM)
model: "anthropic/claude-4.8-opus-20260528"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 843
moderation_latency: (null)
generation_time: 20516
tokens_prompt: 77
tokens_completion: 821
native_tokens_prompt: 93
native_tokens_completion: 1150
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
service_tier: "default"
usage: 0.029215
router: (null)
provider_responses:
  0:
    endpoint_id: "b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
    id: "msg_01YDRx3wYJVomoJDdSa3sRZQ"
    is_byok: false
    latency: 842
    model_permaslug: "anthropic/claude-4.8-opus-20260528"
    provider_name: "Anthropic"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer: (null)
request_id: "req-1780227908-pJcVxRNqgEiMnjfBSKFJ"
session_id: (null)
preset_id: (null)
api_type: "completions"
id: "gen-1780227908-NqNxRx5zE7XHVOWaHMBh"
upstream_id: "msg_01YDRx3wYJVomoJDdSa3sRZQ"
total_cost: 0.029215
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
response_cache_source_id: (null)
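The reported cost can be cross-checked against the native token counts in the metadata. The sketch below is only a consistency check: the per-token rates (5 USD per million input tokens, 25 USD per million output tokens) are an assumption chosen because they reproduce the reported figure, and they are not stated anywhere on this page. The variable names are illustrative.

```python
from decimal import Decimal

# Token counts copied from the metadata above.
native_tokens_prompt = 93
native_tokens_completion = 1150

# ASSUMPTION: rates in USD per token. These are not given on the page;
# they are picked because they happen to reproduce total_cost.
input_rate = Decimal("5") / Decimal(1_000_000)
output_rate = Decimal("25") / Decimal(1_000_000)

estimated_cost = (
    native_tokens_prompt * input_rate
    + native_tokens_completion * output_rate
)

reported_cost = Decimal("0.029215")  # total_cost / usage from the metadata
print(estimated_cost, estimated_cost == reported_cost)
```

Decimal is used instead of float so that the comparison with the reported six-decimal cost is exact rather than subject to binary rounding.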
Evaluation details
Matches word count
  Result: 99.74%
  Details: n/a
  Meta Data: neededClean: false, words: 504
Dialogue to Total Word Ratio
  Result: 0.00%
  Details: Ratio: 76.17%, Deviation: 26.17%
  Meta Data: neededClean: false, wordsTotal: 512, wordsDialogue: 390
Overall: 49.8722%
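The ratio and deviation reported for the dialogue evaluator follow directly from the word counts in its metadata. The sketch below reproduces that arithmetic and adds a hypothetical helper, dialogue_ratio, that counts words inside straight double quotes. The helper is an assumption about how such a ratio could be measured, not the evaluator's actual tokenizer (the two evaluators report different totals, 504 and 512, so the real counting rules evidently differ). The last step assumes the overall score is an unweighted mean of the two evaluator scores, which is consistent with the figures shown but not confirmed by the page.

```python
import re

def dialogue_ratio(text: str) -> float:
    """Share of whitespace-separated words that sit inside straight double quotes.

    Hypothetical helper: the benchmark's own counting rules are not documented here.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue / total if total else 0.0

# Figures copied from the evaluation details above.
words_dialogue = 390
words_total = 512
target_ratio = 0.50  # from the scenario: 50% dialogue

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints: Ratio: 76.17%, Deviation: 26.17%

# ASSUMPTION: overall = unweighted mean of the two evaluator scores.
dialogue_score = 0.00
overall_score = 49.8722
implied_word_count_score = 2 * overall_score - dialogue_score
print(round(implied_word_count_score, 2))  # 99.74, matching the displayed score
```

For example, dialogue_ratio('"Hello there," she said.') returns 0.5, since two of the four words fall inside the quotation marks.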