NC Bench
Run: 2026-05-31-01KSYXS397M88NMTKEEM1QWTBH
Run Details
Model: Claude Opus 4.8 (Reasoning, High)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 19.9s
Cost: $0.027990
Provider: Anthropic
Evaluation summary
Matches word count: 51.93%
Dialogue to Total Word Ratio: 8.66%
Overall: 30.2949%
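The Overall figure is consistent with an unweighted mean of the two evaluator scores: (51.93 + 8.66) / 2 = 30.295, which agrees with the reported 30.2949% once you allow for the displayed scores being rounded. The sketch below assumes that averaging rule; the page does not state it, and `overall_score` is a hypothetical helper, not part of NC Bench.

```python
# Assumption: "Overall" is the plain arithmetic mean of the per-evaluator
# scores. This is inferred from the numbers on this page, not documented.
def overall_score(scores: list[float]) -> float:
    """Return the arithmetic mean of per-evaluator scores (in percent)."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


# Displayed (rounded) scores from this run.
word_count_score = 51.93
dialogue_ratio_score = 8.66

overall = overall_score([word_count_score, dialogue_ratio_score])
print(f"Overall: {overall:.4f}%")
```

Because the inputs are the rounded display values, the result lands on 30.295 rather than exactly 30.2949.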
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly spread the blueprints across the kitchen table, smoothing out the wrinkles with the kind of reverence usually reserved for ancient treasure maps. In reality, it was a hastily drawn diagram of the Hendersons' backyard, complete with a stick figure that was apparently meant to represent a bouncy castle. Bill leaned over her shoulder, chewing the end of a licorice rope. "So this is the target?" "This," Molly said, tapping the paper dramatically, "is the score of the century." "It's a seven-year-old's birthday party." "Exactly. Do you know what kind of loot accumulates at these things? Tablets. Gift cards. Cash stuffed in little envelopes by guilty uncles." She pulled out a magnifying glass she'd bought specifically for this occasion, examined the diagram, then set it down when she realized it added nothing. "We're talking premium merchandise." Bill wasn't convinced. He sank into a chair, the licorice dangling from his mouth like a deflated cigarette. He'd robbed armored cars. He'd cracked safes in three countries. And now Molly wanted him to infiltrate a celebration themed entirely around an animated dinosaur. "Walk me through it," he sighed. Molly's eyes lit up. She'd been waiting for this. "We enter as entertainers. You'll be the clown." "Absolutely not." "You'll be the clown, Bill." He pointed at her with a trembling finger of dread. "I have a reputation." "You have a red nose in the duffel bag by the door is what you have." She grabbed a marker and began drawing arrows on the blueprint with manic energy. "While you distract the children with balloon animals—" "I don't know how to make balloon animals." "—you make a long balloon and tell them it's a snake. Children are idiots. Meanwhile, I slip into the gift pile during the cake cutting, when parental supervision drops to roughly zero percent." Bill had to admit the cake-cutting timing was sound. He'd seen the chaos. 
He'd been to his nephew's party once and witnessed a grown man get bodychecked into a piñata by a toddler hopped up on fruit punch. "What about the parents?" he asked. "Phones. They'll all be filming. Nobody who's filming is actually watching anything." Molly leaned back, arms crossed, triumphant. "By the time little Timmy goes to open his presents, we'll be three counties away counting our gift cards." "His name's not Timmy." "It's whoever. Focus." She slid the red nose across the table toward him. It rolled to a gentle stop against his coffee mug, accusatory in its cheerfulness. Bill stared at it for a long, philosophical moment. This was rock bottom. Or perhaps, he reflected, rock bottom had been six months ago, when they'd attempted to rob a petting zoo and he'd been pursued across a parking lot by an emotionally unstable goat. "If I do this," he said slowly, "I want fifty percent and we never speak of the clown thing again." Molly beamed and snatched up her marker. "Bill, my friend, you've got yourself a deal." She drew one final arrow on the map, pointing directly at the bouncy castle. "Now, let's talk about our escape route."
created_at: "2026-05-31T11:47:46.612Z" (5/31/2026, 11:47:46 AM)
model: "anthropic/claude-4.8-opus-20260528"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1159
moderation_latency: (null)
generation_time: 19868
tokens_prompt: 77
tokens_completion: 803
native_tokens_prompt: 93
native_tokens_completion: 1101
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
service_tier: "default"
usage: 0.02799
router: (null)
provider_responses: 0
endpoint_id: "b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
id: "msg_01QDrKVQGPeZPP8E3jpBGQP4"
model_permaslug: "anthropic/claude-4.8-opus-20260528"
provider_name: "Anthropic"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer: (null)
request_id: "req-1780228066-NISmLFT23Pn3sDHfLzOY"
session_id: (null)
preset_id: (null)
api_type: "completions"
id: "gen-1780228066-62KYNbNEFHN0CudBZ6X0"
upstream_id: "msg_01QDrKVQGPeZPP8E3jpBGQP4"
total_cost: 0.02799
cache_discount: (null)
upstream_inference_cost: 0
response_cache_source_id: (null)
Evaluation details
Matches word count: 51.93%
  Details: n/a
  Meta data: neededClean=false, words=516
Dialogue to Total Word Ratio: 8.66%
  Details: Ratio: 36.99%, Deviation: 6.99%
  Meta data: neededClean=false, wordsTotal=519, wordsDialogue=192
Overall: 30.2949%
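The reported ratio follows directly from the metadata: 192 dialogue words out of 519 total is 36.99%, and its distance from the 30% target in the prompt is 6.99 percentage points. The sketch below reproduces that arithmetic, treating Deviation as the absolute difference from the target. It also includes a hypothetical `count_quoted_words` helper that counts words between straight double quotes; this is only an illustrative assumption about how dialogue might be detected, not the evaluator's actual method. How the 8.66% score is derived from the deviation is not shown on this page, so the sketch stops at the deviation.

```python
# Target from the prompt: "30% words of dialogue".
TARGET_RATIO = 0.30


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that are dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_RATIO) -> float:
    """Absolute distance from the target ratio (assumed definition)."""
    return abs(ratio - target)


def count_quoted_words(text: str) -> int:
    """Hypothetical dialogue counter: words inside straight double quotes.

    After splitting on '"', segments at odd indices lie between an
    opening and a closing quote mark.
    """
    segments = text.split('"')
    return sum(len(seg.split()) for seg in segments[1::2])


# Values taken from this run's metadata.
ratio = dialogue_ratio(192, 519)
dev = deviation(ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {dev:.2%}")  # Ratio: 36.99%, Deviation: 6.99%

sample = 'Bill leaned over. "So this is the target?" Molly nodded.'
print(count_quoted_words(sample))  # 5
```

A quote-splitting counter like this is simple but fragile: curly quotes, nested quotes, or an unclosed quote would all shift which segments count as dialogue.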