NC Bench
Run: 2026-05-29-01KSSK0M79SF863J3B44RY7FHG
Run Details
Model: Claude Opus 4.8 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 21.5s
Cost: $0.028540
Provider: Anthropic
Evaluation summary
Score     Evaluator
99.74%    Matches word count
12.33%    Dialogue to Total Word Ratio
56.0354%  Overall
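The overall figure is consistent with an unweighted mean of the two evaluator scores. This is inferred from the numbers shown on this page, not from a documented NC Bench formula, and the function name `overall_score` is hypothetical. A minimal sketch under that assumption:

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores.

    Assumption: NC Bench aggregates this way. The page does not state
    the formula; the mean simply matches the displayed numbers.
    """
    return sum(scores) / len(scores)


# Rounded scores as displayed in the evaluation summary.
displayed_scores = [99.74, 12.33]
print(f"{overall_score(displayed_scores):.3f}%")
```

The mean of the displayed values is about 56.035%, which agrees with the reported 56.0354% to within the rounding of the two inputs.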
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The blueprints spread across the kitchen table were, technically speaking, a children's birthday party invitation rendered in crayon. Molly tapped the wobbly drawing with a manicured finger. "This X marks the bouncy castle." "That's a dog," Bill said. "It's a bouncy castle, Bill. Focus." He leaned closer, squinting through reading glasses he refused to admit he needed. The invitation had arrived courtesy of Molly's nephew, a sticky-fingered eight-year-old named Dougie who had, without realizing it, provided them with the single greatest opportunity of their criminal careers. Forty children. Forty mountains of presents. And, according to the gold foil lettering, a piñata shaped like a unicorn. "The presents will be stacked here, by the gift table," Molly continued, sliding a finger toward a smear of orange wax. "We come in during the cake. Nobody watches anything during cake." Bill set down his coffee. "I have concerns." "You always have concerns." "Last time my 'concern' was that the bank had a laser grid. I was correct about the laser grid." Molly waved this away as though correctness were a minor character flaw. She produced a second sheet of paper, this one a hand-drawn schedule. "Magician arrives at two. That's our window. Everyone's hypnotized by the disappearing rabbit." "Children don't get hypnotized. They heckle. I've seen footage." "Then we use the chaos." Bill rubbed his temples. He had spent twenty-two years cracking safes, dodging guard dogs, and once outrunning an entire wedding party in Lisbon. And now here he was, planning a daylight assault on a suburban living room defended by juice boxes and a woman named Karen who, by all accounts, took her party planning extremely seriously. "What about the parents?" he asked. "Distracted. Resentful. Drinking warm wine in the kitchen and pretending to enjoy each other's company." Molly grinned. "We blend in. You're an uncle, I'm an aunt. Nobody questions an aunt." "I don't have a costume." "You're wearing it. 
You always look like a disappointed uncle." Bill chose to let that pass. He picked up the invitation again, studying the crayon dog that was apparently a bouncy castle. There was something undeniably appealing about the plan. No vaults. No alarms. No Interpol. Just a duffel bag, a getaway car, and the deeply held assumption that no one would ever expect two professional thieves to rob a child. "The unicorn piñata," he said slowly. "We take that too." Molly's eyes lit up. "Now you're committed." "I'm not committed. I'm intrigued. There's a difference." "There really isn't." He sighed, the long exhale of a man who knew, deep down, that he had already agreed. Outside, somewhere in the gathering dusk, forty children were no doubt practicing their unwrapping technique, blissfully unaware that fate, in the form of two middle-aged criminals, was coming for their Lego. "Fine," Bill said. "But if a clown shows up, I'm out. Clowns are unpredictable." Molly folded the blueprints with the reverence of a general. "Saturday. Two o'clock. Wear the disappointed face." "It's not a face. It's just my face." "Even better."
created_at: "2026-05-29T10:03:26.835Z" (5/29/2026, 10:03:26 AM)
model: "anthropic/claude-4.8-opus-20260528"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 924
moderation_latency: (null)
generation_time: 21213
tokens_prompt: 77
tokens_completion: 827
native_tokens_prompt: 93
native_tokens_completion: 1123
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
service_tier: "default"
usage: 0.02854
router: (null)
provider_responses:
  0:
    endpoint_id: "b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
    is_byok: false
    latency: 257
    model_permaslug: "anthropic/claude-4.8-opus-20260528"
    provider_name: "Anthropic"
    status: 429
  1:
    endpoint_id: "dfc0e5bd-d703-4fe2-a7bb-655eb95d5441"
    id: "msg_01BBJPB6tctUtFbKaWReSgZb"
    is_byok: false
    latency: 924
    model_permaslug: "anthropic/claude-4.8-opus-20260528"
    provider_name: "Anthropic"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer: (null)
request_id: "req-1780049006-HU6kt7lkVFl0bTIZSrdq"
session_id: (null)
api_type: "completions"
id: "gen-1780049006-q6ZwkmpXakTKo0z9aDhW"
upstream_id: "msg_01BBJPB6tctUtFbKaWReSgZb"
total_cost: 0.02854
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
response_cache_source_id: (null)
Evaluation details
Result    Evaluator                     Details                          Meta Data
99.74%    Matches word count            n/a                              neededClean: false, words: 504
12.33%    Dialogue to Total Word Ratio  Ratio: 36.47%, Deviation: 6.47%  neededClean: false, wordsTotal: 510, wordsDialogue: 186
56.0354%  Overall
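The reported ratio and deviation follow directly from the word counts in the metadata: 186 dialogue words out of 510 total gives 36.47%, which is 6.47 percentage points above the 30% target. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It assumes that dialogue means text between straight double quotes and that words are whitespace-separated tokens. The evaluator's actual rules are not shown on this page, and the function names are hypothetical.

```python
import re


def count_words(text):
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def dialogue_ratio(text):
    """Return (dialogue_words, total_words, ratio).

    Assumption: dialogue is any text enclosed in straight double quotes.
    The real evaluator may handle curly quotes or tags differently.
    """
    spoken = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(count_words(s) for s in spoken)
    words_total = count_words(text)
    return words_dialogue, words_total, words_dialogue / words_total


def deviation_from_target(ratio, target=0.30):
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Reproduce the reported figures from the counts in the metadata.
ratio = 186 / 510
print(f"Ratio: {ratio:.2%}, Deviation: {deviation_from_target(ratio):.2%}")
```

The two evaluators report different totals for the same response (504 words versus 510), which suggests they tokenize the text differently, so a simple whitespace split may not match either count exactly.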