NC Bench
Run: 2026-05-31-01KSYX9PNHR3YCSKGYBRF7583B
Run Details
Model: Claude Opus 4.8 (Reasoning, High)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 21.5s
Cost: $0.028815
Provider: Anthropic
Evaluation summary
Score      Evaluator                      Details Link
98.71%     Matches word count             (details)
3.51%      Dialogue to Total Word Ratio   (details)
51.1104%   Overall
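The page does not say how the Overall figure is derived. As a quick check, the sketch below assumes it is the unweighted mean of the two evaluator scores (an assumption, not something the source states). The displayed, rounded scores give 51.11%, which agrees with the reported 51.1104% to two decimals.

```python
# Assumption: "Overall" is the plain average of the evaluator scores.
# The scores are the rounded values displayed in the evaluation summary.
scores = {
    "Matches word count": 98.71,
    "Dialogue to Total Word Ratio": 3.51,
}

overall = sum(scores.values()) / len(scores)
print(f"Overall (assumed mean): {overall:.2f}%")
```

The small gap in the fourth decimal is expected if the evaluator averages unrounded scores.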
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crayon-drawn map across the table and tapped it with one finger. "This is the perimeter," she said. "Bouncy castle to the north, snack table to the east, and a clown patrolling the western flank." Bill leaned in, squinting. "A clown? You didn't say anything about a clown." "His name is Mr. Giggles. He's the one variable I can't predict." "I hate clowns, Molly. You know I hate clowns." "Everybody hates clowns. That's what makes him dangerous." She pulled a juice box toward her and stabbed the straw in. "Okay. We breach at fourteen-hundred hours, right after the pin-the-tail-on-the-donkey." Bill folded his arms. "Why then?" "Because they'll be blindfolded, Bill. An entire room of distracted children, spinning in circles. We walk in, grab the loot pile, walk out." "And the parents?" "Sugar-comatose by then. They've been on their feet since nine. By two o'clock, the parents are basically furniture." Bill scratched his chin. "What about the birthday kid?" "Tommy. Turning seven. He's the brains of the operation, so we neutralize him first." "Neutralize a seven-year-old?" "With cake," Molly said, holding up a small bag. "Extra frosting. He gets one slice, he's down for the count. Trust me, I've studied his patterns." Bill sat back, unconvinced. "I don't love it. Last job you said the dog was 'basically furniture' too, and I lost a trouser leg." "That was a Rottweiler, Bill. This is a labradoodle named Princess Sparkles. Apples and oranges." "And the presents? How big are we talking?" Molly's eyes gleamed. "Word on the street is the kid's grandfather is loaded. We're looking at a deluxe LEGO set, a remote-control helicopter, and—" she paused dramatically, "—a possible Nintendo." Bill whistled low. "A Nintendo. That's a clean retirement, that is." "We split it sixty-forty." "Sixty to me?" "Sixty to me. I made the map." "It's in crayon, Molly!" "It's an artistic choice. It blends in with the environment." She tapped the table again. "Now. You go in dressed as the magician they hired. I'll handle crowd control." Bill blinked. "I can't do magic." "You don't have to. Pull a coin out of an ear, wave a stick around, the kids'll lose their minds. Meanwhile I'm loading presents into the van." "And if the real magician shows up?" Molly smiled. "He won't. I booked him for a party across town. Different child entirely. He's currently driving to the wrong address as we speak." Bill stared at her for a long moment, then slowly nodded. "You know, when you lay it all out like that, it almost sounds—" "Genius?" "I was going to say 'illegal,' but sure." "It's a children's party, Bill. What's the worst that can happen?" Bill picked up a tiny paper hat from the table and turned it over in his hands. "Famous last words," he muttered, and pulled the elastic strap under his chin. "For the record, if that clown so much as looks at me, the deal's off." "Noted," Molly said. "Now grab the balloons. We've got a party to crash."
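The page does not show how the dialogue evaluator separates dialogue from narration. A minimal sketch of one plausible approach, assuming dialogue is whatever sits between straight double quotes and words are whitespace-separated tokens, might look like this. The function name `dialogue_ratio` and the sample sentence are hypothetical and are not taken from NC Bench.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio_percent).

    Assumption: dialogue is any span enclosed in straight double quotes,
    and a word is any whitespace-separated token.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = 100.0 * dialogue / total if total else 0.0
    return dialogue, total, ratio


sample = '"Why then?" Bill asked. "Because they will be blindfolded," Molly said.'
dialogue_words, total_words, ratio_percent = dialogue_ratio(sample)
print(dialogue_words, total_words, f"{ratio_percent:.2f}%")
```

A real evaluator may tokenize differently (for example around dashes or hyphens), so counts from this sketch need not match the figures reported on this page.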
created_at
"2026-05-31T11:39:22.173Z"
5/31/2026, 11:39:22 AM
model
"anthropic/claude-4.8-opus-20260528"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1007
moderation_latency
(null)
generation_time
21439
tokens_prompt
77
tokens_completion
803
native_tokens_prompt
93
native_tokens_completion
1134
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"end_turn"
service_tier
"default"
usage
0.028815
router
(null)
provider_responses
0
endpoint_id
"b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
id
"msg_01V9fYC4B5WaWo1K3CKsMBa2"
is_byok
false
latency
1007
model_permaslug
"anthropic/claude-4.8-opus-20260528"
provider_name
"Anthropic"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer
(null)
request_id
"req-1780227562-enYU8JOINvq7JiQZLuln"
session_id
(null)
preset_id
(null)
api_type
"completions"
id
"gen-1780227562-lDasaUN1DNQXR4Bw4Gng"
upstream_id
"msg_01V9fYC4B5WaWo1K3CKsMBa2"
total_cost
0.028815
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Anthropic"
response_cache_source_id
(null)
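The metadata reports a total cost and native token counts but no per-token prices. The sketch below assumes hypothetical rates of $5 per million native prompt tokens and $25 per million native completion tokens. These rates are not stated anywhere on this page; they are chosen only because they reproduce the reported `total_cost` of 0.028815.

```python
# Token counts from the metadata above.
native_tokens_prompt = 93
native_tokens_completion = 1134

# Hypothetical rates (assumptions, not from the source), in micro-dollars per token.
# $5 per million tokens equals 5 micro-dollars per token.
PROMPT_RATE_MICRO_USD = 5
COMPLETION_RATE_MICRO_USD = 25

cost_micro_usd = (
    native_tokens_prompt * PROMPT_RATE_MICRO_USD
    + native_tokens_completion * COMPLETION_RATE_MICRO_USD
)
cost_usd = cost_micro_usd / 1_000_000
print(f"${cost_usd:.6f}")
```

Working in integer micro-dollars avoids floating-point drift when the result is compared against the reported figure.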
Evaluation details
Result     Evaluator                      Details                           Meta Data
98.71%     Matches word count             n/a                               neededClean: false; words: 506
3.51%      Dialogue to Total Word Ratio   Ratio: 78.19%, Deviation: 8.19%   neededClean: false; wordsTotal: 518; wordsDialogue: 405
51.1104%   Overall
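The reported ratio and deviation follow directly from the word counts in the meta data and the 70% target in the scenario. The arithmetic can be reproduced as below. How the evaluator converts the 8.19-point deviation into a 3.51% score is not shown on this page, so the sketch does not attempt it.

```python
TARGET_RATIO = 70.0  # percent, from the scenario "Write 500 words with 70% dialogue"

words_total = 518     # wordsTotal from the evaluator meta data
words_dialogue = 405  # wordsDialogue from the evaluator meta data

ratio = 100.0 * words_dialogue / words_total
deviation = abs(ratio - TARGET_RATIO)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

The two evaluators report different totals (506 words versus 518), which suggests they count words in slightly different ways.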