NC Bench
Run: 2026-05-29-01KSSK6Z7Q7FW3Z1J2XC4SRG1M
Run Details
Model: Claude Opus 4.8 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 21.5s
Cost: $0.028740
Provider: Anthropic
Evaluation summary
Matches word count: 99.92%
Dialogue to Total Word Ratio: 0.28%
Overall: 50.1014%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crayon-drawn map across the table and tapped it with authority. "Okay, Bill, focus. The party starts at two. We hit the present pile at two-thirty, when they're distracted by the clown." Bill squinted at the paper. "Is that the clown? It looks like a melted traffic cone." "I'm not an artist, I'm a criminal mastermind." "You're neither, technically. We've never successfully stolen anything." Molly glared at him. "We stole that garden gnome." "It followed us home. It was on a skateboard." "Details." She slapped the table. "Now, the targets are children, which means low security but high unpredictability. They're fast, they're sticky, and they scream at frequencies that can shatter a man's resolve." Bill nodded gravely. "I still have nightmares about the bouncy castle incident." "We don't talk about the bouncy castle incident." "I lost a shoe in there, Molly. A shoe and most of my dignity." She rolled out a second piece of paper. "The presents will be stacked near the cake. We grab them, stuff them in the swag bag, and we're gone before anyone notices." "What's our disguise?" "We're entertainers. You're a magician, I'm your lovely assistant." Bill crossed his arms. "Why am I the magician?" "Because I'm not wearing the sparkly leotard, that's why." "I don't know any magic tricks." "You don't need to know magic. You need to wave your hands dramatically and say 'abracadabra' while I load the loot." "And if a kid asks me to pull a rabbit out of a hat?" Molly considered this. "Improvise. Pull out a sandwich. Kids love sandwiches." "Kids hate sandwiches. They want candy and chaos." "Then give them chaos!" She threw her hands up. "Knock over the punch bowl, set off the piñata early, I don't care. Just buy me ninety seconds with the gift table." Bill paced the room. "This feels morally questionable. They're children, Molly. Tiny, innocent children." "With brand-new toys still in the packaging. Do you know what those resell for?" 
"That's not the point—" "It's seventy percent of the point." He sighed deeply. "What's the other thirty?" "The thrill of the heist, obviously." "You said we've never successfully completed a heist." "Which is exactly why I'm thrilled. The suspense is unbearable." She zipped up the swag bag. "Now, what's your code name?" "Do we need code names?" "Professionals always have code names. I'm Shadow Whisper." "Then I want to be Shadow Whisper Two." "You can't be a number, Bill. Pick a real one." He thought for a moment. "Captain Birthday." Molly stared at him. "That's the worst name I've ever heard." "It's thematically appropriate!" "Fine. Captain Birthday it is." She extended her hand. "Partners?" Bill shook it reluctantly. "If we get caught, I'm telling them you forced me." "If we get caught, Bill, we'll be outnumbered by toddlers wielding juice boxes. Nobody's getting out clean." He grabbed his magician's hat and placed it on his head with a defeated dignity. "For the record," he muttered, "I still think the gnome was a fluke."
created_at: "2026-05-29T10:06:54.72Z" (5/29/2026, 10:06:54 AM)
model: "anthropic/claude-4.8-opus-20260528"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1579
moderation_latency: (null)
generation_time: 21187
tokens_prompt: 77
tokens_completion: 826
native_tokens_prompt: 93
native_tokens_completion: 1131
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
service_tier: "default"
usage: 0.02874
router: (null)
provider_responses:
  0:
    endpoint_id: "b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
    id: "msg_01BgEbjkGD76N4UVQakBzWND"
    is_byok: false
    latency: 1579
    model_permaslug: "anthropic/claude-4.8-opus-20260528"
    provider_name: "Anthropic"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer: (null)
request_id: "req-1780049214-hWPmjEe5JoVZWlA466Qb"
session_id: (null)
api_type: "completions"
id: "gen-1780049214-qV7EsEMBvApwDoorwx2E"
upstream_id: "msg_01BgEbjkGD76N4UVQakBzWND"
total_cost: 0.02874
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
response_cache_source_id: (null)
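The page does not say how total_cost is derived from the token counts. As a rough sanity check, the sketch below assumes a simple linear price per token and uses hypothetical rates of $5 per million prompt tokens and $25 per million completion tokens. Those rates are an assumption, not something this page states; they are chosen only because, applied to native_tokens_prompt (93) and native_tokens_completion (1131), they reproduce the reported 0.02874.

```python
# Hypothetical per-million-token rates in USD. These are NOT stated on the
# page; they are assumptions used to check the reported total_cost.
ASSUMED_INPUT_USD_PER_MTOK = 5
ASSUMED_OUTPUT_USD_PER_MTOK = 25


def estimate_cost_usd(prompt_tokens, completion_tokens,
                      input_rate=ASSUMED_INPUT_USD_PER_MTOK,
                      output_rate=ASSUMED_OUTPUT_USD_PER_MTOK):
    """Linear cost model: tokens times rate, summed, converted to dollars."""
    # Stay in integer micro-dollars until the final division.
    micro_usd = prompt_tokens * input_rate + completion_tokens * output_rate
    return micro_usd / 1_000_000


# native_tokens_prompt and native_tokens_completion from the metadata above.
cost = estimate_cost_usd(93, 1131)
print(f"${cost:.6f}")  # prints $0.028740
```

If the provider bills differently (cached tokens, reasoning tokens, tiered pricing), this model would need extra terms; here both of those token counts are reported as 0.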
Evaluation details
Matches word count
Result: 99.92%
Details: n/a
Meta Data: neededClean: false, words: 503
Dialogue to Total Word Ratio
Result: 0.28%
Details: Ratio: 80.83%, Deviation: 10.83%
Meta Data: neededClean: false, wordsTotal: 506, wordsDialogue: 409
Overall: 50.1014%
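The reported ratio and deviation can be re-derived from the metadata counts. The sketch below divides wordsDialogue by wordsTotal and compares the result with the 70% target from the scenario. Treating the overall score as the plain mean of the two evaluator scores is an assumption; it agrees with 50.1014% only to within rounding of the displayed scores. How the evaluators turn a deviation into a score (for example 10.83% into 0.28%) is not stated on this page and is not modelled here.

```python
TARGET_DIALOGUE_RATIO = 0.70  # from the scenario: "70% dialogue"

# Counts reported in the evaluator metadata.
words_dialogue = 409
words_total = 506

ratio = words_dialogue / words_total
deviation = abs(ratio - TARGET_DIALOGUE_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints Ratio: 80.83%, Deviation: 10.83%

# Assumption: overall is the unweighted mean of the evaluator scores.
# The displayed scores are rounded, so this matches 50.1014 only roughly.
evaluator_scores = [99.92, 0.28]
overall = sum(evaluator_scores) / len(evaluator_scores)
print(f"Overall (assumed mean): {overall:.2f}%")
```

Note that the two evaluators report different word totals (503 and 506), so they evidently count words differently; the ratio above uses the 506 figure that belongs to the dialogue evaluator.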