NC Bench
Run: 2026-05-31-01KSYX8MM30THEZFSW63JMRYBD
Run Details
Model: Claude Opus 4.8 (Reasoning, High)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 20.3s
Cost: $0.029165
Provider: Anthropic
Evaluation summary
Score | Evaluator
90.48% | Matches word count
4.99% | Dialogue to Total Word Ratio
47.7366% | Overall
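The overall score appears to be the plain average of the two evaluator scores. The sketch below checks that reading against the figures in the summary. The equal weighting is an assumption inferred from the numbers, not something the report states, and the function name `overall_score` is hypothetical.

```python
def overall_score(scores: list[float]) -> float:
    """Equal-weight mean of evaluator scores (assumed aggregation rule)."""
    return sum(scores) / len(scores)


# Rounded evaluator scores as displayed in the summary.
displayed = [90.48, 4.99]
estimate = overall_score(displayed)

# The report shows 47.7366%; the small gap is consistent with the
# evaluators holding unrounded values internally.
reported = 47.7366
difference = abs(estimate - reported)
print(f"estimate={estimate:.4f} reported={reported} diff={difference:.4f}")
```

If the benchmark weights evaluators differently for other tests, this check would not carry over unchanged.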
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the blueprints across the kitchen table, smoothing out the wrinkles with the reverence of a general planning an invasion. The paper crinkled, revealing a hand-drawn map of suburban hell: a backyard, a bouncy castle, and an X marking the gift table. Bill leaned over her shoulder, chewing a toothpick that had long since lost its flavor. "You're telling me," he said slowly, "that our big score is a pile of presents wrapped by people who can't even tie their own shoes?" "Don't underestimate the Hendersons," Molly said, jabbing a finger at the map. "Their kid turns seven this Saturday. Rich family. We're talking premium loot. Tablets. LEGO sets. Maybe even a Nintendo." Bill straightened up, unconvinced. He scanned the perimeter she'd sketched, noting the position of what appeared to be a clown labeled "THREAT LEVEL: UNKNOWN." "And the guards?" "Parents." Molly pulled out a second sheet, this one a list of names. "Twelve adults, minimum. All distracted, all exhausted, all hopped up on sheet cake and bad decisions. The moment that piñata breaks, chaos erupts. That's our window." He picked up the list, frowning. "There's a guy here just labeled 'Uncle Greg.' Why's his name underlined three times?" "Uncle Greg," Molly said gravely, "drinks. By two o'clock he'll be asleep in a lawn chair. By two-fifteen, he won't notice a freight train rolling through the yard." Bill nodded, slowly coming around to the genius of it all. He paced the small kitchen, hands clasped behind his back like a man wrestling with destiny. "What about disguises?" Molly reached under the table and produced two enormous, fluffy costumes. She tossed one at him. It hit his chest with a soft, humiliating *poof*. He held it up. A dinosaur. Bright purple, with a sewn-on smile and a stubby little tail. "Absolutely not," he said. "Everybody loves a party mascot, Bill. Nobody questions the dinosaur. The dinosaur is invisible. The dinosaur is *trusted*."
Bill stared into the costume's plastic, dead-eyed face. It seemed to be mocking him, this purple symbol of his fallen ambitions. He had once robbed an armored car. He had once outrun three squad cars on a motorcycle. And now, this. "And how exactly," he asked, "do we carry the presents out? In our little dinosaur arms?" Molly grinned, the kind of grin that meant she'd been waiting for this question all morning. She gestured dramatically toward the corner, where a battered ice cream cart sat covered in a tarp. "We roll in selling treats. We roll out with the goods stashed underneath. The kids get popsicles. We get the spoils. Everybody wins." Bill considered this. He looked at the cart. He looked at the dinosaur costume crumpled in his hands. He looked at Molly, who was practically vibrating with criminal enthusiasm. He sighed, a long sigh that carried the weight of every poor life choice that had led him to this exact moment. "Fine," he muttered, stepping one leg into the costume. "But I'm not doing the dinosaur voice." "You'll do the voice." "I'll do the voice."
created_at
"2026-05-31T11:38:47.319Z"
model
"anthropic/claude-4.8-opus-20260528"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1027
moderation_latency
(null)
generation_time
20290
tokens_prompt
77
tokens_completion
799
native_tokens_prompt
93
native_tokens_completion
1148
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"end_turn"
service_tier
"default"
usage
0.029165
router
(null)
provider_responses
0
endpoint_id
"b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
id
"msg_01WaFWxwLLxKHy87zVoFKNDL"
is_byok
false
latency
1026
model_permaslug
"anthropic/claude-4.8-opus-20260528"
provider_name
"Anthropic"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer
(null)
request_id
"req-1780227527-Y7elYpqSNqlwI3w5zSS3"
session_id
(null)
preset_id
(null)
api_type
"completions"
id
"gen-1780227527-Q701dL0wCvWivfqhvHFE"
upstream_id
"msg_01WaFWxwLLxKHy87zVoFKNDL"
total_cost
0.029165
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Anthropic"
response_cache_source_id
(null)
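The cost and timing fields above can be cross-checked with simple arithmetic. The sketch below reproduces `total_cost` from the native token counts and derives output throughput from `generation_time`. The per-token rates are an assumption: the report does not state them, and 5 and 25 USD per million tokens are simply values that happen to reproduce the recorded cost exactly. The helper name `estimate_cost` is hypothetical.

```python
# Values copied from the metadata above.
native_tokens_prompt = 93
native_tokens_completion = 1148
generation_time_ms = 20290
reported_total_cost = 0.029165

# ASSUMPTION: rates are not given in the report; these are inferred
# because they reproduce the recorded cost.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  in_rate: float, out_rate: float) -> float:
    """Cost in USD given token counts and per-million-token rates."""
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000


cost = estimate_cost(native_tokens_prompt, native_tokens_completion,
                     INPUT_USD_PER_MTOK, OUTPUT_USD_PER_MTOK)

# Output throughput in native tokens per second of generation time.
throughput = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${cost:.6f} (reported ${reported_total_cost})")
print(f"throughput: {throughput:.1f} tokens/s")
```

The match uses the native token counts (93 and 1148), not the normalized `tokens_prompt` and `tokens_completion` values, which suggests billing follows the provider's own tokenizer.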
Evaluation details
Result | Evaluator | Details | Meta Data
90.48% | Matches word count | n/a | neededClean: false, words: 510
4.99% | Dialogue to Total Word Ratio | Ratio: 37.74%, Deviation: 7.74% | neededClean: false, wordsTotal: 514, wordsDialogue: 194
47.7366% (overall)
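The ratio and deviation reported for the dialogue evaluator follow directly from the two word counts: 194 dialogue words out of 514 total, measured against the 30% target in the scenario. The sketch below reproduces those figures and shows one possible way to count dialogue words in a text. The quote-matching rule and whitespace tokenization are assumptions; the report does not show how the evaluator tokenizes or detects dialogue, and the function names are hypothetical. How the 7.74% deviation maps to the 4.99% score is not stated, so it is not modeled here.

```python
import re


def ratio_from_counts(words_dialogue: int, words_total: int,
                      target: float = 0.30) -> tuple[float, float]:
    """Return (ratio %, absolute deviation from target %), rounded to 2 places."""
    ratio = words_dialogue / words_total
    return round(ratio * 100, 2), round(abs(ratio - target) * 100, 2)


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Count words inside straight double quotes and words overall.

    ASSUMPTION: dialogue is any span between a pair of straight double
    quotes, and words are whitespace-separated tokens.
    """
    words_total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(len(span.split()) for span in quoted_spans)
    return words_dialogue, words_total


# Figures from the evaluation details.
ratio_pct, deviation_pct = ratio_from_counts(194, 514)
print(ratio_pct, deviation_pct)

# Small illustrative sample, not taken from the evaluator.
sample = 'Bill sighed. "Fine," he said. "I will do the voice."'
sample_counts = count_dialogue_words(sample)
print(sample_counts)
```

Note that the two evaluators report different totals for the same response (510 words versus 514), so they presumably split words differently; a whitespace split is only one plausible choice.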