NC Bench
Run: 2026-05-29-01KSSJH7SSVR6DDVJZ28TJS31M
Run Details
Model: Claude Opus 4.8 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 22.3s
Cost: $0.029890
Provider: Anthropic
Evaluation summary
Score      Evaluator
98.71%     Matches word count
0.00%      Dialogue to Total Word Ratio
49.3562%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the blueprint across the table, smoothing out the creases with the careful precision of a surgeon. Except the blueprint was a child's birthday invitation, and the precision was undercut by the glitter now stuck to her palms.
"Right," she said. "The Hendersons' backyard. Saturday. Fourteen children, two clowns, and an estimated forty-seven presents."
Bill leaned over, squinting. "Forty-seven? How'd you arrive at that number?"
"Math, Bill. Each kid brings one gift. Plus the parents overcompensate. Plus the grandparents always go overboard."
"That's not math, that's guessing with confidence."
Molly glared at him. "It's called criminal intuition, and it's never failed me."
"It failed us at the petting zoo."
"The llama incident was an act of God." She tapped the invitation. "Focus. The party starts at noon. The presents get stacked by the gazebo. We move during the piñata."
Bill frowned, scratching his stubble. "Why the piñata?"
"Because," Molly said slowly, as if explaining to a toddler, "when a piñata bursts, chaos erupts. Children scatter. Candy rains down. For approximately ninety seconds, every adult in that yard is watching a sugar-fueled stampede. Nobody's watching the gifts."
"Huh." Bill nodded, genuinely impressed. "That's actually brilliant."
"I know."
"So what's our disguise?"
Molly reached into a duffel bag and pulled out two enormous, fuzzy costumes.
Bill recoiled. "Absolutely not."
"Bill—"
"I am not dressing as a giant rabbit."
"You're dressing as a giant rabbit," Molly confirmed, tossing him the suit. "I'm the giant duck. Children's entertainers. Nobody questions a mascot. We walk in, we blend, we wait for the piñata, and we waddle out with a sack of loot."
Bill held up the rabbit head, staring into its dead, plastic eyes. "This thing smells like regret and old cheese."
"That's the smell of victory."
"That's the smell of someone else's sweat."
Molly snatched the head back and stuffed it into his arms. "Stop complaining. This is the cleanest job we've ever had. No alarms. No security guards. Just frosting and weak supervision."
Bill pulled the costume on, his voice now muffled and slightly tragic. "And if a kid recognizes we're not the real mascots?"
"They're six, Bill. Their critical thinking peaks at deciding which juice box is best."
"Fair."
She zipped herself into the duck suit, flexing her webbed gloves. "Once we've got the presents, we load them into the van, drive to the warehouse, and sort through our winnings. Tablets, video games, maybe some cash in birthday cards."
"And the toys?"
Molly paused. "What about the toys?"
"We just... take toys? From children?" Bill's beady rabbit eyes seemed almost sad.
"We're thieves, Bill. We don't have a moral high ground. We barely have a moral low ground."
He sighed, the rabbit ears drooping. "I just feel like there's a line."
"The line," Molly said, adjusting his crooked bow tie, "is wherever the presents are. Now come on. We've got a piñata to exploit and a childhood to ruin."
Bill nodded solemnly. "For the record, I want it known I objected."
"Noted. Now hop to it, bunny."
created_at: "2026-05-29T09:55:03.724Z" (5/29/2026, 9:55:03 AM)
model: "anthropic/claude-4.8-opus-20260528"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1199
moderation_latency: (null)
generation_time: 21071
tokens_prompt: 77
tokens_completion: 837
native_tokens_prompt: 93
native_tokens_completion: 1177
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
service_tier: "default"
usage: 0.02989
router: (null)
provider_responses:
  0:
    endpoint_id: "b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
    id: "msg_0174oKnjcQ4GXMrtN83jYvdf"
    is_byok: false
    latency: 1199
    model_permaslug: "anthropic/claude-4.8-opus-20260528"
    provider_name: "Anthropic"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer: (null)
request_id: "req-1780048503-EZ21eKZ1ed07t60tH456"
session_id: (null)
api_type: "completions"
id: "gen-1780048503-w6gzvNsl770DSSS6brjd"
upstream_id: "msg_0174oKnjcQ4GXMrtN83jYvdf"
total_cost: 0.02989
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
response_cache_source_id: (null)
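The timing fields in the metadata above can be cross-checked against the reported Duration of 22.3s. The sketch below assumes that latency and generation_time are both in milliseconds and that the Duration is their sum; the page does not state either point, so treat this as a consistency check rather than a description of how NC Bench computes the figure. The throughput number is derived here and is not reported by the page.

```python
# Values copied from the run metadata; units are assumed to be milliseconds.
latency_ms = 1199                 # "latency"
generation_time_ms = 21071        # "generation_time"
native_tokens_completion = 1177   # "native_tokens_completion"

# Assumption: the reported Duration is latency plus generation time.
total_seconds = (latency_ms + generation_time_ms) / 1000

# Derived figure (not in the source): completion tokens per second of generation.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"latency + generation time: {total_seconds:.1f}s")
print(f"throughput: {tokens_per_second:.1f} native completion tokens/s")
```

Under these assumptions the sum rounds to 22.3s, which matches the Duration shown in the run details.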
Evaluation details
Result     Evaluator                      Details                            Meta Data
98.71%     Matches word count             n/a                                neededClean: false; words: 506
0.00%      Dialogue to Total Word Ratio   Ratio: 67.98%, Deviation: 17.98%   neededClean: false; wordsTotal: 509; wordsDialogue: 346
49.3562%   Overall
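The ratio and deviation reported above follow directly from the wordsDialogue and wordsTotal counts. The sketch below reproduces that arithmetic and also shows one hypothetical way to count dialogue words, by treating anything inside straight double quotes as dialogue. The function dialogue_ratio and its tokenization are assumptions for illustration, not the evaluator's actual implementation; the two evaluators already disagree on the total (506 vs. 509 words), so a counter like this one may not reproduce either count. The mapping from deviation to the 0.00% score is not shown on the page and is not modeled here.

```python
import re

def dialogue_ratio(text: str) -> float:
    """Share of whitespace-separated words inside straight double quotes.

    Hypothetical counter; the benchmark's real tokenization is unknown.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue / total if total else 0.0

# Tiny illustration: 4 quoted words out of 6 words in total.
sample = 'Molly nodded. "Hop to it, bunny."'
sample_ratio = dialogue_ratio(sample)

# Reproduce the reported figures from the evaluator's own counts.
words_total = 509      # "wordsTotal"
words_dialogue = 346   # "wordsDialogue"
target_ratio = 0.50    # the scenario asks for 50% dialogue

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 67.98%, Deviation: 17.98%"

# Assumption: the overall score is the plain mean of the two evaluator scores.
overall = (98.71 + 0.00) / 2
```

The mean of the two displayed scores is about 49.36, which agrees with the reported overall of 49.3562% up to rounding of the displayed 98.71%.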