NC Bench
Run: 2026-05-29-01KSSJW1EVXEDWANKTR7Z2QPHN
Run Details
Model: Claude Opus 4.8 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 21.3s
Cost: $0.027815
Provider: Anthropic
Evaluation summary
Score | Evaluator
99.92% | Matches word count
0.00% | Dialogue to Total Word Ratio
49.9595% | Overall
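The Overall figure appears to be the unweighted mean of the two evaluator scores. The aggregation rule is not stated on this page, so the sketch below is an assumption that happens to agree with the reported numbers; `scores` and `overall_score` are hypothetical names.

```python
from statistics import mean

# Evaluator scores as displayed in the evaluation summary (percent).
scores = {
    "Matches word count": 99.92,
    "Dialogue to Total Word Ratio": 0.00,
}

def overall_score(evaluator_scores: dict[str, float]) -> float:
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    return mean(evaluator_scores.values())

print(f"{overall_score(scores):.2f}%")
```

The displayed 99.92% is rounded. If the mean assumption holds, the reported 49.9595% implies a word-count score of about 99.919% before rounding.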
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the blueprint across the table, smoothing out the wrinkles with the reverence of a surgeon prepping for transplant. Bill leaned over her shoulder, squinting. "That's a drawing of a house," he said. "With a chimney and everything." "It's a tactical overview." "It's got a smiley sun in the corner." "I had to use Madison's crayons. She wouldn't let me near the good markers." Molly tapped the center of the page. "Focus, Bill. The party's at fourteen hundred hours. The presents will be stacked here, by the bouncy castle." Bill scratched his stubble. "Walk me through it again. Slowly. Use small words." "We arrive disguised as entertainers. You're the clown. I'm the magician." "Why do you get to be the magician?" "Because I can do magic." "You did one card trick at Christmas and Aunt Carol cried." "She cried because it was beautiful." Molly straightened up. "While the kids are distracted by your performance, I slip behind the bouncy castle and load the gifts into the van." Bill folded his arms. "And what exactly is my performance?" "Balloon animals. Juggling. The usual clown business." "I can't juggle, Molly. I can barely hold two things at once. I dropped my phone in the toilet last Tuesday while holding nothing else." "Then make balloons." "I made a balloon dog yesterday and it looked like a deflated handshake. A child would weep." Molly pinched the bridge of her nose. "Bill. These are six-year-olds. Their standards are catastrophically low. One of them ate a fistful of sand at the last party and called it 'crunchy.'" "That's fair," Bill admitted. "Sand is underrated." "The point is, nobody's grading your technique. You just have to keep them occupied for nine minutes." "Nine whole minutes? With balloons?" "Or honk a horn. Fall over. Children are essentially drunk adults. They love when you fall over." Bill considered this. "I am good at falling over." "It's your one true talent." "And after I'm done falling over?" 
"You exit through the back, we drive away, and we're swimming in unwrapped wonders by sundown. Robots, dolls, possibly a scooter." "What if they're all just socks?" Bill said darkly. "What if we risk everything for a sack of educational gifts and one of those wooden puzzles for developing fine motor skills?" Molly paused. This had genuinely not occurred to her. "That's a risk we take," she said finally. "Every heist has uncertainty." "You stole a vending machine last month and it was full of expired pretzels." "That was a learning experience." "You cried in the car." "They were emotional pretzels, Bill." Molly rolled up the blueprint with crisp finality. "Are you in or not?" Bill stared at the smiley sun in the corner for a long moment. Then he sighed. "Fine. But I'm not wearing the wig. It smells like a stranger's regret." "You'll wear the wig." "I'll wear the wig," he muttered. "But if there's a piñata, it's mine. I have unfinished business with piñatas." Molly smiled. "Now you're thinking like a professional."
created_at: "2026-05-29T10:00:56.549Z" (5/29/2026, 10:00:56 AM)
model: "anthropic/claude-4.8-opus-20260528"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1276
moderation_latency: (null)
generation_time: 21262
tokens_prompt: 77
tokens_completion: 811
native_tokens_prompt: 93
native_tokens_completion: 1094
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
service_tier: "default"
usage: 0.027815
router: (null)
provider_responses:
  0:
    endpoint_id: "b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
    id: "msg_013uyjimwUucZhCSq98iWBrf"
    is_byok: false
    latency: 1276
    model_permaslug: "anthropic/claude-4.8-opus-20260528"
    provider_name: "Anthropic"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer: (null)
request_id: "req-1780048856-69uNyPFavekbGsp2yyyZ"
session_id: (null)
api_type: "completions"
id: "gen-1780048856-BWiCxTJbXdxIjhxCdoR8"
upstream_id: "msg_013uyjimwUucZhCSq98iWBrf"
total_cost: 0.027815
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
response_cache_source_id: (null)
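The reported cost can be related to the native token counts with a short calculation. The per-token rates are not stated on this page; the sketch below assumes hypothetical rates of $5 per million prompt tokens and $25 per million completion tokens, chosen only because they reproduce the reported total_cost. The function and constant names are illustrative.

```python
# Assumed rates in USD per million tokens (not given in the run metadata).
PROMPT_RATE_PER_MTOK = 5.00
COMPLETION_RATE_PER_MTOK = 25.00

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from native token counts."""
    micro_dollars = (
        prompt_tokens * PROMPT_RATE_PER_MTOK
        + completion_tokens * COMPLETION_RATE_PER_MTOK
    )
    return micro_dollars / 1_000_000

# native_tokens_prompt and native_tokens_completion from the metadata above.
cost = estimate_cost(93, 1094)
print(f"${cost:.6f}")  # $0.027815
```

Under these assumed rates, the completion tokens account for nearly all of the cost (1094 × 25 = 27350 of the 27815 micro-dollars).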
Evaluation details
Result | Evaluator | Details | Meta Data
99.92% | Matches word count | n/a | neededClean: false; words: 503
0.00% | Dialogue to Total Word Ratio | Ratio: 80.00%, Deviation: 30.00% | neededClean: false; wordsTotal: 505; wordsDialogue: 404
49.9595% | Overall
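The Ratio and Deviation figures follow from the reported word counts: 404 dialogue words out of 505 total is 80%, which is 30 percentage points from the 50% target. The sketch below illustrates that arithmetic along with one simple way to count quoted words. The evaluator's actual tokenisation and its mapping from deviation to score are not documented here, so the regex-based counter and the function names are assumptions and may not reproduce the evaluator's counts.

```python
import re

def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumes dialogue is delimited by straight double quotes and that
    words are whitespace-separated; the real evaluator may differ.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio

def deviation(ratio: float, target: float = 0.5) -> float:
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)

# Arithmetic from the reported metadata: wordsDialogue / wordsTotal.
reported_ratio = 404 / 505
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation(reported_ratio):.2%}")
# Ratio: 80.00%, Deviation: 30.00%

# Small illustration of the assumed counting rule.
sample = 'Bill sighed. "Fine. I will wear the wig," he muttered.'
print(dialogue_ratio(sample))  # (6, 10, 0.6)
```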