NC Bench
Run: 2026-02-24-01KJ8V4JN46F9Y1YGNX65N0AKS
Run Details
Model: Claude Sonnet 4.6 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 18.0s
Cost: $0.012816
Provider: Amazon Bedrock
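The cost and duration above can be cross-checked against the token counts recorded in the response metadata (native_tokens_prompt 92, native_tokens_completion 836, generation_time 17808). The sketch below is only a sanity check under stated assumptions: the per-million-token rates of $3 (input) and $15 (output) are not shown on this page, generation_time is assumed to be in milliseconds, and reasoning tokens are assumed to be already included in native_tokens_completion. Under those assumptions the estimate reproduces the reported $0.012816.

```python
# Assumed prices in USD per million tokens; these rates are NOT shown on the page.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost(prompt_tokens, completion_tokens,
                  in_rate=INPUT_USD_PER_MTOK, out_rate=OUTPUT_USD_PER_MTOK):
    """Linear per-token cost model: tokens times rate, summed over input and output."""
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000


# Values copied from the response metadata on this page.
native_tokens_prompt = 92
native_tokens_completion = 836
generation_time_ms = 17808  # unit assumed to be milliseconds

cost = estimate_cost(native_tokens_prompt, native_tokens_completion)
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${cost:.6f}")
print(f"output throughput: {tokens_per_second:.1f} tokens/s")
```

The same function can be reused for other runs by passing different rates, which keeps the pricing assumption explicit rather than buried in the arithmetic.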
Evaluation summary
Score      Evaluator
20.19%     Matches word count
16.28%     Dialogue to Total Word Ratio
18.2372%   Overall
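The overall figure looks like the unweighted mean of the two evaluator scores. The check below assumes equal weighting, which the page does not state, and it uses the two-decimal scores as displayed, so it can only approximate the reported 18.2372% (which was presumably computed from unrounded values).

```python
# Evaluator scores as displayed on the page (already rounded to two decimals).
scores = {
    "Matches word count": 20.19,
    "Dialogue to Total Word Ratio": 16.28,
}


def overall_score(evaluator_scores):
    """Unweighted mean of the evaluator scores (equal weighting is an assumption)."""
    return sum(evaluator_scores.values()) / len(evaluator_scores)


overall = overall_score(scores)
reported = 18.2372  # overall value shown on the page

print(f"mean of displayed scores: {overall:.4f}%")
print(f"difference from reported: {abs(overall - reported):.4f} points")
```

The small residual is what one would expect from rounding each evaluator score before display.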
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crayon drawing across the hood of their rusted van and smoothed out the edges with her palm. The drawing depicted a house, a rainbow, and what appeared to be a dog wearing a hat.
"Where did you get this?" Bill asked.
"Off the fridge. It's a map."
"It's a drawing of a unicorn, Molly."
"That's the side entrance." She tapped the rainbow firmly. "And this is where they're keeping the presents."
Bill leaned closer and squinted. "That's the sun."
"It has a bow on it, Bill. The sun doesn't have a bow on it."
Bill opened his mouth, then closed it again. He looked at the drawing once more and sighed in the heavy way of a man who had made several poor life decisions. "Okay," he said. "Walk me through it."
Molly cracked her knuckles. "The party starts at two. Bouncy castle goes up at one-thirty, which means every single kid in that yard will be distracted and screaming."
"I love the screaming part. That's our cover."
"That is our cover, yes. You go in through the back gate dressed as a clown."
Bill straightened up. "I'm sorry, what?"
"Clown costume. Full wig. Big shoes. Nobody questions a clown at a birthday party."
"Nobody questions a clown because they're too busy being terrified of a clown."
"That's a you problem, Bill. Emotionally."
"I'm not wearing the shoes."
"You're wearing the shoes. The shoes are load-bearing to the whole operation."
Bill stared at the rainbow map again as if hoping it might offer him an alternative. It did not. "And what are you doing while I'm out here looking like a nightmare?" he asked.
"I'm posing as the gift table attendant."
"That's not a real job."
"Parents will believe anything at a birthday party. They're exhausted and slightly drunk on mimosas. I walk in with a clipboard, I smile, I start moving presents to my cart, and nobody says a word."
"What kind of presents are we even talking about?"
Molly pulled out a folded piece of paper from her jacket pocket. "According to the invite list I may have intercepted, we are looking at three remote-control cars, a junior chemistry set, something called a Mega Blast Laser Tag Arena, and approximately seven variations of the same princess doll."
Bill whistled low. "That's a solid haul."
"It's a beautiful haul, Bill."
"We're still terrible people."
"We are absolutely terrible people." Molly rolled up the crayon map and tucked it under her arm. "The clown suit is in the back of the van. There's a red nose on the passenger seat."
Bill trudged around to the back of the van and yanked open the doors. He stared at the enormous rainbow wig for a long, mournful moment. "Nobody," he called out, "is going to believe you're a gift table attendant."
Molly was already walking away, clipboard in hand.
created_at: "2026-02-24T22:08:26.797Z"
model: "anthropic/claude-4.6-sonnet-20260217"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 743
moderation_latency: 164
generation_time: 17808
tokens_prompt: 77
tokens_completion: 842
native_tokens_prompt: 92
native_tokens_completion: 836
native_tokens_completion_images: (null)
native_tokens_reasoning: 109
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.012816
router: (null)
provider_responses:
  0:
    endpoint_id: "83d802cb-44f2-45a0-a9d5-cb9bf79777c8"
    id: "4f22c3fc-be16-4f30-9e89-8b46d5a25653"
    is_byok: false
    latency: 739
    model_permaslug: "anthropic/claude-4.6-sonnet-20260217"
    provider_name: "Amazon Bedrock"
    status: 200
api_type: "completions"
id: "gen-1771970906-XcBnqanFPfYtM0qYPW5w"
upstream_id: "4f22c3fc-be16-4f30-9e89-8b46d5a25653"
total_cost: 0.012816
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
Evaluation details
Result     Evaluator                      Details                            Meta Data
20.19%     Matches word count             n/a                                neededClean: false, words: 480
16.28%     Dialogue to Total Word Ratio   Ratio: 63.98%, Deviation: 6.02%    neededClean: false, wordsTotal: 483, wordsDialogue: 309
18.2372%   Overall
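The Ratio and Deviation values follow directly from the word counts reported for the dialogue evaluator (wordsDialogue 309, wordsTotal 483) and the 70% target in the scenario. The sketch below reproduces that arithmetic and also shows one plausible way to count dialogue words, by treating text inside straight double quotes as dialogue. The helper names and the quote-based counting rule are hypothetical; the benchmark's actual tokenization is not shown on this page, and the mapping from deviation to the 16.28% score is not reproduced here because the page does not define it.

```python
import re

TARGET_PERCENT = 70.0  # dialogue share requested in the scenario


def dialogue_ratio(words_dialogue, words_total):
    """Percentage of all words that are dialogue words."""
    return 100.0 * words_dialogue / words_total


def count_words(text):
    """Whitespace-delimited word count (a simplifying assumption)."""
    return len(text.split())


def count_dialogue_words(text):
    """Count words inside straight double quotes (hypothetical counting rule)."""
    quoted_spans = re.findall(r'"([^"]*)"', text)
    return sum(len(span.split()) for span in quoted_spans)


# Counts copied from the evaluator metadata on this page.
ratio = dialogue_ratio(309, 483)
deviation = abs(TARGET_PERCENT - ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# Applying the hypothetical counters to one sentence from the response.
sample = 'Bill straightened up. "I\'m sorry, what?"'
print(count_dialogue_words(sample), "of", count_words(sample), "words are dialogue")
```

Counting by quotation marks is a design shortcut: it would miscount text with curly quotes, nested quotes, or unbalanced quotation marks, so a production evaluator would likely need a cleaning step first.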