NC Bench
Run: 2026-02-18-01KHQ94Z4WK8S8KWW8T3PHKJ16
Run Details
Model: Claude Sonnet 4.6
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 16.6s
Cost: $0.010860
Provider: Amazon Bedrock
Evaluation summary
Score | Evaluator
60.28% | Matches word count
2.15% | Dialogue to Total Word Ratio
31.2129% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The van smelled like corn chips and ambition. Molly spread the hand-drawn map across the dashboard while Bill ate a sandwich directly over it, dropping crumbs onto what she had labeled the "perimeter zone." "You're getting mustard on the bounce house," she said. "The bounce house isn't a security threat." "Everything is a security threat, Bill. That's the first thing I taught you." Bill looked at the drawing more carefully. "You drew the birthday clown with a sword." "He's holding a balloon animal. It's a balloon sword. Parents request them. Kids love them. I've done my research." "You've done research on the clown." "I've done research on everything. The birthday boy is turning seven. His name is Ethan. Ethan has requested a dinosaur cake, a magician, and approximately forty guests." Bill set down his sandwich. "How do you know all this?" "His mother posts everything on Facebook with location tagging enabled. I cannot stress enough how easy this was." "Okay," Bill said slowly, "walk me through the entry." "Front gate opens at two. We arrive at two-fifteen, blend with the parents. You're wearing the dad outfit." "What's a dad outfit?" "Polo shirt. Slightly too short khaki shorts. Confusion behind the eyes." "I can do confused." "I know you can, Bill. It's your natural state." He ignored this. "And the presents are where?" "Gift table. Northwest corner of the yard, near the fence. Standard setup. Parents drop presents there immediately upon arrival because they're relieved to have somewhere to put them." "Then what?" "Then we mingle for exactly twenty-two minutes. Long enough to become background noise, short enough that nobody learns our fake names." "What's my fake name?" "Dave." "Why am I always Dave?" "Because you look like a Dave, Bill. You have always looked like a Dave. This is not the conversation we should be having right now." Bill picked up the map and studied it again. "What happens when the magician starts?" 
"That," Molly said, pointing at him with genuine pride, "is the beautiful part. Every single adult in that yard will be watching a man pull something out of a hat, because adults at children's parties are more entertained than the children. We load the gifts into the laundry bags, go over the fence, and we are back in this van before anyone notices." "What if someone notices?" "You cry." "I cry?" "You cry and say you're having a hard year. Nobody questions a crying dad at a birthday party, Bill. They just hand you a juice box and look away." Bill sat quietly for a moment, staring out at the suburban street ahead of them, the sound of distant children already drifting over the hedgerows. "I feel a little bad," he said. "About the kids." Molly folded the map. "The kids have each other, cake, and a bounce house. They'll be fine." "And us?" She smiled. "We'll have presents."
created_at: "2026-02-18T02:27:16.887904+00:00" (2/18/2026, 2:27:16 AM)
model: "anthropic/claude-4.6-sonnet-20260217"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 638
moderation_latency: 166
generation_time: 16378
tokens_prompt: 77
tokens_completion: 769
native_tokens_prompt: 75
native_tokens_completion: 709
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.01086
router: (null)
provider_responses:
  0:
    id: "aeffa58e-b43a-4c35-8a67-1552ac7db260"
    status: 200
    is_byok: false
    latency: 637
    endpoint_id: "83d802cb-44f2-45a0-a9d5-cb9bf79777c8"
    provider_name: "Amazon Bedrock"
    model_permaslug: "anthropic/claude-4.6-sonnet-20260217"
api_type: "completions"
id: "gen-1771381619-u2ljU2UiCgbHBepW1vij"
upstream_id: "aeffa58e-b43a-4c35-8a67-1552ac7db260"
total_cost: 0.01086
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
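The reported cost can be sanity-checked against the native token counts. The sketch below is illustrative only: the per-million-token prices are an assumption (they are not stated on this page), and generation_time is assumed to be in milliseconds, which is consistent with the 16.6s duration shown in the run details. The function names are hypothetical.

```python
# Assumed prices in USD per million tokens. These are NOT given on the page;
# they are chosen because they reproduce the reported total_cost.
ASSUMED_INPUT_PRICE_PER_MTOK = 3.00
ASSUMED_OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from native token counts."""
    total_micro_usd = (
        prompt_tokens * ASSUMED_INPUT_PRICE_PER_MTOK
        + completion_tokens * ASSUMED_OUTPUT_PRICE_PER_MTOK
    )
    return total_micro_usd / 1_000_000


def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Output throughput, assuming generation_time is in milliseconds."""
    return completion_tokens / (generation_time_ms / 1000)


# Values taken from the metadata: native_tokens_prompt, native_tokens_completion,
# and generation_time.
cost = estimate_cost(75, 709)
rate = tokens_per_second(709, 16378)

print(f"estimated cost: ${cost:.6f}")
print(f"throughput: {rate:.1f} tokens/s")
```

Under these assumed prices the estimate agrees with the total_cost of 0.01086, which suggests billing was based on the native token counts (75 and 709) rather than the normalized ones (77 and 769).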
Evaluation details
Result | Evaluator | Details | Meta Data
60.28% | Matches word count | n/a | neededClean: false, words: 485
2.15% | Dialogue to Total Word Ratio | Ratio: 78.76%, Deviation: 8.76% | neededClean: false, wordsTotal: 485, wordsDialogue: 382
31.2129% | Overall
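The ratio and deviation figures follow directly from the reported word counts. The sketch below recomputes them and shows one possible way to count dialogue words. It is not the evaluator's actual implementation, which this page does not describe: treating text inside straight double quotes as dialogue, and splitting words on whitespace, are both assumptions, and all function names are hypothetical.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: 70% dialogue


def count_words(text: str) -> int:
    """Count whitespace-separated words (assumed definition of a word)."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Count words inside straight double-quoted spans (assumed definition of dialogue)."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in spans)


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that are dialogue."""
    return words_dialogue / words_total


# Recompute the reported figures from wordsDialogue and wordsTotal.
ratio = dialogue_ratio(382, 485)
deviation = abs(ratio - TARGET_RATIO)
print(f"ratio: {ratio:.2%}, deviation: {deviation:.2%}")

# The overall score looks close to the unweighted mean of the two evaluator
# scores; the small gap from 31.2129% is plausibly display rounding.
overall_estimate = (60.28 + 2.15) / 2
print(f"overall (estimated from rounded scores): {overall_estimate:.3f}%")

# Counting on a short excerpt from the model response.
sample = 'Bill set down his sandwich. "How do you know all this?"'
print(dialogue_words(sample), "of", count_words(sample), "words are dialogue")
```

How the 8.76% deviation maps to the 2.15% score, and how 485 words against a 500-word target maps to 60.28%, is not shown on this page, so the sketch does not attempt to reproduce those scoring curves.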