NC Bench
Run: 2026-02-18-01KHQ92GZGD5ZZD7MYN9TAHH6P
Run Details
Model: Claude Sonnet 4.6
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 16.1s
Cost: $0.011370
Provider: Amazon Bedrock
Evaluation summary
Score      Evaluator
100.00%    Matches word count
31.60%     Dialogue to Total Word Ratio
65.8002%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The van smelled like cheese crackers and desperation. Molly spread the hand-drawn map across the dashboard while Bill ate a sandwich he had definitely not offered to share. "Okay, so the party starts at two," Molly said, tapping the map with a highlighter. "We go in through the back gate during the piñata. Everyone looks at the piñata. Nobody looks at us." Bill chewed thoughtfully. "What kind of piñata?" "What does that matter?" "Donkey piñatas, the kids go maybe forty-five seconds. You get one of those star ones with the seven points, you're looking at two minutes easy. I need to know my window, Molly." "I don't know what kind of piñata, Bill." "Did you do any reconnaissance at all?" "I did plenty of reconnaissance. I know there are going to be seventeen kids, a bouncy castle, and a woman named Deborah who gets very involved in party games. That's your reconnaissance." Bill set down his sandwich. "Who's Deborah?" "Someone's aunt, I think. The point is she's a distraction." "Is she the kind of distraction who wanders or the kind who plants herself?" "Plants herself. Very loud. She does voices." "Okay." Bill nodded slowly. "Okay, that's actually useful. I can work with a loud aunt." Molly uncapped the highlighter and drew a yellow line from the back gate to a circle labeled PRESENTS. "The gift table is here, beside the cake. We load the bags, we're back in the van in under four minutes." "The cake's going to be a problem." "The cake is not going to be a problem." "Molly, you know what happens when I'm around cake." "I'm asking you to walk past a cake, Bill, not make a major life decision." "Last time you asked me to walk past a cake I was in there for eleven minutes." "That was a wedding and you had personal feelings about the couple." "I still have personal feelings about the couple." "The point is, professional focus. We are in, we are out, we do not interact with the cake, the children, the bouncy castle, or Deborah." Bill picked up his sandwich again. 
"What if a kid talks to me?" "Why would a kid talk to you?" "Kids always talk to me. I have an approachable face." Molly looked at his face. It was a large, red, somewhat asymmetrical face with a mustache that suggested poor planning. "You genuinely do not have an approachable face." "My niece says I look like a cartoon bear." "That is not the compliment you think it is." "Cartoon bears are very approachable." "Bill." Molly put both hands flat on the map. "We are stealing birthday presents from a six-year-old's garden party. I need you mentally present and morally flexible. Can you do that?" Bill finished his sandwich. He looked out at the street, at the cluster of balloons visible above the garden fence, bouncing cheerfully in the afternoon breeze. "I want to look at the cake first. Just look." "Absolutely not."
created_at: "2026-02-18T02:25:56.374525+00:00" (2/18/2026, 2:25:56 AM)
model: "anthropic/claude-4.6-sonnet-20260217"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 705
moderation_latency: 172
generation_time: 15908
tokens_prompt: 77
tokens_completion: 763
native_tokens_prompt: 75
native_tokens_completion: 743
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.01137
router: (null)
provider_responses: 0
id: "cffa18a3-3d2e-43b7-8ca0-31dd105b6c3d"
status: 200
is_byok: false
latency: 703
endpoint_id: "83d802cb-44f2-45a0-a9d5-cb9bf79777c8"
provider_name: "Amazon Bedrock"
model_permaslug: "anthropic/claude-4.6-sonnet-20260217"
api_type: "completions"
id: "gen-1771381539-eCNS3MKOiwgjg2xH60f7"
upstream_id: "cffa18a3-3d2e-43b7-8ca0-31dd105b6c3d"
total_cost: 0.01137
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
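The recorded total_cost can be sanity-checked against the native token counts. The sketch below is a rough check, not the billing logic of any provider. It assumes rates of $3 per million prompt tokens and $15 per million completion tokens; these rates do not appear in the run record, and the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# ASSUMPTION: per-token rates in USD. These are not stated in the run record.
ASSUMED_PROMPT_RATE = Decimal("3") / Decimal("1000000")
ASSUMED_COMPLETION_RATE = Decimal("15") / Decimal("1000000")


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the cost in USD from token counts under the assumed rates."""
    return (prompt_tokens * ASSUMED_PROMPT_RATE
            + completion_tokens * ASSUMED_COMPLETION_RATE)


# native_tokens_prompt and native_tokens_completion from the run metadata.
cost = estimate_cost(75, 743)
print(cost)
```

Under these assumed rates the estimate equals the recorded total_cost of 0.01137. Decimal is used rather than float so that the comparison with the recorded value is exact.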
Evaluation details
Result     Evaluator                      Details                           Meta Data
100.00%    Matches word count             n/a                               neededClean: false, words: 500
31.60%     Dialogue to Total Word Ratio   Ratio: 74.80%, Deviation: 4.80%   neededClean: false, wordsTotal: 500, wordsDialogue: 374
65.8002%   Overall
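The ratio and deviation reported for the dialogue evaluator follow from the word counts in its metadata, given the 70% target in the scenario. The sketch below shows that arithmetic, along with one possible way to count dialogue words. Several parts are assumptions: the evaluator's real tokenization is not documented here, so the quote-matching counter is only illustrative; the overall score is assumed to be the unweighted mean of the two evaluator scores; and all function names are hypothetical. The mapping from a 4.80% deviation to a 31.60% score is not stated in the source and is not reproduced.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Count words inside straight double quotes (illustrative assumption)."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(len(span.split()) for span in spans)


# Short sample in the style of the model response.
sample = ('Bill finished his sandwich. "I want to look at the cake first. '
          'Just look." "Absolutely not."')
sample_total = count_words(sample)
sample_dialogue = count_dialogue_words(sample)

# Values recorded in the evaluation metadata: wordsDialogue and wordsTotal.
ratio = 374 / 500
# Absolute distance from the 70% target named in the scenario.
dev = abs(ratio - 0.70)
# ASSUMPTION: overall is the unweighted mean of the two evaluator scores.
overall = (100.00 + 31.60) / 2

print(f"ratio={ratio:.2%} deviation={dev:.2%} overall={overall:.2f}%")
```

The computed ratio and deviation match the reported 74.80% and 4.80%. The mean of the displayed scores is 65.80%, close to the reported 65.8002%; the small difference suggests the displayed 31.60% is a rounded value.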