NC Bench
Run: 2026-02-24-01KJ8X0ESYREE7CF84CM1WTC95
Run Details
Model: Claude Sonnet 4.6 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 38.1s
Cost: $0.048786
Provider: Amazon Bedrock
Evaluation summary
Score      Evaluator
81.27%     Matches word count
10.13%     Dialogue to Total Word Ratio
45.6993%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crayon drawing across the hood of their beat-up van and tapped it with her finger. "This is our target," she said. Bill leaned in, squinting. "Is that a rainbow?" "That's the driveway." "Why is it a rainbow?" "Because I borrowed a crayon from a kid, Bill. Focus. Now, the party starts at two. We go in at three, when everyone's hopped up on cake and nobody's watching the present table." "What's the present table?" "It's a table. With presents on it." Bill nodded slowly. "Right. And we're stealing those." "That is the entire plan I just described, yes." "From a seven-year-old's birthday party." "From twelve seven-year-olds' birthday parties, actually. It's a joint party. Twelve kids, one bouncy castle, one absolutely enormous pile of gifts." Molly tapped the drawing again. "This right here is gold, Bill." "That looks like a sun." "That's the pile of presents." Bill scratched his chin. "I feel like we should talk about whether this is a good idea." "Would a bad idea have aerial reconnaissance?" "You drew that from memory in a parking lot." "Strategic memory," Molly said. "Now, the parents will be distracted by the entertainment." "What's the entertainment?" "A magician named Bongo." "That's not a real name." "Bongo doesn't care what you think his name is, Bill. The point is, Bongo does a forty-five minute show. That's our window. We load the gifts into the van, we're gone before anyone notices." "What about the kids?" "What about them?" "They'll notice. Children are very observant. My nephew notices everything. Last week he told me my shoes were ugly." "Were they?" Bill paused. "Devastatingly. But that's beside the point." "We'll be in and out in eight minutes. Kids that age have the attention span of a goldfish with a concussion. They won't see a thing." "And if they do?" "We tell them we're elves." Bill stared at her. "We're both forty-three." "Seasoned elves." "Molly." "Large elves." "I'm not doing the elf thing." "Fine." 
Molly rolled up the drawing and tucked it under her arm. "We tell them we're from the present quality control department and we need to do a safety inspection." "That's somehow worse." "Do you have a better idea?" Bill thought about it for a long moment. A pigeon landed on the van's windshield and stared at them both with profound judgment before flying away. "No," Bill admitted. "Wonderful. Disguises are in the back. I got us matching polo shirts." "What color?" "Yellow." "Why yellow?" "Because it was on sale, Bill, what kind of question is that." Molly walked around to the driver's side and climbed in. "Now get in. We've got a birthday party to ruin." Bill looked at the van. He looked at the sky. He got in. "For the record," he said, "I think this is going to go badly." "Everything we do goes badly," Molly said cheerfully, starting the engine. "That's what makes us consistent."
created_at: "2026-02-24T22:41:08.94Z" (2/24/2026, 10:41:08 PM)
model: "anthropic/claude-4.6-sonnet-20260217"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 658
moderation_latency: 249
generation_time: 37686
tokens_prompt: 77
tokens_completion: 1552
native_tokens_prompt: 92
native_tokens_completion: 3234
native_tokens_completion_images: (null)
native_tokens_reasoning: 753
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.048786
router: (null)
provider_responses
0
endpoint_id: "83d802cb-44f2-45a0-a9d5-cb9bf79777c8"
id: "0369fa19-bc4c-4f4d-afeb-e3d403113db2"
is_byok: false
latency: 658
model_permaslug: "anthropic/claude-4.6-sonnet-20260217"
provider_name: "Amazon Bedrock"
status: 200
api_type: "completions"
id: "gen-1771972868-YMGTSntVl6pTztmzE2GW"
upstream_id: "0369fa19-bc4c-4f4d-afeb-e3d403113db2"
total_cost: 0.048786
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
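The token and cost fields above appear to be mutually consistent, and a short sketch can check that. The per-token rates used here are an assumption: USD 3 per million prompt tokens and USD 15 per million completion tokens do not appear anywhere on this page, and were chosen only because they reproduce the reported total_cost from the native token counts. Treating generation_time as milliseconds is also an assumption, based on the 38.1s duration in the run details.

```python
# Values copied from the meta fields on this page.
NATIVE_TOKENS_PROMPT = 92
NATIVE_TOKENS_COMPLETION = 3234
GENERATION_TIME_MS = 37686  # assumed to be milliseconds
REPORTED_TOTAL_COST_USD = 0.048786

# ASSUMED rates in USD per million tokens; not stated on the page.
ASSUMED_PROMPT_RATE = 3
ASSUMED_COMPLETION_RATE = 15

# tokens * (USD per million tokens) yields micro-USD, kept as an integer
# so the comparison is not affected by floating-point rounding.
cost_micro_usd = (
    NATIVE_TOKENS_PROMPT * ASSUMED_PROMPT_RATE
    + NATIVE_TOKENS_COMPLETION * ASSUMED_COMPLETION_RATE
)
estimated_cost_usd = cost_micro_usd / 1_000_000

# Rough completion throughput over the generation window.
tokens_per_second = NATIVE_TOKENS_COMPLETION / (GENERATION_TIME_MS / 1000)

print(f"estimated cost: ${estimated_cost_usd:.6f}")
print(f"reported cost:  ${REPORTED_TOTAL_COST_USD:.6f}")
print(f"throughput:     {tokens_per_second:.1f} native completion tokens/s")
```

Under these assumed rates the estimate matches the reported total_cost to six decimal places, which suggests billing is based on the native token counts rather than tokens_prompt and tokens_completion.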
Evaluation details
Matches word count
Result: 81.27%
Details: n/a
Meta Data: neededClean: false, words: 488

Dialogue to Total Word Ratio
Result: 10.13%
Details: Ratio: 76.77%, Deviation: 6.77%
Meta Data: neededClean: false, wordsTotal: 495, wordsDialogue: 380

Overall: 45.6993%
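The figures in the evaluation details can be cross-checked with a few lines of arithmetic. The sketch below recomputes the dialogue ratio from wordsDialogue and wordsTotal, then the deviation from the 70% target given in the prompt. It also computes the overall score as the unweighted mean of the two evaluator scores; that averaging rule is an assumption inferred from the numbers, since the page does not say how Overall is derived. How each evaluator maps its measurement to a score (for example, a 6.77% deviation to 10.13%) is not documented here and is not modeled.

```python
# Values copied from the evaluation details on this page.
WORDS_DIALOGUE = 380
WORDS_TOTAL = 495
TARGET_RATIO_PCT = 70.0  # from the prompt: "70% words of dialogue"

# Share of words that fall inside dialogue, as a percentage.
ratio_pct = 100.0 * WORDS_DIALOGUE / WORDS_TOTAL

# Absolute distance from the target, in percentage points.
deviation_pct = abs(ratio_pct - TARGET_RATIO_PCT)

# ASSUMPTION: Overall is the unweighted mean of the evaluator scores.
# The inputs are the rounded scores shown on the page, so a small gap
# from the reported 45.6993% is expected.
scores_pct = [81.27, 10.13]
overall_pct = sum(scores_pct) / len(scores_pct)

print(f"Ratio: {ratio_pct:.2f}%, Deviation: {deviation_pct:.2f}%")
print(f"Overall (mean of rounded scores): {overall_pct:.2f}%")
```

The first line reproduces the reported "Ratio: 76.77%, Deviation: 6.77%". The mean of the rounded scores comes to 45.70%, within rounding of the reported 45.6993%.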