NC Bench
Run: 2026-02-18-01KHQ925Y75SHN9FM6BGWGQRBD
Run Details
Model: Claude Sonnet 4.6
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 15.1s
Cost: $0.011355
Provider: Amazon Bedrock
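The cost above is consistent with simple per-token billing on the native token counts recorded in the response metadata (native_tokens_prompt 75, native_tokens_completion 742). The sketch below is a back-of-the-envelope check, not the provider's billing logic: the rates of $3 per million input tokens and $15 per million output tokens are an assumption that does not appear on this page, and treating generation_time (14779) as milliseconds is likewise an assumption.

```python
# Assumed prices (not stated on this page), in USD per million tokens.
ASSUMED_INPUT_PRICE_PER_MTOK = 3.00
ASSUMED_OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from token counts and the assumed rates."""
    micro_dollars = (
        prompt_tokens * ASSUMED_INPUT_PRICE_PER_MTOK
        + completion_tokens * ASSUMED_OUTPUT_PRICE_PER_MTOK
    )
    return micro_dollars / 1_000_000


# Values copied from the response metadata of this run.
native_tokens_prompt = 75
native_tokens_completion = 742
generation_time_ms = 14779  # assumed to be milliseconds

cost = estimate_cost(native_tokens_prompt, native_tokens_completion)
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${cost:.6f}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions the estimate matches the reported $0.011355, which suggests the charge was computed on the native counts rather than on tokens_prompt and tokens_completion (77 and 770).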
Evaluation summary
Matches word count: 60.28%
Dialogue to Total Word Ratio: 22.48%
Overall: 41.3756%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The van smelled like stale chips and desperation. Molly spread the hand-drawn map across the dashboard while Bill ate a sandwich with the focus of a man who had absolutely no concerns about anything.
"Are you looking at this?" Molly tapped the paper hard enough to leave a dent.
"I'm looking at it."
"You're looking at your sandwich."
"I can do both. I'm multitasking. It's a modern skill."
Molly pinched the bridge of her nose. "Okay. So the party starts at two. The bouncy castle gets delivered at one-thirty, which means everyone's eyes are going to be on the bouncy castle."
"I love bouncy castles."
"Bill."
"Sorry. Continue."
"The presents are going to be stacked in the garage. The parents always put them in the garage so the kids don't tear into them before cake. I've seen it a hundred times."
Bill finally set down his sandwich. "How have you seen it a hundred times? How many children's parties have you surveilled?"
"That's not the point."
"It feels like it might be a point worth exploring."
"The garage has a side door," Molly continued, stabbing the map with her finger. "You come in from the alley, I keep watch from the street. We load the bags in under four minutes and we're gone before they finish singing happy birthday."
Bill wiped his hands on his jacket. "What are we actually going to do with a hundred presents meant for a six year old?"
"We sell them."
"We're going to sell a bunch of toy dinosaurs and craft kits."
"People buy anything online, Bill."
"Fair point." He leaned over and squinted at the map. "Is this dot the bouncy castle or the birthday child?"
"It's the bouncy castle. Why would I draw the birthday child on the map?"
"For context."
"I don't need a six year old on my operational map for context."
Bill nodded slowly. "What's the child's name?"
"Why does that matter?"
"It doesn't. I'm just curious."
"Theo."
"Theo." Bill repeated the name quietly, staring through the windshield at nothing. "We're stealing from a kid named Theo."
"We're not stealing from Theo. We're stealing from Theo's guests. There's a distinction."
"Is there though?"
"Philosophically, yes."
"I don't think that holds up."
"It holds up fine." Molly rolled up the map with authority. "Four minutes, side door, alley exit. You wear the jacket with the deep pockets. I drive."
"You always drive."
"Because you once took a wrong turn leaving a car park for eleven minutes."
"The signage was misleading."
"There was one sign, Bill. It said exit."
He picked up his sandwich again. "So we're really doing this. Stealing birthday presents from small children."
"From their guests."
"Right. The philosophical distinction." He chewed thoughtfully. "Do you think they'll have one of those party bags? With the little sweets inside?"
Molly stared at him for a long moment. "Get out of the van."
created_at: "2026-02-18T02:25:43.850247+00:00" (2/18/2026, 2:25:43 AM)
model: "anthropic/claude-4.6-sonnet-20260217"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 536
moderation_latency: 204
generation_time: 14779
tokens_prompt: 77
tokens_completion: 770
native_tokens_prompt: 75
native_tokens_completion: 742
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.011355
router: (null)
provider_responses:
  0:
    id: "0aa8eb8a-11d3-477c-9346-1cc5b7e25612"
    status: 200
    is_byok: false
    latency: 535
    endpoint_id: "83d802cb-44f2-45a0-a9d5-cb9bf79777c8"
    provider_name: "Amazon Bedrock"
    model_permaslug: "anthropic/claude-4.6-sonnet-20260217"
api_type: "completions"
id: "gen-1771381528-3uqjtvfBBLi0L2pzQThh"
upstream_id: "0aa8eb8a-11d3-477c-9346-1cc5b7e25612"
total_cost: 0.011355
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
Evaluation details
Matches word count
Result: 60.28%
Details: n/a
Meta Data: neededClean: false, words: 485
Dialogue to Total Word Ratio
Result: 22.48%
Details: Ratio: 75.46%, Deviation: 5.46%
Meta Data: neededClean: false, wordsTotal: 485, wordsDialogue: 366
Overall
Result: 41.3756%
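The dialogue figures above are consistent with ratio = wordsDialogue / wordsTotal and deviation = |ratio - target|, with the 70% target taken from the scenario. The overall score looks like the mean of the two evaluator scores (the rounded values average to 41.38, close to the displayed 41.3756). The sketch below reproduces that arithmetic. The function names and the quote-based word counter are hypothetical and are not NC Bench's evaluator; the mapping from deviation to the 22.48% score is not shown on this page and is not modeled.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: "70% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that are dialogue."""
    return words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_RATIO) -> float:
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


def count_words(text: str) -> tuple[int, int]:
    """Hypothetical counter: whitespace-separated words, with dialogue
    taken to be anything between straight double quotes."""
    total = len(text.split())
    spoken = sum(len(s.split()) for s in re.findall(r'"([^"]*)"', text))
    return spoken, total


# Values copied from the Meta Data of this run.
ratio = dialogue_ratio(366, 485)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation(ratio):.2%}")

# Assumption: the overall score is the unweighted mean of the two scores.
overall = (60.28 + 22.48) / 2
print(f"Overall (from rounded scores): {overall:.2f}%")
```

A counter this simple will not necessarily reproduce the 485 and 366 word counts reported for the response, since the evaluator's tokenization rules are not documented here.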