NC Bench
Run: 2026-07-01-01KWE8RZ0FMCKMGM3M32A0T0ME
Run Details
Model: Claude Sonnet 5 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 17.1s
Cost: $0.011156
Provider: Amazon Bedrock
Evaluation summary
Score | Evaluator | Details Link
100.00% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
49.9995% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The back room of Bill's apartment smelled like stale pizza and desperation, which seemed fitting given their current profession. Molly spread the blueprints across the coffee table, her finger tracing a path through what appeared to be a suburban backyard. "So the party starts at two," she said, tapping a spot marked with a smiley face sticker. "Bounce house goes up at one-thirty. That's our window." Bill leaned back in his chair, arms crossed. "We're really doing this. We're really stealing presents from a seven-year-old's birthday party." "Timmy Henderson's birthday party, specifically. And yes, Bill, we're really doing this. You know how many presents a kid that age gets? His parents are loaded. We're talking Legos, video games, probably a drone or something ridiculous." "A drone. For a seven-year-old." "Rich people don't know what to do with their money, Bill. That's why we take it from them." Bill picked up a small rubber mask from the table—a cartoon dinosaur, the kind you'd buy at a party store. "And this is supposed to help us blend in?" "It's a theme party. Dinosaur theme. Everyone's wearing masks." "I look like I'm about to rob a preschool." "You are about to rob a preschool. Well, not a preschool. A birthday party. Same energy." Molly grabbed her own mask—a purple triceratops—and pulled it over her face, muffling her next words. "The plan is simple. We arrive during the cake cutting. Everyone's distracted, eyes on the candles, singing that ridiculous song. We slip into the present pile, load up the bags, and walk out the side gate before anyone notices two extra dinosaurs at the party." "And if someone does notice?" "Then we say we're entertainers. Dino-themed entertainment. Very popular these days." "Molly, we don't have any dinosaur entertainment skills." "I can do a mean Jurassic Park roar." Bill stared at her. "That's not a skill, that's a sound effect." "It's a skill if it gets us out of a birthday party without getting arrested." 
He sighed, running a hand through his hair. "This feels wrong on a fundamental level. We're professional thieves, Molly. We've hit banks. We've hit jewelry stores. And now we're going after Tonka trucks and stuffed animals." "The Hendersons have a home security system worth more than most banks' vaults. This is a lucrative career move, actually. Nobody expects a heist at a kid's birthday party." "Because it's insane." "Because it's genius." Molly rolled up the blueprints and tucked them under her arm. "Besides, think of the resale value on unopened LEGO sets. Collectors go crazy for that stuff." Bill pulled the dinosaur mask over his face, his voice coming out muffled and defeated. "If we get caught, I'm blaming you." "If we get caught, we'll have bigger problems than blame, Bill. We'll be in a holding cell with actual criminals, telling them we got busted stealing from children." "When you put it that way, it doesn't sound better." "It never does. Let's go steal some presents."
created_at: "2026-07-01T07:19:56.7Z" (7/1/2026, 7:19:56 AM)
model: "anthropic/claude-sonnet-5-20260630"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3097
moderation_latency: 128
generation_time: 16962
tokens_prompt: 77
tokens_completion: 827
native_tokens_prompt: 93
native_tokens_completion: 1097
native_tokens_completion_images: (null)
native_tokens_reasoning: 39
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
service_tier: "default"
usage: 0.011156
router: (null)
provider_responses: 0
endpoint_id: "37b16f45-eeae-45f3-9b85-2137f7c6d61b"
id: "msg_01KXvec2NXvhuJhbe7ycGVcm"
is_byok: false
latency: 1601
model_permaslug: "anthropic/claude-sonnet-5-20260630"
provider_name: "Amazon Bedrock"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.17.0; linux; x64))"
http_referer: (null)
request_id: "req-1782890396-uz8eFyqm8zbYpMc3a2bL"
session_id: (null)
preset_id: (null)
api_type: "completions"
id: "gen-1782890396-h2s7R1QzEGnSrtYhhP5O"
upstream_id: "msg_01KXvec2NXvhuJhbe7ycGVcm"
total_cost: 0.011156
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
response_cache_source_id: (null)
data_region: "global"
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false, words: 499
0.00% | Dialogue to Total Word Ratio | Ratio: 73.07%, Deviation: 23.07% | neededClean: false, wordsTotal: 505, wordsDialogue: 369
49.9995% | Overall | |
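The ratio and deviation reported above can be reproduced from the listed counts. The sketch below is only an illustration, not NC Bench's evaluator: the function names `ratio_from_counts` and `dialogue_stats` are hypothetical, the 50% target is taken from the scenario description, and counting words inside straight double quotes is an assumed tokenization. The two evaluators report different totals for the same response (499 and 505 words), so the real tokenization rules evidently differ between them and are not documented on this page. How a deviation maps to the 0.00% score is also not shown here, so the sketch does not attempt it.

```python
import re


def ratio_from_counts(words_dialogue: int, words_total: int, target: float = 0.5):
    """Return (ratio, deviation from target) for already-counted words."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


def dialogue_stats(text: str, target: float = 0.5) -> dict:
    """Count whitespace-separated words inside straight double quotes.

    Assumed tokenization: the real evaluator may handle curly quotes,
    punctuation, or hyphenated words differently.
    """
    words_total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


# Counts taken from the Meta Data column of this run.
ratio, deviation = ratio_from_counts(369, 505)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 73.07%, Deviation: 23.07%"
```

With these counts the response is roughly 73% dialogue, well above the requested 50%, which is consistent with the failing result on that evaluator despite the passing word count.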