NC Bench
Run: 2026-07-01-01KWE9TF81JD1WJ6R4G4BRR8SY
Run Details
Model:
Claude Sonnet 5 (Reasoning, Low)
Test:
Dialogue tags
Scenario:
Write 500 words with 50% dialogue
Duration:
18.5s
Cost:
$0.012176
Provider:
Amazon Bedrock
Evaluation summary
Score      Evaluator
99.98%     Matches word count
0.00%      Dialogue to Total Word Ratio
49.9926%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The warehouse smelled like old cardboard and Bill's cologne, which he insisted was called "Midnight Thunder" but smelled suspiciously like gas station cinnamon sticks. Molly spread the blueprints across a wobbly table, careful not to spill her coffee on what had taken her three days to steal from a birthday planning website. "So," Bill said, squinting at the papers like they were written in ancient Sumerian, "we're robbing a seven-year-old's birthday party." "We're liberating presents from a seven-year-old's birthday party," Molly corrected. "There's a difference." "Is there?" "Yes. One sounds like we're monsters, and the other sounds like we're Robin Hood." Bill scratched his beard, which he'd grown specifically for this job because he thought it made him look "less recognizable," though it mostly made him look like a confused lumberjack. "Robin Hood didn't steal Nintendo Switches from children, Molly." "Robin Hood didn't have my rent due Friday either." She tapped the blueprint's largest room, a rectangle labeled "BOUNCE HOUSE ZONE - DO NOT ENTER WITH SHOES." "The presents will be stacked here, by the cake table. Forty-three kids, one inflatable castle, and a clown named Zippy who's actually an off-duty accountant." "How do you know all this?" "I infiltrated the group chat." "There's a group chat for seven-year-olds?" "For their mothers. It's terrifying, Bill. They coordinate snacks with military precision." Bill leaned back, chair creaking dangerously. "What's our cover?" "You're the balloon guy." "I don't know how to make balloon animals." "Nobody does, Bill. That's the beauty of it. You just twist things until a child cries, and everyone assumes it's art." "And you?" "Face painting." Molly held up a small kit she'd lifted from a craft store, price tag still dangling. "I watched exactly one YouTube tutorial." "One?" "It was eleven minutes long. I know things now." Bill stood, pacing in the tight space between crates. "This feels wrong. 
Aren't we supposed to be, I don't know, sophisticated criminals? Jewel thieves? Bank robbers?" "Bill, we tried the bank thing. You forgot the getaway car had a manual transmission." "That was one time—" "You stalled in the vault doorway for four minutes. The security guard offered you driving lessons." He winced. "Fine. Fair." "This is simple. We get in, we do our little jobs, and while everyone's distracted by cake and Zippy's inevitable breakdown, we load the presents into the rented minivan." "We rented a minivan?" "It has a Blue's Clues sticker on it. Very inconspicuous." Bill grabbed his balloon pump, testing it with a squeeze that produced a sound like a dying goose. "And if we get caught?" Molly smiled, tucking the blueprints into her bag. "We're adults at a children's party, Bill. Worst case, they just think we're weird." "We ARE weird." "Exactly. Perfect cover." He sighed, slinging the balloon bag over his shoulder like it weighed as much as his dignity. "For the record, I think stealing from children might be a new low." "Bill, we once stole a wedding cake." "That guy owed us money." "The cake didn't."
created_at
"2026-07-01T07:38:14.669Z"
model
"anthropic/claude-sonnet-5-20260630"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
2601
moderation_latency
(null)
generation_time
18382
tokens_prompt
77
tokens_completion
877
native_tokens_prompt
93
native_tokens_completion
1199
native_tokens_completion_images
(null)
native_tokens_reasoning
39
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"end_turn"
service_tier
"default"
usage
0.012176
router
(null)
provider_responses
0
endpoint_id
"37b16f45-eeae-45f3-9b85-2137f7c6d61b"
id
"msg_01QdW9XJC4dyA3af323qp3Fm"
is_byok
false
latency
1582
model_permaslug
"anthropic/claude-sonnet-5-20260630"
provider_name
"Amazon Bedrock"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.17.0; linux; x64))"
http_referer
(null)
request_id
"req-1782891494-sw1GhrDArliwqvxsgVYn"
session_id
(null)
preset_id
(null)
api_type
"completions"
id
"gen-1782891494-2n6aIedKGj7emMDuR3Ji"
upstream_id
"msg_01QdW9XJC4dyA3af323qp3Fm"
total_cost
0.012176
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Amazon Bedrock"
response_cache_source_id
(null)
data_region
"global"
Evaluation details
Result: 99.98%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean = false, words = 502

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 65.10%, Deviation: 15.10%
Meta Data: neededClean = false, wordsTotal = 510, wordsDialogue = 332

Overall: 49.9926%
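The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator can be reproduced from the wordsDialogue and wordsTotal values above. The following is a minimal sketch, not the benchmark's actual evaluator: the function name `dialogue_ratio` is hypothetical, the 50% target is taken from the scenario description, and treating the deviation as the absolute difference from that target is an assumption that happens to match the reported figures. How the evaluator turns the deviation into the 0.00% score is not shown on this page and is not modeled here.

```python
def dialogue_ratio(words_dialogue: int, words_total: int, target: float = 0.5):
    """Return (ratio, deviation) as fractions of the total word count.

    Hypothetical helper: `target` defaults to the 50% dialogue share
    requested in the scenario; deviation is assumed to be the absolute
    difference between the measured ratio and that target.
    """
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target)
    return ratio, deviation


# Values from the evaluation details: wordsDialogue = 332, wordsTotal = 510
ratio, deviation = dialogue_ratio(332, 510)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 65.10%, Deviation: 15.10%"
```

Under these assumptions, the response contains roughly 15 percentage points more dialogue than the scenario asked for, which agrees with the Details entry for this evaluator.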