NC Bench
Model
Claude Opus 4.5
Test
Dialogue tags
Scenario
Write 500 words with 70% dialogue
Run ID
2026-02-08-01KGZ055KBQVM9C4S6EQRFM3D2
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crayon-drawn map across the sticky diner table and tapped it with her coffee spoon. "This is the target. Little Timmy Henderson's seventh birthday party. Saturday at two."
Bill squinted at the paper. "Is that a dinosaur or the bouncy castle?"
"Bouncy castle. Focus, Bill. We've got a narrow window here."
"Walk me through it again. I'm still not clear on why we're stealing presents from children."
"We're not stealing from children. We're stealing from their parents who bought the presents. There's a moral difference."
"Is there though?"
Molly sighed and leaned back in the booth. "Look, do you know how much a LEGO Death Star costs these days? And the Henderson kid's parents are loaded. His friends are loaded. We're talking premium gifts here. None of that dollar store nonsense."
"Fair point. So what's our entry strategy?"
"I've arranged for us to be hired as the entertainment. You're going to be a clown."
Bill's face went pale. "Absolutely not. I have dignity, Molly."
"You literally cried during our last job because a squirrel startled you."
"That squirrel was aggressive and you know it."
"You're wearing the clown suit. I've already rented it. The shoes are enormous."
"What are you going as then?"
"I'm the balloon animal artist. Much more sophisticated."
"That's the same thing with extra steps!"
"It's really not. Now, the presents will be stacked on a table in the dining room. While you're doing your routine in the backyard, I'll excuse myself to use the bathroom and start moving packages to the van."
"How many trips are we talking?"
"Depends on generosity levels. Could be six, could be twelve. These people wrap everything individually. Last year someone gave their kid a pony."
"A pony won't fit in the van, Molly."
"I know that. I'm just illustrating the wealth we're dealing with."
"What about security?"
"It's a seven-year-old's birthday party, Bill."
"You'd be surprised. My nephew's party had a bouncer."
"Did it really?"
"Well, it was his uncle Steve, but he took it very seriously."
Molly rubbed her temples. "There's no bouncer. There's one golden retriever named Biscuit who, according to my research, is afraid of loud noises."
"Perfect. I'll bring my accordion."
"You don't play accordion."
"No, but I own one, and I've been waiting for a chance to use it."
"This is exactly why our jobs keep going sideways."
"Our jobs go sideways because you insist on over-planning everything. Sometimes you just need to improvise."
"Improvisation is how you ended up stuck in that chimney for six hours."
"The architectural plans were misleading."
"You couldn't read them because you forgot your glasses."
Bill crossed his arms. "Are we doing this heist or not?"
"We're doing it. Saturday. Two o'clock. Wear the clown suit. Don't bring the accordion."
"Fine. But I'm keeping the red nose afterward."
"Whatever makes you happy, Bill."
created_at
"2026-02-08T16:08:20.987477+00:00"
Sunday, February 8th, 2026 at 4:08:20 PM GMT+00:00
model
"anthropic/claude-4.5-opus-20251124"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1515
moderation_latency
194
generation_time
17156
tokens_prompt
61
tokens_completion
645
native_tokens_prompt
75
native_tokens_completion
757
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0193
router
(null)
provider_responses
0
id
"c2b8ce49-345a-4cfe-8264-cf5c28e042dd"
status
200
is_byok
false
latency
1515
endpoint_id
"de32581c-2e11-4f54-8d29-e8a6e19d3038"
provider_name
"Amazon Bedrock"
model_permaslug
"anthropic/claude-4.5-opus-20251124"
api_type
"completions"
id
"gen-1770566882-X8p8Ik3pZEhrNxCxnpr7"
upstream_id
"c2b8ce49-345a-4cfe-8264-cf5c28e042dd"
total_cost
0.0193
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Amazon Bedrock"
Evaluation details
Result | Evaluator | Details | Meta Data
20.1897% | Matches word count | n/a | neededClean: false; words: 480
0.0000% | Dialogue to Total Word Ratio | Ratio: 91.04%, Deviation: 21.04% | neededClean: false; wordsTotal: 480; wordsDialogue: 437
10.0948%
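The dialogue figures reported above follow from the two counts: 437 dialogue words out of 480 total gives a ratio of 91.04%, which is 21.04 points away from the 70% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way to derive the counts from raw text. It is an illustration only, not the benchmark's evaluator: the function names `ratio_from_counts` and `dialogue_stats` are hypothetical, and the counting rule (whitespace-separated words, dialogue defined as anything between straight double quotes) is an assumption that may differ from how NC Bench tokenizes words, so its counts for the scene above need not match 480 and 437 exactly.

```python
import re


def ratio_from_counts(words_dialogue: int, words_total: int,
                      target_pct: float = 70.0) -> tuple[float, float]:
    """Return (ratio %, absolute deviation from target in points), rounded to 2 places."""
    ratio_pct = 100.0 * words_dialogue / words_total
    deviation = abs(ratio_pct - target_pct)
    return round(ratio_pct, 2), round(deviation, 2)


def dialogue_stats(text: str) -> dict:
    """Count total words and words inside straight double quotes.

    Assumption: a "word" is any whitespace-separated token, and dialogue
    is every span enclosed in a pair of straight double quotes.
    """
    words_total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(len(span.split()) for span in quoted_spans)
    return {"wordsTotal": words_total, "wordsDialogue": words_dialogue}


if __name__ == "__main__":
    # Counts taken from the Meta Data column above.
    print(ratio_from_counts(437, 480))

    # Tiny hand-checkable sample: 6 words in total, 2 of them quoted.
    print(dialogue_stats('Molly sighed. "Focus, Bill." Bill nodded.'))
```

Only the ratio and deviation are reproduced here. The page does not state how the percentage in the Result column is derived from the deviation, so the sketch does not attempt it.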