NC Bench
Model: Claude Opus 4.5
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Run ID: 2026-02-08-01KGYZHRA95MZESN7FNDT679B6
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crayon-drawn map across the sticky diner table and jabbed her finger at a wobbly circle labeled BOUNCY CASTLE in purple. Bill squinted at it. "Where did you get this?"
"I paid a six-year-old informant two dollars and a fruit roll-up. Worth every penny. Now focus. The party starts at two."
"I still can't believe we're doing this. It's a kid's birthday party, Molly."
"It's not just any kid's birthday party. It's Preston Worthington the Fourth's birthday party. His dad is a tech billionaire. We're talking about presents that cost more than my car."
"Everything costs more than your car. Your car is held together with duct tape and spite."
"Exactly. Which is why we need this score. Now, according to my source, the presents will be kept in the pool house until cake time. That gives us a forty-five minute window."
Bill rubbed his temples. "Your source is in kindergarten."
"First grade, actually. She's very advanced. Now, I'll create a diversion at the balloon animal station while you slip around back."
"What kind of diversion?"
"I'm going to request a balloon giraffe with anatomically correct detail."
"That's your plan? Confusing a clown?"
"Disturbing a clown. There's a difference. While he's having an existential crisis, you grab whatever you can carry and meet me at the bounce house."
"Why the bounce house?"
"Because nobody suspects the bounce house, Bill. It's the perfect extraction point. We bounce our way to freedom."
"You want me to escape through a bounce house carrying stolen presents?"
"It's called thinking outside the box. Actually, it's thinking inside a large inflatable box, but you get the idea."
Bill stared at her. "What about security?"
"There's a golden retriever named Biscuits. I've been told he can be bribed with hot dogs."
"Hot dogs. You're bribing a dog with hot dogs."
"I didn't name him, Bill. I just exploit his weaknesses like a professional."
"And if someone sees us?"
"We're wearing disguises. I got us matching princess costumes."
"Absolutely not."
"Fine, you can be the pirate. But I'm keeping the tiara."
"This is insane. We're going to get caught by a bunch of eight-year-olds."
"Seven-year-olds. And no, we won't. Children are notoriously bad witnesses. They can barely describe what they had for breakfast."
"What about the parents?"
"The parents will be wine drunk by two-thirty. Trust me, I've done reconnaissance. These people start early."
Bill looked at the crayon map again, then at Molly's earnest face. "If I end up in prison over a birthday party, I'm never speaking to you again."
"You won't go to prison. Worst case scenario, we end up in a ball pit."
"How is that worst case?"
"Have you ever tried to run in a ball pit? It's basically quicksand with primary colors."
Bill sighed heavily. "I want seventy percent."
"Sixty-forty, and I'll let you keep the pirate hat."
"Deal."
created_at: "2026-02-08T15:57:44.610587+00:00" (Sunday, February 8th, 2026 at 3:57:44 PM GMT+00:00)
model: "anthropic/claude-4.5-opus-20251124"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1571
moderation_latency: 194
generation_time: 16975
tokens_prompt: 61
tokens_completion: 661
native_tokens_prompt: 75
native_tokens_completion: 759
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.01935
router: (null)
provider_responses
0
id: "469c0c6c-fee6-4dda-90b2-6ad20912db5f"
status: 200
is_byok: false
latency: 1570
endpoint_id: "de32581c-2e11-4f54-8d29-e8a6e19d3038"
provider_name: "Amazon Bedrock"
model_permaslug: "anthropic/claude-4.5-opus-20251124"
api_type: "completions"
id: "gen-1770566246-kwLOjxcQl5dCDmX0itC7"
upstream_id: "469c0c6c-fee6-4dda-90b2-6ad20912db5f"
total_cost: 0.01935
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
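The token counts, generation_time, and total_cost above can be cross-checked with simple arithmetic. The sketch below is only illustrative and rests on two assumptions that this page does not state: that generation_time is in milliseconds, and that the model is billed at 5 USD per million native prompt tokens and 25 USD per million native completion tokens. Under those assumed prices the estimate happens to match the reported total_cost.

```python
# Values copied from the run metadata above.
native_tokens_prompt = 75
native_tokens_completion = 759
generation_time = 16975   # assumption: milliseconds
total_cost = 0.01935      # as reported; assumed to be USD

# Assumed prices in USD per million tokens (not stated on this page).
PRICE_PROMPT_PER_M = 5.0
PRICE_COMPLETION_PER_M = 25.0

# Estimated cost from native token counts and the assumed prices.
estimated_cost = (
    native_tokens_prompt * PRICE_PROMPT_PER_M
    + native_tokens_completion * PRICE_COMPLETION_PER_M
) / 1_000_000

# Output throughput in native completion tokens per second.
tokens_per_second = native_tokens_completion / (generation_time / 1000)

print(f"estimated cost: {estimated_cost:.5f}")                  # estimated cost: 0.01935
print(f"throughput: {tokens_per_second:.1f} tokens/s")          # throughput: 44.7 tokens/s
```

If either assumption is wrong (for example, a different unit for generation_time), the throughput and cost figures would change accordingly.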
Evaluation details
Result | Evaluator | Details | Meta Data
90.4837% | Matches word count | n/a | neededClean: false; words: 490
0.0000% | Dialogue to Total Word Ratio | Ratio: 89.59%, Deviation: 19.59% | neededClean: false; wordsTotal: 490; wordsDialogue: 439
45.2419%
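The ratio and deviation reported in the evaluation details can be reproduced from the word counts in the metadata. The sketch below assumes the ratio is dialogue words divided by total words and that the deviation is the absolute distance from the 70% target named in the scenario; both readings are consistent with the reported numbers, but the evaluator's actual implementation is not shown on this page. It also checks the assumption that the final 45.2419% is the unweighted mean of the two evaluator results.

```python
# Values copied from the evaluation metadata above.
words_total = 490
words_dialogue = 439
target_ratio = 0.70  # from the scenario: 70% dialogue

# Assumed definitions: share of dialogue words, and absolute gap to the target.
ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")  # Ratio: 89.59%, Deviation: 19.59%

# Assumption: the final figure is the unweighted mean of the two results.
results = [90.4837, 0.0]
overall = sum(results) / len(results)

# Compare against the reported value, allowing for display rounding.
print(abs(overall - 45.2419) < 1e-4)
```

How each evaluator maps its measurement to a percentage result (for instance, why a 19.59% deviation yields 0.0000%) is not stated here, so the sketch does not attempt to reproduce it.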