NC Bench
Run: 2026-02-18-01KHQ929VQPMRQ3VFKYP2ENJVY
Run Details
Model: Claude Sonnet 4.6
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 20.3s
Cost: $0.011190
Provider: Anthropic
Evaluation summary
Score      Evaluator
35.00%     Matches word count
99.92%     Dialogue to Total Word Ratio
67.4624%   Overall
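The overall figure appears consistent with an unweighted average of the two evaluator scores. The sketch below checks that reading. The averaging rule is an assumption (the page does not state how the overall score is computed), and `mean_score` is a hypothetical helper, not part of NC Bench.

```python
# Per-evaluator scores as displayed in the summary (percent).
displayed_scores = {
    "Matches word count": 35.00,
    "Dialogue to Total Word Ratio": 99.92,
}

def mean_score(scores):
    """Unweighted arithmetic mean: an ASSUMED aggregation rule, not confirmed by the page."""
    values = list(scores.values())
    return sum(values) / len(values)

overall = mean_score(displayed_scores)
print(f"Overall (assumed mean): {overall:.2f}%")
```

With the displayed inputs this gives 67.46%, which agrees with the reported 67.4624% to two decimals. The extra digits in the reported value suggest the ratio score is rounded for display.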
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The van smelled like corn chips and desperation. Molly spread the hand-drawn map across the dashboard and smoothed out the creases with her palm. "Okay, so the party starts at two," she said, tapping a crayon-drawn rectangle labeled THE HOUSE. "We go in through the back gate during the piñata. Everyone looks at the piñata. Nobody looks at us." Bill squinted at the map. "Did you draw this with an actual crayon?" "It was all I had." "Is that a unicorn in the corner?" "That is a guard dog, Bill." "It has a horn." "It has a very pointy ear. Focus." Molly capped her marker and turned to face him. "We back the van to the side fence, load up fast, and we're gone before they finish singing happy birthday." Bill scratched the back of his neck. "How many kids we talking?" "The invitation said fifteen." "You got an invitation?" "I RSVPd yes and everything. They're expecting a Molly Hendricks." Bill stared at her. "You RSVPd to the birthday party you're robbing." "It's called reconnaissance, Bill. I also know there's going to be a bouncy castle and a magician named The Amazing Gerald. Gerald is our window." "Our window." "When Gerald does his big finale, every single one of those children is going to be watching a scarf come out of his nose. That is our moment. That is when we move." Bill leaned back in his seat and rubbed his eyes. "Molly, they're children's presents. We're going to steal toys." "Expensive toys. Derek Fenton is turning seven and his parents are absolutely loaded. I saw the registry." "The registry." "Birthday parties have registries now, Bill. It's a whole thing. There's a remote control excavator on there that retails for two hundred dollars." "He's seven." "He's a seven year old with rich parents and no concept of how lucky he is. We are simply redistributing." "To ourselves." "To ourselves, yes, obviously, but the point stands philosophically." Bill picked up the crayon map and studied it again. 
The unicorn-dog stared back at him from the corner with enormous, innocent eyes. Someone had given it a tiny smile. "What's the magician's finale?" he asked. "He pulls a rabbit out of a hat, apparently. Full rabbit. Live." "Gerald travels with a live rabbit?" "Gerald travels with two, according to his website. He's very professional." "What's the second rabbit for?" "Bill, I don't know Gerald's entire life. The point is, rabbit comes out, children lose their minds, we load the van." Molly rolled up the map and tucked it under her arm. "Simple. Clean. Elegant." Bill looked out the windshield at the sunny suburban street ahead of them. A family of lawn flamingos stood near a mailbox, perfectly still and somehow judgmental. "I just want the record to show," he said quietly, "that I had reservations." "Noted. Now put on your balloon hat."
created_at: "2026-02-18T02:25:53.457211+00:00" (2/18/2026, 2:25:53 AM)
model: "anthropic/claude-4.6-sonnet-20260217"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 436
moderation_latency: (null)
generation_time: 20197
tokens_prompt: 77
tokens_completion: 756
native_tokens_prompt: 75
native_tokens_completion: 731
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.01119
router: (null)
provider_responses:
  0:
    id: "msg_011MYWVmvUfvZMmhwZxAgbc6"
    status: 200
    is_byok: false
    latency: 429
    endpoint_id: "6d408764-7dd7-4626-bb87-a6cc1589bc86"
    provider_name: "Anthropic"
    model_permaslug: "anthropic/claude-4.6-sonnet-20260217"
api_type: "completions"
id: "gen-1771381532-PkYKKCt0vEHkw29cgQvN"
upstream_id: "msg_011MYWVmvUfvZMmhwZxAgbc6"
total_cost: 0.01119
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
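The reported cost can be sanity-checked against the native token counts in the metadata above. This is a minimal sketch under stated assumptions: the per-token prices (3 USD per million input tokens, 15 USD per million output tokens) do not appear on this page and are assumed for illustration, and `estimate_cost` is a hypothetical helper.

```python
# Native token counts taken from the run metadata.
NATIVE_TOKENS_PROMPT = 75
NATIVE_TOKENS_COMPLETION = 731

# ASSUMED prices in USD per million tokens; not stated anywhere on this page.
ASSUMED_INPUT_USD_PER_MTOK = 3.00
ASSUMED_OUTPUT_USD_PER_MTOK = 15.00

def estimate_cost(prompt_tokens, completion_tokens, input_price, output_price):
    """Estimate the request cost in USD from token counts and per-million-token prices."""
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000

estimated = estimate_cost(
    NATIVE_TOKENS_PROMPT,
    NATIVE_TOKENS_COMPLETION,
    ASSUMED_INPUT_USD_PER_MTOK,
    ASSUMED_OUTPUT_USD_PER_MTOK,
)
print(f"Estimated cost: ${estimated:.6f}")
```

Under these assumed prices the estimate matches the `total_cost` of 0.01119 reported in the metadata, which suggests billing is based on the native token counts (75 and 731) rather than `tokens_prompt` and `tokens_completion` (77 and 756).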
Evaluation details
Result     Evaluator                      Details                           Meta Data
35.00%     Matches word count             n/a                               neededClean: false, words: 482
99.92%     Dialogue to Total Word Ratio   Ratio: 70.12%, Deviation: 0.12%   neededClean: false, wordsTotal: 482, wordsDialogue: 338
67.4624%   Overall
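The ratio and deviation follow directly from the reported word counts. The sketch below reproduces them and shows one plausible way to count dialogue words. The counting rule (whitespace-separated words, with dialogue being any text between straight double quotes) is an assumption, since the page does not describe the evaluator's tokenization, and `count_words` is a hypothetical helper.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: 70% dialogue

def count_words(text):
    """Return (total_words, dialogue_words) using an ASSUMED counting rule:
    whitespace-separated words, dialogue = text inside straight double quotes."""
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return total, dialogue

# Small excerpt from the model response, used only to exercise the helper.
sample = 'Bill squinted at the map. "Did you draw this with an actual crayon?" "It was all I had."'
sample_total, sample_dialogue = count_words(sample)

# Reported counts from the evaluation details.
words_total, words_dialogue = 482, 338
ratio = words_dialogue / words_total
deviation = abs(ratio - TARGET_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

The reported counts give a ratio of 70.12% and a deviation of 0.12 percentage points from the 70% target, matching the Details column. How that deviation maps to the 99.92% score is not stated on the page.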