NC Bench
Run: 2026-02-24-01KJ8V2TFBQPH9G360EXA0PQ62
Run Details
Model: Claude Sonnet 4.6 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 2m 7s
Cost: $0.157251
Provider: Anthropic
Evaluation summary
Score      Evaluator
95.99%     Matches word count
98.73%     Dialogue to Total Word Ratio
97.3597%   Overall
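The page does not say how the overall figure is derived. As a rough check, assuming it is the unweighted mean of the two evaluator scores (an assumption, not something the report states), the rounded scores shown above give a value that agrees with 97.3597% to two decimal places. The small remaining gap would be consistent with the site averaging unrounded scores, but that is also a guess.

```python
from statistics import mean

# Evaluator scores as displayed in the summary (already rounded to 2 decimals).
scores = {
    "Matches word count": 95.99,
    "Dialogue to Total Word Ratio": 98.73,
}

# Assumption: "Overall" is the plain, unweighted mean of the evaluator scores.
overall = mean(list(scores.values()))

reported_overall = 97.3597
print(f"computed={overall:.2f}% reported={reported_overall}%")
```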
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The van smelled like old cheese and ambition, which Molly had decided was simply the natural scent of criminal enterprise. She spread the hand-drawn map across the dashboard, smoothing out the crayon marks that had come with it — an unfortunate consequence of stealing your blueprints from a six-year-old's party invitation. "The bounce house is our biggest problem," she said, tapping a pink smudge that represented the backyard of 14 Primrose Lane. "It blocks the eastern approach. We go north, past the clown." "I'm not going past the clown." "Bill." "I have a medically recognized and deeply reasonable discomfort around men with painted faces who twist animals from rubber." "That's just being afraid of clowns." "Call it what you like." Bill folded his arms and looked out the windshield with the expression of a man who felt profoundly misunderstood by his chosen profession. Molly returned to the intelligence report, which was a collection of photographs Bill had taken while lurking near the garden fence of number fourteen the previous Saturday. In nearly every photo, his thumb occupied the foreground with enormous confidence. The party was still visible in the background — streamers in aggressive yellow, and a table stacked eight presents high, wrapped in paper so cheerful it constituted a mild visual assault on any adult within viewing distance. "There's something in photo nine," Bill said, tapping the pile. "Behind my thumb." Molly squinted at it. Behind approximately eighty percent thumb, the blurred outline of a golden retriever wearing a party hat was just visible. "It's a dog." "A large dog." "We'll manage." "It looks aggressive, Molly." "It's wearing a party hat, Bill." She returned the photo to the pile and moved on. "The gifts get moved inside at cake time. Standard practice. Twelve minutes of singing, thirteen if the child gets emotional, and that table is ours." "What if someone asks what we're doing there?" "We wave and look friendly. We're aunts." 
"I'm not an aunt, Molly. I'm a man." "An aunt can be a man." "I genuinely did not know that." Bill reached into the back of the van and produced two small rucksacks decorated with cartoon dinosaurs. He had purchased them specifically to blend in. Molly had provided seventeen solid minutes of counterargument explaining that two adults wearing dinosaur rucksacks would not blend in at a children's birthday party under any reasonable interpretation of blending. Bill had noted they were on sale. The debate had reached no formal resolution. Outside, the party had begun. The sound of children in a state of peak excitement rolled across the quiet suburban afternoon like a small, sugary thunderstorm. Balloons bobbed at the gate with an optimism that Molly found quietly offensive. She checked her watch. Twenty minutes to cake. "We go in calm," she said, hoisting the rucksack onto her shoulders with as much dignity as the dinosaurs allowed. "We smile. We are professionals." Bill put his rucksack on upside down. "Naturally," he said.
created_at: "2026-02-24T22:07:29.276Z"
model: "anthropic/claude-4.6-sonnet-20260217"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 650
moderation_latency: (null)
generation_time: 127078
tokens_prompt: 77
tokens_completion: 5150
native_tokens_prompt: 92
native_tokens_completion: 10465
native_tokens_completion_images: (null)
native_tokens_reasoning: 4349
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.157251
router: (null)
provider_responses:
  0:
    endpoint_id: "6d408764-7dd7-4626-bb87-a6cc1589bc86"
    id: "msg_016tbbLVFLp8JALrH5vzZvfk"
    is_byok: false
    latency: 634
    model_permaslug: "anthropic/claude-4.6-sonnet-20260217"
    provider_name: "Anthropic"
    status: 200
api_type: "completions"
id: "gen-1771970849-RXH6m1LZc9GbDgQU2CBa"
upstream_id: "msg_016tbbLVFLp8JALrH5vzZvfk"
total_cost: 0.157251
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
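The metadata reports a total cost but not the per-token prices behind it. The sketch below is a sanity check under an assumption: the prices of 3 USD per million input tokens and 15 USD per million output tokens are not given anywhere in this report. With those assumed prices, the native token counts reproduce the reported total_cost of 0.157251.

```python
# Assumed prices in USD per million tokens. These do NOT appear in the report;
# they are illustrative values that happen to reproduce the reported cost.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00

# Native token counts taken from the metadata above.
native_tokens_prompt = 92
native_tokens_completion = 10465

cost = (
    native_tokens_prompt * PRICE_IN_PER_M
    + native_tokens_completion * PRICE_OUT_PER_M
) / 1_000_000

print(f"{cost:.6f}")  # compare with total_cost in the metadata
```

Using the native counts rather than tokens_prompt and tokens_completion is a deliberate choice here: only the native counts reproduce the reported figure under these assumed prices.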
Evaluation details
Result: 95.99%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false, words: 492

Result: 98.73%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 30.51%, Deviation: 0.51%
Meta Data: neededClean: false, wordsTotal: 495, wordsDialogue: 151

Overall: 97.3597%
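The ratio and deviation in the details follow directly from the reported counts: 151 dialogue words out of 495 total, against the 30% target in the scenario. The sketch below recomputes both. It also includes a hypothetical dialogue_ratio helper that counts words inside straight double quotes. That helper is an assumption about how such an evaluator might work, not the benchmark's actual implementation, and the two evaluators report different totals (492 and 495), so they evidently do not count words identically.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Hypothetical counter: words inside straight double quotes vs. all words."""
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total, (dialogue / total if total else 0.0)


# Recompute the reported figures from the counts in the evaluation details.
words_dialogue, words_total, target = 151, 495, 0.30
ratio = words_dialogue / words_total
deviation = abs(ratio - target)
print(f"ratio={ratio:.2%} deviation={deviation:.2%}")

# Tiny usage example for the hypothetical helper.
print(dialogue_ratio('"Go now," she said.'))
```

How the 0.51% deviation maps to the 98.73% score is not stated in the report, so the sketch stops at the ratio and deviation.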