NC Bench
Run: 2026-02-18-01KHQ934JRFD4B6V7ZN7RM53J7
Run Details
Model: Claude Sonnet 4.6
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 18.1s
Cost: $0.011640
Provider: Anthropic
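The reported cost can be reproduced from the token counts logged in the response metadata. This rests on an assumption: list rates of $3 per million input tokens and $15 per million output tokens, which this page does not state. With those rates, the native counts (native_tokens_prompt 75, native_tokens_completion 761) give exactly $0.011640. The router-side counts (tokens_prompt 77, tokens_completion 762) would give $0.011661 instead. A minimal sketch of the check:

```python
# Assumed list prices in USD per million tokens. These rates are NOT shown on
# the page; they are a hypothesis checked against the reported total_cost.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from token counts and per-million rates."""
    return (prompt_tokens * INPUT_PRICE_PER_M
            + completion_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# native_tokens_prompt and native_tokens_completion from the response metadata.
cost = estimate_cost(75, 761)
print(f"${cost:.6f}")  # prints $0.011640
```

Using the native counts rather than the router counts is what makes the figures match. This suggests, but does not prove, that billing follows the provider's own tokenizer.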
Evaluation summary
Score      Evaluator
68.10%     Matches word count
15.26%     Dialogue to Total Word Ratio
41.6815%   Overall
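The Overall score is consistent with the unweighted mean of the two evaluator scores: (68.10 + 15.26) / 2 = 41.68. The extra digits in 41.6815% presumably come from the unrounded evaluator scores. The page does not state the aggregation formula, so equal weighting is an assumption in this minimal check:

```python
from statistics import mean

# Evaluator scores as displayed in the summary (rounded to two decimals).
scores = {
    "Matches word count": 68.10,
    "Dialogue to Total Word Ratio": 15.26,
}

# Assumption: Overall is the unweighted mean of the evaluator scores.
overall = mean(scores.values())
print(f"Overall ~ {overall:.2f}%")
```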
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The van smelled like cigarettes and broken dreams. Molly spread the hand-drawn map across the dashboard while Bill ate a bag of chips with the enthusiasm of a man who had nowhere better to be, which was accurate.
"Okay, listen up," Molly said, tapping the paper with a pen. "The party starts at two. We go in at three, when the cake comes out. Everyone watches the cake."
"Everyone watches the cake," Bill repeated, nodding slowly. "Why do we know so much about how children's parties work?"
"Because I researched it, Bill."
"You researched children's parties."
"For the heist. Stop making it weird."
Bill set down his chips. "The presents are going to be in a pile, yeah? Like a big colorful pile?"
"That's correct."
"And we're stealing from a seven-year-old."
"We're stealing from wealthy parents who spent too much money on a seven-year-old," Molly said. "There's a moral distinction."
"Is there?"
"I've decided there is, so yes."
Bill picked his chips back up. "What about the clown?"
Molly's pen stopped moving. "What clown?"
"I saw on the invitation, it said Bingo the Birthday Clown. There's going to be a clown there, Molly."
"You saw the invitation? How did you see the invitation?"
"I found it on the ground."
"You stole it from the mailbox, didn't you."
"I found it," Bill said firmly, "on the ground, near a mailbox, which is practically outside."
Molly closed her eyes for a moment. "The clown is not a problem. Clowns are focused on the children."
"Clowns are focused on chaos, Molly. That's their whole thing. You cannot predict a clown."
"We don't need to predict him, we need to avoid him."
"And if we can't avoid him?"
"Then we improvise."
"Last time you said we'd improvise I ended up handcuffed to a shopping cart in Wolverhampton."
"That was a completely different situation."
"I had to eat a Pret A Manger sandwich with one hand."
"Bill, focus." Molly circled something on the map. "We come in through the garden gate, which the invitation says will be propped open for guests."
"So we're using the clown's invitation."
"We're using the information contained within it, yes."
"Molly, we are crashing a child's birthday party dressed as guests to steal presents while a clown performs."
"When you say it like that it sounds bad."
"It sounds like a fever dream."
"It sounds like a payday," Molly said. "There are forty kids on that guest list. Forty kids from a school in Kensington. Do you know what those parents spend on birthday presents?"
Bill went quiet for a moment. "How much?"
"An average of sixty pounds each."
Bill did the math slowly, his lips moving. His eyes went wide. "That's two thousand four hundred pounds."
"Give or take."
He folded the chip bag carefully and put it in his pocket. "I'll get the black bags."
"That's what I thought."
Response metadata
created_at: "2026-02-18T02:26:18.295755+00:00" (2/18/2026, 2:26:18 AM)
model: "anthropic/claude-4.6-sonnet-20260217"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 486
moderation_latency: (null)
generation_time: 18072
tokens_prompt: 77
tokens_completion: 762
native_tokens_prompt: 75
native_tokens_completion: 761
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.01164
router: (null)
provider_responses:
  [0]
    id: "msg_01Nxn2taf4pZmJCqBPjx5kcb"
    status: 200
    is_byok: false
    latency: 486
    endpoint_id: "6d408764-7dd7-4626-bb87-a6cc1589bc86"
    provider_name: "Anthropic"
    model_permaslug: "anthropic/claude-4.6-sonnet-20260217"
api_type: "completions"
id: "gen-1771381559-orIIqDYiCc9fljbnzT9A"
upstream_id: "msg_01Nxn2taf4pZmJCqBPjx5kcb"
total_cost: 0.01164
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
Evaluation details
Matches word count
  Result: 68.10%
  Details: n/a
  Meta data: neededClean: false; words: 486
Dialogue to Total Word Ratio
  Result: 15.26%
  Details: Ratio: 76.13%, Deviation: 6.13%
  Meta data: neededClean: false; wordsTotal: 486; wordsDialogue: 370
Overall
  Result: 41.6815%
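The ratio evaluator's details follow directly from its meta data. The ratio is wordsDialogue / wordsTotal = 370 / 486 ≈ 76.13%. The deviation is the absolute gap from the 70% target, 76.13 − 70 = 6.13 percentage points. The sketch below reproduces those figures. It also includes a hypothetical `dialogue_stats` helper that shows one way such counts could be obtained. That helper assumes words are whitespace-separated tokens and dialogue is whatever sits between straight double quotes. NC Bench's actual counting rules are not documented on this page.

```python
import re

# Hypothetical rule: dialogue is any text between straight double quotes.
QUOTED = re.compile(r'"([^"]*)"')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def dialogue_stats(text: str, target: float = 0.70) -> dict:
    """Return total words, dialogue words, the dialogue ratio, and its
    absolute deviation from the target ratio."""
    total = count_words(text)
    dialogue = sum(count_words(m) for m in QUOTED.findall(text))
    ratio = dialogue / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


# Reproduce the reported ratio and deviation from the logged counts.
words_total, words_dialogue = 486, 370
ratio = words_dialogue / words_total
print(f"Ratio: {ratio:.2%}, Deviation: {abs(ratio - 0.70):.2%}")
```

The evaluator's tokenization and quote handling are unknown. Running a counter like `dialogue_stats` on the response text may therefore not land exactly on 486 and 370.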