NC Bench
Run: 2026-02-18-01KHQ91CYT3DXQJQCQ5HNA1KNY
Run Details
Model: Claude Sonnet 4.6
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 18.6s
Cost: $0.011385
Provider: Anthropic
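The cost and duration above can be cross-checked against the token counts logged in the run metadata. The sketch below is a rough sanity check, not the billing logic of any provider: the per-million-token prices are an assumption (the page does not state them), chosen because they reproduce the logged cost, and `generation_time` is assumed to be in milliseconds because 18569 matches the 18.6s duration.

```python
# Values copied from the run metadata on this page.
native_tokens_prompt = 75
native_tokens_completion = 744
generation_time_ms = 18569  # assumed to be milliseconds
reported_cost = 0.011385

# Assumed prices in USD per million tokens (not stated on the page).
PRICE_IN_PER_MTOK = 3.00
PRICE_OUT_PER_MTOK = 15.00


def estimate_cost(prompt_tokens, completion_tokens):
    """Estimate the request cost in USD from native token counts."""
    micro_usd = (prompt_tokens * PRICE_IN_PER_MTOK
                 + completion_tokens * PRICE_OUT_PER_MTOK)
    return micro_usd / 1_000_000


cost = estimate_cost(native_tokens_prompt, native_tokens_completion)
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${cost:.6f} (reported: ${reported_cost:.6f})")
print(f"throughput: {tokens_per_second:.1f} completion tokens/s")
```

Under these assumed prices the estimate agrees with the reported cost, which suggests billing is based on the native token counts (75 and 744) rather than the normalized ones (77 and 758).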
Evaluation summary
Score | Evaluator
75.16% | Matches word count
53.99% | Dialogue to Total Word Ratio
64.5704% | Overall
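The overall figure is consistent with an unweighted mean of the two evaluator scores. A minimal sketch under that assumption (the page does not state the aggregation rule, and the `overall` helper is hypothetical):

```python
# Evaluator scores as displayed on the page (already rounded to two decimals).
scores = {
    "Matches word count": 75.16,
    "Dialogue to Total Word Ratio": 53.99,
}


def overall(evaluator_scores):
    """Assumption: overall score is the unweighted mean of all evaluator scores."""
    values = list(evaluator_scores.values())
    return sum(values) / len(values)


mean_score = overall(scores)
print(f"mean of displayed scores: {mean_score:.3f}%")
```

The mean of the displayed scores is about 64.575%, slightly above the reported 64.5704%. The gap is small enough to be explained by the page averaging unrounded scores, though that is a guess.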
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The van smelled like beef jerky and broken dreams. Molly spread the hand-drawn map across the dashboard while Bill ate his third granola bar of the morning.
"Okay, so the party starts at two," Molly said, tapping the paper with her pen. "We go in through the back gate around two-thirty, when everyone's distracted by the clown."
"What if there's no clown?" Bill asked.
"There's always a clown, Bill."
"My cousin Trevor had a magician at his party."
"Bill, I don't care about Trevor."
"He pulled a rabbit out that wasn't even his rabbit. Borrowed it from a neighbor. The neighbor showed up halfway through and there was a whole thing."
Molly stared at him for a long moment. "Can we please focus on the job?"
Bill crumpled his granola bar wrapper and shoved it in the cupholder. "Focused. Totally focused. What's the gift situation?"
"According to my intel, there are approximately twenty-three presents stacked by the back fence. Parents go all out for this neighborhood. We're talking Legos, remote control cars, probably one of those robots kids can program."
"Oh I want the robot," Bill said.
"We're selling the robot."
"What if I just keep the robot?"
"Then I'll sell you."
Bill nodded slowly. "Fair enough. What's the extraction plan?"
"We back the van up to the side gate at exactly two forty-five. You load, I watch the perimeter. We're gone in four minutes."
"What about the bouncy castle?"
Molly looked up from the map. "What about it?"
"The invite said there'd be a bouncy castle. I'm just saying, if we're already there."
"We are not stealing a bouncy castle."
"I'm not saying steal it. I'm saying maybe bounce in it for a minute. Just one minute. We've had a hard week."
"We absolutely have not earned a bounce."
"The Henderson job went sideways through no fault of my own."
"You tripped over a garden gnome you were trying to steal that wasn't part of the plan."
"He had a very charming face," Bill said quietly.
Molly rolled up the map and pointed it at him. "Here is what is going to happen. We drive over, we wait, we grab the presents, we leave. Nobody bounces, nobody takes home a robot, nobody interacts with any garden gnomes or clowns or magicians or neighbors looking for their rabbits. Understood?"
"Understood." Bill paused. "Will there be cake, do you think?"
"Bill."
"It's a legitimate question. I skipped breakfast."
"You've eaten three granola bars."
"Those are snacks. They don't count as breakfast emotionally."
Molly pinched the bridge of her nose and took a very slow breath. Outside the van, a neighborhood kid rolled past on a scooter, blissfully unaware of the criminal enterprise idling at the curb.
"Start the van," she finally said.
Bill turned the key and grinned. "This is going to be our best job ever."
"It's going to be something," Molly muttered.
created_at: "2026-02-18T02:25:21.862212+00:00" (2/18/2026, 2:25:21 AM)
model: "anthropic/claude-4.6-sonnet-20260217"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 424
moderation_latency: (null)
generation_time: 18569
tokens_prompt: 77
tokens_completion: 758
native_tokens_prompt: 75
native_tokens_completion: 744
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.011385
router: (null)
provider_responses:
  0:
    id: "msg_01XaCpDSxSSvhJRasqfTU9sN"
    status: 200
    is_byok: false
    latency: 424
    endpoint_id: "6d408764-7dd7-4626-bb87-a6cc1589bc86"
    provider_name: "Anthropic"
    model_permaslug: "anthropic/claude-4.6-sonnet-20260217"
api_type: "completions"
id: "gen-1771381502-4UWwuoS11xSg8cFNa8jK"
upstream_id: "msg_01XaCpDSxSSvhJRasqfTU9sN"
total_cost: 0.011385
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
Evaluation details
Result | Evaluator | Details | Meta Data
75.16% | Matches word count | n/a | neededClean: false, words: 487
53.99% | Dialogue to Total Word Ratio | Ratio: 73.51%, Deviation: 3.51% | neededClean: false, wordsTotal: 487, wordsDialogue: 358
64.5704% | Overall
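The ratio and deviation in the details follow directly from the logged word counts. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. The quote-matching rule and the helper names are assumptions, since the evaluator's actual tokenization is not shown on this page, and no attempt is made to reproduce the 75.16% and 53.99% scores because their scoring curves are not documented here.

```python
import re


def count_words(text):
    """Count whitespace-separated words."""
    return len(text.split())


def dialogue_words(text):
    """Assumption: dialogue is any text between straight double quotes."""
    return sum(count_words(span) for span in re.findall(r'"([^"]*)"', text))


# Small sample taken from the model response above.
sample = '"What if there\'s no clown?" Bill asked. "There\'s always a clown, Bill."'
sample_total = count_words(sample)
sample_dialogue = dialogue_words(sample)
print(f"sample: {sample_dialogue} of {sample_total} words are dialogue")

# Figures reported by the evaluator for the full response.
words_total = 487
words_dialogue = 358
target_ratio_pct = 70.0  # from the scenario: 70% dialogue

ratio_pct = 100 * words_dialogue / words_total
deviation_pct = abs(ratio_pct - target_ratio_pct)
print(f"ratio: {ratio_pct:.2f}%, deviation: {deviation_pct:.2f}%")
```

The computed values agree with the reported "Ratio: 73.51%, Deviation: 3.51%", so the deviation appears to be the absolute difference in percentage points from the 70% target.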