NC Bench
Run: 2026-02-24-01KJ8W36T5PFSJ34WRR5WMZQWN
Run Details
Model: Claude Opus 4.6 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 1m 48s
Cost: $0.215235
Provider: Amazon Bedrock
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 71.41%
Overall: 35.7035%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the hand-drawn map across the hood of their rusted sedan, smoothing out the crayon marks with the reverence of a general surveying a battlefield. The map had cost them eleven dollars and a juice box, procured from a kindergartner named Devin who had no loyalty to anyone but himself. Bill leaned over her shoulder, squinting at what appeared to be a lopsided house surrounded by stick figures and a disturbing number of suns. "Why are there four suns?" Bill asked. "Artistic license. Focus." Molly tapped a wobbly rectangle near the back of the house. "This is the gift table. Devin confirmed at least thirty presents. We're talking remote control cars, building sets, maybe even one of those tablets parents buy to keep their kids quiet at dinner." Bill scratched the stubble on his chin and frowned at the map like it owed him money. He had pulled off a jewelry store in Reno and a payroll truck outside Sacramento, but something about this particular job made his stomach clench in a way that felony grand theft never had. Maybe it was the thought of two dozen sugar-fueled six-year-olds standing between him and the exit. Maybe it was the fact that their intel came from a child who ate glue. "What about security?" "One dad on the grill, two moms at the bounce house, and a grandmother who Devin says falls asleep by two. We go in at two fifteen." Bill absorbed this. He had questions, most of them existential, but he stuck to logistics. "And the clown?" Molly paused. She pulled a folded flyer from her jacket pocket and studied it under the pale afternoon light. The party entertainment was listed as Mr. Giggles the Magnificent, a local performer who specialized in balloon animals and minor emotional trauma. His online reviews were deeply polarizing. "The clown is the wild card," she admitted. "He does his act for forty-five minutes, then disappears into the bathroom for what Devin described as a very long time. That's our window. 
We grab everything and we're gone before anyone notices." Bill nodded slowly, the way he always did when he was pretending to understand a plan he found deeply stupid. He had concerns. Legitimate, adult concerns, like the fact that their getaway vehicle had a check engine light that had been glowing since February, or that Molly's disguise consisted entirely of a yellow sundress and a floppy hat she had stolen from a scarecrow. There was also the matter of his own costume, which Molly had laid out on the back seat that morning without explanation or apology. A Hawaiian shirt. Cargo shorts. The other floppy hat. "I'm not wearing the hat," he said. "You're wearing the hat, Bill. We're supposed to be eccentric aunt and uncle types. Nobody questions a man in a floppy hat carrying a large bag." "Everybody questions that." Molly folded the map with the careful precision of someone who had never once second-guessed herself and tucked it back into her jacket. She looked Bill dead in the eyes with the calm, terrifying confidence that had gotten them into every terrible situation they had ever shared. "Trust the hat," she said. Bill did not trust the hat. But he got in the car anyway.
created_at: "2026-02-24T22:25:10.477Z" (2/24/2026, 10:25:10 PM)
model: "anthropic/claude-4.6-opus-20260205"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1380
moderation_latency: 124
generation_time: 108020
tokens_prompt: 77
tokens_completion: 2333
native_tokens_prompt: 92
native_tokens_completion: 8591
native_tokens_completion_images: (null)
native_tokens_reasoning: 1515
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.215235
router: (null)
provider_responses:
  0:
    endpoint_id: "a8708fb9-08ea-43bb-a3b1-165c87ef09df"
    id: "b5f3502c-0d88-4d8e-b77e-30e0d0cdc085"
    is_byok: false
    latency: 1349
    model_permaslug: "anthropic/claude-4.6-opus-20260205"
    provider_name: "Amazon Bedrock"
    status: 200
api_type: "completions"
id: "gen-1771971910-QwPmMRO1ti5URJOCETN4"
upstream_id: "b5f3502c-0d88-4d8e-b77e-30e0d0cdc085"
total_cost: 0.215235
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
Evaluation details
Evaluator: Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 545

Evaluator: Dialogue to Total Word Ratio
Result: 71.41%
Details: Ratio: 27.40%, Deviation: 2.60%
Meta Data: neededClean: false, wordsTotal: 551, wordsDialogue: 151

Overall: 35.7035%
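The ratio and deviation reported for the dialogue evaluator can be reproduced from the wordsTotal and wordsDialogue metadata. The following is a minimal sketch, not the benchmark's actual implementation: the function names are hypothetical, the 30% target is taken from the scenario text, and the deviation is assumed to be the absolute difference in percentage points. The overall figure appears consistent, up to rounding, with an unweighted mean of the two evaluator scores, which is also an assumption. How the 71.41% score is derived from the deviation is not shown on the run page and is not reproduced here.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of all words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation(ratio_pct: float, target_pct: float = 30.0) -> float:
    """Absolute distance from the target, in percentage points (assumed definition)."""
    return abs(ratio_pct - target_pct)


# Values taken from the evaluator metadata in this run.
ratio = dialogue_ratio(words_dialogue=151, words_total=551)
dev = deviation(ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {dev:.2f}%")

# Assumption: the overall score is the unweighted mean of the evaluator scores.
overall = (0.00 + 71.41) / 2
print(f"Overall (from rounded scores): {overall:.3f}%")
```

With the rounded scores shown on the page the mean comes out at 35.705%, slightly above the reported 35.7035%, which would be expected if the page averages unrounded evaluator scores.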