NC Bench
Run: 2026-04-24-01KPZV9WSAFGZTZC937TBB1W6N
Run Details
Model: Claude Opus 4.7
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 7.2s
Cost: $0.012015
Provider: Amazon Bedrock
Evaluation summary
Score      Evaluator
60.28%     Matches word count
0.00%      Dialogue to Total Word Ratio
30.1376%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly spread the blueprints across the kitchen table, anchoring the corners with half-empty soda cans. The crayon drawing depicted a suburban living room, a bouncy castle, and what appeared to be a lopsided pony. "Security's tight," she said, tapping a stick figure labeled *Aunt Karen*. "She's got eyes in the back of her head, Bill. Literally. The kid drew them." Bill squinted. "What about the clown?" "Decoy. We use him." "I'm not working with Pickles again." "You'll work with whoever I say you'll work with." Bill crossed his arms. "Last time, Pickles ate the evidence. Actually ate it. Balloon animals and everything." "Focus," Molly snapped. "The presents are stacked here, by the piñata. We've got maybe eight minutes between cake and goodie bags." "And the birthday boy?" "Turning seven. Name's Tyler. Allergic to peanuts and, according to our inside source, emotionally fragile." "Your source being?" "His babysitter. She's cut for fifteen percent." Bill whistled. "Steep." "She's the one who got us the gift list. Two drones, a Nintendo, and something called a Bluey." "What's a Bluey?" Molly shrugged. "Don't care. If it's wrapped, it's ours."
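The page does not say how NC Bench separates dialogue from narration. A minimal sketch, assuming that dialogue is any text between straight double quotes and that a word is a whitespace-separated token, could look like the following. The function names are hypothetical, and the counts this sketch produces on the full response above need not match the evaluator's reported 130 dialogue words out of 186.

```python
import re

# Assumption: dialogue = text enclosed in straight double quotes.
# This is an illustrative rule, not NC Bench's documented method.
QUOTED = re.compile(r'"([^"]*)"')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (hypothetical word rule)."""
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage."""
    total = count_words(text)
    # Sum the word counts of every quoted span.
    dialogue = sum(count_words(m.group(1)) for m in QUOTED.finditer(text))
    return dialogue, total


sample = 'Bill squinted. "What about the clown?" "Decoy. We use him."'
d, t = dialogue_stats(sample)
print(d, t, f"{d / t:.2%}")  # 8 10 80.00%
```

Real evaluators have to decide how to treat curly quotes, em-dash interruptions, and markdown such as `*Aunt Karen*`; a quote-only rule like this one ignores all of those.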
created_at: "2026-04-24T13:36:40.25Z" (4/24/2026, 1:36:40 PM)
model: "anthropic/claude-4.7-opus-20260416"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 651
moderation_latency: (null)
generation_time: 7145
tokens_prompt: 77
tokens_completion: 316
native_tokens_prompt: 98
native_tokens_completion: 461
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
usage: 0.012015
router: (null)
provider_responses: 0
endpoint_id: "0fa65117-00d9-4835-808d-9b68041d9ce4"
id: "53c2f6c9-c139-48c6-922b-fb6795b7c52b"
is_byok: false
latency: 651
model_permaslug: "anthropic/claude-4.7-opus-20260416"
provider_name: "Amazon Bedrock"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777037800-m6Mvn8FXugXGIbYEvCRc"
session_id: (null)
api_type: "completions"
id: "gen-1777037800-kG7GgIeuSwMva4EYhbD2"
upstream_id: "53c2f6c9-c139-48c6-922b-fb6795b7c52b"
total_cost: 0.012015
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
response_cache_source_id: (null)
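The recorded total_cost of 0.012015 can be reproduced exactly from the native token counts (98 prompt, 461 completion) if one assumes linear pricing of $5 per million prompt tokens and $25 per million completion tokens. Those rates are an assumption and are not listed on this page. Applying the same assumed rates to the normalized tokens_prompt/tokens_completion counts (77/316) gives $0.008285 instead, which suggests the charge is based on the native counts.

```python
# Assumed prices in USD per million tokens; NOT stated on this page.
PROMPT_RATE_PER_M = 5.0
COMPLETION_RATE_PER_M = 25.0


def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Linear per-token cost under the assumed rates."""
    return (prompt_tokens * PROMPT_RATE_PER_M
            + completion_tokens * COMPLETION_RATE_PER_M) / 1_000_000


native = request_cost(98, 461)      # native_tokens_prompt, native_tokens_completion
normalized = request_cost(77, 316)  # tokens_prompt, tokens_completion
print(f"native: {native:.6f}, normalized: {normalized:.6f}")
# native: 0.012015, normalized: 0.008285
```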
Evaluation details
Result     Evaluator                      Details                             Meta Data
60.28%     Matches word count             n/a                                 neededClean: false; words: 185
0.00%      Dialogue to Total Word Ratio   Ratio: 69.89%, Deviation: 19.89%   neededClean: false; wordsTotal: 186; wordsDialogue: 130
30.1376%   Overall
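The dialogue figures above follow directly from the meta data: 130 / 186 ≈ 69.89%, which is 19.89 percentage points away from the 50% target in the scenario. The overall score also appears consistent with an unweighted mean of the two evaluator scores, provided the word-count score is about 60.2752% before rounding to the displayed 60.28%. Both the averaging rule and that unrounded value are inferences from the numbers, not documented behavior. The sketch below checks the arithmetic under those assumptions.

```python
# Counts copied from the "Dialogue to Total Word Ratio" meta data.
words_total = 186
words_dialogue = 130
target_ratio = 0.50  # from the scenario "Write 200 words with 50% dialogue"

ratio = words_dialogue / words_total
# Absolute distance from the target, as a fraction of total words.
deviation = abs(ratio - target_ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 69.89%, Deviation: 19.89%

# Assumption: Overall = unweighted mean of evaluator scores, using an
# inferred unrounded word-count score (the page displays 60.28%).
scores = [60.2752, 0.0]
overall = sum(scores) / len(scores)
print(f"Overall: {overall:.4f}%")  # Overall: 30.1376%
```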