NC Bench
Run: 2026-05-31-01KSYXGSHWD6VAKKVA3E8BAE3F
Run Details
Model: Claude Opus 4.8 (Reasoning, High)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 10.9s
Cost: $0.012840
Provider: Anthropic
Evaluation summary
Score | Evaluator
100.00% | Matches word count
87.39% | Dialogue to Total Word Ratio
93.6948% | Overall
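The overall figure appears consistent with an unweighted mean of the two evaluator scores. The sketch below checks that under this assumption; the averaging rule is inferred from the numbers and is not stated on this page, and the 87.39% input is the displayed (already rounded) value, so the result only matches 93.6948% to within rounding.

```python
from statistics import mean

# Evaluator scores as displayed in the summary (percent).
evaluator_scores = {
    "Matches word count": 100.00,
    "Dialogue to Total Word Ratio": 87.39,  # displayed value, rounded to 2 decimals
}

# Assumption: the overall score is a plain, unweighted average.
overall = mean(list(evaluator_scores.values()))

reported_overall = 93.6948
print(f"computed: {overall:.4f}%  reported: {reported_overall}%")
print(f"difference: {abs(overall - reported_overall):.4f} percentage points")
```

The small remaining gap suggests the ratio score is stored with more precision than the two decimals shown.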
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"So we waltz in dressed as clowns?" Bill asked. "Exactly. Nobody questions clowns at a kid's party," Molly said. "What about the presents?" "We stuff them in the balloon bags." "Clever. And the cake?" "We're not stealing cake, Bill." "Why not? It's right there." "Focus. We grab the gifts, we leave. In and out, three minutes." "What if a kid recognizes me?" "From where, exactly?" "I do birthdays on weekends." Molly stared at him. "You're a part-time party clown?" "It pays well, Molly. Children love me." "That's actually perfect cover. You distract them with balloon animals." "I only know the giraffe." "One animal?" "And a sword. Sort of." "Make the sword, hand out giraffes, I'll clear the gift table." "What if the parents are watching?" "They never watch. They're all on their phones." "True. My last gig, a dad missed his own kid's face-planting in the piñata." "See? We're invisible." "What's our escape route?" "The bouncy castle." "We escape through a bouncy castle?" "Behind it, Bill. There's a fence." "I can't climb fences. My knee." "Then we take the minivan." "Whose minivan?" "Doesn't matter. Keys are always in the diaper bag." "Molly, you're a genius." "I know. Now grab your nose."
created_at
"2026-05-31T11:43:14.504Z"
5/31/2026, 11:43:14 AM
model
"anthropic/claude-4.8-opus-20260528"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1614
moderation_latency
(null)
generation_time
10831
tokens_prompt
77
tokens_completion
356
native_tokens_prompt
93
native_tokens_completion
495
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"end_turn"
service_tier
"default"
usage
0.01284
router
(null)
provider_responses
0
endpoint_id
"b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
id
"msg_01P9FzPP4M3n5L3tBZZM9D9e"
is_byok
false
latency
1574
model_permaslug
"anthropic/claude-4.8-opus-20260528"
provider_name
"Anthropic"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer
(null)
request_id
"req-1780227794-W1Ttdd9AGOCgRtEpMi9o"
session_id
(null)
preset_id
(null)
api_type
"completions"
id
"gen-1780227794-hfKK3fjtcNRAGqdc02H0"
upstream_id
"msg_01P9FzPP4M3n5L3tBZZM9D9e"
total_cost
0.01284
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Anthropic"
response_cache_source_id
(null)
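The reported cost of 0.01284 is consistent with the native token counts above (93 prompt, 495 completion) under per-token pricing. The rates used below, $5 per million input tokens and $25 per million output tokens, are an assumption inferred from those numbers; the run record does not state them. A minimal sketch of the arithmetic:

```python
from decimal import Decimal

# Assumed rates in USD per million tokens (inferred, not stated in the record).
INPUT_USD_PER_MTOK = Decimal("5")
OUTPUT_USD_PER_MTOK = Decimal("25")


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Return the estimated cost in USD for one request."""
    million = Decimal(1_000_000)
    total = (prompt_tokens * INPUT_USD_PER_MTOK
             + completion_tokens * OUTPUT_USD_PER_MTOK)
    return total / million


# native_tokens_prompt and native_tokens_completion from the metadata above.
cost = estimate_cost(93, 495)
print(f"${cost:.6f}")
```

Decimal is used rather than float so the comparison against the reported figure is exact.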
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false, words: 201
87.39% | Dialogue to Total Word Ratio | Ratio: 96.06%, Deviation: 6.06% | neededClean: false, wordsTotal: 203, wordsDialogue: 195
93.6948% | Overall
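The ratio and deviation reported by the dialogue evaluator follow directly from its word counts (195 dialogue words out of 203) and the scenario's 90% target. The sketch below reproduces that arithmetic. The `dialogue_words` helper is hypothetical: it counts words inside straight double quotes and is not claimed to match the benchmark's own tokenization, nor does the sketch attempt to reproduce how the deviation is mapped to the 87.39% score.

```python
import re

TARGET_RATIO = 0.90  # from the scenario: "90% dialogue"


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Hypothetical helper: count words inside straight double quotes."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return (dialogue ratio, absolute deviation from the target ratio)."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Counts taken from the evaluator's metadata.
ratio, deviation = ratio_and_deviation(195, 203)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

Here the response overshoots the target: it contains more dialogue than the 90% requested, which is why the evaluator reports a nonzero deviation.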