NC Bench
Run: 2026-05-29-01KSSJKY0BTCJWX9TH8ZEHEGTW
Run Details
Model: Claude Opus 4.8 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 23.2s
Cost: $0.029690
Provider: Anthropic
Evaluation summary
Score | Evaluator
99.98% | Matches word count
48.40% | Dialogue to Total Word Ratio
74.1931% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the blueprints across the kitchen table, smoothing out the wrinkles with the reverence of a surgeon prepping for an operation. "Okay, Bill. The target is the Henderson residence. Saturday. Two o'clock."
Bill leaned over, squinting. "Is that... a bouncy castle?"
"Affirmative. Inflatable structure, northeast corner of the backyard. We'll use it for cover."
"Molly, we're stealing birthday presents from a seven-year-old."
"We're *liberating* assets," she corrected, jabbing the map with a pen. "Do you know how much a Nintendo Switch goes for these days? And these kids get like fifteen of them. It's economic redistribution."
"It's a felony."
"It's a *small* felony. Now focus. What's our entry point?"
Bill scratched his chin. "The clown."
"The what?"
"They hired a clown. Bozo's Birthday Bonanza. I saw it on the Facebook invite."
Molly's eyes narrowed. "You're friends with the Hendersons?"
"I'm friends with *everybody*, Molly. That's reconnaissance."
"Fine. So we pose as entertainment. You distract the parents, I grab the loot from the present table. In and out, ninety seconds."
"And the costumes?"
She reached into a duffel bag and tossed him a rumpled wad of polyester. Bill held it up. It was a banana suit.
"I'm not wearing this."
"You're a singing telegram. It's your cover."
"What are *you* wearing?"
Molly pulled out a second costume. It was an identical banana.
"We're both bananas?"
"It's less suspicious if we match."
Bill groaned, stepping into the costume one leg at a time. "This is the dumbest heist we've ever planned. Worse than the time you tried to rob the aquarium."
"That dolphin had it coming."
"And what about security?"
"The dad has a phone with a doorbell camera. Otherwise, we're dealing with civilians under four feet tall."
"Children are unpredictable, Molly. They bite. They scream. One of them might be feral."
"Then we don't make eye contact." She zipped up her banana to the chin. "Listen. The presents are stacked by the gift table. Wrapped, sorted, completely unguarded. The adults will be too busy filming the cake to notice two professionals at work."
"Two bananas at work."
"Two *highly trained* bananas." She slapped the table. "Now, escape route. We exit through the side gate, hop the fence, and the getaway car—"
"Is your mom's Prius."
"—is positioned for rapid extraction, yes."
Bill waddled toward the mirror, examining himself. "I look ridiculous."
"You look *delicious*. Now, run the plan back to me."
He sighed, holding up his stubby yellow arms. "We infiltrate via clown affiliation, distract the parents with festive banana energy, secure the presents, and exfiltrate through the side gate into your mother's hybrid vehicle."
"Beautiful." Molly beamed. "Bill, this is the heist that changes everything. We retire after this. Beachfront property. No more crime."
"You said that about the aquarium."
"The aquarium was a learning experience."
Bill straightened his banana stem. "If I get arrested in this costume, I'm telling them it was your idea."
"Noted." Molly tossed him a party hat. "Now smile. We've got a birthday to ruin."
created_at: "2026-05-29T09:56:30.869Z" (5/29/2026, 9:56:30 AM)
model: "anthropic/claude-4.8-opus-20260528"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1141
moderation_latency: (null)
generation_time: 23197
tokens_prompt: 77
tokens_completion: 844
native_tokens_prompt: 93
native_tokens_completion: 1169
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
service_tier: "default"
usage: 0.02969
router: (null)
provider_responses:
  0:
    endpoint_id: "b0d3beaa-5ff7-4666-a749-b1bb2b14c916"
    id: "msg_01F6mSjDsKeuxoRSNizCVYng"
    is_byok: false
    latency: 1141
    model_permaslug: "anthropic/claude-4.8-opus-20260528"
    provider_name: "Anthropic"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.16.0; linux; x64))"
http_referer: (null)
request_id: "req-1780048590-du9V9NaSczWCsTFx4Hdv"
session_id: (null)
api_type: "completions"
id: "gen-1780048590-6alN6zIL6TtUl9u3rj2p"
upstream_id: "msg_01F6mSjDsKeuxoRSNizCVYng"
total_cost: 0.02969
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Anthropic"
response_cache_source_id: (null)
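As a rough cross-check, the reported cost and duration can be related to the token counts above. The sketch below is illustrative only: the per-token prices (5 and 25 USD per million input and output tokens) are assumptions that do not appear in the run metadata, chosen because they happen to reproduce the reported total_cost, and generation_time is assumed to be in milliseconds because 23197 matches the 23.2s duration shown in Run Details.

```python
# Assumed prices in USD per million tokens; NOT stated in the run metadata.
ASSUMED_INPUT_PRICE_PER_M = 5.0
ASSUMED_OUTPUT_PRICE_PER_M = 25.0


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD under the assumed prices."""
    total_micro_usd = (
        prompt_tokens * ASSUMED_INPUT_PRICE_PER_M
        + completion_tokens * ASSUMED_OUTPUT_PRICE_PER_M
    )
    return total_micro_usd / 1_000_000


# Values copied from the metadata above.
native_tokens_prompt = 93
native_tokens_completion = 1169
generation_time_ms = 23197  # assumed to be milliseconds

cost = estimate_cost(native_tokens_prompt, native_tokens_completion)
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${cost:.5f}")
print(f"throughput: {tokens_per_second:.1f} native tokens/s")
```

Under these assumptions the estimate agrees with the reported total_cost of 0.02969. The match suggests, but does not prove, that billing used the native token counts rather than the normalized tokens_prompt and tokens_completion values.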
Evaluation details
Result | Evaluator | Details | Meta Data
99.98% | Matches word count | n/a | neededClean: false; words: 502
48.40% | Dialogue to Total Word Ratio | Ratio: 73.81%, Deviation: 3.81% | neededClean: false; wordsTotal: 504; wordsDialogue: 372
74.1931% | Overall
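The ratio and deviation reported for the dialogue evaluator can be reproduced from its metadata. The following is a minimal sketch, not the benchmark's actual implementation: the 70% target comes from the scenario text, while treating the overall score as an unweighted mean of the two evaluator scores is an assumption that happens to fit the displayed numbers. How the 3.81% deviation maps to the 48.40% score is not stated on the page, so it is not modeled here.

```python
TARGET_RATIO = 0.70  # from the scenario: "Write 500 words with 70% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that fall inside dialogue."""
    return words_dialogue / words_total


# Values copied from the evaluator metadata.
ratio = dialogue_ratio(words_dialogue=372, words_total=504)
deviation = abs(ratio - TARGET_RATIO)

# Assumption: the overall score is the unweighted mean of the evaluator scores.
evaluator_scores = [99.98, 48.40]
overall_estimate = sum(evaluator_scores) / len(evaluator_scores)

print(f"ratio: {ratio:.2%}, deviation: {deviation:.2%}")
print(f"overall (estimated from rounded scores): {overall_estimate:.2f}%")
```

The estimate differs from the reported 74.1931% only in the later decimals, which is expected because the inputs here are the rounded scores shown on the page. The two evaluators also report slightly different totals (502 and 504 words), presumably because they count words differently.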