NC Bench
Run: 2026-02-16-01KHM5X149CEWTCP9Y4J6X0TFQ
Run Details
Model: DeepSeek V3.1
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 11.7s
Cost: $0.001392
Provider: WandB
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The map of the upscale suburban neighborhood was spread across their rickety kitchen table, held down at the corners by a half-empty beer bottle, a grease-stained pizza box, and a set of lock picks.

Molly tapped a perfectly manicured nail on a house circled in red. “The Hardwick residence. Saturday. Three PM. The目标 [target] is the birthday haul for one… Chloe Hardwick, age seven.”

Bill squinted at the drawing next to the address, a crude crayon sketch of a unicorn with lopsided wings. “A kid’s party? Mol, we stole the Crown Jewels of Cadabra. We lifted the Diamond As Big As Your Head from the Museum of moderately Interesting Rocks. This is a new low.”

“It’s a high-yield, low-risk operation,” Molly countered, her voice smooth as silk. “The parents will be distracted by twenty sugar-crazed maniacs and a clown they found on the internet named ‘Tingles.’ The security system is a joke. The dog is a geriatric Labrador named Mr. Snuffles who can be neutralized with a single slice of pepperoni.”

“Neutralized? He’s not a security system, he’s a senior citizen. I’m not pepperoni-bombing a pensioner.”

“Fine. Bribe him with chin scratches. The point is, the presents will all be in one room, presumably wrapped. We go in during the peak chaos of the piñata incident. In and out. Nobody gets hurt.”

Bill ran a hand over his face. “What’s the play? We can’t exactly stroll in.”

Molly’s smile was wicked. “We’re the entertainment. The clown cancelled. Food poisoning from a questionable schwarma. We got the gig.”

Bill stared, his jaw slack. “You signed us up to be clowns.”

“You’ll be great! You’re already got the big shoes and the perpetually sad expression. Your character is ‘Weepy the Clown.’ You just wander around looking tragically hilarious. I’ll be ‘Sparklefists,’ the magician. My sleight-of-hand will be put to its ultimate test: making an entire pile of presents disappear.”

“Sparklefists? Weepy the Clown?” Bill groaned, sinking into a chair. “The crew is going to hear about this. The guys in prison are going to hear about this. My reputation will be a smoking crater.”

“Your reputation will be that of a man who pulled off the cleanest, quietest heist of the year,” Molly said. “Think of the score. All those shiny, new, unopened toys. The resale value on collectible dolls alone is staggering. We’re not stealing from a child, Bill. We’re… laterally redistributing plastic wealth.”

“She’s seven, Molly. What if she sees me? What if I have to make a balloon animal? I can’t make a balloon animal. The only thing I can make is a balloon… sad tube.”

“You point the balloon and make a squeaky noise. The kids will be terrified. It’s perfect misdirection.” She leaned in, her eyes gleaming. “This is it, Bill. One last job. We get the presents, we fence them online, and we retire to a tropical island where no one will ever ask us about our clown phase.”

Bill looked at the map, then at the hopeful, slightly unhinged look on Molly’s face. He sighed, a long, defeated sound. “What’s my code name for this disaster?”

Molly handed him a giant pair of oversized polka-dot pants and a bright red nose. “It’s ‘Honk if you see a可疑 [suspicious] activity.’ Now practice your frown. You need to look authentically weepy.”

Bill put his head in his hands. “I don’t need to practice.”
created_at: "2026-02-16T21:32:42.908625+00:00" (2/16/2026, 9:32:42 PM)
model: "deepseek/deepseek-chat-v3.1"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 247
moderation_latency: (null)
generation_time: 11553
tokens_prompt: 95
tokens_completion: 903
native_tokens_prompt: 68
native_tokens_completion: 821
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00139205
router: (null)
provider_responses[0]:
  id: "cmpl-c9674b49a68e42859378740336fe967a"
  status: 200
  is_byok: false
  latency: 53
  endpoint_id: "a50fbc5c-1405-4940-92ad-c455b0396643"
  provider_name: "WandB"
  model_permaslug: "deepseek/deepseek-chat-v3.1"
api_type: "completions"
id: "gen-1771277550-UEMcatswVxlM3on7jsCn"
upstream_id: "cmpl-c9674b49a68e42859378740336fe967a"
total_cost: 0.00139205
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "WandB"
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false, words: 573
41.9004% | Dialogue to Total Word Ratio | Ratio: 74.17%, Deviation: 4.17% | neededClean: false, wordsTotal: 573, wordsDialogue: 425
Overall: 20.9502% (the mean of the two results above)
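The ratio and deviation figures can be reproduced from the reported counts (425 dialogue words out of 573 total, against the scenario's 70% target). The sketch below is a hypothetical re-implementation, not NC Bench's evaluator: it assumes a word is a whitespace-separated token and that dialogue is any text inside curly double quotes, the quoting style this response uses. The names `count_words`, `dialogue_ratio`, and `deviation` are illustrative only.

```python
import re
from statistics import mean

# Hypothetical re-implementation for illustration only: NC Bench's real
# evaluator is not published on this page. Assumptions: a "word" is a
# whitespace-separated token, and dialogue is any text inside curly
# double quotes, which is the quoting style this response uses.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def dialogue_ratio(prose: str) -> tuple[int, int, float]:
    """Return (total words, dialogue words, dialogue / total)."""
    total = count_words(prose)
    dialogue = sum(count_words(quote) for quote in DIALOGUE_RE.findall(prose))
    return total, dialogue, (dialogue / total if total else 0.0)


def deviation(ratio: float, target: float = 0.70) -> float:
    """Absolute distance from the scenario's 70% dialogue target."""
    return abs(ratio - target)


# Small made-up sample in the same quoting style as the response.
SAMPLE = "“Neutralized? He’s not a security system.” Bill frowned. “Fine.”"
print(dialogue_ratio(SAMPLE))  # (9, 7, 0.7777777777777778)

# Reproduce the reported line from the counts on this page.
ratio = 425 / 573
print(f"Ratio: {ratio:.2%}, Deviation: {deviation(ratio):.2%}")
# Ratio: 74.17%, Deviation: 4.17%

# The 20.9502% figure matches the mean of the two evaluator results
# (an inference from the arithmetic, not a documented scoring rule).
print(f"{mean([0.0, 41.9004]):.4f}%")  # 20.9502%
```

How the 4.17% deviation maps to the 41.9004% result is not stated on this page, so the sketch stops at the ratio and deviation.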