NC Bench
Run: 2026-04-25-01KQ2FJHXR65S7X6WVTQY9TBZ1
Run Details
Model: DeepSeek V4 Pro
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 26.6s
Cost: $0.000914
Provider: DeepSeek
Evaluation summary
Score      Evaluator
0.00%      Matches word count
0.00%      Dialogue to Total Word Ratio
0.0000%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Bill, I can’t believe we’re casing a party where the average guest still believes in the afterlife for hamsters.” “Molly, you agreed to this. You said, and I quote, ‘We need a low-risk, high-reward gig that doesn’t involve laser grids or angry Dobermanns.’” “I didn’t know the alternative was a bouncy castle and a room full of tiny, sticky informants. My face is already on a milk carton just from standing here.” “Will you look at the guest list, though? Little Timothy Turnbull is the grandson of Harold Turnbull. The man owns half the cacao plantations in South America. Every present in that room is wrapped in paper made from crushed Fabergé eggs. We’re not swiping plastic dinosaurs, Moll. We’re liberating first-edition, signed picture books and bespoke, Italian toddler loafers.” “Right, and how do you propose we waltz out with an armful of loot without seventy weeping sugar-fiends setting off a psychological alarm?” “Simplicity itself. We’re not thieves tonight. I’m Gary the Gullible, a third-rate children’s entertainer with a profound inability to remember anyone’s name. You’re my glum, long-suffering assistant, Janet the Jittery, who hates the sound of laughter.” “I’m not being Janet the Jittery.” “Fine. You can be Wanda the Weeper. Your act is you just stand in the corner and cry into a handkerchief soaked in a non-toxic, lavender-scented chloroform derivative of my own invention.” “You’ve invented a napkin that gently puts children to sleep? That’s the most disturbing thing you’ve ever created, and I’ve seen you make a grappling hook out of a stapler and a will to live.” “It’s a revolutionary sedative. They’ll think they’re just overcome with the existential ennui of a failed magic trick. The plan is airtight. I make a balloon poodle, I dab it with the 'Oblivion Oshibori', the kid’s eyes flutter shut, and the parents are so grateful for a moment of peace they won’t even notice we’re stacking gift bags like a fortress of solitude.” “And the cake? 
The cake is a sentinel. It’s a four-tiered fondant monstrosity with a working miniature train on it. That’s a mechanical booby trap. The birthday boy’s uncle is an engineer with too much time and clearly a vendetta against joy. The train will derail the entire operation.” “We avoid the cake. Rule one of the children’s-party circuit: never engage with a dessert that’s trying to achieve locomotion.” “Okay, so I’m Wanda the Weeper, you’re Gary the—what pharmacological nightmare did you call him? The Gullible? What’s your actual trick? You can’t do magic.” “What do you mean? I’m a fantastic magician. I made my last three identities vanish without a trace.” “You’re a felon in a polyester waistcoat holding a handkerchief that smells like a spa in a dystopian novel. What do you do if a four-year-old asks to see a card trick and they’re somehow resistant to your sadness-shammy?” “I distract them with the rabbit.” “We don’t have a rabbit. The budget for this job didn’t stretch to an emotional-support animal with a pension plan.” “We have the cuckoo clock I retrofitted with fur and a twitchy nose. I call him Mr. Twitchy, the Chrono-Bunny. You wind him up, and on the hour, he pops out, twitches three times, and lays a single chocolate-covered espresso bean. It’s a perfect distraction. By the time the caffeine kicks in and the children start vibrating at a frequency unseen in nature, we’ll be in a van three blocks away, splitting the proceeds from a hand-stitched cashmere romper.” “I’m going to need more chloroform. For me. So, to recap, our masterstroke is to infiltrate a sugar-laced pack of feral trust-fund heirs, bore them into a chemical coma with terrible magic, and rob them blind while their parents applaud the silence.” “Now you’re getting into the spirit, Wanda. Did you remember the non-slip grip socks? The lion-tamer motif is essential for blending in.” “They’re in the bag, Bill. 
Right next to the industrial-grade tissues and a signed, legally questionable waiver for any 'emotional distress caused by a weeping stranger and a twitchy squirrel-cuckoo hybrid'. I’m ready. Let’s go ruin some memories.”
created_at: "2026-04-25T14:09:26.129Z" (4/25/2026, 2:09:26 PM)
model: "deepseek/deepseek-v4-pro-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1141
moderation_latency: (null)
generation_time: 24746
tokens_prompt: 77
tokens_completion: 1111
native_tokens_prompt: 67
native_tokens_completion: 1017
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.000913935
router: (null)
provider_responses:
  0:
    endpoint_id: "d084170b-dd99-4e53-9ae1-b63881f3cfbf"
    id: "7e2b5740-bb15-4930-b0ba-88ac12549394"
    is_byok: false
    latency: 279
    model_permaslug: "deepseek/deepseek-v4-pro-20260423"
    provider_name: "DeepSeek"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777126166-jjRSpEtFxCdL5U1CRBmf"
session_id: (null)
api_type: "completions"
id: "gen-1777126166-j3q5Okb0UsHuWob0YTR9"
upstream_id: "7e2b5740-bb15-4930-b0ba-88ac12549394"
total_cost: 0.000913935
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean: false, words: 684
Dialogue to Total Word Ratio
  Result: 0.00%
  Details: Ratio: 100.00%, Deviation: 30.00%
  Meta Data: neededClean: false, wordsTotal: 706, wordsDialogue: 706
0.0000%
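The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator are consistent with wordsDialogue / wordsTotal and the absolute distance from the 70% target in the scenario. The sketch below illustrates that arithmetic. It is not the evaluator's actual code: the function names dialogue_ratio and deviation are hypothetical, and the counting rule (whitespace-separated words, dialogue defined as text between double quotes) is an assumption. How a deviation is mapped to the final 0.00% score is not shown on this page, so it is left out.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: words are whitespace-separated tokens, and dialogue is any
    text enclosed in curly or straight double quotes.
    """
    total_words = len(text.split())
    # Capture the text between an opening and a closing double quote.
    quoted_spans = re.findall(r'[“"]([^”"]*)[”"]', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation(ratio: float, target: float) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Figures reported above: wordsDialogue = 706, wordsTotal = 706, target = 70%.
reported_ratio = 706 / 706
reported_deviation = deviation(reported_ratio, 0.70)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {reported_deviation:.2%}")
```

With the reported counts this reproduces the "Ratio: 100.00%, Deviation: 30.00%" line. The response contains no narration outside quotation marks, which is why every counted word falls inside dialogue.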