NC Bench
Run: 2026-04-25-01KQ2G1ZMW266FZRW2ZXGN981F
Run Details
Model: DeepSeek V4 Pro (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 28.2s
Cost: $0.001069
Provider: DeepSeek
Evaluation summary
Score      Evaluator
0.00%      Matches word count
0.00%      Dialogue to Total Word Ratio
0.0000%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The air inside the converted ice-cream van tasted of desperation and cheap cherry air freshener. Molly unfolded a crayon-drawn blueprint across a stack of deflated bouncy castle vinyl, her finger tapping on a wobbly rectangle labeled “PRESENT MOUNTAIN (TREASURE ZONE).” “This is the kill box,” she said, her voice a low, conspiratorial hum. “See that? Right there, under the pin-the-tail-on-the-donkey poster. A tower of loot wrapped in unicorn paper so glittery it’s practically a homing beacon. We’re not dealing with amateurs, Bill. This is a six-year-old’s birthday. The security will be primal. Crayon-based, but primal.” Bill, a man whose face looked like a forgotten potato, squinted at the drawing. “What’s the red squiggle? The one circling the whole living room? Looks like a possessed piece of spaghetti.” “That’s the primary threat. A bouncy castle. Not for bouncing. For territorial demarcation. Mrs. Gable, the birthday boy’s mother, is a former Olympic volleyball player. She’s got a serve that can take the curl out of a toupee at twenty paces. We’re going in during the pizza limbo, when her glucose levels spike and her guard drops for a five-minute window. That’s our ingress point.” Bill chewed his lip, which was a nervous habit and a horrifying spectacle. “I don’t like kids. Their fingers are always sticky. You said this was a silent alarm job. Sticky fingers are the opposite of silent. They’re a tactile betrayal.” “The party bags,” Molly said, ignoring him and tracing a line to a secondary blob. “Are our alibi. Each one is a strategic diversion. We swap their contents: we replace the plastic frog that jumps when you push its back with a remote-controlled smoke pellet of my own design. The one I call the ‘Toddler Distractor 3000’. Press a button, and a dozen tiny smoke plumes and wiggly-eye springs go off behind the sofa. The parents will think it’s a rogue science kit. The kids will migrate towards the chaos. 
We grab the mountain.” “You want to weaponize party favours.” Bill’s tone was flat. “I want to *liberate* the loot under the cover of supervised anarchy. Now, your role. You’re the entertainer. Clown-Cop Bill. The world’s most bureaucratic children’s performer.” She slid a photocopied badge across the crayon map. “You slippers. You produce a fake but official-looking ‘Noise Violation Ticket’ for the bouncy castle. You’ll need a clipboard. An unflappable belief in municipal code is the most disarming thing a clown can possess.” “A clown-cop. And what’s your cover? The deputy of regret?” “Face-painter. An artist of sorrow. I paint tears on their happy little cheeks.” Molly allowed herself a rare, thin smile. “They queue up. ‘Paint me sad,’ they’ll beg. It’s a profound and distracting existential crisis I’ll dab on with a sponge. While you’re citing the inflatable structure for excessive boinging, and the toddlers are exploring their inner bleakness, I’ll be bagging the deluxe Lego and the remote-control dinosaur. The one that roars and the birthday boy’s uncle definitely overpaid for.” Bill sighed, a long, defeated sound like a balloon giving up. “I hate the red nose. It honks every time I breathe. My allergies will give us away.” “The honk is your human shield, Bill. No one interrogates a man whose breathing plays a jaunty little tune. Are your pockets sewn for the quick-cache? We need to ditch the big stuff down the laundry chute you scoped.” “The chute’s greased. Deposits bypass the basement playroom and land directly in a massive pile of Teddy’s own plushies. Silent as a kitten’s thought.” He paused. “What about the piñata?” “Don’t talk to me about the piñata,” Molly hissed, her eyes turning to flint. “That’s a wild card. A blindfolded child with a wiffle bat is a chaos agent I haven’t cracked. We treat it like a bomb. If they swing before the fourteen-minute mark, we abort. If the candy cascades, it’s a hundred tiny hostages to stomping feet. 
We’re professionals, not monsters.” She rapped the map. “Any more questions, or can we review your honk-muffling technique one more time?”
created_at: "2026-04-25T14:17:49.985Z" (4/25/2026, 2:17:49 PM)
model: "deepseek/deepseek-v4-pro-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1020
moderation_latency: (null)
generation_time: 27647
tokens_prompt: 77
tokens_completion: 1304
native_tokens_prompt: 67
native_tokens_completion: 1195
native_tokens_completion_images: (null)
native_tokens_reasoning: 184
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.001068795
router: (null)
provider_responses:
  0:
    endpoint_id: "d084170b-dd99-4e53-9ae1-b63881f3cfbf"
    id: "4aaf5163-24c8-4187-ab17-3c05ba26d8d0"
    is_byok: false
    latency: 272
    model_permaslug: "deepseek/deepseek-v4-pro-20260423"
    provider_name: "DeepSeek"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777126669-R9FK8e6JAjnQX7wXI0HU"
session_id: (null)
api_type: "completions"
id: "gen-1777126669-Dw4WvrbXLhPn7JBrFyEc"
upstream_id: "4aaf5163-24c8-4187-ab17-3c05ba26d8d0"
total_cost: 0.001068795
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
Evaluation details
Result: 0.00%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false, words: 673

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 81.39%, Deviation: 31.39%
Meta Data: neededClean: false, wordsTotal: 693, wordsDialogue: 564

Overall: 0.0000%
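
The reported ratio and deviation can be reproduced from the two counts in the evaluation metadata. The sketch below is only an illustration: the function names, the curly-quote regex, and the whitespace-based word splitting are assumptions, not the evaluator's actual implementation. The page itself reports two different totals (673 and 693), so its counting rules evidently differ between evaluators.

```python
import re

# Target share of dialogue, taken from the scenario ("50% dialogue").
TARGET_PERCENT = 50.0


def count_dialogue_words(text):
    """Return (dialogue_words, total_words) for a piece of prose.

    Assumption: dialogue is whatever sits between curly quotes, and
    words are whitespace-separated tokens. The real evaluator may
    tokenize differently.
    """
    total = len(text.split())
    spoken = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in spoken)
    return dialogue, total


def ratio_and_deviation(words_dialogue, words_total, target_percent=TARGET_PERCENT):
    """Return (ratio %, absolute deviation from target %), rounded to 2 places."""
    ratio = 100.0 * words_dialogue / words_total
    return round(ratio, 2), round(abs(ratio - target_percent), 2)


# Counts reported for this run: wordsDialogue = 564, wordsTotal = 693.
print(ratio_and_deviation(564, 693))  # (81.39, 31.39)
```

With the reported counts this gives a ratio of 81.39% and a deviation of 31.39 percentage points from the 50% target, matching the Details field. How that deviation maps to the 0.00% score (for example, a tolerance band) is not stated on the page.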