NC Bench
Run: 2026-04-25-01KQ2DVPTMVTFYG0E53QRCV7EN
Run Details
Model: DeepSeek V4 Pro (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 4m 8s
Cost: $0.008077
Provider: DeepSeek
Evaluation summary
Matches word count: 95.99%
Dialogue to Total Word Ratio: 88.84%
Overall: 92.4129%
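The page does not say how the overall score is derived from the two evaluator scores. As a quick check, the sketch below tests one plausible reading: that the overall figure is an unweighted mean of the evaluator scores. That aggregation rule is an assumption, not something the page states, and the inputs are the two-decimal values as displayed, so only agreement within rounding can be expected.

```python
# Scores as displayed in the evaluation summary (rounded to two decimals on the page).
scores = {
    "Matches word count": 95.99,
    "Dialogue to Total Word Ratio": 88.84,
}
reported_overall = 92.4129

# Assumption: the overall score is an unweighted mean of the evaluator scores.
mean_score = sum(scores.values()) / len(scores)

# Each input may be off by up to 0.005 from rounding, so the mean may be too.
consistent = abs(mean_score - reported_overall) <= 0.005

print(f"mean={mean_score:.4f} reported={reported_overall} consistent={consistent}")
```

The displayed scores average to about 92.415, which agrees with the reported 92.4129% within rounding. This supports the unweighted-mean reading but does not prove it.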
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly unfurled the crinkled blueprint across the card table, its surface a chaos of pizza grease and glitter. She’d spent the morning sketching the target: a suburban backyard transformed into a pastel-colored fortress of childhood excess, complete with a bouncy castle shaped like a cartoon dog whose perpetually smiling plastic face seemed to mock their ambitions. Bill leaned in, a half-eaten lollipop sticking out of his shirt pocket like a sugary shiv. “The gifts are under that?” he asked. “Right under the inflatable canine’s rump,” Molly said, tapping the spot with a glue-gun calloused finger. “Operation Whimper Snap is a go.” He scratched his stubble, the sound rasping through the cramped hideout that doubled as her aunt’s mothball-scented storage unit. “Sounds risky.” “It’s a party for a four-year-old named Braxton, Bill. The only thing at risk is my remaining faith in curated birthday registries.” She then proceeded to outline her plan with the enthusiasm of a general who’d spent too long in the planning tent. Disguises were essential. From a duffel bag reeking of mothballs and desperation, she produced a unicorn headband whose horn listed to the left like a defeated party hat, and a pair of oversized striped trousers with reinforced pockets that could conceal a bread machine. Bill would be Gary the Mime, a silent purveyor of gluten-free crackers and misdirection, his painted-on smile a frozen mask of existential dread. Molly would be Sparkles, a singing unicorn whose off-key ballads could distract any parent desperate enough to maintain the illusion of a perfect party. The extraction method was a modified monster truck, gutted and fitted with a jerry-rigged remote control that sparked ominously, its cargo hold just large enough for a few dozen gift boxes if crushed with sufficient optimism. Molly demonstrated by cramming a naked Barbie doll inside; the truck’s plastic seams groaned in protest. 
“We just drive it out the back gate?” Bill asked, watching a wheel spin uselessly against the table, flinging a stray sequin into his coffee. “Through the neighbor’s prize-winning petunias and into the alley where the van idles. I call it Giftzilla. It’s got a turbo button and a glitch that makes it veer hard left, which adds an element of unpredictability.” She pressed the turbo button; the truck shot off the table and landed in an open box of donuts, sprinkling powdered sugar across the blueprint like a miniature blizzard. Bill stared at the sugary carnage, the droopy horn now perched on his head giving him the air of a man who had lost a very specific wager with a mythological creature. “What if the kids catch on and start crying?” Molly grinned, a smear of glaze on her chin. “That’s the beauty of it. No one questions a mime weeping silently in a corner while a unicorn belts a sorrowful, off-register rendition of ‘Let It Go.’ We’ll be untouchable. They’ll think we’re performance artists.” Bill adjusted the horn, which emitted a sad jingling sound. “I should have stuck to stealing office furniture. Ergonomic chairs don’t have birthday parties.”
created_at: "2026-04-25T13:39:27.192Z" (4/25/2026, 1:39:27 PM)
model: "deepseek/deepseek-v4-pro-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 854
moderation_latency: (null)
generation_time: 248247
tokens_prompt: 77
tokens_completion: 9039
native_tokens_prompt: 67
native_tokens_completion: 9250
native_tokens_completion_images: (null)
native_tokens_reasoning: 8557
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.008076645
router: (null)
provider_responses:
  0:
    endpoint_id: "d084170b-dd99-4e53-9ae1-b63881f3cfbf"
    id: "a8429fce-d2c0-4661-ac35-3a6ee760d090"
    is_byok: false
    latency: 271
    model_permaslug: "deepseek/deepseek-v4-pro-20260423"
    provider_name: "DeepSeek"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777124367-KOArv4isM4xVsbguWMeZ"
session_id: (null)
api_type: "completions"
id: "gen-1777124367-2JSomS9Td1xOD9cB8P0W"
upstream_id: "a8429fce-d2c0-4661-ac35-3a6ee760d090"
total_cost: 0.008076645
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
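The metadata above can be cross-checked against the Run Details at the top of the page. The sketch below makes two assumptions that the page does not state: that generation_time is in milliseconds, and that native_tokens_reasoning is counted inside native_tokens_completion. Under those assumptions it reproduces the displayed duration and cost and estimates how many completion tokens went to the visible answer.

```python
# Values copied from the metadata above.
generation_time_ms = 248247        # assumption: unit is milliseconds
total_cost = 0.008076645           # as reported in total_cost / usage
native_tokens_completion = 9250
native_tokens_reasoning = 8557

# Convert the generation time to the "Xm Ys" form used in Run Details.
minutes, remainder_ms = divmod(generation_time_ms, 60_000)
seconds = remainder_ms // 1000
duration = f"{minutes}m {seconds}s"

# Round the cost to six decimals, as shown in Run Details.
cost = f"${total_cost:.6f}"

# Assumption: reasoning tokens are a subset of the completion tokens.
visible_tokens = native_tokens_completion - native_tokens_reasoning

print(duration)        # 4m 8s
print(cost)            # $0.008077
print(visible_tokens)  # 693
```

Both derived values match the Run Details (4m 8s and $0.008077). If the subset assumption holds, roughly 92% of the completion tokens were spent on reasoning rather than on the scene itself.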
Evaluation details
Matches word count
Result: 95.99%
Details: n/a
Meta Data: neededClean: false, words: 508
Dialogue to Total Word Ratio
Result: 88.84%
Details: Ratio: 28.46%, Deviation: 1.54%
Meta Data: neededClean: false, wordsTotal: 520, wordsDialogue: 148
Overall
Result: 92.4129%
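The ratio and deviation reported for the dialogue evaluator follow directly from its word counts. The sketch below reproduces that arithmetic from wordsDialogue and wordsTotal, taking the 30% target from the scenario. It does not attempt to reproduce the 88.84% score, because the page does not give the formula that maps a deviation to a score. Note also that the two evaluators report different totals (508 words versus 520), which suggests they count words differently.

```python
# Values from the "Dialogue to Total Word Ratio" evaluator's meta data.
TARGET_RATIO = 30.0   # percent, from the scenario "Write 500 words with 30% dialogue"
words_total = 520
words_dialogue = 148

# Share of words that fall inside dialogue, as a percentage.
ratio = 100.0 * words_dialogue / words_total

# Absolute distance from the target, in percentage points.
deviation = abs(TARGET_RATIO - ratio)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints: Ratio: 28.46%, Deviation: 1.54%
```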