NC Bench
Run: 2026-04-25-01KQ2FC5KHQC5NE3WZEE3QZ09B
Run Details
Model: DeepSeek V4 Pro
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 21.5s
Cost: $0.000824
Provider: DeepSeek
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The air inside the van smelled of desperation and half-eaten gas station taquitos. Molly had a sparkly pink notepad balanced on her knee, a crayon in her hand because it was the only writing implement she could find in the glove box. “Alright, walk me through the ingress again, Bill. And use the code names this time. Operational security is paramount,” she said, tapping the crayon against her chin. “You want me to use the code names for a plan that involves a bouncy castle?” Bill asked, peering through a pair of binoculars at a house across the street that was festooned with enough rainbow balloons to signal aircraft. “I want you to use the code names *because* it involves a bouncy castle. It’s a sensitive environment. Go on.” Bill sighed, lowering the binoculars. “Fine. Operative Gummy Bear will approach the southern perimeter, marked by the piñata.” “That’s me, Operative Gummy Bear. I’m the muscle. Continue.” “The objective is to gain entry by exploiting a known vulnerability in the host’s emotional defenses,” Bill said, his voice a flat monotone. “Phase one: ‘The Adorable Distraction.’ I, Operative Juice Box, will say my kitten is stuck in a tree, requiring the immediate, heroic intervention of the birthday child’s father.” “Genius. A classic. The kitten that does not exist will be named Fluffernutter. It adds verisimilitude,” Molly said, scribbling with the crayon. “While you’ve got the Dad poking a stick at an empty sycamore, what’s my move?” “You, having previously and convincingly sobbed to the gatekeeping Aunt near the dessert table about a fictitious peanut allergy, will be granted full, sympathetic access to the gift table, which is discreetly located near the emergency rhubarb pie.” “I’m allergic to the very concept of legumes,” Molly nodded. “It gets very real, very fast. Then the extraction?” “Extraction is the tricky part. We don’t know the exact nature of the loot. 
It could be a Lego Death Star, it could be a pony-themed makeup kit that actually smells of toxic grape. Our gear has to be adaptable,” Bill explained, reaching into a duffel bag. He pulled out a colossal, brightly-wrapped box with a huge, clumsily-tied bow. “This is the Trojan Gift.” Molly squinted. “It’s… ostentatious.” “That’s the point. Operative Bubble Wrap will place this distraction box on the table. It’s filled with packing peanuts and one single, activated whoopee cushion. The resulting audio chaos will provide cover. You then shovel the real gifts into the false-bottomed wagon I’ll have positioned by the hydrangeas.” “And what is my extraction vehicle’s code name?” Molly asked, a slow smile spreading across her face. Bill sighed again, a deeper, more soul-weary exhalation. “The Getaway Wagon is to be known as… the Sugar Rush.” “The Sugar Rush!” Molly squealed, clapping her hands. “I love it! See, this is why we’re the best. No one suspects a weeping woman with a nut allergy and a deeply committed glitter glue enthusiast. We’re going to be swimming in plush unicorns and video game gift cards.” “We’re not keeping the plush unicorns, Molly. We’re strictly in this for the resale value of the high-end building sets,” Bill said, his eye twitching as a tiny guest in a superhero cape tripped over the garden hose. “The margin on pretend friendship bracelets is a nightmare.” “Speak for yourself, Bill. I’ve grown rather attached to the idea of a unicorn army. Alright, one more rehearsal. What’s my signal to abort the mission?” Bill stared at her for a long moment, the sounds of a children’s pop song drifting through the van’s thin metal walls. “If you see the clown, Molly. The one making balloon animals. If you see him twisting a poodle, we run. Everyone knows those things are structurally unsound and a clear indicator of an unsecured perimeter. This isn’t a game. This is a preschool birthday party.”
Response metadata
created_at: "2026-04-25T14:05:55.191Z" (4/25/2026, 2:05:55 PM)
model: "deepseek/deepseek-v4-pro-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 833
moderation_latency: (null)
generation_time: 21260
tokens_prompt: 77
tokens_completion: 1050
native_tokens_prompt: 67
native_tokens_completion: 914
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.000824325
router: (null)
provider_responses:
  0:
    endpoint_id: "13bda129-f8a6-4a60-b1db-98082e5f7ccd"
    is_byok: false
    latency: 131
    model_permaslug: "deepseek/deepseek-v4-pro-20260423"
    provider_name: "Together"
    status: 429
  1:
    endpoint_id: "d084170b-dd99-4e53-9ae1-b63881f3cfbf"
    id: "a76572bb-0f43-4cae-a9ef-d1c762d9dd1a"
    is_byok: false
    latency: 301
    model_permaslug: "deepseek/deepseek-v4-pro-20260423"
    provider_name: "DeepSeek"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777125955-yrhAgBSb4j4ay3X0pemf"
session_id: (null)
api_type: "completions"
id: "gen-1777125955-iPqW8ZTCzEBlJIJfG74Y"
upstream_id: "a76572bb-0f43-4cae-a9ef-d1c762d9dd1a"
total_cost: 0.000824325
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
Evaluation details
Matches word count: 0.00%
  Details: n/a
  Meta data: neededClean = false, words = 641
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio: 72.07%, Deviation: 22.07%
  Meta data: neededClean = false, wordsTotal = 648, wordsDialogue = 467
Overall: 0.0000%
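The ratio evaluator's figures are consistent with dividing wordsDialogue by wordsTotal (467 / 648 ≈ 72.07%) and taking the absolute distance from the 50% target (22.07%). The sketch below reproduces that arithmetic under that assumption. The quote-based word counter and all function names are hypothetical: this page does not show how the evaluators tokenize text (the word-count evaluator reports 641 words while the ratio evaluator reports 648), nor how a deviation is converted into the 0.00% score, so neither is modeled here.

```python
import re

# Hypothetical sketch: NC Bench's actual evaluator code is not shown on this page.
# Assumption: dialogue is the text between curly double quotes, as in the model response.
DIALOGUE_PATTERN = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simplistic tokenizer)."""
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage."""
    dialogue = sum(count_words(quoted) for quoted in DIALOGUE_PATTERN.findall(text))
    return dialogue, count_words(text)


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target: float = 0.5) -> tuple[float, float]:
    """Return the dialogue share and its absolute distance from the target share."""
    ratio = dialogue_words / total_words
    return ratio, abs(ratio - target)


# Counts reported for this run: wordsDialogue = 467, wordsTotal = 648.
ratio, deviation = ratio_and_deviation(467, 648)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

On this run's reported counts it prints `Ratio: 72.07%, Deviation: 22.07%`, matching the Details column above. Running `dialogue_stats` on the response text itself is not expected to reproduce 467 and 648 exactly, since the real tokenizer is unknown.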