NC Bench
Run: 2026-04-25-01KQ2DKNQHVCESJAV1AX4T98GD
Run Details
Model: DeepSeek V4 Pro (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 35.2s
Cost: $0.001037
Provider: DeepSeek
Evaluation summary
Score | Evaluator
0.00% | Matches word count
35.38% | Dialogue to Total Word Ratio
17.6879% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The sedan idled beneath a canopy of dripping oak trees, its windshield wipers beating a lethargic rhythm against the drizzle. Molly stared through the moisture-beaded glass at the house across the street, a sugar-frosted gingerbread nightmare of pastel balloons and a bouncy castle shaped like a vomiting unicorn. "The thing about four-year-olds," Molly said, tapping a finger on the steering wheel, "is they are deeply, profoundly stupid. It's our greatest asset." Bill, hunched in the passenger seat, was attempting to fold a clown onesie into a manageable square and failing miserably. A red foam nose perched on his knee kept rolling off. "I don't think stupid's the right word. More like… empirically unobservant. Wait, is that a real word?" "I'm not looking at a thesaurus, Bill. I'm looking at a heist. A heist of tiny, unwrapped loot." She gestured at the pile of glitter-encrusted packages visible through the bay window. "Presents stacked like ingots in the Temple of Toddler Doom. And our ticket in is right there." She pointed to a scribbled sign taped to the mailbox: *Party Entertainer – Use Side Gate.* "I still think the 'Magical Sparkle Wizard' is a stretch," Bill mumbled, holding up the purple velour robe with moons and stars. "It smells like a wet ferret." "That's the authentic wizard scent. Commit to the craft. You enter through the garden gate, hit 'em with the classic 'I have come to banish the Gloom Goblin' bit, and while nineteen sugar-brained kids are vying for a turn with the bubble wand, I’m crawling through the dog flap dressed as a giant gift bag. The birthday girl opens a present from 'Auntie Marge' she didn't know she had, and it's me, popping out to bag the rest." Bill blinked. "You want to be the Trojan Horse of birthday gifts? You think a child won't notice a human-shaped present?" "Bill, last week my niece asked a lamppost if it was her daddy. They won't notice. I'll freeze when they look. Children have the object permanence of a goldfish. 
By the time they re-focus, I'll be a motionless sack of loot." "And what do I do when the Gloom Goblin banishment fails to captivate?" "Improvise. Tell them the goblin stole all their noses. Kids love thinking their noses are gone. It's anarchy. Controlled anarchy." Molly took a bite of a candy cigarette. "We clear the playroom in under four minutes. The big-ticket items are obvious: the Lego Death Star, the motorized mini-jeep, anything in an Apple Box shape. Leave the stuffed animals; they're low margin and impossible to fence." "Right, because moving a hot pink Jeep Wrangler with a Frozen decal is a simple Sunday." "The black market for tiny electric vehicles is booming, Bill. Your lack of vision wounds me." Molly pulled a pair of oversized sunglasses from the visor. "Now, the sticky part is the magician who was actually booked. I paid him a hundred bucks to not show up and a further fifty to give me his stilts. I look excellent on stilts. I'll be up in the canopy with a grappling hook made of curly ribbon, ready to lower the big haul out the bathroom window before the piñata even breaks." Bill squinted. "You promised me we were done with themed grappling equipment after the tinsel garrotte incident." "That was a misunderstanding with the carolers and you know it." She revved the engine gently. "Final check. You’ve got the bubble mixture laced with glitter-glue to slow down any pursuing parents? The decoy piñata filled with live crickets as a distraction?" Bill nodded, patting a trembling cardboard donkey. "One small problem. What if the kids start crying? I can't handle a room full of crying kids. It breaks something in me, Molly." Molly placed a comforting hand on his shoulder. "That's the neat part. They're not your kids. Their tears are just the socially acceptable face of a successful operation. Now, put on your ferret robe, Sparkles. Softer targets have softer screams. Let's go ruin some core memories."
created_at: "2026-04-25T13:35:03.926Z" (4/25/2026, 1:35:03 PM)
model: "deepseek/deepseek-v4-pro-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 895
moderation_latency: (null)
generation_time: 35116
tokens_prompt: 77
tokens_completion: 1289
native_tokens_prompt: 67
native_tokens_completion: 1158
native_tokens_completion_images: (null)
native_tokens_reasoning: 238
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.001036605
router: (null)
provider_responses:
  0:
    endpoint_id: "d084170b-dd99-4e53-9ae1-b63881f3cfbf"
    id: "a2b2511a-d121-4c20-9e76-8b1c0abcb432"
    is_byok: false
    latency: 278
    model_permaslug: "deepseek/deepseek-v4-pro-20260423"
    provider_name: "DeepSeek"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777124103-geiV3aVl2mTr1jQudI7j"
session_id: (null)
api_type: "completions"
id: "gen-1777124103-163AKKw2ecD85vZD9Kh3"
upstream_id: "a2b2511a-d121-4c20-9e76-8b1c0abcb432"
total_cost: 0.001036605
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 670
35.38% | Dialogue to Total Word Ratio | Ratio: 74.56%, Deviation: 4.56% | neededClean: false, wordsTotal: 680, wordsDialogue: 507
17.6879% | Overall
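The reported ratio and deviation can be reproduced from the word counts above. The sketch below is illustrative only: the function names are hypothetical, the 70% target is taken from the scenario text, and treating the overall figure as the plain mean of the two evaluator scores is an assumption that happens to agree with the reported value to within rounding. How the 35.38% score is derived from the 4.56% deviation is not stated on this page, so it is not modeled here.

```python
# Hypothetical helpers; names are not taken from the NC Bench codebase.

def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that fall inside dialogue."""
    return words_dialogue / words_total


def deviation(ratio: float, target: float) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


def mean_score(scores: list[float]) -> float:
    """Unweighted mean of evaluator scores (assumed aggregation)."""
    return sum(scores) / len(scores)


# Values copied from the evaluation details.
ratio = dialogue_ratio(words_dialogue=507, words_total=680)
dev = deviation(ratio, target=0.70)  # 70% target from the scenario

# Displayed evaluator scores, in percent.
overall = mean_score([0.00, 35.38])

print(f"Ratio: {ratio:.2%}, Deviation: {dev:.2%}")
print(f"Overall (mean of displayed scores): {overall:.2f}%")
```

The mean of the displayed scores is 17.69%, which differs slightly from the reported 17.6879%, presumably because the page shows the dialogue score rounded to two decimals.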