NC Bench
Run: 2026-04-25-01KQ2JD2KDAK79TNN8KSK4SNXK
Run Details
Model: DeepSeek V4 Flash
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 11.7s
Cost: $0.000233
Provider: DeepSeek
Evaluation summary
Score      Evaluator
35.00%     Matches word count
0.00%      Dialogue to Total Word Ratio
17.5011%   Overall
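The overall score sits very close to the unweighted mean of the two evaluator scores. The sketch below only illustrates that arithmetic. It assumes the aggregate is a plain average, which the page does not state, and the small gap between 17.50% and the reported 17.5011% may come from the per-evaluator scores being rounded for display.

```python
from statistics import mean

# Per-evaluator scores as displayed in the summary (percent).
scores = {
    "Matches word count": 35.00,
    "Dialogue to Total Word Ratio": 0.00,
}

# Assumption: the overall score is an unweighted mean of the evaluator scores.
overall = mean(list(scores.values()))
print(f"Overall (unweighted mean): {overall:.2f}%")
```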
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“You sure about this, Bill?” Molly asked, adjusting the strap of her backpack. “Breaking into Timmy’s seventh birthday party feels… wrong.” “Wrong is relative,” Bill said, checking his watch. “The parents will be half-drunk on cheap wine by four. The kids are sugared-up zombies. And the presents are stacked like a gold mine under a piñata.” “But it’s a unicorn piñata. With rainbow-colored wings. I saw it through the window,” Molly said. “I’m not sure I can look a unicorn in the eye and steal from its shadow.” “It’s candy, Molly. Not a sacred beast. The unicorn is filled with cheap plastic toys and sugar that’ll make them see sounds. We’re doing them a favor.” “A favor? We’re stealing ‘Squeaky the Hedgehog’ from little Chloe.” “Chloe has twelve hedgehogs already. Her mom posted it on Facebook. We’re curating her collection.” Molly pulled out a crumpled napkin with a map. “Okay. The back gate has a latch that’s been broken since May. The dog is a elderly pug named Nugget. He barks once every twenty minutes, but only if you mention the mailman.” “So if we avoid postal conversation, we’re golden.” “And the bouncy castle. The rental guy always leaves the blower on. We can crawl under the castle and pop up near the gift table.” “Pop up? From a bouncy castle? I’m not a ninja magician.” “You’re a thief, Bill. Improvise.” “Fine. What’s the extraction plan?” Molly gestured with her hand like a pilot. “We grab the gift bags. I take the left side—educational toys, socks, and Amazon returns. You take the right—all the flashy stuff. We meet by the weeping willow in ten.” “And the clown?” “Bobbles the Clown is allergic to glitter. I have a bag of it in my pocket. If he sneezes, we have a five-second window of chaos.” “You carry glitter for emergencies?” “I carry glitter for all occasions. This is just a targeted use.” Bill sighed, rubbing his temples. “This is the weirdest heist we’ve ever done. The banker’s vault was simpler. Even the antique store. 
But children’s party… it’s got moral landmines.” “Moral landmines? Bill, last week you stole a prosthetic leg from a man who was napping at the park.” “He was napping on my bench! And he left it unattended! That’s entrapment.” “You sold it at a flea market for twelve dollars.” “It was a good leg. Okay. Fine. We do this. But if I get glitter in my eyes, you’re carrying me out.” “Deal. And Bill?” “Yeah?” “If you see a unicorn piñata with rainbow wings, don’t puncture it. It’s not worth the emotional baggage.” “I’ll let the kids do that. They’re the vandals. We’re just the facilitators.” Molly nodded, a grim smile on her face. “Let’s go steal some presents, Bill. And try not to make a child cry.” “That’s your job,” Bill said, pulling a pair of sunglasses from his pocket. “I’m the silent type.” “You haven’t been silent for forty-five minutes.” “That’s talking. I’m silent in the moment.” “Just get under the bouncy castle.” “I’m going.” “You’re not going.” “I’m going.” “Fine. Go.”
created_at
"2026-04-25T14:58:50.61Z"
4/25/2026, 2:58:50 PM
model
"deepseek/deepseek-v4-flash-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
446
moderation_latency
(null)
generation_time
11340
tokens_prompt
77
tokens_completion
890
native_tokens_prompt
67
native_tokens_completion
800
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00023338
router
(null)
provider_responses
0
endpoint_id
"722e0746-2034-4d6e-a08f-e6271849ee9e"
id
"aa20d6b2-58c5-4dae-82b1-30ef9d95abf6"
is_byok
false
latency
234
model_permaslug
"deepseek/deepseek-v4-flash-20260423"
provider_name
"DeepSeek"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777129130-Bzleip4GGRRnleMvStWp"
session_id
(null)
api_type
"completions"
id
"gen-1777129130-XbubF8b7wp4mcxd1qNY6"
upstream_id
"aa20d6b2-58c5-4dae-82b1-30ef9d95abf6"
total_cost
0.00023338
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepSeek"
response_cache_source_id
(null)
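A few derived figures can be read off the metadata above. The sketch below assumes that `latency` and `generation_time` are both in milliseconds, which the page does not state. Under that assumption their sum comes to roughly the 11.7s duration shown in the run details.

```python
# Values copied from the top-level metadata fields above.
latency_ms = 446                 # assumed: time until generation started, in ms
generation_time_ms = 11340       # assumed: generation duration, in ms
native_tokens_completion = 800

# End-to-end time and output throughput under the millisecond assumption.
total_s = (latency_ms + generation_time_ms) / 1000
tokens_per_s = native_tokens_completion / (generation_time_ms / 1000)

print(f"Total time: {total_s:.3f} s")
print(f"Throughput: {tokens_per_s:.1f} completion tokens/s")
```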
Evaluation details
Result     Evaluator                      Details                            Meta Data
35.00%     Matches word count             n/a                                neededClean: false; words: 518
0.00%      Dialogue to Total Word Ratio   Ratio: 89.48%, Deviation: 39.48%   neededClean: false; wordsTotal: 523; wordsDialogue: 468
17.5011%   Overall
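The reported ratio and deviation follow directly from the word counts: 468 dialogue words out of 523 total, measured against the 50% target in the prompt. The sketch below reproduces that arithmetic and adds a hypothetical counter. The evaluator's real tokenization is not shown on this page, and the two evaluators report different totals (518 and 523), so `count_words` and `DIALOGUE_RE` are assumptions for illustration only.

```python
import re

# Assumption: dialogue is any text between curly quotes, and a word is any
# whitespace-separated token. The evaluator's actual rules are not shown.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def ratio_stats(words_dialogue: int, words_total: int, target: float = 0.5) -> dict:
    """Return the dialogue ratio and its absolute deviation from the target."""
    ratio = words_dialogue / words_total
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


def dialogue_stats(text: str, target: float = 0.5) -> dict:
    """Count quoted words in a text and compute the same statistics."""
    words_dialogue = sum(count_words(m) for m in DIALOGUE_RE.findall(text))
    return ratio_stats(words_dialogue, count_words(text), target)


# Reproduce the reported figures from the counts in the Meta Data column.
reported = ratio_stats(words_dialogue=468, words_total=523)
print(f"Ratio: {reported['ratio']:.2%}, Deviation: {reported['deviation']:.2%}")
# prints: Ratio: 89.48%, Deviation: 39.48%

# Try the hypothetical counter on the opening sentence of the response.
sample = "“You sure about this, Bill?” Molly asked."
print(dialogue_stats(sample))
```

With these assumptions the sample sentence has 5 dialogue words out of 7. Nearly every line of the response is quoted speech with few dialogue tags, which is consistent with a ratio far above the 50% target.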