NC Bench
Run: 2026-04-25-01KQ2FRSB3VHF7N2CGKK78RSG8
Run Details
Model: DeepSeek V4 Flash
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 10.2s
Cost: $0.000197
Provider: DeepSeek
Evaluation summary
Score      Evaluator
0.00%      Matches word count
0.00%      Dialogue to Total Word Ratio
0.0000%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The clipboard was a lie. Bill held it like a shield, squinting at the inflatable castle that dominated the backyard like a neglected, bouncy lung. “The target is the ‘Gift Grotto,’” he said, jabbing a thumb at a gazebo swathed in crepe paper that had already started to droop. “Access through the main buffet line. Neutralize the magician.” Molly, crouched behind a parked minivan, adjusted her fake glasses. They were thick and smudged with what she hoped was icing. “Neutralize him? Bill, he’s sixty-two and his act is a bunny that keeps escaping. I saw it trying to get into the chip dip.” “A distraction is a distraction. Once the balloon animals go up, we go in. I’ll take the high-value items—the Lego kits, the drones, the screaming remote-control car.” “You can’t steal a scream, Bill.” “I can steal the car that makes it. It’s about asset acquisition.” He had a laminated map of the lawn, drawn on a napkin. “You handle the soft targets. Plush. The unicorns. The very large, unsettling giraffe.” Molly peered at the giraffe. It was wearing a party hat. “Why is the giraffe unsettling? It’s a plush toy.” “Its eyes follow the kids. I saw it watch a toddler eat a cupcake. It was judging him.” “That’s just polyester, Bill. You’re projecting.” He checked his watch, a cheap digital with a cartoon dinosaur on it. “In sixty seconds, Sheila, the mother, will attempt to light the grill. It will not light. She will panic. That’s our window.” “The kid’s name is Timmy, not ‘Target Alpha.’ And we’re robbing a nine-year-old.” “We are liberating items from a poorly managed supply chain. The child will bond with the experience. It builds character. Now, the cupcake table is self-serve. I’ll drop a smoke pellet in the punch bowl. You grab the loot and meet me by the bounce house’s deflation valve. We flood the yard with a gentle sigh of disappointment.” “And we escape looking like… what? Two jilted clowns?” “We look like Aunt Carol and her friend Steve who lost his way to the bathroom. 
The disguise is confidence.” He straightened his own badge, which read “Party Logistics.” It was a lie, and the lanyard was a repurposed bootlace. “Remember: if a kid asks you a question, tell them you are the ‘Late Present Specialist.’ It explains the bulging bags.” Molly looked at the crowd of shrieking children, their faces smeared with that terrifying, blue-dyed frosting. She watched a little girl hug the giraffe. It did not hug back. Bill was right. It was judging. “Fine,” she sighed. “But I’m not touching the giraffe. That thing is cursed.” “Noted. Cursed asset is non-operative. Steal the rainbow-colored slime that screams instead. It has a higher resale value.”
created_at
"2026-04-25T14:12:48.615Z"
4/25/2026, 2:12:48 PM
model
"deepseek/deepseek-v4-flash-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
466
moderation_latency
(null)
generation_time
10077
tokens_prompt
77
tokens_completion
740
native_tokens_prompt
67
native_tokens_completion
671
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00019726
router
(null)
provider_responses
0
endpoint_id
"722e0746-2034-4d6e-a08f-e6271849ee9e"
id
"e8896a36-dff2-4c6b-8d72-0fbf64fa05f9"
is_byok
false
latency
215
model_permaslug
"deepseek/deepseek-v4-flash-20260423"
provider_name
"DeepSeek"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777126368-8Li81uBFuF2rxMGLpWRK"
session_id
(null)
api_type
"completions"
id
"gen-1777126368-3GtpajogyuCyeKaiyGNB"
upstream_id
"e8896a36-dff2-4c6b-8d72-0fbf64fa05f9"
total_cost
0.00019726
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepSeek"
response_cache_source_id
(null)
Evaluation details
Result: 0.00%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean = false, words = 460

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 66.74%, Deviation: 36.74%
Meta Data: neededClean = false, wordsTotal = 469, wordsDialogue = 313
0.0000%
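
The reported ratio and deviation follow directly from the metadata above: 313 dialogue words out of 469 total is about 66.74%, which sits 36.74 percentage points above the 30% target. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It is an illustration only: the helper names (`count_words`, `dialogue_ratio`, `deviation_points`) are hypothetical, and it assumes dialogue is whatever sits inside curly double quotes and that words are whitespace-separated. The benchmark's actual tokenization and scoring thresholds are not shown on this page, and the two evaluators evidently count differently (460 words in one, 469 in the other).

```python
import re

# Assumption: dialogue is any text enclosed in curly double quotes.
DIALOGUE_SPAN = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed word definition)."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Fraction of words that fall inside quoted dialogue spans."""
    total = count_words(text)
    if total == 0:
        return 0.0
    spoken = sum(count_words(span) for span in DIALOGUE_SPAN.findall(text))
    return spoken / total


def deviation_points(ratio: float, target: float = 0.30) -> float:
    """Absolute distance from the target ratio, in percentage points."""
    return abs(ratio - target) * 100


# Reproduce the figures reported for this run from its metadata.
reported_ratio = 313 / 469
print(f"Ratio: {reported_ratio:.2%}")                            # Ratio: 66.74%
print(f"Deviation: {deviation_points(reported_ratio):.2f}%")    # Deviation: 36.74%
```

Calling `dialogue_ratio` on the model response text would give a comparable figure, though it may not match 66.74% exactly if the evaluator strips punctuation or handles quotes differently.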