NC Bench
Run: 2026-04-25-01KQ2J3QFTN301BHD1GJ74SZSY
Run Details
Model: DeepSeek V4 Flash (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 12.6s
Cost: $0.000242
Provider: DeepSeek
Evaluation summary
Matches word count: 60.28%
Dialogue to Total Word Ratio: 0.00%
Overall: 30.1376%
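The Overall figure is consistent with an unweighted mean of the two evaluator scores: (60.28 + 0.00) / 2 ≈ 30.14. That aggregation rule is inferred from the numbers shown here, not stated on the page, and `overall_score` is a hypothetical name used only for this check. A minimal sketch under that assumption:

```python
def overall_score(evaluator_scores):
    """Unweighted mean of per-evaluator scores, in percent.

    Assumption: the page does not document how Overall is computed;
    a plain mean happens to match the reported numbers.
    """
    return sum(evaluator_scores) / len(evaluator_scores)


# Scores as displayed on this page (rounded to two decimals).
displayed = [60.28, 0.00]
print(f"{overall_score(displayed):.2f}%")
```

If the mean assumption holds, the reported 30.1376% implies an unrounded word-count score of 60.2752%, which the page displays as 60.28%.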
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“I’m telling you, Bill, it’s a goldmine,” Molly whispered, peering through the bushes at the bouncing castle. “Look at that pile. Unwrapped. Accessible. No security guard over seventy.” Bill squinted. “They’re children, Mol. Children have parents. Parents have phones. Phones have cameras.” “And cameras have blind spots,” she said, tapping her temple. “I’ve mapped the party layout. The piñata is a distraction. The clown is a liability—he’s allergic to face paint, I saw him sneeze into a balloon. We go in during ‘musical chairs.’” “Musical chairs?” Bill repeated, scratching his neck. “That’s when they’re all sitting down, looking at each other. Highly observant moments.” “They’re three feet tall, Bill. Their observation is limited to whether the cake has sprinkles.” “I’m not arguing the height advantage. I’m arguing the moral advantage. We’re stealing presents from a six-year-old named Timmy.” “Timmy’s parents spent three hundred dollars on a remote-control dinosaur he’ll forget about by Tuesday. We’re providing a life lesson in impermanence. Buddhist, really.” Bill rubbed his face. “I don’t think the Dalai Lama starts with a crowbar.” “That’s for the piñata,” she said, patting her duffel. “Planned decoy. We crack it open, chaos ensues. You grab the gift bags near the cooler. I take the stack under the table with the unicorn tablecloth.” “The unicorn tablecloth. Of course. Because our professionalism requires identifying linens.” “Details, Bill. You lack details. Did you notice the aunt near the grill? She’s drinking boxed wine from a sippy cup. That’s a woman who will not notice a missing Lego set.” Bill sighed. “Okay. Say we get the loot. Where do we fence a Hello Kitty nightlight and a box of artisanal crayons?” “eBay. Tragic backstory. ‘My nephew passed away, selling his unopened treasures to fund his memorial slushie fund.’ You’ll get a tear-jerking premium.” “That’s evil. That’s beautiful evil.” “Thank you. Now, the exit strategy: we put on the party hats. 
Blend in with the adults. The mom is tipsy, the dad is on the grill arguing with a hot dog. We walk out, smiling, holding the bags like we’re taking them to the car.” “What if a kid asks where their present went?” “We point at the clown. ‘Ask Uncle Bubbles, sweetie.’ The clown will blame the magician. Cascade of lies.” Bill stared at the bouncy castle. “I can’t believe I’m doing this. I used to steal cars.” “Cars don’t have glitter on them. This is more fun. Ready?” He took a breath. “One condition. I get the dinosaur.” “Deal. On three. One... two...” “Wait—what about the bubble machine?” “What about it?” “It’s right next to the piñata. If we hit it, we’re covered in soap. Slip hazard.” Molly grinned. “Perfect. If we slip, we look like clumsy parents. They’ll feel bad for us. Sympathy loot.” “You’ve thought of everything.” “I thought of you, Bill. That’s why we’re partners.” He almost smiled. “Let’s go break a dinosaur’s heart.”
created_at: "2026-04-25T14:53:44.318Z" (4/25/2026, 2:53:44 PM)
model: "deepseek/deepseek-v4-flash-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 567
moderation_latency: (null)
generation_time: 12525
tokens_prompt: 77
tokens_completion: 926
native_tokens_prompt: 67
native_tokens_completion: 831
native_tokens_completion_images: (null)
native_tokens_reasoning: 58
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00024206
router: (null)
provider_responses:
  0:
    endpoint_id: "722e0746-2034-4d6e-a08f-e6271849ee9e"
    id: "9a301871-8d69-4c7a-92e8-0308e1eee98f"
    is_byok: false
    latency: 274
    model_permaslug: "deepseek/deepseek-v4-flash-20260423"
    provider_name: "DeepSeek"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777128824-ocAQ8feDdIEtX0goaVyQ"
session_id: (null)
api_type: "completions"
id: "gen-1777128824-4MUlhb6cpWpztTskaRri"
upstream_id: "9a301871-8d69-4c7a-92e8-0308e1eee98f"
total_cost: 0.00024206
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
Evaluation details
Matches word count
Result: 60.28%
Details: n/a
Meta Data: neededClean: false, words: 485
Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 90.18%, Deviation: 40.18%
Meta Data: neededClean: false, wordsTotal: 489, wordsDialogue: 441
Overall: 30.1376%
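The reported ratio and deviation can be reproduced from the metadata counts: 441 dialogue words out of 489 total gives about 90.18%, which is 40.18 percentage points away from the 50% target in the scenario. The sketch below checks that arithmetic. The names `ratio_and_deviation` and `count_dialogue_words` are hypothetical, and the quote-based counter is only an assumption about how such an evaluator might tally dialogue; it is not NC Bench's actual implementation and may not reproduce its exact counts. How the deviation maps to the 0.00% score is not shown on the page and is not modeled here.

```python
import re

TARGET_RATIO = 0.50  # from the scenario: "50% dialogue"


def ratio_and_deviation(words_dialogue, words_total, target=TARGET_RATIO):
    """Return (dialogue ratio, absolute deviation from the target ratio)."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


def count_dialogue_words(text):
    """Hypothetical counter: words inside curly double quotes vs. all words.

    Assumption: dialogue is whatever sits between “ and ”; words are
    whitespace-separated tokens. The real evaluator may differ.
    """
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    total = len(text.split())
    return dialogue, total


# Counts taken from the Meta Data above.
ratio, deviation = ratio_and_deviation(441, 489)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 90.18%, Deviation: 40.18%
```

The two evaluators report slightly different totals (485 words versus 489), which suggests they tokenize the response differently, so any re-count with a sketch like this should be treated as approximate.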