NC Bench
Run: 2026-04-25-01KQ2FKJNCBVV0VV1H66K04Z9K
Run Details
Model: DeepSeek V4 Flash (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 11.3s
Cost: $0.000220
Provider: DeepSeek
Evaluation summary
Matches word count: 0.08%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0424%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The basement light flickered, casting jumpy shadows across the cluttered table. Molly adjusted her balaclava, which was already riding up over her chin because the elastic was shot. Bill was trying to unfold a creased printout of a birthday invitation while balancing a half-eaten bag of cheese puffs on his knee.
“Okay, so the target is the ‘Magical Unicorn Spectacular’ for a kid named Timmy,” Bill said, spitting a puff crumb onto the guest list.
Molly squinted. “Timmy? That’s the mark? A kid named Timmy? We’re stealing from a child named Timmy. I feel like we need a grittier target. Like a museum. Or a supervillain’s lair.”
“Museum security is tighter. Supervillains shoot lasers. Timmy’s mom is renting a bouncy castle,” Bill said, tapping the paper. “The presents will be under a fake rainbow arch. We go in as clown entertainers.”
“Clowns?” Molly’s voice cracked. “Bill, clowns are terrifying. Half the kids will cry before we even get to the loot. And I can’t juggle. I dropped an orange once and it bruised my foot.”
“You don’t need to juggle. You just need to distract. I’ll do the balloon animals.”
“You make balloon animals that look like malignant tumors.”
Bill ignored her. “The plan: We arrive at 2:30. The parents will be drunk on cheap punch by then. You do a magic trick that goes wrong—maybe pull a stuffed rabbit out of a hat, except the rabbit is dead. They’ll all scream. I scoop the presents into a laundry bag. We exit through the back gate.”
Molly rubbed her forehead, smearing a streak of glitter from a previous failed job. “A dead rabbit? Where are we getting a dead rabbit at 2:15 on a Saturday? And who’s cleaning that up?”
“I was being metaphorical. Fine. A live rabbit that bites you. Chaos. Then I do balloon poodles. The kids will be so distracted by the screaming and the biting, they won’t notice the presents vanishing.”
“And if the parents call the cops? They’ll describe two terrifying clowns. One with a dead rabbit, one making balloon tumors. That’s a very specific description, Bill.”
Bill sighed, crumbling the invitation. “Fine. New plan. We pose as gift-wrapping specialists. I’ll wear a bow tie. You wear a bow tie. We offer to wrap presents on-site for free. The parents love free. We bundle the gifts into our own boxes. We walk out with the loot inside ‘wrapping supplies.’”
Molly considered it. “That’s less visually traumatic. But don’t they already wrap presents before the party?”
“No, they’re lazy. They bring them unwrapped in gift bags. This is a service. We’re helping. And then we’re robbing them. It’s ethical.”
“It’s the opposite of ethical.”
“It’s comedic,” Bill said, grinning. “Now, let’s go practice tying a bow. I’ve got a shoelace and a dying sense of pragmatism.”
created_at: "2026-04-25T14:09:57.936Z" (4/25/2026, 2:09:57 PM)
model: "deepseek/deepseek-v4-flash-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 609
moderation_latency: (null)
generation_time: 10894
tokens_prompt: 77
tokens_completion: 824
native_tokens_prompt: 67
native_tokens_completion: 753
native_tokens_completion_images: (null)
native_tokens_reasoning: 63
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00022022
router: (null)
provider_responses:
  0:
    endpoint_id: "722e0746-2034-4d6e-a08f-e6271849ee9e"
    id: "3309d08a-00fc-4392-8b7c-eac043ae3305"
    is_byok: false
    latency: 305
    model_permaslug: "deepseek/deepseek-v4-flash-20260423"
    provider_name: "DeepSeek"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777126197-0rEGFHrvd5qKes17TZyI"
session_id: (null)
api_type: "completions"
id: "gen-1777126197-WG6vYxBsArbSFYpgxUFl"
upstream_id: "3309d08a-00fc-4392-8b7c-eac043ae3305"
total_cost: 0.00022022
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
Evaluation details
Evaluator: Matches word count
Result: 0.08%
Details: n/a
Meta Data: neededClean: false, words: 471
Evaluator: Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 78.90%, Deviation: 48.90%
Meta Data: neededClean: false, wordsTotal: 474, wordsDialogue: 374
Overall: 0.0424%
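The ratio and deviation in the evaluation details can be reproduced from the reported word counts (374 dialogue words out of 474 total, against the 30% target from the scenario). The sketch below shows that arithmetic. The helper `count_dialogue_words` is hypothetical: it assumes dialogue is whatever sits between curly double quotes, which may not match how the benchmark actually tokenises or detects dialogue. How the deviation is mapped to the 0.00% score is not stated on this page, so that step is left out.

```python
import re

# Target dialogue share taken from the scenario ("30% dialogue").
TARGET_RATIO = 0.30


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of words that are dialogue, as a fraction of all words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Hypothetical counter: (words inside curly double quotes, all words).

    Assumption: dialogue is the text between “ and ”. The benchmark's real
    quote handling and word splitting are not documented on this page.
    """
    quoted = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(span.split()) for span in quoted)
    words_total = len(text.split())
    return words_dialogue, words_total


# Reported counts from the evaluation details.
ratio = dialogue_ratio(374, 474)
deviation = abs(ratio - TARGET_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 78.90%, Deviation: 48.90%"
```

Note that the two evaluators report different totals (471 words versus 474), so a counter like the one above should not be expected to match either figure exactly.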