NC Bench
Run: 2026-04-25-01KQ2EFKJBS6W77WX8XWQQ3TMQ
Run Details
Model: DeepSeek V4 Flash (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 11.9s
Cost: $0.000235
Provider: DeepSeek
Evaluation summary
Score | Evaluator
75.16% | Matches word count
0.00% | Dialogue to Total Word Ratio
37.5778% | Overall
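The overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below checks that arithmetic; the averaging rule is an assumption inferred from the numbers on this page, not something the page states, and `overall_score` is a hypothetical helper name.

```python
def overall_score(evaluator_scores: list[float]) -> float:
    """Unweighted mean of evaluator scores, in percent.

    Assumption: the benchmark averages evaluators equally. This page
    does not document the aggregation rule.
    """
    if not evaluator_scores:
        raise ValueError("at least one evaluator score is required")
    return sum(evaluator_scores) / len(evaluator_scores)


# Scores as displayed in the summary (already rounded to two decimals).
scores = [75.16, 0.00]
overall = overall_score(scores)
print(f"Overall: {overall:.2f}%")
```

Because the displayed word-count score is rounded, the result agrees with the reported 37.5778% only to about two decimal places.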
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly tapped her pen against the blueprint of the Johnson backyard, a crinkled map she’d swiped from the recycling bin. “The bounce house is a chokepoint. We come in through the hedge here, behind the grill.” Bill squinted at the drawing. “That’s the spot where Dave Johnson keeps his gnomes. They’re creepy. One is holding a tiny rake.” “Ignore the gnomes. Focus on the pile.” Molly circled a cluster of X’s near a picnic table. “The presents will be stacked here, gift bags and boxes, all taped up with shiny bows. We grab, we go, we’re out before the piñata gets whacked.” “I’m just saying, a gnome with a rake implies he’s prepared for gardening violence. What if he’s a lookout?” “Bill, he’s a ceramic gnome. He’s not calling the cops.” Molly sighed. “Look, the party starts at two. Parents drop kids off. They play pass-the-parcel for forty minutes, then cake, then presents. We hit during ‘Pin the Tail on the Donkey’.” “That’s a classic. I feel bad stealing from a kid’s birthday. It feels… low.” “The kid is turning six. He won’t remember. He’ll get more stuff next year. We’re redistributing wealth to our own pockets. It’s ethical, in a sideways way.” “Ethical? We’re going to fill duffel bags with LEGO sets and bicycles.” “And that’s wrong?” Molly looked up from the blueprint, genuinely puzzled. “Have you seen the price of LEGO? It’s highway robbery. We’re just taking a small percentage back.” Bill rubbed his neck. “What about the adults? The hosts?” “The Johnsons are insurance adjusters. They’ll file a claim and get a payout. Probably make a profit.” Molly drew an arrow through the hedge. “I’ll be dressed as a clown. You’re the balloon animal guy.” “A balloon animal guy? I can’t even tie a balloon knot. Last time I tried, I gave my nephew a rabid poodle.” “That’s perfect. You’re supposed to be bad. Everyone will laugh, no one will suspect. While you’re fumbling with a balloon giraffe, I’ll be palming gift cards and stuffing iPads into my pants.” “Your pants? 
Where are you keeping iPads?” “Juggling pockets. Custom sewn. Very roomy.” Molly patted her hip. “The kids will be mesmerized by your incompetence. The parents will be drinking cheap wine. We have a window of eight minutes, max.” Bill stared at the gnome. “I still don’t like that rake.” “Fine. I’ll distract the gnome. You grab the presents.” “How do you distract a gnome?” “I’ll give it a tiny high-five. Now stop worrying.” Molly folded the blueprint. “We case the yard tonight. Then tomorrow, we become the life of the party. And after the party, we become the life of our escape vehicle.” “And if we get caught?” “We don’t.” Molly grinned. “Because no one expects a clown and a terrible balloon animal guy to be professional thieves. That’s the comedy, Bill. We’re the joke that nobody laughs at until it’s too late.”
created_at: "2026-04-25T13:50:19.214Z" (4/25/2026, 1:50:19 PM)
model: "deepseek/deepseek-v4-flash-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 713
moderation_latency: (null)
generation_time: 11335
tokens_prompt: 77
tokens_completion: 885
native_tokens_prompt: 67
native_tokens_completion: 806
native_tokens_completion_images: (null)
native_tokens_reasoning: 72
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00023506
router: (null)
provider_responses:
  0:
    endpoint_id: "722e0746-2034-4d6e-a08f-e6271849ee9e"
    id: "2c7c5545-ef1b-4eb5-9026-bbf2336fcba1"
    is_byok: false
    latency: 444
    model_permaslug: "deepseek/deepseek-v4-flash-20260423"
    provider_name: "DeepSeek"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777125019-RMhyMBylQFnvuY2oZwIY"
session_id: (null)
api_type: "completions"
id: "gen-1777125019-o4eIGz8AVRy1kIiSLouC"
upstream_id: "2c7c5545-ef1b-4eb5-9026-bbf2336fcba1"
total_cost: 0.00023506
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
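The metadata fields can be combined into a couple of derived figures. The sketch below computes completion throughput and a blended cost per 1,000 native tokens from the values recorded above. It assumes that `generation_time` is in milliseconds and that `total_cost` is in US dollars; neither unit is stated on this page, and the helper names are hypothetical.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Completion throughput, assuming generation_time is in milliseconds."""
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000.0)


def cost_per_thousand_tokens(total_cost: float, prompt_tokens: int,
                             completion_tokens: int) -> float:
    """Blended cost per 1,000 tokens across prompt and completion."""
    total_tokens = prompt_tokens + completion_tokens
    if total_tokens <= 0:
        raise ValueError("token counts must sum to a positive number")
    return 1000.0 * total_cost / total_tokens


# Values copied from the metadata for this run.
throughput = tokens_per_second(completion_tokens=806, generation_time_ms=11335)
blended = cost_per_thousand_tokens(total_cost=0.00023506,
                                   prompt_tokens=67, completion_tokens=806)
print(f"{throughput:.1f} tokens/s, ${blended:.6f} per 1k tokens")
```

The native token counts are used here rather than `tokens_prompt` and `tokens_completion`, on the assumption that billing follows the provider's own tokenizer.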
Evaluation details
Result | Evaluator | Details | Meta Data
75.16% | Matches word count | n/a | neededClean: false, words: 487
0.00% | Dialogue to Total Word Ratio | Ratio: 85.51%, Deviation: 55.51% | neededClean: false, wordsTotal: 490, wordsDialogue: 419
37.5778% | Overall | |
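The reported ratio and deviation follow from the two word counts and the 30% target in the scenario. The sketch below shows one way to count dialogue words (text between curly double quotes) and then reproduces the reported figures from `wordsDialogue` and `wordsTotal`. The counting rule is an assumption: the evaluator's actual tokenization and quote handling are not documented on this page, and `dialogue_stats` is a hypothetical helper.

```python
import re

# Curly double quotes, as used in the model response.
OPEN_QUOTE, CLOSE_QUOTE = "\u201c", "\u201d"
QUOTED = re.compile(f"{OPEN_QUOTE}([^{CLOSE_QUOTE}]*){CLOSE_QUOTE}")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage."""
    dialogue = sum(count_words(span) for span in QUOTED.findall(text))
    return dialogue, count_words(text)


def ratio_and_deviation(dialogue: int, total: int,
                        target_pct: float) -> tuple[float, float]:
    """Dialogue share in percent and its absolute distance from the target."""
    if total <= 0:
        raise ValueError("total must be positive")
    ratio = 100.0 * dialogue / total
    return ratio, abs(ratio - target_pct)


# Small illustration of the counting rule on one exchange.
sample = (f"Molly sighed. {OPEN_QUOTE}Ignore the gnomes. "
          f"Focus on the pile.{CLOSE_QUOTE} Bill nodded.")
sample_counts = dialogue_stats(sample)

# Figures recorded by the evaluator for this run.
ratio, deviation = ratio_and_deviation(dialogue=419, total=490, target_pct=30.0)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

With 419 dialogue words out of 490, the ratio is about 85.51% and the deviation from the 30% target is about 55.51 percentage points, matching the Details column.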