NC Bench
Run: 2026-04-25-01KQ2HEDZT9NS5HTZ17DSFMK2N
Run Details
Model: DeepSeek V4 Pro (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 37.3s
Cost: $0.000928
Provider: DeepSeek
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0000% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Bill, I’m not wearing the duck costume again. The beak whistles every time I breathe and it’s a dead giveaway.” “It’s a giant yellow duck, Molly. Who’s going to suspect a duck of grand larceny? That’s the genius. We waddle in, we waddle out, we’re quack-addicted masterminds. The kids will just think you’re the entertainment.” “The entertainment was booked six weeks ago. It’s Princess Sparklehoof, a pony who does magic. I have the confirmation email right here. Security will spot a rogue waterfowl in seconds.” Bill drummed his fingers on the stolen catering van’s steering wheel. “Okay, fine. New plan. I go as the pony. You’re my handler. We load the presents into a giant glittery birthday cake we’ve hollowed out. The cake is on wheels. Classic.” “The cake is a fire hazard, Bill, I’ve told you. The wheels squeak. And you’re six-foot-two. You’ll look less like a magical pony and more like a hostage situation involving a horse.” “So we pivot. We pivot hard. What’s the one thing a children’s party never questions?” “A bouncy castle losing air pressure and slowly asphyxiating a clown?” “A clown! You’re a clown, Molly! I’m the clown car. We’re a team. The car pulls up, you pile out with a sack that says ‘Recycled Dreams,’ and start twisting balloon animals while subtly bagging all the tablet-shaped packages. By the time they realize their presents have been ethically redistributed, we’re a block away in a vehicle that’s technically not a car.” Molly stared at him. “First, a clown car does not pull up. It is the clown. It’s a tiny car that many clowns exit. If you are the car, what are you, crouching on a skateboard making engine noises?” “Vroom, Molly. Vroom, and I’m very good at it. Second, it’s about emotional camouflage. Parents are so afraid of clowns they’ll look anywhere else. We exploit the coulrophobia economy.” “No clown. 
The last time we did a clown bit, you got your oversized shoe caught in a floor vent and started crying real tears that smeared your makeup, and a four-year-old tried to comfort you with a juice box. We lost the score because you found your inner child.” Bill sniffed, wounded. “That was a transformative moment. But you’re right, sentiment is the enemy of profit. So I’m taking us back to the cake. A more aerodynamic cake. A cake that says, ‘Nothing to see here, just dessert with a suspiciously high centre of gravity.’ We call it Operation Unfrosted.” “Where do we get an aerodynamic cake on two hours’ notice?” “I know a guy. He owes me for the bouncy castle incident. He’s a baker. A discreet baker. A man who understands that the structural integrity of fondant is a moral grey area.” Molly pinched the bridge of her nose. “Fine. The cake. But I’m not hiding inside. The last time I hid in a baked good, I smelled of vanilla extract for a week and every dog in a three-mile radius tried to propose to me. I’ll be the bewildered party mom who arrives with the emergency backup cake. I’ll have a clipboard. Nobody questions a woman with a clipboard and a sense of urgency.” “That’s brilliant. I’ll be inside the cake, breathing through a bendy straw, waiting for your signal. What’s the signal?” “I’ll say, ‘Gosh, I hope nobody steals this cake while I turn my back for exactly ninety seconds to have a very loud, distracting conversation with the pony.’” “The pony is our decoy. See? We’ve come full circle. What could possibly go wrong?”
created_at: "2026-04-25T14:42:11.557Z" (4/25/2026, 2:42:11 PM)
model: "deepseek/deepseek-v4-pro-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1230
moderation_latency: (null)
generation_time: 31991
tokens_prompt: 77
tokens_completion: 1148
native_tokens_prompt: 67
native_tokens_completion: 1033
native_tokens_completion_images: (null)
native_tokens_reasoning: 180
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.000927855
router: (null)
provider_responses:
  0:
    endpoint_id: "d084170b-dd99-4e53-9ae1-b63881f3cfbf"
    id: "8e34d864-7cef-4e5b-81e6-df2abe93f883"
    is_byok: false
    latency: 270
    model_permaslug: "deepseek/deepseek-v4-pro-20260423"
    provider_name: "DeepSeek"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777128131-CIoJqOWF8P9zuzY1JxOm"
session_id: (null)
api_type: "completions"
id: "gen-1777128131-YyjqW6vQcqCNgMz3Inic"
upstream_id: "8e34d864-7cef-4e5b-81e6-df2abe93f883"
total_cost: 0.000927855
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 595
0.00% | Dialogue to Total Word Ratio | Ratio: 95.85%, Deviation: 25.85% | neededClean: false, wordsTotal: 602, wordsDialogue: 577
0.0000% | Overall
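The Ratio and Deviation figures reported for the Dialogue to Total Word Ratio evaluator can be checked against its wordsDialogue and wordsTotal counts. The following is a minimal sketch, not the benchmark's actual evaluator. The function names, the rule that dialogue is any text between curly double quotes, the whitespace word split, and the 70% target as a default parameter are all assumptions made for illustration. The word counts it produces on the full response may therefore differ from the 602 and 577 reported above.

```python
import re

# Assumption: dialogue is whatever sits between curly double quotes.
# The real evaluator's quoting and tokenization rules are not shown in this report.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (hypothetical word definition)."""
    return len(text.split())


def ratio_stats(words_dialogue: int, words_total: int, target_pct: float = 70.0) -> dict:
    """Return the dialogue percentage and its absolute distance from the target."""
    ratio = 100.0 * words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": round(ratio, 2),
        "deviation": round(abs(ratio - target_pct), 2),
    }


def dialogue_stats(text: str, target_pct: float = 70.0) -> dict:
    """Count quoted words in a text, then compute ratio and deviation."""
    words_total = count_words(text)
    words_dialogue = sum(count_words(span) for span in DIALOGUE_RE.findall(text))
    return ratio_stats(words_dialogue, words_total, target_pct)


# Check the reported figures from the reported counts.
reported = ratio_stats(words_dialogue=577, words_total=602)
print(reported["ratio"], reported["deviation"])  # 95.85 25.85
```

With the reported counts, 577 / 602 gives 95.85%, which is 25.85 percentage points above the 70% requested in the scenario, matching the Details column. How that deviation maps to the 0.00% score is not stated in this report.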