NC Bench
Run: 2026-04-25-01KQ2ECXE7B632AW38RBKH4Y66
Run Details
Model:
DeepSeek V4 Flash
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
10.9s
Cost:
$0.000224
Provider:
DeepSeek
Evaluation summary
Score      Evaluator
97.63%     Matches word count
0.00%      Dialogue to Total Word Ratio
48.8138%   Overall
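The Overall figure looks like an average of the two evaluator scores. The sketch below checks that reading; it is an assumption, since the page does not state how Overall is aggregated. The variable names are illustrative and not part of NC Bench.

```python
# Scores as displayed in the evaluation summary (percent).
scores = {
    "Matches word count": 97.63,
    "Dialogue to Total Word Ratio": 0.00,
}
reported_overall = 48.8138

# Assumption: Overall is the unweighted mean of the evaluator scores.
overall = sum(scores.values()) / len(scores)

# Word-count score that would reproduce the reported Overall exactly
# under that same assumption.
implied_word_score = (
    reported_overall * len(scores) - scores["Dialogue to Total Word Ratio"]
)

print(f"mean of displayed scores: {overall:.4f}%")
print(f"implied unrounded word-count score: {implied_word_score:.4f}%")
```

The mean of the displayed scores is 48.815%, slightly above the reported 48.8138%. That gap is consistent with the word-count score being rounded to two decimals for display (about 97.6276% before rounding), though the page does not confirm this.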
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“You’re sure about this?” Bill’s whisper was a dry crackle, like a bad radio signal in the van’s stale air. Molly adjusted the Hello Kitty mask over her face. “Positive. The bounce house will be a perfect distraction.” “A distraction from what? Us? We’ll look like two giant, depressed kittens.” “It’s a theme party, Bill. No one questions a grown woman in a feline onesie. Now, remind me. What’s the target?” “All the presents. Gathered under a canopy by the cake table.” “Right. Classic ‘pile of loot’ scenario. They want us to succeed.” “Or they’re children,” Bill said, sighing. “With small, sticky hands and loud, piercing voices. That’s a security system we can’t hack, Molly.” “We don’t hack them. We exploit the birthday turmoil. Little Timmy turns six. The sugar high hits at two-fifteen. The adults will be weeping by two-thirty. We have a forty-minute window.” “Forty minutes to navigate a minefield of training-pants and face-paint.” “You have the bag of decoy gifts?” Bill held up a sack filled with old VHS tapes and a broken blender. “The crowning glory is a stuffed bear that smells like my uncle’s shed. It’s horrifying.” “Perfect. We swap the bags. They open the decoys. The chaos buys us twenty more minutes. They’ll think ‘Aunt Carol’ is on a cheap kick.” “What if Aunt Carol is actually here?” Molly paused. “Then she’s a helpful accomplice. Or a liability. We’ll plant a booger on her sleeve. Framed. Either way.” “I don’t see a window into the cake,” Bill muttered, peering through binoculars. “It’s a three-tiered fortress of frosting. Emergency exits are the French doors and a doggie door.” “The doggie door is for the miniature poodle named ‘Sir Barks-a-Lot.’ I have his biography. He likes cream cheese. I have a tube of it in my fanny pack.” “You’re going to bribe a poodle?” “It’s called threat neutralization. You contain the mutt. I get the gifts. Then we meet by the bounce house. 
We’ll look like two exhausted parents retrieving a lost shoe.” “And if Sir Barks-a-Lot doesn’t like cream cheese?” “Then he gets the taser.” “You’re tasering a poodle at a six-year-old’s party?” “Only if he’s a narc. Last check: party favors?” “Miniature slinkies and noisemakers. The noisemakers are our cover. We shake them, they think we’re having fun. We stop shaking, they think we’re in a conversation about daycare costs.” “Brilliant. The slinkies can be used as impromptu handcuffs for any rogue toddler.” “Molly, I think this is the dumbest plan we’ve ever had.” “That, my friend, is why it’s perfect. No one expects two idiots in cat suits. Except maybe the cat. But he’s on the payroll. The actual cat, not Sir Barks-a-Lot.” Bill rubbed his temple. “The cat on the party flyer. The one with the party hat. You bribed a cartoon?” “Details, Bill. Now put on your tail. The piñata has a soft opening at quarter past two. We move at its first crack.”
created_at
"2026-04-25T13:48:51.021Z"
4/25/2026, 1:48:51 PM
model
"deepseek/deepseek-v4-flash-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
529
moderation_latency
(null)
generation_time
10609
tokens_prompt
77
tokens_completion
837
native_tokens_prompt
67
native_tokens_completion
766
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00022386
router
(null)
provider_responses
0
endpoint_id
"722e0746-2034-4d6e-a08f-e6271849ee9e"
id
"785cc6e8-5cc5-4848-928a-a4da55804728"
is_byok
false
latency
273
model_permaslug
"deepseek/deepseek-v4-flash-20260423"
provider_name
"DeepSeek"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777124931-MEDi5gd4bcYYOW3grI6e"
session_id
(null)
api_type
"completions"
id
"gen-1777124931-VBtiGEK5z8IUipA7FiqJ"
upstream_id
"785cc6e8-5cc5-4848-928a-a4da55804728"
total_cost
0.00022386
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepSeek"
response_cache_source_id
(null)
Evaluation details
Result     Evaluator                      Details                            Meta Data
97.63%     Matches word count             n/a                                neededClean: false, words: 493
0.00%      Dialogue to Total Word Ratio   Ratio: 89.55%, Deviation: 19.55%   neededClean: false, wordsTotal: 507, wordsDialogue: 454
48.8138%   Overall
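The reported ratio and deviation for the dialogue evaluator can be reproduced from the word counts in its metadata. This is a minimal sketch of that arithmetic only. It assumes the target is the 70% stated in the scenario and that deviation is the absolute difference from that target. It does not reproduce how the evaluator counts dialogue words or how it maps a deviation to a score.

```python
# Values copied from the "Dialogue to Total Word Ratio" evaluator metadata.
words_dialogue = 454
words_total = 507

# Assumption: the target comes from the scenario ("70% dialogue").
target_ratio = 0.70

# Share of words that are dialogue, and its distance from the target.
ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

Run as written, this prints `Ratio: 89.55%, Deviation: 19.55%`, matching the Details column. The response overshot the requested dialogue share by nearly 20 percentage points, which is the deviation that received a 0.00% score here.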