NC Bench
Run: 2026-04-25-01KQ2GXA8T80FR9VBZE5F33VXB
Run Details
Model: DeepSeek V4 Flash
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 11.4s
Cost: $0.000214
Provider: DeepSeek
Evaluation summary
Score      Evaluator
86.38%     Matches word count
0.00%      Dialogue to Total Word Ratio
43.1902%   Overall
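The Overall figure looks consistent with an unweighted mean of the two evaluator scores. That is an inference from the displayed numbers, not something this page states. A minimal Python sketch under that assumption (the function name `overall_score` is hypothetical):

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores, in percent.

    Assumption: the benchmark averages evaluators equally. The page does
    not document how the overall score is aggregated.
    """
    if not scores:
        raise ValueError("at least one evaluator score is required")
    return sum(scores) / len(scores)


# Scores as displayed in the evaluation summary.
evaluator_scores = [86.38, 0.00]
print(f"{overall_score(evaluator_scores):.2f}%")
```

With the displayed scores this gives 43.19%. The page shows 43.1902%, which suggests the word count score carries more digits than the two decimals displayed.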
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Bill, for the last time, we are not wearing clown suits.” “Molly, for the last time, why not? Clowns are invisible at a kid’s party. It’s basic bird-watching logic.” “Bird-watching logic? You’re comparing children to puffins?” “No, I’m comparing our getaway to a flock of seagulls. A clown distracts, a clown blends, and a clown has big, floppy shoes perfect for stepping over tiny, sticky hands.” “We are career professionals, Bill. We have a rubric. A clown suit has too many buttons. One button pops, the whole gig goes south.” “Fine. What’s your grand plan? A trench coat and a fake mustache made of dryer lint?” “Better. We go as the birthday girl’s favorite characters. I checked the invitation. The party is ‘Princess Sparklehoof and the Pirate Penguin.’ I’m Sparklehoof. You’re the penguin.” Bill stopped pacing the cramped, snack-crumb-covered living room. “I’m a penguin. A pirate penguin. Do I get an eye patch?” “You get a cardboard beak and a felt hat. We waddle in, we sing the stupid song they’ll be screaming for, and while everyone is crying with joy, we swap the gift bags with our decoy bags.” “Decoy bags filled with what?” “Thrift store board games missing half the pieces. And a single, angry-looking sock.” Bill rubbed his chin. “I like the sock. But what if the birthday girl, little Brittany McFluffernutter, recognizes us? She’s seen us in the neighborhood. She waves at me from her tricycle. She called me ‘funny man with the fungus on his porch.’” “That was a garden toad, Bill. And you’re wearing a penguin suit. Her brain will not compute. Children have the cognitive ability of a golden retriever with a head cold.” “Okay, okay. But the haul. The invitation said ‘no electronics, just love and donations.’ What kind of score is that?” “Donations, Bill, is code for cash-stuffed envelopes from grandma. And ‘love’ means a whole lot of unopened, returnable gift cards. We hit the ceiling in plastic.” “Fine. But I draw the line at the balloon animal station. 
I am not twisting a balloon weiner dog. My hands shake. I’ll pop it. Children will cry. The heist is over.” “You won’t be at the balloon station. You’ll be in the bounce house, creating a diversion.” “A diversion? In a bounce house? How? I’ll just be bouncing. That’s the only activity.” “Exactly. You bounce so hard you tip the whole thing over. A glorious, inflatable avalanche of screaming toddlers. I grab the bags.” Bill stared at her, a slow, awful grin spreading. “You know, that’s actually genius. A bounce house collapse. It’s chaos. It’s pure, unadulterated property damage.” “It’s art, Bill. Now, do you have a clean pair of socks? The penguin suit is rented, and I refuse to be the one who brings back a foot funk.” “I have one. It has a hole in the toe. Puts my big toe on display.” “Perfect. It’ll match the decoy sock.”
created_at
"2026-04-25T14:32:45.599Z"
4/25/2026, 2:32:45 PM
model
"deepseek/deepseek-v4-flash-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
519
moderation_latency
(null)
generation_time
11300
tokens_prompt
77
tokens_completion
802
native_tokens_prompt
67
native_tokens_completion
729
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0002135
router
(null)
provider_responses
0
endpoint_id
"722e0746-2034-4d6e-a08f-e6271849ee9e"
id
"7c860478-ba58-4768-afd0-7a3ce9c59e5e"
is_byok
false
latency
272
model_permaslug
"deepseek/deepseek-v4-flash-20260423"
provider_name
"DeepSeek"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777127565-hUFliy4YqWWaTQqRnqp2"
session_id
(null)
api_type
"completions"
id
"gen-1777127565-LxZ4XSph6GfQwD3bx5V7"
upstream_id
"7c860478-ba58-4768-afd0-7a3ce9c59e5e"
total_cost
0.0002135
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepSeek"
response_cache_source_id
(null)
Evaluation details
Result     Evaluator                      Details                             Meta Data
86.38%     Matches word count             n/a                                 neededClean: false; words: 489
0.00%      Dialogue to Total Word Ratio   Ratio: 95.34%, Deviation: 45.34%    neededClean: false; wordsTotal: 494; wordsDialogue: 471
43.1902%   Overall
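The ratio and deviation reported by the dialogue evaluator can be reproduced from its metadata counts. The sketch below is Python and assumes the deviation is the absolute distance from the 50% target named in the scenario. The function name `dialogue_ratio` and the `target` parameter are hypothetical. How the evaluator turns the deviation into a 0.00% score is not shown on this page and is not modelled here.

```python
def dialogue_ratio(words_dialogue, words_total, target=0.5):
    """Return (ratio, deviation) for a dialogue word count.

    Assumption: deviation is the absolute distance between the measured
    ratio and the target ratio (50% in this scenario).
    """
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target)
    return ratio, deviation


# Counts taken from the evaluator metadata: wordsDialogue and wordsTotal.
ratio, deviation = dialogue_ratio(471, 494)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

With 471 dialogue words out of 494 total, this prints the same ratio and deviation listed in the details column.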