NC Bench
Run: 2026-04-25-01KQ2JNMFA6RTTKSEGPR2DJDWK
Run Details
Model: DeepSeek V4 Flash
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 12.1s
Cost: $0.000235
Provider: DeepSeek
Evaluation summary
Score      Evaluator
81.27%     Matches word count
0.00%      Dialogue to Total Word Ratio
40.6365%   Overall
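The Overall figure appears to be the unweighted mean of the two evaluator scores. The sketch below checks that reading; the averaging rule is an assumption inferred from the numbers on this page, not something the page states, and the variable names are hypothetical.

```python
# Evaluator scores as displayed on this page (percentages).
scores = {
    "Matches word count": 81.27,
    "Dialogue to Total Word Ratio": 0.00,
}

# Assumption: Overall is the plain arithmetic mean of the evaluator scores.
overall = sum(scores.values()) / len(scores)

print(f"Overall (mean of displayed scores): {overall:.4f}%")
```

The mean of the displayed scores is about 40.635%, slightly below the reported 40.6365%. The gap would be consistent with the word-count score being rounded to two decimals for display, but that is also an inference.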
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“I’m telling you, it’s a goldmine,” Bill whispered, jabbing a greasy finger at the hand-drawn map spread across the sticky bar table. “Every kid in the tri-county area is bringing a gift. Pure, unadulterated loot.” Molly took a long drag of her soda, eyes narrowing. “Unadulterated? Bill, it’s a five-year-old’s birthday party. The ‘loot’ is a glow-in-the-dark slime kit and a book about a constipated triceratops.” “You’re not seeing the big picture,” Bill insisted, leaning in. “Think about the secondary market. Slime kits? Etsy resellers eat that stuff up. And that triceratops book? Signed by the author who only does children’s parties now. Rarity, Molls. Rarity.” “His name is Dr. Poot. The dinosaur’s name is Dr. Poot. You want to fence Dr. Poot?” “I want to fence the entire gift table,” Bill said, his voice dropping to a theatrical hush. “The piñata is a decoy. The real value is in the sock-puppet station. Those are artisan fleece snacks.” Molly set her glass down with a thunk. “Artisan. Fleece. Snacks. You want to heist a sock-puppet station from a party where the only security is a dad in a sweaty Spider-Man costume blowing up balloons wrong.” “Exactly! No cameras. No armed guards. Just a sugar-crazed mob of toddlers. It’s the perfect mark.” “And how, pray tell, do you suggest we slip past the sugar-crazed mob? Cartoon disguise? We dress up as oversized juice boxes?” “Better,” Bill said, his eyes gleaming. “We dress up as the entertainment. I’ve got a cousin who’s a magician. He gets booked for these things. We just show up, do a few lousy tricks, and while everyone’s watching the ‘vanishing coin,’ we vanish the gift table.” Molly blinked. “You know how to do lousy tricks?” “No. But I do know how to pull a tablecloth out from under a pile of presents. I saw it on a baking show.” “That’s not a magic trick. That’s a felony with extra tableware.” “Same difference. Look, the plan is airtight. We get in. We get the haul. We’re out before the Chuck E. Cheese equivalent can say ‘happy birthday, Jeffrey.’” “It’s a girl’s party. The invitation says ‘Princess Poppy’s Unicorn Sparkle Parade.’” Bill froze. “Sparkle Parade?” “Rainbow vomit of glitter, Bill. Our shoes will be covered in it for a month. You can’t fence a pair of glitter-encrusted sneakers without looking like you escaped a drag show from hell.” “We wear booties. Disposable. Like a clean room. But for crime.” Molly rubbed her temples. “And what do we do with forty-seven copies of ‘Dr. Poot’s Potty Patience’? Open a black-market library?” “No. We wait until the market dips, then we buy low, sell high on eBay for the third birthday season. It’s the long con, Molls. The child-gift economy is cyclical.” She stared at him for a long moment. “You’re a madman.” “I’m an entrepreneur.” “You’re a madman with a plan to steal a unicorn piñata from a child named Poppy. And I’m in.” She grabbed her jacket. “But if I end up with glitter in my hair, you’re buying the methadone.”
created_at: "2026-04-25T15:03:31.055Z" (4/25/2026, 3:03:31 PM)
model: "deepseek/deepseek-v4-flash-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 670
moderation_latency: (null)
generation_time: 12005
tokens_prompt: 77
tokens_completion: 847
native_tokens_prompt: 67
native_tokens_completion: 805
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00023478
router: (null)
provider_responses:
  0:
    endpoint_id: "722e0746-2034-4d6e-a08f-e6271849ee9e"
    id: "6ff2cf5d-6d3a-49bf-869d-1d66d6b2fecf"
    is_byok: false
    latency: 352
    model_permaslug: "deepseek/deepseek-v4-flash-20260423"
    provider_name: "DeepSeek"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777129411-bq0a8GJqRz2sWlWlCB0n"
session_id: (null)
api_type: "completions"
id: "gen-1777129411-49W84HOAMREHKBhtbOeW"
upstream_id: "6ff2cf5d-6d3a-49bf-869d-1d66d6b2fecf"
total_cost: 0.00023478
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
Evaluation details
Result     Evaluator                      Details                            Meta Data
81.27%     Matches word count             n/a                                neededClean: false, words: 512
0.00%      Dialogue to Total Word Ratio   Ratio: 86.15%, Deviation: 16.15%   neededClean: false, wordsTotal: 527, wordsDialogue: 454
40.6365%   Overall
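The Ratio and Deviation values reported by the dialogue evaluator can be reproduced from its metadata. This is a minimal sketch, assuming the ratio is wordsDialogue divided by wordsTotal and the deviation is the absolute distance from the 70% target named in the scenario. The function and variable names are hypothetical, and how the evaluator turns a 16.15% deviation into a 0.00% score is not shown on this page, so it is not modeled here.

```python
TARGET_RATIO = 0.70  # from the scenario: "70% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that fall inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


# Values taken from the evaluator's Meta Data column.
ratio = dialogue_ratio(words_dialogue=454, words_total=527)

# Assumption: deviation is the absolute difference from the target ratio.
deviation = abs(ratio - TARGET_RATIO)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 86.15%, Deviation: 16.15%"
```

Under these assumptions the output matches the Details column, which suggests the response overshot the dialogue target rather than undershooting it.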