NC Bench
Run: 2026-04-25-01KQ2E2G0NDF0QJEA5GBXJ48GW
Run Details
Model:
DeepSeek V4 Flash (Reasoning)
Test:
Dialogue tags
Scenario:
Write 500 words with 50% dialogue
Duration:
13.4s
Cost:
$0.000258
Provider:
DeepSeek
Evaluation summary
Score | Evaluator
0.49% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.2460% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“This is insane,” Bill said, his voice muffled by the fake beard he’d glued on for disguise practice. He tugged at it, wincing. “It’s genius,” Molly corrected, spreading a crayon-scrawled flyer across the stolen card table. “Look at the layout. The bounce house blocks the patio door. We come in through the kitchen.” “We’re stealing presents from a five-year-old’s birthday party, Molly. That’s a new low.” “It’s not stealing. It’s ethical redistribution. That kid has forty-three gifts. Do you know how many kids in this city have zero? We’re Robin Hoods.” “Robin Hood gave to the poor. We’re selling them on Craigslist.” “Semantics.” She tapped the flyer. “The party’s at three. The parents will be distracted by the magician. The magician is already our inside man.” Bill stopped tugging. “Our what?” “Marvin the Marvelous. He’s an out-of-work clown. I offered him five hundred and a case of balloons.” “You bribed a magician?” “I offered him a career change. He’s going to do a big final trick—making the cake disappear. While everyone’s gasping, we slide in, grab the gift table, and out the side gate.” “Slide in? With a table? In broad daylight?” “We’ll have cover. I’ve rented a unicorn costume.” Bill stared at her. “You’re going to be a unicorn.” “A highly decorative unicorn with a hollow horn for lock picks and a saddlebag for luxury-grade loot. You’ll be the assistant. The spotted dog.” “I’m not wearing a dog costume.” “You’ll love it. It has floppy ears. It’ll hide your face.” She held up a child-sized spotted onesie with a zipper down the front. “Try it on.” He didn’t move. “What about security?” “The only security is a grandmother with a bad hip and a fondness for rosé. She’ll be three glasses in by three-fifteen. I’ve timed it.” Bill picked up the onesie. It smelled faintly of feet. “And if someone sees us?” “We improvise. We’re party entertainers. If confronted, I’ll say we’re lost. I’ll ask for directions to the petting zoo.” “There is no petting zoo.” “There isn’t. 
So they’ll assume we’re stupid, not criminals. It’s the perfect crime—so dumb it loops back to genius.” He held the costume at arm’s length. “Why can’t we just hit a bank?” “Banks have alarms. Children’s parties have gift bags full of bubble wands and iTunes cards. And no one locks their back door when there’s a magician on the lawn.” Bill sighed. “Fine. But if I have to bark, I’m out.” “You won’t have to bark. You’ll just have to carry the table while I distract with sparkle magic.” “You don’t know sparkle magic.” “I have a glitter cannon. Close enough.” He slowly pulled the onesie over his jeans. The ears flopped. “I feel ridiculous.” “You look essential. Now, let’s go over the exit. We dump the gifts in the van, change in the alley, and meet at Denny’s by four. I want Grand Slam.” Bill zipped up. “This is the worst plan you’ve ever had.” “And that’s exactly why it’s going to work.” Molly smiled, her face half-hidden behind the unicorn muzzle. “Ready, Spot?” “I hate you.” “Love you too. Let’s go steal some presents.”
created_at
"2026-04-25T13:43:09.592Z"
4/25/2026, 1:43:09 PM
model
"deepseek/deepseek-v4-flash-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
558
moderation_latency
(null)
generation_time
13334
tokens_prompt
77
tokens_completion
979
native_tokens_prompt
67
native_tokens_completion
887
native_tokens_completion_images
(null)
native_tokens_reasoning
67
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00025774
router
(null)
provider_responses
0
endpoint_id
"722e0746-2034-4d6e-a08f-e6271849ee9e"
id
"f7eee67a-baa4-4deb-8b1e-fa4c74dd5ea5"
is_byok
false
latency
271
model_permaslug
"deepseek/deepseek-v4-flash-20260423"
provider_name
"DeepSeek"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777124589-CZC8VlZEDnMzl6Ox3UGR"
session_id
(null)
api_type
"completions"
id
"gen-1777124589-60lYH9ajqWjVFnfNTwJX"
upstream_id
"f7eee67a-baa4-4deb-8b1e-fa4c74dd5ea5"
total_cost
0.00025774
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepSeek"
response_cache_source_id
(null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.49% | Matches word count | n/a | neededClean: false, words: 527
0.00% | Dialogue to Total Word Ratio | Ratio: 80.75%, Deviation: 30.75% | neededClean: false, wordsTotal: 535, wordsDialogue: 432
0.2460% | Overall
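The reported ratio and deviation can be reproduced from the metadata: 432 dialogue words out of 535 total gives 80.75%, which is 30.75 percentage points away from the 50% target. The sketch below shows one way such a figure could be computed. It is not the benchmark's actual evaluator: the function names `dialogue_ratio` and `deviation_from_target` are hypothetical, and it assumes dialogue is whatever sits between curly double quotes and that words are whitespace-separated tokens. How the evaluator maps a deviation to a score is not shown on this page, so that step is left out.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: dialogue is any span enclosed in curly double quotes,
    and a word is any whitespace-separated token.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio: float, target: float = 0.5) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Reproduce the figures reported for this run from its metadata.
reported_ratio = 432 / 535  # wordsDialogue / wordsTotal
print(f"Ratio: {reported_ratio:.2%}")
print(f"Deviation: {deviation_from_target(reported_ratio):.2%}")
```

The two evaluators report different totals (527 words versus 535), which suggests they tokenize the response differently, so a simple whitespace split may not match either count exactly.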