NC Bench
Run: 2026-04-25-01KQ2G5E71P3GCJWDJHJ7G1XEB
Run Details
Model: DeepSeek V4 Flash (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 12.3s
Cost: $0.000246
Provider: DeepSeek
Evaluation summary
Matches word count: 0.01%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0049%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“You’re telling me we’re robbing a bunch of six-year-olds?” Bill said, shifting his weight from one foot to the other. He was wearing a black tracksuit that smelled faintly of regret and microwaved lasagna.
“No, Bill. We’re not robbing children. We’re liberating gifts from a poorly secured party venue that happens to be populated by children,” Molly said. She was already sketching a map on a napkin with a stolen eyeliner pencil.
“That’s the same thing.”
“It’s not. There’s a moral distinction. We’re not taking the toys. We’re taking the opportunities those toys represent. And we’re doing it for the greater good.”
“What greater good? My rent?”
“Precisely. Your rent is a noble cause. Now focus.” She tapped the napkin. “The party is at the ‘Magic Castle Bounce Palace.’ It’s a two-story tank of germs and screaming. They’ll have a designated gift table by the ball pit. The mother in charge, a woman named Carol, always puts it there. I’ve been staking it out for three weeks.”
“Three weeks? You’ve been casing a four-year-old’s birthday party for three weeks?”
“I had to memorize the drop-off patterns. Carol’s sister shows up at 2:17 with a giant box. The grandmother arrives twenty minutes early and rearranges everything. It’s a symphony of chaos, Bill. And we’re the maestros.”
Bill rubbed his face. “Okay. So what’s the plan? We walk in wearing clown outfits and a trench coat?”
“No. That’s ridiculous. We’re going as parents.”
“We don’t have kids.”
“We’ll borrow one.”
“Borrow a kid? From where? There’s a kid store?”
“There’s a park,” Molly said flatly. “We pick one that looks sad and compliant. We tell them we’re doing a fun game where they get to sit quietly in a corner while we do a scavenger hunt. If they cry, we abandon them. It’s a win-win.”
Bill stared at her. “That’s kidnapping.”
“It’s temporary custodial acquisition. With a snack incentive.” She pulled a granola bar from her pocket. “See? Prepared.”
“Molly, I don’t think a granola bar is a legal gray area eraser.”
“Fine. We don’t borrow a kid. We use a decoy. I’ll dress like a mom—frazzled, carrying a giant diaper bag that’s actually a duffel for hauling loot. You’ll be the dad who’s clearly on his phone the whole time. We blend in.”
“And the presents?”
“We wait for the piñata moment. That’s when Carol is blindfolded and swinging a broom at a papier-mâché donkey. Every adult is watching for candy. We empty the table into the diaper bag. We walk out slowly. We slip the bouncer a $20 and tell him we’re getting the kid who’s allergic to gluten.”
Bill nodded slowly. “I have a $20. I was going to use it for a sandwich.”
“Consider it an investment. A sandwich is temporary. Stolen dollhouse money is forever.”
“What if someone sees us?”
“No one sees the parents. They’re background noise. We are the least interesting people in that room. We will be invisible.”
Bill sighed. “And what happens when we get home and I have to unwrap a My Little Pony that smells like a stranger’s house?”
Molly smiled. “Then we laugh. And we sell it on the internet.”
created_at: "2026-04-25T14:19:43.209Z" (4/25/2026, 2:19:43 PM)
model: "deepseek/deepseek-v4-flash-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 721
moderation_latency: (null)
generation_time: 11611
tokens_prompt: 77
tokens_completion: 936
native_tokens_prompt: 67
native_tokens_completion: 846
native_tokens_completion_images: (null)
native_tokens_reasoning: 50
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00024626
router: (null)
provider_responses: 0
endpoint_id: "722e0746-2034-4d6e-a08f-e6271849ee9e"
id: "06e12f40-7ffa-4ec1-a780-22aa9ab07667"
is_byok: false
latency: 420
model_permaslug: "deepseek/deepseek-v4-flash-20260423"
provider_name: "DeepSeek"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777126783-3GefMJq1tirKzwGHTtkK"
session_id: (null)
api_type: "completions"
id: "gen-1777126783-92DH6QmU7z3ow6MIYV9O"
upstream_id: "06e12f40-7ffa-4ec1-a780-22aa9ab07667"
total_cost: 0.00024626
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
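The 12.3s Duration shown under Run Details is consistent with adding the first latency field (721) to generation_time (11611), assuming both are in milliseconds. The page documents neither the units nor how Duration is derived, so the short Python check below is only a sketch under that assumption, not a description of how NC Bench computes it.

```python
# Assumption: latency and generation_time are both in milliseconds.
# The page does not document how Duration is computed; this only checks consistency.
latency_ms = 721            # first "latency" field in the metadata
generation_time_ms = 11611  # "generation_time" field

total_ms = latency_ms + generation_time_ms
duration_label = f"{total_ms / 1000:.1f}s"  # one decimal place, matching "12.3s" on the page

print(total_ms, duration_label)  # 12332 12.3s
```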
Evaluation details
Matches word count: 0.01% (details: n/a; neededClean: false; words: 531)
Dialogue to Total Word Ratio: 0.00% (details: Ratio: 86.83%, Deviation: 16.83%; neededClean: false; wordsTotal: 539; wordsDialogue: 468)
Overall: 0.0049%
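The Dialogue to Total Word Ratio details follow from the evaluator's own counts: 468 dialogue words out of 539 total gives 86.83%, which is 16.83 percentage points above the 70% target. The Python sketch below reproduces that arithmetic. It also includes a hypothetical `dialogue_ratio` helper that counts every word inside curly double quotes as dialogue. That counting rule is an assumption, not the evaluator's documented method, and it will not necessarily reproduce 468/539 (the two evaluators already disagree on the total, 531 versus 539 words).

```python
import re

TARGET_RATIO = 0.70  # the scenario asks for 70% dialogue


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Hypothetical counter: every whitespace-separated word inside “...” counts as dialogue."""
    total = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


# Reproduce the reported figures from the evaluator's own counts.
words_dialogue, words_total = 468, 539
ratio = words_dialogue / words_total
deviation = ratio - TARGET_RATIO
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 86.83%, Deviation: 16.83%"
```

For example, `dialogue_ratio("“Hello there,” Bill said. “Hi.”")` returns `(3, 5, 0.6)`: three quoted words out of five.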