NC Bench
Run: 2026-04-25-01KQ2GQ532Y9NQ2005SYJQ1JCG
Run Details
Model: DeepSeek V4 Flash (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 12.0s
Cost: $0.000248
Provider: DeepSeek
Evaluation summary
Score | Evaluator
27.17% | Matches word count
0.00% | Dialogue to Total Word Ratio
13.5829% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“So, let me get this straight,” Bill said, pressing his fingers to his temples. “We’re casing a five-year-old’s birthday party.” “It’s not just any party,” Molly corrected, peering through binoculars from the back of the van. “It’s Chloe’s party. Chloe with the trust fund parents and the life-sized unicorn piñata. The loot is in the gift tent. I saw them stacking it.” Bill sighed. “Gift tent. We’re stealing from toddlers.” “Toddlers with excellent taste,” Molly said. “There’s a limited-edition robotic hamster in there. Retails for four hundred.” “And what’s our plan? Knock over the bouncy castle? Use a candy necklace as a garrote?” “Better,” Molly said, pulling out a clipboard. “I’ve acquired the party planner’s schedule. At 2:15, there’s a magic show. The magician, ‘The Amazing Carlito,’ does a thirty-minute act. That’s our window.” Bill squinted at her. “You want to sneak in during a magic show?” “We dress as clowns. Blend right in.” “We don’t have clown costumes.” Molly reached into a duffel bag and tossed him a red nose and a floppy hat. “We do now. I raided a party supply store last night. Got a unicycle too, if you want the full effect.” Bill held the hat like it was radioactive. “I am not riding a unicycle.” “Fine. You’ll be the sad clown. Just stand in a corner and look pathetic. No one questions a sad clown. They just give him a balloon.” “I hate this plan,” Bill muttered. “You hate every plan. Remember the bank job? You said the vault was too shiny.” “It was distracting!” Molly ignored him and continued. “The gift tent is guarded by a woman named Brenda. She’s the mom of the birthday boy—Chloe’s brother. She’s intense. I saw her shoo a pigeon with a glare.” “So we’re battling a suburban mom?” “We’re outsmarting her. The magician’s grand finale involves making a rabbit disappear. I swapped the rabbit with a smoke bomb. When it goes off, panic ensues. We grab the loot and exit through the hedge maze.” Bill stared. 
“There’s a hedge maze?” “It’s a very fancy party.” He rubbed his face. “And what’s our distraction? Besides the smoke bomb?” Molly grinned. “I brought a trained parrot. It’s programmed to scream ‘Free cake!’ over and over. Kids love cake. They’ll stampede.” “You have a trained parrot.” “I have a parrot that says ‘Free cake’ because I fed it cake for three days. It’s the same thing.” Bill leaned back, defeated. “Okay. Even if this works—Clown costumes, smoke bomb, cake-parrot—how do we get out without being seen?” “We blend back into the crowd,” Molly said. “After the chaos, we join the parents taking selfies. No one looks twice at a clown with a bag of presents. And if anyone asks, we’re delivering the loot to the birthday girl.” “But we’re stealing it.” “Details.” Molly checked her watch. “Party starts in twenty minutes. Put on the nose.” Bill put on the red nose. It squeaked. “I hate you.” “You love me,” Molly said, handing him a giant orange wig. “Now let’s go make a toddler cry. In a funny way.”
created_at
"2026-04-25T14:29:23.686Z"
model
"deepseek/deepseek-v4-flash-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
527
moderation_latency
(null)
generation_time
11946
tokens_prompt
77
tokens_completion
963
native_tokens_prompt
67
native_tokens_completion
853
native_tokens_completion_images
(null)
native_tokens_reasoning
76
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00024822
router
(null)
provider_responses
0
endpoint_id
"722e0746-2034-4d6e-a08f-e6271849ee9e"
id
"e4449dce-c418-4b33-bc58-f810759a487a"
is_byok
false
latency
310
model_permaslug
"deepseek/deepseek-v4-flash-20260423"
provider_name
"DeepSeek"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777127363-LxMhGZVx3sDc2IePyNgn"
session_id
(null)
api_type
"completions"
id
"gen-1777127363-zlmjxool0X9dJoOWo68K"
upstream_id
"e4449dce-c418-4b33-bc58-f810759a487a"
total_cost
0.00024822
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepSeek"
response_cache_source_id
(null)
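As a rough reading aid, the timing and cost fields above can be turned into a throughput figure. This is only a sketch: it assumes `generation_time` is reported in milliseconds (suggested by, but not confirmed by, the 12.0s duration in the run details), and it does not settle whether `native_tokens_completion` includes the reasoning tokens.

```python
# Values copied from the metadata above.
generation_time_ms = 11946       # ASSUMPTION: unit is milliseconds
native_tokens_completion = 853
usage_usd = 0.00024822

# Convert to seconds and derive an approximate completion throughput.
generation_time_s = generation_time_ms / 1000
tokens_per_second = native_tokens_completion / generation_time_s

# Six decimal places reproduces the cost as shown in the run details.
cost_display = f"${usage_usd:.6f}"

print(f"{tokens_per_second:.1f} completion tokens/s")
print(cost_display)
```

The throughput figure excludes the separate `latency` field, so it describes generation speed only, not end-to-end response time.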
Evaluation details
Result | Evaluator | Details | Meta Data
27.17% | Matches word count | n/a | neededClean: false, words: 519
0.00% | Dialogue to Total Word Ratio | Ratio: 81.37%, Deviation: 31.37% | neededClean: false, wordsTotal: 526, wordsDialogue: 428
13.5829% | Overall
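The ratio and deviation reported by the dialogue evaluator can be reproduced from its own word counts. The sketch below is arithmetic only. It does not reproduce how either evaluator converts its measurement into a score, because the page does not state those formulas. The final check, that the overall figure matches the unweighted mean of the two evaluator scores, is an inference from the displayed numbers and not a documented rule.

```python
# Target taken from the scenario: 50% of the words should be dialogue.
TARGET_DIALOGUE_PCT = 50.0

# Counts copied from the "Dialogue to Total Word Ratio" row.
words_total = 526
words_dialogue = 428

# Share of dialogue words and its distance from the target, in percent.
ratio_pct = 100 * words_dialogue / words_total
deviation_pct = abs(ratio_pct - TARGET_DIALOGUE_PCT)

# ASSUMPTION: overall = unweighted mean of the evaluator scores.
# The inputs are the rounded values shown on the page, so the result
# can differ from the displayed overall in the third decimal place.
evaluator_scores_pct = [27.17, 0.00]
mean_score_pct = sum(evaluator_scores_pct) / len(evaluator_scores_pct)

print(f"Ratio: {ratio_pct:.2f}%, Deviation: {deviation_pct:.2f}%")
print(f"Mean of evaluator scores: {mean_score_pct:.4f}%")
```

The two evaluators report different totals (519 and 526 words), so the ratio uses the counts from the dialogue evaluator's own row.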