NC Bench
Run: 2026-04-25-01KQ2G9VQ94S7S70F1ZFAS58A0
Run Details
Model: DeepSeek V4 Flash
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 4.6s
Cost: $0.000086
Provider: DeepSeek
Evaluation summary
Score      Evaluator
99.38%     Matches word count
0.00%      Dialogue to Total Word Ratio
49.6885%   Overall
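The Overall figure is consistent with an unweighted mean of the two evaluator scores, although the page does not state how it is aggregated. The sketch below illustrates that assumption only; `overall_score` is a hypothetical name, not part of NC Bench.

```python
from statistics import fmean


def overall_score(evaluator_scores: list[float]) -> float:
    """Unweighted mean of per-evaluator scores, in percent.

    Assumption: the benchmark averages evaluators with equal weight.
    """
    return fmean(evaluator_scores)


# Scores as displayed in the summary (already rounded to two decimals).
scores = [99.38, 0.00]
print(f"{overall_score(scores):.2f}%")
```

Under this assumption, the displayed 49.6885% would imply an unrounded word-count score of about 99.377%, which the page rounds to 99.38%.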
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Bill thumbed the edge of a creased map, the kind from a party supply store with cartoon balloons arrows pointing to the bouncy castle. Molly was already in the other room, testing the weight of a pink unicorn piñata strapped to her belt. “The ‘Unpack the Magic’ wrapping paper is tissue-thin,” she said, her voice a low hum against the squeak of her chalk-dusted gloves. “Three good swipes and the pile’s ours.” Bill nodded, marking the quickest route from the patio door to the gift table. He traced a dotted line around the cake. “And the parents? They’re not going to just watch us load up a minivan with Dino-Roar battle sets.” “You let me handle the parents,” Molly replied, adjusting a novelty party hat that read ‘7 Today!’ on her forehead. “I’ll be the aunt from Cleveland with a sprained ankle. They’ll be handing me presents to hold.” Bill tapped the map. “The loot’s by the lemonade cooler. Once I give the signal—a triple sneeze—you pull the fire alarm.” “Perfect,” Molly whispered, and her smile was all sharp little teeth. “Let’s go make some seven-year-old have a very expensive and emotional breakdown.”
created_at
"2026-04-25T14:22:08.109Z"
4/25/2026, 2:22:08 PM
model
"deepseek/deepseek-v4-flash-20260423"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
579
moderation_latency
(null)
generation_time
4112
tokens_prompt
77
tokens_completion
311
native_tokens_prompt
67
native_tokens_completion
273
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00008582
router
(null)
provider_responses
0
endpoint_id
"722e0746-2034-4d6e-a08f-e6271849ee9e"
id
"e68b0137-b32f-40e7-8e94-12cd2e0de5b3"
is_byok
false
latency
272
model_permaslug
"deepseek/deepseek-v4-flash-20260423"
provider_name
"DeepSeek"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777126928-TlMnvYeJpoPO9mg8foK7"
session_id
(null)
api_type
"completions"
id
"gen-1777126928-ULbG3NJNzvn4srvGZKna"
upstream_id
"e68b0137-b32f-40e7-8e94-12cd2e0de5b3"
total_cost
0.00008582
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepSeek"
response_cache_source_id
(null)
Evaluation details
Result     Evaluator                      Details                            Meta Data
99.38%     Matches word count             n/a                                neededClean: false; words: 195
0.00%      Dialogue to Total Word Ratio   Ratio: 48.50%, Deviation: 38.50%   neededClean: false; wordsTotal: 200; wordsDialogue: 97
49.6885%   Overall
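The Dialogue to Total Word Ratio figures above (wordsDialogue 97, wordsTotal 200, target 10%) can be reproduced with simple arithmetic. The sketch below is an illustration rather than the evaluator's actual code: it assumes dialogue is any text inside double quotation marks and that words are whitespace-separated tokens. The names `dialogue_ratio` and `deviation` are hypothetical.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a passage.

    Assumption: spans enclosed in curly or straight double quotes are dialogue.
    """
    spoken = re.findall(r'“([^”]*)”|"([^"]*)"', text)
    dialogue_words = sum(len((curly or straight).split()) for curly, straight in spoken)
    total_words = len(text.split())
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation(ratio: float, target: float = 0.10) -> float:
    """Absolute distance between the observed ratio and the requested one."""
    return abs(ratio - target)


# Reproduce the reported figures from the evaluator's own counts.
ratio = 97 / 200
print(f"Ratio: {ratio:.2%}, Deviation: {deviation(ratio):.2%}")
```

With the reported counts this prints a ratio of 48.50% and a deviation of 38.50%, matching the Details column. How the evaluator maps that deviation to a 0.00% score is not shown on the page.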