NC Bench
Run: 2026-04-25-01KQ2HS8V66H1GM84HA6XTZ343
Run Details
Model: DeepSeek V4 Pro (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 31.7s
Cost: $0.000913
Provider: DeepSeek
Evaluation summary
Score | Evaluator | Details Link
0.08% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
0.0424% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the blueprint across the dashboard, her elbow nudging the glove compartment open and sending a cascade of fast-food napkins into Bill’s lap. Through the windscreen, the target house glowed with primary-coloured bunting and the occasional shriek of an overdressed five-year-old. “Right, the bouncy castle is the key,” she said, tapping a pink rectangle sketched in glitter pen. “It creates a sound baffle. We enter via the side gate during the piñata. That’s when the parental supervision collapses into a ring of camera phones.” Bill squinted at the diagram, then at the inflatable monstrosity lurching on the lawn like a deranged lung. “You’ve drawn a unicorn vomiting a safe-cracking sequence. I’m not saying it’s not art, Molly, but I need specifics. What’s the guest-to-clown ratio?” “Three-to-one, leaning heavily toward the sugar-crazed. There’s a professional fairy-ballerina scheduled for four o’clock, and a man called Uncle Derek who’s been on the prosecco since noon and thinks he can do magic. The presents are stacked in the conservatory. Thirty-seven boxes of varying sizes, wrapped in paper that probably costs more per square inch than our last take. We’ll be in and out before Derek finds the second coin behind someone’s ear.” Bill retrieved a half-eaten lollipop from the cup holder, unwrapped it, and stuck it in his mouth for dramatic effect. It was peach-flavoured, and clashed terribly with his cynical outlook. “And our costumes? I’m not wearing the squirrel suit again. The tail has a mind of its own, and last time I got trapped in a revolving door for eleven minutes.” Molly grinned, pulling two laminated badges from her bag. They read ‘Precious Moments Party Logistics’ above a stock photo of a smiling man in a cap. “We’re dialling it back. Utilitarian chic. We’re the lost-and-found coordination team. I’ve got a clipboard, you’ve got a high-vis vest. 
Our cover is that we’re collecting the gifts for ‘safe post-party transport’ at the request of the birthday boy’s mother, who is too frazzled to remember if she actually hired us.” “So we’re just walking in and taking them.” “I’ve also arranged a diversion. I’ve paid a local teen to release a box of hamsters into the pass-the-parcel circle. In the ensuing pandemonium, we load the haul into a branded van I’ve parked two streets over. It’s got a magnetic sign that says ‘Bouncy Castle Deflation Services’. Nobody questions a man in a high-vis vest carrying a large cardboard box during a rodent-based emergency.” Bill chewed the lollipop stem, thinking. “What about the piñata? If Derek the magician misses and clocks a child, the whole timeline shifts.” “Exactly why you’re on backup piñata duty. I have a spare candy-filled dinosaur in the trunk. If the mood sours, you step in and scream ‘Surprise bonus piñata!’ and we buy ourselves another ninety seconds of anarchy.” She took the lollipop out of his mouth and pointed it at the house. “The real beauty? No alarms. They’ve left the conservatory windows open to let out the smell of fish fingers. It’s a moral outrage how easy this is.” Bill straightened his imaginary tie. “Fine. But I’m not deflating the actual bouncy castle. I have my limits.”
created_at: "2026-04-25T14:48:01.642Z" (4/25/2026, 2:48:01 PM)
model: "deepseek/deepseek-v4-pro-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 761
moderation_latency: (null)
generation_time: 31686
tokens_prompt: 77
tokens_completion: 1136
native_tokens_prompt: 67
native_tokens_completion: 1016
native_tokens_completion_images: (null)
native_tokens_reasoning: 244
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.000913065
router: (null)
provider_responses:
0:
endpoint_id: "d084170b-dd99-4e53-9ae1-b63881f3cfbf"
id: "92142c49-c593-4ce4-a3ba-75ec53cc6f2f"
is_byok: false
latency: 267
model_permaslug: "deepseek/deepseek-v4-pro-20260423"
provider_name: "DeepSeek"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777128481-T6MCxJfq2KWBc7P2Lfyd"
session_id: (null)
api_type: "completions"
id: "gen-1777128481-hxc31aiOZ6hSLvSfLEe8"
upstream_id: "92142c49-c593-4ce4-a3ba-75ec53cc6f2f"
total_cost: 0.000913065
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepSeek"
response_cache_source_id: (null)
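The timing and cost fields in the metadata can be combined into rough throughput figures. The sketch below is illustrative only: it assumes generation_time is in milliseconds (which agrees with the 31.7s duration reported in Run Details) and that total_cost is in US dollars. The variable names are ours, and whether native_tokens_completion includes the reasoning tokens is not stated on this page.

```python
# Values copied from the run metadata; units are assumptions (see lead-in).
generation_time_ms = 31686
native_tokens_prompt = 67
native_tokens_completion = 1016
total_cost_usd = 0.000913065

# Convert the assumed millisecond duration to seconds.
seconds = generation_time_ms / 1000

# Completion tokens produced per second of generation time.
tokens_per_second = native_tokens_completion / seconds

# Blended cost per 1,000 native tokens (prompt + completion).
total_tokens = native_tokens_prompt + native_tokens_completion
cost_per_1k_tokens = total_cost_usd / total_tokens * 1000

print(f"duration: {seconds:.1f}s")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
print(f"cost per 1k tokens: ${cost_per_1k_tokens:.6f}")
```

The blended cost figure mixes prompt and completion tokens, which providers usually price separately, so it is only a coarse comparison number.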
Evaluation details
Result | Evaluator | Details | Meta Data
0.08% | Matches word count | n/a | neededClean: false, words: 529
0.00% | Dialogue to Total Word Ratio | Ratio: 74.18%, Deviation: 44.18% | neededClean: false, wordsTotal: 550, wordsDialogue: 408
0.0424% | Overall | |
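The reported ratio and deviation follow directly from the wordsDialogue and wordsTotal counts: 408 / 550 is about 74.18%, which is 44.18 points above the 30% target in the scenario. The sketch below reproduces that arithmetic and shows one possible way to count dialogue words. The function names are hypothetical, and the quote-matching rule (text between curly double quotes counts as dialogue, words split on whitespace) is an assumption, not NC Bench's actual evaluator, which is not shown on this page.

```python
import re

TARGET_RATIO = 0.30  # from the scenario: "30% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that belong to dialogue."""
    return words_dialogue / words_total


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words).

    Assumption: dialogue is whatever sits between curly double quotes,
    and a word is any whitespace-separated token.
    """
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    total = len(text.split())
    return dialogue, total


# Reproduce the figures from the evaluation details.
ratio = dialogue_ratio(408, 550)
deviation = abs(ratio - TARGET_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

Different tokenization rules give different totals, which may be why the two evaluators report 529 and 550 words for the same response.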