NC Bench
Run: 2026-02-16-01KHM5QWPQMZ9NG1V361DRPVP3
Run Details
Model: DeepSeek V3.1
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 12.1s
Cost: $0.001340
Provider: Fireworks
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The air in the cramped van was thick with the smell of old coffee and desperation. Molly, her hair tied back in a severe ponytail, stabbed a finger at a hand-drawn map spread across the gearshift console. Bill, a man whose brow seemed permanently furrowed in confusion, squinted at the crayon lines.
“Okay,” Molly said, her voice a low, tactical rasp. “The target is a bouncy castle. The mark—a seven-year-old named Timothy—will be distracted by the cake at approximately 14:30 hours. That’s our window.”
Bill scratched his chin. “A bouncy castle, huh? Extraction could be problematic. Those things are wobbly. What’s the security detail?”
“Two parents, heavily caffeinated but complacent. A grandmother who may or may not be napping. The primary physical security is a cavalcade of toddlers, highly unpredictable and prone to loud, emotional outbursts.”
“So a standard soft-target op,” Bill nodded sagely. “What’s the exfil route?”
Molly pointed a grease-stained finger. “We go in through the side gate, masked as the party clowns they regrettably hired online. We secure the loot from the gift table, stash it in the giant, comically oversized pockets of our clown suits, and exit via the bouncy castle, using the chaotic energy of the jumpers for cover.”
Bill was silent for a long moment, staring at the map as if it held the secrets of Fort Knox. “The bouncy castle is a bold choice. High risk, high reward. But what’s the contingency for a juice-box spill? That’s a slip-and-fall hazard I’m not willing to accept without hazard pay.”
“The hazard pay is the entire haul of Lego sets and action figures, Bill!” Molly hissed. “We’re not negotiating a union contract here!”
“I’m just saying, my lower back isn’t what it used to be. All that unregulated bouncing.” He leaned closer, lowering his voice. “And what about the clown identities? Do we have backstories? What are our clown names?”
“I’m ‘Sparkles,’ you’re ‘Tootles.’”
Bill’s face fell. “Tootles? Why do you get to be Sparkles? Sparkles has main character energy. Tootles sounds like a clown who’s given up on his dreams.”
“Fine! You can be Sparkles! I don’t care!” Molly snapped, massaging her temples. “The point is the presents. The shiny, unopened presents. We’re in and out before anyone misses the third-tier superhero wrapping paper.”
“Do we have a gadget guy for this?” Bill asked, suddenly hopeful. “Maybe someone who can rig the pinata to explode as a distraction?”
“We *are* the gadget guys, Bill! Our gadget is a big net and a lot of hope!” She took a deep, steadying breath. “Look. The plan is simple. We show up, we look cheerful and non-threatening, we stuff a plastic tea set and a few dozen squishy toys into our voluminous trousers, and we leave. It’s a milk run.”
Bill frowned, his painted-on clown smile looking increasingly sinister. “I don’t like it. The variables are all over the place. What if someone wants a balloon animal? My giraffe is shoddy at best. It looks… depressive.”
Molly slowly put her head down on the cool metal of the dashboard. “Just be ready to bounce at 14:30,” she mumbled into the vinyl. “And for the love of all that is holy, try to look like you’re having fun.”
created_at
"2026-02-16T21:29:55.196969+00:00"
2/16/2026, 9:29:55 PM
model
"deepseek/deepseek-chat-v3.1"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
671
moderation_latency
(null)
generation_time
12049
tokens_prompt
95
tokens_completion
872
native_tokens_prompt
68
native_tokens_completion
786
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
67
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0013398
router
(null)
provider_responses
0
id
"f8e7f528-5550-4eb7-8313-357826cc6a7d"
status
200
is_byok
false
latency
671
endpoint_id
"0c00f0d2-8aad-4700-bd36-11cb9a717fde"
provider_name
"Fireworks"
model_permaslug
"deepseek/deepseek-chat-v3.1"
api_type
"completions"
id
"gen-1771277382-47phQY8xGLrkNKuwJCiv"
upstream_id
"f8e7f528-5550-4eb7-8313-357826cc6a7d"
total_cost
0.0013398
cache_discount
0.00001876
upstream_inference_cost
0
provider_name
"Fireworks"
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false; words: 553
0.0000% | Dialogue to Total Word Ratio | Ratio: 73.24%, Deviation: 23.24% | neededClean: false; wordsTotal: 553; wordsDialogue: 405
0.0000%
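The ratio and deviation reported above follow directly from the two word counts: 405 dialogue words out of 553 total is 73.24%, which is 23.24 percentage points away from the 50% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way to obtain such counts from a response. It is not the benchmark's actual evaluator: the function names `ratio_and_deviation` and `dialogue_stats` are hypothetical, and it assumes that dialogue is whatever sits inside double quotes and that words are whitespace-separated tokens.

```python
import re

# Assumption: dialogue is any run of text enclosed in curly (“ ”) or
# straight (") double quotes. The real evaluator's rule is not documented here.
_QUOTED = re.compile(r'“([^”]*)”|"([^"]*)"')


def ratio_and_deviation(words_dialogue, words_total, target=0.5):
    """Return (ratio %, absolute deviation from target in percentage points)."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio = words_dialogue / words_total
    return round(ratio * 100, 2), round(abs(ratio - target) * 100, 2)


def dialogue_stats(text, target=0.5):
    """Count total and quoted words in text, then derive ratio and deviation."""
    # Assumption: a word is a whitespace-separated token.
    words_total = len(text.split())
    words_dialogue = sum(
        len((curly or straight).split())
        for curly, straight in _QUOTED.findall(text)
    )
    ratio_pct, deviation_pct = ratio_and_deviation(
        words_dialogue, words_total, target
    )
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratioPct": ratio_pct,
        "deviationPct": deviation_pct,
    }


# Reproduce the figures in the table from the reported counts.
print(ratio_and_deviation(405, 553))  # (73.24, 23.24)
```

Running `dialogue_stats` on the full response may not give exactly 405 and 553, since the evaluator's tokenization and quote handling could differ from the assumptions made here.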