NC Bench
Run: 2026-02-16-01KHM5PKD9ZK44R6JGXVC9CBT6
Run Details
Model: DeepSeek V3.1
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 18.4s
Cost: $0.001298
Provider: Fireworks
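The displayed cost of $0.001298 is the rounded value of usage and total_cost (0.00129808) in the metadata below. It can be reproduced from the native token counts (native_tokens_prompt 68, native_tokens_completion 750) if you assume Fireworks rates of $0.56 per million input tokens and $1.68 per million output tokens. This page does not state those rates; they are an assumption that happens to reproduce the total exactly. The non-native counts (95 and 866) do not give this total under the same rates.

$$
68 \times \frac{\$0.56}{10^{6}} + 750 \times \frac{\$1.68}{10^{6}} = \$0.00003808 + \$0.00126 = \$0.00129808
$$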
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The warehouse was, in Bill’s professional opinion, a dump. It smelled of damp concrete and forgotten ambitions. In the center of the cavernous space, Molly had constructed a model of the target using a child’s playskool castle, several action figures, and a disturbingly realistic stuffed pony.
“Right,” Molly said, pointing a laser pen at the castle. “This is the Henderson residence. The objective is here, under the ‘My First Christmas’ tree in the living room. Estimated haul: a mountain of poorly wrapped educational toys and at least one ridiculously expensive electronic thing their aunt bought to show up the parents.”
Bill stared at the stuffed pony. “Why is the pony looking at me like that? It’s judging my life choices.”
“Focus, Bill. The party starts at 2 PM. Noise discipline will be critical. The target environment will be saturated with high-pitched, erratic audio signatures.”
“You mean screaming kids?”
“I mean sonic chaos,” Molly corrected. “Our window is during the peak sugar-crash phase, approximately 3:45 PM. That’s when the tiny humans become lethargic and the parents are comatose from small talk.”
Bill picked up a action figure dressed as a fireman. “So what’s the entry plan? Do we go in as caterers? Clowns? Deeply disappointed relatives?”
Molly smirked. “Better. We’re the entertainment.” She held up two garish, multi-colored costumes. “I got us a two-for-one deal on ‘Princess Sparklehoof and her Trusty Sidekick, Barnaby the Bear.’”
Bill dropped the fireman. “Sidekick? I’m not a sidekick. I’ve cased five national banks. My nickname is ‘The Lockpick Lorax!’”
“And now it’s Barnaby the Bear,” Molly said, her voice dangerously cheerful. “You’ll be handing out balloons and causing distractions. While the little darlings are mesmerized by your mediocre balloon-animal skills, I, as Princess Sparklehoof, will be ‘feeling faint’ and need to lie down on the sofa right next to the presents.”
“Balloon animals? Molly, I specialize in thermal lances and frequency jammers. The most complex thing I can make is a snake that looks like a slightly sad sausage.”
“Perfect. Their disappointment will be a potent smokescreen.” She tossed him the bear costume. It was covered in what felt like cheap polyester and regret. “You’ll be sweaty and humiliated. It’s the perfect disguise.”
Bill pulled on the giant bear head. His voice was muffled. “I can’t see. And it smells like someone else’ panic.” He took the head off. “What about extraction?”
“Simple. We load the loot into Barnaby’s ‘gift bag’—which is just a reinforced duffel bag with cartoon ducks on it—and make our excuses. We have a prior engagement with a fictional sick hamster.”
Bill sighed, staring at the judgmental pony once more. “A children’s party, Molly? This is a new low. Even for us. We’re scraping the bottom of the criminal barrel.”
Molly clapped him on his shoulder. “Bill, look at it this way: it’s a soft target with zero security, the witnesses have unreliable memories, and the worst thing we’ll have to neutralize is a five-year-old armed with a sticky lollipop. It’s a milk run. Now, practice your balloon poodle. It’s showtime.”
created_at: "2026-02-16T21:29:19.513465+00:00" (2/16/2026, 9:29:19 PM)
model: "deepseek/deepseek-chat-v3.1"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 498
moderation_latency: (null)
generation_time: 18302
tokens_prompt: 95
tokens_completion: 866
native_tokens_prompt: 68
native_tokens_completion: 750
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00129808
router: (null)
provider_responses[0]:
  id: "5fe703fb-5e9a-4a9e-b6b7-c6b8e86c4f95"
  status: 200
  is_byok: false
  latency: 498
  endpoint_id: "0c00f0d2-8aad-4700-bd36-11cb9a717fde"
  provider_name: "Fireworks"
  model_permaslug: "deepseek/deepseek-chat-v3.1"
  api_type: "completions"
id: "gen-1771277340-FLmeg3y3QdYh31ot0JY4"
upstream_id: "5fe703fb-5e9a-4a9e-b6b7-c6b8e86c4f95"
total_cost: 0.00129808
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Fireworks"
Evaluation details
Result | Evaluator | Details | Metadata
35.0022% | Matches word count | n/a | neededClean: false; words: 518
0.0000% | Dialogue to Total Word Ratio | Ratio: 72.97%, Deviation: 22.97% (from the 50% target) | neededClean: false; wordsTotal: 518; wordsDialogue: 378
Overall result: 17.5011% (the mean of the two results above)
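The dialogue row follows from simple arithmetic: 378 / 518 ≈ 72.97% dialogue, which is 22.97 points above the 50% target. The overall 17.5011% is the mean of 35.0022% and 0.0000%. The Python sketch below illustrates such a check. It is hypothetical: the names `dialogue_stats` and `overall_result` are illustrative, and it assumes that dialogue means any text inside double quotes and that a word is a whitespace-separated token. NC Bench's actual tokenizer is not shown here, so the sketch will not necessarily reproduce the counts of 518 and 378. It also does not attempt the word-count score (35.0022% for 518 words), because this page does not give that formula.

```python
import re

# Hypothetical sketch: NC Bench's real evaluator is not shown on this page.
# Assumption: dialogue is any text between double quotes (curly “ ” or straight "),
# and a "word" is a whitespace-separated token.
DIALOGUE_RE = re.compile(r'[“"]([^”"]*)[”"]')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_stats(text: str, target: float = 0.5) -> dict:
    """Return total words, dialogue words, the dialogue ratio and its deviation from target."""
    total = count_words(text)
    dialogue = sum(count_words(span) for span in DIALOGUE_RE.findall(text))
    ratio = dialogue / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


def overall_result(results: list[float]) -> float:
    """Average the per-evaluator results (this reproduces the 17.5011% overall figure)."""
    return sum(results) / len(results)


if __name__ == "__main__":
    sample = 'Bill sighed. “This is a new low.” Molly laughed.'
    print(dialogue_stats(sample))
    # Figures from the evaluation table: 378 dialogue words out of 518.
    print(f"{378 / 518:.2%}")  # 72.97%
    print(f"{overall_result([35.0022, 0.0]):.4f}")  # 17.5011
```

Counting quoted spans is the simplest heuristic. It excludes attributions such as "Molly said", and it would need adjusting for unbalanced quotes or for dialogue that spans several paragraphs without a closing quote.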