NC Bench
Run: 2026-02-16-01KHM5QQBNGAAFDM2JD1SD6Q7E
Run Details
Model: DeepSeek V3.1
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 13.3s
Cost: $0.001523
Provider: Fireworks
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The planning table was a rickety card table, covered in a thin film of dust and the greasy ghosts of past meals. Upon it sat the crown jewel of their next operation: an intricately drawn map of a suburban backyard, rendered in crayon. Bill, a man built like a disappointed fridge, pointed a meaty finger at a squiggle labeled ‘Bouncy Castle.’ “The perimeter’s soft,” he grunted. “No fence. Just that flimsy hedge. But the primary obstacle is the… inflatable sentry. It’s a tactical nightmare.” Molly, perched on a stool and twirling a lockpick disguised as a bobby pin, squinted at the schematic. “It’s a bouncy castle, Bill. The primary obstacle is a five-year-old hyped up on sugar and the belief he’s a T-Rex. Our point of entry is here.” She tapped a corner of the drawing. “The lemonade stand. Distraction and refreshment in one.” “A two-man job,” Bill said, his voice a low rumble. “I create a diversion. Something big. Something that draws every tiny eye.” “What’d you have in mind? A controlled detonation of the piñata?” “Too messy. The shrapnel is unpredictable. I’m thinking I fake an injury. A tragic stumble onto the slip ‘n slide. The pathos will be overwhelming.” Molly stared at him. “You, a two-hundred-and-fifty-pound man, crying on a slip ‘n slide, is not pathos, Bill. It’s a reason for someone to call animal control. No. We go in quiet. We blend.” “Blend? Molly, look at me. My face is a wanted poster that forgot to smile. I don’t ‘blend’ at a unicorn-and-rainbows-themed birthday party.” “Which is why you’ll be on lookout from the treehouse. It offers a clean sightline to the gift table. Your job is to monitor the primary threat.” “The dad with the video camera?” “Worse. The birthday girl’s grandmother. She’s a former librarian. She sees everything. She’s the hawk.” Bill leaned in, the card table groaning in protest. “And the extraction? The payload is bulky. 
Wrapped in cartoon paper and hope.” “We use their own vehicle against them,” Molly said, a grin spreading across her face. She pointed to a wobbly circle with wheels. “The Little Tikes Cozy Coupe. We load it up. It’s the perfect Trojan horse. Who’s going to question a toddler fleeing the scene of the crime with a car full of loot?” “The engine is foot-powered, Molly. My feet won’t fit.” “Minor logistical hiccup. We’ll pivot. We abseil from the treehouse with the goods, using a rope woven from crepe paper streamers.” Bill sighed, a sound like gravel in a dryer. “I just… I never thought my criminal nadir would involve outsmarting a child named ‘Braxlynn’ to steal a pile of Play-Doh and light-up sneakers.” “The client pays in untraceable cash, Bill,” Molly said, her tone suddenly serious. “The mark is always where the money is. And the most vulnerable marks are the ones who think a ‘No Parents Allowed’ sign on a blanket fort is real security. This isn’t a nadir. It’s a masterpiece.” Bill studied the crayon map again, a frown deepening on his face. “One more thing. The dog.” “The Shih Tzu? ‘Mr. Waggles’?” “His yap could sink the whole operation. We need a muzzle.” Molly patted his arm. “We’ll bribe him with cake. Everyone has a price, Bill. Even Mr. Waggles.” She rolled up the crayon blueprint with the gravity of a general. “Now, let’s go over the timings again. The target blows out the candles at 3:15. That’s when the chaos peaks. That’s our window.” “Roger that,” Bill muttered. “I’ll get my party hat.”
created_at: "2026-02-16T21:29:51.422883+00:00" (2/16/2026, 9:29:51 PM)
model: "deepseek/deepseek-chat-v3.1"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 390
moderation_latency: (null)
generation_time: 13208
tokens_prompt: 95
tokens_completion: 962
native_tokens_prompt: 68
native_tokens_completion: 895
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 67
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00152292
router: (null)
provider_responses
0
id: "d964d71d-ef05-4d08-bed7-99658b294691"
status: 200
is_byok: false
latency: 390
endpoint_id: "0c00f0d2-8aad-4700-bd36-11cb9a717fde"
provider_name: "Fireworks"
model_permaslug: "deepseek/deepseek-chat-v3.1"
api_type: "completions"
id: "gen-1771277376-ixOY2GqWvziMHZpZRxn8"
upstream_id: "d964d71d-ef05-4d08-bed7-99658b294691"
total_cost: 0.00152292
cache_discount: 0.00001876
upstream_inference_cost: 0
provider_name: "Fireworks"
Evaluation details
Evaluator: Matches word count
Result: 0.0000%
Details: n/a
Meta Data: neededClean: false; words: 604

Evaluator: Dialogue to Total Word Ratio
Result: 0.0000%
Details: Ratio: 72.35%, Deviation: 22.35%
Meta Data: neededClean: false; wordsTotal: 604; wordsDialogue: 437
0.0000%
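The dialogue-ratio figures above follow from simple arithmetic: 437 dialogue words out of 604 total is 72.35%, which is 22.35 percentage points above the 50% target. The sketch below reproduces that calculation and shows one possible way to count dialogue directly from prose. It assumes dialogue is whatever sits inside curly double quotes and that words are whitespace-separated tokens; NC Bench's actual evaluator may count differently, and `ratio_from_counts`, `dialogue_ratio`, and `count_words` are hypothetical helper names.

```python
import re

# Assumption: dialogue is any text enclosed in curly double quotes (“ … ”),
# as in the scene above. NC Bench's real evaluator may detect it differently.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simplifying assumption)."""
    return len(text.split())


def ratio_from_counts(words_dialogue: int, words_total: int, target: float = 50.0):
    """Return (ratio %, deviation in percentage points from target), rounded to 2 dp."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio = 100.0 * words_dialogue / words_total
    return round(ratio, 2), round(abs(ratio - target), 2)


def dialogue_ratio(text: str, target: float = 50.0):
    """Return unrounded (ratio %, deviation) computed directly from prose."""
    total = count_words(text)
    if total == 0:
        raise ValueError("text contains no words")
    # findall returns the captured group: the words between each pair of quotes
    dialogue = sum(count_words(span) for span in DIALOGUE_RE.findall(text))
    ratio = 100.0 * dialogue / total
    return ratio, abs(ratio - target)


# Reproduce the reported metadata: 437 dialogue words out of 604 total.
print(ratio_from_counts(437, 604))  # (72.35, 22.35)
```

Run on the full scene, `dialogue_ratio` would only match the reported 72.35% if its quote-and-whitespace rules happen to agree with the evaluator's tokenization.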