NC Bench
Run: 2026-02-16-01KHM5P1TBMWG16H2WQ860K46C
Run Details
Model:
DeepSeek V3 (2024-12-26)
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
18.4s
Cost:
$0.000913
Provider:
Novita
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
**INT. ABANDONED WAREHOUSE – NIGHT** Molly tossed a crumpled blueprint onto the rickety table. "Alright, genius, explain this mess again. Slowly." Bill grinned, pointing at the paper. "Easy. We hit the Thompson kid’s birthday party. Loads of presents, zero security. It’s a goldmine." Molly squinted. "You want us to rob a bunch of six-year-olds?" "Technically, we’re robbing their *parents*," Bill corrected. "The kids won’t even notice. They’ll be too busy crying over cake or whatever." Molly rubbed her temples. "And how do we get in? You planning to dress as clowns?" Bill snapped his fingers. "Better. We go as *magicians*." He pulled out two sequined vests. "The Great Billini and his assistant, Molly the Magnificent." Molly stared. "You’re joking." "Dead serious. We dazzle ‘em with card tricks, then *poof*—gifts disappear. Classic misdirection." Molly sighed. "Fine. But if one of those brats asks for a rabbit out of a hat, I’m out." Bill waved her off. "Relax. Worst case, we stuff a kid in there." Molly groaned. "You’re the worst." Bill unfolded the blueprint. "Okay, here’s the plan. Party starts at 3 PM. We show up at 3:15—prime chaos time. Distract the rugrats with cheap tricks while I case the gift table." Molly crossed her arms. "What’s *my* job? Other than looking ridiculous?" "Keep the parents busy. Compliment their terrible parenting. Maybe flirt with a dad or two." Molly rolled her eyes. "I’d rather juggle knives." Bill shrugged. "Knives work too. Just don’t stab anyone." Molly leaned in. "And the exit strategy?" Bill grinned. "Balloon animals." Molly blinked. "What?" "We fill a duffel with loot, tie it to a bunch of balloons, and float it out the window. Then we ‘accidentally’ pop them during our grand finale. Confetti everywhere, kids screaming—total cover." Molly stared. "That’s the dumbest thing I’ve ever heard." Bill winked. "Dumb enough to work." Molly sighed. "Fine. But if we get caught, I’m telling them *you* stole the cake too." Bill gasped. 
"You wouldn’t." Molly smirked. "Watch me." Bill shook his head. "You’re ruthless." Molly adjusted her sequined vest. "Let’s just get this over with. I’ve got a reputation to ruin." Bill clapped. "That’s the spirit! Operation: Pin the Theft on the Donkey starts now!" Molly groaned. "We are *not* calling it that." Bill ignored her, already practicing his wand twirl. "Abracadabra, suckers."
created_at
"2026-02-16T21:29:01.33628+00:00"
2/16/2026, 9:29:01 PM
model
"deepseek/deepseek-chat-v3"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
860
moderation_latency
(null)
generation_time
18339
tokens_prompt
77
tokens_completion
684
native_tokens_prompt
66
native_tokens_completion
682
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.000913
router
(null)
provider_responses
0
id
"677b80a9bfc6f90e294004817ce16256"
status
200
is_byok
false
latency
860
endpoint_id
"cefb5a1e-9fea-4496-9a64-4e1b11a7cc8c"
provider_name
"Novita"
model_permaslug
"deepseek/deepseek-chat-v3"
api_type
"completions"
id
"gen-1771277322-qNxlpmn8lsgumglWnPrk"
upstream_id
"677b80a9bfc6f90e294004817ce16256"
total_cost
0.000913
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Novita"
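As a rough cross-check of the metadata above, completion throughput can be derived from `native_tokens_completion` and `generation_time`. The sketch below assumes `generation_time` is reported in milliseconds, which is consistent with the 18.4s duration under Run Details. The function name is illustrative and is not part of NC Bench or any provider API.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Completion throughput, assuming generation_time is in milliseconds."""
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000)


# Values copied from the metadata above.
throughput = tokens_per_second(682, 18339)
print(f"{throughput:.1f} tokens/s")
```

For this run the result is roughly 37 tokens per second.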
Evaluation details
Evaluator: Matches word count
Result: 0.0000%
Details: n/a
Meta Data: neededClean: false, words: 387

Evaluator: Dialogue to Total Word Ratio
Result: 37.6739%
Details: Ratio: 74.42%, Deviation: 4.42%
Meta Data: neededClean: false, wordsTotal: 387, wordsDialogue: 288
18.8369%
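The source does not say how the evaluator counts dialogue words, so the following is only a minimal sketch of one plausible approach. It assumes that dialogue is any text between straight double quotes and that words are whitespace-separated tokens. `dialogue_stats` and `TARGET_RATIO` are hypothetical names, not NC Bench identifiers.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: 70% dialogue


def dialogue_stats(text: str) -> dict:
    """Count total and quoted words, then compare the ratio to the target."""
    words_total = len(text.split())
    # Assumption: everything between a pair of straight double quotes is dialogue.
    spoken = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(len(s.split()) for s in spoken)
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - TARGET_RATIO),
    }


sample = 'Molly blinked. "What?" Bill grinned. "Balloon animals."'
stats = dialogue_stats(sample)
print(stats)
```

The reported figures are consistent with this definition of the ratio: 288 / 387 is about 74.42%, which is 4.42 percentage points above the 70% target. How the 37.6739% result is derived from that deviation is not shown in the source.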