NC Bench
Run: 2026-03-18-01KM0ATWD0516N2E3CKR4SBF8P
Run Details
Model: Mistral Small 4
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 7.4s
Cost: $0.000443
Provider: Mistral
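As a rough cross-check of the figures above, the sketch below derives a throughput estimate and the rounded cost from the raw values recorded in this run's Meta fields (`generation_time`, `native_tokens_completion`, `total_cost`). It assumes `generation_time` is expressed in milliseconds, which the page does not state, and the function names are illustrative rather than part of NC Bench.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: float) -> float:
    """Completion tokens per second, assuming the time is given in milliseconds."""
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000.0)


def display_cost(total_cost: float, places: int = 6) -> str:
    """Format a raw cost as a dollar amount rounded to `places` decimals."""
    return f"${total_cost:.{places}f}"


# Values copied from this run's Meta fields.
throughput = tokens_per_second(721, 7301)   # native_tokens_completion, generation_time
cost_label = display_cost(0.00044259)       # total_cost

print(f"{throughput:.1f} tokens/s")
print(cost_label)
```

Under the millisecond assumption this works out to just under 99 completion tokens per second, and the rounded cost matches the Cost line shown in the run details.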
Evaluation summary
Score | Evaluator | Details Link
60.28% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
30.1376% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted the red clown nose she’d borrowed from a gag shop—clean bill of sale, of course—while eyeing the bouncy castle where six-year-olds in face paint were kicking each other over sugary punch. She turned to Bill, who was currently wrapping himself in a dollar-store pirate costume like a possum tangled in tinsel. "Blue’s not the color of panic," Molly said, tugging the fake mustache off his face before he snapped it in half. "It’s the color of caution tape, and right now we’re standing in the middle of it." Bill spat out a thread from the sleeve of his borrowed cape. "I’m blending. Kids see pirates, they see fun. You see pirates, you see ‘shop Classroom of the Elite Season 2 spoilers.’ Where’s the blend?" Molly grabbed his wrist and yanked him behind a piñata shaped like a unicorn that was leaking glitter onto three traumatized party guests. "Blending would be if we weren’t here at all. Instead, we’re in highlighter-orange cone shoes under strobe lights shaped like lollipops. The cake alone costs more than our last heist." Bill peeked over the unicorn’s papier-mâché hump. "So what’s the move, genie?" Molly sighed. "The move is we wait for the grand finale where they all do the birthday freeze dance. You ever try freezing mid-dance when you’ve got three cups of fruit punch and a sugar rush? They don’t just freeze—they collapse. We sashay past the presents like it’s Black Friday at Target." Bill rubbed his temples. "Freeze dance. Right. Because nothing says stealth like everyone singing along to Baby Shark while a guy in a clown nose and a guy dressed like a pirate toddle off with two hundred dollars’ worth of Paw Patrol figurines." Molly nudged him with her elbow. "Paw Patrol hadn’t even launched when this party was planned. They’re still using My Little Pony. Half these kids will mistake a German Shepherd for a threat. We’re golden." Bill stared at the mountain of gifts under that sad, deflating bouncy castle. "What if one kid doesn’t freeze? 
What if one kid points and goes ‘MOM, SUSPICIOUS CLOWN’?" Molly pulled two plastic safes from the front pocket of her inflatable banana costume. "Then we run like hell and hope the allergy parents mistake our backpacks for epi-pen cases hanging off a doorknob. Also, I stole these two safes out of a hotel lobby. They’ve got fake combination locks—perfect for toddler hands that can’t remember their own shoe size." Bill examined the safes, chewing his lip. "We split the take, yeah?" Molly grinned. "Split the booty like it’s Blackbeard’s treasure map. Half your share buys you actual pirate pants instead of another belt that looks like license plates from a lawn mower." Bill gave a slow nod. "Fine. But when we hit the presents, if I grab the Elsa doll, I’m keeping her." "Deal. But if she flies off the shelf mid-job, you’re holding the bag too long and we both get caught." Bill cracked his knuckles. "Then I’ll hug her tight. Commitment faces execution better when hugging frozen assets."
created_at
"2026-03-18T11:21:00.076Z"
3/18/2026, 11:21:00 AM
model
"mistralai/mistral-small-2603"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
349
moderation_latency
(null)
generation_time
7301
tokens_prompt
77
tokens_completion
797
native_tokens_prompt
81
native_tokens_completion
721
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
16
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00044259
router
(null)
provider_responses
0
endpoint_id
"f5ed6283-4e17-457d-98cd-2ec0c318171d"
id
"ab5699c89b2c44f6bebc8bc00850ad5b"
is_byok
false
latency
349
model_permaslug
"mistralai/mistral-small-2603"
provider_name
"Mistral"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773832860-l8YVfScfDggQ1Is61QJa"
upstream_id
"ab5699c89b2c44f6bebc8bc00850ad5b"
total_cost
0.00044259
cache_discount
0.00000216
upstream_inference_cost
0
provider_name
"Mistral"
Evaluation details
Result | Evaluator | Details | Meta Data
60.28% | Matches word count | n/a | neededClean: false; words: 515
0.00% | Dialogue to Total Word Ratio | Ratio: 66.92%, Deviation: 16.92% | neededClean: false; wordsTotal: 523; wordsDialogue: 350
30.1376% | Overall | |
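The ratio and deviation reported here can be reproduced with a few lines of arithmetic. The sketch below recomputes them from the reported `wordsDialogue` and `wordsTotal`, then checks that the overall score is consistent with an unweighted mean of the two evaluator scores. That averaging rule is an assumption inferred from the numbers, not something the page states, and the function names are hypothetical. How each evaluator maps a deviation to a score is not shown on the page, so it is not modeled.

```python
from statistics import mean

TARGET_RATIO_PCT = 50.0  # from the scenario: "Write 500 words with 50% dialogue"


def dialogue_ratio_pct(words_dialogue: int, words_total: int) -> float:
    """Share of dialogue words in the response, as a percentage."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_pct(ratio_pct: float, target_pct: float = TARGET_RATIO_PCT) -> float:
    """Absolute distance from the target ratio, in percentage points."""
    return abs(ratio_pct - target_pct)


# Values copied from the evaluation details for this run.
ratio = dialogue_ratio_pct(350, 523)      # wordsDialogue, wordsTotal
deviation = deviation_pct(ratio)

# Assumption: the overall score is the plain mean of the evaluator scores.
overall_estimate = mean([60.28, 0.00])

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")   # Ratio: 66.92%, Deviation: 16.92%
print(f"Overall (assumed mean): {overall_estimate:.2f}%")    # Overall (assumed mean): 30.14%
```

The estimate of 30.14% agrees with the reported 30.1376% to two decimal places; the small gap comes from the word-count score being displayed rounded to 60.28%.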