NC Bench
Run: 2026-03-18-01KM0A2152YVS2AXS20K6GQ23G
Run Details
Model: Mistral Small 4 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 24.1s
Cost: $0.002356
Provider: Mistral
Evaluation summary
Score       Evaluator
95.99%      Matches word count
100.00%     Dialogue to Total Word Ratio
97.9931%    Overall
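The page does not say how the overall score is derived. The figures are consistent with an unweighted mean of the two evaluator scores, and the short Python sketch below checks that reading. The function names are hypothetical and the averaging rule is an assumption, not something the benchmark documents here.

```python
def overall_score(scores):
    """Unweighted mean of per-evaluator scores in percent (assumed rule)."""
    return sum(scores) / len(scores)


def implied_score(overall, known_score, n_evaluators=2):
    """Back out the one unknown evaluator score from the overall mean."""
    return overall * n_evaluators - known_score


# Values reported on this page.
reported_overall = 97.9931
ratio_score = 100.00

# Under the mean assumption, the word-count score before rounding.
word_count_score = implied_score(reported_overall, ratio_score)
recomputed_overall = overall_score([word_count_score, ratio_score])

print(round(word_count_score, 4))    # 95.9862
print(round(word_count_score, 2))    # 95.99, matching the displayed score
print(round(recomputed_overall, 4))  # 97.9931
```

Averaging the rounded display values (95.99 and 100.00) gives 97.995 rather than 97.9931, which suggests the overall figure is computed from unrounded scores.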
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her bright red nose and surveyed the scene from behind her clown mask. The party was in full swing, with kids running around and parents sipping lukewarm coffee. Bill, meanwhile, adjusted his oversized polka-dot wig, causing the fake security badge pinned to his chest to tilt dangerously to one side. He scratched at his collar and groaned. "You're sure this is gonna work?" The badge slipped further to the side. "Last time we tried this, I ended up in a bouncy castle. A bouncy castle, Molly." She snatched the badge and re-pinned it firmly. "This time will be different. We're clowns—not exactly a profile the cops keep on file at the station." Bill peeked over his shoulder at a group of face-painted toddlers. "They don't look stupid, Molly. Kids are sharp." Molly grabbed his shoulders and locked eyes. "Bill, we're not actually stealing from the kids. We're redistributing wealth from spoiled brats to ourselves. It's practically Robin Hood." Bill blinked. "Robin Hood had a horse and a bow, not a squirting flower and a booger spray." Molly sighed and checked her fake red nose for wobbliness. "Let's just go before I decide to dump you in the clown car with the rest of the trash."
created_at: "2026-03-18T11:07:25.744Z" (3/18/2026, 11:07:25 AM)
model: "mistralai/mistral-small-2603"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 363
moderation_latency: (null)
generation_time: 24036
tokens_prompt: 77
tokens_completion: 3929
native_tokens_prompt: 81
native_tokens_completion: 3910
native_tokens_completion_images: (null)
native_tokens_reasoning: 3618
native_tokens_cached: 16
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00235599
router: (null)
provider_responses:
  0:
    endpoint_id: "f5ed6283-4e17-457d-98cd-2ec0c318171d"
    id: "d1e1783ad4e9407fafc30f29aed5ce9e"
    is_byok: false
    latency: 352
    model_permaslug: "mistralai/mistral-small-2603"
    provider_name: "Mistral"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773832045-A1KiMnrlQhtCdtjyofHg"
upstream_id: "d1e1783ad4e9407fafc30f29aed5ce9e"
total_cost: 0.00235599
cache_discount: 0.00000216
upstream_inference_cost: 0
provider_name: "Mistral"
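A few derived figures make the metadata easier to read. The sketch below is a rough illustration in Python, with two assumptions that the page does not state: that generation_time is in milliseconds (it lines up with the 24.1s duration shown in Run Details), and that native_tokens_reasoning is counted inside native_tokens_completion. The function name is hypothetical.

```python
def summarize_generation(meta):
    """Derive throughput and reasoning share from the reported metadata.

    Assumes generation_time is in milliseconds and that reasoning tokens
    are a subset of native completion tokens.
    """
    completion = meta["native_tokens_completion"]
    reasoning = meta["native_tokens_reasoning"]
    seconds = meta["generation_time"] / 1000.0
    return {
        "visible_tokens": completion - reasoning,
        "reasoning_share": reasoning / completion,
        "tokens_per_second": completion / seconds,
        "cost_per_1k_completion_tokens": meta["total_cost"] / completion * 1000,
    }


# Values copied from the metadata above.
meta = {
    "generation_time": 24036,
    "native_tokens_completion": 3910,
    "native_tokens_reasoning": 3618,
    "total_cost": 0.00235599,
}

summary = summarize_generation(meta)
for key, value in summary.items():
    print(key, round(value, 4))
```

Under these assumptions, 292 of the 3910 completion tokens are visible output and roughly 93% went to reasoning, which accounts for a 24-second run on a 200-word task.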
Evaluation details
Result: 95.99%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false; words: 208

Result: 100.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 49.52%, Deviation: 0.48%
Meta Data: neededClean: false; wordsTotal: 210; wordsDialogue: 104

Overall: 97.9931%
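The reported ratio and deviation follow directly from the two word counts. The Python sketch below reproduces that arithmetic and shows one plausible way to count dialogue words: treating text inside straight double quotes as dialogue and splitting on whitespace. The counting rule, the 50% target parameter, and all function names are assumptions. The benchmark's own tokenizer is not shown on this page, and the two evaluators evidently count differently (208 versus 210 total words).

```python
import re


def count_words(text):
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def count_dialogue_words(text):
    """Count words inside straight double quotes (assumed dialogue rule)."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in spans)


def ratio_report(words_dialogue, words_total, target=0.5):
    """Return (ratio %, absolute deviation from target %), rounded to 2 places."""
    ratio = words_dialogue / words_total
    return round(ratio * 100, 2), round(abs(ratio - target) * 100, 2)


# Counts reported in the evaluation details.
print(ratio_report(104, 210))  # (49.52, 0.48)

# Small illustrative sample, not the full model response.
sample = 'Bill blinked. "Robin Hood had a horse." Molly sighed.'
print(count_dialogue_words(sample), count_words(sample))  # 5 9
```

How the 0.48% deviation maps to a 100.00% score is not stated. It may reflect a tolerance band around the target, but that is a guess.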