NC Bench
Run: 2026-03-18-01KM089G83MBT594QKZ7TSC735
Run Details
Model: Mistral Small 4 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 8.3s
Cost: $0.000703
Provider: Mistral
Evaluation summary
Score | Evaluator | Details Link
9.61% | Matches word count | (details)
98.33% | Dialogue to Total Word Ratio | (details)
53.9680% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill crouched behind a bush, peering at the McAllister's house. Strings of blinking lights crisscrossed the yard, and a sign read "Billy's 7th Birthday Bash." "So, the plan is solid?" Bill whispered, adjusting his ski mask which kept slipping over his eyes. Molly nodded, unfolding a blueprint of the house. "We go in through the kitchen window. It’s old, so it’ll pop right open—" Bill interrupted, "With what? A hairdryer?" She shot him a look. "No, Bill. With this." Molly pulled out a credit card and wiggled it. Bill blinked. "That’s not even bent." "It’s magnetic," Molly insisted. "Like how I pick up weird vibes from your dating choices." Bill groaned. "This isn’t going to work." "Sure it will," Molly said, slapping his shoulder. "We’ll be in and out before the cake’s cut. And we don’t even have to worry about being seen—everyone’s attention will be on the clown." Bill’s face paled. "There’s a clown?" Molly nodded cheerfully. "Yep. Randy the Ridiculous. Sounds like a serial killer." Bill’s mask slipped completely over his face. He gasped for air as Molly peeled it off. "We’re doomed," he wheezed. Molly patted his back. "Nah, we’ve got this. Besides, worst case, we pretend we’re costumed party guests." Bill’s eyes widened. "We didn’t bring costumes." Molly grinned. "Then we’re the world’s worst ninjas."
created_at: "2026-03-18T10:36:33.426Z" (3/18/2026, 10:36:33 AM)
model: "mistralai/mistral-small-2603"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 356
moderation_latency: (null)
generation_time: 8231
tokens_prompt: 77
tokens_completion: 1244
native_tokens_prompt: 81
native_tokens_completion: 1152
native_tokens_completion_images: (null)
native_tokens_reasoning: 870
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00070335
router: (null)
provider_responses:
  0:
    endpoint_id: "f5ed6283-4e17-457d-98cd-2ec0c318171d"
    id: "1863f3be56eb4fa6ba19bd8eb07e6d89"
    is_byok: false
    latency: 347
    model_permaslug: "mistralai/mistral-small-2603"
    provider_name: "Mistral"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830193-vBdOk99aU97fnkiQIjJx"
upstream_id: "1863f3be56eb4fa6ba19bd8eb07e6d89"
total_cost: 0.00070335
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Mistral"
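The metadata fields above allow a few derived figures, such as generation throughput and the share of completion tokens spent on reasoning. The following is a minimal sketch, assuming that generation_time is reported in milliseconds and that native_tokens_reasoning is counted inside native_tokens_completion. Neither assumption is confirmed by the page, and the helper names are hypothetical.

```python
# Values copied from the run metadata above.
generation_time_ms = 8231          # assumed to be milliseconds
native_tokens_completion = 1152
native_tokens_reasoning = 870      # assumed to be a subset of completion tokens


def tokens_per_second(tokens: int, elapsed_ms: int) -> float:
    """Average generation throughput over the whole response."""
    return tokens / (elapsed_ms / 1000)


def share(part: int, whole: int) -> float:
    """Fraction of `whole` accounted for by `part`."""
    return part / whole


throughput = tokens_per_second(native_tokens_completion, generation_time_ms)
reasoning_share = share(native_tokens_reasoning, native_tokens_completion)
visible_tokens = native_tokens_completion - native_tokens_reasoning

print(f"Throughput: {throughput:.1f} tokens/s")
print(f"Reasoning share: {reasoning_share:.1%}")
print(f"Non-reasoning completion tokens: {visible_tokens}")
```

Under these assumptions, most of the completion budget went to reasoning rather than to the visible scene text.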
Evaluation details
Result | Evaluator | Details | Meta Data
9.61% | Matches word count | n/a | neededClean: false; words: 222
98.33% | Dialogue to Total Word Ratio | Ratio: 53.60%, Deviation: 3.60% | neededClean: false; wordsTotal: 222; wordsDialogue: 119
53.9680% | Overall
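The ratio and deviation figures can be reproduced from the reported word counts. The sketch below is illustrative only: it assumes dialogue is whatever sits between straight double quotes, that the target ratio is the 50% named in the scenario, and that the overall figure is the plain mean of the two evaluator scores (which is consistent with the displayed numbers but not stated on the page). The mapping from deviation to the 98.33% score is not shown in the report, so it is not modeled here. All function names are hypothetical.

```python
import re


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words).

    Assumed rule: words inside straight double quotes count as dialogue.
    This is not necessarily how the benchmark's evaluator counts.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = 0.5) -> tuple[float, float]:
    """Dialogue-to-total ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# A tiny made-up sample to show the counting rule.
sample = 'Bill groaned. "This will not work." Molly grinned.'
sample_counts = count_dialogue_words(sample)

# Counts reported in the evaluation details above.
ratio, deviation = ratio_and_deviation(words_dialogue=119, words_total=222)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Assumed aggregation: unweighted mean of the two displayed scores.
overall = (9.61 + 98.33) / 2
print(f"Overall: {overall:.2f}%")
```

With the reported counts this prints a ratio of 53.60% and a deviation of 3.60%, matching the Details column. The mean of the rounded scores comes out at 53.97%, which agrees with the displayed 53.9680% to two decimal places.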