NC Bench
Run: 2026-03-25-01KMJ91CZWXJMXJNRTMMJPM9EN
Run Details
Model: Qwen 3 32B
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 34.8s
Cost: $0.000594
Provider: Novita
Evaluation summary
Score | Evaluator
0.00% | Matches word count
74.49% | Dialogue to Total Word Ratio
37.2436% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly leaned over the table, her eyes narrowing at the blueprint of the party barn. “The presents are stored in the hayloft. Guarded by a twinkle-lighted mannequin wearing a unicorn head.” Bill adjusted his sunglasses, which were way too big for his face. “So we just, what? Scale the wall of balloons, dodge the cake shooter, and outwit the balloon artist who’s definitely been using performance-enhancing glue?” “We *could*,” Molly said, “or we could use the element of surprise. You’re the one who suggested dressing as clowns.” “I said *juggling* clowns! Not *disguised* clowns! There’s a difference!” “Sure there is. And I’m difference.” She spun a switchblade into her palm. “We’ll wear party worker badges. You’ll man the ‘balloon toss’ booth. I’ll sabotage the motion sensors with a distraction.” “What’s the distraction?” “The glitter cannon.” She pointed to the DJ’s booth. “It goes off every time the birthday kid yells ‘Surprise!’ You’ll be ‘surprised’ into the hayloft. Classic.” Bill winced. “What if the kid yells ‘Surprise!’ *before* we’re ready? We’ll be showered in glitter, which is basically spiritual ecstasy for a five-year-old but a nightmare for me. I can’t breathe in that stuff.” “Then don’t breathe. Literally. You’ll be wearing a gas mask. Duh.” “A gas mask! At a children’s party! You want me to blend in, not start a riot!” Molly tossed him a whoopee cushion. “It’s a *non-lethal* prank gas. Just makes you sparkle. Now, practice your alibi: ‘I’m not a clown, I’m a clown *scientist*.’” “Clown scientist? That’s not a job.” “It is now. And if anyone asks, we both have a thing for ‘parties.’” She grinned. “Face it, Bill. This is the easiest heist since we liberally ‘adjusted’ the Robin Hood look-a-like contest.” Bill stared at the glitter cannon. “What if the unicorn mannequin comes to life? I’ve seen those things. They’re haunted by consumerism.” “Then you’ll haunten’t it with *fear*.” She patted his shoulder. 
“Ready?” “Ready to regret every life choice that led me here?” “Same. Now, let’s go. The balloon artist’s already tweeting about us.” “Great. My nemesis is 250 characters and a sparkler away.”
created_at
"2026-03-25T10:35:56.288Z"
3/25/2026, 10:35:56 AM
model
"qwen/qwen3-32b-04-28"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
996
moderation_latency
(null)
generation_time
34757
tokens_prompt
93
tokens_completion
1513
native_tokens_prompt
73
native_tokens_completion
1304
native_tokens_completion_images
(null)
native_tokens_reasoning
875
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0005941
router
(null)
provider_responses
0
endpoint_id
"d342f1c5-34b5-4a5c-b2ea-a0cdcbf72e70"
id
"de8e2ab9-48aa-4a24-9da2-c95fb60305e9"
is_byok
false
latency
840
model_permaslug
"qwen/qwen3-32b-04-28"
provider_name
"Novita"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1774434956-RnnyLm4ivhGWNySBcrxT"
upstream_id
"de8e2ab9-48aa-4a24-9da2-c95fb60305e9"
total_cost
0.0005941
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Novita"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 351
74.49% | Dialogue to Total Word Ratio | Ratio: 82.63%, Deviation: 7.37% | neededClean: false; wordsTotal: 357; wordsDialogue: 295
37.2436% | Overall
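The ratio and deviation reported above can be reproduced from the word counts. The following is a minimal sketch, not NC Bench's actual evaluator: the function names are hypothetical, the 90% target is taken from the prompt, and treating the overall score as the plain mean of the two evaluator scores is an assumption that happens to agree with the reported figures to within rounding. The formula that maps a 7.37% deviation to the 74.49% evaluator score is not stated on this page, so it is not reproduced here.

```python
# Figures copied from the evaluation details of this run.
WORDS_DIALOGUE = 295
WORDS_TOTAL = 357
TARGET_RATIO = 0.90  # assumption: the "90% dialogue" target stated in the prompt


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that sit inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_RATIO) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(target - ratio)


def overall(scores: list[float]) -> float:
    """Assumed aggregation: unweighted mean of the evaluator scores (in percent)."""
    return sum(scores) / len(scores)


ratio = dialogue_ratio(WORDS_DIALOGUE, WORDS_TOTAL)
dev = deviation(ratio)

print(f"Ratio: {ratio:.2%}, Deviation: {dev:.2%}")
# prints "Ratio: 82.63%, Deviation: 7.37%"

# Mean of the two displayed scores; the page shows 37.2436%, and the small
# gap is consistent with 74.49% being a rounded display value.
print(f"Overall (mean of displayed scores): {overall([0.0, 74.49])}")
```

Because the word-count evaluator scored 0.00% (351 words against a 200-word request), the overall figure lands at roughly half the dialogue-ratio score under this averaging assumption.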