NC Bench
Run: 2026-02-25-01KJBH684MWTD8FEPXMDVQK8C3
Run Details
Model: Qwen 3.5 122B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 56.0s
Cost: $0.022766
Provider: Alibaba
Evaluation summary
Matches word count: 97.63%
Dialogue to Total Word Ratio: 0.00%
Overall: 48.8138%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted the balaclava in the rearview mirror. Bill counted plastic swords on the seat. “You realize this is morally gray, right?” Molly asked. “Gray is a spectrum, Mol. We’re more like charcoal.” “Charcoal? That’s not a color.” “It is in the criminal underworld. Look, the parents bought these things with tax-free money. We’re just redistributing wealth.” “To whom? Us?” “To the resale market. It’s a service. Think of us as gift logistics.” Bill tossed a sword. It clattered against a crate of glow sticks. “What about the kids? They’re innocent.” “They’ll get new ones next Christmas. It’s a cycle. A gift economy.” “That’s terrible.” “It’s efficient. Now, the plan. I go in as a clown. You go in as a magician.” “I hate magicians. They’re liars.” “Exactly. You’re perfect. I’ll juggle the security system while you make the gifts disappear.” “I can’t juggle.” “You can juggle stress balls. That counts. What’s the exit strategy?” “Back door. Through the kitchen. The mom is making punch.” “Punch is a distraction. Alcohol in the system, parents get woozy.” “There’s no alcohol. It’s apple juice.” “Then we’re doomed. Sugar rush means they’re hyper. Hyper kids are like ninja security guards.” Molly sighed. “We need a distraction. Something big.” “I brought a piñata.” “You brought a piñata?” “It’s full of fireworks. We light it, chaos ensues, we grab the loot, we vanish.” “That’s arson.” “It’s a controlled burn. Look, Bill, do you want the cash or not? The presents are worth four grand.” “Four grand in LEGO?” “LEGO is currency. It never depreciates. Unlike stocks. Or dignity.” “Okay. But if a kid cries, I’m leaving. I have a soft spot.” “Fair enough. If a kid cries, I’ll distract them with a balloon animal.” “You can’t blow up balloons.” “I can fake it. Just look sad and hold the rubber. They’ll think it’s art.” Bill nodded slowly. He picked up a red sword. “One last thing. No stealing the cake.” “Why not? It looks delicious.” “It’s sticky. Ruins the van upholstery. 
I just had it detailed.” “Fine. No cake. Just the toys.” “And the cash under the tree. The envelope.” “Obviously. The envelope is the real prize.” “Do you have the suit?” “Yes. It’s itchy.” “Good. Pain keeps you focused.” “I hate pain.” “Then steal faster.” “What about the Wii?” “Leave it. Too heavy.” “The iPad?” “Pocketable. Take it.” “What about the baby monitor?” “Leave it. We don’t need surveillance.” Molly started the engine. The van rumbled to life. “Ready to be the worst adults ever?” “Born ready. But if a kid asks for my autograph, I’m bailing.” “Noted. Let’s go steal some childhoods.” “What if they have a pet?” “Then we leave the pet. We aren’t monsters.” “What if the dog barks?” “I have treats. Beef jerky.” “That’s bribery.” “It’s diplomacy.” “Fine. But if the dog bites, you’re paying for the vet.” “Deal. Now shut up and drive.” “I’m driving. You’re the clown.” “Right. Clown. Got it.”
created_at: "2026-02-25T23:12:19.1Z" (2/25/2026, 11:12:19 PM)
model: "qwen/qwen3.5-122b-a10b-20260224"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 456
moderation_latency: (null)
generation_time: 55866
tokens_prompt: 77
tokens_completion: 7460
native_tokens_prompt: 76
native_tokens_completion: 7105
native_tokens_completion_images: (null)
native_tokens_reasoning: 6303
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0227664
router: (null)
provider_responses:
  0:
    endpoint_id: "33d466d1-d2f7-45ed-8ff3-fda7d2e32351"
    id: "chatcmpl-4734b4ea-ef7c-91e1-835b-3709bc8e53bc"
    is_byok: false
    latency: 456
    model_permaslug: "qwen/qwen3.5-122b-a10b-20260224"
    provider_name: "Alibaba"
    status: 200
api_type: "completions"
id: "gen-1772061139-b4cPPa2PUNh3NZY6FWXq"
upstream_id: "chatcmpl-4734b4ea-ef7c-91e1-835b-3709bc8e53bc"
total_cost: 0.0227664
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
Evaluation details
Matches word count: 97.63%
  Details: n/a
  neededClean: false
  words: 493
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio: 90.08%, Deviation: 20.08%
  neededClean: false
  wordsTotal: 494
  wordsDialogue: 445
Overall: 48.8138%
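
The ratio and deviation reported above can be reproduced from the word counts. The following is a minimal sketch, assuming the ratio is dialogue words divided by total words, the deviation is the absolute distance from the 70% target named in the scenario, and the overall score is the unweighted mean of the two evaluator scores. These formulas are inferred from the numbers on this page, not taken from NC Bench's evaluator code, and the function names are hypothetical. How the 20.08% deviation maps to a 0.00% score is not shown on the page, so the sketch does not attempt it.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of dialogue words in the response, as a percentage (assumed formula)."""
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 70.0) -> float:
    """Absolute distance from the target ratio, in percentage points (assumed formula)."""
    return abs(ratio_pct - target_pct)


# Word counts reported in the evaluation details.
ratio = dialogue_ratio(words_dialogue=445, words_total=494)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# Consistency check for the overall score, assuming an unweighted mean of
# the two evaluator scores: overall = (word_count_score + ratio_score) / 2.
overall = 48.8138
ratio_score = 0.0
implied_word_count_score = 2 * overall - ratio_score
print(f"Implied word count score: {implied_word_count_score:.2f}%")
```

Under these assumptions the first line prints the same ratio and deviation as the page (90.08% and 20.08%), and the implied word count score rounds to the reported 97.63%, which is consistent with the overall score being a simple average of the two evaluators.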