NC Bench
Run: 2026-02-25-01KJBG9TD9DA8GRHK8E60921VG
Run Details
Model:
Qwen 3.5 122B
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
36.1s
Cost:
$0.015803
Provider:
Alibaba
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly smoothed the crumpled napkin on the diner table, her eyes scanning the scribbled floor plan with intense focus. “So, the target is the Smith residence. Full house. Saturday afternoon.” Bill stirred his coffee, watching the liquid swirl in the ceramic mug. “What exactly is the payload, Molly? We need specifics before we commit.” “Everything under the tree. Legos, dolls, gift cards, the works. We clear the room completely.” “Are we sure about this? It’s a children’s party. It feels… low.” “It’s not low, Bill. It’s high yield. Kids don’t know the value of things until it’s gone.” “What’s the security situation? I assume there’s a perimeter guard.” “One dog. A golden retriever named Buster. He’s friendly, but he barks at strangers.” “We’ll need a distraction. A treat? A toy? Something to keep him busy.” “A tennis ball. I have a tennis ball in the glovebox. It’s perfect.” “Perfect. A tennis ball is non-lethal and effective. I like it.” “What about the parents? They’re the real threat to the operation.” “They’re at the buffet. They won’t notice until the wrapping paper is gone.” “And the exit strategy? How do we leave without being seen by the neighbors.” “Back window. Open. They’re too busy cutting cake to check the perimeter.” Bill frowned, leaning back in the vinyl booth. “This feels wrong. Stealing from kids is risky.” “It feels profitable. Do you want the Ferrari or do you want the guilt?” “I want the Ferrari. I need the payments. The insurance is due.” “Then stop worrying and start packing. We move fast. We move clean.” Molly slid a black duffel bag across the sticky table. “Inside are the tools. Wire cutters, lock picks, and a slingshot.” “A slingshot? Really? For a heist of this magnitude? That seems excessive.” “For the dog. We don’t hurt him. We just… redirect him to the yard.” “Right. Redirect. Got it. No harm to the dog. That’s the rule.” “Timing is everything. The cake cutting is at three. We strike at three-oh-five.” “Why five minutes after? 
Why not immediately? Isn’t that risky.” “Everyone is distracted by the sugar rush. It’s the chaos window.” “Brilliant. I love the chaos window. It’s undetectable. No one looks.” “Just don’t take the stuffed bear. It’s too big. We can’t fit it in the getaway van.” “What if the kid cries? That’s a liability. That draws attention.” “Then he cries. We’re professionals, Bill. We don’t get attached to the merchandise.” “Okay. One bear left behind. Noted. That’s a hard rule. No bear.” “And leave the gift cards. Too traceable. Just take the cash. Keep it simple.” “Cash is good. Cash is quiet. Cash is king. I like cash.” “Exactly. Now, do you have the van? Is it ready to roll.” “It’s parked around the corner. Unmarked. Grey. Very grey. No logos.” “Grey is good. Grey is invisible. No one looks at a grey van. Perfect.” “I painted over the bumper stickers myself. It’s blank. Just like a ghost.” “Good man. Alright, any last questions before we roll out.” “What if there’s a piñata? Is that considered loot? Is it worth it.” Molly sighed, rubbing her temples. “We leave the piñata. It’s too messy. Candy is low value.” “Understood. Low value candy. High value toys. Got it. We skip the candy.” “That’s the spirit. We move at noon. Dress code is casual. No masks. Just… subtle.” “Subtle. Like ninjas, but softer. Like accountants. I can be subtle.” “Like ninjas who sell insurance. That’s the vibe. That’s the look.” “I can do that. I look trustworthy. People trust me. I look nice.” “You look like you’d buy a timeshare. Perfect. That’s the look. That’s it.” “Then let’s go get those presents. Let’s make it happen. Let’s go.” “Let’s go get those presents. The Ferrari awaits. The money is ours.”
created_at
"2026-02-25T22:56:47.537Z"
2/25/2026, 10:56:47 PM
model
"qwen/qwen3.5-122b-a10b-20260224"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
473
moderation_latency
(null)
generation_time
35963
tokens_prompt
77
tokens_completion
5193
native_tokens_prompt
76
native_tokens_completion
4929
native_tokens_completion_images
(null)
native_tokens_reasoning
3957
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0158032
router
(null)
provider_responses
0
endpoint_id
"33d466d1-d2f7-45ed-8ff3-fda7d2e32351"
id
"chatcmpl-11e54ab4-451c-92e0-afd1-cad8dbf3f5c5"
is_byok
false
latency
473
model_permaslug
"qwen/qwen3.5-122b-a10b-20260224"
provider_name
"Alibaba"
status
200
api_type
"completions"
id
"gen-1772060207-wYbmkVTnEf8OKDXtiEBA"
upstream_id
"chatcmpl-11e54ab4-451c-92e0-afd1-cad8dbf3f5c5"
total_cost
0.0158032
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
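The metadata above supports a few derived figures that help explain the run's duration and cost. The following is a minimal sketch, not part of the benchmark itself. It assumes that `generation_time` is in milliseconds (which is consistent with the reported 36.1s duration) and that `native_tokens_reasoning` is counted within `native_tokens_completion`. Neither assumption is confirmed by the page.

```python
# Values copied from the run metadata above.
generation_time_ms = 35963        # assumed to be milliseconds
native_tokens_completion = 4929
native_tokens_reasoning = 3957    # assumed to be a subset of completion tokens

# Tokens left over for the visible answer, under the subset assumption.
visible_tokens = native_tokens_completion - native_tokens_reasoning

# Overall generation throughput in tokens per second.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# Fraction of completion tokens spent on reasoning.
reasoning_share = native_tokens_reasoning / native_tokens_completion

print(f"visible tokens: {visible_tokens}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
print(f"reasoning share: {reasoning_share:.1%}")
```

Under these assumptions, most of the completion budget went to reasoning rather than to the scene itself.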
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean false, words 635

Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 91.54%, Deviation: 21.54%
Meta Data: neededClean false, wordsTotal 638, wordsDialogue 584

Overall
Result: 0.0000%
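The reported ratio and deviation can be reproduced from the counts above. This is a minimal sketch under stated assumptions, not the evaluator's actual implementation. The helper `dialogue_ratio` is hypothetical and assumes dialogue is whatever sits between curly quotes, with words split on whitespace. The 0.70 and 500 targets come from the scenario text. How the evaluator maps a deviation to a score is not shown on the page, so it is not modeled here.

```python
import re

TARGET_RATIO = 0.70   # from the scenario: "70% dialogue"
TARGET_WORDS = 500    # from the scenario: "Write 500 words"


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Hypothetical helper: share of words that fall inside curly quotes."""
    quoted = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(count_words(q) for q in quoted)
    total = count_words(text)
    return dialogue_words / total if total else 0.0


# Reproduce the reported figures from the evaluator's own counts.
words_dialogue = 584
words_total = 638
ratio = words_dialogue / words_total
deviation = abs(ratio - TARGET_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Relative overshoot of the word count evaluator's figure (635 words).
word_overshoot = (635 - TARGET_WORDS) / TARGET_WORDS
print(f"Word count overshoot: {word_overshoot:.0%}")
```

The two evaluators report slightly different totals (635 and 638 words). The sketch uses each evaluator's own figure and does not attempt to reconcile them.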