NC Bench
Run: 2026-02-16-01KHK2WACHHXSZKGPWS271J0DB
Run Details
Model:
Qwen 3.5 397B A17B
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
4m 4s
Cost:
$0.030714
Provider:
Alibaba
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly peered through the high-powered binoculars, her breath fogging the cold lenses slightly as she scanned the perimeter. The suburban house glowed with warm, deceptive light against the dark night sky. Target acquired, she whispered, her voice tight with professional anticipation and greed. Bill adjusted his itchy beanie and sighed heavily into the cold air. It's a bounce house, Mol. Not Fort Knox. It's a vault of unsecured assets. Look at those wrapping papers. The sheer volume is staggering. Those are for five-year-olds. They smell like plastic and despair. High demand market. Resale value is astronomical on the secondary market. Who buys used LEGO sets from strangers? Collectors. Desperate parents. The black market is vast and hungry. I feel like a monster sitting here. You feel like an amateur. Focus on the objective. There is a magician inside performing tricks. Single opponent. Likely unarmed and relying on sleight of hand. He has doves in a cage. Birds are unpredictable. Pack noise cancellers in your ear. What about the parents drinking wine? Sedated by chardonnay. They won't wake until morning light. And the kids running around? Sugar crash imminent. They will be unconscious within the hour. This is morally wrong on so many levels. This is work. Did you bring the sacks for the loot? I brought reusable tote bags from the grocery store. Tote bags? Are you serious right now? Canvas. Eco-friendly. We need to reduce our footprint. We are stealing birthday gifts, not organic kale. I'm trying to balance my karma while committing felonies. Focus on the profit margin instead of your soul. The profit margin involves crying children in pajamas. Collateral damage. Necessary evil for the greater good. I can hear them singing Happy Birthday inside. Happy Birthday. The cue for extraction is imminent. Extraction sounds like kidnapping to my ears. Semantics. We take the goods. We leave the kids. What about the giant piñata hanging there? Secondary objective. 
High candy yield for personal consumption. I want the candy mostly. Personal loot is permitted after the primary haul is secure. Okay. But I'm not touching the pony in the yard. There is a pony involved in this job? In the backyard. Small. Very judgemental eyes. Leave the animal. It complicates the fence operation. How do you even fence a pony legally? You don't. You leave it strictly alone. Fine. What's the exit strategy for us? The slide. High velocity exit vector. I'm too old for the playground slide. Then use the front door. Run like the wind. Past the clown making animals? He's inflating a balloon. He won't notice a thing. His shoes are huge and squeaky. Slow movement. Big target. We are ghosts in the night. I'm wearing squeaky sneakers unfortunately. Why would you wear those shoes? Comfort is key for long stakeouts. Wrap them in tape. Immediately before we move. This plan is falling apart completely. It is evolving. Adaptability is key to success. The mom is looking out the window directly. Freeze. Act natural like you belong. I'm hiding in a bush secretly. Then hide better behind the leaves. I'm rustling the dry leaves. Stop rustling immediately. They know we are here watching. They think it's a stray cat. It's very dark outside. Cats are nocturnal hunters. I'm going home to bed. Think of the retirement fund building up. I want to retire with a clear conscience. Consciences don't pay rent bills. Maybe where you live they do. Grab the bags. We move on the cake cutting. I'm keeping the cake for myself. Take the cake. Just get the LEGO sets. Deal. But if I see a clown, I'm running. Agreed. No heroics today. Let's do this thing. Wait. Did you bring the duct tape? No. I forgot it. Great. Just great. We are doomed.
created_at
"2026-02-16T11:24:31.838475+00:00"
2/16/2026, 11:24:31 AM
model
"qwen/qwen3.5-397b-a17b-20260216"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
2123
moderation_latency
(null)
generation_time
237761
tokens_prompt
77
tokens_completion
8817
native_tokens_prompt
76
native_tokens_completion
8519
native_tokens_completion_images
(null)
native_tokens_reasoning
7605
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.030714
router
(null)
provider_responses
0
id
"chatcmpl-26a85acb-5a71-9c54-88a8-a5a46e390610"
status
200
is_byok
false
latency
2123
endpoint_id
"180484a2-e24b-495a-ba41-b34b680d5a9f"
provider_name
"Alibaba"
model_permaslug
"qwen/qwen3.5-397b-a17b-20260216"
api_type
"completions"
id
"gen-1771240833-ptNEWLl8Z37We5fp5oZ5"
upstream_id
"chatcmpl-26a85acb-5a71-9c54-88a8-a5a46e390610"
total_cost
0.030714
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false, words: 634
0.0000% | Dialogue to Total Word Ratio | Ratio: 0.00%, Deviation: 70.00% | neededClean: false, wordsTotal: 634, wordsDialogue: 0
0.0000%
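The dialogue ratio reported above can be illustrated with a minimal sketch. It is not the benchmark's actual evaluator, whose implementation is not shown on this page. The sketch assumes that "dialogue" means words enclosed in straight or curly double quotation marks and that deviation is the absolute difference from the 70% target; the function names `count_words` and `dialogue_stats` are hypothetical.

```python
import re

# Assumption: dialogue is any text enclosed in straight ("...") or
# curly (\u201c...\u201d) double quotation marks.
QUOTED = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_stats(text: str, target_ratio: float = 0.70) -> dict:
    """Return total words, quoted (dialogue) words, ratio, and deviation."""
    total = count_words(text)
    # findall yields (straight_match, curly_match) tuples; one side is empty.
    dialogue = sum(count_words(a or b) for a, b in QUOTED.findall(text))
    ratio = dialogue / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_ratio),
    }


if __name__ == "__main__":
    unquoted = "Target acquired, she whispered. It's a bounce house, Mol."
    quoted = '"Go now," she said.'
    print(dialogue_stats(unquoted))
    print(dialogue_stats(quoted))
```

Under these assumptions, a response that contains no quotation marks yields zero dialogue words regardless of how much of it reads as speech. That would be consistent with the reported `wordsDialogue` of 0 and the 70.00% deviation for this run.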