NC Bench
Run: 2026-04-27-01KQ774TQ593YZSGW4SX1H1M5N
Run Details
Model: Qwen3.6 Max Preview
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 2m 36s
Cost: $0.041829
Provider: Alibaba
Evaluation summary
Matches word count: 98.71%
Dialogue to Total Word Ratio: 99.99%
Overall: 99.3521%
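The page does not say how the Overall figure is derived. As a rough consistency check, the sketch below assumes it is an unweighted mean of the two evaluator scores; that aggregation rule is an assumption, not something the report confirms.

```python
from statistics import mean

# Scores as displayed in the evaluation summary (already rounded by the page).
scores = {
    "Matches word count": 98.71,
    "Dialogue to Total Word Ratio": 99.99,
}
reported_overall = 99.3521

# Assumption: Overall is a plain average of the evaluator scores.
estimated_overall = mean(scores.values())
gap = abs(estimated_overall - reported_overall)

print(f"estimated {estimated_overall:.2f}%, reported {reported_overall}%")
```

The estimate agrees with the reported value to two decimal places. The remaining difference is what one would expect if the page averages unrounded scores, but other weightings cannot be ruled out from this run alone.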
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
AI
The laminate diner table held their entire operation, which currently consisted of three crayon sketches and a half eaten basket of fries. Molly smoothed the wrinkled paper with her palm and circled a lopsided rectangle using a black ballpoint pen. She looked up with the grim determination of a general surveying a battlefield. The fluorescent lights buzzed overhead, casting long shadows across the vinyl seats. "Target acquired. The bouncy castle sits here, acting as a natural windbreak and primary distraction. We breach during the musical chairs elimination round. Chaos peaks at approximately four twelve." Bill leaned forward, squinting at the wax markings through his reading glasses. He traced a yellow spiral with his thumbnail and frowned. "That is a drawing of a sun wearing sunglasses, Molly." "It is a tactical overview. Focus. The gift table is positioned near the snack station. High value items wrapped in dinosaur paper and cartoon trucks. We need two duffel bags and a plausible cover story." "I was thinking we dress as clowns." Molly pinched the bridge of her nose and exhaled slowly. She pushed the basket of fries toward him as a peace offering. "Absolutely not. Clowns draw attention. We go as exhausted parents. It is the perfect camouflage. Nobody questions a tired adult holding a tote bag near a buffet. We blend, we sweep, we vanish before the cake cutting." "What about the piñata?" "Secondary objective. Candy haul splits sixty forty. My lower back still aches from the retirement home job." "Fine. But how do we handle the entertainment? Last time a magician spotted me palming a watch, I got lectured about honesty for twenty minutes." "We neutralize him by requesting balloon swords. Keeps his hands occupied. You work the left flank of the present table. I take the right. We use a juice box spill as a diversion. Classic misdirection." "Who spills the juice?" "The toddler in the triceratops costume. Unpredictable movement patterns. Highly volatile. We position a full cup near his trajectory and let physics do the heavy lifting." "This feels overly complicated for board games and plush toys." "Do not underestimate the loot. That wrapped fortress contains limited edition building sets and remote control vehicles. The fence on Elm Street pays top dollar for unboxed merchandise." "What if they hired off duty security?" "Then we deploy the ultimate countermeasure. A signed permission slip and gluten free cupcakes. Authority crumbles when faced with paperwork and dietary restrictions. Synchronize watches. We move at four oh five. Check the perimeter for balloon animals. They pop loud and ruin stealth approaches." "Noted. I will bring earplugs and a spare tote bag." Bill folded the schematic into his breast pocket and adjusted his collar. He offered a weak smile. "I still think the clown idea had merit." Molly slid out of the vinyl booth and dropped exact change on the counter. She buttoned her coat against the afternoon chill. "Remind me to update your personnel file under active liabilities."
Meta
created_at
"2026-04-27T10:18:18.219Z"
4/27/2026, 10:18:18 AM
model
"qwen/qwen3.6-max-preview-20260420"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1215
moderation_latency
(null)
generation_time
156076
tokens_prompt
77
tokens_completion
5999
native_tokens_prompt
76
native_tokens_completion
5350
native_tokens_completion_images
(null)
native_tokens_reasoning
4711
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0418288
router
(null)
provider_responses
0
endpoint_id
"ca7fc0c7-f23d-4ac5-b2e0-945be5b61787"
id
"chatcmpl-e3530dd0-2196-950f-8491-0f4ae1cd26af"
is_byok
false
latency
1214
model_permaslug
"qwen/qwen3.6-max-preview-20260420"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777285098-CAhN3qGgv1irr6YtaWu5"
session_id
(null)
api_type
"completions"
id
"gen-1777285098-zWTW4f28crWxbc55OYR4"
upstream_id
"chatcmpl-e3530dd0-2196-950f-8491-0f4ae1cd26af"
total_cost
0.0418288
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
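A few headline numbers can be derived from the metadata fields. The sketch below is illustrative only: it assumes generation_time is in milliseconds (which matches the 2m 36s duration shown under Run Details) and that native_tokens_reasoning is counted inside native_tokens_completion. The function name summarize_generation is hypothetical.

```python
# Values copied from the run metadata.
meta = {
    "generation_time": 156076,         # assumed to be milliseconds
    "native_tokens_completion": 5350,
    "native_tokens_reasoning": 4711,   # assumed to be a subset of completion tokens
    "total_cost": 0.0418288,
}

def summarize_generation(meta: dict) -> dict:
    """Derive duration, throughput, and reasoning share from run metadata."""
    seconds = meta["generation_time"] / 1000
    minutes, rem = divmod(int(seconds), 60)
    completion = meta["native_tokens_completion"]
    reasoning = meta["native_tokens_reasoning"]
    return {
        "duration": f"{minutes}m {rem}s",
        "tokens_per_second": completion / seconds,
        "visible_tokens": completion - reasoning,
        "reasoning_share": reasoning / completion,
    }

summary = summarize_generation(meta)
print(summary["duration"])        # prints "2m 36s"
print(summary["visible_tokens"])  # prints 639
```

Under these assumptions, roughly 88% of the completion tokens were spent on reasoning rather than on the visible scene, which would explain the long generation time for a 494-word response.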
Evaluation details

Evaluator: Matches word count
Result: 98.71%
Details: n/a
Meta Data: neededClean = false, words = 494

Evaluator: Dialogue to Total Word Ratio
Result: 99.99%
Details: Ratio: 70.04%, Deviation: 0.04%
Meta Data: neededClean = false, wordsTotal = 494, wordsDialogue = 346

Overall: 99.3521%
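The reported ratio and deviation can be reproduced from the wordsDialogue and wordsTotal counts. The sketch below also includes a hypothetical counter, count_dialogue_words, which treats whitespace-separated tokens inside straight double quotes as dialogue. The benchmark's actual counting rules are not shown on the page, so that function is an assumption and may not reproduce the 346/494 split exactly.

```python
import re

TARGET_RATIO_PCT = 70.0  # from the scenario: "70% dialogue"

def dialogue_ratio_pct(words_dialogue: int, words_total: int) -> float:
    """Percentage of words that are dialogue."""
    return 100.0 * words_dialogue / words_total

def count_dialogue_words(text: str) -> tuple[int, int]:
    """Hypothetical counter: returns (dialogue words, total words).

    Dialogue is taken to be anything between straight double quotes.
    """
    total = len(text.split())
    spoken = sum(len(span.split()) for span in re.findall(r'"([^"]*)"', text))
    return spoken, total

# Counts taken from the evaluator metadata.
ratio = dialogue_ratio_pct(346, 494)
deviation = abs(ratio - TARGET_RATIO_PCT)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 70.04%, Deviation: 0.04%"
```

How the 0.04-point deviation maps to the 99.99% score is not documented in this run, so no scoring function is sketched here.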