NC Bench
Run: 2026-04-27-01KQ780CDYEF5WB825Q4JX2FK1
Run Details
Model: Qwen 3.6 Flash
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 59.0s
Cost: $0.018031
Provider: Alibaba
Evaluation summary
Score | Evaluator
0.21% | Matches word count
37.67% | Dialogue to Total Word Ratio
18.9440% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the rhododendron bush, adjusting the ghillie suit she'd frantically knitted from recycled birthday streamers and sticky tape. The texture was abrasive, a nauseating clash of neon pink and electric blue, but it served the camouflage against the suburban backdrop. She checked her watch, then turned to Bill, who was currently wrestling with a rubber chicken beak mask that kept slipping down his nose. Bill was sweating through his utility belt, a collection of pouches that jingled suspiciously with every nervous twitch, and he looked less like a hardened criminal and more like an accountant who had accidentally wandered onto a crime scene. He finally yanked the mask up, revealing his wide, terrified eyes, and gestured wildly at the sprawling backyard where the real target lay. A cardboard fortress stood near the patio, surrounded by a lawn littered with deflated balloons and the debris of sugary chaos. In the distance, the muffled roar of screaming children echoed like a tidal wave of pure, unadulterated energy. Molly pulled a blueprint from her pocket, though it was clearly just a pizza box napkin stained with frosting and what might have been glitter glue. She smoothed it out against her knee, ignoring the way the grease smeared across the diagram of the ventilation shaft. The plan was theoretically sound, provided one ignored the moral implications and the sheer likelihood of death by piñata stick. Bill was already checking his gear, running a thumb over a grappling hook fashioned entirely from licorice whips, checking the tensile strength with a grimace. Molly adjusted her grip on a map, the grease already seeping into her palm. She glanced at the patio, where the hostess was lighting the candles on a flammable cake. The air smelled of wax and anticipation. "This mask is a chicken," Bill whispered, tugging at the beak. "Professional reputation risk." "Psychological warfare. Hostess expects a wolf. Chicken lowers guard." 
Bill grumbled, checking his licorice grappling hook. "Un-cut strings ahead. And the dog?" "Asleep. Bribed with ham and a napkin promising free haircuts." Bill nodded, eyeing the toddler fortress. "Cake warning?" "No breaking the cake. Fondant flower trajectory is lethal." "Mask tight?" "Straps secure. Chicken head feels light." "Vent clear?" "Balloon animal inflated. Exit route open." Bill gripped his belt, knuckles white. "Distraction speaker?" "Ready. Zipper sound, bathroom swarm, eight minutes. No eye contact with the glue wand." "Dog?" "Snoring. Ham secured." "Cake position?" "Single layer. No hidden gifts. Green light." "Glue wand range?" "Six feet. Stay out of splatter zone." Loot bags?" "Void. Capacity max." "Timer?" "Five minutes to chaos. Move." "Streamers?" "Glistening. Hard to grip." "Adjust tension." "For the glory?" "For the cards." "Go." Bill saluted, though the chicken beak clacked softly. Molly smirked, tightening a knot, and they crept toward the shadows, silent as two deflated balloons drifting in the wind.
created_at: "2026-04-27T10:33:21.093Z" (4/27/2026, 10:33:21 AM)
model: "qwen/qwen3.6-flash"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 594
moderation_latency: (null)
generation_time: 58866
tokens_prompt: 77
tokens_completion: 11409
native_tokens_prompt: 76
native_tokens_completion: 12008
native_tokens_completion_images: (null)
native_tokens_reasoning: 11311
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.018031
router: (null)
provider_responses: 0
endpoint_id: "2435bec9-c808-4f27-9fc8-12f3460d7b66"
id: "chatcmpl-03ef0b8f-07dd-9ab4-a3be-05a00f97bdaa"
is_byok: false
latency: 594
model_permaslug: "qwen/qwen3.6-flash"
provider_name: "Alibaba"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777286001-ybhTUVFjDAEM3cpocV6I"
session_id: (null)
api_type: "completions"
id: "gen-1777286001-NdzwqK75NxnNSu0ZH5Me"
upstream_id: "chatcmpl-03ef0b8f-07dd-9ab4-a3be-05a00f97bdaa"
total_cost: 0.018031
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
response_cache_source_id: (null)
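The token and timing fields in the metadata can be combined to see how much of the completion went to reasoning rather than visible prose. The following is a rough sketch, not part of NC Bench itself. It assumes that native_tokens_completion includes the reasoning tokens and that generation_time is reported in milliseconds (which is consistent with the 59.0s duration shown in the run details). The helper names are hypothetical.

```python
# Values copied from the run metadata.
NATIVE_TOKENS_COMPLETION = 12008
NATIVE_TOKENS_REASONING = 11311
GENERATION_TIME_MS = 58866  # assumed to be milliseconds


def visible_tokens(completion: int, reasoning: int) -> int:
    """Tokens left for the visible answer, assuming completion includes reasoning."""
    return completion - reasoning


def reasoning_share(completion: int, reasoning: int) -> float:
    """Fraction of completion tokens spent on reasoning."""
    return reasoning / completion


def tokens_per_second(tokens: int, elapsed_ms: int) -> float:
    """Average generation throughput over the whole completion."""
    return tokens / (elapsed_ms / 1000)


visible = visible_tokens(NATIVE_TOKENS_COMPLETION, NATIVE_TOKENS_REASONING)
share = reasoning_share(NATIVE_TOKENS_COMPLETION, NATIVE_TOKENS_REASONING)
rate = tokens_per_second(NATIVE_TOKENS_COMPLETION, GENERATION_TIME_MS)

print(f"visible tokens: {visible}")
print(f"reasoning share: {share:.1%}")
print(f"throughput: {rate:.1f} tokens/s")
```

Under these assumptions, most of the completion budget for this run was spent on reasoning, with only a few hundred tokens forming the scene itself.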
Evaluation details
Result | Evaluator | Details | Meta Data
0.21% | Matches word count | n/a | neededClean: false, words: 472
37.67% | Dialogue to Total Word Ratio | Ratio: 25.58%, Deviation: 4.42% | neededClean: false, wordsTotal: 473, wordsDialogue: 121
18.9440% | Overall
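The reported ratio and deviation can be reproduced from the word counts in the Meta Data column. The sketch below is illustrative only. It assumes that words are whitespace-separated tokens and that dialogue is whatever sits inside straight double quotes, and the evaluator's actual tokenization may differ. The function names are hypothetical. How the 4.42% deviation maps to the 37.67% score is not shown on this page, so that step is left out. The overall figure appears consistent with an unweighted mean of the two evaluator scores, which is also an assumption.

```python
import re

TARGET_RATIO = 0.30  # from the scenario: "30% dialogue"


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Count words inside straight double quotes (assumed dialogue definition)."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in spans)


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that are dialogue."""
    return words_dialogue / words_total


def deviation_from_target(ratio: float, target: float = TARGET_RATIO) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Reported counts: wordsDialogue = 121, wordsTotal = 473.
ratio = dialogue_ratio(121, 473)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Assumed aggregation: unweighted mean of the two rounded evaluator scores.
overall = (0.21 + 37.67) / 2
print(f"Overall (from rounded scores): {overall:.2f}%")
```

The mean of the rounded scores comes out slightly below the reported 18.9440%, which suggests the page aggregates the unrounded evaluator scores.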