NC Bench
Run: 2026-04-27-01KQ7909TNTTREPYJ34DXMXPC6
Run Details
Model: Qwen 3.6 Flash
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 53.4s
Cost: $0.018406
Provider: Alibaba
Evaluation summary
Score | Evaluator
81.27% | Matches word count
43.09% | Dialogue to Total Word Ratio
62.1794% | Overall
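The overall figure appears consistent with an unweighted mean of the two evaluator scores. This is an inference from the numbers shown, not something the page states, and the displayed per-evaluator scores are rounded, so the check below is only a sketch that agrees with 62.1794% to within rounding.

```python
from statistics import mean

# Scores as displayed in the evaluation summary (rounded to two decimals).
scores = {
    "Matches word count": 81.27,
    "Dialogue to Total Word Ratio": 43.09,
}

# Assumption: the overall score is a plain, unweighted average.
overall = mean(list(scores.values()))
print(f"{overall:.2f}%")  # 62.18%
```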
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly unrolled the laminated blueprint across the greasy diner counter. The photograph of a suburban living room showed an inflatable castle, a dangerously overdecorated cake, and a mountain of wrapped boxes. Bill pushed his coffee cup aside and tapped a grease-stained finger on the photo. He adjusted his wire-rimmed glasses and smoothed a crisscross diagram drawn in red marker. The paper smelled faintly of bacon and desperation. This was it. The operation of the century. Or at least the operation that would pay for his dry cleaning bills. Outside, rain drummed against the fluorescent windows like nervous fingers.
"Alright, listen up. The inflatable castle drops the barrier at seven-fifteen. We go in at seven-thirty sharp."
Bill shook his head and uncapped a fine-tipped pen. "Seven-thirty? That’s when the cake hits the table. I’m not risking the icing phase. Too much sugar rush."
"Fine. Seven-twenty. But the piñata distraction only lasts four minutes before the mom with the clipboard notices."
"I mapped the guard rotation. Three dads in matching polo shirts. I call them the Polo Squad. They rotate near the buffet table every six minutes."
"And the presents?"
"Clustered under the tree. We sweep left to right, bag them, and exit through the laundry room. I’ve modified the duffels with velvet lining for the cardboard boxes."
"You lined duffels with velvet?"
"It’s called professionalism. About the face-painting distraction. Can you handle a tiger mask for ninety seconds?"
"I did the summer camp circuit. I’ll be a feline for two minutes if I have to. But what if the birthday kid demands a unicorn?"
"Improvise then. Don’t let the brush touch your left ear. Last time you sneezed, we lost the van and the projector."
"Noted. What’s our cut?"
"Fifty-fifty after liquidation. LEGO sets go to the syndicate. Plushies to boutique resellers. Discount gym books? Burn them."
"Fair. What if the parents start dancing during the song?"
"We move during the chorus. The beat masks the back door hinges. I oiled them with candle wax. It ruins the carpet, but who’s going to know? We’re already out."
"I still think we’re underestimating the cake. It’s a two-tiered monster with fondant dragons."
"Let it breathe. No sweets. Stick to the script. Seven-twenty. Tiger mask. Don’t engage with the kids. If a four-year-old asks for a hug, you give them your phone and walk away."
"You’re weirdly prepared."
"I read the invitation four times. This isn’t a vault. It’s toddlers and exhausted parents. We have all the advantage. Just don’t trip over the balloon arch."
"I never trip."
"Last Tuesday you tripped over a hose and knocked over a bin."
"Tactical stumble. Fine. Seven-twenty. Tiger mask. I’ll bring the wax."
Bill stood up, adjusted his blazer, and checked his watch. He grabbed his keys and gave her a sharp nod. "See you in the field. And Molly?"
"Yeah?"
"If anyone offers you a juice box, say no. Sugar makes us sloppy."
created_at: "2026-04-27T10:50:47.005Z" (4/27/2026, 10:50:47 AM)
model: "qwen/qwen3.6-flash"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 628
moderation_latency: (null)
generation_time: 53074
tokens_prompt: 77
tokens_completion: 10943
native_tokens_prompt: 76
native_tokens_completion: 12258
native_tokens_completion_images: (null)
native_tokens_reasoning: 11556
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.018406
router: (null)
provider_responses: 0
endpoint_id: "2435bec9-c808-4f27-9fc8-12f3460d7b66"
id: "chatcmpl-1c179794-0768-9e97-91fa-bbb4f560107c"
is_byok: false
latency: 628
model_permaslug: "qwen/qwen3.6-flash"
provider_name: "Alibaba"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777287047-b2tiCf3HxjoGW2fb7Nng"
session_id: (null)
api_type: "completions"
id: "gen-1777287047-vRTsBURTWdzxJjegadn5"
upstream_id: "chatcmpl-1c179794-0768-9e97-91fa-bbb4f560107c"
total_cost: 0.018406
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
response_cache_source_id: (null)
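The token fields in the metadata above suggest that most of the completion was spent on reasoning rather than on the visible scene. The sketch below works that out under two assumptions the page does not confirm: that native_tokens_reasoning is counted inside native_tokens_completion, and that generation_time is in milliseconds (which would be consistent with the reported 53.4s duration).

```python
# Values copied from the run metadata.
native_tokens_completion = 12258
native_tokens_reasoning = 11556
generation_time_ms = 53074  # assumed to be milliseconds

# Assumption: reasoning tokens are a subset of the completion tokens.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(visible_tokens, f"{reasoning_share:.1%}", f"{tokens_per_second:.0f} tok/s")
# 702 94.3% 231 tok/s
```

If those assumptions hold, roughly 700 tokens of visible prose came out of about 12,000 generated tokens.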
Evaluation details
Result | Evaluator | Details | Meta Data
81.27% | Matches word count | n/a | neededClean: false, words: 488
43.09% | Dialogue to Total Word Ratio | Ratio: 74.10%, Deviation: 4.10% | neededClean: false, wordsTotal: 502, wordsDialogue: 372
62.1794% | Overall | |
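The reported ratio and deviation can be reproduced from the word counts in the evaluator metadata. The sketch below also shows one plausible way such counts could be obtained. The helper names, the whitespace tokenization, and the rule that dialogue is whatever sits between straight double quotes are all assumptions for illustration; the benchmark's actual evaluator is not shown on this page, and how the deviation maps to the 43.09% score is not derivable from it.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: 70% dialogue


def count_words(text: str) -> int:
    # Whitespace tokenization (assumption; the real tokenizer is unknown).
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int]:
    # Hypothetical rule: text between straight double quotes is dialogue.
    spoken = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(count_words(s) for s in spoken)
    return words_dialogue, count_words(text)


def ratio_and_deviation(words_dialogue: int, words_total: int) -> tuple[float, float]:
    # Deviation is the absolute distance from the target ratio.
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - TARGET_RATIO)


# Counts reported by the evaluator: wordsDialogue = 372, wordsTotal = 502.
ratio, deviation = ratio_and_deviation(372, 502)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 74.10%, Deviation: 4.10%
```

Note that the two evaluators report different totals (488 and 502 words), which suggests they count words differently; this sketch does not attempt to reproduce either total from the response text.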