NC Bench
Run: 2026-04-27-01KQ79WMJX37R82YVMAB715XAH
Run Details
Model:
Qwen 3.6 Flash
Test:
Dialogue tags
Scenario:
Write 500 words with 30% dialogue
Duration:
56.0s
Cost:
$0.016952
Provider:
Alibaba
Evaluation summary
Score      Evaluator
99.92%     Matches word count
0.00%      Dialogue to Total Word Ratio
49.9595%   Overall
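The overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below assumes equal weights (the page does not state the aggregation rule) and assumes the word-count score is 99.919% before it is rounded for display. `overall_score` is a hypothetical helper, not part of NC Bench.

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores in percent (assumed aggregation rule)."""
    if not scores:
        raise ValueError("at least one evaluator score is required")
    return sum(scores) / len(scores)


# Assumed unrounded word-count score; the page displays it as 99.92%.
word_count_score = 99.919
# Dialogue-ratio score as reported on the page.
dialogue_ratio_score = 0.0

overall = overall_score([word_count_score, dialogue_ratio_score])
print(f"Overall: {overall:.4f}%")
```

Under these assumptions, a single evaluator scoring zero caps the overall result at roughly half, regardless of how well the other evaluator scores.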
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The narrow alley behind the sunflower school smelled heavily of wet mulch and poorly conceived birthday choices. Molly adjusted her heavy wire cutters and squinted directly at the paper plate decorations fluttering above the wooden fence. Bill stood quietly beside her, carefully testing the precise weight of a stolen burlap sack that now held a traffic cone and exactly three rubber ducks. The pale moon hung low above the quiet neighborhood like a half forgotten cupcake. Molly checked her battered analog watch, then tapped the side of her head where a stray hairnet threatened to escape her careful bun. She mentally ran through the checklist for the third time, noting that their entire strategy relied heavily on theatrical misdirection and the inherent chaos of sugary six year olds. The perimeter was poorly secured, which usually meant either extreme incompetence or a carefully laid trap involving angry relatives. Molly preferred incompetence almost immediately. Bill smoothed his hand drawn blueprint firmly against the brick wall, his fingers trembling slightly with caffeine and tactical adrenaline. He pointed decisively at a circled area near the kitchen service door. I am running point on the bouncy castle entry. We slide the sack under the door. No hesitation. If the pug moves, you distract it with pretzels. Molly nodded slowly, eyeing a suspiciously still garden gnome wearing a party hat. Do not engage the dog. Just toss the bait and move. I have the latch tool. Keep your eyes on the clock. Thirty seconds max. It is a lot of plastic for two people carrying sacks. Bill shifted his stance and adjusted the duct tape wrapped around his wrist. That is the point. More plastic means higher volume. We maximize the haul. The insurance payout would be zero. Our reward is purely psychological. Psychology pays the rent eventually. A distant voice carried over the fence where parents were loudly instructing children to line up for piñata duty.
The wind carried the faint jingle of balloon strings from the adjacent yard. Somewhere inside, a cartoon character blasted cheerful music through thick walls. Bill adjusted his gloves, which were slightly too large for his nimble fingers. He had practiced this exact sequence seven separate times in his garage. The concrete floor felt cold through his sneakers. He imagined the neatly stacked boxes behind the door. Molly checked her watch again and straightened her slightly damp collar. The sugar rush is our cover. Once the kids start swinging sticks, everyone becomes a liability. If the parents turn around, we freeze. Act like children. We look like adults in masks. Bill snapped the burlap sack shut and tested the knot with serious concentration. I will take the left corner. You take the right. Do not drop the stuffed dinosaur. It weighs twenty pounds. On three. One. Two. They bolted toward the service entrance, boots slipping on damp leaves, leaving only the sound of rustling hedges and the faint squeak of a plastic trowel falling from the fence.
created_at
"2026-04-27T11:06:15.526Z"
4/27/2026, 11:06:15 AM
model
"qwen/qwen3.6-flash"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
623
moderation_latency
(null)
generation_time
55870
tokens_prompt
77
tokens_completion
10641
native_tokens_prompt
76
native_tokens_completion
11289
native_tokens_completion_images
(null)
native_tokens_reasoning
10690
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0169525
router
(null)
provider_responses
0
endpoint_id
"2435bec9-c808-4f27-9fc8-12f3460d7b66"
id
"chatcmpl-0f5ae201-073b-9cd2-9cf0-b33b1f65ab91"
is_byok
false
latency
623
model_permaslug
"qwen/qwen3.6-flash"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777287975-QQeX64ZbAwF2H5WxznwK"
session_id
(null)
api_type
"completions"
id
"gen-1777287975-CvDNXmSzDtSFgpuuMZu0"
upstream_id
"chatcmpl-0f5ae201-073b-9cd2-9cf0-b33b1f65ab91"
total_cost
0.0169525
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
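The token counts above can be broken down with simple arithmetic. The sketch below assumes that `native_tokens_completion` includes the reasoning tokens and that `generation_time` is in milliseconds (which would agree with the 56.0s duration shown in the run details). Neither assumption is confirmed by the page, and the derived variable names are illustrative.

```python
# Values copied from the run metadata.
native_tokens_completion = 11289
native_tokens_reasoning = 10690
generation_time_ms = 55870  # assumed unit: milliseconds
total_cost_usd = 0.0169525

# Tokens left for the visible answer, assuming reasoning tokens are
# counted inside the completion total.
visible_tokens = native_tokens_completion - native_tokens_reasoning

# Share of the completion spent on reasoning.
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Generation throughput over the whole completion.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"visible tokens:    {visible_tokens}")
print(f"reasoning share:   {reasoning_share:.2%}")
print(f"tokens per second: {tokens_per_second:.1f}")
print(f"cost per visible token: ${total_cost_usd / visible_tokens:.7f}")
```

If the assumptions hold, nearly all of the billed completion was reasoning, with only a few hundred tokens making up the 497-word scene.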
Evaluation details
Result: 99.92%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean = false, words = 497

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 0.00%, Deviation: 30.00%
Meta Data: neededClean = false, wordsTotal = 497, wordsDialogue = 0

Overall: 49.9595%
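A dialogue ratio of this kind could be measured by counting the words that fall inside quotation marks. The sketch below is an assumption about how such an evaluator might work, not NC Bench's actual implementation. `dialogue_ratio` and `deviation` are hypothetical helpers, and only straight and curly double quotes are recognized.

```python
import re

# Matches text inside straight ("...") or curly (\u201c...\u201d) double quotes.
QUOTED = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def dialogue_ratio(text):
    """Return (words_total, words_dialogue, ratio) for a piece of prose."""
    words_total = len(text.split())
    words_dialogue = sum(
        len((straight or curly).split())
        for straight, curly in QUOTED.findall(text)
    )
    ratio = words_dialogue / words_total if words_total else 0.0
    return words_total, words_dialogue, ratio


def deviation(ratio, target=0.30):
    """Absolute distance between the measured ratio and the requested share."""
    return abs(ratio - target)


unquoted = "Bill nodded. On three. One. Two."
quoted = 'Bill nodded. "On three. One. Two."'

print(dialogue_ratio(unquoted))  # no quotation marks, so no dialogue is counted
print(dialogue_ratio(quoted))
print(f"deviation at zero dialogue: {deviation(0.0):.2%}")
```

The model response in this run contains spoken lines but no quotation marks around them. Under a quote-based counting rule like this one, that would yield `wordsDialogue` of 0 and a deviation equal to the full 30% target, which matches the reported figures.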