NC Bench
Run: 2026-04-27-01KQ7902E6QEDPTVGGJ0VRKA0B
Run Details
Model: Qwen3.6 Max Preview
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 3m 54s
Cost: $0.066383
Provider: Alibaba
Evaluation summary
Score | Evaluator
98.71% | Matches word count
0.00% | Dialogue to Total Word Ratio
49.3562% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The laminated floor plan of the suburban ranch house lay spread across the coffee table, weighed down by pretzels and a yellow rubber chicken. Molly traced a red marker along the hallway. Bill leaned over her shoulder, adjusting his reading glasses while tapping a pen against his chin. The afternoon sun caught dust motes dancing above their makeshift command center. Empty coffee cups formed a defensive perimeter around the blueprints. A ceiling fan clicked rhythmically overhead, keeping time with their criminal ambitions. The neighborhood remained blissfully unaware of the impending raid. Walk me through the entry again, Molly said. And please tell me it does not involve the bouncy castle. The inflatable fortress is a tactical marvel, Bill replied. It absorbs sound, masks footsteps, and provides excellent cover from adult sightlines. We are stealing wrapped gifts from six year olds, Bill. We do not need tactical rubber. You say that until a face painter corners you. Those people have eyes everywhere. We go through the side gate during musical chairs. Maximum chaos, minimum supervision. What about the parents? Distracted by cheap wine and aggressive small talk. I mapped the beverage station. It is a blind spot the size of a minivan. Fine. Then we hit the gift table. How many boxes? Roughly twenty-four, plus three oversized ride-on vehicles. I brought collapsible duffels and a foldable hand truck. You brought a hand truck to a birthday party. Efficiency matters, Molly. Do you know how much a plastic kitchen set weighs? Keep the noise down. If a kid cries, we abort immediately. Understood. Though the pinata presents a complication. How does a paper mache donkey complicate a grab and run? It hangs directly above the target zone. One stray swing from a sugar-fueled child and we are buried in candy and debris. I recommend cutting the rope early. You want to drop the pinata before the party does? It creates a diversion. 
Kids scramble for sweets, adults panic, we slip out the back with the loot. Classic misdirection. That is genuinely diabolical. I love it. What about the clown? Neutralized. I bribed him with a twenty and a promise to ignore his unlicensed balloon operation. You bribed a clown. He drives a Honda Civic, Molly. Everyone has a price. Synchronize watches. We move at three fifteen, right after the cake song begins. Parental guard drops to historic lows. Check the weather forecast. Partly cloudy with a chance of screaming toddlers. Three fifteen. Got it. And if we get caught? We claim we are the entertainment. I packed juggling pins and a profound sense of regret. Just stick to the duffels. And Bill? Yeah? Leave the rubber chicken at home. Molly capped the marker and rolled up the floor plan with a sharp snap. Bill sighed, pocketing his glasses as the distant sound of a neighbor lawnmower drifted through the window. She checked her watch and nodded toward the door. The heist was set. Tomorrow would bring either a haul of plastic treasures or a lengthy community service sentence.
created_at
"2026-04-27T10:50:39.436Z"
4/27/2026, 10:50:39 AM
model
"qwen/qwen3.6-max-preview-20260420"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
2142
moderation_latency
(null)
generation_time
234394
tokens_prompt
77
tokens_completion
8792
native_tokens_prompt
76
native_tokens_completion
8498
native_tokens_completion_images
(null)
native_tokens_reasoning
7827
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0663832
router
(null)
provider_responses
0
endpoint_id
"ca7fc0c7-f23d-4ac5-b2e0-945be5b61787"
id
"chatcmpl-f818f28a-c239-9674-8c6c-eccdb767468a"
is_byok
false
latency
2142
model_permaslug
"qwen/qwen3.6-max-preview-20260420"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777287039-ybiQh0DWfnhz660g9iFH"
session_id
(null)
api_type
"completions"
id
"gen-1777287039-K6cFPAO0aXC7Wujl2maO"
upstream_id
"chatcmpl-f818f28a-c239-9674-8c6c-eccdb767468a"
total_cost
0.0663832
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
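The timing and token fields above can be cross-checked with a little arithmetic. The sketch below is a rough illustration and rests on two assumptions the page does not confirm: that `generation_time` is in milliseconds and is the source of the displayed Duration, and that `native_tokens_reasoning` is counted inside `native_tokens_completion`. The helper name `format_duration` is hypothetical.

```python
def format_duration(ms: int) -> str:
    """Render a millisecond count as 'Xm Ys', truncating fractional seconds."""
    total_seconds = ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds}s"


# Values copied from the metadata above.
generation_time_ms = 234394
native_tokens_completion = 8498
native_tokens_reasoning = 7827

# Assumption: reasoning tokens are a subset of the completion tokens,
# so the remainder approximates the tokens spent on the visible scene.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion

print(format_duration(generation_time_ms))
print(visible_tokens, f"{reasoning_share:.1%}")
```

Under these assumptions, the formatted duration agrees with the Duration shown in Run Details, and most of the completion tokens went to reasoning rather than to the scene itself.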
Evaluation details
Result | Evaluator | Details | Meta Data
98.71% | Matches word count | n/a | neededClean: false, words: 506
0.00% | Dialogue to Total Word Ratio | Ratio: 0.00%, Deviation: 70.00% | neededClean: false, wordsTotal: 509, wordsDialogue: 0
49.3562% | Overall
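The model response contains no quotation marks, which may explain why `wordsDialogue` is 0 even though most of the scene reads as speech. The sketch below shows how a quote-based counter could produce that result. It is an assumption about how such an evaluator might work, not NC Bench's actual implementation. The function names, the 0.70 target taken from the scenario, and the use of a plain mean for the overall score are all hypothetical.

```python
import re


def dialogue_ratio(text: str) -> float:
    """Share of words that sit inside straight or curly double quotes."""
    total = len(text.split())
    # Each match is a tuple with exactly one non-empty group.
    quoted = re.findall(r'"([^"]*)"|“([^”]*)”', text)
    dialogue = sum(len((a or b).split()) for a, b in quoted)
    return dialogue / total if total else 0.0


def deviation(ratio: float, target: float = 0.70) -> float:
    """Absolute distance between the measured ratio and the requested one."""
    return abs(target - ratio)


def overall(scores: list[float]) -> float:
    """Assumed aggregation: unweighted mean of the evaluator scores."""
    return sum(scores) / len(scores)


unquoted = "Walk me through the entry again, Molly said."
quoted = '"Walk me through the entry again," Molly said.'

print(dialogue_ratio(unquoted), deviation(dialogue_ratio(unquoted)))
print(dialogue_ratio(quoted))
print(overall([98.71, 0.0]))
```

With the first sentence of the scene's dialogue as input, the unquoted form yields a ratio of 0 and a deviation of 0.70, matching the reported "Ratio: 0.00%, Deviation: 70.00%". The mean of the two displayed scores also agrees with the reported 49.3562% up to rounding of the word count score.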