NC Bench
Run: 2026-04-27-01KQ70H57JWM4WN0JDECSKJRTC
Run Details
Model:
Qwen 3.5 Plus (2026-04-20)
Test:
Dialogue tags
Scenario:
Write 500 words with 50% dialogue
Duration:
2m 12s
Cost:
$0.020990
Provider:
Alibaba
Evaluation summary
Score | Evaluator
0.03% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0152% | Overall
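The Overall figure is consistent with a plain average of the two evaluator scores, though the report does not state how it is aggregated. The sketch below assumes an unweighted mean; the dictionary and variable names are illustrative and not part of NC Bench. The inputs are the rounded values shown on this page, so the result can differ from the reported 0.0152% in the last digits.

```python
from statistics import mean

# Evaluator scores as displayed in this report, in percent (already rounded).
scores_percent = {
    "Matches word count": 0.03,
    "Dialogue to Total Word Ratio": 0.00,
}

# Assumption: the overall score is the unweighted mean of the evaluator scores.
overall_percent = mean(scores_percent.values())

print(f"Overall (from rounded inputs): {overall_percent:.4f}%")
```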
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Bill wedged himself between a cooler and a box marked Fragile Egos, his fingers trembling as he adjusted a plastic grappling hook taped to his wrist. The device looked less like espionage gear and more like a toilet plunger that had lost a fight with a stapler. Molly was on her knees, organizing three black trash bags with the intensity of a bomb disposal expert. She circled the word Piñata on a list taped to the cracked dashboard. Bill tried to suck a suction cup off his thumb. The grapple is weeping, Mol. It's leaking hydraulic fluid everywhere. Bill rubbed his temple, smearing blue grease that matched their mandatory clown makeup, and tapped the gadget until a small spring ejected, pinged the windshield, and vanished into the floor mats. Ignore it. We aren't using the grapple. The bouncy castle on the west patio changed the approach vector. If we go through the azaleas, the clown sees us. If we breach the hedge, we trigger the motion sensor on the karaoke machine. Molly brushed lint from her tactical spandex and checked the loadout. Three duffels lay ready, stuffed with insulated gloves, a laser pointer for the dogs, and a bag of organic grapes to bribe General Timmy. We need the karaoke distraction. When Despacito hits the chorus, parents close their eyes to judge the pitch. That window is all we get to sweep the table and grab the console. Bill groaned, leaning back until his head thudded the seat. He pulled a crumpled photo from his pocket showing a four-year-old boy in overalls holding a wooden spoon like a broadsword, guarding a mountain of wrapped boxes. General Timmy has eyes in the back of his head. I scouted him. He's wearing noise-canceling headphones. He's immune to the music. He sees the glitter flash he starts the scream protocol. Once he screams, Dad comes out with the weed whacker, and that thing ignores stealth. Molly snatched the photo and shoved it into her pocket. 
She grabbed a handful of glitter from a mason jar, rubbing the toxic dust between her fingers as flakes fell like disco snow. Timmy is a variable. That's why we carry the ferret sock. We deploy the puppet, he gets curious, he drops the spoon, we grab the loot, we go. No screaming. Just a ferret and a missing Lego set. Bill stared at the sock puppet hanging from her belt. It had mismatched googly eyes, giving it an unnerving expression of pure madness. That ferret doesn't look like a distraction. It looks like it knows things. It's going to traumatize the kid. We'll need a priest, not a getaway driver. And honestly, Mol, that sock has teeth. Actual teeth. Where did you get teeth on a sock? Psychological warfare, Bill. Now check your zip ties. We're parking.
created_at
"2026-04-27T08:22:42.167Z"
4/27/2026, 8:22:42 AM
model
"qwen/qwen3.5-plus-20260420"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1131
moderation_latency
(null)
generation_time
132106
tokens_prompt
77
tokens_completion
7960
native_tokens_prompt
76
native_tokens_completion
8733
native_tokens_completion_images
(null)
native_tokens_reasoning
8115
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0209896
router
(null)
provider_responses
0
endpoint_id
"8b4a0fbb-6705-4f8d-a7f1-ad18ecc0edbc"
id
"chatcmpl-e229e358-9363-91d6-91b0-83fc768a06ac"
is_byok
false
latency
1131
model_permaslug
"qwen/qwen3.5-plus-20260420"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777278162-5fhFjTyASKOGzsdIXTEs"
session_id
(null)
api_type
"completions"
id
"gen-1777278162-KhNYRPcOTqsvi196dGOI"
upstream_id
"chatcmpl-e229e358-9363-91d6-91b0-83fc768a06ac"
total_cost
0.0209896
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
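Two of the headline figures can be cross-checked against the metadata above. The sketch below assumes that generation_time is in milliseconds and that native_tokens_reasoning is included in native_tokens_completion; neither is stated in the report, and the helper name format_duration is illustrative.

```python
# Values copied from the Meta fields of this run.
meta = {
    "generation_time": 132106,         # assumed to be milliseconds
    "native_tokens_completion": 8733,
    "native_tokens_reasoning": 8115,
}


def format_duration(ms: int) -> str:
    """Render a millisecond count as 'Xm Ys', truncating fractional seconds."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}m {seconds}s"


duration = format_duration(meta["generation_time"])

# Assumption: reasoning tokens are a subset of completion tokens, so the
# difference approximates the tokens spent on the visible response.
visible_tokens = meta["native_tokens_completion"] - meta["native_tokens_reasoning"]

print(duration)
print(visible_tokens)
```

Under these assumptions the duration agrees with the 2m 12s shown in Run Details, and most of the completion tokens were spent on reasoning rather than on the scene itself.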
Evaluation details
Result | Evaluator | Details | Meta Data
0.03% | Matches word count | n/a | neededClean: false, words: 470
0.00% | Dialogue to Total Word Ratio | Ratio: 0.00%, Deviation: 50.00% | neededClean: false, wordsTotal: 473, wordsDialogue: 0
0.0152% | Overall
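The response contains no quotation marks, which may explain why wordsDialogue is 0 even though the characters clearly speak. The sketch below shows one way such a ratio could be computed, assuming dialogue is detected as text inside straight or curly double quotes and words are split on whitespace. The evaluator's actual rules are not shown in this report, and the names QUOTED, dialogue_ratio, and deviation_from_target are hypothetical.

```python
import re

# Assumption: dialogue is any text enclosed in straight or curly double quotes.
QUOTED = re.compile(r'"([^"]*)"|“([^”]*)”')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Fraction of all words that fall inside quotation marks."""
    total = count_words(text)
    if total == 0:
        return 0.0
    spoken = sum(count_words(a or b) for a, b in QUOTED.findall(text))
    return spoken / total


def deviation_from_target(ratio: float, target: float = 0.5) -> float:
    """Absolute distance from the requested dialogue share (50% in this test)."""
    return abs(ratio - target)


# Unquoted speech, as in the model response, counts as zero dialogue.
unquoted = "The grapple is weeping, Mol. Bill rubbed his temple."
quoted = '"Ignore it," Molly said.'

print(dialogue_ratio(unquoted), deviation_from_target(dialogue_ratio(unquoted)))
print(dialogue_ratio(quoted), deviation_from_target(dialogue_ratio(quoted)))
```

With a detector like this, a scene written without quotation marks scores a ratio of 0 and a deviation of 0.5 regardless of how much of it is speech, which matches the Ratio and Deviation values reported for this run.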