NC Bench
Run: 2026-04-27-01KQ6ZA76MRNNJFG0NJCT105RJ
Run Details
Model: Qwen 3.5 Plus (2026-04-20)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 1m 51s
Cost: $0.017613
Provider: Alibaba
Evaluation summary
Matches word count: 20.19%
Dialogue to Total Word Ratio: 0.00%
Overall: 10.0948%
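The Overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below checks that reading. It is an assumption, not the benchmark's documented aggregation, and `overall_score` is a hypothetical name. Because the displayed 20.19% is already rounded, the recomputed value (about 10.095) differs slightly from the reported 10.0948%.

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    return sum(scores) / len(scores)


# Scores as displayed in the evaluation summary, in percent.
reported_scores = [20.19, 0.00]

overall = overall_score(reported_scores)
print(f"Recomputed overall: {overall:.3f}%")
```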
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly tapped the map on a pizza box. "Bill, the invite says 'Under the Tree,' but the site has a Spider-Man piñata." Bill snapped his suspenders. "Details. We adapt. That piñata is a decoy. The loot is inside the arachnid." "We're stealing glitter glue from a six-year-old. This is the heist?" "High-value targets. The glue is industrial grade. Also, Timothy's father collects vintage Pokémon cards. The card is in the safe, disguised as a teddy bear." "How do we crack a stuffed bear? Do you have the laser cutter?" "Too loud. I brought a stethoscope. Or I tickle it until it confesses." "Tickling is not a skill set." "Everything is a skill set if you charge enough. The perimeter has three adults and a corgi named Barnaby. Talk fast." "Barnaby is manageable. I brought a sausage." "You brought a sausage?" "Premium bratwurst. It buys forty seconds of distraction. I scale the gift table, you work the balloon station, we extract the card." "And the timing?" Molly asked, chewing her lip. "Two-thirty sharp. Right after the cake. Digestion makes guests lethargic. Plus, the sugar crash induces naps by three. We have a forty-minute window before the parents wake up." "Forty minutes. Tight. What about the camera in the hallway?" "Fake. I scanned it. It's a Wi-Fi extender with a blinking LED. Amateur hour." "You're sure? Last time you said the laser grid was a spiderweb." "That spiderweb had voltage spikes! It was a booby trap. I respect the craftsmanship. You just panicked." "I got electrocuted." "I got inspired. Now, about the clown nose..." "I'm drawing the line at the nose." "It's essential for the camouflage. Parents see a clown, they assume incompetence. We use their prejudice against them. It's psychological warfare." "It's humiliation warfare. I look like a tomato with a fear of commitment." "You're the face of the operation. I'm the muscle. Well, the brain-muscle hybrid. You're the distraction. 
Smile, hold the balloon animal, and try not to make eye contact with the birthday boy. He has the eyes of a hawk." "Hawks see movement. I'll be statue-still in the corner." "Good. What's in the duffel?" "Sausage, nose, balloon pump, lockpicks, grappling hook made of bent wire hangers, and emergency lollipop." "Lollipop?" "Bribe for the corgi. If the sausage fails, the sugar hits. Barnaby is a known sugar-hound. I profiled the whole house. The father is betting on horses via his phone. If we feed him a winning tip on 'Crazy Mule,' he'll hand us the safe himself." "You're fixing the race?" "I'm ensuring a payout. Chaos theory. Now, stairs or slide?" "Stairs. Slide is Option B. The tux reduces friction, though." "Greased tux. I like it. Practical. Grab the bratwurst. It's getting lukewarm." "Lukewarm bratwurst? Bill, that's a health violation." "Cold sausage, warm loot. Move out." Molly grabbed a jacket. "If we get arrested, I'm blaming the clown nose."
created_at: "2026-04-27T08:01:26.233Z" (4/27/2026, 8:01:26 AM)
model: "qwen/qwen3.5-plus-20260420"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 699
moderation_latency: (null)
generation_time: 110549
tokens_prompt: 77
tokens_completion: 7346
native_tokens_prompt: 76
native_tokens_completion: 7326
native_tokens_completion_images: (null)
native_tokens_reasoning: 6569
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0176128
router: (null)
provider_responses: 0
endpoint_id: "8b4a0fbb-6705-4f8d-a7f1-ad18ecc0edbc"
id: "chatcmpl-9c44acbd-daa8-9e5b-85b6-c76b0bd9c400"
is_byok: false
latency: 699
model_permaslug: "qwen/qwen3.5-plus-20260420"
provider_name: "Alibaba"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777276886-pawCXdGYG8RsOdrTHgOi"
session_id: (null)
api_type: "completions"
id: "gen-1777276886-isDiCe1HhS0cY6wNRkbr"
upstream_id: "chatcmpl-9c44acbd-daa8-9e5b-85b6-c76b0bd9c400"
total_cost: 0.0176128
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
response_cache_source_id: (null)
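A few derived figures help when reading the metadata above. The sketch below assumes that `latency` and `generation_time` are in milliseconds and that their sum approximates the run duration. The page does not state the units, but the sum (111,248 ms) matches the reported 1m 51s. All function names are hypothetical.

```python
def format_duration(ms):
    """Render milliseconds as 'Xm Ys', truncating fractional seconds."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}m {seconds}s"


def reasoning_share(reasoning_tokens, completion_tokens):
    """Fraction of native completion tokens spent on reasoning."""
    return reasoning_tokens / completion_tokens


def tokens_per_second(completion_tokens, generation_ms):
    """Completion throughput, assuming generation_ms is in milliseconds."""
    return completion_tokens / (generation_ms / 1000)


# Values copied from the run metadata.
latency_ms = 699
generation_ms = 110549
native_completion = 7326
native_reasoning = 6569

print(format_duration(latency_ms + generation_ms))
print(f"reasoning share: {reasoning_share(native_reasoning, native_completion):.1%}")
print(f"throughput: {tokens_per_second(native_completion, generation_ms):.1f} tok/s")
```

Under these assumptions, roughly nine in ten completion tokens went to reasoning rather than to the visible 500-word scene.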
Evaluation details
Matches word count: 20.19%
Details: n/a
neededClean: false
words: 480
Dialogue to Total Word Ratio: 0.00%
Details: Ratio: 95.71%, Deviation: 25.71%
neededClean: false
wordsTotal: 490
wordsDialogue: 469
Overall: 10.0948%
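The reported ratio and deviation can be reproduced from the word counts: 469 dialogue words out of 490 total, measured against the 70% target in the scenario. The sketch below does that arithmetic and adds a simple way to count dialogue words by treating text inside straight double quotes as dialogue. The counting rule and both function names are assumptions; the evaluator's actual tokenization is not shown on this page, and the two evaluators report different totals (480 vs. 490), which suggests they count words differently.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: "70% dialogue"


def ratio_and_deviation(words_dialogue, words_total, target=TARGET_RATIO):
    """Return (ratio, absolute deviation from target), both in percent."""
    ratio = words_dialogue / words_total
    return ratio * 100, abs(ratio - target) * 100


def count_dialogue_words(text):
    """Count whitespace-separated words inside straight double quotes.

    Assumed rule for illustration only; returns (dialogue, total).
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


ratio_pct, deviation_pct = ratio_and_deviation(469, 490)
print(f"Ratio: {ratio_pct:.2f}%, Deviation: {deviation_pct:.2f}%")
# prints "Ratio: 95.71%, Deviation: 25.71%"
```

The response overshoots the target by about 25.7 percentage points, which lines up with the 0.00% score for this evaluator.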