NC Bench
Run: 2026-04-27-01KQ70HBSBVSGVXBMDGDB4V2XD
Run Details
Model: Qwen 3.5 Plus (2026-04-20)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 2m 11s
Cost: $0.019079
Provider: Alibaba
Evaluation summary
Score      Evaluator
6.09%      Matches word count
22.48%     Dialogue to Total Word Ratio
14.2834%   Overall
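The page does not state how the Overall figure is aggregated. It looks consistent with a plain arithmetic mean of the two evaluator scores, and the short Python check below tests that reading. The mean is an assumption, and the variable names are illustrative only.

```python
# Scores as displayed in the evaluation summary (already rounded to 2 decimals).
word_count_score = 6.09
dialogue_ratio_score = 22.48
reported_overall = 14.2834

# Assumption: Overall is the unweighted mean of the evaluator scores.
mean_of_rounded = (word_count_score + dialogue_ratio_score) / 2
gap = abs(mean_of_rounded - reported_overall)

# Each displayed score can be off by up to 0.005 due to rounding,
# so the mean of the rounded values can be off by up to 0.005 as well.
ROUNDING_TOLERANCE = 0.005
consistent_with_mean = gap <= ROUNDING_TOLERANCE

print(f"mean of rounded scores: {mean_of_rounded:.4f}%")
print(f"gap to reported overall: {gap:.4f}")
print(f"consistent with a plain mean: {consistent_with_mean}")
```

The gap stays inside the rounding tolerance, so a plain mean is a plausible reading, though a weighted scheme with equal weights would produce the same result.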
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly unrolled the crinkled blueprint on the grease-stained table, pressing a heavy wrench onto the corner to stop the paper from curling. The schematic showed the backyard, highlighting the castle, trampoline, and gift table near the adults. Bill paced the cramped safehouse, his knuckles cracking with a rhythm that suggested imminent panic. He stopped to peer at the drawing, adjusting his crooked glasses and wiping grease from his forehead while Molly marked the trellis approach in red ink, humming a tune that sounded suspiciously like the theme song to a cartoon about spies. "The piñata?" Bill asked. Molly kept her marker moving, circling the gift table three times. "It's a distraction asset. While adults swing at the beast, we execute Phase Two. Silent entry via the trellis, secure the goods, exit through the rhododendron bushes." Bill slumped onto a milk crate, burying his face in his hands until his knuckles turned white. The idea of robbing a seven-year-old's birthday felt morally porous, a line they had never crossed before. Their resume featured casinos, not goody bags and finger sandwiches filled with soggy cucumber. He imagined the weeping child, the crumpled wrapping paper, and the sheer embarrassment of explaining this downgrade to the underworld community. The safehouse smelled of stale coffee and diesel, a stark contrast to the vanilla-scented heist awaiting them, and surfaces held discarded plans for jobs without balloon animals and cheap plastic swords. "What if there's a clown?" Bill whispered. "Clown defenses are unpredictable. Joy-based psychological warfare could compromise the op." Molly reached into her pocket and produced a pair of heavy-duty thermal goggles, sliding them across the table. She checked the lockpick set hidden inside a juice box, ensuring the seals were intact before holstering the device near her hip. "I have a squirt gun loaded with high-viscosity glitter glue. If the clown approaches, he gets blinded. We move fast. 
The loot is Lego, electronics, and junk, but the Lego pays rent for six months." "Lego?" Bill asked. Molly shrugged, sliding a crowbar into the pocket of a rainbow-striped clown suit waiting in the corner next to a box of unopened party favors they'd already lifted. The disguise invited laughter, fooling everyone until the bags were full, based on the belief that thieves could hide in plain sight inside silly costumes. Bill stared at the goggles, then back at the blueprint. Rent was due, and the landlord threatened eviction over stored converters in the shared hallway. The weight of financial ruin pressed down harder than the guilt. He straightened his tight jacket and swallowed hard, his stomach doing backflips at the thought of facing sticky fingers. "Glitter glue?" Bill muttered. "Right. But if I get a party hat jammed over my eyes, I'm screaming. I'm screaming loud." "Agreed. Check your grappling hook. The security system is themed. Stay sharp. We're stealing joy for liquidity."
created_at
"2026-04-27T08:23:00.274Z"
4/27/2026, 8:23:00 AM
model
"qwen/qwen3.5-plus-20260420"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
701
moderation_latency
(null)
generation_time
119738
tokens_prompt
77
tokens_completion
7792
native_tokens_prompt
76
native_tokens_completion
7937
native_tokens_completion_images
(null)
native_tokens_reasoning
7298
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0190792
router
(null)
provider_responses
0
endpoint_id
"8b4a0fbb-6705-4f8d-a7f1-ad18ecc0edbc"
id
"chatcmpl-3d432991-21f5-9847-a118-b47106a3eb31"
is_byok
false
latency
701
model_permaslug
"qwen/qwen3.5-plus-20260420"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777278180-YPNgYnQukiCvICl4CORl"
session_id
(null)
api_type
"completions"
id
"gen-1777278180-8jJRMoDwPxyRWG7Q3YlX"
upstream_id
"chatcmpl-3d432991-21f5-9847-a118-b47106a3eb31"
total_cost
0.0190792
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
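The token counts above imply that most of the completion was reasoning rather than visible prose. The Python sketch below derives a few figures from the reported fields. It assumes `latency` and `generation_time` are in milliseconds, which the page does not state but which fits the 2m 11s duration.

```python
# Values copied from the Meta fields of this run.
native_tokens_completion = 7937
native_tokens_reasoning = 7298
generation_time_ms = 119738  # assumption: milliseconds
latency_ms = 701             # assumption: milliseconds

# Completion tokens left over once reasoning tokens are subtracted.
visible_tokens = native_tokens_completion - native_tokens_reasoning

# Fraction of the completion budget spent on reasoning.
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Overall generation throughput, counting reasoning and visible tokens together.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# Time from request to end of generation under the millisecond assumption.
request_seconds = (latency_ms + generation_time_ms) / 1000

print(f"visible completion tokens: {visible_tokens}")  # 639
print(f"reasoning share: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
print(f"latency + generation: {request_seconds:.1f} s")
```

Under these assumptions, latency plus generation time comes to about two minutes. The remainder of the reported 2m 11s duration is presumably benchmark overhead such as evaluation, which the page does not break down.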
Evaluation details
Result     Evaluator                      Details                           Meta Data
6.09%      Matches word count             n/a                               neededClean: false, words: 477
22.48%     Dialogue to Total Word Ratio   Ratio: 24.54%, Deviation: 5.46%   neededClean: false, wordsTotal: 485, wordsDialogue: 119
14.2834%   Overall
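The reported ratio and deviation can be recomputed from the metadata counts. The Python sketch below takes the 30% target and the 500-word target from the scenario text. It treats the deviation as an absolute difference in percentage points, an inference that matches the reported 5.46%. How a deviation is mapped to the 22.48% and 6.09% scores is not stated on the page, so that step is left out.

```python
# Targets taken from the scenario: "Write 500 words with 30% dialogue".
TARGET_DIALOGUE_RATIO = 0.30
TARGET_WORDS = 500

# Counts reported in the evaluator metadata.
words_total = 485      # Dialogue to Total Word Ratio evaluator
words_dialogue = 119
words_counted = 477    # Matches word count evaluator (counts differently)

# Dialogue ratio and its absolute distance from the target.
ratio = words_dialogue / words_total
deviation = abs(TARGET_DIALOGUE_RATIO - ratio)

# Word-count shortfall relative to the requested length.
shortfall_words = TARGET_WORDS - words_counted
shortfall_fraction = shortfall_words / TARGET_WORDS

ratio_text = f"{ratio:.2%}"
deviation_text = f"{deviation:.2%}"
print(f"Ratio: {ratio_text}, Deviation: {deviation_text}")
print(f"Word count shortfall: {shortfall_words} words ({shortfall_fraction:.1%})")
```

The two evaluators report different totals (477 versus 485 words), which suggests they tokenize the response differently. The sketch keeps each count with its own evaluator and does not try to reconcile them.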