NC Bench
Run: 2026-04-27-01KQ6YZXMTGQ15AGJ7YDEJQBMW
Run Details
Model: Qwen 3.5 Plus (2026-04-20)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 1m 56s
Cost: $0.016132
Provider: Alibaba
Evaluation summary
Score    Evaluator                     Details Link
0.00%    Matches word count            (details)
2.82%    Dialogue to Total Word Ratio  (details)
1.4078%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Bill unfurled the crumpled map of the Johnson residence across the sticky table of the diner, pressing down a salt shaker to keep it from rolling toward the sugar packets. His eyes widened with the manic gleam usually reserved for bank vaults and diamond exchanges, though currently, they were trained on a hand-drawn diagram of an inflatable slide labeled 'The Bouncy Beast of Doom.' Molly sipped her lukewarm coffee, staring at the layout with the dead-eyed resignation of a bomb disposal expert who had just been handed a banana instead of wire cutters. The operation was simple on paper, which was exactly why it terrified her. Infiltrate the backyard fortress, neutralize the guardians, and extract the payload before the cake cutting commenced. There were three million dollars in untold riches at stake, or at least enough Lego sets and plastic ponies to fence for a weekend's worth of ramen. "The ingress point is the doggy door," Bill whispered, tapping a red arrow that bypassed the fence entirely. "I can shimmy through in four seconds if I exhale fully." Molly set her cup down. "Bill, that doggy door is for a Yorkshire terrier. You will pop out like a cork from a champagne bottle and likely take the fence posts with you." "I can compress," he insisted, demonstrating by sucking in his cheeks until he resembled a deflating balloon animal. "It is a matter of core strength and will." Molly questioned the morality of robbing a child, her voice dropping to a whisper that carried the weight of a thousand ethical dilemmas, though she knew better than to stop him now. Bill waved off the concern, claiming that targets were targets regardless of age and that little Timmy lacked the emotional maturity to mourn three identical robotic dinosaurs anyway. Bill arranged salt packets into defensive formations, outlining the patrol routes of the parents, whom he referred to as 'The Chaperone Golems.' 
According to his intelligence, the maternal unit would be guarding the juice station while the paternal figure attempted to ignite a charcoal grill that had not worked since Obama's first term. This window of distraction was crucial. Molly pulled a small notebook from her pocket, scribbling notes on wind direction and the psychological impact of piñata debris on stealth maneuvers. She knew the risks. The primary threat was not the parents; it was the chaotic variable of the guests. A brigade of sugar-fueled minors possessed a sensory detection system that could spot a sneeze from three zip codes away. One misplaced giggle or the clatter of a stolen Happy Meal box would trigger a cascade of shrieking that would bring the entire neighborhood down on their heads. "What about the party favors?" Molly asked, circling a cluster of blue packets near the exit. "If we take the loot, the kids get mad. Mobs of children are unpredictable. They have sticky hands and zero mercy." "We leave a diversion," Bill announced, producing a slightly crushed rubber duck from his pocket. "I release Mr. Quackers near the bouncy castle. The horde will follow. It is classic misdirection." "You are using a rubber duck to herd a swarm of feral kindergarteners?" Bill argued that the acoustic properties of the duck were irresistible, a defense Molly quickly dismantled by citing her experience raising a niece who would betray them all for less. Bill leaned back, satisfied with his masterstroke, confident that he had just orchestrated the greatest heist of the century. He imagined the headlines, the whispers in the criminal underworld of 'The Great Present Heist of Oak Street.' 
He failed to see the pity in Molly's eyes as she packed her notebook away, already calculating the time remaining until she could suggest they simply go to the store and purchase a single fruit basket, saving them from the inevitable humiliation of getting stuck in a doggy door while pursued by a mother wielding a bag of confetti poppers.
created_at
"2026-04-27T07:56:03.475Z"
4/27/2026, 7:56:03 AM
model
"qwen/qwen3.5-plus-20260420"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
639
moderation_latency
(null)
generation_time
101253
tokens_prompt
77
tokens_completion
7264
native_tokens_prompt
76
native_tokens_completion
6709
native_tokens_completion_images
(null)
native_tokens_reasoning
5893
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.016132
router
(null)
provider_responses
0
endpoint_id
"8b4a0fbb-6705-4f8d-a7f1-ad18ecc0edbc"
id
"chatcmpl-aa19af88-d8a3-9e56-b0bf-c14416ff2a74"
is_byok
false
latency
639
model_permaslug
"qwen/qwen3.5-plus-20260420"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777276563-u692lKs0ZpG9Izl9vWaZ"
session_id
(null)
api_type
"completions"
id
"gen-1777276563-2ahs0xJ3jAuLh1gFLh5Z"
upstream_id
"chatcmpl-aa19af88-d8a3-9e56-b0bf-c14416ff2a74"
total_cost
0.016132
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
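A few derived figures can be read off the Meta fields above. The sketch below is only illustrative: it assumes that generation_time is reported in milliseconds and that native_tokens_reasoning is a subset of native_tokens_completion, neither of which the page states explicitly. The variable names are copied from the field names; the derived names are hypothetical.

```python
# Values copied from the Meta fields of this run.
generation_time_ms = 101253        # assumption: unit is milliseconds
native_tokens_completion = 6709
native_tokens_reasoning = 5893     # assumption: counted inside the completion total
total_cost_usd = 0.016132

# Share of the completion spent on reasoning rather than visible output.
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Tokens left over for the visible response under the same assumption.
visible_tokens = native_tokens_completion - native_tokens_reasoning

# Rough generation throughput in native tokens per second.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# Cost per visible token, a crude way to see what the reasoning overhead costs.
cost_per_visible_token = total_cost_usd / visible_tokens

print(f"reasoning share:        {reasoning_share:.1%}")
print(f"visible tokens:         {visible_tokens}")
print(f"tokens per second:      {tokens_per_second:.1f}")
print(f"cost per visible token: ${cost_per_visible_token:.6f}")
```

If the assumptions hold, most of the completion budget for this run went to reasoning tokens rather than to the scene itself.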
Evaluation details
Result   Evaluator                     Details                          Meta Data
0.00%    Matches word count            n/a                              neededClean: false; words: 655
2.82%    Dialogue to Total Word Ratio  Ratio: 38.45%, Deviation: 8.45%  neededClean: false; wordsTotal: 658; wordsDialogue: 253
1.4078%  Overall
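The reported ratio and deviation can be reproduced from the word counts above. The following is a minimal sketch, not the benchmark's evaluator: the function names are hypothetical, dialogue is assumed to be any text between straight double quotes, and words are assumed to be whitespace-separated tokens. The mapping from deviation to the 2.82% score is not shown on the page, so it is left out.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Fraction of words that fall inside straight double quotes."""
    quoted = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(count_words(q) for q in quoted)
    total_words = count_words(text)
    return dialogue_words / total_words if total_words else 0.0


# Figures reported for this run.
words_dialogue = 253
words_total = 658
target_ratio = 0.30  # the scenario asks for 30% dialogue

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Small usage example for the counting helper on a made-up sentence.
sample = 'Bill nodded. "We go in at noon," he said.'
print(f"Sample ratio: {dialogue_ratio(sample):.2%}")
```

The overall figure of 1.4078% is consistent with a plain average of the two evaluator scores before rounding, though the page does not say how it is aggregated.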