NC Bench
Run: 2026-04-27-01KQ76WMSCYTB02057K60XDSM3
Run Details
Model: Qwen 3.6 Flash
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 47.5s
Cost: $0.014601
Provider: Alibaba
Evaluation summary
Score     Evaluator
0.00%     Matches word count
0.00%     Dialogue to Total Word Ratio
0.0000%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her spectacles and tapped the laminated map against the oak table. Check the blueprints, Bill. I want every corner mapped. Bill stared at the glossy sheet for a long moment. I’m looking at it, Molly. It’s a bouncy castle. There’s no corner. It’s a triangle of plastic. She smirked and smoothed her tactical jacket. Perfect. The kid’s party starts at three. We slip in at two forty-five. What’s our primary objective? He unrolled a heavy canvas duffel bag on the rug. Presents. Under the tree. Wrapped in paper that definitely belongs to someone else. Molly checked her wrist gadgets and tapped the screen. Exactly. We need a distraction. The piñata? Bill leaned against the wall, studying the floorboards carefully. Too loud. The parents will investigate. We need something that keeps eyes forward, not backward. The magician. She glanced at her watch, noting the second hand sweeping downward. The clown with the unicycle? He pulled oversized gloves from a drawer. The very same. He pulls rabbits out of hats. We pull gift bags out of baskets. I’ll handle the sleight of hand. You secure the extraction route. Molly paced the rug, counting steps between the furniture. Which is the hedge? The one with the sprinklers? He flipped a silver key over his knuckles. Already disabled. I greased the mechanism with three tablespoons of butter. She raised an eyebrow and checked her belt clips. Brilliant. What about security? Bill pointed toward the ceiling. Nanny Cam. Top shelf, left of the fireplace. I’ll loop feeds using a modified tablet and a strong napkin. Molly snorted and adjusted her goggles. You taped a screen to a paper napkin? He snapped the duffel shut with a decisive clack. It’s called creative engineering, Molly. I learned it in the Academy of Mischief. Now, timing. When does the cake come out? She checked her phone, reading the notification. Exactly at three-oh-two. Right after the singing. He grinned and reached for his backpack, checking the zippers once. 
That’s our window. Three-oh-three. I drop the handkerchief of sorrow. You sprint. We fill the bag. We are out before the first note fades. She hesitated, twisting her gloves tightly between her fingers. What if they do the gift-opening ceremony early? He walked to the door and checked the lock. They won’t. The mother is overly schedule-driven. I watched her calendar. Pink means cake. Blue means presents. Red means panic. We hit during the transition. She followed him to the hallway. You studied a party schedule in hexadecimal? He turned the handle and pushed the door open. It’s hexagonal. Like honeycomb. Like wrapping paper patterns. Does it matter? The point is, we move when the snacks arrive. Distraction is guaranteed. She stepped onto the porch and felt the warm sunlight. I still worry about the toddlers. They’re unpredictable. He adjusted his straps and looked back over his shoulder. They’re our greatest asset. They scream at shadows and run in circles. If we trip a wire, the toddlers chase the laser pointer. She nodded and pulled a small toolkit from her pocket. Right. The laser pointer is set to red? He walked toward the driveway, scanning the windows. Obviously. I calibrated it to a startled hamster. It’ll draw every parent toward the shed. She quickened her pace and matched his stride. And the dog? The aggressive terrier? He unlocked the car and tossed a bag of snacks in the back. Already bribed with cheese. It will be asleep by the time we lift the wrapping paper. She opened the passenger door and slid inside. What’s our cut? He inserted the key and turned the ignition. Fifty-fifty split. She buckled her seatbelt and tapped the dashboard. I’ll settle for the crayons. We split the rest. He put the car in reverse and checked the mirrors. Fair. Keep it quiet. No whistles. If caught, we’re just enthusiastic volunteers. She rested her head against the window and sighed. Just volunteers helping a child unwrap a bicycle. He glanced at her and adjusted his collar. 
Exactly. Wipe your boots. He checked the side mirror. Yes? She tapped her soles against the floor mats. Do it, Bill. He put the car in drive and checked the road. Got it. She reached for the ignition switch. We’re walking. The minivan is still there.
created_at: "2026-04-27T10:13:50.002Z" (4/27/2026, 10:13:50 AM)
model: "qwen/qwen3.6-flash"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 646
moderation_latency: (null)
generation_time: 47439
tokens_prompt: 77
tokens_completion: 9686
native_tokens_prompt: 76
native_tokens_completion: 9721
native_tokens_completion_images: (null)
native_tokens_reasoning: 8742
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0146005
router: (null)
provider_responses:
  0:
    endpoint_id: "2435bec9-c808-4f27-9fc8-12f3460d7b66"
    id: "chatcmpl-43cb239f-4f78-9b34-920f-953c2ffc52d7"
    is_byok: false
    latency: 646
    model_permaslug: "qwen/qwen3.6-flash"
    provider_name: "Alibaba"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777284830-PXlziiorJEVzta2Yjtxg"
session_id: (null)
api_type: "completions"
id: "gen-1777284830-KQZmfku0GLNN16atmfk1"
upstream_id: "chatcmpl-43cb239f-4f78-9b34-920f-953c2ffc52d7"
total_cost: 0.0146005
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
response_cache_source_id: (null)
Evaluation details
Result: 0.00%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false, words: 717

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 0.00%, Deviation: 70.00%
Meta Data: neededClean: false, wordsTotal: 725, wordsDialogue: 0

Overall: 0.0000%
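The page does not show how the Dialogue to Total Word Ratio evaluator is implemented. The sketch below is only one plausible reading, under the assumption that dialogue means words enclosed in double quotation marks (straight or curly) and that the deviation is the absolute gap between the measured ratio and the 70% target. The names `dialogue_ratio` and `deviation` are hypothetical, and the whitespace-based word count is also an assumption (the two evaluators above report 717 and 725 words for the same response, so the real tokenization evidently differs between them). Under this reading, a response with no quotation marks, such as the one in this run, yields a ratio of 0 and a deviation of 70%, which matches the reported details.

```python
import re

# Assumption: dialogue is any text between straight ("...") or curly (\u201c...\u201d) double quotes.
QUOTED = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def count_words(text: str) -> int:
    """Count whitespace-separated words (a simplification, not the site's tokenizer)."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Return the fraction of words that fall inside double quotation marks."""
    total = count_words(text)
    if total == 0:
        return 0.0
    # Each match is a tuple with one non-empty group, depending on the quote style.
    spoken = sum(count_words(a or b) for a, b in QUOTED.findall(text))
    return spoken / total


def deviation(ratio: float, target: float = 0.70) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(target - ratio)


if __name__ == "__main__":
    unquoted = "Molly tapped the map. Check the blueprints, Bill."
    quoted = 'Molly tapped the map. "Check the blueprints, Bill."'
    for sample in (unquoted, quoted):
        r = dialogue_ratio(sample)
        print(f"ratio={r:.2%} deviation={deviation(r):.2%}")
```

Running the two samples shows why unquoted speech scores as zero dialogue in this sketch: the same sentence counts toward the ratio only once it is wrapped in quotation marks.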