NC Bench
Run: 2026-04-27-01KQ747HGDQ5NFG0YFQK8KF8JV
Run Details
Model:
Qwen 3.6 27B
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
1m 39s
Cost:
$0.020378
Provider:
Alibaba
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Rain drummed against the corrugated metal roof of the storage unit where Molly and Bill studied a hand-drawn map of a suburban driveway. The target was a sprawling backyard gathering packed with folding tables, plastic chairs, and enough wrapping paper to sustain a small paper mill. Molly tapped a red marker against the blueprint while Bill polished a pair of lockpicks with his sleeve. We stick to the extraction window, Molly said, pointing at a circled timestamp. Gift opening peaks at five thirty. The parents will be too busy cutting the cake and ignoring their phones to notice us. We have exactly ninety seconds before the drummer boy starts playing the same three songs on a loop. Ninety seconds feels tight for forty-two boxes, Bill muttered, adjusting his cap. Do you realize the logistical nightmare? We are talking about aggressive glitter, balloon animals with helium leaks, and at least three boxes of those talking dolls that wake up the entire neighborhood. Glitter is basically radioactive. I still find it in my coffee maker. Then we bypass the glitter, she replied, uncapping a second marker. Focus on the heavy cardboard. LEGO sets, board games, those ridiculous plush toys that record your voice and play it back at three in the morning. We grab, we load the van, we vanish. No heroics. No impromptu dancing. What if a toddler spots us? Bill lowered his voice. Do we deploy the emergency juice boxes? We rehearsed the enthusiastic uncle routine, but I still feel mildly traumatized by it. The way I had to clap and whistle while pretending to care about a plastic dinosaur was deeply compromising. We will not be an enthusiastic uncle, Molly emphasized, drawing a route through the garden hose maze. We will be a shadow. You take the east table near the fruit punch dispenser. I will clear the west stack beside the balloon arch. When the cake knife hits the frosting, we move. On three. The cake knife hits the frosting way too early, Bill pointed out. Parents always do it. 
They ignore the rules. They just want photos of the kid holding a spoonful of frosting like he conquered Rome. Then we wait for the second slice, she conceded, capping the marker. And please do not step on the sprinkler path. It is covered in crushed cookies. Cookies are nature anti-slip coating. I am not getting carted away because you tried to moonwalk over a butter biscuit. Understood, Bill said, sliding a canvas duffel over his shoulder. I packed the extra zip ties and the noise-canceling earmuffs. You know the toddler section tends to harmonize in minor keys when they get overstimulated. Perfect, Molly said, checking the tension on her grappling line, which was inexplicably woven from thick jump rope. Remember, this is not about force. It is about precision. We are not cracking vaults. We are infiltrating a seven-year-old's birthday. The security perimeter consists of a golden retriever and a very judgmental aunt holding a clipboard. The dog already knows my scent, Bill warned gently. He stole my left loafer last Tuesday. I still have a paper bag on my foot. Lure him with peanut butter, she said, slinging the rope over her shoulder. Stay low, move quiet, and do not open a single present. If I catch you peering inside the foil box, I will personally hand you to the piñata operator. Noted, Bill said, stepping toward the rusted door. For the record, it is a mathematically sound operation. High reward, minimal resistance, and excellent loot-to-weight distribution. If we get caught, we can always claim we were municipal workers disposing of excessive consumer waste. Exactly, Molly said, flipping the lock with a smooth flick of her wrist. Now let us go liberate those toys before the parents remember they actually paid for the entertainment.
created_at
"2026-04-27T09:27:21.365Z"
4/27/2026, 9:27:21 AM
model
"qwen/qwen3.6-27b-20260422"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
635
moderation_latency
(null)
generation_time
87802
tokens_prompt
77
tokens_completion
6189
native_tokens_prompt
76
native_tokens_completion
5648
native_tokens_completion_images
(null)
native_tokens_reasoning
4835
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0203784
router
(null)
provider_responses
0
endpoint_id
"826cab4e-dc26-4472-83be-e4ddd5ebd2ab"
is_byok
false
latency
789
model_permaslug
"qwen/qwen3.6-27b-20260422"
provider_name
"Morph"
status
(null)
1
endpoint_id
"82097366-3a73-4790-ac36-f96ba9f6f577"
id
"chatcmpl-f99ef69c-0b99-9298-bb85-3d5940e4adf1"
is_byok
false
latency
635
model_permaslug
"qwen/qwen3.6-27b-20260422"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777282041-jOCw5vEkcsZRjLh78r2a"
session_id
(null)
api_type
"completions"
id
"gen-1777282041-yPDdzrEh5tqfE0x6E4En"
upstream_id
"chatcmpl-f99ef69c-0b99-9298-bb85-3d5940e4adf1"
total_cost
0.0203784
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
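The token and timing fields above can be combined into a few derived figures. The sketch below is illustrative only: it assumes that generation_time is reported in milliseconds and that native_tokens_completion includes the reasoning tokens, neither of which this page states. The function name summarize_generation is hypothetical.

```python
def summarize_generation(completion_tokens: int,
                         reasoning_tokens: int,
                         generation_time_ms: int) -> dict:
    """Derive simple statistics from the run metadata.

    Assumptions (not confirmed by the report):
    - generation_time_ms is in milliseconds
    - completion_tokens already includes reasoning_tokens
    """
    visible_tokens = completion_tokens - reasoning_tokens
    return {
        # Tokens left over for the visible answer after reasoning.
        "visible_tokens": visible_tokens,
        # Fraction of the completion spent on reasoning.
        "reasoning_share": reasoning_tokens / completion_tokens,
        # Overall generation throughput.
        "tokens_per_second": completion_tokens / (generation_time_ms / 1000),
    }


# Values copied from native_tokens_completion, native_tokens_reasoning
# and generation_time in the metadata above.
stats = summarize_generation(5648, 4835, 87802)
print(stats)
```

Under these assumptions, most of the completion budget went to reasoning rather than to the scene itself.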
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean false, words 638
Dialogue to Total Word Ratio
Result: 0.00%
Details: ratio 0.00%, deviation 70.00%
Meta Data: neededClean false, wordsTotal 645, wordsDialogue 0
0.0000%
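The benchmark's evaluator code is not shown on this page. As a rough illustration of how a dialogue-to-total word ratio could be measured, the sketch below assumes that dialogue means text enclosed in straight or curly double quotes and that words are whitespace-separated tokens. The names dialogue_ratio and deviation_points are hypothetical. Under that assumption, a response written without quotation marks, like the one above, would register zero dialogue words.

```python
import re

# Assumption: dialogue is any span inside straight ("...") or curly
# (\u201c...\u201d) double quotes. The real evaluator may differ.
DIALOGUE_PATTERN = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Return the fraction of words that fall inside quoted spans."""
    total = count_words(text)
    if total == 0:
        return 0.0
    # findall yields (straight, curly) tuples; one side is always empty.
    spoken = sum(count_words(straight or curly)
                 for straight, curly in DIALOGUE_PATTERN.findall(text))
    return spoken / total


def deviation_points(ratio: float, target: float = 0.70) -> float:
    """Absolute gap from the target ratio, in percentage points."""
    return abs(ratio - target) * 100


unquoted = "We stick to the extraction window, Molly said."
quoted = '"Go now," she said.'
print(dialogue_ratio(unquoted), dialogue_ratio(quoted))
```

A quote-based rule like this cannot credit unmarked speech, which is consistent with the reported wordsDialogue of 0 and the 70.00% deviation.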