NC Bench
Run: 2026-04-27-01KQ78CJDAHMDSHXW8XVVEQYKJ
Run Details
Model: Qwen 3.6 Flash
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 29.8s
Cost: $0.009168
Provider: Alibaba
Evaluation summary
Score | Evaluator
6.09% | Matches word count
0.00% | Dialogue to Total Word Ratio
3.0453% | Overall
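The overall figure is consistent with an unweighted mean of the two evaluator scores: doubling 3.0453% gives 6.0906%, which rounds to the 6.09% shown for the word-count evaluator. The sketch below checks that arithmetic. The averaging rule is inferred from the displayed numbers rather than stated on the page, and `overall_score` is a hypothetical helper name.

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    if not scores:
        raise ValueError("need at least one evaluator score")
    return sum(scores) / len(scores)

# Displayed values: overall 3.0453%, dialogue-ratio score 0.00%.
# If the mean rule holds, the unrounded word-count score is 2 * 3.0453.
word_count_score = 2 * 3.0453  # 6.0906, shown on the page as 6.09%
dialogue_ratio_score = 0.0

overall = overall_score([word_count_score, dialogue_ratio_score])
print(f"{overall:.4f}%")  # 3.0453%
```

Under this reading, a zero on either evaluator caps the overall score at half of the other evaluator's score.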
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The rain drummed against the alley dumpster while Bill adjusted his neon yellow visor. Molly tapped her stopwatch against her thigh. Are you wearing a birthday mascot suit to a covert infiltration? It is camouflage. Every child loves a giant rubber chicken. They instinctively lower their guard. Also, the inventory is terrible. The left wing chafes. Right. So the plan is still viable. I need the timeline. Molly flipped her waterproof notebook open. We breach at seven fifteen. The parents will be distracted by the inflatable bouncy castle. The structural integrity will fail exactly four minutes before cake. That is our window. Four minutes is tight. The presents are clustered by the fireplace. Wrapped in festive paper the kids will tear through like wolves. We need the grab bags sorted. Already handled. I requisitioned those oversized reusable totes from the office supply depot. They look like they belong to a yoga instructor. Very trustworthy. We move in pairs. You take the left flank near the gift table. I will secure the right against parental interference. What is the distraction protocol? Kids are notoriously observant. They also throw juice boxes with terrifying accuracy. Classic misdirection. I will deploy the novelty glow sticks and a recorded trumpet fanfare. Nothing says celebrate like a deafening audio cue that drowns out tactical communication. Brilliant. I will use that to bag the larger items. What about piñatas? Irrelevant. We are not here for sugar based percussion targets. We are here for the premium electronics and the limited edition plush dinosaurs. The budget allows for twelve crates maximum. Twelve crates with twenty four assorted toys each? That is nearly three hundred units. The escape route requires a wide enough van. The van is parked two blocks over. I rented it under the name Mrs. Gable. Very sweet lady. No one questions Mrs. Gable. The kids will think we are charity workers distributing leftovers. Leftovers? 
We are stealing presents at a birthday party. Semantics. We are redistributing joy. Also, it is a felony if we take everything, but a charitable act if we pretend to clean up. We move out at seven ten. I am wearing the chicken suit for psychological advantage. Fine. But if a toddler grabs your tail, you drop the totes and vanish. Do not engage with the snack table. We have seen how that ends. I respect the rules. Just remember, the key to a perfect job is leaving no evidence, minimal noise, and absolutely no high fives. Understood. Seven oh five. Time to suit up. You first. I will guard the dumpster. It is the only reliable landmark in this neighborhood. And Molly? If you see the giant cookie, do not eat the cookie. We need your reflexes intact. I would not dream of it. Molly pushed off the wall. Let us go steal some birthdays. Let it be known. Operation Plush Dino commences in sixty seconds. Stay sharp, stay quiet, and for the love of craft supplies, watch your step. Do you have the wax paper ready for the slip bags? I have three rolls and an industrial staple gun. The heist is ready.
created_at
"2026-04-27T10:40:00.434Z"
4/27/2026, 10:40:00 AM
model
"qwen/qwen3.6-flash"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
621
moderation_latency
(null)
generation_time
29726
tokens_prompt
77
tokens_completion
6727
native_tokens_prompt
76
native_tokens_completion
6099
native_tokens_completion_images
(null)
native_tokens_reasoning
5423
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0091675
router
(null)
provider_responses
0
endpoint_id
"2435bec9-c808-4f27-9fc8-12f3460d7b66"
id
"chatcmpl-0e08bb8d-6358-9454-ba0c-26ab0a6b0a1c"
is_byok
false
latency
621
model_permaslug
"qwen/qwen3.6-flash"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777286400-blGlVqo4IgJBhffYQRYP"
session_id
(null)
api_type
"completions"
id
"gen-1777286400-O6Nw0iaTVPT7o3h2WxhX"
upstream_id
"chatcmpl-0e08bb8d-6358-9454-ba0c-26ab0a6b0a1c"
total_cost
0.0091675
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
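The token counts above can be broken down to show how much of the completion was reasoning rather than visible text. This is a minimal sketch under two assumptions that the page does not confirm: `generation_time` is in milliseconds (29726 would then sit close to the 29.8s duration reported for the run), and `native_tokens_reasoning` is counted inside `native_tokens_completion`. The helper name `completion_breakdown` is hypothetical.

```python
# Values copied from the metadata listing above.
meta = {
    "generation_time": 29726,          # assumed to be milliseconds
    "native_tokens_completion": 6099,
    "native_tokens_reasoning": 5423,   # assumed to be a subset of completion tokens
}

def completion_breakdown(meta):
    """Split completion tokens into reasoning and visible parts, plus throughput."""
    total = meta["native_tokens_completion"]
    reasoning = meta["native_tokens_reasoning"]
    seconds = meta["generation_time"] / 1000
    return {
        "visible_tokens": total - reasoning,
        "reasoning_share": reasoning / total,
        "tokens_per_second": total / seconds,
    }

stats = completion_breakdown(meta)
print(stats["visible_tokens"])  # 676
print(f"{stats['reasoning_share']:.1%}")
print(f"{stats['tokens_per_second']:.1f}")
```

If those assumptions hold, most of the billed completion tokens in this run were spent on reasoning, with 676 native tokens left for the 523-word scene.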
Evaluation details
Result | Evaluator | Details | Meta Data
6.09% | Matches word count | n/a | neededClean: false, words: 523
0.00% | Dialogue to Total Word Ratio | Ratio: 0.00%, Deviation: 70.00% | neededClean: false, wordsTotal: 523, wordsDialogue: 0
3.0453% | Overall
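A wordsDialogue value of 0 against 523 total words is what a quote-based counter would report for this response, since the model wrote the exchange without any quotation marks. The sketch below illustrates that behavior. It assumes dialogue is defined as text inside straight or curly double quotes; the evaluator's actual rule is not shown on the page, and `dialogue_ratio` and `QUOTED` are hypothetical names.

```python
import re

# Assumption: dialogue is text enclosed in straight or curly double quotes.
QUOTED = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')

def count_words(text):
    """Count whitespace-separated words."""
    return len(text.split())

def dialogue_ratio(text):
    """Fraction of words that fall inside quoted spans."""
    total = count_words(text)
    if total == 0:
        return 0.0
    dialogue = sum(count_words(a or b) for a, b in QUOTED.findall(text))
    return dialogue / total

unquoted = "Molly tapped her stopwatch. Four minutes is tight."
quoted = 'Molly tapped her stopwatch. "Four minutes is tight."'

print(dialogue_ratio(unquoted))  # 0.0
print(dialogue_ratio(quoted))    # 0.5

# Deviation from the 70% target requested in the prompt.
target = 0.70
print(f"{abs(target - dialogue_ratio(unquoted)):.2%}")  # 70.00%
```

With no quoted spans, the ratio is 0 and the deviation from the target equals the full 70%, matching the Details column above.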