NC Bench
Run: 2026-04-27-01KQ7A23GGTDEV5MSBTZ7SGDM8
Run Details
Model: Qwen 3.6 Flash
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 1m 18s
Cost: $0.026169
Provider: Alibaba
Evaluation summary
Score | Evaluator | Details Link
97.63% | Matches word count | (details)
99.87% | Dialogue to Total Word Ratio | (details)
98.7483% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted the strap of her utility belt, which currently held nothing but a half-eaten granola bar and a spare roll of industrial adhesive tape. The inflatable unicorn bouncing across the damp backyard provided the only meaningful lighting as they crouched behind a meticulously trimmed hedge. Bill checked his digital watch, then reviewed the tactical schematic he had hastily drawn on a stained cocktail napkin. He looked thoroughly unprepared for any operation involving plastic bouncy castles, aggressive patio furniture, and lawn gnomes. Molly tapped her reinforced boot against the wet grass, patiently waiting for him to stop sweating through his bright neon green visor. The plastic headband was slowly melting into his forehead. She sighed, recalibrating her expectations. Targets who did not wear inflatable dinosaur costumes usually did not require tactical napkins. The billionaire son had apparently received every single electronic toy ever manufactured by major manufacturers worldwide. The payout for clearing the entire driveway was substantial enough to fund her retirement on a quiet Mediterranean island. Still, something about the premise grated heavily on her professional sensibilities. She had never stolen from a children’s birthday party before, and the sheer volume of wrapping paper made her hands itch. The heavy dew soaked through her tactical boots. A distant lawnmower droned on the adjacent street. She watched Bill fumble with a heavy zipper, his knuckles white against the dark fabric. The evening air smelled of cut flowers and charcoal. She knew the protocol cold. She had executed dozens of similar operations across the tri-state area. This mission felt distinctly different. The targets were smaller. The timeline was compressed. She took a steady breath and prepared to step forward. She adjusted her tactical gloves and nodded toward the side door. The perimeter was clear. The house was secured. 
They just needed to move before the cake cutting began and attracted too much attention. The speaker system began testing its volume levels through rapid static clicks. She checked her concealed pockets one last time. The plan was solid. Wind shifted left. He nodded back. "The guard schedule is tight," Bill whispered. "Adults gather near the punch bowl at six thirty. They ignore the kids until cake cuts. That gives us forty minutes to secure the loot and exfiltrate before the piñata breaks." "Forty minutes is an eternity. Have you accounted for the toddler blockade?" "I brought a bag of glitter and a recording of a crying baby. They will freeze in confusion." "Please tell me that is a joke." "I am serious. I tested it on my neighbor's hamster. It worked completely." "What if they catch us before deploying?" "Then we use the backup plan." "Which involves?" "I pretend to be an entropy inspector. I declare the yard compromised by joy. We carry the presents out on a dolly under municipal regulations." "Fine. But if we get caught, I blame you. You carry the inflatable dragon." "Understood. Now let us move carefully." "Cake melting now."
created_at
"2026-04-27T11:09:14.648Z"
4/27/2026, 11:09:14 AM
model
"qwen/qwen3.6-flash"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
631
moderation_latency
(null)
generation_time
77615
tokens_prompt
77
tokens_completion
12935
native_tokens_prompt
76
native_tokens_completion
17433
native_tokens_completion_images
(null)
native_tokens_reasoning
16814
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0261685
router
(null)
provider_responses
0
endpoint_id
"2435bec9-c808-4f27-9fc8-12f3460d7b66"
id
"chatcmpl-a6216c46-099c-93cc-b7f7-4fa806905e21"
is_byok
false
latency
631
model_permaslug
"qwen/qwen3.6-flash"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777288154-ZsVCRcQbsVxavJx2DDwX"
session_id
(null)
api_type
"completions"
id
"gen-1777288154-J859Ge2uMxA5eU0V8ZCh"
upstream_id
"chatcmpl-a6216c46-099c-93cc-b7f7-4fa806905e21"
total_cost
0.0261685
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
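The token and timing fields above can be combined into a few derived figures. The following is a minimal sketch, not part of the benchmark itself. It assumes that generation_time is reported in milliseconds (which is consistent with the 1m 18s duration shown in the run details) and that native_tokens_reasoning is a subset of native_tokens_completion. The function name run_metrics is hypothetical.

```python
# Values copied from the Meta fields above.
GENERATION_TIME_MS = 77615          # assumed to be milliseconds
NATIVE_TOKENS_COMPLETION = 17433
NATIVE_TOKENS_REASONING = 16814     # assumed to be counted inside the completion total


def run_metrics(generation_time_ms: int, completion: int, reasoning: int) -> dict:
    """Derive throughput and reasoning share from the raw counters."""
    seconds = generation_time_ms / 1000
    return {
        # Native completion tokens produced per second of generation time.
        "tokens_per_second": completion / seconds,
        # Fraction of the completion tokens spent on reasoning.
        "reasoning_share": reasoning / completion,
        # Completion tokens left over for the visible answer.
        "visible_tokens": completion - reasoning,
    }


metrics = run_metrics(
    GENERATION_TIME_MS, NATIVE_TOKENS_COMPLETION, NATIVE_TOKENS_REASONING
)
for name, value in metrics.items():
    print(f"{name}: {value:.4g}")
```

Under these assumptions, only 619 of the 17433 native completion tokens belong to the visible scene, and the rest are reasoning tokens.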
Evaluation details
Result | Evaluator | Details | Meta Data
97.63% | Matches word count | n/a | neededClean: false; words: 493
99.87% | Dialogue to Total Word Ratio | Ratio: 30.16%, Deviation: 0.16% | neededClean: false; wordsTotal: 494; wordsDialogue: 149
98.7483% | Overall
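The reported ratio and deviation follow from the word counts above: 149 dialogue words out of 494 total is about 30.16%, which is 0.16 percentage points from the 30% target. The sketch below shows one way such counts might be produced. It is an assumption, not the benchmark's actual evaluator: it splits on whitespace and treats anything between double quotes as dialogue. The function names dialogue_ratio and deviation_points are hypothetical, and how the evaluators turn a deviation into a percentage score is not stated in this report, so it is not modeled here.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a prose passage.

    Assumption: words are whitespace-separated and dialogue is any text
    enclosed in double quotes (curly quotes are normalized first).
    """
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_points(ratio: float, target: float = 0.30) -> float:
    """Absolute distance from the target ratio, in percentage points."""
    return abs(ratio - target) * 100


# Reproduce the reported figures from the counts in the table above.
reported_ratio = 149 / 494
print(f"Ratio: {reported_ratio:.2%}")
print(f"Deviation: {deviation_points(reported_ratio):.2f} points")
```

The two evaluators report slightly different totals for the same response (493 words and 494 words), so the exact counting rules matter, and a simple whitespace split like this one may not match either figure exactly.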