NC Bench
Run: 2026-02-25-01KJBFN1RN84TG3DY6WCTZSKF3
Run Details
Model: Qwen 3.5 Flash
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 47.4s
Cost: $0.003015
Provider: Alibaba
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.64%
Overall: 0.3195%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The humidity pressed down hard on the backyard like a heavy wet blanket. Molly adjusted the strap on her worn leather satchel carefully. Bill checked his antique pocket watch again. They stood frozen near the deep shade of the old oak tree. "Are we sure about this?" Bill asked. "It seems undignified. This is not a bank. It is not a vault." Molly rolled her eyes. She checked the shadows under the tree. "We are stealing presents, Bill. Dignity went out the door when we wore matching jumpsuits." "These are costumes for a specific theme party." "It is a clowns theme. Do you want to be a clown while I steal the new Wii console?" "I prefer not to be associated with circus animals." "Shut up and hold the bucket." He held the canvas bucket like it was sacred. A loud laugh erupted from the crowded front porch where the adults stood. They watched the cake closely. "The parents are distracted." Molly whispered. "The cake is out in two minutes. That is our window." "When exactly do we strike?" Bill asked. "We must have an exit strategy." "We strike when the lights go out." "Why? What is the value of darkness?" "It feels cooler. They won't see us moving." "Logic is flawed. Children blame candles." "It is a party. Nothing makes sense." "What is the escape route?" Bill asked. "We need a plan." "Treehouse." "Three-story drop? That is suicide." "With the slide. Safety feature." "Features fail. We need a contingency." "Use the slide. Simple." "If jammed?" "We won't be stuck." "If we do?" Molly looked at him. "We improvise. Climb fast?" "I have a knee." "Stand up. It works." "I hate cardio." "Roll the bags." "Full of plush." "Cash is heavier." "Where is cash?" "Under the tree." "No cash. Only ornaments." "Trust me." "Hallucinating. Leave." "Coming?" "Calculating risk." "Stalling." Bill stood up slowly. He brushed the dirt off his pants. "Move." They crept under the inflatable dinosaur. The plastic flapped loudly in the wind. They held their breath. "Shhh," Molly hissed. "Sorry. 
Belt buckle." "Walk." They reached the gift station. The boy blew candles. Dark instantly. "Go," Molly said. "Now?" "Now." They grabbed gifts. They ran to the slide. The night swallowed them whole. They were safe.
created_at: "2026-02-25T22:45:26.938Z" (2/25/2026, 10:45:26 PM)
model: "qwen/qwen3.5-flash-20260224"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 401
moderation_latency: (null)
generation_time: 47264
tokens_prompt: 77
tokens_completion: 7701
native_tokens_prompt: 76
native_tokens_completion: 7519
native_tokens_completion_images: (null)
native_tokens_reasoning: 6939
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0030152
router: (null)
provider_responses:
  0:
    endpoint_id: "5f6b71b5-8672-4b27-8b2d-f674429f9e63"
    id: "chatcmpl-f98ac1d1-e53c-9742-934d-e05245e08abe"
    is_byok: false
    latency: 401
    model_permaslug: "qwen/qwen3.5-flash-20260224"
    provider_name: "Alibaba"
    status: 200
api_type: "completions"
id: "gen-1772059526-QSRqEtvYbxrFU02wa4ON"
upstream_id: "chatcmpl-f98ac1d1-e53c-9742-934d-e05245e08abe"
total_cost: 0.0030152
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
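The token counts in this metadata suggest that most of the completion went to reasoning rather than to the visible scene. The following is a minimal sketch of that arithmetic, using only the native_tokens_completion and native_tokens_reasoning values reported for this run. It assumes that reasoning tokens are included in the completion total, which the page does not state explicitly; the function name reasoning_share is hypothetical.

```python
# Values copied from the run metadata.
native_tokens_completion = 7519
native_tokens_reasoning = 6939


def reasoning_share(completion_tokens: int, reasoning_tokens: int) -> float:
    """Return the fraction of completion tokens spent on reasoning."""
    if completion_tokens <= 0:
        raise ValueError("completion_tokens must be positive")
    return reasoning_tokens / completion_tokens


# Assumption: visible (non-reasoning) tokens = completion - reasoning.
visible_tokens = native_tokens_completion - native_tokens_reasoning
share = reasoning_share(native_tokens_completion, native_tokens_reasoning)

print(f"visible tokens: {visible_tokens}")
print(f"reasoning share: {share:.2%}")
```

Under that assumption, roughly 580 tokens remain for the scene itself, which may help explain why the response falls short of the requested 500 words.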
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 377
Dialogue to Total Word Ratio
Result: 0.64%
Details: Ratio: 60.05%, Deviation: 10.05%
Meta Data: neededClean: false, wordsTotal: 378, wordsDialogue: 227
Overall: 0.3195%
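The reported ratio and deviation can be checked directly from the wordsDialogue and wordsTotal counts. The sketch below reproduces that arithmetic, assuming the deviation is the absolute difference between the measured ratio and the 50% target named in the scenario. How the evaluator turns that deviation into the 0.64% score is not shown on this page, so the sketch does not attempt it. The names TARGET_RATIO and dialogue_ratio are hypothetical.

```python
# Counts copied from the "Dialogue to Total Word Ratio" meta data.
words_dialogue = 227
words_total = 378

# Assumption: the target is the 50% dialogue share from the scenario.
TARGET_RATIO = 0.50


def dialogue_ratio(dialogue_words: int, total_words: int) -> float:
    """Return the share of words that belong to dialogue."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return dialogue_words / total_words


ratio = dialogue_ratio(words_dialogue, words_total)
deviation = abs(ratio - TARGET_RATIO)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

The overall 0.3195% is consistent with a plain average of the two evaluator scores if the dialogue score is about 0.639% before rounding, though that is an inference from the displayed numbers and not a documented rule.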