NC Bench
Run: 2026-02-25-01KJBJJCWW21FPYRSQA9J15T71
Run Details
Model: Qwen 3.5 Flash
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 1m 34s
Cost: $0.004206
Provider: Alibaba
Evaluation summary
Matches word count: 20.19%
Dialogue to Total Word Ratio: 0.00%
Overall: 10.0948%
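The page does not say how the Overall figure is derived, but 10.0948% is exactly half of a word-count score of 20.1896% (shown rounded as 20.19%) plus a dialogue score of 0.00%. That suggests an unweighted mean of the two evaluators. The sketch below assumes that rule; the 20.1896 input is inferred from the reported numbers rather than reported directly, and `overall_score` is a hypothetical name.

```python
def overall_score(evaluator_scores: list[float]) -> float:
    """Unweighted mean of per-evaluator scores, in percent.

    Assumption: NC Bench combines evaluators this way; the page does not
    state the formula, but the reported numbers are consistent with it.
    """
    if not evaluator_scores:
        raise ValueError("need at least one evaluator score")
    return sum(evaluator_scores) / len(evaluator_scores)


# Inferred unrounded word-count score, then the dialogue-ratio score.
print(f"{overall_score([20.1896, 0.0]):.4f}%")  # prints 10.0948%
```

If the benchmark instead weighted the evaluators, the two inputs here would not reproduce the displayed Overall value, so the equal-weight mean is the simplest reading that fits.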
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Bill wiped the rain from his glasses while staring at the house below. It was glowing softly like a giant candy wrapper in the dark sky. The music drifted up from the yard loudly. It sounded like children singing about happiness and joy and candy. Molly checked her watch and sighed deeply. She shook her head at Bill slowly. The rain fell even harder. They needed to move quickly. The bag was heavy. The gloves were red. The plan was simple. The house was secure. Are you sure we should take this risk at all, Bill? It feels very dangerous for us. I am not sure, but we cannot wait forever. They might move the gifts to a safe place. This is too much pressure on us, honestly. We are just stealing toys for fun. We are professionals. We do this every night to earn money. Not every night. Sometimes every night. The neighbors will see us. They will call the police right now. The neighbors are sleeping. They are not awake. They have guards. Security is weak here. We need a ladder to climb. The fence is low. Molly nodded. She looked at the back door. It was open now. The lock was broken. Are you coming in or staying out here with me? Molly asked. I am coming in. Give me a hand with this bag. Here is the bag. Watch your step on the floor. I always watch my step. I am very careful. You are clumsy sometimes. I am not clumsy. They moved inside the house. The floor was carpeted. It was soft and quiet. The lights were low. How many bags do we need to fill eventually? Bill asked. We need three bags total. Not more than that. Can we fit them all inside? Yes. There are many presents. What if the parents wake up? They will not wake up. We are stealing from kids. It is just business for us. It feels a bit wrong. It is our job. They filled the first bag. It was very heavy. The toys rattled. This is heavy. Bill said. It is full already. I think I have enough. Finish it. They moved to the second bag. It was lighter. The toys were soft. Is this a doll? It is a doll. Put it in. Okay. 
They moved to the third bag. It was full. This is last. Yes. Are we done now? Yes. Let us go. Okay. They got in the car. The engine started. Did we get it all? Yes. I am hungry. Go home. Yes. The rain stopped. The road was wet. Where do we go next time? To the mall. Why the mall? Because the gifts are there. Okay. Do we have a plan? Yes. We plan. Where is the plan? I have it. Show it. No. Okay. We have the car. Yes. The car is fast. Yes. We can drive fast. Yes. Lets drive. Okay. The moon was high. The house was dark. The presents were gone. The kids were sad. We were safe. We went home. The night was quiet. We won.
created_at: "2026-02-25T23:36:25.761Z" (2/25/2026, 11:36:25 PM)
model: "qwen/qwen3.5-flash-20260224"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 460
moderation_latency: (null)
generation_time: 94038
tokens_prompt: 77
tokens_completion: 10304
native_tokens_prompt: 76
native_tokens_completion: 10497
native_tokens_completion_images: (null)
native_tokens_reasoning: 9762
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0042064
router: (null)
provider_responses: 0
endpoint_id: "5f6b71b5-8672-4b27-8b2d-f674429f9e63"
id: "chatcmpl-17db71d3-eca6-9e96-b59a-aea64f936595"
model_permaslug: "qwen/qwen3.5-flash-20260224"
provider_name: "Alibaba"
status: 200
api_type: "completions"
id: "gen-1772062585-NYjlahFWkOJ8sdbhgZh1"
upstream_id: "chatcmpl-17db71d3-eca6-9e96-b59a-aea64f936595"
total_cost: 0.0042064
cache_discount: (null)
upstream_inference_cost: 0
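The metadata fields can be cross-checked against the run summary. Assuming `generation_time` is in milliseconds and `native_tokens_reasoning` is counted inside `native_tokens_completion` (neither unit nor relationship is stated on the page), the 94038 value matches the displayed Duration of 1m 34s, and roughly 93% of the completion tokens went to reasoning, leaving 735 tokens for the visible scene. A minimal sketch of that arithmetic; `format_duration` is a hypothetical helper.

```python
def format_duration(ms: int) -> str:
    """Render a millisecond count as 'Xm Ys', dropping the sub-second remainder."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}m {seconds}s"


generation_time_ms = 94038         # generation_time (assumed milliseconds)
native_completion_tokens = 10497   # native_tokens_completion
native_reasoning_tokens = 9762     # native_tokens_reasoning (assumed a subset)

print(format_duration(generation_time_ms))  # 1m 34s
visible_tokens = native_completion_tokens - native_reasoning_tokens
print(visible_tokens)  # 735
print(f"{native_reasoning_tokens / native_completion_tokens:.1%}")  # 93.0%
```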
Evaluation details
Matches word count: 20.19%
  Details: n/a
  Meta data: neededClean = false, words = 520
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio 0.00%, Deviation 50.00%
  Meta data: neededClean = false, wordsTotal = 520, wordsDialogue = 0
Overall: 10.0948%
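The evaluator's code is not shown on this page. However, the 0.00% result (wordsDialogue = 0) is consistent with a counter that treats only quoted text as dialogue. The response contains spoken lines such as "Are you coming in or staying out here with me? Molly asked." but uses no quotation marks anywhere, so nothing is delimited as speech. The sketch below assumes that quote-delimited rule and a 50% target; `dialogue_ratio` and `deviation` are hypothetical names, not NC Bench's API.

```python
import re

# Assumption: dialogue = words enclosed in straight or curly double quotes.
QUOTED = re.compile(r'["\u201c](.*?)["\u201d]', re.DOTALL)


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue words, total words, dialogue / total)."""
    total = len(text.split())
    dialogue = sum(len(span.split()) for span in QUOTED.findall(text))
    return dialogue, total, (dialogue / total if total else 0.0)


def deviation(ratio: float, target: float = 0.5) -> float:
    """Absolute distance from the requested dialogue share (50% here)."""
    return abs(ratio - target)


unquoted = "Are you coming in or staying out here with me? Molly asked."
quoted = '"Are you coming in or staying out here with me?" Molly asked.'
print(dialogue_ratio(unquoted))  # (0, 12, 0.0)
print(dialogue_ratio(quoted))
print(deviation(dialogue_ratio(unquoted)[2]))  # 0.5
```

Under this rule, the same sentence scores zero dialogue words without quotes and ten with them. That is why unpunctuated speech, as in this response, yields the reported Ratio of 0.00% and Deviation of 50.00%.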