NC Bench
Run: 2026-02-25-01KJBDHSYCSA3RKT025J95KBQD
Run Details
Model: Qwen 3.5 Flash
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 40.1s
Cost: $0.003133
Provider: Alibaba
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Meta
AI
The latch is rusted. If we don't crack it, the cake is cut. Does a criminal record include burning cake? Bill asked. I read the statutes. We need the presents. There are fifty. Fifty distinct packages. He poked the wood. This is a heist, not a baking class. Just pick the lock. What if I break the key? Then break the window. I do not want blood on the glass. Blood makes a great alibi. I cannot run with blood. Then do not run. Hide in the gift pile. They will look there. That is the point. Bill leaned back. My feet hurt. Sit taller. I am slouching. That is illegal? No. The window is open. I see that. I need a distraction. The dog is asleep. I do not believe in sleeping dogs. Just jump. He might wake. Say you have treats. Do we have treats? No. Then we are dead. The ladder is light. Bill held it. It is toy plastic. It is not a toy ladder. It is very light. Throw it. Too heavy. Throw it anyway. Bill tossed it. It landed on the porch. It hurts. Who cares? Molly pointed. Get the gifts. Which ones? All of them. They are wrapped. That makes it faster. Do you hear singing? Bill asked. No. The singing started. Happy Birthday to us. We are not happy. They sing for the kids. I want to be a kid. You are thirty. Thirty is new twenty. Just grab them. Bill took five boxes. My arms are short. Use your legs. That is not how it works. He dropped a box. They broke. Wrap them again. We do not have wrap. Wrap yourself. I am not going. Go. Bill climbed through. This is not a plan. The plan is the ladder. The ladder is gone. We adapt. He held five boxes. How do I leave? Slide down the roof. I hate roofs. Roofs are for birds. I am not a bird. Okay. Tie this. Molly handed a rope. Here? Down. Bill tied his feet. If I fall, I break. If you fall, you break. Go. Bill climbed out. Wait. The dog is awake. He is chasing us. Run. They ran. That was easier. I dropped them. We have time. Time is not money. We are thieves. Yes. We failed. Never steal from a birthday party. Good advice. 
Bill looked at the bags. Wait. What? There is a toy car. No there is not. It is right here. It is a toy. It is a real car. Nope. Bill drove away. I am going home. We failed. We have the time.
created_at: "2026-02-25T22:08:43.475Z" (2/25/2026, 10:08:43 PM)
model: "qwen/qwen3.5-flash-20260224"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 382
moderation_latency: (null)
generation_time: 39972
tokens_prompt: 77
tokens_completion: 7224
native_tokens_prompt: 76
native_tokens_completion: 7813
native_tokens_completion_images: (null)
native_tokens_reasoning: 7232
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0031328
router: (null)
provider_responses:
  [0]
    endpoint_id: "5f6b71b5-8672-4b27-8b2d-f674429f9e63"
    id: "chatcmpl-3ac0c079-a623-963a-963a-6a3738911c3b"
    is_byok: false
    latency: 382
    model_permaslug: "qwen/qwen3.5-flash-20260224"
    provider_name: "Alibaba"
    status: 200
api_type: "completions"
id: "gen-1772057323-Of6OenG45cjFPj3OK9rY"
upstream_id: "chatcmpl-3ac0c079-a623-963a-963a-6a3738911c3b"
total_cost: 0.0031328
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
Evaluation details
Matches word count: 0.00%
  Details: n/a
  Meta data: neededClean false, words 438
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio: 0.00%, Deviation: 70.00%
  Meta data: neededClean false, wordsTotal 445, wordsDialogue 0
Overall: 0.0000%
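The Dialogue to Total Word Ratio result (ratio 0.00%, deviation 70.00%, wordsDialogue 0) is consistent with an evaluator that counts only words inside quotation marks as dialogue: much of the response reads as spoken lines, but it contains no quotation marks at all. The sketch below assumes exactly that rule. The regex, the function names `word_count` and `dialogue_ratio`, whitespace-split word counting, and treating deviation as the absolute difference from the 0.70 target are all illustrative assumptions, not NC Bench's published implementation, and the sketch will not necessarily reproduce the 438 or 445 word totals above.

```python
import re

# Assumption: dialogue is any text between straight ("...") or curly (\u201c...\u201d) double quotes.
QUOTED = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def word_count(text: str) -> int:
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split())


def dialogue_ratio(text: str, target: float = 0.70) -> tuple[float, float]:
    """Return (dialogue_words / total_words, |target - ratio|) under the quote-based assumption."""
    total = word_count(text)
    # findall yields one (straight, curly) tuple per match; the group that did not match is ''.
    dialogue = sum(word_count(straight or curly) for straight, curly in QUOTED.findall(text))
    ratio = dialogue / total if total else 0.0
    return ratio, abs(target - ratio)


if __name__ == "__main__":
    # Unquoted lines, like the response above, score zero dialogue words.
    print(dialogue_ratio("The latch is rusted. We need the presents."))
    # The same line with quotation marks is counted as dialogue.
    print(dialogue_ratio('"The latch is rusted," Bill said.'))
```

Under this assumption, the first call reports a ratio of 0.0 and a deviation of 0.7, matching the shape of the result in this run; adding quotation marks around the spoken lines would be enough for such a counter to register them as dialogue.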