NC Bench
Run: 2026-02-25-01KJBH9Z171MQ2XKPTQ4Q39R60
Run Details
Model: Qwen 3.5 122B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 1m 1s
Cost: $0.023080
Provider: Alibaba
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0000% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her red nose in the rearview mirror. "Are you absolutely sure the clown suit is the right call for this op?" Bill tightened his grip on the steering wheel. "It is the perfect camouflage. Nobody suspects a clown until it is too late." "Unless it is a birthday party," Molly countered, eyeing the quiet suburban street. "Then they scream for their mothers immediately." "Exactly. Chaos is our greatest ally in this business." "Right. But what about the presents? We have a specific list?" "The big boxes go in the van immediately. Small items stay on the table." "Why leave the small ones behind?" "Weight. We need speed on the extraction phase." "What about the parents? They might notice." "I brought taser darts disguised as piñata sticks." Molly blinked rapidly. "That is incredibly illegal in three states." "It is tactical. We are professionals." "And the cake? Do we sample it?" "We do not touch the cake. That is culinary sacrilege." "Good point. So, we go in at three?" "The cake cutting is at three-thirty. That is when the chaos peaks." "Okay. What is the signal?" "When the kid blows out the candles." "That is a lot of noise for a signal." "It is a distraction. It covers the sound of the boxes." "Right. And if we get caught by security?" "We say we are from the entertainment agency." "Which one?" "The one that does not exist." "Brilliant. Very professional." "I know. We are the best." "What about the Lego set? The castle?" "Top priority. That is liquid gold on the black market." "The PS5?" "Obviously. That is the main target." "What about the giant teddy bear?" "Leave it. Too heavy for the bag." "But it looks expensive." "It is polyester, Molly. We want chips and circuits." "Fair enough. I hate polyester." "Do we need masks?" "No, the clown makeup is enough." "It is not much." "It is psychological warfare." "Okay. Last question before we roll." "Shoot. Make it quick." "What if the kid cries?" "Then we give them a balloon." "A balloon?"
"It is standard procedure." "Right. Standard procedure." "Let's do this." "Let's do this." "What about the escape route?" "The alleyway behind the house." "Is it clear?" "It is clear of trash cans." "Good. Trash cans are a nightmare." "We load the van in under two minutes." "Two minutes is tight." "It is doable." "What if the dog wakes up?" "I have a bag of bacon." "Bacon?" "Distraction food." "Right."
created_at: "2026-02-25T23:14:20.843Z" (2/25/2026, 11:14:20 PM)
model: "qwen/qwen3.5-122b-a10b-20260224"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 487
moderation_latency: (null)
generation_time: 60435
tokens_prompt: 77
tokens_completion: 6904
native_tokens_prompt: 76
native_tokens_completion: 7203
native_tokens_completion_images: (null)
native_tokens_reasoning: 6580
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.02308
router: (null)
provider_responses:
  0:
    endpoint_id: "33d466d1-d2f7-45ed-8ff3-fda7d2e32351"
    id: "chatcmpl-45e1a20e-ed5c-92e9-931d-a31d6222a8e5"
    is_byok: false
    latency: 487
    model_permaslug: "qwen/qwen3.5-122b-a10b-20260224"
    provider_name: "Alibaba"
    status: 200
api_type: "completions"
id: "gen-1772061260-eRoYcJ4h0MSKwONEBxDE"
upstream_id: "chatcmpl-45e1a20e-ed5c-92e9-931d-a31d6222a8e5"
total_cost: 0.02308
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
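The token counts in the metadata can be broken down with a little arithmetic. The sketch below is only illustrative and rests on two assumptions that the page does not state: that generation_time is in milliseconds (which would be consistent with the reported 1m 1s duration), and that native_tokens_reasoning is counted inside native_tokens_completion. The function name completion_breakdown is hypothetical.

```python
# Values copied from the run metadata.
GENERATION_TIME_MS = 60435         # assumption: unit is milliseconds
NATIVE_TOKENS_COMPLETION = 7203
NATIVE_TOKENS_REASONING = 6580     # assumption: included in the completion count


def completion_breakdown(completion_tokens, reasoning_tokens, generation_ms):
    """Return (visible_tokens, reasoning_share, tokens_per_second)."""
    visible_tokens = completion_tokens - reasoning_tokens
    reasoning_share = reasoning_tokens / completion_tokens
    tokens_per_second = completion_tokens / (generation_ms / 1000)
    return visible_tokens, reasoning_share, tokens_per_second


visible, share, tps = completion_breakdown(
    NATIVE_TOKENS_COMPLETION, NATIVE_TOKENS_REASONING, GENERATION_TIME_MS
)
print(f"visible tokens: {visible}")  # 623
print(f"reasoning share: {share:.1%}")
print(f"throughput: {tps:.1f} tokens/s")
```

Under those assumptions, most of the completion budget went to reasoning rather than to the scene itself.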
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 412
0.00% | Dialogue to Total Word Ratio | Ratio: 93.46%, Deviation: 23.46% | neededClean: false, wordsTotal: 413, wordsDialogue: 386
0.0000% | Overall
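The reported ratio and deviation follow from the word counts in the evaluation details: 386 dialogue words out of 413 total, measured against the 70% target in the scenario. The sketch below shows one way such a ratio could be computed. It is not the benchmark's evaluator: counting whitespace-separated words inside straight double quotes is an assumption, and the names dialogue_ratio and deviation_from_target are hypothetical.

```python
import re


def dialogue_ratio(text):
    """Return (dialogue_words, total_words, ratio).

    Assumption: dialogue is whatever sits between straight double quotes,
    and a word is any whitespace-separated token.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio, target=0.70):
    """Absolute distance between the measured ratio and the target share."""
    return abs(ratio - target)


# Reproduce the figures reported for this run from its word counts.
reported_ratio = 386 / 413
print(f"Ratio: {reported_ratio:.2%}")                             # Ratio: 93.46%
print(f"Deviation: {deviation_from_target(reported_ratio):.2%}")  # Deviation: 23.46%
```

Applied to a short sample such as `Molly sighed. "Are you sure?" Bill nodded. "Yes."`, the function counts 4 dialogue words out of 8, a ratio of 0.5.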