NC Bench
Run: 2026-04-27-01KQ6YZXMP2SRD0JKM2FYGQTXJ
Run Details
Model: Qwen 3.5 Plus (2026-04-20)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 2m 1s
Cost: $0.016938
Provider: Alibaba
Evaluation summary
Score      Evaluator
99.38%     Matches word count
92.87%     Dialogue to Total Word Ratio
96.1229%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly slammed the map on the hood. "Bill, listen closely. The fortress is a bouncy castle. The guard is a clown named Squeaky. This is gold." Bill winced audibly. "Molly, they're five-year-olds. We're stealing glitter bombs from toddlers? This lowers our standards." "The loot! A trampoline? Heavy assets! Also, the manifest lists a complete Millennium Falcon." "Six figures? You're lying." "The collector market is insane, Bill. We're funding our retirement on plastic bricks." "We stole the Emerald Cockroach from a moving train!" "Exactly. Now, how do we get past the dad guarding the cooler? We don't have invitations." "We have the suits. You're the magician. I'm the balloon artist. We hit the table, bag and drag. Standard heist procedure." "Wait, what about the goody bags?" "We're monsters, not ghouls. The goody bags stay. Also, the juice boxes are premium. Grab the grape ones. Bill, focus! Exit before the piñata breaks. If a kid sees us, you do a card trick and I forge a poodle from chrome foil." "What if the poodle explodes?" "Then we improvise. Also, grab the sparklers. Fire hazard, but fun. Check your pockets for tape and grab the top hat. Love you too, you terrible partner. Let's steal a six-year-old's joy."
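The page does not show how the evaluators tokenize the response, so the following is only a minimal sketch of one way to measure the dialogue share of a scene. It assumes that dialogue is any text between straight double quotes and that words are whitespace-separated tokens. The function name `dialogue_stats` and the sample string are hypothetical and not part of NC Bench.

```python
import re


def dialogue_stats(text: str) -> dict:
    """Count whitespace-separated words overall and inside straight double quotes.

    Assumption: dialogue is delimited by plain '"' characters; curly quotes
    and nested quotes are not handled in this sketch.
    """
    total_words = len(text.split())
    # Each match is the text between one opening and one closing quote.
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return {
        "words_total": total_words,
        "words_dialogue": dialogue_words,
        "ratio": ratio,
    }


# Hypothetical sample, not taken from the run above.
sample = 'Molly sighed. "Grab the grape ones." Bill nodded. "Fine."'
stats = dialogue_stats(sample)
print(stats)
```

A different tokenizer (for example, one that splits on hyphens or strips punctuation first) would give slightly different counts, so results from this sketch need not match the evaluator's figures exactly.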
created_at: "2026-04-27T07:56:02.957Z" (4/27/2026, 7:56:02 AM)
model: "qwen/qwen3.5-plus-20260420"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 689
moderation_latency: (null)
generation_time: 106337
tokens_prompt: 77
tokens_completion: 6311
native_tokens_prompt: 76
native_tokens_completion: 7045
native_tokens_completion_images: (null)
native_tokens_reasoning: 6728
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0169384
router: (null)
provider_responses: 0
endpoint_id: "8b4a0fbb-6705-4f8d-a7f1-ad18ecc0edbc"
id: "chatcmpl-a24d65f8-2138-9794-87e5-07d2cc8625d7"
is_byok: false
latency: 689
model_permaslug: "qwen/qwen3.5-plus-20260420"
provider_name: "Alibaba"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777276562-dfsAOLaPrvydiTXGiDmX"
session_id: (null)
api_type: "completions"
id: "gen-1777276562-DPCREXmcsdS7oSk5BEp8"
upstream_id: "chatcmpl-a24d65f8-2138-9794-87e5-07d2cc8625d7"
total_cost: 0.0169384
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
response_cache_source_id: (null)
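The token and timing fields above can be combined into a few derived figures. This is a rough sketch under two assumptions that the page does not confirm: that `native_tokens_reasoning` is included in `native_tokens_completion`, and that `generation_time` is reported in milliseconds. The derived variable names are illustrative only.

```python
# Values copied from the run metadata.
native_tokens_completion = 7045
native_tokens_reasoning = 6728
generation_time_ms = 106337  # assumption: milliseconds
total_cost_usd = 0.0169384

# Assumption: reasoning tokens are a subset of completion tokens,
# so the remainder is the visible (non-reasoning) output.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Throughput over the whole completion, reasoning included.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"visible output tokens: {visible_tokens}")
print(f"reasoning share of completion: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.2f} tokens/s")
print(f"cost per visible token: ${total_cost_usd / visible_tokens:.6f}")
```

Under those assumptions, most of the completion budget for this run went to reasoning rather than to the roughly 200-word scene itself.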
Evaluation details
Result     Evaluator                      Details                           Meta Data
99.38%     Matches word count             n/a                               neededClean: false, words: 205
92.87%     Dialogue to Total Word Ratio   Ratio: 95.22%, Deviation: 5.22%   neededClean: false, wordsTotal: 209, wordsDialogue: 199
96.1229%   Overall
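The reported ratio and deviation can be reproduced from the `wordsDialogue` and `wordsTotal` counts and the 90% target in the scenario. The sketch below also checks one hypothesis about the overall score: that it is the unweighted mean of the two evaluator scores. That is an assumption, not something the page states, and the formulas that turn a deviation into the 99.38% and 92.87% scores are not shown here, so they are not reproduced.

```python
# Counts from the "Dialogue to Total Word Ratio" evaluator metadata.
words_total = 209
words_dialogue = 199
target_ratio = 0.90  # from the scenario: "90% dialogue"

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 95.22%, Deviation: 5.22%"

# Assumption: Overall is the plain average of the two evaluator scores.
# Using the rounded scores shown on the page gives 96.125, which is within
# rounding error of the reported 96.1229%.
overall_from_rounded = (99.38 + 92.87) / 2
print(f"Mean of rounded scores: {overall_from_rounded:.4f}%")
```

The small gap between 96.125 and 96.1229 would be expected if the displayed evaluator scores are rounded to two decimals before display.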