NC Bench
Run: 2026-04-27-01KQ6ZFR624ZZR1AJN4VPZCKRE
Run Details
Model: Qwen 3.5 Plus (2026-04-20)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 2m 31s
Cost: $0.024086
Provider: Alibaba
Evaluation summary
Score      Evaluator
60.28%     Matches word count
0.00%      Dialogue to Total Word Ratio
30.1376%   Overall
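The Overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below checks that arithmetic using the rounded values shown in the summary. The averaging rule is an assumption inferred from the numbers, not something the page states, and the variable names are illustrative.

```python
# Evaluator scores as reported in the summary (percent, already rounded).
scores = {
    "Matches word count": 60.28,
    "Dialogue to Total Word Ratio": 0.00,
}

# Assumption: the overall score is the plain average of the evaluator scores.
overall = sum(scores.values()) / len(scores)

print(f"Overall (from rounded inputs): {overall:.2f}%")
```

The result agrees with the reported 30.1376% to two decimal places. The small remaining difference would be explained if the word-count score is stored unrounded.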
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The cramped apartment smelled of stale coffee and questionable life choices. Bill hunched over the scarred wooden table, aligning toy soldiers along a crayon-drawn floor plan while Molly paced the linoleum floor in worn sneakers. She dragged a chair closer, kicked a stack of unopened mail aside, and finally sat down to review their amateurish blueprint. The afternoon light filtered through dusty blinds, illuminating floating dust motes that danced like tiny accomplices. Are you certain the perimeter is just a white fence and a yapping terrier? Bill asked. Worse. A trampoline moat and a mother with a clipboard. She checks gift receipts before toddlers can untie bows. Trampoline moat means we bounce. Clipboard mother means we require a cover story. We pose as hired entertainment. I arrive as an enthusiastic balloon sculptor. You follow with the giant clown horn. I refuse to honk. Honking draws attention. Attention guarantees sirens. Fine. You carry the reinforced canvas sacks. We swap the merchandise for mulch bags during cake service. Mulch creates dust. Dust triggers inhalers. Inhalers summon adults. Adults summon the police chief. What would you recommend, veteran? What about the schedule? The parents announce a singalong at three. They sing while toddlers throw punches. We cannot operate during noise. Noise draws neighbors. Neighbors call protective services. We need silence after juice boxes. The boxes expire at two forty. We gain twenty minutes. Twenty minutes is an eternity. Professionals do not panic. Professionals count exits. Professionals wear gloves to avoid leaving prints. Gloves reduce dexterity. Reduced dexterity increases wrapping noise. Increased noise wakes toddlers. Toddlers cry. Mothers carry phones. Phones dial dispatch. Fine. We go barehanded. We accept the risk. We trust our training. We trust each other. We leave before the magician. Magicians carry rabbits. Rabbits carry fleas. Fleas ruin everything. 
We substitute the merchandise with identical cardboard shells wrapped in metallic paper. We withdraw during the piñata rotation. Controlled chaos masks our retreat. Molly stood up, erased a crooked arrow, and redrew it with deliberate precision. She tapped the marker twice against her palm and waited for his approval. Excellent. But what about the glitter cannon? Glitter is an ecological hazard. It permeates carpet fibers. It enters your lungs. We need industrial filtration and shop vacuums. I brought the heavy machinery. It hums quietly. It devours tissue paper and shattered dreams. Bill adjusted his glasses and tapped the table with a wooden soldier. And the terrier? We purchase three smoked bones. He collapses by minute four. The gate hinges inward. We retreat backward. That feels completely wrong. We drill tonight. We pack the hose, the shells, and the bones. You rehearse the animals. I map the escape. Tonight. Midnight. Absolutely no honking. Pack the mulch. Wait. The shells. Correct. Pack the shells. Understood. We execute before the paper hats distribute. We execute while guardians gulp lukewarm punch. May the nap cycle shield our footsteps. May the tape guns never jam. Molly closed the blueprint carefully. Bill straightened his cardigan and checked his watch. They packed the gear in silence. The planning phase concluded. The heist began.
created_at
"2026-04-27T08:04:27.465Z"
4/27/2026, 8:04:27 AM
model
"qwen/qwen3.5-plus-20260420"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
705
moderation_latency
(null)
generation_time
150997
tokens_prompt
77
tokens_completion
10666
native_tokens_prompt
76
native_tokens_completion
10023
native_tokens_completion_images
(null)
native_tokens_reasoning
9303
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0240856
router
(null)
provider_responses
0
endpoint_id
"8b4a0fbb-6705-4f8d-a7f1-ad18ecc0edbc"
id
"chatcmpl-5df04022-4ee5-9808-9e72-29401845060f"
is_byok
false
latency
705
model_permaslug
"qwen/qwen3.5-plus-20260420"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777277067-pY3EphFtkkXrwiKhla0P"
session_id
(null)
api_type
"completions"
id
"gen-1777277067-MMqN06bKHZXoWhD2Dl3G"
upstream_id
"chatcmpl-5df04022-4ee5-9808-9e72-29401845060f"
total_cost
0.0240856
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
Evaluation details
Result     Evaluator                      Details                           Meta Data
60.28%     Matches word count             n/a                               neededClean: false; words: 515
0.00%      Dialogue to Total Word Ratio   Ratio: 0.00%, Deviation: 70.00%   neededClean: false; wordsTotal: 516; wordsDialogue: 0
30.1376%   Overall
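The zero dialogue count is what a quotation-mark-based counter would produce for this response, since the model wrote the speech without any quotation marks. The sketch below is a minimal illustration of that kind of check. It assumes dialogue is detected only inside straight or curly double quotes and that words are split on whitespace. The evaluator's actual implementation is not shown on this page, so `dialogue_ratio` and `deviation_from_target` are hypothetical names.

```python
import re

# Assumption: dialogue is any text inside straight ("...") or curly (“...”) double quotes.
QUOTED = re.compile(r'"([^"]*)"|“([^”]*)”')


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (total words, words inside quotes, ratio of the two)."""
    words_total = len(text.split())
    words_dialogue = sum(
        len((straight or curly).split())
        for straight, curly in QUOTED.findall(text)
    )
    ratio = words_dialogue / words_total if words_total else 0.0
    return words_total, words_dialogue, ratio


def deviation_from_target(ratio: float, target: float = 0.70) -> float:
    """Absolute distance between the measured ratio and the requested ratio."""
    return abs(target - ratio)


quoted = '"Are you sure?" Bill asked.'
unquoted = "Are you sure? Bill asked."

print(dialogue_ratio(quoted))
print(dialogue_ratio(unquoted))
print(deviation_from_target(dialogue_ratio(unquoted)[2]))
```

Under these assumptions, the unquoted variant yields a dialogue count of zero and a deviation equal to the full 70% target, which matches the "Ratio: 0.00%, Deviation: 70.00%" row above.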