NC Bench
Run: 2026-04-27-01KQ777SZV6RB3P89BH96HTSY5
Run Details
Model: Qwen 3.6 Flash
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 27.1s
Cost: $0.009231
Provider: Alibaba
Evaluation summary
Score     Evaluator
86.38%    Matches word count
64.59%    Dialogue to Total Word Ratio
75.4868%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
"Okay, the target is Timothy's seventh birthday at Jump 'n' Bounce," Molly whispered. "How many guards?" Bill hissed. "Three. A bored uncle guarding the cookie table and a ferocious toddler named Brayden with a lightsaber." "Brayden? Last time he ate my mask and broke my ankle." "We need a distraction. I'll release the rubber chicken." "Too obvious. Use the piñata strategy." "Did you pack the glitter bombs? We can blind the parents." "Glitter is illegal in six zones. Pack the whoopee smoke." "The map says the presents are under the inflatable dinosaur. What if Brayden naps under it?" "We wake Brayden. Chaos covers the loot." "And the loot? All of it?" "Every wrapped box. I've got the wheelbarrow loaded with anti-gravity tape." "Anti-gravity? Molly, last time you used anti-gravity tape, we floated into the ceiling fan." "That was the cheap brand. This is industrial strength." "I've got the laser pointer mapped to the wrapping paper stash. But Molly, what if they're gluten-free? We can't steal non-vegan gifts." "Bill, focus! We want the Lego castle and the Switchable Power." "What if Brayden is awake?" "Then you're on crowd control with squeaky shoes." "Ready when you are, partner." "Don't call me partner. Call me the one who steals the biggest bag of loot."
Meta
created_at
"2026-04-27T10:19:55.786Z"
4/27/2026, 10:19:55 AM
model
"qwen/qwen3.6-flash"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
791
moderation_latency
(null)
generation_time
26950
tokens_prompt
77
tokens_completion
5335
native_tokens_prompt
76
native_tokens_completion
6141
native_tokens_completion_images
(null)
native_tokens_reasoning
5821
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0092305
router
(null)
provider_responses
0
endpoint_id
"2435bec9-c808-4f27-9fc8-12f3460d7b66"
id
"chatcmpl-57eff9f0-0305-9408-b938-18943064d843"
is_byok
false
latency
791
model_permaslug
"qwen/qwen3.6-flash"
provider_name
"Alibaba"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777285195-f6XDXA5heMXxrSPny87A"
session_id
(null)
api_type
"completions"
id
"gen-1777285195-7e0zmmH3bfQwM9J1tnfx"
upstream_id
"chatcmpl-57eff9f0-0305-9408-b938-18943064d843"
total_cost
0.0092305
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Alibaba"
response_cache_source_id
(null)
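The token and timing fields above can be combined into a few derived figures. The following is a minimal sketch, not part of the benchmark itself. It assumes that `generation_time` is in milliseconds (which is consistent with the reported 27.1s duration) and that `native_tokens_completion` includes the reasoning tokens. The dictionary name `meta` and the derived variable names are hypothetical.

```python
# Values copied from the Meta fields of this run.
meta = {
    "generation_time": 26950,          # assumed to be milliseconds
    "native_tokens_prompt": 76,
    "native_tokens_completion": 6141,  # assumed to include reasoning tokens
    "native_tokens_reasoning": 5821,
}

# Tokens left over for the visible answer, under the assumption above.
visible_tokens = meta["native_tokens_completion"] - meta["native_tokens_reasoning"]

# Fraction of the completion spent on reasoning.
reasoning_share = meta["native_tokens_reasoning"] / meta["native_tokens_completion"]

# Overall generation throughput in native tokens per second.
tokens_per_second = meta["native_tokens_completion"] / (meta["generation_time"] / 1000)

print(f"visible tokens: {visible_tokens}")
print(f"reasoning share: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions, most of the completion budget went to reasoning rather than to the roughly 200-word scene itself.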
Evaluation details
Result    Evaluator                     Details                          Meta Data
86.38%    Matches word count            n/a                              neededClean: false, words: 211
64.59%    Dialogue to Total Word Ratio  Ratio: 98.13%, Deviation: 8.13%  neededClean: false, wordsTotal: 214, wordsDialogue: 210
75.4868%  Overall
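The reported ratio and deviation can be reproduced from the word counts in the evaluation details. The sketch below is illustrative only: the 0.90 target is taken from the scenario text, the function names are hypothetical, and the final check that the overall score is the unweighted mean of the two evaluator scores is an assumption that merely happens to agree with the displayed numbers within rounding. How the evaluator turns the 8.13% deviation into a 64.59% score is not stated on the page, so that step is not modeled.

```python
# Values copied from the evaluation details for this run.
WORDS_TOTAL = 214
WORDS_DIALOGUE = 210
TARGET_RATIO = 0.90  # assumption: taken from the "90% dialogue" scenario


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that appear inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def deviation(ratio: float, target: float) -> float:
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


ratio = dialogue_ratio(WORDS_DIALOGUE, WORDS_TOTAL)
dev = deviation(ratio, TARGET_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {dev:.2%}")

# Assumption: the overall score is the plain mean of the evaluator scores.
evaluator_scores = [86.38, 64.59]  # rounded values as displayed
mean_score = sum(evaluator_scores) / len(evaluator_scores)
reported_overall = 75.4868
print(f"mean of displayed scores: {mean_score:.3f} (reported: {reported_overall})")
```

The small gap between the computed mean and the reported overall is what one would expect if the page averages unrounded evaluator scores.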