NC Bench
Run: 2026-03-25-01KMJ8J5KW1SCVKKM0T7NM7VP3
Run Details
Model: Qwen3 235B A22B Instruct 2507
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 7.2s
Cost: $0.000216
Provider: Alibaba
Evaluation summary
Score      Evaluator
0.00%      Matches word count
42.67%     Dialogue to Total Word Ratio
21.3370%   Overall
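The page does not say how the Overall score is aggregated, but it appears consistent with an unweighted mean of the two evaluator scores. A minimal sketch of that check, assuming equal weighting (an assumption, not a documented rule) and treating the displayed two-decimal scores as rounded values:

```python
from statistics import mean

# Evaluator scores as displayed in the summary, in percent (2 decimal places).
scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 42.67,
}
REPORTED_OVERALL = 21.3370

# Assumption: Overall is the unweighted mean of the evaluator scores.
overall = mean(list(scores.values()))

# Each displayed score may be off by up to 0.005 from rounding,
# so their mean may also be off by up to 0.005.
TOLERANCE = 0.005
consistent = abs(overall - REPORTED_OVERALL) <= TOLERANCE

print(f"recomputed: {overall:.4f}%, reported: {REPORTED_OVERALL:.4f}%, consistent: {consistent}")
```

The recomputed 21.3350% differs from the reported 21.3370% only by the rounding of 42.67%; an unrounded dialogue score near 42.674% would reproduce the reported figure exactly.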
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly crouched behind a bush shaped like a cartoon unicorn, its plastic mane flickering in the breeze like a bad wig. Bill adjusted his oversized novelty sunglasses, despite it being dusk, and scratched his head with a squeaky rubber chicken. They watched the birthday fortress—a suburban split-level with streamers flapping like surrender flags—where laughter echoed like tiny banshees fueled on sugar and injustice. Balloons bobbed, suspiciously cheerful. A bounce house throbbed in the yard like a giant, pastel tumor. Molly unfurled a blueprint drawn on the back of a pizza coupon. It showed the cake table, piñata drop zone, and most importantly, the Gift Mountain: a card table sagging under poorly wrapped offerings and envelopes that jingled with grocery-store gift cards. Her finger traced a path through the snack zone, past the cotton candy machine (a known hazard—sticky fingers ruined three heists already). Bill nodded gravely, pulling a glitter-proof poncho from his duffel. He’d learned after the Great Craft Room Incident of ’22. They’d go in disguised as entertainers—Molly as a disgruntled magician’s assistant, Bill as a deflating dinosaur mascot. The exit route led through the neighbor’s yard, where an unlocked shed promised getaway cover and, hopefully, a working motor scooter. A tiny child screamed “PINATA!” and chaos erupted. Molly adjusted her fake mole and gave the signal. Bill wiggled into the dino suit, tail getting caught in a lawn sprinkler. The heist was afoot, fueled by greed, bad planning, and an alarming number of juice boxes.
Response metadata
created_at: "2026-03-25T10:27:37.216Z" (3/25/2026, 10:27:37 AM)
model: "qwen/qwen3-235b-a22b-07-25"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 448
moderation_latency: (null)
generation_time: 7211
tokens_prompt: 77
tokens_completion: 407
native_tokens_prompt: 72
native_tokens_completion: 343
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.000215878
router: (null)
provider_responses[0]:
  endpoint_id: "813ce28a-1334-47e3-881e-a7f8833b34b4"
  id: "chatcmpl-83e78c85-7348-9307-ac49-2d37bd689417"
  is_byok: false
  latency: 448
  model_permaslug: "qwen/qwen3-235b-a22b-07-25"
  provider_name: "Alibaba"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1774434457-WHsFjywDg96pjlESjHdT"
upstream_id: "chatcmpl-83e78c85-7348-9307-ac49-2d37bd689417"
total_cost: 0.000215878
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
Evaluation details
Result     Evaluator                      Details                          Meta data
0.00%      Matches word count             n/a                              neededClean: false, words: 251
42.67%     Dialogue to Total Word Ratio   Ratio: 0.39%, Deviation: 9.61%   neededClean: false, wordsTotal: 254, wordsDialogue: 1
21.3370%   Overall
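The Ratio and Deviation figures follow from the reported counts: 1 dialogue word out of 254 total is about 0.39%, which is about 9.61 percentage points below the 10% target. A minimal sketch of that arithmetic, plus a hypothetical dialogue counter. The counter's rules (dialogue is text inside straight or curly double quotes, a word is a whitespace-separated token) are assumptions for illustration; the page does not document the benchmark's tokenizer, and its two evaluators already disagree on the total (251 vs 254 words).

```python
import re

# Assumption: dialogue is text between straight or curly double quotes,
# and a word is a whitespace-separated token. Not the benchmark's documented rule.
QUOTED = re.compile(r'["“]([^"“”]*)["”]')


def count_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage."""
    total = len(text.split())
    dialogue = sum(len(span.split()) for span in QUOTED.findall(text))
    return dialogue, total


def ratio_and_deviation(dialogue: int, total: int, target_pct: float = 10.0) -> tuple[float, float]:
    """Dialogue share of all words, and its absolute distance from the target, in percent."""
    ratio_pct = 100 * dialogue / total
    return ratio_pct, abs(target_pct - ratio_pct)


# The only line of dialogue in the model response.
sample = "A tiny child screamed “PINATA!” and chaos erupted."
print(count_words(sample))  # (1, 8)

# Counts reported in the evaluation metadata: wordsDialogue = 1, wordsTotal = 254.
ratio, dev = ratio_and_deviation(1, 254)
print(f"Ratio: {ratio:.2f}%, Deviation: {dev:.2f}%")  # Ratio: 0.39%, Deviation: 9.61%
```

How the 9.61-point deviation maps to the 42.67% evaluator score is not shown on the page, so the sketch stops at the ratio and deviation.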