NC Bench
Run: 2026-03-25-01KMJ8EQA6ASPNYT1GRRZGV5SQ
Run Details
Model: Qwen3 235B A22B Instruct 2507
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 9.3s
Cost: $0.000087
Provider: WandB
Evaluation summary
Score      Evaluator
0.00%      Matches word count
0.00%      Dialogue to Total Word Ratio
0.0000%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Snowflakes spiraled from the iron-gray sky, dusting the pavement outside the Bright Sprout Learning Center like powdered sugar on a failed soufflé. Molly adjusted her reindeer ears—felt, lopsided, one eye inexplicably winking—and squinted through the fogged window. Inside, a glitter-covered piñata shaped like a smiling unicorn dangled precariously above a sea of juice boxes and existential dread. Bill knelt beside her, tugging at the hem of his oversized elf costume. “You’re sure it’s just toddlers? Because last time you said ‘just a bake sale,’ there was a SWAT team and a woman with a tactical rolling pin.” Molly didn’t turn. “No SWAT. No tactical pastries. Just twelve kids, three harried parents, and approximately forty-three presents wrapped in cartoon animals judging us from beyond the paper.” “Forty-three? That’s not a heist. That’s a logistical nightmare.” “It’s efficiency, Bill. No alarms, no safes, no moral quandaries—kids don’t appreciate gifts past bedtime. By 5 p.m., these toys will be either abandoned or used as projectile weapons. We’re doing society a favor.” Bill scratched his neck, where the itchy elf collar was already raising a rebellion. “And the distraction? You said you had a distraction.” Molly grinned, pulling a Bluetooth speaker from her oversized Santa sack. “Troll music. Peak chaos. Two minutes of ‘It’s a Troll World’ and every adult will be weeping in the corner, questioning their life choices.” “That’s psychological warfare.” “Exactly.” They crouched behind a snowbank shaped suspiciously like a disgruntled badger. Bill unfolded a crinkled blueprint drawn on the back of a pizza menu. “You realize they’ve got a bounce house? That’s basically a trap. One wrong step and we’re launched into low orbit.” “We go in during the cake ceremony. Sugar high meets sensory overload. 
They’ll be too busy licking frosting off the ceiling to notice two slightly sweaty Christmas helpers redistributing wealth.” “What if someone recognizes us?” Molly handed him a lollipop. “Suck on this. Gives you that ‘harmless idiot’ aura. Works every time.” Bill popped it in. “Still think we should’ve gone with the ice cream truck angle.” “And miss the sheer poetry of Santa’s disgraced cousins cleaning out the loot? No. This is elegant. This is art.” A muffled cheer erupted from inside as someone—probably a child named Kevin—set fire to a napkin with a birthday candle. Molly stood, brushing snow off her polyester pants. “Showtime. Remember: in, grab, out. No emotional attachments. That talking dinosaur has been calling my name for weeks but we can’t get sentimental.” “I can’t promise anything. I’ve had dreams about that rainbow unicorn.” “No bonding with the merchandise, Bill. That’s how we end up fostering a sentient rocking horse again.” They crept toward the side entrance, where a balloon arch sagged under winter’s betrayal. Molly pulled out a hairpin, working the lock with the precision of someone who’d picked worse under sniper fire. Bill whispered, “Wait. What’s the code word if we get caught?” “*Pew-pew.*” “That’s not a code word. That’s a sound effect.” “It signals urgency and confusion. Now hush. I hear ‘Baby Shark’ starting. That’s our cue.” The door clicked open. They slipped inside just as a toddler in a superhero cape launched a plastic truck at the DJ’s head. Molly adjusted her reindeer antlers with grim determination. “Operation Naughty List is a go.” Bill sighed, clutching his lollipop like a talisman. “I still think we’re being judged by the stuffed animals.”
created_at: "2026-03-25T10:25:44.266Z" (3/25/2026, 10:25:44 AM)
model: "qwen/qwen3-235b-a22b-07-25"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 67
moderation_latency: (null)
generation_time: 9212
tokens_prompt: 77
tokens_completion: 997
native_tokens_prompt: 72
native_tokens_completion: 793
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0000865
router: (null)
provider_responses:
  0:
    endpoint_id: "c9f927db-d559-49eb-9846-80862c5e3b4e"
    id: "chatcmpl-9c029f8766cd5951"
    is_byok: false
    latency: 30
    model_permaslug: "qwen/qwen3-235b-a22b-07-25"
    provider_name: "WandB"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1774434344-786vDz1WgM4Rp2MimLB9"
upstream_id: "chatcmpl-9c029f8766cd5951"
total_cost: 0.0000865
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "WandB"
Evaluation details

Evaluator: Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean: false; words: 569

Evaluator: Dialogue to Total Word Ratio
  Result: 0.00%
  Details: Ratio: 58.22%, Deviation: 28.22%
  Meta Data: neededClean: false; wordsTotal: 572; wordsDialogue: 333

Overall: 0.0000%
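
The reported ratio and deviation can be checked from the counts above. The following is a minimal sketch, not the benchmark's actual evaluator. The function names are hypothetical. The curly-quote regex and whitespace tokenization are assumptions, so counts on the full response may differ from the reported 333 and 572 (the two evaluators themselves report different totals, 569 and 572). How a deviation is mapped to a 0.00% score is not shown on this page and is not modeled here.

```python
import re

# Assumption: dialogue is any text between curly double quotes.
# The benchmark's real quote handling and tokenizer are not documented here.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) using whitespace splitting."""
    total = len(text.split())
    dialogue = sum(len(span.split()) for span in DIALOGUE_RE.findall(text))
    return dialogue, total


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target_pct: float) -> tuple[float, float]:
    """Dialogue share in percent, and its absolute distance from the target."""
    ratio = 100.0 * dialogue_words / total_words
    return ratio, abs(ratio - target_pct)


def word_count_deviation(words: int, target: int) -> float:
    """Relative distance from the requested length, in percent."""
    return 100.0 * abs(words - target) / target


# Counts taken from the evaluation meta data; 30% is the target from the scenario.
ratio, deviation = ratio_and_deviation(333, 572, 30.0)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 58.22%, Deviation: 28.22%"

# 569 words against the requested 500.
print(f"Word count off by {word_count_deviation(569, 500):.1f}%")
```

On these numbers the response carries nearly twice the requested share of dialogue and runs about 14% over the requested length, which is consistent with both evaluators reporting a failing score.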