NC Bench
Run: 2026-03-25-01KMJ8JHC0PMSWSQNFW46HR6S6
Run Details
Model: Qwen3 235B A22B Instruct 2507
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 16.7s
Cost: $0.000089
Provider: WandB
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0004%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The basement smelled of mildew, half-eaten peanut butter sandwiches, and the faint, lingering despair of a deflated balloon. Molly tapped her knuckles against the chipped cinderblock wall, squinting at a crudely drawn map pinned to the wall with pushpins shaped like tiny dinosaurs. “We go in through the back gate,” she muttered, tracing a path with her pinky. “Distract the clown—crucial. Never trust a clown. They’re trained in subterfuge. Look at Bozo. Look at the guy from Stephen King’s *It*. Coincidence? No.” Bill, seated on an overturned bucket labeled “Dad’s Failed Tomato Dreams,” slurped from a thermos of lukewarm coffee. “Molly, it’s a four-year-old’s birthday. The clown’s named Bubbles. His only skill is honking his nose while crying softly in the corner.” “He could be a sleeper agent,” she insisted, unfazed. “Remember the Great Candy Heist of ‘09? That gumdrop smuggling ring in Jersey? They used mimes.” “Mimes don’t talk,” Bill said, “and neither do I when you bring up New Jersey.” “That’s precisely why they’re dangerous,” Molly whispered, leaning in. “Silence is their weapon.” Bill sighed, long and theatrical. “We’re stealing birthday presents from a bunch of toddlers, Molly. If we get caught, we’ll be mocked by people in footie pajamas.” “They won’t even notice!” Molly snapped. “They’ll be too busy screaming or projectile vomiting cake. That’s our window—chaos.” “Or,” Bill offered, “we wait until nap time. Less risk, zero mimes.” “No naps! Timing is everything. The presents come out at 3:15, after the pinata and before the 'Happy Birthday' sing-off—which, I’ve learned from meticulous stalking of Pinterest boards, always ends in full-blown emotional collapse for at least three children.” “You really know your toddler psychology.” “I’ve studied the enemy,” Molly said, adjusting her utility belt—customarily designed for jewel thieves, now holding a juice box, a bag of goldfish, and a mini-flashlight shaped like a unicorn. Bill scratched his beard. “Fine. 
We go in at 3:10. Distract Bubbles with a whoopee cushion?” “No,” Molly said, eyes gleaming. “We weaponize cuteness.” “Ah. The puppy plan.” “Exactly. I’ll bring Muffins.” “Your taxidermied squirrel?” “He’s not taxidermied. He’s… preserved. And he’s adorable.” “He looks like he died mid-sneeze.” “He died with dignity,” Molly said, clutching her chest. “And he’ll charm the socks off that clown. While Bubbles is distracted by Muffins’s tragic backstory—*how he saved four orphaned acorns*—you sneak in and loot the gift table.” “And if the kids see me?” “Tell them you’re the ice cream man. Kids love ice cream. They also love secrets. Whisper, ‘Santa sent me,’ and hand out fake coupons. They’ll be too dazzled to notice their Paw Patrol backpacks are gone.” Bill stared into the middle distance. “I still think this is the dumbest plan since the Great Sock Puppet Uprising.” “It’s not dumb,” Molly said, “it’s *innovative*. By dawn, we’ll be rich in toys, snacks, and possibly a few glitter-glue masterpieces. Who knows? We could start a black-market daycare.” “And if we get arrested?” Molly shrugged. “We plead temporary insanity. It’s socially accepted during holiday seasons.” Bill groaned, standing up. “Fine. But if I get tackled by a child in a T-Rex costume, I’m blaming you, the squirrel, and the inherent unpredictability of snack time.”
created_at: "2026-03-25T10:27:49.253Z" (3/25/2026, 10:27:49 AM)
model: "qwen/qwen3-235b-a22b-07-25"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 83
moderation_latency: (null)
generation_time: 16681
tokens_prompt: 77
tokens_completion: 966
native_tokens_prompt: 72
native_tokens_completion: 820
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0000892
router: (null)
provider_responses
0
endpoint_id: "c9f927db-d559-49eb-9846-80862c5e3b4e"
id: "chatcmpl-9b258a515ed84d50"
is_byok: false
latency: 29
model_permaslug: "qwen/qwen3-235b-a22b-07-25"
provider_name: "WandB"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1774434469-p2dXOnP2MV9LRHwnZhGs"
upstream_id: "chatcmpl-9b258a515ed84d50"
total_cost: 0.0000892
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "WandB"
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 533

Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 74.40%, Deviation: 44.40%
Meta Data: neededClean: false, wordsTotal: 543, wordsDialogue: 404

Overall
Result: 0.0004%
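
The reported ratio and deviation can be reproduced from the metadata counts (wordsDialogue 404, wordsTotal 543) and the 30% target in the scenario. The sketch below is an illustration only. The function names are hypothetical, and the quote-matching rule (words inside curly or straight double quotes count as dialogue) is an assumption, since the page does not say how the evaluator tokenizes or detects dialogue.

```python
import re

# Assumption: dialogue is any text inside curly or straight double quotes.
# The benchmark's real detection rule is not documented on this page.
QUOTED = re.compile(r'“([^”]*)”|"([^"]*)"')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simple stand-in tokenizer)."""
    return len(text.split())


def dialogue_stats(text: str, target: float = 0.30) -> dict:
    """Hypothetical helper: estimate the dialogue share of a passage."""
    total = count_words(text)
    dialogue = sum(count_words(a or b) for a, b in QUOTED.findall(text))
    ratio = dialogue / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


def ratio_from_counts(words_dialogue: int, words_total: int, target: float = 0.30):
    """Recompute ratio and absolute deviation from already-counted words."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Counts taken from the Meta Data of this run.
ratio, deviation = ratio_from_counts(404, 543)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

With the counts from this run, the printed line matches the Details field: a ratio of 74.40% against a 30% target gives a deviation of 44.40%. How that deviation is mapped to the 0.00% score is not stated on the page and is not modeled here.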