NC Bench
Run: 2026-03-25-01KMJ8VE44RQA8M6PP34DNYTC3
Run Details
Model: Qwen3 235B A22B Instruct 2507
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 14.3s
Cost: $0.000083
Provider: WandB
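The Duration and Cost figures can be cross-checked against the raw request metadata listed further down this page (`generation_time`, `latency`, `native_tokens_completion`, `total_cost`). The sketch below is only a rough check: it assumes the two timing fields are in milliseconds and the cost is in US dollars, which the page does not state, and the helper name `tokens_per_second` is illustrative. Adding `latency` to `generation_time` is one reading that happens to agree with the 14.3s Duration; the page does not say how Duration is derived.

```python
# Values copied from the request metadata on this page.
generation_time_ms = 14246        # assumed to be milliseconds
latency_ms = 83                   # assumed to be milliseconds
native_tokens_completion = 757
total_cost_usd = 0.0000829        # assumed to be US dollars


def tokens_per_second(tokens: int, elapsed_ms: int) -> float:
    """Completion throughput for a given token count and elapsed time."""
    return tokens / (elapsed_ms / 1000)


throughput = tokens_per_second(native_tokens_completion, generation_time_ms)

# Assumption: wall-clock duration is time-to-first-token plus generation time.
wall_clock_s = (generation_time_ms + latency_ms) / 1000

print(f"throughput: {throughput:.1f} tokens/s")
print(f"duration:   {wall_clock_s:.1f}s")       # 14.3s
print(f"cost:       ${total_cost_usd:.6f}")     # $0.000083
```

Under these assumptions the rounded duration and cost agree with the Run Details values.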
Evaluation summary
Score     Evaluator
0.21%     Matches word count
0.00%     Dialogue to Total Word Ratio
0.1070%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The alley behind Chuckles & Balloons Party Palace stank of wet confetti and expired cake frosting. Molly crouched behind a dumpster painted with a grinning cartoon clown, binoculars pressed to her eyes. Her knuckles were scraped, her beanie was crooked, and she had glitter in places glitter had no business being. Bill, meanwhile, stood three feet away, trying to blow up a balloon with a faulty pump. It made a sad, dying goose noise each time he squeezed the handle.
“Stop that,” Molly hissed, not lowering the binoculars. “You’re going to summon some kid with separation anxiety.”
“I’m getting into character,” Bill said, puffing his cheeks. “I’m a party pro. Balloon animals are my specialty. Especially giraffes. And also that one weird squid thing.”
“You can’t even make the balloon inflate.”
“That’s what she said.”
Molly lowered the binoculars and stared at him. “You’re not getting near a child today, Bill. Not with that face. You look like a raccoon who’s seen too much.”
“My charm is undeniable,” Bill said, tossing the deflated balloon aside. “Besides, I’ve studied. I can do the chicken dance. I can say 'pin the tail on the donkey' in five languages. I even have a party hat.” He produced a crumpled paper cone from his coat. “See?”
“You look like a sad traffic cone having an existential crisis.”
“Hey, respect the hat. It’s vintage.”
Molly checked her watch. “Look, kids start arriving in ten minutes. We’re not here to win Best Dressed at Clown Con. We’re here for the loot. Birthday presents stack up like bricks in that back room during the magic show. We hit it then. You distract, I grab.”
“And how, pray tell, am I supposed to distract a bunch of sugar-crazed toddlers?”
“You’re the expert,” Molly said. “Improvisation. Juggling. Pretend the cake is haunted.”
“I like it. Haunted cake routine—classic.”
“They’re children, Bill, not film critics.”
“But the timing has to be perfect. I can’t just burst in yelling ‘The cupcakes are possessed!’ People will suspect.”
“Then be subtle. Say the sprinkles are screaming. Kids love that kind of nonsense.”
Bill scratched his chin. “What if I go full puppet? I bring a sock, put a face on it, call him Greg.”
“Greg the sock.”
“Greg the Haunted Sprinkle Prophet. He delivers the warning.”
Molly blinked. “...You might be onto something.”
“Told you. I’ve got range.”
“And you won’t eat any of the candy?”
“Cross my heart. Though... can I keep one whoopee cushion? For sentimental reasons?”
“No one needs that much sentiment.”
A car horn chirped in the distance. They both froze.
“That’s the first parent,” Molly whispered. “Showtime. Remember—chaos, not carnage. And no adopting any kids.”
“I make no promises,” Bill said, pulling out a ukulele from behind the dumpster. “Some children have excellent rhythm.”
Molly rolled her eyes and tugged the binoculars back up. “Just keep them busy. I want every gift bag, every wrapped monstrosity, every glittery envelope addressed to ‘Timmy (but probably for Jenny).’ And Bill?”
“Yeah?”
“If you play ‘Baby Shark,’ so help me, I’ll leave you handcuffed to the piñata.”
Bill grinned, tuning a string. “But it’s in four languages…”
created_at: "2026-03-25T10:32:40.84Z"
model: "qwen/qwen3-235b-a22b-07-25"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 83
moderation_latency: (null)
generation_time: 14246
tokens_prompt: 77
tokens_completion: 914
native_tokens_prompt: 72
native_tokens_completion: 757
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0000829
router: (null)
provider_responses:
  0:
    endpoint_id: "c9f927db-d559-49eb-9846-80862c5e3b4e"
    id: "chatcmpl-906f272ebfe984ec"
    is_byok: false
    latency: 30
    model_permaslug: "qwen/qwen3-235b-a22b-07-25"
    provider_name: "WandB"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1774434760-6GZdTgdeWpLDRQ5uBe41"
upstream_id: "chatcmpl-906f272ebfe984ec"
total_cost: 0.0000829
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "WandB"
Evaluation details
Result: 0.21%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false; words: 528

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 69.00%, Deviation: 39.00%
Meta Data: neededClean: false; wordsTotal: 529; wordsDialogue: 365

Overall: 0.1070%
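The reported ratio and deviation follow from the `wordsDialogue` and `wordsTotal` values and the 30% target in the scenario. The sketch below reproduces that arithmetic and shows one simple way such a ratio could be measured. It is not the benchmark's actual evaluator: the function names are hypothetical, and treating text between curly double quotes as dialogue and splitting on whitespace are assumptions that need not match how NC Bench counts words.

```python
import re
from typing import Tuple


def dialogue_ratio(text: str) -> Tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: dialogue is any span between curly double quotes,
    and a word is any whitespace-separated token.
    """
    total_words = len(text.split())
    spoken_spans = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in spoken_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio: float, target: float = 0.30) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Arithmetic check using the counts reported in the evaluation details.
reported_ratio = 365 / 529
print(f"Ratio: {reported_ratio:.2%}")                             # Ratio: 69.00%
print(f"Deviation: {deviation_from_target(reported_ratio):.2%}")  # Deviation: 39.00%

# Usage on a short sample: 5 quoted words out of 7 in total.
sample = "“Stop that,” Molly hissed. “You are loud.”"
print(dialogue_ratio(sample))
```

With 365 of 529 words counted as dialogue, the response is more than twice as dialogue-heavy as the 30% the prompt asked for, which is consistent with the 0.00% score for this evaluator.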