NC Bench
Run: 2026-03-25-01KMJ8NANK7KZ8MSQ2A04S8M91
Run Details
Model: Qwen3 235B A22B Instruct 2507
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 19.4s
Cost: $0.000451
Provider: Novita
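The Duration and Cost shown here look like rounded forms of the generation_time and total_cost fields in the response metadata further down the page. The sketch below checks that reading and derives a rough completion throughput. It assumes generation_time is in milliseconds and total_cost is in US dollars; neither unit is stated on the page, and `completion_tokens_per_second` is a hypothetical helper name.

```python
# Values copied from the response metadata on this page.
run_meta = {
    "generation_time": 19418,          # assumed to be milliseconds
    "native_tokens_completion": 767,
    "total_cost": 0.00045134,          # assumed to be US dollars
}


def completion_tokens_per_second(meta):
    """Native completion tokens divided by generation time in seconds."""
    seconds = meta["generation_time"] / 1000
    return meta["native_tokens_completion"] / seconds


print(f"duration: {run_meta['generation_time'] / 1000:.1f}s")
print(f"cost: ${run_meta['total_cost']:.6f}")
print(f"throughput: {completion_tokens_per_second(run_meta):.1f} tokens/s")
```

Under those assumptions the rounded duration and cost match the Run Details values, and throughput comes out at roughly 39.5 native completion tokens per second.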
Evaluation summary
Matches word count: 75.16%
Dialogue to Total Word Ratio: 0.30%
Overall: 37.7281%
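The page does not state how the overall score is aggregated. As a rough check, the reported value is consistent with an unweighted mean of the two evaluator scores. The sketch below tests that assumption; `unweighted_overall` is a hypothetical helper, and the small gap to the reported figure would be explained by the displayed scores being rounded.

```python
from statistics import mean

# Scores copied from the evaluation summary (percent, as displayed).
evaluator_scores = {
    "Matches word count": 75.16,
    "Dialogue to Total Word Ratio": 0.30,
}
reported_overall = 37.7281


def unweighted_overall(scores):
    """Arithmetic mean of the per-evaluator scores (an assumed aggregation rule)."""
    return mean(scores.values())


estimate = unweighted_overall(evaluator_scores)
print(f"estimated overall: {estimate:.2f}%  reported: {reported_overall}%")
```

The estimate agrees with the reported overall to within 0.01 percentage points, which supports but does not confirm the equal-weight reading.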
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The attic smelled of mothballs and regret. Molly crouched beside a dusty vent cover, clipboard in hand, chewing the end of a pink highlighter like it owed her money. Bill balanced on an upturned bucket, adjusting a camcorder mounted on a shelf. “You’re filming through the vent?” Molly said. “It’s surveillance,” Bill corrected, squinting into the lens. “We’re professionals.” “You’re filming a Chuck E. Cheese birthday party through a ventilation shaft with a camera that still runs on AA batteries.” “It has night vision!” Bill whispered dramatically. “Also, it’s not Chuck E. Cheese. It’s ‘Bouncy Barnyard.’ Different vibe. More pigs, less rat.” Molly flipped a page. “According to my intel—gleaned from bribing a janitor with gummy worms—the party starts at 2 p.m. Parents drop kids, panic briefly, then flee to the Starbucks across the street.” “Perfect. Family abandonment works in our favor.” “And the presents are piled under the ‘Barnyard Birthday Tree’—which is a plastic sapling with googly eyes—until the end of the piñata segment.” “So we hit during the piñata,” Bill said, nodding. “Chaos. Sugar rush. Glorious distraction.” “Exactly. While kids are blindfolded and whacking a donkey shaped like a cow, we swoop in, grab the loot, and vanish.” “What if someone sees us?” “They’ll think we’re clowns.” “We’re not wearing clown makeup.” “You’re wearing that striped sweater,” Molly said, eyeing his beige and maroon monstrosity. “It’s ironic,” Bill said defensively. “Also, warm.” “Forget the sweater. You’re staying outside. You’re the lookout.” “But I practiced my magic trick!” Bill pulled a bent spoon from his pocket. “‘Now you see it, now you don’t!’” “That’s not a magic trick. That’s a used utensil.” “It’s symbolic!” “No. You’re on overwatch. I go in, you watch the door. When I text *‘The cow jumps over the moon,’* you create a diversion.” “What kind of diversion?” “I don’t know. Set off the arcade claw machine. Start a conga line. 
Tell the kids the ice cream truck just arrived.” “But there’s no ice cream truck.” “Exactly. Panic. Mass hysteria. We blend in.” Bill frowned. “This feels emotionally manipulative.” “It’s psychology, Bill. We’re exploiting the inherent instability of birthday parties. One kid starts crying and the whole barn implodes.” “What if the robot chicken starts singing?” “It always sings,” Molly said. “That’s the point. Singing chicken equals zero attention on us.” “And the loot? What kind of presents are we talking?” “According to the Facebook event: ‘No toys please, gift cards only.’ But parents never listen. I saw three kids dragging unopened boxes wrapped in dinosaur paper.” “CASH CONVERTIBLE,” Bill whispered. “We could fund Operation: Fake Retirement in Belize.” “No fake retirements. We save for something meaningful.” “Like… more tools?” “Exactly.” Bill adjusted his bucket. “So I’m outside. You’re in. Piñata chaos. Text. Diversion. Exit?” “Clean, simple, emotionally devastating for six-year-olds.” Bill paused. “...Do you think we’re bad people?” “Only between 2 and 2:45 on Saturdays,” Molly said, snapping her clipboard shut. “Now go practice looking inconspicuous near a ball pit.” “But I love ball pits.” “That’s your problem,” Molly sighed. “And probably your therapist’s.”
created_at
"2026-03-25T10:29:20.695Z"
3/25/2026, 10:29:20 AM
model
"qwen/qwen3-235b-a22b-07-25"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
1018
moderation_latency
(null)
generation_time
19418
tokens_prompt
77
tokens_completion
987
native_tokens_prompt
72
native_tokens_completion
767
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00045134
router
(null)
provider_responses
0
endpoint_id
"cf186489-d252-4fb7-aca9-87ef7b557eff"
id
"71a586406f7f5114a8398c9a07603444"
is_byok
false
latency
1018
model_permaslug
"qwen/qwen3-235b-a22b-07-25"
provider_name
"Novita"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1774434560-gDpgep9lMVaMN8MhOmsP"
upstream_id
"71a586406f7f5114a8398c9a07603444"
total_cost
0.00045134
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Novita"
Evaluation details
Matches word count
Result: 75.16%
Details: n/a
Meta Data: neededClean: false, words: 513

Dialogue to Total Word Ratio
Result: 0.30%
Details: Ratio: 80.78%, Deviation: 10.78%
Meta Data: neededClean: false, wordsTotal: 515, wordsDialogue: 416

Overall: 37.7281%
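The Ratio and Deviation figures can be reproduced from the reported word counts. This is a minimal sketch, assuming the ratio is wordsDialogue divided by wordsTotal and the deviation is the absolute distance from the 70% target in the scenario. The function names are hypothetical, and the mapping from deviation to the 0.30% score is not shown on the page, so it is not reproduced here.

```python
TARGET_RATIO = 0.70  # from the scenario: "Write 500 words with 70% dialogue"


def dialogue_ratio(words_dialogue, words_total):
    """Share of all words that fall inside dialogue."""
    return words_dialogue / words_total


def deviation_from_target(ratio, target=TARGET_RATIO):
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


# Counts copied from the evaluator meta data above.
ratio = dialogue_ratio(416, 515)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 80.78%, Deviation: 10.78%"
```

The response overshoots the requested dialogue share by about 10.78 percentage points, which is the deviation the evaluator reports.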