NC Bench
Run: 2026-03-25-01KMJ8CAAGA8E5QANZ5X0ZQ9WE
Run Details
Model: Qwen3 235B A22B Instruct 2507
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 24.0s
Cost: $0.000621
Provider: Novita
Evaluation summary
Score     Evaluator
0.00%     Matches word count
0.11%     Dialogue to Total Word Ratio
0.0564%   Overall
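The page does not say how Overall is derived. One reading fits the numbers, but it is only an assumption: Overall is the unweighted mean of the two evaluator scores. Under that reading, the displayed 0.11% would be rounded from an unrounded value near 0.1128%. That figure is back-solved below and is not reported anywhere on the page.

```python
# Assumption: Overall = unweighted mean of the evaluator scores (not stated by the page).
word_count_score = 0.00        # percent, "Matches word count" as displayed
dialogue_ratio_score = 0.1128  # percent, hypothetical unrounded value, back-solved from Overall

overall = (word_count_score + dialogue_ratio_score) / 2

print(f"{overall:.4f}%")               # 0.0564%
print(f"{dialogue_ratio_score:.2f}%")  # 0.11%, matching the displayed evaluator score
```

If the benchmark uses weights or a different aggregation, this check does not apply.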
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The attic smelled of mothballs and regret. Molly crouched beside a dusty skylight, peering down through the smudged glass at the chaos below: a jungle gym shaped like a dragon, a clown juggling rubber chickens, and a piñata shaped like a unicorn shedding glitter. Bill stood behind her, nervously adjusting a snorkel over his ears.
Molly: “Bill. The snorkel stays. Take it off.”
Bill: “I need it. What if there’s punch? Splash zone’s no joke.”
Molly: “You’re not swimming through fruit punch. You’re stealing presents at a six-year-old’s birthday party.”
Bill: “Exactly. It’s a warzone.”
Molly: “It’s cupcakes and karate lessons.”
Bill: “Last week I got hit in the eye with a foam Nunchuck. I still see stars when I sneeze.”
Molly: “You sneeze because you’re allergic to glitter, Bill. Take the snorkel off. You look like a depressed sea mammal.”
Bill: “The glitter is airborne, Molly. It’s a hazard.”
Molly: “So’s your fashion sense.”
Bill: “I resemble a professional.”
Molly: “You resemble a man who confused a scuba store with a stealth warehouse.”
Bill: “I’ll have you know, black neoprene is excellent camouflage. Especially against bounce house walls.”
Molly: “We’re not hiding in bounce houses. We’re getting in, grabbing the presents, and getting out. Quiet. Efficient. No snorkels.”
Bill: “What if the birthday boy opens the treasure chest early?”
Molly: “Then we move up the timetable.”
Bill: “And risk exposure?”
Molly: “Bill, the treasure chest is full of dollar-store puzzles and glow-in-the-dark yo-yos.”
Bill: “Still—security risk.”
Molly: “The only security here is a woman named Linda who’s been chain-drinking lemonade since noon.”
Bill: “You don’t know what she’s capable of.”
Molly: “She spilled cake on a teddy bear and cried.”
Bill: “Distraction technique?”
Molly: “Bill.”
Bill: “What? People crack under pressure.”
Molly: “You’re overthinking. It’s a party, not Fort Knox.”
Bill: “Fort Knox doesn't have a dunk tank. Ever think about that?”
Molly: “No.”
Bill: “I do. Because I’ve been dunked. Twice.”
Molly: “Were you trying to steal from a birthday party then?”
Bill: “No, but—”
Molly: “No buts. We go in at 2:15, right after the magician makes a bunny disappear. Distracted parents, exhausted kids, prime window.”
Bill: “What if the bunny doesn’t disappear?”
Molly: “Then we improvise.”
Bill: “What if he’s fake? What if the bunny’s complicit?”
Molly: “The bunny’s not complicit.”
Bill: “Bunnies are excellent at deception. They’re quiet. They judge you.”
Molly: “Focus. In and out. Gifts only. No snacking from the snack table.”
Bill: “I won't promise anything. Those mini quiches look suspiciously gourmet.”
Molly: “You touch one quiche and you’ll be the next missing bunny.”
Bill: “I’m just saying, if I faint from low blood sugar during extraction—”
Molly: “Then faint quietly.”
Bill: “And the piñata? It’s full of candy. Candy’s basically currency in preschool economies.”
Molly: “We’re here for the loot, not the loot-loot.”
Bill: “There’s a difference?”
Molly: “Yes. The wrapped boxes. The brand names. The gift receipts.”
Bill: “But the candy’s got wrappers too!”
Molly: “No candy.”
Bill: “What if I just take the unicorn one? For morale.”
Molly: “No.”
Bill: “What if I promise to share?”
Molly: “Still no.”
Bill: “I’ll let you have the chocolate one.”
Molly: “Bill.”
Bill: “Fine. But if I start hallucinating from sugar withdrawal, don’t blame me when I try to ride the slide like a dragon.”
Molly: “Just remember the plan. In. Out. No snorkel. No snacks. No side quests.”
Bill: “But what if the clown knows something?”
Molly: “The clown’s name is Gary. He drives a minivan. He’s got three kids and a back problem.”
Bill: “Exactly. Motive.”
Molly: “Gary’s not our problem.”
Bill: “He’s smiling too much.”
Molly: “It’s his job.”
Bill: “People have killed for less.”
Molly: “Gary hasn’t killed anyone. Now grab the bag and let’s move.”
Bill: “You sure about the snorkel?”
Molly: “I’ll be sure when it's in the dumpster. Let’s go.”
created_at: "2026-03-25T10:24:25.429Z" (3/25/2026, 10:24:25 AM)
model: "qwen/qwen3-235b-a22b-07-25"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1313
moderation_latency: (null)
generation_time: 23870
tokens_prompt: 77
tokens_completion: 1237
native_tokens_prompt: 72
native_tokens_completion: 1059
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0006207
router: (null)
provider_responses:
  [0]
    endpoint_id: "ae57827e-bc4d-483b-abe6-fa519c237237"
    is_byok: false
    latency: 52
    model_permaslug: "qwen/qwen3-235b-a22b-07-25"
    provider_name: "Crusoe"
    status: 429
  [1]
    endpoint_id: "cf186489-d252-4fb7-aca9-87ef7b557eff"
    id: "c1e005992d91efbfd1e8965350cec029"
    is_byok: false
    latency: 1313
    model_permaslug: "qwen/qwen3-235b-a22b-07-25"
    provider_name: "Novita"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1774434265-zOdkDLsGOrIa4lf66dzB"
upstream_id: "c1e005992d91efbfd1e8965350cec029"
total_cost: 0.0006207
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Novita"
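A few figures can be derived from the metadata above, and the sketch below computes them. It rests on two assumptions. First, `latency` and `generation_time` are in milliseconds, which fits the 24.0s Duration. Second, dividing completion tokens by `generation_time` gives a rough throughput figure, not an exact decode speed. The provider list reads as one failed attempt followed by a successful one. Crusoe returned HTTP 429 (Too Many Requests), and Novita then served the request with HTTP 200.

```python
# Values copied from the run metadata; units (ms) are an assumption.
meta = {
    "generation_time": 23870,
    "tokens_completion": 1237,
    "native_tokens_completion": 1059,
    "provider_responses": [
        {"provider_name": "Crusoe", "status": 429, "latency": 52},
        {"provider_name": "Novita", "status": 200, "latency": 1313},
    ],
}

seconds = meta["generation_time"] / 1000
tps = meta["tokens_completion"] / seconds                # router-counted tokens per second
native_tps = meta["native_tokens_completion"] / seconds  # provider-native tokens per second

# Attempts that did not succeed, and the provider that finally answered.
responses = meta["provider_responses"]
failed = [p["provider_name"] for p in responses if p["status"] != 200]
served_by = next(p["provider_name"] for p in responses if p["status"] == 200)

print(f"{tps:.1f} tok/s ({native_tps:.1f} native)")  # 51.8 tok/s (44.4 native)
print(f"failed: {failed}, served by: {served_by}")   # failed: ['Crusoe'], served by: Novita
```

The two throughput figures differ only because `tokens_completion` (1237) and `native_tokens_completion` (1059) count the same output with different tokenizers.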
Evaluation details
Matches word count: 0.00%
  Details: n/a
  Meta data: neededClean: false; words: 645
Dialogue to Total Word Ratio: 0.11%
  Details: Ratio: 81.65%, Deviation: 11.65%
  Meta data: neededClean: false; wordsTotal: 654; wordsDialogue: 534
Overall: 0.0564%
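The reported ratio and deviation can be reproduced from the evaluator metadata, which gives wordsTotal 654 and wordsDialogue 534, against the 70% target in the scenario. The word-count overshoot follows the same way from words 645 against the 500-word target. The `dialogue_word_ratio` function is hypothetical and is not the benchmark's evaluator. It assumes dialogue is whatever sits between curly quotes and splits words on whitespace. Speaker tags such as "Molly:" therefore count as narration. Small tokenization choices like this shift totals, and the page's two evaluators already disagree on the total (645 vs 654 words).

```python
import re

# Figures reported by the evaluators above.
words_total = 654
words_dialogue = 534
target_ratio = 0.70     # "70% dialogue" from the scenario
words_counted = 645     # "Matches word count" evaluator
target_words = 500      # "500 words" from the scenario

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)
overshoot = words_counted / target_words - 1

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")  # Ratio: 81.65%, Deviation: 11.65%
print(f"Word count overshoot: {overshoot:.0%}")           # Word count overshoot: 29%


def dialogue_word_ratio(text: str) -> float:
    """Hypothetical counter: share of whitespace-split words inside “…” quotes."""
    total = len(text.split())
    quoted = re.findall(r"“(.*?)”", text, flags=re.S)
    spoken = sum(len(q.split()) for q in quoted)
    return spoken / total if total else 0.0


sample = "Molly: “It’s his job.” Bill: “People have killed for less.”"
print(f"{dialogue_word_ratio(sample):.2%}")  # 80.00%
```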