NC Bench
Run: 2026-03-25-01KMJ8SFYWWY5YK0YFG3Q7BEHS
Run Details
Model: Qwen3 235B A22B Instruct 2507
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 50.6s
Cost: $0.000470
Provider: SiliconFlow
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 5.92%
Overall: 2.9625%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The alley behind the bouncing castle snaked between a petting zoo and a suspiciously sticky carousel. Molly crouched behind a dumpster labeled “Only Slightly Used Balloons,” binoculars pressed to her face, scanning the chaos of squealing toddlers and over-caffeinated parents at Timmy’s 7th Birthday Bonanza. Bill adjusted his trench coat, the kind usually seen on noir detectives, not men planning to rob a piñata. Molly lowered the binoculars. “We’re clear. Parents are distracted. One’s trying to free her shoe from the moon bounce entrance. Another is sobbing into a cupcake. It’s beautiful.” Bill scratched his chin. “You sure about this? Kids? Stealing from kids? That’s low, Molly. Even for us.” “Low? Bill, this is peak efficiency. Look at that pile of presents. Three-foot-high, wrapped in cartoon dinosaurs. Each one probably contains a remote-control something or an art kit they’ll lose interest in by Tuesday. We’re doing them a favor—recycling underutilized joy.” “You’re calling stealing ‘recycling’ now?” “Call it redistribution. Capitalism with a conscience. Also, I heard there’s a limited-edition glow-in-the-dark T-Rex in there. Worth at least six hundred on eBay. Kids don’t appreciate vintage dinos.” Bill groaned. “I still say we should’ve stuck to banks. At least banks don’t have face paint.” “Banks have alarms. Kids have juice boxes and nap schedules. We hit hard, we hit fast, during the cake sing. That’s when they’re all distracted by candlelight and forced participation.” “You expect me to crawl through a foam ball pit again?” “Only halfway. Then you emerge near the gift table like some kind of party horror. But cuter.” “I’m not cute!” “You wore a bowtie to rob a credit union last month.” “That was strategic!” “It made you look like a confused penguin.” Bill muttered. “I’m only doing this because you owe me from the squirrel incident.” “Which I’ve apologized for twelve times. And in writing!” “With crayon.” “It was heartfelt!” A balloon exploded nearby. 
Both flinched. Molly checked her watch. “Cake time in five. You ready?” “I had stage fright in kindergarten. During show and tell.” “That was thirty years ago, Bill.” “I showed up with a dead frog. No one clapped.” “This isn’t show and tell. This is felony-level gift acquisition. Just grab the pile and run. And for God’s sake, don’t make eye contact with any kids.” “What if they *sing* at me?” “They won’t. They’ll be too busy chewing frosting off the walls.” “What about the clown?” “The clown’s in the porta-potty. I saw him go in with a flask.” “You’re sure?” “I’ve been surveilling this party since Thursday. I know the ice cream man’s secret stash. I know which child cries when the music stops. I know where Timmy hides his good Legos.” Bill sighed. “You’re terrifying.” “Flattery won’t get you out of this.” A chorus of off-key voices began to rise from the party tent. “Happy birthday to you…” Molly snapped her fingers. “Go! Operation: Unwanted Redistribution is a go!” Bill straightened his coat. “If I get glitter on me, I’m blaming you.” “If you get glitter on you, you’re doing it right.” He took a deep breath, then dove headfirst into the foam ball pit. Molly whispered into her walkie-talkie. “Subject entering. Let’s give Timmy the best heist birthday surprise he never asked for.”
created_at: "2026-03-25T10:31:37.184Z" (3/25/2026, 10:31:37 AM)
model: "qwen/qwen3-235b-a22b-07-25"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 4673
moderation_latency: (null)
generation_time: 50538
tokens_prompt: 77
tokens_completion: 972
native_tokens_prompt: 72
native_tokens_completion: 773
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00047028
router: (null)
provider_responses:
  0:
    endpoint_id: "ce835ea8-00c7-4686-831d-7733dac5e2d9"
    id: "019d248ccfc9d60523fe05de8ce88a95"
    is_byok: false
    latency: 4673
    model_permaslug: "qwen/qwen3-235b-a22b-07-25"
    provider_name: "SiliconFlow"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1774434697-TfyHFEpoatSBaYmcEg3Y"
upstream_id: "019d248ccfc9d60523fe05de8ce88a95"
total_cost: 0.00047028
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "SiliconFlow"
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 544

Dialogue to Total Word Ratio
Result: 5.92%
Details: Ratio: 77.52%, Deviation: 7.52%
Meta Data: neededClean: false, wordsTotal: 556, wordsDialogue: 431

Overall
Result: 2.9625%
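
The reported ratio and deviation can be reproduced from the word counts in the metadata. The sketch below is only an arithmetic check, not the benchmark's evaluator code. The variable names are illustrative, and the 70% target is taken from the scenario text. The last step assumes that the overall score is the unweighted mean of the two evaluator scores, which the page does not state. Under that assumption the dialogue score would be 5.925% before being displayed as 5.92%. How a 7.52% deviation maps to that score is not described here, so it is not reproduced.

```python
# Counts reported in the evaluation metadata above.
words_dialogue = 431
words_total = 556
target_ratio = 0.70  # from the scenario: 70% dialogue

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)

ratio_pct = round(ratio * 100, 2)
deviation_pct = round(deviation * 100, 2)

# Assumption (not stated on the page): Overall is the unweighted mean
# of the two evaluator scores, so the unrounded dialogue score can be
# backed out from the reported overall value.
word_count_score = 0.0
overall = 2.9625
implied_dialogue_score = 2 * overall - word_count_score

print(f"ratio: {ratio_pct}%  deviation: {deviation_pct}%")
print(f"implied dialogue score: {implied_dialogue_score}%")
```

The first two values agree with the "Ratio: 77.52%, Deviation: 7.52%" line, which suggests the deviation is the absolute difference from the 70% target.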