NC Bench
Run: 2026-03-25-01KMJ7Y90D5308KCEJ4EE70X8B
Run Details
Model: Qwen3 235B A22B Instruct 2507
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 13.0s
Cost: $0.000194
Provider: Novita
Evaluation summary
Score | Evaluator
0.00% | Matches word count
77.08% | Dialogue to Total Word Ratio
38.5407% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Rain pattered against the greasy window of the unmarked van parked outside Chuckles the Clown’s Birthday Barn. Inside, Molly adjusted her thermal goggles, which she had painted with glittery unicorn stickers for morale. Across from her, Bill meticulously duct-taped foam padding to his elbows and knees, resembling a man preparing for competitive naptime. A large blueprint of the party barn—stolen from the city’s zoning office and slightly stained with what might have been nacho cheese—lay spread across a folding table, weighed down by a rubber chicken and three half-eaten granola bars. Molly circled the bounce house in red marker, muttering about structural weak points and the psychological advantage of surprise candy tosses. Bill tested his silent shoe inserts, which squeaked like wounded rodents no matter how slowly he stepped. They had rehearsed the distraction: Bill would dress as a slightly-off version of Chuckles, reciting knock-knock jokes in a monotone until the children wept from confusion. Molly, disguised as a sentient cupcake with limited mobility, would slip into the gift corral during the ensuing chaos. A toy wind-up penguin marched in circles near the radio, contributing nothing. The clock ticked toward 2:47 PM—the precise moment when cake-induced pandemonium peaked. Molly double-checked the loot bag: industrial-grade pillowcase, scented markers (for disguising the re-gifting later), and a permission slip forged to look like it came from “Mom of All Heists.” “Ready,” Molly whispered. “Born ready,” Bill lied.
created_at: "2026-03-25T10:16:45.329Z" (3/25/2026, 10:16:45 AM)
model: "qwen/qwen3-235b-a22b-07-25"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 424
moderation_latency: (null)
generation_time: 12733
tokens_prompt: 77
tokens_completion: 402
native_tokens_prompt: 72
native_tokens_completion: 323
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00019382
router: (null)
provider_responses:
  0:
    endpoint_id: "cf186489-d252-4fb7-aca9-87ef7b557eff"
    id: "c577166585a8c1f2f28557deae15b0cd"
    is_byok: false
    latency: 424
    model_permaslug: "qwen/qwen3-235b-a22b-07-25"
    provider_name: "Novita"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1774433805-lGVdYwgaPTyP0w5m1agy"
upstream_id: "c577166585a8c1f2f28557deae15b0cd"
total_cost: 0.00019382
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Novita"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 237
77.08% | Dialogue to Total Word Ratio | Ratio: 2.86%, Deviation: 7.14% | neededClean: false, wordsTotal: 245, wordsDialogue: 7
38.5407% | Overall
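The ratio and deviation reported above can be checked by hand from wordsDialogue and wordsTotal. The following is a minimal sketch, not the evaluator's actual code: the function names are hypothetical, the quoted-word counting rule (words inside curly double quotes, split on whitespace) is an assumption, and treating the overall score as the unweighted mean of the two evaluator scores is an assumption that happens to be consistent with the figures shown. How the 77.08% score is derived from the deviation is not stated on this page, so it is not reproduced here.

```python
import re

TARGET_PERCENT = 10.0  # from the scenario: "Write 200 words with 10% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of total words."""
    return 100.0 * words_dialogue / words_total


def count_quoted_words(text: str) -> int:
    """Count whitespace-separated words inside curly double quotes.

    Assumed rule for illustration only; the real evaluator may differ.
    """
    quoted_segments = re.findall(r"“([^”]*)”", text)
    return sum(len(segment.split()) for segment in quoted_segments)


# Figures taken from the evaluation details above.
ratio = dialogue_ratio(words_dialogue=7, words_total=245)
deviation = abs(TARGET_PERCENT - ratio)

# Assumption: overall score is the unweighted mean of the two evaluator scores.
overall = (0.00 + 77.08) / 2

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
print(f"Overall (from rounded scores): {overall:.2f}%")
```

Under the assumed counting rule, the closing exchange of the response contributes three dialogue words ("Ready," and "Born ready,"), and the quoted phrase “Mom of All Heists.” contributes four more, which would account for the reported wordsDialogue of 7.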