NC Bench
Run: 2026-03-12-01KKHZP8N2FE32CGAXM8XYMMFM
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 31.6s
Cost: $0.047904
Provider: xAI
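The reported cost can be cross-checked against the native token counts listed in this run's metadata (native_tokens_prompt 184, native_tokens_cached 128, native_tokens_completion 7961). The sketch below is only an illustration: the per-million-token rates are assumptions that do not appear anywhere on this page, chosen because they happen to reproduce the reported total_cost and cache_discount. It also assumes that the 7125 reasoning tokens are counted inside the native completion total and billed at the output rate. The function name `estimate_cost` is hypothetical.

```python
# Assumed USD rates per million tokens. These are NOT stated in the run page;
# they are illustrative values that reproduce the reported figures.
ASSUMED_INPUT_RATE = 2.00
ASSUMED_CACHED_INPUT_RATE = 0.20
ASSUMED_OUTPUT_RATE = 6.00


def estimate_cost(prompt_tokens: int, cached_tokens: int, completion_tokens: int):
    """Return (total_cost, cache_discount) in USD under the assumed rates."""
    uncached_tokens = prompt_tokens - cached_tokens
    total_cost = (
        uncached_tokens * ASSUMED_INPUT_RATE
        + cached_tokens * ASSUMED_CACHED_INPUT_RATE
        + completion_tokens * ASSUMED_OUTPUT_RATE
    ) / 1_000_000
    # Savings from billing cached prompt tokens at the cheaper cached rate.
    cache_discount = (
        cached_tokens * (ASSUMED_INPUT_RATE - ASSUMED_CACHED_INPUT_RATE) / 1_000_000
    )
    return total_cost, cache_discount


# Native token counts taken from this run's metadata.
cost, discount = estimate_cost(prompt_tokens=184, cached_tokens=128, completion_tokens=7961)
print(f"total_cost={cost:.7f} cache_discount={discount:.7f}")
```

Under these assumed rates almost all of the cost comes from completion tokens, which is consistent with most of the native completion count being reasoning tokens rather than visible output.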
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.21% | Dialogue to Total Word Ratio
0.1031% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In their cluttered basement hideout, Molly unrolled a crude map drawn on a napkin while Bill nervously twirled a fake mustache between his fingers. "Alright Bill, let's nail this down. We're crashing that kiddie party to swipe all the presents. It's a genius move because who suspects thieves at a six year old's celebration?" Molly began with enthusiasm. "Molly, I gotta say, this plan makes me uneasy. Stealing toys from tots? What's next, taking candy from babies? Literally?" Bill questioned, frowning deeply. "Oh come on, don't go soft on me now. These aren't just any toys. It's a wealthy family's bash with designer gifts and electronics disguised as kid stuff. We'll be rolling in high end merchandise we can fence easily," Molly countered while rolling her eyes. "But the logistics, Molly. How exactly do we pull this off without a swarm of angry moms chasing us?" Bill asked while pacing the room anxiously. "We pose as the magicians. I did my research. The entertainment canceled last minute. I made sure of that with a little anonymous call. You will be the bumbling assistant who distracts everyone with bad jokes and card tricks while I use my sleight of hand to make the presents disappear into our truck," she explained with a wink. "Bad jokes? I don't know any kid friendly jokes. What am I supposed to say, why did the chicken cross the playground?" Bill said sarcastically, throwing his hands up. "Perfect, they'll eat it up. Meanwhile, I'll suggest a game of pass the parcel but the parcel is all the gifts and it ends up in our vehicle. When they sing happy birthday that's our cue to vamoose," Molly continued confidently. "The birthday song cue is smart. But what if there's a bouncy castle? Kids will be all over it and might see us," Bill said as he scratched his chin thoughtfully. "We use the bouncy castle to our advantage. You start jumping around like an idiot to draw attention while I load up. 
Make it look like you're part of the fun," she suggested with a sly smile. "Okay but if a kid asks for their present back what do I do? Hand it over and say sorry?" Bill wondered aloud. "No you fool. You tell them it's magic and the presents are being turned into real live ponies or something ridiculous. Kids will believe it and get excited," Molly laughed heartily. "This is getting crazier by the minute. What about getaway? The street will be packed with SUVs," Bill pointed out, shaking his head. "There's an alley behind the house. We have our scooter waiting there disguised as a delivery bike. We'll put the bags in the side compartments and scoot away before anyone notices the pile is gone," Molly assured him, patting his back. "Scooter? You expect me to ride away on a scooter with clown makeup on?" Bill exclaimed in disbelief, his eyes widening. "It's comedic gold. No one will take it seriously until it's too late. Now let's review the checklist. Disguises, check. Distraction tools like whoopee cushions and silly string, check. And most importantly, our story if questioned. We're there to make the kids smile," she said, ticking off items on her fingers. "I can't believe I'm agreeing to this but okay. The plan is set. We hit the party at four o'clock sharp, you do your thing, I do mine, and we meet up by the fence with the loot. If we get caught I'm saying you forced me," Bill sighed heavily. "Forced you? With what, my sparkling personality? Fine, but we'll be successful. Imagine the headlines: Dynamic Duo Dashing with Dozens of Delightful Gifts. It'll be our funniest heist yet," Molly said with a chuckle. "You're right, it does sound funny. Let's do this. But promise me we donate one toy to charity or something to balance the karma," Bill smiled reluctantly. "Deal, but only the broken one. Now stop worrying and help me pack the bag of tricks," Molly concluded as she stood up and grabbed the bag.
created_at
"2026-03-12T21:36:52.39Z"
3/12/2026, 9:36:52 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
29323
moderation_latency
(null)
generation_time
31391
tokens_prompt
77
tokens_completion
965
native_tokens_prompt
184
native_tokens_completion
7961
native_tokens_completion_images
0
native_tokens_reasoning
7125
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0479036
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"b552ca3b-2a65-a496-4f24-ff43fc2a78f0"
is_byok
false
latency
258
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773351412-Sai1owV2tYgsq2Dq1yJe"
upstream_id
"b552ca3b-2a65-a496-4f24-ff43fc2a78f0"
total_cost
0.0479036
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 678
0.21% | Dialogue to Total Word Ratio | Ratio: 81.12%, Deviation: 11.12% | neededClean: false, wordsTotal: 678, wordsDialogue: 550
0.1031% | Overall
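The ratio and deviation reported for the dialogue evaluator follow directly from the two word counts: 550 dialogue words out of 678 total, measured against the 70% target in the prompt. The sketch below reproduces that arithmetic. The quote-based counter is only an assumption about how dialogue words might be identified (text between straight double quotes); the benchmark's actual counting and scoring rules are not shown on this page, and both function names are hypothetical.

```python
import re


def dialogue_word_counts(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words).

    Assumption: dialogue is any text between straight double quotes,
    and words are whitespace-separated tokens.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    return dialogue_words, total_words


def ratio_and_deviation(dialogue_words: int, total_words: int, target: float = 0.70):
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = dialogue_words / total_words
    return ratio, abs(ratio - target)


# Counts taken from the evaluation metadata (wordsDialogue, wordsTotal).
ratio, deviation = ratio_and_deviation(550, 678)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 81.12%, Deviation: 11.12%"
```

The same counts also explain the word count result: 678 words is well above the 500 requested, and the response overshoots the dialogue target by about 11 percentage points.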