NC Bench
Run: 2026-03-12-01KKHZC87N2395021SCFTFR6N8
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 4.3s
Cost: $0.004376
Provider: xAI
Evaluation summary
Score      Evaluator
0.00%      Matches word count
0.00%      Dialogue to Total Word Ratio
0.0000%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her fake mustache in the rearview mirror of the battered van, peering at Bill who was trying to wedge a clown wig over his bald spot. “This is the dumbest thing we’ve ever done, Bill. We’ve robbed banks, jewelry stores, even that casino run by the mob, but a children’s birthday party? For God’s sake, we’re stealing from toddlers.” Bill grinned, teeth flashing beneath a smear of red lipstick that was supposed to be a smiley face. “Exactly, Mol. Nobody expects the clowns to clean out the gift table. Think about it. Fifty screaming kids, one distracted mom, and a mountain of wrapped boxes bigger than my ex-wife’s ego. Easy in, easy out, and we retire on Pokémon cards and Barbie dreamhouses.” “You’re an idiot,” Molly said, smacking his arm. “What if one of them starts crying? You know I can’t handle tiny humans crying. Last time a kid looked at me funny I gave him my wallet.” “That was one time and it was adorable,” Bill replied, checking the comically large prop gun that was actually a water pistol filled with grape juice. “Besides, we’ve got the perfect cover. I’m Bongo the Clown and you’re Giggles the… whatever. We show up, do a couple balloon animals, slip the presents into the fake cake we’re wheeling in, and roll out before the piñata even drops.” Molly groaned, rubbing her temples. “The fake cake is made of cardboard, Bill. Cardboard. If little Timmy kicks it, our entire operation collapses like a bad soufflé. And what about the parents? They’re not all idiots. One soccer mom with a suspicious look and we’re done.” “Parents are too busy filming everything for Instagram to notice us,” Bill countered, pulling on oversized shoes that squeaked with every step. “We’ll be in and out in twelve minutes. I timed it. You distract with magic tricks, I load the swag. Then we meet at the van, peel off these ridiculous noses, and disappear into the night richer than a dentist.” “Richer in what?” Molly demanded. “Gift cards to Chuck E. Cheese? A plastic tea set? 
My fence doesn’t take Shopkins, Bill. He barely takes diamonds anymore. This is career suicide wrapped in Disney paper.” Bill clapped her on the shoulder, leaving a glittery handprint. “Trust the process, partner. Kids’ parties have the best stuff now. iPads, drones, those tiny electric cars. One good haul and we can finally buy that boat we’ve been talking about. Imagine it: us, sipping margaritas, no more crawling through air vents or dodging guard dogs. Just sun, sea, and zero regret.” Molly stared at him for a long moment, then sighed deeply. “If we get recognized by a six-year-old and I have to explain to the cops why I’m dressed like a depressed clown, I’m throwing you under the bus. Literally. I’ll run you over with this van.” “Fair enough,” Bill said cheerfully, starting the engine. “Now let’s go make some children cry for all the right reasons. Operation Birthday Bandit is a go.” Molly pulled her rainbow suspenders tight and muttered, “I hate you so much right now.” The van rattled off into the suburban twilight, two clowns on a mission no self-respecting criminal would ever admit to.
created_at: "2026-03-12T21:31:24.281Z" (3/12/2026, 9:31:24 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 492
moderation_latency: (null)
generation_time: 3985
tokens_prompt: 77
tokens_completion: 839
native_tokens_prompt: 182
native_tokens_completion: 707
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0043756
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "126e5d70-456d-6323-ae8e-fcd2c120424e"
    is_byok: false
    latency: 451
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351084-8wcLBSSK3bq20zAoQH6c"
upstream_id: "126e5d70-456d-6323-ae8e-fcd2c120424e"
total_cost: 0.0043756
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result: 0.00%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false, words: 539
Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 73.62%, Deviation: 23.62%
Meta Data: neededClean: false, wordsTotal: 542, wordsDialogue: 399
Overall: 0.0000%
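The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator can be reproduced from the wordsDialogue and wordsTotal counts. The following is a minimal sketch, not the benchmark's implementation: the helper names `dialogue_stats` and `ratio_and_deviation` are hypothetical, and it assumes that dialogue means text between curly double quotes, that words are whitespace-separated, and that the target ratio is the 50% stated in the scenario.

```python
import re


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a piece of prose.

    Assumption: dialogue is whatever sits between curly double quotes
    and words are whitespace-separated. The benchmark's real tokenizer
    and quote handling are not shown on this page.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    return dialogue_words, total_words


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target: float = 0.5) -> tuple[float, float]:
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = dialogue_words / total_words
    return ratio, abs(ratio - target)


# Counts taken from the evaluator's Meta Data above.
ratio, deviation = ratio_and_deviation(399, 542)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 73.62%, Deviation: 23.62%"
```

This only covers the arithmetic behind the Details line. How a deviation of 23.62% is mapped to a 0.00% score is not stated on this page, and the two evaluators report slightly different totals (539 versus 542 words), which suggests they may count words differently.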