NC Bench
Run: 2026-03-12-01KKHZRB1J9801V5BYCPVVEPJT
Run Details
Model:
Grok 4.20 (Beta)
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
4.1s
Cost:
$0.004208
Provider:
xAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her oversized sunglasses and leaned across the sticky diner table. "Okay Bill, hear me out. We hit the bouncy castle party at 3:15 sharp. That's when the little monsters are in full sugar meltdown and the parents are hiding in the kitchen with the wine." Bill rubbed his mustache like it owed him money. "A children's party, Mol? We're stooping to birthday clowns now? I used to boost armored cars." "Exactly why nobody will suspect us, genius. Think about it. Who looks at two adults in a ball pit and thinks, 'There go the crown jewels of teddy bears'?" "I still say we need better disguises. Last time you wore that wig you looked like a depressed poodle." "It was a French poodle, thank you very much. Very sophisticated. Now focus. The presents are all piled on the gift table by the chocolate fountain. We create a distraction, I swoop in with the black trash bag disguised as a garbage liner because who questions the cleaning lady at a kid's party?" Bill snorted into his coffee. "And what's my job, exactly? Playing Pin the Tail on the Billionaire?" "You, my dear partner in minimal crime, are the human piñata. You wear the mascot costume I borrowed from that bankrupt theme park. The one that looks like a depressed purple dinosaur. Kids will flock to you like moths to a poorly lit flame." "I ain't getting in no dinosaur suit, Molly. Last time I wore spandex I couldn't feel my legs for three days." "It's not spandex, it's felt. Very breathable. You'll be fine. While they're beating you with plastic bats trying to get the candy out, I'll be over at the gift table having a lovely conversation with a stack of brightly wrapped Xboxes." Bill tapped his spoon against his mug. "What about the parents? One soccer mom gets suspicious and we're done." "That's why we bring the decoy cake. I baked it myself. It's mostly cardboard and regret. When I 'accidentally' drop it, everyone will be too busy staring at the mess to notice me stuffing Barbies into my cleavage."
"Your cleavage has seen more action than a airport security line." "Focus, Bill. The birthday boy is turning seven. His name is Connor and he wants a robot puppy. We're not taking that one. Even we have standards." "Since when?" "Since now. We leave the sentimental stuff. Just the big ticket items. Gaming consoles, that fancy drone, maybe the envelope full of cash from Grandma who's clearly loaded." Bill sighed dramatically. "Fine. But if I get recognized in that dinosaur costume, I'm retiring to Florida and you're not invited to my condo." Molly grinned, raising her coffee cup. "To the great T-Rex Heist of 2025. May the children be distracted and the parents be drunk." Bill clinked his mug against hers. "If we get caught, I'm telling them you forced me. I'm just a simple man who likes lunch." "Simple is right," Molly said, stealing his last fry. "Now let's go practice your roar. It's gotta be pathetic enough to sell the act but not so bad it scares the toddlers into therapy." "You're enjoying this way too much." "Bill, I'm about to commit the most wholesome felony in history. Of course I'm enjoying it."
created_at
"2026-03-12T21:38:00.375Z"
3/12/2026, 9:38:00 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
318
moderation_latency
(null)
generation_time
4007
tokens_prompt
77
tokens_completion
815
native_tokens_prompt
182
native_tokens_completion
679
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0042076
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"953fc938-3feb-ec74-e070-542e41a737f8"
is_byok
false
latency
261
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773351480-2mFoOVVUwGeYuD2GntCt"
upstream_id
"953fc938-3feb-ec74-e070-542e41a737f8"
total_cost
0.0042076
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
neededClean: false
words: 548
Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 90.15%, Deviation: 20.15%
neededClean: false
wordsTotal: 548
wordsDialogue: 494
Overall
Result: 0.0000%
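The reported ratio and deviation can be reproduced from the word counts above. The sketch below is illustrative only: the function names `dialogue_ratio` and `deviation_points` are hypothetical, and the rule that dialogue means text inside straight double quotes is an assumption, since the page does not say how the evaluator counts dialogue words or turns the deviation into a score.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a scene.

    Assumption: dialogue is any text enclosed in straight double quotes,
    and words are whitespace-separated tokens. The benchmark's real
    counting rules are not shown on this page.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_points(ratio: float, target: float) -> float:
    """Absolute gap between the measured and target ratio, in percentage points."""
    return abs(ratio - target) * 100


# Figures reported in the evaluation details: wordsDialogue and wordsTotal.
words_dialogue, words_total = 494, 548
target_ratio = 0.70  # the scenario asks for 70% dialogue

ratio = words_dialogue / words_total
summary = f"Ratio: {ratio:.2%}, Deviation: {deviation_points(ratio, target_ratio):.2f}%"
print(summary)  # Ratio: 90.15%, Deviation: 20.15%
```

With 494 of 548 words inside quotes, the response is roughly 20 percentage points over the 70% target, and 548 words is 48 over the requested 500. How either gap maps to the 0.00% scores is not stated here.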