NC Bench
Run: 2026-05-01-01KQHRWQJKD2Y3528T2Q6PQCRP
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 12.7s
Cost: $0.001868
Provider: xAI
Evaluation summary
Score     Evaluator
2.01%     Matches word count
0.00%     Dialogue to Total Word Ratio
1.0058%   Overall
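The overall figure looks consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the `overall` function below is an assumption; the sketch only checks that the displayed numbers agree with it once rounding of the 2.01% score is allowed for.

```python
# Displayed evaluator scores, in percent, copied from the summary.
scores = {
    "Matches word count": 2.01,
    "Dialogue to Total Word Ratio": 0.00,
}

def overall(evaluator_scores):
    """Unweighted mean of the evaluator scores (assumed aggregation rule)."""
    return sum(evaluator_scores.values()) / len(evaluator_scores)

reported_overall = 1.0058
mean = overall(scores)

# 2.01 is itself rounded to two decimals, so allow for that rounding:
# at most 0.005 on one score, i.e. 0.0025 on the mean of two.
consistent = abs(mean - reported_overall) <= 0.0025 + 1e-12
print(f"mean of displayed scores: {mean:.4f}%, consistent with reported: {consistent}")
```

If the mean is indeed the rule, the unrounded word-count score would be about 2.0116%, which rounds to the 2.01% shown.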
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her oversized sunglasses and peered through the chain-link fence at the bouncing castle. “Bill, if you say ‘yoink’ one more time while we’re stealing from six-year-olds I will duct-tape your tongue to the roof of the van.” Bill grinned, teeth bright against the fake mustache he’d insisted on wearing. “Relax, Mol. It’s not stealing, it’s… premature present redistribution. Think of it as Santa’s evil twin.” “We are not evil twins,” Molly hissed, checking her watch. “We are professionals. Sort of. The cake gets cut at three. That’s our window. You handle the piñata decoy, I’ll slip in through the kitchen and bag the gifts under the big rainbow table.” Bill rubbed his hands together. “Love the rainbow table. Very thematic. What if the birthday kid sees us? Tiny Timmy’s got those laser eyes, I swear.” “His name is Ethan, and you’re going to distract him with the fake puppy.” She patted the plush golden retriever tucked under her arm. It had a speaker in its stomach that played woofs on command. “Fake puppy feels a little low, even for us,” Bill muttered, though his eyes sparkled with the old familiar larceny light. “Couldn’t we have gone with the classic clown disguise? I still fit in my polka dots.” Molly rolled her eyes so hard they threatened to unscrew. “Last time you wore those dots you got stuck in a vent. We are not repeating the Great Chuckles Debacle of 2019. Besides, the parents think we’re the hired entertainers. I told them you’re Balloon Bill.” “Balloon Bill,” he echoed, tasting it. “I sound like a budget superhero. Can I at least yell ‘It’s clobbering time’ when we make the getaway?” “Only if you want me to clobber you first.” She checked the fence line again. “Look, we grab the mountain of presents, shove them in the laundry bags, and roll out behind the ice-cream truck. Mrs. Abernathy always orders extra sprinkles. Chaos is our friend.” Bill nodded slowly, then frowned. “One problem, partner. What about the goody bags? 
Those little monsters get sugar and plastic junk. They’ll riot if we leave with the real loot.” Molly sighed, producing two handfuls of off-brand lollipops from her coat. “Backup bribery. One per tiny terrorist. Think you can charm a room full of sugar-drunk kindergarteners for ninety seconds?” “Mol, I once convinced a rottweiler I was a mailman. I got this.” He cracked his knuckles. “So we go on three?” She looked at him, half exasperated, half fond. “On three. And Bill?” “Yeah?” “If you call me Mommy again while we’re inside I will leave you handcuffed to the bouncy castle.” Bill laughed under his breath. “Wouldn’t dream of it, partner. Let’s go make some very disappointed children and very confused parents.” Molly zipped her hoodie, adjusted the fake puppy, and gave the tiniest smile. “Try to keep up, Balloon Bill.” Together they slipped through the gate, two thieves armed with lollipops, laundry sacks, and the unshakable belief that birthday presents belonged in better—by which they meant their—hands. The screams of delight from inside the house were about to take on a whole new meaning.
created_at: "2026-05-01T12:40:51.547Z" (5/1/2026, 12:40:51 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 732
moderation_latency: (null)
generation_time: 12419
tokens_prompt: 77
tokens_completion: 854
native_tokens_prompt: 182
native_tokens_completion: 710
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0018681
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "62065163-b848-8036-0b5b-148fa5801de3"
    is_byok: false
    latency: 134
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777639251-u1NmLVobT2V5Cr5w0yKD"
session_id: (null)
api_type: "completions"
id: "gen-1777639251-lM4t3bc4ScqzuuEn5Scz"
upstream_id: "62065163-b848-8036-0b5b-148fa5801de3"
total_cost: 0.0018681
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
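As a rough reading of the metadata, generation throughput can be derived from `generation_time` and the two completion token counts. The sketch below assumes `generation_time` is in milliseconds, which the page does not state, and the helper name `tokens_per_second` is hypothetical.

```python
# Values copied from the metadata.
generation_time_ms = 12419          # assumption: unit is milliseconds
native_tokens_completion = 710
tokens_completion = 854

def tokens_per_second(tokens, elapsed_ms):
    """Average tokens generated per second over the elapsed time."""
    return tokens / (elapsed_ms / 1000)

native_rate = tokens_per_second(native_tokens_completion, generation_time_ms)
reported_rate = tokens_per_second(tokens_completion, generation_time_ms)
print(f"native: {native_rate:.1f} tokens/s, tokens_completion: {reported_rate:.1f} tokens/s")
```

The two rates differ only because the page reports two different completion token counts for the same response.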
Evaluation details
Result    Evaluator                      Details                            Meta Data
2.01%     Matches word count             n/a                                neededClean: false; words: 525
0.00%     Dialogue to Total Word Ratio   Ratio: 65.23%, Deviation: 35.23%   neededClean: false; wordsTotal: 532; wordsDialogue: 347
1.0058%   Overall
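The reported ratio and deviation follow directly from `wordsDialogue` and `wordsTotal` against the 30% target in the scenario. The sketch below reproduces that arithmetic and adds a hypothetical `dialogue_ratio` counter; the page does not describe how the evaluator detects dialogue, so counting whitespace-separated words inside curly double quotes is an assumption, not the evaluator's method.

```python
import re

def dialogue_ratio(text):
    """Share of whitespace-separated words that fall inside curly double quotes.

    Hypothetical counter: the real evaluator's tokenization is not documented.
    """
    total = len(text.split())
    spoken = sum(len(span.split()) for span in re.findall(r"“([^”]*)”", text))
    return spoken / total if total else 0.0

# Arithmetic check against the reported metadata.
target = 0.30                 # 30% dialogue, from the scenario
words_dialogue = 347
words_total = 532
ratio = words_dialogue / words_total
deviation = abs(ratio - target)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 65.23%, Deviation: 35.23%"
```

The two evaluators also report different totals (525 and 532 words), which suggests they split words differently, so a counter like this one should not be expected to match either figure exactly.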