NC Bench
Run: 2026-03-12-01KKHZ010YFQDCA7K6DMVYMQ4F
Run Details
Model:
Grok 4.20 (Beta)
Test:
Dialogue tags
Scenario:
Write 500 words with 50% dialogue
Duration:
5.2s
Cost:
$0.004160
Provider:
xAI
Evaluation summary
Matches word count: 99.98%
Dialogue to Total Word Ratio: 0.00%
Overall: 49.9920%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her oversized clown wig and squinted at the crudely drawn map on the diner napkin. “Bill, if you say ‘it’s just like the casino job’ one more time I’m going to shove this entire whoopee cushion down your throat.” Bill grinned around a mouthful of pancake. “Relax, Mol. It’s literally a bunch of six-year-olds. What are they gonna do, bite our ankles? We roll in, we roll out, we roll home with forty Barbie dream houses and a mountain of Pokémon cards.” “Those kids have the collective attention span of a goldfish on espresso,” Molly hissed. “One scream and we’re swarmed by soccer moms with fanny packs full of judgment. We need a plan that doesn’t end with us on the local news as ‘the idiots who made Timmy cry.’” “Which is why,” Bill said, tapping the napkin with his fork, “phase one: distraction. I dress as a six-foot-tall purple dinosaur. Kids lose their tiny minds, parents take videos instead of paying attention.” “You’re going to be Barney? You? The man who once cried during a bank heist because the teller was ‘too nice’?” “I prefer the term ‘method actor.’ Anyway, while they’re all singing the ‘I Love You’ song, you slip in the back as the stressed-out mom who forgot the goodie bags. Classic misdirection.” Molly rubbed her temples. “Fine. But if you start doing the dinosaur dance, I will abandon you and keep all the loot. Also, we’re not touching the birthday cake. Last time you stress-ate an entire sheet cake and spent three days regretting your life choices.” “Cake is off limits, noted. What about the piñata? That thing’s probably stuffed with cash and tiny bottles of hand sanitizer.” “Leave the piñata. I’m not getting clubbed by a blindfolded child with a plastic bat. We hit the gift table at the exact moment they start ‘Happy Birthday.’ That’s our window. Thirty seconds max.” Bill leaned in, eyes gleaming. “Then we rendezvous at the bouncy castle. I’ll have the van idling by the fence. 
We dump the presents in the back, peel off the costumes, and vanish like classy phantoms.” “Classy phantoms who stole toys from children,” Molly muttered. “We’re going to hell, Bill. A very brightly colored, balloon-filled hell.” “Hey, we’re not monsters. We’ll send them a postcard from the Cayman Islands. ‘Sorry about your Legos, love, the Clown and the Dinosaur.’” Molly sighed and snatched the last piece of bacon from his plate. “If this goes south, I’m telling the cops it was all your idea and that you made me wear the wig.” “Fair. Just don’t forget the most important part.” “Which is?” Bill winked. “We leave one present behind. That way they’ll think it was a really weird magician who got confused about his gig.” Molly stared at him for a long second, then shook her head with a reluctant smile. “You’re an absolute menace, you know that?” “Yeah,” Bill said cheerfully, “but I’m your menace. Now pass the syrup. Evil geniuses need carbs.”
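The scenario asks for 50% of the words to be dialogue. The page does not show how the evaluator counts dialogue words, so the following is only a minimal sketch under stated assumptions: dialogue is whatever sits between curly double quotes, and a word is any whitespace-separated token. The function name `dialogue_ratio` is hypothetical, and this sketch is not expected to reproduce the evaluator's exact counts.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a prose passage.

    Assumption: dialogue is the text between a curly opening quote and the
    next curly closing quote. Nested single quotes are left untouched.
    """
    total_words = len(text.split())
    spoken_spans = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in spoken_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Tiny illustrative sample, not taken from the run.
sample = "Molly sighed. “Pass the syrup,” Bill said. “Now.”"
dialogue_words, total_words, ratio = dialogue_ratio(sample)
print(dialogue_words, total_words, f"{ratio:.2%}")
```

A different tokenizer, or a different rule for attributions such as "Bill said", would shift both counts.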
created_at
"2026-03-12T21:24:43.683Z"
3/12/2026, 9:24:43 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
269
moderation_latency
(null)
generation_time
5022
tokens_prompt
77
tokens_completion
813
native_tokens_prompt
182
native_tokens_completion
671
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0041596
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"390ef207-be52-f7a1-0eaf-6ce1fbdc22d9"
is_byok
false
latency
237
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773350683-oGZPcj2MCKq8mbZiytOM"
upstream_id
"390ef207-be52-f7a1-0eaf-6ce1fbdc22d9"
total_cost
0.0041596
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
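The metadata above is enough for a rough throughput estimate and a check of the displayed cost. This is a sketch that assumes `generation_time` is in milliseconds and `total_cost` is in US dollars; the page states neither unit explicitly.

```python
# Values copied from the run metadata above.
generation_time = 5022            # assumed to be milliseconds
native_tokens_completion = 671
total_cost = 0.0041596            # assumed to be US dollars

# Completion tokens produced per second of generation time.
tokens_per_second = native_tokens_completion / (generation_time / 1000)

# Format the cost to six decimals, as in the run details.
cost_display = f"${total_cost:.6f}"

print(f"{tokens_per_second:.1f} tokens/s")
print(cost_display)
```

Under these assumptions the formatted cost matches the `$0.004160` shown in the run details.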
Evaluation details
Result: 99.98%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean = false, words = 502

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 84.68%, Deviation: 34.68%
Meta Data: neededClean = false, wordsTotal = 509, wordsDialogue = 431

Overall: 49.9920%
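The reported ratio and deviation can be re-derived from the word counts above. The sketch below also tests one hypothesis about the overall score, namely that it is an unweighted mean of the two evaluator scores. That is an assumption, not something the page states. The two evaluators also report different total word counts (502 and 509), so they appear to count words differently.

```python
# Figures copied from the evaluation details above.
words_dialogue = 431
words_total = 509
target_ratio = 0.50   # the scenario asks for 50% dialogue

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)

# Assumption: overall = unweighted mean of the two evaluator scores.
overall = 49.9920
dialogue_score = 0.00
implied_word_count_score = 2 * overall - dialogue_score

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
print(f"Implied word count score: {implied_word_count_score:.3f}%")
```

The implied word count score rounds to the displayed 99.98%, which is consistent with the mean hypothesis but does not confirm it.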