NC Bench
Run: 2026-05-01-01KQHT1YH2E4V22R9BPQZQJ4XN
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 15.1s
Cost: $0.001765
Provider: xAI
Evaluation summary
Score | Evaluator
81.27% | Matches word count
0.00% | Dialogue to Total Word Ratio
40.6364% | Overall
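The Overall figure above is consistent with an unweighted mean of the two evaluator scores. This is an inference from the displayed numbers, not something the page states. The small gap between the computed 40.6350 and the displayed 40.6364 would be explained if the word-count score is stored unrounded (about 81.2728%). A minimal Python check, with hypothetical variable names:

```python
# Scores as displayed in the evaluation summary (rounded to two decimals).
scores = {
    "Matches word count": 81.27,
    "Dialogue to Total Word Ratio": 0.00,
}

# Assumption: Overall is the plain average of the evaluator scores.
overall = sum(scores.values()) / len(scores)

print(f"{overall:.4f}%")  # close to the displayed 40.6364%, up to rounding of the inputs
```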
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her oversized clown wig in the rearview mirror of their battered van, parked two blocks from the birthday bash. The muffled sounds of shrieking kids and balloon pops drifted through the crisp afternoon air. Bill slouched in the passenger seat, fiddling with a comically large pair of fake mustache glasses. “Bill, for the last time, we’re not using the whoopee cushion as a diversion,” Molly hissed, her voice a mix of exasperation and suppressed laughter. “Those things are louder than your snoring after taco night.” Bill grinned, adjusting his polka-dot bow tie that clashed horribly with his burglar-black turtleneck. “Aw, come on, Mol. Picture it. We slip in during cake time, I let one rip—metaphorically, of course—and while the rugrats are rolling around laughing, we bag every last gift under that piñata. Smooth as butter.” Molly rolled her eyes so hard it looked like she was checking her own wig again. “Smooth as butter? Last time you said that we ended up stuck in a vent for three hours because your ‘butter’ hips wouldn’t fit. This is a six-year-old’s party, not the Met Gala. We go in as the hired entertainers, like we rehearsed. You juggle, I do the balloon animals. Then we ‘accidentally’ load the presents into the van for ‘safekeeping.’” “Juggling?” Bill protested, his fake mustache twitching. “I can barely keep three oranges in the air without starting a food fight. What if little Timmy from down the street recognizes me from that unfortunate grocery store incident?” “That was your fault for trying to shoplift a watermelon under your shirt,” Molly shot back. “Look, we hit the presents at peak chaos—right after the magician leaves but before the parents start herding everyone for goodie bags. I’ll create a distraction with my world-famous sock puppet show. You slip out back and start the engine. We’ll be in and out before anyone notices the missing mountain of Sparkle Pony playsets.” Bill leaned forward, suddenly serious despite the rubber nose. 
“You really think we can pull this off? Stealing from kids, Mol. Feels a bit low, even for us.” Molly patted his knee with mock affection. “They’re rich kids, Bill. One less Talking Truck isn’t going to scar them for life. Besides, we’re not monsters—we’ll leave the cake. Now stop whining and practice your goofy laugh. And for God’s sake, lose the real lockpicks. They clash with the suspenders.” Bill chuckled, deep and rumbling. “Fine, fine. But if this goes south, I’m blaming your balloon poodle. That thing looks more like a depressed giraffe.” Molly smirked, starting the van. “Just remember the exit cue: when I yell ‘Who wants more ice cream?’ that’s your signal to floor it. Let’s go make some unhappy parents and very confused children.” They shared a conspiratorial glance, the van rumbling to life as colorful streamers fluttered from the antenna like surrender flags no criminal would ever wave.
created_at: "2026-05-01T13:01:11.08Z" (5/1/2026, 1:01:11 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 465
moderation_latency: (null)
generation_time: 15020
tokens_prompt: 77
tokens_completion: 782
native_tokens_prompt: 182
native_tokens_completion: 642
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0017653
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "20596124-3edc-9b0d-aa72-20aa9aa7dc78"
    is_byok: false
    latency: 82
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777640471-st4Wh4xt24ZseP8NtIbo"
session_id: (null)
api_type: "completions"
id: "gen-1777640471-CLPfaQ3VgRNFO1TV1jy7"
upstream_id: "20596124-3edc-9b0d-aa72-20aa9aa7dc78"
total_cost: 0.0017653
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
81.27% | Matches word count | n/a | neededClean: false, words: 488
0.00% | Dialogue to Total Word Ratio | Ratio: 68.56%, Deviation: 38.56% | neededClean: false, wordsTotal: 493, wordsDialogue: 338
40.6364% | Overall | |
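The reported ratio and deviation follow directly from the two word counts and the 30% target in the scenario: 338 / 493 is about 68.56%, which is 38.56 percentage points above the target. The Python sketch below reproduces that arithmetic and shows one plausible way to count dialogue words, by treating text inside curly double quotes as dialogue. The function names and the quote-matching rule are assumptions for illustration. The page does not describe how the evaluator tokenizes text, so the counter is not expected to reproduce 338 exactly.

```python
import re

TARGET_RATIO = 0.30  # from the scenario: "30% dialogue"


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Count words inside curly double quotes (assumed dialogue rule)."""
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Counts taken from the evaluation details above.
ratio, deviation = ratio_and_deviation(338, 493)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 68.56%, Deviation: 38.56%"
```

How the 38.56-point deviation maps to the 0.00% score is not shown on the page, so the sketch stops at the deviation.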