NC Bench
Run: 2026-03-12-01KKHZCA0YV4NATDCC3SQ0ME2H
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 4.4s
Cost: $0.004388
Provider: xAI
Evaluation summary
Score | Evaluator
20.19% | Matches word count
0.00% | Dialogue to Total Word Ratio
10.0950% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her fake mustache in the rearview mirror of their beat-up van parked two blocks from the birthday bash. “Bill, I still can’t believe we’re doing this. We’re about to crash a kiddie party like it’s the Crown Jewels.” Bill grinned, peeling the wrapper off a stolen Fun Size Snickers. “Hey, those presents are worth more than gold to the little gremlins. We grab the table full of wrapped boxes, we’re set for Christmas. You got the clown noses?” “Two red ones and a whoopee cushion for distraction,” Molly said, stuffing them into her overalls. “But listen, genius, what’s our cover story when the soccer moms ask why we’re not wearing birthday hats?” “We’re the entertainment! I’m Bongo the Magnificent, master balloon animal sculptor. You’re my lovely assistant, Sparkles McGee. We waltz in, I make a giraffe, you slip the gifts into the big sack disguised as a piñata. Easy peasy, lemon squeezy.” Molly snorted. “Sparkles McGee? That’s the best you could do? Last time you called me that I ended up explaining to a cop why I had a sack full of silverware.” “This is different,” Bill insisted, checking the fake clipboard. “No silverware, just Barbies and Lego sets. High demand on the black market. My guy Frankie says he can flip a talking unicorn for fifty bucks easy.” “Fifty bucks? We’re risking felony fun time for fifty bucks? Bill, there’s gotta be a hundred presents in there. Little Timmy’s parents rented a bouncy castle. That means rich people. Rich people give rich gifts.” “Fine, fine, we aim for the big shiny ones,” Bill conceded. “But we can’t just shove everything in the bag. We gotta be selective. I’ll create a diversion by ‘accidentally’ releasing the helium balloons. Chaos ensues, kids scream, parents chase floating Mylar. Meanwhile you swap the present table with our decoy box full of newspaper.” Molly rolled her eyes. “Newspaper? They’ll notice when little Emily opens her gift and finds yesterday’s sports section. We need better decoys. 
I brought some old socks and a broken toaster. Looks heavy, feels disappointing, classic misdirection.” “You beautiful criminal,” Bill said with genuine admiration. “Okay, timeline. We hit at peak cake time, around two-fifteen. I start the balloon animals, you work the crowd with face paint. When the birthday boy blows out the candles, that’s our window. Everybody’s singing, nobody’s watching the gift table.” “What if they sing Happy Birthday to You off-key? That song always makes me emotional, Bill. I might hesitate.” “Don’t you dare get soft on me, Sparkles. Those presents are funding our retirement in Boca. Think of the beach, not the sobbing children. We’ll leave one crappy gift so they don’t feel completely robbed. A nice participation trophy or something.” Molly checked her watch. “It’s showtime in six minutes. You ready to ruin the happiest day of some six-year-old’s life?” Bill pulled on his oversized clown shoes. “Born ready. Just promise me one thing.” “What?” “After we fence the haul, we never speak of this again. Especially not to my mother.” Molly laughed, adjusting her rainbow wig. “Deal. Let’s go steal some joy, Bongo.”
created_at
"2026-03-12T21:31:26.114Z"
3/12/2026, 9:31:26 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
494
moderation_latency
(null)
generation_time
4242
tokens_prompt
77
tokens_completion
866
native_tokens_prompt
182
native_tokens_completion
709
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0043876
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"60becd8b-c40b-06a6-1fbe-e4338eda3ad2"
is_byok
false
latency
490
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773351086-ywY4IoBsBC2gs92fVSTs"
upstream_id
"60becd8b-c40b-06a6-1fbe-e4338eda3ad2"
total_cost
0.0043876
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
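The Meta fields above are enough for a rough throughput estimate. The following is a minimal sketch, assuming `generation_time` is reported in milliseconds (the page does not state the unit, although 4242 ms is roughly in line with the 4.4s duration in Run Details). The function names are hypothetical and are not part of NC Bench or the provider API.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Completion tokens generated per second of generation time.

    Assumption: generation_time_ms is in milliseconds.
    """
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000.0)


def cached_fraction(cached_tokens: int, prompt_tokens: int) -> float:
    """Share of native prompt tokens that were served from cache."""
    if prompt_tokens <= 0:
        raise ValueError("prompt_tokens must be positive")
    return cached_tokens / prompt_tokens


if __name__ == "__main__":
    # Values copied from the Meta fields of this run:
    # native_tokens_completion, generation_time,
    # native_tokens_cached, native_tokens_prompt.
    print(f"{tokens_per_second(709, 4242):.1f} tokens/s")
    print(f"{cached_fraction(128, 182):.1%} of prompt tokens cached")
```

The native token counts are used here rather than `tokens_prompt` and `tokens_completion`, since the two sets differ for this run (182 vs. 77 prompt tokens, 709 vs. 866 completion tokens) and the page does not say which one the cost is based on.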
Evaluation details
Result | Evaluator | Details | Meta Data
20.19% | Matches word count | n/a | neededClean: false, words: 520
0.00% | Dialogue to Total Word Ratio | Ratio: 85.71%, Deviation: 15.71% | neededClean: false, wordsTotal: 525, wordsDialogue: 450
10.0950% | Overall
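The ratio and deviation figures above can be reproduced from the reported word counts. The sketch below is an illustration, not NC Bench's evaluator code: the 70% target comes from the scenario text, and treating the overall score as the unweighted mean of the two evaluator scores is an inference from the numbers shown (20.19% and 0.00% average to 10.0950%). How the 15.71% deviation maps to a 0.00% score is not shown on the page and is not modeled here.

```python
TARGET_DIALOGUE_RATIO = 0.70  # from the scenario: "70% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that appear inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_DIALOGUE_RATIO) -> float:
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


def overall(scores: list[float]) -> float:
    """Unweighted mean of evaluator scores (assumed aggregation)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # wordsDialogue and wordsTotal from the evaluation details.
    ratio = dialogue_ratio(450, 525)
    print(f"Ratio: {ratio:.2%}, Deviation: {deviation(ratio):.2%}")
    # Evaluator scores, in percent, from the evaluation summary.
    print(f"Overall: {overall([20.19, 0.00]):.4f}%")
```

Note that the two evaluators report different totals for the same response (520 words vs. 525), so they presumably tokenize words differently; the sketch uses each evaluator's own counts as given.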