NC Bench
Run: 2026-03-12-01KKHZHAA4JHMHQ3MJESQCGQV1
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 3.9s
Cost: $0.003914
Provider: xAI
Evaluation summary
Score    Evaluator
0.00%    Matches word count
0.00%    Dialogue to Total Word Ratio
0.0014%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her ski mask for the third time, peering over the hedge at the bouncy castle. “Bill, I’m telling you, this is the perfect score. Twenty screaming kids, one distracted magician, and a mountain of wrapped presents under that piñata tree. In and out before the cake’s even cut.” Bill crouched beside her, clutching a comically small backpack. “Molly, we’re breaking into a children’s party. My mother would disown me. Again. Last time she disowned me it was for that library thing.” “Library was amateur hour. This is art. Look, the clown’s already half-drunk on apple juice and regret. We wait for musical chairs, create a diversion, and boom—presents are ours. I brought tiny suction cups for the gift bags.” “Tiny suction cups? We’re not stealing the Hope Diamond, Mol. We’re robbing six-year-olds of their Pokémon cards and whatever the hell ‘Shopkins’ are. My soul feels dirty just saying that word.” “Your soul was dirty the day you agreed to partner with me. Besides, these kids get too much stuff. We’re performing a public service. Wealth redistribution with a side of petty larceny. Think of it as the Robin Hood of birthday parties.” Bill groaned. “Robin Hood didn’t have to explain to a room full of parents why the Easter Bunny was suddenly six-foot-two and swearing. What’s our exit strategy? The bouncy castle has one way in and one way out, and it’s currently occupied by a hyperactive toddler named Kevin who’s been farting the national anthem for twenty minutes.” “We use the magician’s smoke bombs. I nicked three from his van. We pop one, yell ‘Abracadabra, your presents are crap now,’ grab the loot, and slide down the inflatable slide like civilized criminals. Then we vanish into the suburbs. No one ever looks for thieves in a minivan with a ‘Baby on Board’ sticker.” “You stole the smoke bombs? Molly, that’s actually impressive. Terrifying, but impressive. What if the parents tackle us? I’m not getting taken down by a soccer mom in yoga pants. 
My reputation.” “Your reputation died in 2019 when you tried to steal that ice cream truck and only got sprinkles. Look, I’ll handle the parents. You handle Kevin and his musical butt. We split the haul sixty-forty.” “Seventy-thirty, and I get first pick of the Lego sets.” “Sixty-forty or I tell your mother about the library.” “Fine. Sixty-forty, you heartless witch. But if we get caught, I’m blaming the clown. Everyone always believes the clown’s guilty.” Molly grinned, checking her watch. “Musical chairs starts in three minutes. Ready to ruin some childhoods?” Bill sighed, zipping his tiny backpack. “I was born ready. Unfortunately. Let’s go steal some joy.” They crept toward the gate, two silhouettes against the laughter and shrieking, already arguing about who got to keep the giant stuffed unicorn.
created_at: "2026-03-12T21:34:10.248Z" (3/12/2026, 9:34:10 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 385
moderation_latency: (null)
generation_time: 3776
tokens_prompt: 77
tokens_completion: 766
native_tokens_prompt: 182
native_tokens_completion: 630
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0039136
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "61c39790-44d3-74a4-73f8-f6d276f58515"
    is_byok: false
    latency: 343
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351250-d2Bt6HsCtKESc73cRS6q"
upstream_id: "61c39790-44d3-74a4-73f8-f6d276f58515"
total_cost: 0.0039136
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
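The timing and token fields in the metadata above can be combined into a rough throughput figure. The sketch below assumes that generation_time is reported in milliseconds and that native_tokens_completion is the provider's own completion token count. Neither is stated on this page, and the helper name tokens_per_second is hypothetical.

```python
# Values copied from the run metadata above.
# ASSUMPTION: generation_time is in milliseconds (the page does not state the unit).
generation_time_ms = 3776
native_tokens_completion = 630


def tokens_per_second(tokens: int, elapsed_ms: int) -> float:
    """Average completion throughput over the generation window."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return tokens / (elapsed_ms / 1000)


throughput = tokens_per_second(native_tokens_completion, generation_time_ms)
print(f"{throughput:.1f} tokens/s")
```

If the unit assumption holds, this gives an average rate over the whole generation window. It does not separate time to first token from steady-state decoding.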
Evaluation details
Result   Evaluator                     Details                           Meta Data
0.00%    Matches word count            n/a                               neededClean: false; words: 468
0.00%    Dialogue to Total Word Ratio  Ratio: 86.95%, Deviation: 16.95%  neededClean: false; wordsTotal: 475; wordsDialogue: 413
0.0014%  Overall
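The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator can be re-derived from the wordsDialogue and wordsTotal values. The sketch below is not the benchmark's own code: the function name dialogue_ratio is hypothetical, the 70% target is taken from the scenario description, and treating the deviation as an absolute difference in percentage points is an assumption. How the evaluator converts that deviation into its 0.00% score is not shown on this page and is not modeled here.

```python
def dialogue_ratio(words_dialogue: int, words_total: int, target_pct: float = 70.0):
    """Return (ratio, deviation), both in percent and rounded to 2 decimals.

    ASSUMPTION: deviation is the absolute difference from the target,
    in percentage points. The page only shows a positive value.
    """
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio_pct = 100 * words_dialogue / words_total
    deviation_pct = abs(ratio_pct - target_pct)
    return round(ratio_pct, 2), round(deviation_pct, 2)


# Counts taken from the evaluator's Meta Data column.
ratio, deviation = dialogue_ratio(words_dialogue=413, words_total=475)
print(f"Ratio: {ratio}%, Deviation: {deviation}%")
# prints "Ratio: 86.95%, Deviation: 16.95%"
```

Note that the two evaluators report different totals (468 and 475 words), so they presumably count words differently. This sketch uses the ratio evaluator's own counts.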