NC Bench
Run: 2026-05-01-01KQHS650G7P5460KTEBC0AP49
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 16.6s
Cost: $0.001908
Provider: xAI
Evaluation summary
Score      Evaluator
95.99%     Matches word count
0.00%      Dialogue to Total Word Ratio
47.9934%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Alright, Bill, walk me through it again, but this time without the part where you say ‘trust me, I’ve done worse.’” “Fine, Mol. We hit the party at sixteen hundred hours, right when the clown starts his balloon-animal torture routine. Kids’ll be hypnotized. That’s our window.” “Clown at four o’clock. Got it. What about the parents? They hover like seagulls at a chip shop.” “We use the decoy. You roll in with the fake present wrapped in that hideous unicorn paper. I’ll be behind you dressed as the world’s saddest magician. Cape, top hat, tragic moustache. The mums will be so busy judging my life choices they won’t notice us slipping toward the gift table.” “You do realise I have to wear the matching unicorn jumper, right? This is emotional warfare, Bill.” “Necessary evil. Think of the haul. Twenty, maybe thirty brightly wrapped boxes. Xboxes, Lego Death Stars, whatever little Timmy wished for that costs more than our rent. We bag ’em, we’re set for months.” “Months? We’re not exactly Ocean’s Eleven here. Last time you said that we ended up with three Barbies and a half-eaten birthday cake.” “That cake was chocolate. Worth it. Look, this time we’ve got the getaway sorted. I borrowed Derek’s ice-cream van. The one with the broken jingle. It’s practically invisible; nobody looks at an ice-cream van at a kids’ party. It just blends.” “Blends. Like a grown man in a purple cape licking a Flake. Sure. And how do we get past the bouncy castle? There’s always a bouncy castle.” “Easy. You challenge the birthday boy to a bounce-off. While he’s trying to impress you I’ll be behind the castle shovelling presents into the fake present sack. It’s got ‘Congratulations on Your Retirement, Kevin’ written on it. They’ll never suspect.” “Bill, I’m five-foot-two in heels. That kid’s going to launch me into orbit. Also, why do you own a retirement sack?” “Long story. Inheritance. Never mind. Once we’ve got the swag we meet back at the van. 
I’ll be selling ninety-nine flakes to the parents as cover. You slide in the back like a pro. Then we’re gone before the candles are even lit.” “What if the clown sees us? Those guys are twitchy. Last clown I met had a switchblade in his flower.” “Then we deploy the ultimate distraction. You tell him his shoes are untied. Clowns hate that. It’s like telling a mime he’s loud. He’ll spiral. We vanish in the confusion.” “I can’t believe I’m risking my dignity for plastic toys and lukewarm iced gems.” “Think bigger, Mol. This isn’t just theft. It’s redistribution of joy. Those kids get new presents tomorrow anyway. We’re simply… accelerating the economy.” “You’re an idiot.” “Yeah, but I’m your idiot. And I’ve got snacks in the van.” “Fine. But if I have to do the unicorn dance, you’re buying me chips afterwards. With the stolen cake money.” “Deal. Now let’s go rehearse your confused-parent face. You look too competent right now.” “Better?” “Perfect. You could be my wife. God help us both.”
created_at
"2026-05-01T12:46:00.213Z"
5/1/2026, 12:46:00 PM
model
"x-ai/grok-4.20-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
420
moderation_latency
(null)
generation_time
16530
tokens_prompt
77
tokens_completion
840
native_tokens_prompt
182
native_tokens_completion
699
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
64
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0019078
router
(null)
provider_responses
0
endpoint_id
"3ecee37f-b217-4093-87fb-aaf0afe307af"
id
"0f87639c-d044-a99c-19a6-baf277ac14b1"
is_byok
false
latency
93
model_permaslug
"x-ai/grok-4.20-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777639560-AfqqatiB7wUKpGysaibz"
session_id
(null)
api_type
"completions"
id
"gen-1777639560-dNLRawSHbAjoFmHEFBdF"
upstream_id
"0f87639c-d044-a99c-19a6-baf277ac14b1"
total_cost
0.0019078
cache_discount
0.0000672
upstream_inference_cost
0
provider_name
"xAI"
response_cache_source_id
(null)
Evaluation details
Result     Evaluator                      Details                             Meta Data
95.99%     Matches word count             n/a                                 neededClean: false, words: 508
0.00%      Dialogue to Total Word Ratio   Ratio: 100.00%, Deviation: 30.00%   neededClean: false, wordsTotal: 517, wordsDialogue: 517
47.9934%   Overall
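The arithmetic behind the reported figures can be checked with a short sketch. This is not the benchmark's own evaluator code: the function names are hypothetical, the 70% target is taken from the scenario text, and treating the overall score as an unweighted mean of the two evaluator scores is an assumption that happens to agree with the displayed numbers to within rounding. How the 95.99% word-count score itself is derived is not shown on the page, so it is used here only as an input.

```python
def dialogue_ratio_and_deviation(words_dialogue: int, words_total: int,
                                 target_ratio: float) -> tuple[float, float]:
    """Return (dialogue ratio, absolute deviation from the target ratio)."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target_ratio)


def mean_score(scores: list[float]) -> float:
    """Unweighted mean of evaluator scores (an assumption, not confirmed by the page)."""
    return sum(scores) / len(scores)


# Counts reported in the evaluation metadata: wordsDialogue = 517, wordsTotal = 517.
ratio, deviation = dialogue_ratio_and_deviation(517, 517, target_ratio=0.70)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Displayed evaluator scores: 95.99% (rounded) and 0.00%.
overall = mean_score([95.99, 0.0])
print(f"Overall (from rounded inputs): {overall:.3f}%")
```

With every counted word classified as dialogue, the ratio is 100% against a 70% target, which matches the reported 30% deviation. The mean of the rounded scores lands close to the reported 47.9934%, with the small gap attributable to the word-count score being displayed to two decimals.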