NC Bench
Run: 2026-03-12-01KKHZ3VVKNPVTV3FVTZZGC21R
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 5.8s
Cost: $0.003950
Provider: xAI
Evaluation summary
Score | Evaluator | Details Link
68.10% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
34.0511% | Overall |
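The Overall figure looks like the unweighted mean of the two evaluator scores. That is an inference from the numbers shown, not something the page states. A minimal Python sketch under that assumption (the function name `overall_score` is hypothetical):

```python
def overall_score(scores):
    """Unweighted mean of per-evaluator scores, given as percentages.

    Assumption: the benchmark averages evaluators with equal weight.
    """
    if not scores:
        raise ValueError("need at least one evaluator score")
    return sum(scores) / len(scores)


# Rounded evaluator scores as displayed in the summary.
displayed = [68.10, 0.00]
print(f"{overall_score(displayed):.2f}%")
```

The reported 34.0511% carries more digits than the displayed 68.10%, which would be consistent with the word-count score being stored unrounded (about 68.1022%) before averaging.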
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly adjusted her oversized sunglasses and peered through the chain-link fence at the bouncy castle. “Bill, I’m telling you, this is our ticket. Twenty screaming kids, one distracted magician, and a table piled high with shiny wrapped boxes. We hit it during musical chairs. Chaos is our friend.” Bill crouched beside her, adjusting the fake mustache that kept slipping into his mouth. “Molly, we’re stealing from children. Tiny, sticky children. My mom always said that was the fast track to karma kicking you in the shins.” “ Your mom also said pizza was a vegetable. We’re not robbing orphans, we’re liberating excess plastic from overindulged suburban gremlins. Think of it as recycling with a profit margin.” Molly pulled out a wrinkled party invitation. “Look, they even wrote ‘gifts optional but encouraged.’ We’re encouraged, Bill. The universe is literally rooting for us.” He sighed. “Fine. What’s the play? I’m not wearing the giant teddy bear costume again. Last time I almost suffocated and some kid tried to feed me juice through the eyehole.” “We go as the entertainment. You’re Professor Balloon Animal, world-renowned twister of latex. I’m DJ Sparkles, spinner of only the most annoying songs known to man.” Molly grinned. “While you’re making balloon poodles, I’ll accidentally bump the gift table. Everything slides into the duffel bag I hid inside the speakers. Simple.” Bill rubbed his forehead. “Simple? Kids are faster than you think. One of them always has a sixth sense for candy and crime. What if we get recognized? My face was on that ‘Have you seen this idiot’ poster after the mall fountain incident.” “That was in Cleveland. Different state, different vibe. Besides, I bought you a new wig. It’s purple. Kids love purple.” She poked his chest. “Come on, where’s the old Bill who stole the mayor’s toupee during the Fourth of July parade?” “He grew up and realized felony records don’t look good on dating apps.” Bill paused, then smirked. 
“But… free toys do sell well on the dark web. Especially limited edition ones.” “Exactly! We fence the loot, split the cash, and I finally buy that espresso machine that doesn’t sound like a dying walrus.” Molly checked her watch. “Party starts in twenty. You in or are you gonna chicken out and go sell vacuums door-to-door again?” Bill stood up, straightened his Hawaiian shirt, and adjusted the ridiculous mustache one last time. “I’m in. But if a six-year-old tackles me, I’m leaving you behind. Every thief for himself.” Molly laughed and slung the duffel over her shoulder. “Fair enough. Let’s go make some birthday memories they’ll never forget and we’ll never admit to. And Bill?” “Yeah?” “Try not to pop any balloons this time. Last job sounded like a fireworks factory having a nervous breakdown.” They slipped through the gate, pasting on their biggest, most trustworthy adult smiles as the first notes of the “Chicken Dance” drifted across the lawn.
Meta
created_at: "2026-03-12T21:26:49.464Z" (3/12/2026, 9:26:49 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 373
moderation_latency: (null)
generation_time: 5723
tokens_prompt: 77
tokens_completion: 804
native_tokens_prompt: 182
native_tokens_completion: 636
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0039496
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "762ae26d-a388-c7f7-e773-4a1282263a7a"
    is_byok: false
    latency: 318
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350809-gOZtiaVBRYsPYZMcTdgK"
upstream_id: "762ae26d-a388-c7f7-e773-4a1282263a7a"
total_cost: 0.0039496
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result | Evaluator | Details | Meta Data
68.10% | Matches word count | n/a | neededClean: false; words: 486
0.00% | Dialogue to Total Word Ratio | Ratio: 74.39%, Deviation: 24.39% | neededClean: false; wordsTotal: 492; wordsDialogue: 366
34.0511% | Overall | |
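The reported ratio and deviation follow directly from the word counts: 366 dialogue words out of 492 total, measured against the 50% target in the scenario. The sketch below reproduces that arithmetic and shows one possible way to count dialogue words, by treating text inside curly double quotes as dialogue. The counting function is an assumption for illustration, not the benchmark's actual evaluator; the two evaluators above report different totals (486 vs 492), so word-counting conventions clearly vary. How the 24.39% deviation maps to a 0.00% score is not stated on the page and is not modeled here.

```python
import re

TARGET_RATIO = 0.50  # from the scenario: 50% dialogue


def dialogue_stats(text):
    """Return (total_words, dialogue_words) using whitespace-split words.

    Assumption: dialogue is any text enclosed in curly double quotes.
    """
    total = len(text.split())
    quoted = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(q.split()) for q in quoted)
    return total, dialogue


def ratio_and_deviation(words_dialogue, words_total, target=TARGET_RATIO):
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# wordsDialogue and wordsTotal as reported by the evaluator.
ratio, deviation = ratio_and_deviation(366, 492)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 74.39%, Deviation: 24.39%"
```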