NC Bench
Run: 2026-03-12-01KKHZ0BVJMRTBM9CPF3RK62KS
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 2.6s
Cost: $0.002055
Provider: xAI
Evaluation summary
Score     Evaluator
1.04%     Matches word count
0.00%     Dialogue to Total Word Ratio
0.5180%   Overall
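The overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below illustrates that arithmetic only. The function name `overall_score` is hypothetical, and the averaging rule is an assumption inferred from the numbers on this page, not a documented NC Bench formula.

```python
from statistics import mean


def overall_score(scores_pct):
    """Assumed aggregation: unweighted mean of per-evaluator scores (in percent)."""
    return mean(scores_pct)


# Scores as displayed in the summary (already rounded to two decimals).
summary_scores = [1.04, 0.00]
print(f"{overall_score(summary_scores):.2f}%")
```

With the rounded inputs this yields 0.52%. The reported 0.5180% would correspond to an unrounded word-count score of about 1.036%, if the averaging assumption holds.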
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge, peering at the bouncy castle where shrieking kids swarmed like sugar-crazed locusts. Bill squeezed in beside her, clutching a comically small backpack shaped like a dinosaur. "This is the worst idea you've ever had," she whispered, "and you've had some stinkers." Bill grinned, adjusting his rainbow party hat. "Come on, Mol. Those presents are practically begging to be liberated. Think of it as redistribution of wealth. To us." "They're six-year-olds, Bill. Their idea of security is a balloon tied to the gift table." "Exactly. Childproof, not thief-proof." He pulled out two clown masks and offered her one. "We go in during musical chairs. Chaos is our friend." Molly stared at the mask. "If we get caught, I'm telling them you're my idiot cousin from out of town." "Fair enough." Bill checked his watch. "Piñata breaks in seven minutes. That's our window. You grab the big shiny ones, I'll create a diversion with the bubble machine." She sighed, slipping on the mask. "I can't believe I'm about to ruin a birthday for a stack of gift cards and whatever the hell is in those My Little Pony boxes." "Think of the loot," Bill said, eyes gleaming. "And the story. Nobody's ever heisted a children's party before. We'll be legends." "Legends in therapy," Molly muttered, but she followed him toward the laughter anyway.
created_at: "2026-03-12T21:24:54.774Z" (3/12/2026, 9:24:54 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 375
moderation_latency: (null)
generation_time: 2485
tokens_prompt: 77
tokens_completion: 356
native_tokens_prompt: 182
native_tokens_completion: 301
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0020548
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "dc1da764-f0df-7cf8-87bd-c0056216f848"
    is_byok: false
    latency: 334
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350694-IOv135PHe6ZmogSmYzxJ"
upstream_id: "dc1da764-f0df-7cf8-87bd-c0056216f848"
total_cost: 0.0020548
cache_discount: 0.0001152
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result    Evaluator                      Details                            Meta Data
1.04%     Matches word count             n/a                                neededClean: false; words: 226
0.00%     Dialogue to Total Word Ratio   Ratio: 65.22%, Deviation: 55.22%   neededClean: false; wordsTotal: 230; wordsDialogue: 150
0.5180%   Overall
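The ratio and deviation reported for the dialogue evaluator follow directly from the word counts in its metadata: 150 dialogue words out of 230 total, measured against the 10% target from the scenario. The sketch below reproduces that arithmetic. All function names are hypothetical, and `count_dialogue_words` is only an assumed approach (words inside straight double quotes, whitespace tokenization), so it may not match the counts produced by the actual evaluator.

```python
import re


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Dialogue words as a percentage of total words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float) -> float:
    """Absolute distance, in percentage points, from the target ratio."""
    return abs(ratio_pct - target_pct)


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Assumed counter: (words inside straight double quotes, all words)."""
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    total = len(text.split())
    return dialogue, total


# Counts taken from the evaluator metadata on this page.
ratio = dialogue_ratio(150, 230)
deviation = deviation_from_target(ratio, 10.0)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 65.22%, Deviation: 55.22%"
```

The two evaluators report different totals for the same response (226 and 230 words), which suggests they tokenize differently. A naive counter like the one above should be treated as a rough check, not a reimplementation.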