NC Bench
Run: 2026-05-01-01KQHT2FGYNTZHW3N60SM9GR2K
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 18.7s
Cost: $0.001858
Provider: xAI
Evaluation summary
Score | Evaluator | Details Link
0.21% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
0.1070% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge, peering through binoculars at the backyard chaos of screaming kids and colorful balloons. Bill hunched beside her, fiddling with a comically oversized pair of tweezers he insisted were “professional party-crashing tools.”
“Bill, for the love of everything unholy, put those down,” Molly hissed. “We’re not extracting a splinter from a dinosaur. We’re stealing birthday presents.”
Bill grinned, his fake mustache drooping slightly. “Relax, Mol. I’m improvising. Look, I brought the emergency clown nose. Instant disguise. Kids love clowns.”
“They also love pointing and screaming ‘Stranger danger!’ We’re going in as the entertainment’s last-minute replacements. You’re the magician. I’m the assistant. We slip the gifts into the fake rabbit cage during the big finale.”
Bill snorted. “Rabbit cage? That thing’s the size of a breadbox. How do we fit twenty presents in there?”
“By not being complete idiots,” Molly said, rolling her eyes. “The cage has a false bottom. I spent three hours building it last night while you were watching cartoons.”
“Hey, research,” Bill protested. “I learned that the birthday boy hates broccoli but loves anything that explodes in glitter. We can use that.”
Molly pinched the bridge of her nose. “We are not weaponizing glitter. Last time you did that we left a trail brighter than a Vegas sign. The cops followed us for six blocks. Six.”
“That was one time,” Bill muttered. “Besides, kids are distracted. We do the classic switcheroo: I pull a dove out of my sleeve, you palm the gift bags. Easy peasy, lemon squeezy.”
“Easy peasy my ass,” Molly said. “There’s a piñata the size of a small car. If that thing breaks early we’re done. Thirty hyper six-year-olds will swarm like locusts. We need to be out before cake.”
Bill tapped his chin thoughtfully. “What if I accidentally-on-purpose set off the smoke machine during ‘Happy Birthday’? Total confusion. We grab everything, including the cake if we’re feeling spicy.”
Molly stared at him. “We are not stealing the cake, Bill. We have standards. Barely, but they exist. Focus. In through the side gate at three-fifteen. You do your crappy magic routine. I ‘assist’ by loading the presents into the van disguised as the balloon truck. We’re out in eight minutes. Any longer and someone’s mom will want to take selfies with us.”
“Eight minutes,” Bill repeated, testing the words. “I can do eight minutes. I once stole an entire poker table in nine.”
“You stole a felt-covered folding table from your cousin’s garage,” Molly corrected. “Not exactly the Crown Jewels.”
Bill shrugged, adjusting his bow tie. “Practice for the big leagues. Today, we become legends. The thieves who ruined little Timmy’s eighth birthday and made it the best story he’ll ever tell.”
Molly sighed but couldn’t hide her smirk. “If we get caught because you insist on doing the disappearing handkerchief trick, I will personally feed you to those children.”
“Fair enough,” Bill said, offering her a fist bump. “Let’s go ruin a party.”
Molly bumped his fist. “Try not to enjoy it too much.”
They slipped toward the gate, two shadows armed with bad magic and worse ideas, ready to heist the happiest place on Earth.
created_at: "2026-05-01T13:01:28.484Z" (5/1/2026, 1:01:28 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1454
moderation_latency: (null)
generation_time: 18556
tokens_prompt: 77
tokens_completion: 890
native_tokens_prompt: 182
native_tokens_completion: 706
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0018581
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "a6ece46c-a522-9fe9-8d64-bebc17dd16d6"
    is_byok: false
    latency: 64
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777640488-xddLMzgZTmhnlWQvGkWQ"
session_id: (null)
api_type: "completions"
id: "gen-1777640488-qpzaP7jUSAKWa5Fv8CXk"
upstream_id: "a6ece46c-a522-9fe9-8d64-bebc17dd16d6"
total_cost: 0.0018581
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
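The timing and token fields above can be combined into a rough throughput figure. The sketch below is illustrative only: it assumes generation_time is reported in milliseconds (18556 is close to the 18.7s duration shown in the run details), and the helper name tokens_per_second is hypothetical, not part of NC Bench or the provider API.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Return completion throughput in tokens per second.

    Assumes generation_time_ms is in milliseconds, which the page does not state.
    """
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000)


# Values copied from the metadata above.
native_rate = tokens_per_second(706, 18556)      # native_tokens_completion
normalized_rate = tokens_per_second(890, 18556)  # tokens_completion

print(f"native: {native_rate:.1f} tok/s, normalized: {normalized_rate:.1f} tok/s")
```

The two rates differ because the page reports two completion counts (tokens_completion and native_tokens_completion), so any throughput comparison across models should state which count it uses.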
Evaluation details
Result | Evaluator | Details | Meta Data
0.21% | Matches word count | n/a | neededClean: false, words: 528
0.00% | Dialogue to Total Word Ratio | Ratio: 76.87%, Deviation: 46.87% | neededClean: false, wordsTotal: 536, wordsDialogue: 412
0.1070% | Overall
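The reported ratio and deviation follow from the word counts: 412 dialogue words out of 536 total is 76.87%, which is 46.87 percentage points above the 30% target. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It is an assumption, not the evaluator's actual code: the function names are hypothetical, dialogue is taken to be any text inside curly double quotes, and words are split on whitespace. The two evaluators report different totals (528 and 536), so the real tokenization rules are not known from this page.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Count words inside curly double quotes versus all words.

    Assumption: whitespace tokenization and “...” delimiters only.
    """
    total = len(text.split())
    spoken = sum(len(span.split()) for span in re.findall(r"“([^”]*)”", text))
    return spoken, total, (spoken / total if total else 0.0)


def deviation_points(ratio: float, target: float) -> float:
    """Absolute distance from the target ratio, in percentage points."""
    return abs(ratio - target) * 100


# Reproduce the reported figures from the counts in the table above.
reported_ratio = 412 / 536
print(f"Ratio: {reported_ratio:.2%}")                                # Ratio: 76.87%
print(f"Deviation: {deviation_points(reported_ratio, 0.30):.2f}%")   # Deviation: 46.87%
```

On a short sample such as `Molly sighed. “Put those down,” she hissed.` the helper returns 3 dialogue words out of 7.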