NC Bench
Run: 2026-05-01-01KQHSMJXYBVDCSG7PERAF77XN
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 21.0s
Cost: $0.001663
Provider: xAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her fake mustache in the rearview mirror of their beat-up van, parked two blocks from the bounce house bonanza. “This is the dumbest idea you’ve ever had, Bill. We’re not stealing from a bank, we’re stealing from toddlers. There’s a special level of hell for this.” Bill grinned, unwrapping a candy bracelet he’d stolen from a previous reconnaissance mission. “Relax, Mol. Kids are terrible at security. They get distracted by clowns and juice boxes. We slip in, we grab the gift table, we’re out before the piñata even hits the ground. Easy money.” “Easy money?” Molly snatched the bracelet and bit off a piece. “Those little gremlins have eagle eyes. One scream and we’re swarmed by soccer moms wielding diaper bags like medieval maces. And what’s our cover story? ‘Hi, we’re Uncle and Aunt Creepy who just showed up uninvited?’” “We’re the new entertainers,” Bill said, holding up two cheap clown noses. “I’m Bubbles the Magnificent. You’re... uh... Sparkles the Slightly Bitter.” “I will murder you in your sleep.” “See? Method acting. The kids will love the tension.” Bill checked his watch. “Party starts in twenty. We hit during the cake smash. Nobody watches presents when there’s buttercream flying.” Molly pinched the bridge of her nose. “Remind me why we’re not sticking to liquor stores? At least adults expect to get robbed.” “Because liquor stores have cameras, alarms, and sober managers. This place has a bouncy castle, a hired magician who’s probably drunk, and a gift table loaded with PlayStations and Lego sets worth more per ounce than cocaine. We fence the haul, we retire to Mexico. Think of the margaritas, Mol.” “I’m thinking of tiny handcuffs,” she muttered. “And therapy. Years of therapy. ‘Tell me again why you stole Timmy’s birthday bike, Molly.’” Bill leaned over and patted her shoulder. “You wear the sparkly wig, I’ll handle the sack. We go in left of the balloon arch, circle behind the face-painting station, and boom—presents are ours. 
If anyone asks, we’re collecting donations for the Clown College Scholarship Fund.” Molly stared at him for a long beat. “There is no Clown College Scholarship Fund.” “There is now. I printed business cards.” He flashed a glittery rectangle that read ‘Bubbles & Sparkles: Making Dreams Come True (and Slightly Sad).’ She groaned but took the card. “If we get caught, I’m blaming everything on your stupid clown nose.” “Fair.” Bill started the van. “Ready to ruin some childhoods for profit?” Molly pulled on the rainbow wig, sighed deeply, and checked her lockpicks. “I hate how much I love this plan. Let’s go steal Christmas in April.” “That’s the spirit, Sparkles.” “Call me that again and the only thing getting stolen is your ability to reproduce.”
created_at: "2026-05-01T12:53:53.218Z" (5/1/2026, 12:53:53 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 10588
moderation_latency: (null)
generation_time: 20927
tokens_prompt: 77
tokens_completion: 769
native_tokens_prompt: 182
native_tokens_completion: 628
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0016631
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "23399b78-2b4f-965d-97b6-8661a1f20325"
    is_byok: false
    latency: 97
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777640033-ktWdcPC3iCCK3mDy3HhT"
session_id: (null)
api_type: "completions"
id: "gen-1777640033-1tXnWICxoOteiNqlvB60"
upstream_id: "23399b78-2b4f-965d-97b6-8661a1f20325"
total_cost: 0.0016631
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 456
Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 76.20%, Deviation: 26.20%
Meta Data: neededClean: false, wordsTotal: 458, wordsDialogue: 349
Overall: 0.0000%
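The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator can be reproduced from the two counts in its metadata (wordsDialogue 349, wordsTotal 458) and the 50% target in the scenario. The sketch below is illustrative only: the function names are hypothetical, the quote-based dialogue counter is an assumption about how dialogue might be detected (text between curly double quotes, whitespace-separated words), and the page does not show how the evaluator maps a deviation to its 0.00% score, so that step is left out.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Count words inside curly double quotes.

    Assumption: dialogue is whatever sits between an opening and a closing
    curly double quote. The benchmark's real tokenizer may differ.
    """
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def ratio_report(words_dialogue: int, words_total: int, target: float = 0.5):
    """Return (ratio %, absolute deviation from target %) rounded to 2 places."""
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target)
    return round(ratio * 100, 2), round(deviation * 100, 2)


if __name__ == "__main__":
    # Counts taken from the evaluator metadata on this page.
    ratio_pct, deviation_pct = ratio_report(349, 458)
    print(f"Ratio: {ratio_pct:.2f}%, Deviation: {deviation_pct:.2f}%")
```

Run against the reported counts, this gives a ratio of 76.20% and a deviation of 26.20%, matching the Details line. Applying `dialogue_words` and `count_words` to the response text itself may not reproduce 349 and 458 exactly, since the evaluator's own counting rules are not shown here.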