NC Bench
Run: 2026-03-12-01KKHYRKW8W1Q8W4T424BVJ086
Run Details
Model:
Grok 4.20 (Beta)
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
4.7s
Cost:
$0.004527
Provider:
xAI
Evaluation summary
Matches word count: 20.19%
Dialogue to Total Word Ratio: 0.04%
Overall: 10.1157%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her oversized sunglasses and leaned across the wobbly diner booth. “Okay, Bill, listen. We hit the party at exactly three-fifteen, right when they’re singing Happy Birthday. Nobody’s watching the gift table during that off-key massacre.” Bill slurped his coffee and grinned like a man who’d already spent the loot. “You’re a genius, Mol. Pure evil wrapped in sparkly logic. But how do we blend in? We look like we rob banks for a living.” “We look like we rob banks because we do rob banks, you walnut. That’s why I bought the props.” Molly slid a plastic bag across the table. “You’re going as the world’s saddest clown. I’m the balloon artist. Kids won’t suspect a thing.” Bill peeked inside and recoiled. “A clown? With the red nose and everything? I’ll have nightmares about myself.” “Better than the alternative. Last time you tried blending in you wore that fake mustache and the birthday mom thought you were her ex-husband. We barely escaped with the cake.” “Fair point,” Bill muttered. “Fine, clown it is. What’s the exit strategy? Those ankle-biters are fast when sugar’s involved.” Molly tapped a crude map drawn on a napkin. “Back door by the bounce house. I deflate it while you load the presents into the fake ambulance we’re borrowing. Parents see a clown and a paramedic carrying boxes, they assume it’s part of the show. We’re basically performance artists with larceny credentials.” Bill chuckled. “You’re enjoying this way too much. We’re stealing teddy bears and junior chemistry sets, not diamonds.” “Sentimental value, Bill. Those presents are worth their weight in parental guilt. Plus little Timmy’s getting a PlayStation 5. I saw the box. We can flip that in twenty minutes flat.” “Timmy’s gonna cry,” Bill said, almost sounding sorry. “Timmy cries when the juice box is warm. He’ll survive. We’ll leave him a balloon animal as restitution. Maybe a sad little giraffe to match his emotional state.” Bill leaned back, arms crossed. 
“You’ve thought of everything except the dog. There’s always a dog at these things, Mol. Some yappy little monster that sniffs out guilt like it’s bacon.” Molly waved a dismissive hand. “I’ve got bacon-flavored tranquilizer treats. Non-harmful, just makes him dopey and friendly. He’ll think we’re his new best friends.” “You’re terrifying when you’re organized.” “Thank you. Now, important question: do we take the piñata too? There’s candy in there and I’ve got a sweet tooth.” Bill stared at her. “We’re not animals, Molly. The piñata stays. We have standards.” “Since when?” “Since never, but I’m drawing the line at destroying a paper donkey in front of thirty screaming children. My conscience has limits.” Molly sighed theatrically. “Fine. Piñata lives. But the presents are ours. We hit hard, we hit fast, and we leave with enough loot to finally buy that taco truck we’ve been talking about.” Bill raised his coffee mug in a toast. “To robbing children and achieving our dreams.” Molly clinked her mug against his. “And to never speaking of this job again after today.” “Deal,” Bill said, already laughing. “Let’s go make some memories they’ll need therapy for.”
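The "Dialogue to Total Word Ratio" evaluator needs a count of dialogue words and a count of total words for a response like the one above. The following is a minimal sketch of one way to get such counts, not the benchmark's actual implementation: the function name `dialogue_word_stats`, the whitespace-based word splitting, and the rule that dialogue is whatever sits between double quotation marks are all assumptions made for illustration. The run's two evaluators report different totals for the same response (520 and 525), so the real tokenization evidently varies and this sketch may not reproduce either figure.

```python
import re


def dialogue_word_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a prose passage.

    Assumptions (hypothetical, not taken from the benchmark):
    - a word is any whitespace-separated token;
    - dialogue is the text between a pair of double quotation marks,
      either curly (“ ”) or straight (" ").
    """
    total_words = len(text.split())

    # Each match is a (curly, straight) tuple; exactly one side is non-empty
    # unless the quoted span itself is empty.
    spans = re.findall(r'“([^”]*)”|"([^"]*)"', text)
    dialogue_words = sum(
        len((curly or straight).split()) for curly, straight in spans
    )
    return dialogue_words, total_words


if __name__ == "__main__":
    sample = "Molly sighed. “Fine. Piñata lives.” Bill nodded."
    spoken, total = dialogue_word_stats(sample)
    print(f"{spoken} of {total} words are dialogue")
```

Apostrophes in the response are typeset as single curly quotes (’), so they do not interfere with the double-quote matching in this sketch.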
created_at
"2026-03-12T21:20:41.134Z"
3/12/2026, 9:20:41 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
792
moderation_latency
(null)
generation_time
4103
tokens_prompt
77
tokens_completion
889
native_tokens_prompt
182
native_tokens_completion
713
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
64
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0045268
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"bebdb4d9-40fd-3163-4fa4-4818e20e429f"
is_byok
false
latency
792
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773350441-Lj9vHPNMgEotrdYMTws1"
upstream_id
"bebdb4d9-40fd-3163-4fa4-4818e20e429f"
total_cost
0.0045268
cache_discount
0.0001152
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Matches word count
Result: 20.19%
Details: n/a
Meta Data: neededClean: false, words: 520
Dialogue to Total Word Ratio
Result: 0.04%
Details: Ratio: 82.48%, Deviation: 12.48%
Meta Data: neededClean: false, wordsTotal: 525, wordsDialogue: 433
Overall
Result: 10.1157%
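The reported ratio and deviation can be checked directly from the metadata: 433 dialogue words out of 525 total, against the 70% target stated in the scenario. The sketch below does only that arithmetic. The helper names are hypothetical, and treating the overall figure as the plain mean of the two evaluator scores is an assumption; the mean of the displayed scores is 10.115, close to the reported 10.1157%, and the small gap would be consistent with the displayed scores being rounded. How the 20.19% and 0.04% scores themselves are derived from the counts is not shown in this run, so the sketch does not attempt it.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of words that are dialogue, as a fraction of 1."""
    return words_dialogue / words_total


def deviation_from_target(ratio: float, target: float = 0.70) -> float:
    """Absolute distance between the measured ratio and the requested one."""
    return abs(ratio - target)


def mean_score(scores: list[float]) -> float:
    """Plain average of evaluator scores (assumed aggregation rule)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    ratio = dialogue_ratio(433, 525)          # wordsDialogue / wordsTotal
    deviation = deviation_from_target(ratio)  # target of 70% from the scenario
    print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
    # prints "Ratio: 82.48%, Deviation: 12.48%"

    # Displayed evaluator scores, in percent.
    print(f"Mean of displayed scores: {mean_score([20.19, 0.04]):.3f}%")
```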