NC Bench
Run: 2026-03-12-01KKHZM11J2EG55S444P22496N
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 2.1s
Cost: $0.001748
Provider: xAI
Evaluation summary
Score      Evaluator
97.63%     Matches word count
0.00%      Dialogue to Total Word Ratio
48.8138%   Overall
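The Overall figure looks consistent with an unweighted mean of the two evaluator scores: (97.63 + 0.00) / 2 = 48.815, which agrees with the reported 48.8138% to within the rounding of the displayed 97.63%. The page does not state how evaluators are combined, so this is an assumption. The helper `overall_score` below is hypothetical and not part of NC Bench.

```python
from statistics import fmean


def overall_score(evaluator_scores: list[float]) -> float:
    """Unweighted mean of per-evaluator percentages.

    Assumption: evaluators are combined with equal weights; the run page
    does not document the aggregation rule.
    """
    return fmean(evaluator_scores)


# Scores as displayed in the evaluation summary (percent, already rounded).
displayed = {"Matches word count": 97.63, "Dialogue to Total Word Ratio": 0.00}

overall = overall_score(list(displayed.values()))
print(f"{overall:.3f}")
# Any gap to the reported 48.8138 comes from 97.63 itself being rounded.
print(abs(overall - 48.8138) < 0.005)
```

Equal weighting is simply the most straightforward rule that reproduces the reported number; a weighted scheme would need weights the page does not give.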
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI:
Molly adjusted her fake mustache and peered through the hedge at the bouncing castle. "This is insane, Bill. We're not stealing from a bank, we're crashing a kid's party. What if they have a piñata guard?"

Bill grinned, checking his clown wig. "Relax, Mol. We go in as entertainers. I juggle, you do face painting. While the little monsters are distracted by cake, we bag the presents. Simple."

"Simple? There's like thirty screaming six-year-olds in there," Molly hissed. "One of them looks like he could take us both. And is that a magician? We hate magicians."

"Exactly why it's perfect," Bill whispered. "Nobody suspects the clowns. We'll blend right in. You grab the gift table, I'll cause a distraction with my balloon animals. Who doesn't love balloon animals?"

"I don't love balloon animals, Bill. And what about the parents? They're not idiots."

"Parents? They'll be too busy filming everything for Instagram. By the time they notice the presents are gone, we'll be halfway to the ice cream truck with enough loot to retire."

Molly sighed. "If we get caught because some kid wants his face painted like a dinosaur, I'm blaming you."

"Deal. Now quit whining and practice your silly voice. Time to steal Christmas... in June."

Response metadata
created_at: "2026-03-12T21:35:39.063Z" (3/12/2026, 9:35:39 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 288
moderation_latency: (null)
generation_time: 2003
tokens_prompt: 77
tokens_completion: 322
native_tokens_prompt: 182
native_tokens_completion: 269
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0017476
router: (null)
provider_responses:
  [0]
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "2c70ccde-8369-6e12-789b-9501c09c42ab"
    is_byok: false
    latency: 243
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351339-UTDAWnjWymZJletTMACi"
upstream_id: "2c70ccde-8369-6e12-789b-9501c09c42ab"
total_cost: 0.0017476
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Matches word count
  Result: 97.63%
  Details: n/a
  Meta data: neededClean = false, words = 207
Dialogue to Total Word Ratio
  Result: 0.00%
  Details: Ratio: 87.56%, Deviation: 37.56%
  Meta data: neededClean = false, wordsTotal = 209, wordsDialogue = 183
Overall: 48.8138%
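The Dialogue to Total Word Ratio row can be reproduced from its own metadata: 183 / 209 ≈ 87.56% of the words are dialogue, which is 37.56 percentage points away from the 50% target in the scenario. The sketch below recomputes those two figures and shows one possible way to count dialogue words, treating any text inside double quotes as dialogue and splitting on whitespace. That counting rule and every name in the code (`TARGET_RATIO`, `dialogue_stats`, `ratio_and_deviation`) are assumptions for illustration. NC Bench's evaluator may tokenize differently; the two evaluators on this page already disagree on the total (207 vs 209 words).

```python
import re

# Assumed target share of dialogue, taken from the scenario ("50% dialogue").
TARGET_RATIO = 0.50

# Assumption: dialogue is any text between straight or curly double quotes.
QUOTE_RE = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a prose passage."""
    dialogue = sum(
        # Exactly one alternative matches; the other group is None.
        count_words(m.group(1) or m.group(2) or "")
        for m in QUOTE_RE.finditer(text)
    )
    return dialogue, count_words(text)


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Dialogue share of all words and its absolute distance from the target."""
    ratio = dialogue_words / total_words
    return ratio, abs(ratio - target)


if __name__ == "__main__":
    sample = 'Molly sighed. "Deal," said Bill. "Now quit whining."'
    print(dialogue_stats(sample))
    # Recompute the reported figures from wordsDialogue and wordsTotal.
    r, dev = ratio_and_deviation(183, 209)
    print(f"Ratio: {r:.2%}, Deviation: {dev:.2%}")
```

How a deviation of 37.56 points maps to the 0.00% score is not stated on the page, so the sketch stops at the ratio and deviation.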