NC Bench
Run: 2026-03-12-01KKHZCW8NXMESTCXX2HWSB9XP
Run Details
Model:
Grok 4.20 (Beta)
Test:
Dialogue tags
Scenario:
Write 200 words with 50% dialogue
Duration:
1.9s
Cost:
$0.001700
Provider:
xAI
Evaluation summary
Score | Evaluator
100.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
49.9995% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her oversized clown wig and grinned at Bill across the cluttered van. "Okay, genius. We hit the party at peak present time. Right after cake but before the piñata. Those kids will be sugar-high and distracted." Bill peeked through the peephole at the bouncing castle. "This feels wrong, Mol. Stealing from toddlers? What are we, monsters with a death wish?" "They're tiny billionaires with too many toys," Molly whispered, checking her fake mustache. "Think of it as redistribution of wealth. Robin Hood with better outfits." "I still say the clown shoes are a mistake," Bill muttered, lifting one massive red foot. "How am I supposed to run in these if things go south?" "You won't run. You'll waddle with dignity. Now remember the plan: I juggle, you cry like a disappointed toddler so they give you the presents to cheer you up. Classic misdirection." Bill sighed. "If my mother calls, tell her I'm at book club." "Book club? With the balloon animals?" Molly smirked. "Shut up. Fine, let's go steal Barbie's empire. But if a kid bites me, I'm keeping the whole haul." "Deal. Now paste on that smile, partner. Time to ruin some birthdays for profit."
created_at
"2026-03-12T21:31:44.793Z"
3/12/2026, 9:31:44 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
377
moderation_latency
(null)
generation_time
1771
tokens_prompt
77
tokens_completion
307
native_tokens_prompt
182
native_tokens_completion
261
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0016996
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"865f1fa5-4731-09e5-a297-beb185558b9d"
is_byok
false
latency
308
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773351104-BspNiwtLPnjyo8KItKfS"
upstream_id
"865f1fa5-4731-09e5-a297-beb185558b9d"
total_cost
0.0016996
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false, words: 199
0.00% | Dialogue to Total Word Ratio | Ratio: 80.00%, Deviation: 30.00% | neededClean: false, wordsTotal: 200, wordsDialogue: 160
49.9995% | Overall
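The reported ratio and deviation can be reproduced from the counts in the metadata: 160 dialogue words out of 200 total gives 80.00%, which is 30.00 percentage points from the 50% target. The sketch below shows one way such a check might be computed. It is not the evaluator's actual code: the function name `dialogue_stats`, the `target` parameter, whitespace tokenization, and the rule that dialogue is whatever sits between straight double quotes are all assumptions made for illustration. How the deviation is mapped to the 0.00% score is not shown on this page and is not modeled here.

```python
import re


def dialogue_stats(text: str, target: float = 0.5) -> dict:
    """Hypothetical helper: estimate the share of words that are dialogue."""
    # Assumption: a "word" is any whitespace-separated token.
    total = len(text.split())
    # Assumption: dialogue is the text between pairs of straight double quotes.
    spoken = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(segment.split()) for segment in spoken)
    ratio = dialogue / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


# Arithmetic from the counts reported in the metadata above.
reported_ratio = 160 / 200                      # 0.8, shown as 80.00%
reported_deviation = abs(reported_ratio - 0.5)  # about 0.30, shown as 30.00%
```

A different tokenization rule would shift the counts slightly, which may be why the two evaluators report 199 and 200 words for the same response.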