NC Bench
Run: 2026-03-12-01KKHZRCA5T7JRKS8WJS0TX5J5
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 1.9s
Cost: $0.001748
Provider: xAI
Evaluation summary
Score | Evaluator
99.98% | Matches word count
0.00% | Dialogue to Total Word Ratio
49.9920% | Overall
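The Overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below illustrates that arithmetic only; the averaging rule, the function name `overall_score`, and the unrounded word-count score of 99.984 are assumptions inferred from the displayed numbers, not something this page states.

```python
from typing import Sequence


def overall_score(scores: Sequence[float]) -> float:
    """Unweighted mean of evaluator scores, in percent (assumed rule)."""
    if not scores:
        raise ValueError("at least one evaluator score is required")
    return sum(scores) / len(scores)


# 99.984 is an assumed unrounded value behind the displayed 99.98%.
word_count_score = 99.984
dialogue_ratio_score = 0.0

overall = overall_score([word_count_score, dialogue_ratio_score])
print(f"{overall:.4f}%")
```

Under these assumptions the result formats as 49.9920%, matching the Overall row.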
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly, you gotta be kidding me. A kiddie party? With clowns and juice boxes? It’s genius, Bill. They’ll have a mountain of presents stacked like a pirate’s hoard. We slip in during musical chairs, grab the lot, and vanish before the cake’s cut. What about the parents? They’ll spot us like two raccoons in a sandbox. We wear the matching tracksuits from the last job, say we’re the hired entertainers. You’re Balloon Bill, I’m Magic Molly. I pull a quarter from a kid’s ear, you twist balloon swords. While they’re distracted we bag the gifts. These are children’s presents, Mol. There’ll be Barbies and plastic trucks. My fence won’t touch that junk. We’re not fencing them, genius. We’re swapping the real ones with identical boxes filled with our old socks and that expired tuna we found. The kids open them later, chaos ensues, parents blame the cheap toy company. We’re long gone with the good stuff. You’re evil. I love it. Good. Now practice your goofy voice. We hit the party at three. And Bill? Yeah? No sampling the cake this time. Last job you had blue frosting on your ski mask for a week. Fair enough. Let’s go ruin some birthdays.
created_at: "2026-03-12T21:38:01.674Z" (3/12/2026, 9:38:01 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 363
moderation_latency: (null)
generation_time: 1765
tokens_prompt: 77
tokens_completion: 317
native_tokens_prompt: 182
native_tokens_completion: 269
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0017476
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "f2a6922f-cf25-f1fd-6514-7e335f2be1eb"
    is_byok: false
    latency: 324
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351481-5b2fRbdl3ao1LC5TWOiy"
upstream_id: "f2a6922f-cf25-f1fd-6514-7e335f2be1eb"
total_cost: 0.0017476
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
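As a rough reading of the metadata, generation throughput can be estimated from native_tokens_completion and generation_time. This is only a sketch: it assumes generation_time is reported in milliseconds (the page does not state the unit), and `tokens_per_second` is a hypothetical helper name.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: float) -> float:
    """Estimate throughput, assuming generation_time is in milliseconds."""
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000.0)


# Values copied from the metadata: native_tokens_completion and generation_time.
rate = tokens_per_second(269, 1765)
print(f"{rate:.1f} tokens/s")
```

If the millisecond assumption holds, this works out to roughly 152 tokens per second for this run.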
Evaluation details
Result | Evaluator | Details | Meta Data
99.98% | Matches word count | n/a | neededClean: false, words: 202
0.00% | Dialogue to Total Word Ratio | Ratio: 0.00%, Deviation: 90.00% | neededClean: false, wordsTotal: 202, wordsDialogue: 0
49.9920%
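The 0.00% ratio is what one would expect if the evaluator counts only words inside double quotation marks, because the model response contains none. The sketch below is a hypothetical re-implementation under that assumption, not NC Bench's actual evaluator; the names `dialogue_ratio` and `deviation`, the whitespace-based word splitting, and the quote-matching rule are all illustrative.

```python
import re

# Matches text inside straight ("...") or curly (“...”) double quotes.
# Assumption: only double-quoted spans count as dialogue.
QUOTED = re.compile(r'"([^"]*)"|“([^”]*)”')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Fraction of words that appear inside double quotation marks."""
    total = count_words(text)
    if total == 0:
        return 0.0
    spoken = sum(count_words(a or b) for a, b in QUOTED.findall(text))
    return spoken / total


def deviation(ratio: float, target: float = 0.90) -> float:
    """Absolute distance from the target dialogue ratio."""
    return abs(target - ratio)


unquoted = "Molly, you gotta be kidding me. A kiddie party?"
quoted = '"Molly, you gotta be kidding me," said Bill.'

print(dialogue_ratio(unquoted), deviation(dialogue_ratio(unquoted)))
print(dialogue_ratio(quoted))
```

With no quotation marks the ratio is 0 and the deviation from the 90% target is 0.90, in line with the "Ratio: 0.00%, Deviation: 90.00%" detail reported for this run.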