NC Bench
Run: 2026-03-12-01KKHZ15J2P04EGDF506D6BJ7K
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 2.0s
Cost: $0.001976
Provider: xAI
Evaluation summary
Score     Evaluator
35.00%    Matches word count
98.98%    Dialogue to Total Word Ratio
66.9912%  Overall
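The Overall figure appears to be the unweighted mean of the two evaluator scores: (35.00 + 98.98) / 2 ≈ 66.99. The sketch below assumes that rule. It is inferred from the numbers on this page, not taken from NC Bench documentation, and the unrounded dialogue score of 98.9824% used here is only implied by that assumption. The function name `overall_score` is hypothetical.

```python
# Sketch: assumes "Overall" is the unweighted mean of the per-evaluator
# scores. This rule is inferred from this run's numbers, not documented.

def overall_score(scores: list[float]) -> float:
    """Return the unweighted mean of per-evaluator scores (in percent)."""
    if not scores:
        raise ValueError("need at least one evaluator score")
    return sum(scores) / len(scores)

word_count_score = 35.00
# The page rounds this to 98.98%; 98.9824 is the value implied by the
# reported overall of 66.9912% under the mean assumption above.
dialogue_ratio_score = 98.9824

overall = overall_score([word_count_score, dialogue_ratio_score])
print(f"{overall:.4f}%")  # prints 66.9912%
```

Using the rounded 98.98% instead gives 66.99%, which agrees with the reported value to two decimal places.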
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly rubbed her hands together with glee. “Okay, Bill, we hit the bouncy castle at exactly three-fifteen, right when they bring out the cake. Those little monsters will be hypnotized by sugar.” Bill frowned. “Molly, it’s a children’s party. There’s a guy dressed as a dinosaur handing out juice boxes. This feels… wrong.” “Wrong? It’s genius. We slip in as the hired magicians. I’ll yell ‘Abracadabra’ while you stuff the goody bags into the fake rabbit.” “I’m not being the rabbit again. Last time the tail got stuck in a fence.” “Fine, you’re the sad clown. I’ll be the sparkly assistant. We wait for ‘Happy Birthday,’ then—poof—we vanish with every wrapped box under the table.” Bill sighed. “What if a kid cries? I can’t handle tears, Mol. I once cried during a toy commercial.” “You big baby. We’ll leave one cheap truck so they don’t riot. The rest go in the van marked ‘Lost Property.’” “Lost Property? That’s your plan?” “Best I got on short notice. Look, we grab the presents, ditch the clown noses, and split the loot at the ice-cream truck. Deal?” Bill groaned. “If we get chased by angry parents, I’m throwing you under the bouncy castle first.” Molly grinned. “That’s the spirit. Now practice your goofy laugh. We strike at cake time.”
Metadata
created_at: "2026-03-12T21:25:21.095Z" (3/12/2026, 9:25:21 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 256
moderation_latency: (null)
generation_time: 1867
tokens_prompt: 77
tokens_completion: 364
native_tokens_prompt: 182
native_tokens_completion: 307
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0019756
router: (null)
provider_responses:
  [0]
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "9e244e75-3bb1-a4cd-ee2c-b13f42c39db4"
    is_byok: false
    latency: 241
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350721-deYvfpW7v0odh9fjKgxd"
upstream_id: "9e244e75-3bb1-a4cd-ee2c-b13f42c39db4"
total_cost: 0.0019756
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result    Evaluator                       Details                            Meta data
35.00%    Matches word count              n/a                                neededClean: false, words: 218
98.98%    Dialogue to Total Word Ratio    Ratio: 93.18%, Deviation: 3.18%    neededClean: false, wordsTotal: 220, wordsDialogue: 205
66.9912%  Overall
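The dialogue evaluator's Details column can be recomputed from its metadata, assuming Ratio = wordsDialogue / wordsTotal and Deviation is the absolute gap, in percentage points, from the 90% target stated in the prompt. The function names are hypothetical, and the sketch does not attempt to reproduce how the 98.98% score is derived from the deviation, since that mapping is not shown on this page.

```python
# Sketch: assumes Ratio = wordsDialogue / wordsTotal and
# Deviation = |Ratio - target|, with the 90% target taken from the prompt.

def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Percentage of all words that fall inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total

def deviation(ratio_pct: float, target_pct: float = 90.0) -> float:
    """Absolute distance, in percentage points, from the target ratio."""
    return abs(ratio_pct - target_pct)

# wordsDialogue and wordsTotal from the metadata above.
ratio = dialogue_ratio(205, 220)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation(ratio):.2f}%")
# prints Ratio: 93.18%, Deviation: 3.18%
```

Note that this evaluator counted 220 total words while the word-count evaluator counted 218, so the two evaluators appear to split the text into words differently.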