NC Bench
Run: 2026-03-12-01KKHZ4NHA0Y8TDXQ39W89TN2A
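The part of the run identifier after the date, 01KKHZ4NHA0Y8TDXQ39W89TN2A, has the shape of a ULID: 26 Crockford base32 characters whose first 10 encode a millisecond Unix timestamp. The page does not say which ID scheme it uses, so treat that as an assumption. Under it, the Python sketch below decodes the timestamp. It also reads the number inside the generation id (gen-1773350835-aNnUaylMKkFLWj3Rs6Gd), assumed here to be Unix seconds. Both land in the same second as the created_at value in the metadata (2026-03-12T21:27:15.758Z). The helper names are illustrative, not part of any NC Bench or provider API.

```python
from datetime import datetime, timedelta, timezone

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ulid_timestamp(ulid: str) -> datetime:
    """Decode the millisecond timestamp held in a ULID's first 10 characters."""
    ms = 0
    for ch in ulid[:10].upper():
        ms = ms * 32 + CROCKFORD.index(ch)
    # Integer timedelta arithmetic avoids float rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=ms)


def gen_id_timestamp(gen_id: str) -> datetime:
    """Assumption: the field between 'gen-' and the next '-' is Unix seconds."""
    seconds = int(gen_id.split("-")[1])
    return EPOCH + timedelta(seconds=seconds)


run_ts = ulid_timestamp("01KKHZ4NHA0Y8TDXQ39W89TN2A")
gen_ts = gen_id_timestamp("gen-1773350835-aNnUaylMKkFLWj3Rs6Gd")
print(run_ts.isoformat())  # 2026-03-12T21:27:15.754000+00:00
print(gen_ts.isoformat())  # 2026-03-12T21:27:15+00:00
```

On this reading, the run ID was minted about 4 ms before the created_at time recorded for the generation.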
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 47.1s
Cost: $0.060576
Provider: xAI
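The Cost figure can be reconciled with the token counts in the generation metadata, but the per-token prices are not listed on this page. The rates in this Python sketch ($2.00 per million uncached input tokens, $0.20 per million cached input tokens, $6.00 per million output tokens) are assumptions, chosen because they reproduce total_cost (0.0605756) and cache_discount (0.0002304) exactly. Under those rates, all 10,073 native completion tokens, including the 9,830 reasoning tokens, are billed as output, so reasoning accounts for nearly all of the cost of a 196-word reply.

```python
from decimal import Decimal

# Token counts copied from the generation metadata.
native_prompt = 184
native_cached = 128
native_completion = 10073  # includes the reasoning tokens below
native_reasoning = 9830

# ASSUMED prices in USD per million tokens; not stated on this page.
PRICE_INPUT = Decimal("2.00")
PRICE_CACHED_INPUT = Decimal("0.20")
PRICE_OUTPUT = Decimal("6.00")
MILLION = Decimal(1_000_000)

uncached_prompt = native_prompt - native_cached            # prompt tokens billed at full rate
visible_completion = native_completion - native_reasoning  # completion tokens that are not reasoning

# Decimal keeps the sum exact, so it can be compared digit for digit.
cost = (
    uncached_prompt * PRICE_INPUT
    + native_cached * PRICE_CACHED_INPUT
    + native_completion * PRICE_OUTPUT
) / MILLION
cache_discount = native_cached * (PRICE_INPUT - PRICE_CACHED_INPUT) / MILLION

print(uncached_prompt, visible_completion)  # 56 243
print(cost)            # 0.0605756
print(cache_discount)  # 0.0002304
```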
Evaluation summary
Score      Evaluator
99.74%     Matches word count
88.45%     Dialogue to Total Word Ratio
94.0991%   Overall
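The page does not state how the Overall score is aggregated. The displayed figures are consistent with an unweighted mean of the two evaluator scores, which this small Python check assumes. Because each displayed score is rounded to 0.01, the comparison allows a tolerance of 0.005.

```python
# Displayed evaluator scores (rounded to 0.01 percentage points on the page).
scores = {
    "Matches word count": 99.74,
    "Dialogue to Total Word Ratio": 88.45,
}
overall_reported = 94.0991

# Assumption: Overall is the plain mean of the evaluator scores.
overall_estimate = sum(scores.values()) / len(scores)

# Each input may be off by up to 0.005 from rounding, so the mean may be too.
print(round(overall_estimate, 4))                         # 94.095
print(abs(overall_estimate - overall_reported) <= 0.005)  # True
```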
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly and Bill sat plotting their next move. "Bill, we are breaking into the children's party to steal all the presents. Get your clown suit ready because we're going in as entertainers. Distract with jokes and balloons while I swipe the gifts. Ignore the screams and focus on the gift table." "Molly, this is crazy. What if the kids see through our disguises? They have eagle eyes for fakes. And the mom with the camera will photograph everything." "Ignore that, Bill. We'll be in costumes from the theater shop. It's perfect. The presents will be ours by the time they play musical chairs. Think of the money from selling video games and dolls." "But it's a kid's party, Molly. I feel bad." "No way. All or nothing. Stop being a wimp, Bill. This is the heist of the century for us. Are you with me or not?" "Yeah, I'm with you but if we get busted it's your fault." "Good, now let's go over the exit strategy again. The backyard fence is our way out after we load up." "The fence? Last time I ripped my pants on one. But okay, Molly, if you say so."
Meta
created_at: "2026-03-12T21:27:15.758Z" (3/12/2026, 9:27:15 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 46511
moderation_latency: (null)
generation_time: 47029
tokens_prompt: 77
tokens_completion: 275
native_tokens_prompt: 184
native_tokens_completion: 10073
native_tokens_completion_images: 0
native_tokens_reasoning: 9830
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0605756
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "5c3ce3b1-586a-c8ae-d273-b33ae359ab62"
    is_byok: false
    latency: 288
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350835-aNnUaylMKkFLWj3Rs6Gd"
upstream_id: "5c3ce3b1-586a-c8ae-d273-b33ae359ab62"
total_cost: 0.0605756
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result     Evaluator                      Details                            Meta data
99.74%     Matches word count             n/a                                neededClean: false, words: 196
88.45%     Dialogue to Total Word Ratio   Ratio: 95.92%, Deviation: 5.92%    neededClean: false, wordsTotal: 196, wordsDialogue: 188
94.0991%   Overall
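The evaluators' exact counting rules are not published here. A minimal Python sketch, assuming words are whitespace-separated tokens and dialogue is any text between straight double quotes, reproduces the reported wordsTotal (196), wordsDialogue (188), ratio, and deviation for this response; deviation is taken as the ratio minus the 90% target. The only words outside quotes are the 8-word opening sentence, so the response lands 4 words under the 200-word target and about 6 points over the dialogue target.

```python
import re

# The model response from this run, verbatim (line breaks added only for width).
RESPONSE = '''Molly and Bill sat plotting their next move. "Bill, we are breaking into the
children's party to steal all the presents. Get your clown suit ready because
we're going in as entertainers. Distract with jokes and balloons while I swipe
the gifts. Ignore the screams and focus on the gift table." "Molly, this is
crazy. What if the kids see through our disguises? They have eagle eyes for
fakes. And the mom with the camera will photograph everything." "Ignore that,
Bill. We'll be in costumes from the theater shop. It's perfect. The presents
will be ours by the time they play musical chairs. Think of the money from
selling video games and dolls." "But it's a kid's party, Molly. I feel bad."
"No way. All or nothing. Stop being a wimp, Bill. This is the heist of the
century for us. Are you with me or not?" "Yeah, I'm with you but if we get
busted it's your fault." "Good, now let's go over the exit strategy again. The
backyard fence is our way out after we load up." "The fence? Last time I
ripped my pants on one. But okay, Molly, if you say so."'''

TARGET_DIALOGUE_PCT = 90.0

# Assumption: a word is any whitespace-separated token.
words_total = len(RESPONSE.split())
# Assumption: dialogue is the text inside each pair of straight double quotes.
quoted = re.findall(r'"([^"]*)"', RESPONSE)
words_dialogue = sum(len(segment.split()) for segment in quoted)

ratio_pct = 100 * words_dialogue / words_total
deviation_pct = ratio_pct - TARGET_DIALOGUE_PCT

print(words_total, words_dialogue)  # 196 188
print(f"Ratio: {ratio_pct:.2f}%, Deviation: {deviation_pct:.2f}%")
# Ratio: 95.92%, Deviation: 5.92%
```

A response using curly quotes would need a wider pattern; this one uses straight quotes throughout, so the simple regex is enough.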