NC Bench
Run: 2026-03-12-01KKHZJYT2N8JDB5NRZAH34V4P
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 35.4s
Cost: $0.058332
Provider: xAI
Evaluation summary
Score     Evaluator                     Details Link
75.16%    Matches word count            (details)
99.56%    Dialogue to Total Word Ratio  (details)
87.3560%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The two thieves had been planning this caper for days but the target made it all the more absurd. In the dimly lit back alley hideout, Molly and Bill hunched over their map. The air smelled of old pizza and anticipation. Molly peered at the stolen invitation. "Bill, this party has twenty kids and presents. We strike during musical chairs." Bill rubbed his chin as he adjusted his glasses. "Musical chairs? Molly, how do we avoid the tattle tales?" With a mischievous glint, Molly continued. "We dress as the pony. I'll be the front, you the back. No one will suspect the pony." "The pony? Have you lost your mind? Last time I got peed on by a dog," Bill exclaimed. Trying to calm him, she put a hand on his shoulder. "Stop complaining. Once we have the presents, we vanish into the crowd of parents. Act like we are returning a gift." Bill looked skeptical. "Returning a gift? With a sack full? You are crazy but I like it. What about the birthday boy?" "He will get over it. We leave him a sock. Now let's practice our neighs," Molly laughed. Bill shook his head but smiled. The plan was stupid but it might just work in the most hilarious way possible.
created_at: "2026-03-12T21:35:04.011Z" (3/12/2026, 9:35:04 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 34537
moderation_latency: (null)
generation_time: 35172
tokens_prompt: 77
tokens_completion: 302
native_tokens_prompt: 184
native_tokens_completion: 9699
native_tokens_completion_images: 0
native_tokens_reasoning: 9435
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0583316
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "0a0b8b63-5df3-9e39-4937-44395e78b92f"
    is_byok: false
    latency: 293
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351304-70ggKYj4isXUkIuyv7vC"
upstream_id: "0a0b8b63-5df3-9e39-4937-44395e78b92f"
total_cost: 0.0583316
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
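The native token counts above suggest that most of the completion was spent on reasoning, not on the visible scene. The Python sketch below works through that arithmetic using only values listed in the metadata. It assumes that generation_time is in milliseconds (consistent with the 35.4s duration reported for the run) and that visible output tokens equal native_tokens_completion minus native_tokens_reasoning. Neither assumption is confirmed by the page.

```python
# Values copied from the run metadata.
native_tokens_completion = 9699
native_tokens_reasoning = 9435
generation_time_ms = 35172  # assumed to be milliseconds

# Assumption: tokens not counted as reasoning are the visible response.
visible_tokens = native_tokens_completion - native_tokens_reasoning

# Share of the completion spent on reasoning.
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Rough throughput over the whole generation, reasoning included.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"visible tokens: {visible_tokens}")
print(f"reasoning share: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.0f} tokens/s")
```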
Evaluation details
Result    Evaluator                     Details                          Meta Data
75.16%    Matches word count            n/a                              neededClean: false; words: 213
99.56%    Dialogue to Total Word Ratio  Ratio: 52.58%, Deviation: 2.58%  neededClean: false; wordsTotal: 213; wordsDialogue: 112
87.3560%  Overall
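The reported ratio and deviation follow directly from the word counts in the metadata: 112 dialogue words out of 213 total against a 50% target. The Python sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. The function names are hypothetical, and the counting rules are assumptions: a word is any whitespace-separated token, and dialogue is anything between straight double quotes. The page does not show NC Bench's actual tokenisation or how deviation maps to the 99.56% score, so the sketch does not attempt either. The overall figure appears consistent with an unweighted mean of the two evaluator scores, which is an inference from the numbers and not something the page documents.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a prose passage.

    Assumptions (not taken from the benchmark): a word is a
    whitespace-separated token, and dialogue is the text between
    straight double quotes.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio: float, target: float = 0.5) -> float:
    """Absolute distance between the observed ratio and the target."""
    return abs(ratio - target)


# Reproduce the reported figures from the metadata counts.
reported_ratio = 112 / 213
reported_deviation = deviation_from_target(reported_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {reported_deviation:.2%}")
# prints "Ratio: 52.58%, Deviation: 2.58%"

# Inference only: the mean of the two evaluator scores.
mean_score = (75.16 + 99.56) / 2
```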