NC Bench
Run: 2026-03-12-01KKHZ6TB8GDEKCXVV3NGNBGJX
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 23.4s
Cost: $0.039258
Provider: xAI
Evaluation summary
| Score | Evaluator |
|---|---|
| 99.98% | Matches word count |
| 100.00% | Dialogue to Total Word Ratio |
| 99.9909% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In the shadowy corner of an abandoned playground, Molly and Bill crouched behind a slide, their eyes fixed on the nearby house where a children's party was in full swing. The air was filled with the scent of popcorn and the sounds of joyful squeals. These two seasoned thieves had sunk to a new low, targeting birthday presents from innocent kids, but the economy was bad and the pawn shops were paying top dollar for unused toys. Molly adjusted her oversized bow tie while Bill checked the contents of his utility belt filled with balloons and whoopee cushions. Their plan involved infiltrating the party as surprise clowns. They would perform a few silly acts to win over the crowd, then during the gift opening, they would swap the real presents with empty boxes and make off with the goods in their tricked-out ice cream truck. It seemed foolproof, except for the potential of tantrums and parental interference. The duo reviewed their timing one last time. Everything hinged on the moment the birthday cake arrived. "We hit the presents after cake," said Molly. "Right, but avoid the pony," Bill answered. "Got it, let's not get caught by toddlers," she added. They high-fived awkwardly.
created_at: "2026-03-12T21:28:26.221Z" (3/12/2026, 9:28:26 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 22702
moderation_latency: (null)
generation_time: 23320
tokens_prompt: 77
tokens_completion: 298
native_tokens_prompt: 184
native_tokens_completion: 6520
native_tokens_completion_images: 0
native_tokens_reasoning: 6280
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0392576
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "10261f3e-81d5-21a2-35f8-a4228f5bfabb"
    is_byok: false
    latency: 266
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350906-ba0HR9ko2epidyDKWaCZ"
upstream_id: "10261f3e-81d5-21a2-35f8-a4228f5bfabb"
total_cost: 0.0392576
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
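The token counts and cost fields above can be related with a little arithmetic. The sketch below is only an illustration: the three per-token rates are hypothetical values that do not appear anywhere on this page, picked because they happen to reproduce the recorded total_cost and cache_discount. The function name and its parameters are likewise made up for the example.

```python
import math

# ASSUMPTION: hypothetical USD rates per token; the run page does not state any prices.
INPUT_RATE = 2.00 / 1_000_000         # uncached prompt tokens
OUTPUT_RATE = 6.00 / 1_000_000        # completion tokens
CACHED_INPUT_RATE = 0.20 / 1_000_000  # prompt tokens served from cache


def estimate_cost(prompt_tokens: int, completion_tokens: int, cached_tokens: int):
    """Return (total_cost, cache_discount) under the assumed rates."""
    # Price every token at the full rate first.
    list_price = prompt_tokens * INPUT_RATE + completion_tokens * OUTPUT_RATE
    # Cached prompt tokens are refunded the difference between the two input rates.
    cache_discount = cached_tokens * (INPUT_RATE - CACHED_INPUT_RATE)
    return list_price - cache_discount, cache_discount


# native_tokens_prompt, native_tokens_completion, native_tokens_cached from the metadata
total, discount = estimate_cost(184, 6520, 128)
print(f"total_cost={total:.7f} cache_discount={discount:.7f}")
print(math.isclose(total, 0.0392576, rel_tol=1e-9))
```

If the real price sheet differs, only the three constants need to change; the structure (list price minus a cache discount) is the part the recorded fields suggest.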
Evaluation details
| Result | Evaluator | Details | Meta Data |
|---|---|---|---|
| 99.98% | Matches word count | n/a | neededClean: false; words: 202 |
| 100.00% | Dialogue to Total Word Ratio | Ratio: 9.31%, Deviation: 0.69% | neededClean: false; wordsTotal: 204; wordsDialogue: 19 |
| 99.9909% | Overall | | |
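The reported ratio and deviation follow directly from the recorded counts: 19 dialogue words out of 204 total is 9.31%, which is 0.69 percentage points away from the 10% target. The sketch below shows one way such a ratio could be computed. It is not the benchmark's evaluator: the function name, the whitespace-based word split, and the rule that dialogue is whatever sits between straight double quotes are all assumptions. The two evaluators on this page report 202 and 204 total words for the same response, so the real counting rules evidently vary.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a prose passage."""
    # ASSUMPTION: dialogue is any span enclosed in straight double quotes.
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    # ASSUMPTION: a word is any whitespace-separated token.
    total_words = len(text.split())
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Two sentences taken from the model response above.
sample = (
    '"We hit the presents after cake," said Molly. '
    '"Right, but avoid the pony," Bill answered.'
)
print(dialogue_ratio(sample))

# Reproducing the reported figures from the recorded counts.
reported_ratio = 19 / 204
deviation = abs(0.10 - reported_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 9.31%, Deviation: 0.69%"
```

A whitespace split keeps punctuation attached to words, which does not change the counts here but would matter for text with free-standing dashes or ellipses.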