NC Bench
Run: 2026-05-01-01KQHSH13D1BBHFE5Y10CZ86BF
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 1m 27s
Cost: $0.016853
Provider: xAI
Evaluation summary
Score | Evaluator
90.48% | Matches word count
92.75% | Dialogue to Total Word Ratio
91.6159% | Overall
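The page does not state how the overall score is aggregated. The figures shown are consistent with an unweighted mean of the two evaluator scores, so the sketch below assumes that rule. The `scores` dictionary and the averaging step are illustrative assumptions, not the benchmark's documented method.

```python
from statistics import mean

# Evaluator scores as displayed on this page (rounded to two decimals).
scores = {
    "Matches word count": 90.48,
    "Dialogue to Total Word Ratio": 92.75,
}

# Assumption: the overall score is the unweighted mean of the evaluator scores.
overall = mean(scores.values())
print(f"Overall (assumed mean): {overall:.3f}%")
```

The small gap between this value and the reported 91.6159% is what you would expect from averaging the rounded, displayed scores rather than the unrounded ones.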
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Bill, this plan is insane but tell me again how we're going to break into that children's party and steal all the presents without getting caught," said Molly. "There will be parents and vicious kids everywhere." "Easy Molly, we arrive as surprise entertainers, a pair of bumbling magicians," replied Bill. "While I do card tricks you load the bags and if anyone asks it's all part of the show. We fake it like always. The kids won't know the difference and the adults will be too busy to notice. Plus we bring our own cake with a file in it just in case." "A file in the cake? That's for prison breaks not birthday parties you moron," said Molly. "This is the worst idea since we tried to steal the Easter bunny. Why are we doing this?" "Imagine the loot Molly all those new bikes computers and stuffed animals we can sell for top dollar. It's perfect," said Bill. "We'll be in and out before the cake is cut. No one will be the wiser. Ready partner?" "Ready as I'll ever be for this madness but if a kid cries I'm out. Let's review the escape route before we put on those polka dot suits and ridiculous noses," said Molly.
created_at: "2026-05-01T12:51:56.658Z" (5/1/2026, 12:51:56 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 85526
moderation_latency: (null)
generation_time: 87118
tokens_prompt: 77
tokens_completion: 1712
native_tokens_prompt: 184
native_tokens_completion: 6676
native_tokens_completion_images: (null)
native_tokens_reasoning: 6427
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0168528
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "c9973f45-58a4-5ca4-d543-06e9a176a890"
    is_byok: false
    latency: 62
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777639916-hDxE4n7bHnfaiZfo7Xkq"
session_id: (null)
api_type: "completions"
id: "gen-1777639916-kwCaiTG4f1fPbvA9Ffvo"
upstream_id: "c9973f45-58a4-5ca4-d543-06e9a176a890"
total_cost: 0.0168528
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
90.48% | Matches word count | n/a | neededClean: false, words: 210
92.75% | Dialogue to Total Word Ratio | Ratio: 95.24%, Deviation: 5.24% | neededClean: false, wordsTotal: 210, wordsDialogue: 200
91.6159% | Overall | |
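For readers who want to see how a dialogue-to-total ratio like the one above can be measured, here is a minimal sketch. It assumes dialogue is any text between pairs of straight double quotes and that words are whitespace-separated tokens. The function name `dialogue_ratio`, the regular expression, and the 0.90 target constant are hypothetical choices for illustration, not the evaluator's actual implementation.

```python
import re

TARGET_RATIO = 0.90  # assumption: taken from the "90% dialogue" scenario


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for straight-quoted dialogue."""
    total_words = len(text.split())
    # Assumption: dialogue is whatever sits between pairs of straight double quotes.
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# A short made-up sample, not the model response from this run.
sample = '"Ready partner?" said Bill. "Ready as ever," said Molly.'
words_dialogue, words_total, ratio = dialogue_ratio(sample)
deviation = abs(ratio - TARGET_RATIO)
print(words_dialogue, words_total, f"{ratio:.2%}", f"{deviation:.2%}")
```

With the counts reported for this run (wordsDialogue 200, wordsTotal 210), the same arithmetic gives 200 / 210 ≈ 95.24%, which is 5.24 percentage points above the 90% target. How that deviation is mapped to the 92.75% score is not shown on the page.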