NC Bench
Run: 2026-03-12-01KKHZGSCFB4D0MB12418VZHVT
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 1.7s
Cost: $0.001484
Provider: xAI
Evaluation summary
Score     Evaluator
0.03%     Matches word count
0.00%     Dialogue to Total Word Ratio
0.0152%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly, you’re joking. A children’s party? We’re not stealing from toddlers. Oh, grow a spine, Bill. It’s perfect. Twenty screaming kids, one distracted mom in a paper crown, and a mountain of brightly wrapped presents nobody’s guarding. We’ll be in and out before the ice cream melts. We’ll look like idiots in those tiny chairs. We’re not sitting, genius. You’re the clown. I already rented the suit. Big red nose, floppy shoes, the works. You show up as the “magician.” Pull a rabbit out of your hat, then shove the presents in the fake bottom while they clap. What if one of them recognizes us? From where, our extensive rap sheet of birthday parties? Relax. I’ll keep the little monsters busy with balloon animals. You swoop in, grab the haul, and we vanish before “Happy Birthday” even finishes. This is the dumbest score we’ve ever done. It’s the safest. Who calls the cops on the clown? Now shut up and practice your disappearing act, Bill. We strike at cake time.
created_at: "2026-03-12T21:33:52.916Z" (3/12/2026, 9:33:52 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 314
moderation_latency: (null)
generation_time: 1456
tokens_prompt: 77
tokens_completion: 267
native_tokens_prompt: 182
native_tokens_completion: 225
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0014836
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "2a6e9980-bafe-965e-81b3-43d167110951"
    is_byok: false
    latency: 276
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351232-CRFwCd5wjFr9huLUV4eU"
upstream_id: "2a6e9980-bafe-965e-81b3-43d167110951"
total_cost: 0.0014836
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result    Evaluator                      Details                           Meta Data
0.03%     Matches word count             n/a                               neededClean: false, words: 170
0.00%     Dialogue to Total Word Ratio   Ratio: 1.76%, Deviation: 88.24%   neededClean: false, wordsTotal: 170, wordsDialogue: 3
0.0152%   Overall
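The reported figures are consistent with simple arithmetic: 3 dialogue words out of 170 total gives a ratio of about 1.76%, and the gap to the 90% target named in the scenario is about 88.24%. The Python sketch below reproduces that arithmetic. How the benchmark actually identifies dialogue is not shown on this page; counting only words inside quotation marks is an assumption, although it would explain the count of 3, since the only quoted spans in the response are “magician.” and “Happy Birthday”. The function names `dialogue_ratio` and `deviation_from_target` are hypothetical, not part of NC Bench.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio_percent).

    Assumption: "dialogue" means words inside straight or curly double
    quotes. The real evaluator's rule is not documented on this page.
    """
    total = len(text.split())
    # Capture the text between an opening and a closing double quote.
    quoted_spans = re.findall(r'[“"]([^”"]*)[”"]', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = 100.0 * dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_from_target(ratio_percent: float, target_percent: float = 90.0) -> float:
    """Absolute distance, in percentage points, from the target ratio."""
    return abs(target_percent - ratio_percent)


# Reproduce the reported numbers from wordsDialogue=3 and wordsTotal=170.
reported_ratio = 100.0 * 3 / 170
print(f"Ratio: {reported_ratio:.2f}%")
print(f"Deviation: {deviation_from_target(reported_ratio):.2f}%")
```

If this assumption holds, a response that writes dialogue without quotation marks, as this one does, scores near zero on the ratio check regardless of how much of the text is spoken.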