NC Bench
Run: 2026-05-01-01KQHSKPG2THHKQ4EW1271Z5X1
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 6.7s
Cost: $0.000908
Provider: xAI
Evaluation summary
Matches word count: 27.17%
Dialogue to Total Word Ratio: 36.79%
Overall: 31.9769%
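The page does not say how the Overall score is derived. The displayed numbers are consistent with an unweighted mean of the two evaluator scores. This is an inference from this single run, not a documented NC Bench formula:

```latex
\text{Overall} \approx \frac{27.17\% + 36.79\%}{2} = \frac{63.96\%}{2} = 31.98\%
```

The small gap to the reported 31.9769% is what rounding each evaluator score to two decimals would produce, since that rounding shifts the mean by at most 0.005 points.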
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Bill, you’re sure this is the one? A kiddie party? We’re stealing from toddlers now?”
“Toddlers with rich parents, Molly. Those presents are stacked like pirate treasure. We hit it during musical chairs, in and out.”
“I draw the line at wearing the clown suit. I’m not painting my face.”
“You’re not the clown, I’m the clown. You’re the balloon artist. Nobody suspects balloon artists. We slip in, I do the funny walk, you palm the gift bags while they’re distracted by my balloon poodle.”
“Your balloon poodle looks like a sausage with legs. Last time a kid cried.”
“Exactly. Tears create chaos. Chaos is our friend. While they’re consoling little Timmy, we vacuum up the loot. Think of it as early redistribution of wealth.”
“I’m thinking more like we’re going to hell, Bill. What’s next, nicking teeth from under pillows?”
“Don’t tempt me. Focus. You enter from the garden pretending to deliver the bouncy castle. I burst out of the cake yelling ‘It’s magic time!’ Then we both yell ‘Pin the tail on the donkey’ and start stuffing sacks.”
“You’ve actually practiced yelling ‘It’s magic time,’ haven’t you?”
“Every morning in the shower. You in or not?”
“Fine. But if I end up twisting balloon animals while you moonwalk in size-twenty shoes, I’m keeping the biggest teddy bear.”
created_at: "2026-05-01T12:53:24.103Z" (5/1/2026, 12:53:24 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 395
moderation_latency: (null)
generation_time: 6650
tokens_prompt: 77
tokens_completion: 372
native_tokens_prompt: 182
native_tokens_completion: 299
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0009078
router: (null)
provider_responses:
  [0]
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "e9e64b70-1624-0a15-11fa-c975dbad1499"
    is_byok: false
    latency: 98
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777640004-jDPjuY1ljkJXnC9EbIJU"
session_id: (null)
api_type: "completions"
id: "gen-1777640004-OscIwePVm0102pe9l6Ax"
upstream_id: "e9e64b70-1624-0a15-11fa-c975dbad1499"
total_cost: 0.0009078
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Matches word count: 27.17%
  Details: n/a
  Meta data: neededClean: false, words: 219
Dialogue to Total Word Ratio: 36.79%
  Details: Ratio: 100.00%, Deviation: 10.00%
  Meta data: neededClean: false, wordsTotal: 220, wordsDialogue: 220
Overall: 31.9769%
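This page does not document how the two evaluators count words. The sketch below is a hypothetical reconstruction of the quantities they report: total words, dialogue words, the dialogue share, and its deviation from the 90% target. It assumes whitespace-separated words and treats dialogue as any text inside curly double quotes. The function names are illustrative, not NC Bench code. NC Bench's real splitting rules may differ, which could explain why one evaluator reports 219 words and the other 220 for the same response.

```python
import re

# Hypothetical reconstruction, not NC Bench's implementation.
# Assumptions: words are whitespace-separated tokens, and dialogue is any
# text enclosed in curly double quotes “…”.
DIALOGUE_RE = re.compile(r"“(.*?)”", re.DOTALL)


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Count words inside curly double quotes; nested ‘single’ quotes stay inside."""
    return sum(count_words(span) for span in DIALOGUE_RE.findall(text))


def word_count_deviation(words: int, target: int = 200) -> float:
    """Relative deviation from the target length, e.g. 219 vs 200 -> 0.095."""
    return abs(words - target) / target


def ratio_deviation(dialogue_words: int, total_words: int, target: float = 0.90) -> float:
    """Absolute gap between the dialogue share and the target share."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return abs(dialogue_words / total_words - target)


if __name__ == "__main__":
    scene = "“Bill, you’re sure?” Molly asked. “Positive,” he said."
    total = count_words(scene)
    dialogue = count_dialogue_words(scene)
    print(total, dialogue, f"{dialogue / total:.0%}")
    print(f"{ratio_deviation(dialogue, total):.2%}")
```

With the counts reported above, `ratio_deviation(220, 220)` gives 0.10. This matches the 10.00% deviation shown for the ratio evaluator, read as percentage points rather than relative error. `word_count_deviation(219)` gives 0.095, which is 19 words over the 200-word target. How either deviation is mapped to the final percentage score is not shown on this page.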