NC Bench
Run: 2026-03-12-01KKHZA6AS1WVZ6WAJ0ZTG5GGW
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 35.5s
Cost: $0.052482
Provider: xAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 61.94%
Overall: 30.9695%
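The overall figure matches the unweighted mean of the two evaluator scores: (0.00% + 61.94%) / 2 ≈ 30.97%. The page does not state this rule. The sketch below simply assumes equal weighting and checks that it reproduces the reported number.

```python
# Evaluator scores for this run, as fractions (copied from the summary above).
scores = {
    "Matches word count": 0.0,
    "Dialogue to Total Word Ratio": 0.6194,
}

# Assumption: "Overall" is the unweighted mean of all evaluator scores.
overall = sum(scores.values()) / len(scores)
print(f"Overall: {overall:.2%}")
```

The reported 30.9695% agrees with this to within the rounding of the displayed 61.94%.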
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
"Bill, this is the big one. We break into the children's party on Saturday and swipe every single present from the table," Molly said. "They'll have a piñata and everything but while the kids are swinging at it blindfolded we'll be filling our bags with the good stuff. What do you say, are you in?" "I'm in but we need a solid plan, Molly. How do we get past the parents? They will be everywhere watching their precious darlings," Bill said. "Easy, we pose as the entertainment. You juggle and I do face painting. By the time they realize the presents are gone we'll be long gone in our getaway van disguised as an ice cream truck. Think of all the video games and bicycles we can sell on the black market. It's brilliant, I tell you." "But stealing from children it feels wrong. Maybe we should target a bank instead like normal thieves." "Normal is boring, Bill. This is comedic heist material. Kids parties are chaotic. No one will suspect a thing until it's too late. We just need to make sure we don't eat the cake or we'll get caught with crumbs on our faces. Are you with me or not?" "Yes, I'm with you. Let's do this thing." "Bill, just imagine the look on their faces when they see the empty table. It'll be hilarious." "Molly, you're right. This heist will be our funniest yet."
created_at: "2026-03-12T21:30:16.798Z"
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 34625
moderation_latency: (null)
generation_time: 35403
tokens_prompt: 77
tokens_completion: 336
native_tokens_prompt: 184
native_tokens_completion: 8724
native_tokens_completion_images: 0
native_tokens_reasoning: 8435
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0524816
router: (null)
provider_responses:
  [0]
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "fb620e73-fce4-50f3-e316-962e468c5df3"
    is_byok: false
    latency: 285
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351016-AA9oSKUDCqraFfv1YOTs"
upstream_id: "fb620e73-fce4-50f3-e316-962e468c5df3"
total_cost: 0.0524816
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
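Most of this run's completion budget went to hidden reasoning rather than to the visible scene. The sketch below derives a few summary numbers from the token and cost fields. It assumes that `native_tokens_completion` includes `native_tokens_reasoning` and that `generation_time` is in milliseconds. The page does not document either point, though the second is consistent with the 35.5s duration shown.

```python
# Selected fields from this run's generation metadata (values copied as-is).
meta = {
    "native_tokens_prompt": 184,
    "native_tokens_cached": 128,
    "native_tokens_completion": 8724,
    "native_tokens_reasoning": 8435,
    "generation_time": 35403,  # assumed to be milliseconds
    "total_cost": 0.0524816,   # assumed to be USD, matching the "Cost" field
}

# Assumption: completion tokens include the reasoning tokens, so the
# visible answer is whatever remains after subtracting them.
visible = meta["native_tokens_completion"] - meta["native_tokens_reasoning"]
reasoning_share = meta["native_tokens_reasoning"] / meta["native_tokens_completion"]
cached_share = meta["native_tokens_cached"] / meta["native_tokens_prompt"]

print(f"visible completion tokens: {visible}")
print(f"reasoning share of completion: {reasoning_share:.1%}")
print(f"cached share of prompt: {cached_share:.1%}")
print(f"generation time: {meta['generation_time'] / 1000:.1f} s")
print(f"cost: ${meta['total_cost']:.6f}")
```

Under these assumptions, 289 of the 8,724 native completion tokens (about 3.3%) form the visible answer, and 128 of the 184 prompt tokens were served from cache.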
Evaluation details
Matches word count: 0.00%
  Details: n/a
  Metadata: neededClean = false, words = 238
Dialogue to Total Word Ratio: 61.94%
  Details: Ratio: 98.32%, Deviation: 8.32%
  Metadata: neededClean = false, wordsTotal = 238, wordsDialogue = 234
Overall: 30.9695%
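The dialogue ratio follows directly from the reported counts. 234 / 238 ≈ 98.32%, which is 8.32 percentage points above the 90% target. That overshoot is why a response made almost entirely of quoted speech still loses points. The page does not show how the benchmark counts words or maps deviation to the 61.94% score. The helpers below (`count_words`, `dialogue_ratio`, `deviation`) are therefore hypothetical. They treat whitespace-separated tokens as words and text between straight double quotes as dialogue.

```python
import re

# Hypothetical rule: dialogue is any span between straight double quotes.
QUOTED = re.compile(r'"([^"]*)"')

def count_words(text: str) -> int:
    # Hypothetical rule: a word is any whitespace-separated token.
    return len(text.split())

def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (total words, dialogue words, dialogue share) for a passage."""
    total = count_words(text)
    dialogue = sum(count_words(span) for span in QUOTED.findall(text))
    return total, dialogue, dialogue / total

def deviation(ratio: float, target: float = 0.90) -> float:
    # Absolute distance from the target share, in percentage points.
    return abs(ratio - target) * 100

# Reproduce the reported figures from this run's word counts.
words_total, words_dialogue = 238, 234
ratio = words_dialogue / words_total
print(f"Ratio: {ratio:.2%}, Deviation: {deviation(ratio):.2f}%")
```

With the counts above, this prints `Ratio: 98.32%, Deviation: 8.32%`, matching the evaluator details. The four non-dialogue words correspond to the two tags "Molly said" and "Bill said".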