NC Bench
Run: 2026-03-12-01KKHZF84E7146KMQ91XRHSFCJ
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 34.5s
Cost: $0.041424
Provider: xAI
Evaluation summary
Score | Evaluator
99.92% | Matches word count
0.00% | Dialogue to Total Word Ratio
49.9595% | Overall
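The overall figure above is consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers shown, not a documented NC Bench formula; the function name `overall_score` is hypothetical. A minimal sketch under that assumption:

```python
def overall_score(evaluator_scores):
    """Unweighted mean of evaluator scores in percent (assumed aggregation)."""
    return sum(evaluator_scores) / len(evaluator_scores)

# The page shows evaluator scores rounded to two decimals, so this only
# approximates the 49.9595% reported as Overall.
approx = overall_score([99.92, 0.00])
print(f"{approx:.2f}%")
```

Under this reading, a single evaluator scoring 0.00% caps the overall result near 50% no matter how well the other evaluator scores.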
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly, are you really serious about this job? Breaking into a children's party to steal all the presents sounds like a nightmare waiting to happen. Come on Bill, it's perfect. The security is nonexistent. Just a bunch of balloons and streamers. We dress as clowns, blend right in. You do your silly juggling act and I'll be the one collecting the presents for the magic show. Poof, they disappear into our bag. But what if the children get upset? I can't stand seeing little ones cry over their lost toys. Maybe we should rethink this heist. Bill, stop being such a baby yourself. These kids get too many presents anyway. They'll never miss a few. Plus, we're leaving the cake. That's more important to them than some stupid dolls and trucks. Now, the plan is to arrive at two thirty during the games. While they're doing the limbo, we strike the gift table. Two thirty, got it. And the getaway? I assume the van is ready with the fake plates. Yes, parked in the alley. We change out of the costumes quick and drive away like normal parents leaving the party. No one will suspect the clowns.
created_at: "2026-03-12T21:33:02.482Z" (3/12/2026, 9:33:02 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 33805
moderation_latency: (null)
generation_time: 34345
tokens_prompt: 77
tokens_completion: 273
native_tokens_prompt: 184
native_tokens_completion: 6881
native_tokens_completion_images: 0
native_tokens_reasoning: 6646
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0414236
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "a27bea7c-2ed0-e249-cd7c-03729d5b8fa2"
    is_byok: false
    latency: 494
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351182-dlE9VJx88LWGSodteocc"
upstream_id: "a27bea7c-2ed0-e249-cd7c-03729d5b8fa2"
total_cost: 0.0414236
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
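The token fields above suggest that most of the completion was spent on reasoning rather than on the visible scene. The sketch below works that out from the reported values. It assumes that `native_tokens_reasoning` is counted inside `native_tokens_completion` and that `generation_time` is in milliseconds; neither is stated on this page, and the `meta` dictionary is only an illustrative container.

```python
# Values copied from the run metadata; the dict itself is illustrative.
meta = {
    "native_tokens_completion": 6881,
    "native_tokens_reasoning": 6646,
    "generation_time": 34345,  # assumed to be milliseconds
}

# Assumption: reasoning tokens are a subset of completion tokens.
visible_tokens = meta["native_tokens_completion"] - meta["native_tokens_reasoning"]
reasoning_share = meta["native_tokens_reasoning"] / meta["native_tokens_completion"]
tokens_per_second = meta["native_tokens_completion"] / (meta["generation_time"] / 1000)

print(visible_tokens)
print(f"{reasoning_share:.1%}")
print(f"{tokens_per_second:.0f} tokens/s")
```

If those assumptions hold, only a small fraction of the billed completion tokens correspond to the 197-word response.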
Evaluation details
Result | Evaluator | Details | Meta Data
99.92% | Matches word count | n/a | neededClean: false, words: 197
0.00% | Dialogue to Total Word Ratio | Ratio: 0.00%, Deviation: 90.00% | neededClean: false, wordsTotal: 197, wordsDialogue: 0
Overall: 49.9595%
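The response contains no quotation marks, which would explain `wordsDialogue: 0` if the evaluator counts only words inside quotes as dialogue. The sketch below illustrates that idea. It is a hypothetical reconstruction, not the NC Bench evaluator: the names `dialogue_ratio` and `deviation`, the whitespace-based word count, and the quote-matching pattern are all assumptions.

```python
import re

# Assumption: dialogue is any text between straight or curly double quotes.
QUOTED = re.compile(r'["\u201c]([^"\u201d]*)["\u201d]')

def dialogue_ratio(text):
    """Fraction of whitespace-separated words that sit inside double quotes."""
    total = len(text.split())
    if total == 0:
        return 0.0
    dialogue = sum(len(span.split()) for span in QUOTED.findall(text))
    return dialogue / total

def deviation(ratio, target=0.90):
    """Absolute distance from the target dialogue ratio (90% in this scenario)."""
    return abs(target - ratio)

unquoted = "Two thirty, got it. And the getaway?"
quoted = '"Two thirty, got it," said Bill. "And the getaway?"'

print(dialogue_ratio(unquoted), deviation(dialogue_ratio(unquoted)))
print(dialogue_ratio(quoted))
```

With this kind of counting, a scene written entirely as speech but without quotation marks scores a ratio of zero, matching the 90.00% deviation reported above.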