NC Bench
Run: 2026-05-01-01KQHT6QPY0J1FEY9AGZPMNNZH
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 5m 2s
Cost: $0.051873
Provider: xAI
Evaluation summary
Score | Evaluator
95.99% | Matches word count
99.86% | Dialogue to Total Word Ratio
97.9250% | Overall
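The overall score reported here is numerically consistent with an unweighted mean of the two evaluator scores. The page does not say how the overall figure is computed, so the sketch below is only an assumption that happens to reproduce the number; the dictionary name `scores` is illustrative.

```python
# Evaluator scores as shown in the summary (percentages).
scores = {
    "Matches word count": 95.99,
    "Dialogue to Total Word Ratio": 99.86,
}

# Assumption: the overall score is the plain arithmetic mean of the evaluators.
overall = sum(scores.values()) / len(scores)

print(f"{overall:.4f}%")  # 97.9250%
```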
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The two thieves had been planning this heist for a long time and now the moment was upon them. In the shadow of the large oak tree near the back yard, the pair reviewed their notes one last time before launching into action with nervous energy and excitement. Following weeks of careful preparation the two bumbling thieves found themselves ready for the task. Molly and Bill crouched behind the bushes watching the party. "This is it Bill. We move when the clown comes out," Molly said. "I can't believe we're stealing kids presents. We must be desperate," Bill replied. "Desperate times. You distract with dance I'll snatch the goods," Molly said. "Why do I get the humiliating jobs? Okay if a kid cries I'm out," Bill said. "No one will cry. The signal is the balloon popping then we run with the loot," Molly instructed. "This better be worth it. I feel like a cartoon villain," Bill muttered. "You take the left side of the table I'll take the right," Molly said. "We are the villains in this story. Let's go steal some joy from the children," Bill replied as they moved toward the door with their bags ready. The comical pair hoped their plan would not backfire.
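One plausible way to measure how much of a response like this is dialogue is to count whitespace-separated words inside straight double quotes and compare them with the total. The benchmark's actual tokenization and cleaning rules are not shown on this page, so the helper names `count_words` and `count_dialogue_words` below are hypothetical and the approach is a rough sketch only.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Count words that appear between pairs of straight double quotes."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in spans)


# Small illustrative sample, not the full response.
sample = 'Molly whispered. "This is it Bill," she said.'
total = count_words(sample)
dialogue = count_dialogue_words(sample)
print(total, dialogue)  # 8 4
```

This sketch ignores curly quotes and nested quotations, which a production evaluator would likely need to handle.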
created_at: "2026-05-01T13:03:47.941Z" (5/1/2026, 1:03:47 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 298549
moderation_latency: (null)
generation_time: 301470
tokens_prompt: 77
tokens_completion: 17658
native_tokens_prompt: 184
native_tokens_completion: 20684
native_tokens_completion_images: (null)
native_tokens_reasoning: 20435
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0518728
router: (null)
provider_responses: 0
endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
id: "24ae1f6f-dfde-ed90-0cc0-59d0afdf9b67"
is_byok: false
latency: 105
model_permaslug: "x-ai/grok-4.20-20260309"
provider_name: "xAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777640627-YNDkj7zmkCN8uHHWLY4S"
session_id: (null)
api_type: "completions"
id: "gen-1777640627-d5OBIfuJQLP4mgsZeLV8"
upstream_id: "24ae1f6f-dfde-ed90-0cc0-59d0afdf9b67"
total_cost: 0.0518728
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
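The token fields above suggest that almost all of the completion budget went to reasoning rather than to the visible scene. The sketch below works through that arithmetic under the assumption that `native_tokens_reasoning` is counted as part of `native_tokens_completion`; the page itself does not confirm that relationship.

```python
# Values copied from the metadata above.
native_tokens_completion = 20684
native_tokens_reasoning = 20435

# Assumption: reasoning tokens are a subset of the native completion tokens.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion

print(visible_tokens)            # 249
print(f"{reasoning_share:.1%}")  # 98.8%
```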
Evaluation details
Result | Evaluator | Details | Meta Data
95.99% | Matches word count | n/a | neededClean: false, words: 208
99.86% | Dialogue to Total Word Ratio | Ratio: 48.08%, Deviation: 1.92% | neededClean: false, wordsTotal: 208, wordsDialogue: 100
97.9250% | Overall | |
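The ratio and deviation in the details follow directly from the word counts in the metadata. The sketch below assumes the deviation is the absolute difference between the measured ratio and the 50% target from the scenario, which matches the figures shown; the function name `dialogue_ratio` is illustrative, and the mapping from deviation to the 99.86% score is not shown on the page, so it is not reproduced here.

```python
def dialogue_ratio(words_dialogue: int, words_total: int, target: float = 0.5):
    """Return (ratio, deviation) of dialogue words against a target share."""
    ratio = words_dialogue / words_total
    # Assumption: deviation is the absolute gap to the target, in ratio units.
    deviation = abs(target - ratio)
    return ratio, deviation


# wordsDialogue and wordsTotal from the evaluation metadata.
ratio, deviation = dialogue_ratio(100, 208)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 48.08%, Deviation: 1.92%
```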