NC Bench
Run: 2026-03-12-01KKHZ7EB6GS084M8WCVR2A6HQ
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 57.7s
Cost: $0.080886
Provider: xAI
Evaluation summary
Score     Evaluator
0.00%     Matches word count
5.26%     Dialogue to Total Word Ratio
2.6305%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In the cluttered apartment that served as their headquarters for all things illegal, Molly and Bill prepared for their most unusual heist yet. The walls were lined with shelves of various disguises and tools from previous jobs. A large map of the neighborhood was pinned to the wall with the target house marked in red. Molly sat at the table with a pen in hand while Bill stood by the window looking out nervously. "Let's talk about how we are going to do this," Molly said. "Do you really think we should target a children's party for the presents?" Bill asked, turning from the window to face her. "It is the perfect opportunity because no one will expect it," Molly said. Bill moved closer to the table and sat down opposite her. "What is our disguise going to be?" Bill asked. "We will go as clowns with full makeup and costumes," Molly said. "That sounds humiliating but effective," Bill commented, rubbing his hands together. Molly pushed the map towards him. "We arrive early to set up," Molly said. "How do we steal the presents without getting caught by the birthday child?" Bill asked. "During the cake cutting when all attention is on the birthday song," Molly said. Bill looked at the map carefully. "The van should be parked close," Bill said. "Yes, close enough to make quick trips," Molly said. "What if a kid sees us taking his gift?" Bill asked with concern. "We say it is part of a magic trick and give him a balloon instead," Molly said. Bill leaned back in his chair. "This could actually work," Bill said. "I told you it would," Molly said. They both looked at each other and smiled. "Let's go over the list of items we need to bring," Molly said. "Noses, wigs, and oversized shoes," Bill listed. "And the bag for the loot," Molly added. "This is going to be fun in a weird way," Bill said. "Just remember to stay in character the entire time," Molly warned. Bill nodded his head. "I will be the best clown thief ever," Bill said. 
The planning session lasted for quite some time as they ironed out all the potential issues that could arise during the execution of their plan. They discussed everything from what to do if it rained to how to handle a crying child. By the end, they had a solid plan that they believed would allow them to successfully steal all the presents from the children's party in a hilarious and unexpected manner that fit their comedic heist style.
created_at
"2026-03-12T21:28:46.699Z"
3/12/2026, 9:28:46 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
56150
moderation_latency
(null)
generation_time
57413
tokens_prompt
77
tokens_completion
648
native_tokens_prompt
184
native_tokens_completion
13458
native_tokens_completion_images
0
native_tokens_reasoning
12956
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0808856
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"55d8d933-b155-5f52-4c7c-0b66d6e56f6d"
is_byok
false
latency
254
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773350926-40d8FDqaIRAluQHmQM6Z"
upstream_id
"55d8d933-b155-5f52-4c7c-0b66d6e56f6d"
total_cost
0.0808856
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
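The token counts in the metadata above can be broken down with a little arithmetic. The sketch below assumes that native_tokens_reasoning is counted inside native_tokens_completion; the page does not state this, so treat the derived figures as illustrative only. The variable names visible_tokens and reasoning_share are hypothetical and do not appear in the run data.

```python
# Values copied from the run metadata above.
native_tokens_completion = 13458
native_tokens_reasoning = 12956

# Assumption: reasoning tokens are a subset of the native completion tokens,
# so the remainder is what was spent on the visible response.
visible_tokens = native_tokens_completion - native_tokens_reasoning

# Share of the native completion budget spent on reasoning, in percent.
reasoning_share = 100.0 * native_tokens_reasoning / native_tokens_completion

print(f"Visible completion tokens: {visible_tokens}")
print(f"Reasoning share: {reasoning_share:.2f}%")
```

Under that assumption, roughly 96% of the native completion tokens went to reasoning rather than to the scene itself.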
Evaluation details
Result    Evaluator                      Details                           Meta Data
0.00%     Matches word count             n/a                               neededClean: false, words: 430
5.26%     Dialogue to Total Word Ratio   Ratio: 42.33%, Deviation: 7.67%   neededClean: false, wordsTotal: 430, wordsDialogue: 182
2.6305%   Overall
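The ratio and deviation reported in the evaluation details can be reproduced from wordsDialogue and wordsTotal. The following is a minimal sketch, not the benchmark's own evaluator: the function names are hypothetical, and the 50% target is taken from the scenario description. How the deviation is mapped to the 5.26% score is not shown on the page, so that step is left out.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of all words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 50.0) -> float:
    """Return the absolute distance from the target ratio, in percentage points."""
    return abs(ratio_pct - target_pct)


# Values copied from the evaluation details: wordsDialogue and wordsTotal.
ratio = dialogue_ratio(182, 430)
deviation = deviation_from_target(ratio)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 42.33%, Deviation: 7.67%"
```

The overall figure of 2.6305% is consistent with an unweighted mean of the two evaluator scores (0.00% and about 5.26%), though the page does not confirm that this is how it is computed.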