NC Bench
Run: 2026-03-12-01KKHZH0GB8WW3NPXP6DTWCTFE
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 36.0s
Cost: $0.054552
Provider: xAI
Evaluation summary
Score | Evaluator | Details Link
14.30% | Matches word count | (details)
71.22% | Dialogue to Total Word Ratio | (details)
42.7607% | Overall |
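The page does not state how the Overall figure is derived. A minimal sketch, assuming it is the unweighted mean of the two evaluator scores (an assumption, not something the report confirms), reproduces the displayed value to two decimal places. The variable names are illustrative only.

```python
# Assumption: "Overall" is the plain average of the evaluator scores.
# The scores below are the rounded percentages shown in the summary.
scores = {
    "Matches word count": 14.30,
    "Dialogue to Total Word Ratio": 71.22,
}

overall = sum(scores.values()) / len(scores)
print(f"Overall (assumed mean): {overall:.2f}%")
```

The small gap between this result and the reported 42.7607% would be consistent with the page averaging unrounded scores, but that is also a guess.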
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In their cluttered hideout filled with maps, disguises, and half-eaten sandwiches, Molly and Bill prepared to execute their latest comedic heist. The children's party next door was the perfect target for their scheme to steal all the presents. As authors of many successful comedic heist stories would know, the best plots come from the most unlikely scenarios, and this one was no exception. The idea to target a children's party came to them after a night of brainstorming over cheap wine and even cheaper pizza. They saw it as the ultimate challenge to their skills of deception and quick thinking. Molly adjusted her notes while Bill practiced his silly walk. "Bill, are you ready to discuss the plan for stealing the presents at the children's party?" Molly asked. Bill looked up from his clown shoes with a worried expression, his face pale at the thought of facing dozens of excited youngsters. "Yes, but I still think this is crazy," Bill responded. Molly rolled her eyes and pointed at the detailed drawing she had made of the backyard layout, complete with marked entry and exit points. "Crazy but lucrative. We dress as clowns and enter during the party to steal all the presents," Molly said. The mention of clowns made Bill shudder slightly, recalling a previous job where his makeup had melted under the hot sun, leading to a messy escape. "How do we avoid detection from the children and the adults?" Bill questioned. With a confident smile, Molly leaned forward and outlined the next steps in their plan, emphasizing the importance of timing. "By keeping them entertained while I grab the gifts one by one," Molly answered. Bill nodded but still had concerns about the vigilant eyes of the parents who might be taking photos or videos of the event. "What about the parents watching everything?" Bill said. She had an answer for every doubt, pulling out a stack of fake business cards for a fictional party entertainment company. 
"We tell them we're part of the show and they will believe it," Molly replied. The discussion continued as they considered all possible scenarios that could derail their operation in the middle of the festivities. "And the getaway route with all the stolen presents?" Bill inquired. Pointing to the back alley on their map, she described the vehicle they would use for a swift departure. "Through the back gate with the bags marked as party favors so no one looks twice," Molly explained. Despite the detailed planning, Bill couldn't shake the feeling that this job was more suited for a cartoon than real life criminals like themselves. "This could go wrong in so many funny ways but I am committed," Bill noted. Molly patted him on the back, appreciating his dedication to the cause even if it was laced with doubt. "Now you're getting into the spirit of this comedic heist," Molly said with a chuckle. As they wrapped up their meeting, the sounds of children playing could be heard in the distance, spurring them into final preparations. "Remember, if we get caught we can always say it's a prank," Bill added.
created_at: "2026-03-12T21:34:00.208Z" (3/12/2026, 9:34:00 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 34202
moderation_latency: (null)
generation_time: 35744
tokens_prompt: 77
tokens_completion: 805
native_tokens_prompt: 184
native_tokens_completion: 9069
native_tokens_completion_images: 0
native_tokens_reasoning: 8470
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0545516
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "480cdde0-2540-4cee-2f6d-2d00ffa9a424"
    is_byok: false
    latency: 405
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351240-HJfKfb46B6s34UGaVi7K"
upstream_id: "480cdde0-2540-4cee-2f6d-2d00ffa9a424"
total_cost: 0.0545516
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result | Evaluator | Details | Meta Data
14.30% | Matches word count | n/a | neededClean: false, words: 521
71.22% | Dialogue to Total Word Ratio | Ratio: 27.39%, Deviation: 2.61% | neededClean: false, wordsTotal: 522, wordsDialogue: 143
42.7607% | Overall | |
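The ratio and deviation reported for the dialogue evaluator can be checked from the word counts in its metadata. The sketch below assumes the ratio is dialogue words divided by total words and the deviation is the absolute distance from the 30% target named in the scenario. The function name is hypothetical, and the mapping from deviation to the 71.22% score is not given on the page, so it is not reproduced here.

```python
TARGET_PERCENT = 30.0  # target taken from the scenario "Write 500 words with 30% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of total words."""
    return 100.0 * words_dialogue / words_total


# Word counts from the evaluator metadata: wordsDialogue = 143, wordsTotal = 522.
ratio = dialogue_ratio(143, 522)
deviation = abs(TARGET_PERCENT - ratio)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

Note that the two evaluators report slightly different totals (521 words versus 522), which suggests they count words independently.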