NC Bench
Run: 2026-03-12-01KKHZHAMAX59WV80SKCTE3V3W
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 53.9s
Cost: $0.080034
Provider: xAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.52% | Dialogue to Total Word Ratio
0.2594% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In a dimly lit apartment not far from the suburban neighborhood where the party was to be held, Molly and Bill were deep in planning mode for their most unusual heist to date. The table was covered with photos of the party location they had taken over several days, lists of expected guests based on social media posts and ideas for disguises they had collected from various stores. The children's party was expected to have over twenty kids and their parents, providing the perfect cover for their activities amid the chaos of games and food. Molly was the more cautious of the two, always thinking ahead to possible complications like nosy neighbors or alert parents. Bill was the optimistic one, seeing the potential for easy gains and a good story to tell later. They had chosen this target because it was unexpected and the loot was portable and likely to be valuable in the secondary market for children's items. "Bill, this is crazy even for us. Breaking into a children's party?" Molly said. "It's not breaking in if we blend in. We'll be invisible," Bill said. "Invisible in clown suits? I doubt it. What is the signal for when to start?" Molly said. "The cake cutting. That's when all attention is on the birthday boy," Bill said. "Then we move fast to the gift table and take them all," Molly said. "Exactly. The wagon is ready in the bushes. Load and go," Bill said. "What if a kid cries? That could ruin everything," Molly said. "Kids cry all the time at these things. No one will notice," Bill said. "I guess you're right. Let's go over the exit strategy again," Molly said. The planning took several hours as they considered all angles and potential pitfalls. They mapped out the yard in detail on a large sheet of paper, noting where each activity was likely to happen based on their reconnaissance. The bouncy castle would keep the kids occupied for long periods while the parents socialized with drinks and small talk. 
This gave them a substantial window of opportunity to execute the theft without much interference. Molly expressed her doubts about the whole operation but Bill's enthusiasm was contagious and eventually won her over. They gathered their materials from the closet and practiced their roles in front of the mirror to perfect their act. The comedy would come naturally from the situation itself as two adults tried to fit in with the children's celebration. As the time for the party approached, they felt a mix of excitement and apprehension about what could go wrong or right. The heist was on and they were committed to seeing it through to the end no matter how silly it seemed. The room was filled with the smell of takeout food they had eaten while working on the plan. Scattered papers and empty coffee cups added to the chaotic atmosphere that mirrored their thoughts. They knew this job was a departure from their usual targets like offices or warehouses, but the challenge excited them. It would be a story they could tell for years if they succeeded without getting caught by a group of six year olds.
created_at: "2026-03-12T21:34:10.574Z" (3/12/2026, 9:34:10 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 51927
moderation_latency: (null)
generation_time: 53488
tokens_prompt: 77
tokens_completion: 780
native_tokens_prompt: 184
native_tokens_completion: 13316
native_tokens_completion_images: 0
native_tokens_reasoning: 12715
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0800336
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "6b556ab6-ecfd-14e3-f60a-22178c575848"
    is_byok: false
    latency: 260
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351250-GhwwkPcuwk8vl7eGji7c"
upstream_id: "6b556ab6-ecfd-14e3-f60a-22178c575848"
total_cost: 0.0800336
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
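The token and timing fields in this metadata support a few quick derived figures. The following is a minimal sketch, not part of the benchmark itself. It assumes that native_tokens_completion includes the reasoning tokens and that generation_time is reported in milliseconds; neither is stated on this page. The function name completion_breakdown is hypothetical.

```python
def completion_breakdown(native_completion: int,
                         native_reasoning: int,
                         generation_time_ms: int) -> dict:
    """Derive simple figures from the run metadata.

    Assumptions (not confirmed by the page): reasoning tokens are counted
    inside native_completion, and generation_time_ms is in milliseconds.
    """
    # Tokens left over for the visible answer once reasoning is subtracted.
    visible_tokens = native_completion - native_reasoning
    # Share of the completion budget spent on reasoning, as a percentage.
    reasoning_share_pct = 100.0 * native_reasoning / native_completion
    # Overall generation throughput in native tokens per second.
    tokens_per_second = native_completion / (generation_time_ms / 1000.0)
    return {
        "visible_tokens": visible_tokens,
        "reasoning_share_pct": round(reasoning_share_pct, 2),
        "tokens_per_second": round(tokens_per_second, 2),
    }


# Values copied from the metadata of this run.
stats = completion_breakdown(
    native_completion=13316,
    native_reasoning=12715,
    generation_time_ms=53488,
)
print(stats)
```

Under these assumptions, roughly 95% of the native completion tokens went to reasoning rather than to the 532-word scene.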
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 532
0.52% | Dialogue to Total Word Ratio | Ratio: 19.74%, Deviation: 10.26% | neededClean: false; wordsTotal: 532; wordsDialogue: 105
0.2594% | Overall
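The ratio and deviation reported for the dialogue evaluator can be reproduced from the two word counts. The sketch below assumes the deviation is the absolute difference, in percentage points, between the 30% target from the prompt and the measured ratio. The function name dialogue_ratio is hypothetical. How the evaluator turns that deviation into the 0.52% score is not shown on this page, so it is not reproduced here.

```python
def dialogue_ratio(words_dialogue: int,
                   words_total: int,
                   target_pct: float = 30.0) -> tuple[float, float]:
    """Return (ratio, deviation) in percent, rounded to two decimals.

    Assumption: deviation is |target - ratio| in percentage points.
    """
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    # Share of all words that appear inside dialogue.
    ratio_pct = 100.0 * words_dialogue / words_total
    # Distance from the requested dialogue share.
    deviation_pct = abs(target_pct - ratio_pct)
    return round(ratio_pct, 2), round(deviation_pct, 2)


# wordsDialogue and wordsTotal as reported in the evaluation details.
ratio, deviation = dialogue_ratio(words_dialogue=105, words_total=532)
print(f"Ratio: {ratio}%, Deviation: {deviation}%")
```

With 105 dialogue words out of 532, this gives a ratio of 19.74% and a deviation of 10.26%, matching the figures the evaluator reports.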