NC Bench
Run: 2026-05-01-01KQHR100V2ER95MVCE0ZFER6J
Run Details
Model:
Grok 4.20 (Reasoning)
Test:
Dialogue tags
Scenario:
Write 500 words with 50% dialogue
Duration:
2m 32s
Cost:
$0.044323
Provider:
xAI
Evaluation summary
Score | Evaluator
0.21% | Matches word count
1.96% | Dialogue to Total Word Ratio
1.0873% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In a hidden corner of the city, far from the eyes of law enforcement, the duo of experienced thieves were concocting their most daring and laughable plot to date. The children's party was scheduled for the weekend and they had gathered all the intelligence they could from public social media posts about the event. Molly peered at the hand-drawn map of the birthday party layout spread out on the table in their small apartment. She adjusted her glasses and looked at her partner with a mischievous grin. Bill was pacing back and forth, clearly uneasy about the target they had chosen for their next job. "This is the big one Bill. We break in during the piñata smashing. All the kids will be in the yard so the presents will be unguarded." Bill looked shocked at the suggestion, his face turning pale at the thought of facing hordes of sugar-high children. "Unguarded but surrounded by adults Molly. How do we get past them without raising suspicion?" "We go in disguised as the replacement magicians. The real ones will mysteriously not show up because we accidentally sent them to the wrong address earlier today." Bill stopped pacing and considered the idea, a small smile forming despite his reservations. "You think of everything don't you? What tricks do we perform to keep them entertained long enough?" "You do the card tricks and rabbit stuff while I vanish with the gifts in my oversized magic bag. It's perfect." "Perfect until a kid asks to see the rabbit up close. Then what?" "Then you pretend the rabbit escaped and start a search party. That will distract everyone while I make multiple trips to the van." "The van that we painted with cartoon characters yesterday? I guess that helps us look legitimate." "Exactly. We load up all the presents including the big ones if we can break them down fast. The haul could be worth thousands on the black market." "Thousands? From toy cars and dolls? You have high hopes but I'm with you." "Good. Remember to laugh like a clown the whole time to sell the act." "Like this? Ha ha ha little boy do you like magic?" "Not bad but make it higher pitched. We can't fail at the voices or the parents will know." "Right. Higher pitched. Got it. After we have the bags full we exit through the kitchen pretending to get more supplies." "Yes and then we drive away slowly as not to attract attention from the neighbors." "This plan is so crazy it just might succeed. Let's go over it one more time from the beginning." "Okay but first pass me that coffee. Planning heists always makes me thirsty." They both laughed at the ridiculousness of their scheme as they refined the details late into the evening, the map becoming covered with notes and arrows indicating their path to the presents and escape route. The room filled with the sound of their voices going over every possible scenario and contingency for the heist at the children's party, ensuring that no detail was left to chance in what would surely be their most memorable and comical adventure yet.
created_at
"2026-05-01T12:25:42.688Z"
5/1/2026, 12:25:42 PM
model
"x-ai/grok-4.20-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
149564
moderation_latency
(null)
generation_time
152195
tokens_prompt
77
tokens_completion
4728
native_tokens_prompt
184
native_tokens_completion
17691
native_tokens_completion_images
(null)
native_tokens_reasoning
17089
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0443231
router
(null)
provider_responses
0
endpoint_id
"3ecee37f-b217-4093-87fb-aaf0afe307af"
id
"c596cda7-42b2-6715-7cc4-03dedcdb8b9b"
is_byok
false
latency
85
model_permaslug
"x-ai/grok-4.20-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777638342-dSr8RLzXdawwnHGXqLDw"
session_id
(null)
api_type
"completions"
id
"gen-1777638342-tVceSJhFYCy1VxmoEeTC"
upstream_id
"c596cda7-42b2-6715-7cc4-03dedcdb8b9b"
total_cost
0.0443231
cache_discount
0.0001344
upstream_inference_cost
0
provider_name
"xAI"
response_cache_source_id
(null)
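The timing and token fields above can be cross-checked against the Duration shown in Run Details. The sketch below assumes that generation_time is reported in milliseconds, which the page does not state but which is consistent with the displayed 2m 32s. The function names are hypothetical and are not part of NC Bench or the provider API.

```python
def format_duration(ms: int) -> str:
    """Format a millisecond count as 'Xm Ys', truncating fractional seconds."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}m {seconds}s"


def tokens_per_second(tokens: int, ms: int) -> float:
    """Average throughput over the whole generation, in tokens per second."""
    return tokens / (ms / 1000.0)


# Values copied from the metadata above (unit of generation_time is assumed).
generation_time_ms = 152195
native_tokens_completion = 17691

print(format_duration(generation_time_ms))
print(round(tokens_per_second(native_tokens_completion, generation_time_ms), 1))
```

Most of the completion tokens here are reasoning tokens (17089 of 17691), so this throughput figure describes the full generation rather than the visible 500-word scene.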
Evaluation details
Result
Evaluator
Details
Meta Data
0.21%
Matches word count
n/a
neededClean
false
words
528
1.96%
Dialogue to Total Word Ratio
Ratio: 58.87%, Deviation: 8.87%
neededClean
false
wordsTotal
530
wordsDialogue
312
1.0873%
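The reported ratio and deviation follow from the two counts in the metadata: 312 / 530 is about 58.87%, which is 8.87 percentage points above the 50% target. A minimal sketch of how such a ratio could be computed is shown below. It assumes that words are split on whitespace and that dialogue is any text between straight double quotes. The evaluator's actual tokenization and scoring formula are not shown on this page (note that the two evaluators report 528 and 530 total words), so the function names and rules here are illustrative assumptions only.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio_percent).

    Assumption: dialogue is text enclosed in straight double quotes,
    and a word is any whitespace-separated token.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = 100.0 * dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_from_target(ratio_percent: float, target_percent: float = 50.0) -> float:
    """Absolute distance from the target ratio, in percentage points."""
    return abs(ratio_percent - target_percent)


# Reproduce the reported figures from the counts in the metadata.
reported_ratio = 100.0 * 312 / 530
print(round(reported_ratio, 2))
print(round(deviation_from_target(reported_ratio), 2))
```

For example, dialogue_ratio('He said "hi there" twice.') counts 2 dialogue words out of 5 total under these assumptions.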