NC Bench
Run: 2026-05-01-01KQHRNW90WW2Q0PFCCE0WZ1AR
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 2m 54s
Cost: $0.046255
Provider: xAI
Evaluation summary
| Score | Evaluator |
| --- | --- |
| 90.48% | Matches word count |
| 95.31% | Dialogue to Total Word Ratio |
| 92.8958% | Overall |
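The overall score above is consistent with an unweighted mean of the two evaluator scores. The sketch below checks that arithmetic. The averaging rule is an assumption inferred from the displayed numbers, not something this page states, and the displayed per-evaluator scores are rounded, so the result only matches 92.8958% approximately.

```python
# Evaluator scores as displayed in the summary (rounded to two decimals).
scores = {
    "Matches word count": 90.48,
    "Dialogue to Total Word Ratio": 95.31,
}

# Assumption: the overall score is the plain (unweighted) mean of the evaluators.
overall = sum(scores.values()) / len(scores)

print(f"Overall (recomputed): {overall:.3f}%")
```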
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In a dimly lit room filled with stolen goods from previous botched jobs, Molly and Bill were finalizing the details of their most outrageous heist to date. The target was a children's party, an event bursting with laughter, games, and most importantly, a huge pile of presents that could keep them afloat for months if sold on the black market. The comedic potential of two adult thieves mingling with toddlers and their parents was not lost on them, but the risk was high if they didn't execute perfectly. They had spent days casing the joint and gathering intelligence on the schedule of events including when the presents would be placed and when the kids would be otherwise occupied with cake and games. "Listen Bill, we can't just burst in there like we own the place. We need to be the entertainment. You do your balloon animals and I will perform magic tricks to keep them all occupied while we slowly make the presents disappear." Bill leaned forward with a worried frown on his face as he considered the implications of their scheme. "But what if the kids see us taking their gifts? They'll throw a fit and blow our cover faster than you can say abracadabra. And how do we get the big ones out without being seen?" "We use the cake as cover. A giant rolling cake that we hide the boxes in. No one will suspect a thing until it's too late. Remember to act natural and keep the laughs coming." Bill shook his head in disbelief at the elaborate plan but he knew they had few other options left in their line of work. "This is the craziest idea we've ever had Molly. Stealing from kids? I feel like we're crossing into villain territory here. But if it works we can finally take that vacation we've been talking about." The room fell silent for a moment as they both considered the absurdity of their plan. Molly was the driving force, always coming up with these zany ideas that somehow seemed workable in her mind despite the obvious moral questions. 
Bill, on the other hand, was the voice of slight reason, though he usually ended up going along with whatever crazy scheme she suggested after some token protest. They spent several hours refining the details, discussing possible contingencies like a child wanting to assist with the magic show or a parent asking too many questions about their credentials as performers and where they had trained. The comedy of errors that could ensue if things went south made them both chuckle despite the underlying tension of the criminal activity they were planning. With their disguises prepared and the van fully gassed up, they were as ready as they would ever be for the unusual heist that awaited them. The night of the party approached rapidly and with it the promise of either a big financial payoff or a memorably hilarious disaster that would add yet another story to their long list of legendary failures in the world of thievery.
created_at: "2026-05-01T12:37:06.98Z" (5/1/2026, 12:37:06 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 169455
moderation_latency: (null)
generation_time: 173564
tokens_prompt: 77
tokens_completion: 5288
native_tokens_prompt: 184
native_tokens_completion: 18437
native_tokens_completion_images: (null)
native_tokens_reasoning: 17869
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0462553
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "53e6cf5f-f0fc-5767-f3d7-22e98010b9bd"
    is_byok: false
    latency: 165
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777639026-0X1Ge05sWpkABxJu3M04"
session_id: (null)
api_type: "completions"
id: "gen-1777639026-lfaKpqHSvxSOGovIJSFp"
upstream_id: "53e6cf5f-f0fc-5767-f3d7-22e98010b9bd"
total_cost: 0.0462553
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
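Two figures in the metadata above can be cross-checked against the rest of the run. The sketch below assumes `generation_time` is in milliseconds (the page does not state the unit), in which case it rounds to the reported duration of 2m 54s. It also computes the share of native completion tokens spent on reasoning. `format_duration` is a hypothetical helper written for this illustration.

```python
# Values copied from the metadata listing.
generation_time_ms = 173564        # assumption: unit is milliseconds
native_tokens_completion = 18437
native_tokens_reasoning = 17869


def format_duration(ms: int) -> str:
    """Render a millisecond count as 'Xm Ys', rounded to the nearest second."""
    minutes, seconds = divmod(round(ms / 1000), 60)
    return f"{minutes}m {seconds}s"


# Fraction of the native completion tokens that were reasoning tokens.
reasoning_share = native_tokens_reasoning / native_tokens_completion

print(format_duration(generation_time_ms))
print(f"Reasoning share of completion tokens: {reasoning_share:.1%}")
```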
Evaluation details
| Result | Evaluator | Details | Meta Data |
| --- | --- | --- | --- |
| 90.48% | Matches word count | n/a | neededClean: false; words: 510 |
| 95.31% | Dialogue to Total Word Ratio | Ratio: 29.02%, Deviation: 0.98% | neededClean: false; wordsTotal: 510; wordsDialogue: 148 |
| 92.8958% | Overall | | |
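The reported ratio and deviation follow directly from `wordsDialogue` and `wordsTotal`. The sketch below reproduces them and shows one plausible way to count dialogue words. The counting rule (whitespace-separated words inside straight double quotes) and the names `dialogue_stats` and `ratio_and_deviation` are assumptions for illustration; the benchmark's actual evaluator is not shown on this page, and neither is the mapping from deviation to the 95.31% score.

```python
import re

TARGET_RATIO = 0.30  # from the scenario: 30% dialogue


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words), treating text inside straight
    double quotes as dialogue. Assumed rule, not the benchmark's own."""
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Dialogue share of all words and its absolute distance from the target."""
    ratio = dialogue_words / total_words
    return ratio, abs(ratio - target)


# Word counts reported in the evaluation details.
ratio, deviation = ratio_and_deviation(148, 510)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 29.02%, Deviation: 0.98%"
```

For example, `dialogue_stats('Molly sighed. "We need a plan," she said.')` returns `(4, 8)`: four quoted words out of eight in total.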