NC Bench
Run: 2026-03-12-01KKHYV39Q9Z429M7RSKBZ1YZE
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 37.1s
Cost: $0.042157
Provider: xAI
Evaluation summary
| Score | Evaluator |
| --- | --- |
| 27.17% | Matches word count |
| 4.79% | Dialogue to Total Word Ratio |
| 15.9769% | Overall |
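The page does not state how the Overall score is aggregated. The figures shown are consistent with an unweighted mean of the two evaluator scores, so the following is a minimal sketch under that assumption. The function name `overall_score` is hypothetical, and the inputs are the rounded percentages displayed above, which is why the result can differ from 15.9769% in the third decimal place.

```python
# Assumption: "Overall" is the plain (unweighted) average of the evaluator
# scores. The benchmark's real aggregation rule is not shown on the page.
scores = {
    "Matches word count": 27.17,            # percent, as displayed
    "Dialogue to Total Word Ratio": 4.79,   # percent, as displayed
}

def overall_score(evaluator_scores: dict[str, float]) -> float:
    """Return the unweighted mean of the evaluator scores (in percent)."""
    return sum(evaluator_scores.values()) / len(evaluator_scores)

overall = overall_score(scores)
print(f"Overall (assumed mean): {overall:.2f}%")
```

Under this assumption the result lands within 0.01 percentage points of the reported Overall value; the small gap would be explained by the per-evaluator scores being rounded for display.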
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill sat in the cluttered back room of their rundown apartment, surrounded by maps, fake mustaches, and various party supplies they had gathered for the operation. The air was thick with the smell of stale coffee and anticipation. Molly, the brains of the duo, eagerly unrolled a large sheet of paper that detailed the layout of the children's party venue. Bill, ever the reluctant partner, fidgeted with a rubber chicken he had found in their props box. "Bill, this is our biggest score yet. A whole pile of presents just waiting to be liberated from those little brats," Molly declared enthusiastically. Bill raised an eyebrow and leaned back in his creaky chair. "Are you sure about this, Molly? Breaking into a kids' party seems low even for us. What if we scar them for life?" Molly waved off his concerns with a dramatic flourish of her hand. "Nonsense. They'll think it's part of the show. We go in as the surprise magicians. Once the cake arrives, everyone will be too busy cheering to notice us hauling away the gifts." "But how do we get past the parents? There will be dozens of them taking videos," Bill pointed out while tossing the rubber chicken aside. "We use the classic misdirection. I'll perform a few card tricks to draw the adults' attention while you load the presents into the fake gift return boxes," Molly explained with confidence. "Card tricks? You know I always drop the deck. And what are fake gift return boxes?" Bill asked, looking confused. "They look like regular moving boxes labeled as returns from the toy store. No one will bat an eye. We'll sneak them out the back door near the swing set," Molly continued, pointing at her diagram. Bill scratched his chin thoughtfully. "Swing set? Sounds noisy. Kids might be playing there. This plan has too many variables." "Variables make it exciting. Think of all the video games and dolls we can sell on the black market. No more small time jobs for us," Molly said, her eyes lighting up at the prospect. 
"Black market for kids toys? I guess there's a market for everything. But let's talk escape. If things go south, what's our signal?" Bill inquired. "The signal is me shouting 'Happy Birthday' really loud. Then we run like hell to the van." Molly smiled mischievously. "This is crazy. But okay, I'm with you. Just promise no balloons in the getaway. They give me the creeps," Bill agreed finally. "No balloons, scout's honor. Now let's practice our routine one more time," Molly replied. The two continued their preparations late into the afternoon, refining every detail of their absurd plan to infiltrate the cheerful celebration and make off with the bounty of wrapped boxes. The comedy of the situation was not lost on them, but the lure of easy riches proved too strong to resist.
created_at: "2026-03-12T21:22:02.172Z" (3/12/2026, 9:22:02 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 35000
moderation_latency: (null)
generation_time: 36834
tokens_prompt: 77
tokens_completion: 695
native_tokens_prompt: 184
native_tokens_completion: 6984
native_tokens_completion_images: 0
native_tokens_reasoning: 6400
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0421568
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "8d07551c-9ef3-4eea-4336-d08b97c39fa5"
    is_byok: false
    latency: 276
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350522-xW8JgPbjyXim8w5biNyZ"
upstream_id: "8d07551c-9ef3-4eea-4336-d08b97c39fa5"
total_cost: 0.0421568
cache_discount: 0.0001152
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
| Result | Evaluator | Details | Meta Data |
| --- | --- | --- | --- |
| 27.17% | Matches word count | n/a | neededClean: false; words: 481 |
| 4.79% | Dialogue to Total Word Ratio | Ratio: 57.80%, Deviation: 7.80% | neededClean: false; wordsTotal: 481; wordsDialogue: 278 |
| 15.9769% | Overall | | |
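The reported ratio and deviation follow directly from the word counts: 278 dialogue words out of 481 total gives 57.80%, which is 7.80 percentage points from the 50% target. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words, by treating text inside straight double quotes as dialogue. The counting rule and the names `dialogue_ratio` and `deviation_from_target` are assumptions for illustration; the evaluator's actual tokenization and quote handling are not shown on the page, and the mapping from deviation to the 4.79% score is not reproduced here.

```python
import re

def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: dialogue is whatever sits between paired straight double
    quotes, and words are whitespace-separated tokens.
    """
    total_words = len(text.split())
    spoken_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in spoken_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio

def deviation_from_target(ratio: float, target: float = 0.5) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)

# Arithmetic check against the counts reported in the table above.
reported_ratio = 278 / 481
print(f"Ratio: {reported_ratio * 100:.2f}%")
print(f"Deviation: {deviation_from_target(reported_ratio) * 100:.2f}%")

# Small usage example on a made-up sentence (not taken from the run).
sample = '"Hi there," she said.'
print(dialogue_ratio(sample))
```

Counting by quote marks is a simplification: it would miscount text with curly quotes, unbalanced quotes, or quoted words that are not speech, so results on real responses may differ from the evaluator's.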