NC Bench
Run: 2026-03-12-01KKHYXZWK2A61Q0RBAY2SACMB
Run Details
Model:
Grok 4.20 (Beta, Reasoning)
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
35.0s
Cost:
$0.042798
Provider:
xAI
Evaluation summary
Score | Evaluator
81.27% | Matches word count
27.96% | Dialogue to Total Word Ratio
54.6151% | Overall
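The overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below checks that arithmetic. The aggregation rule is an assumption inferred from the numbers on this page, not a documented NC Bench formula, and the function name `overall_score` is hypothetical.

```python
# Evaluator scores as displayed in the summary (percent, rounded to two decimals).
scores = {
    "Matches word count": 81.27,
    "Dialogue to Total Word Ratio": 27.96,
}


def overall_score(evaluator_scores):
    """Unweighted mean of the evaluator scores (assumed aggregation rule)."""
    return sum(evaluator_scores.values()) / len(evaluator_scores)


overall = overall_score(scores)
# The page reports 54.6151%; a small gap is expected because the inputs
# used here are already rounded for display.
print(f"{overall:.3f}%")
```

If the benchmark weights evaluators differently for other tests, this check would not hold there; it only reproduces the figure for this run.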
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In the flickering light of a single bulb, two figures plotted their next caper. Molly, the brains of the operation, and Bill, the reluctant muscle, were preparing for what would surely be their most absurd adventure. Molly and Bill were hunched over a small table in their dingy apartment, papers scattered everywhere with crude drawings of a house and party setup. The clock was ticking, and the heist was set for the next day. Molly pointed at the diagram while Bill sipped his coffee nervously. "Bill, this is going to be our greatest score yet. But we have to be smart about it," Molly began. "Smart? Stealing presents from a children's party doesn't sound smart to me. What if the kids start screaming?" Bill replied. "They won't if we do it right. We'll wait until the cake is served and all eyes are on the birthday boy. Then we make our move on the gift table," she said. "But how do we get invited? It's not like we know the family," he questioned. "We don't need an invitation. We'll pose as the hired entertainment. I have clown wigs in the bag," Molly explained with a grin. "Clown wigs? You know I hate clowns. They freak me out," Bill protested. "That's the point. No one will suspect us because we'll look ridiculous. You be the clown and I'll be the assistant. We distract with bad magic tricks and while they're laughing, I slip the presents out the back," she detailed. "Bad magic tricks? My rabbit in the hat always fails. Last time the rabbit bit me," he remembered. "This time no rabbit. Just balloons and lame jokes. Why did the kid cross the road? To get to the presents! See, it'll be funny," Molly laughed. "Funny to you maybe. What about escape? If parents catch us with armfuls of gifts, we're done," Bill said anxiously. "We park the van close. Back it into the alley. The back gate is unlocked according to my research. Load the big ones first like the bike and the dollhouse. Leave the small ones to avoid suspicion," she instructed. "You researched a kids party? 
You're scarier than I thought. Okay, but we split the loot fifty fifty, right? No taking the good toys for yourself," he negotiated. "Of course, partner. You take the video games and I'll take the art supplies. But seriously, this is easy money. Those presents are brand new, we can sell them online before the party ends," Molly assured him. "I still feel bad. Maybe we should leave a note saying sorry," Bill suggested. "A note? That's a good way to get caught, genius. No notes, no fingerprints, just in and out like ghosts. Now let's practice our routine," she said. "Fine. But if this goes south, I'm retiring from thievery. Maybe get a real job," he mumbled. "Retire? After this we'll have enough to retire to a beach somewhere. Now, let's practice. We are the greatest thieves in the world," Molly commanded. "We are the greatest thieves in the world," Bill echoed half-heartedly.
created_at
"2026-03-12T21:23:36.985Z"
3/12/2026, 9:23:36 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
32704
moderation_latency
(null)
generation_time
34889
tokens_prompt
77
tokens_completion
752
native_tokens_prompt
184
native_tokens_completion
7110
native_tokens_completion_images
0
native_tokens_reasoning
6472
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0427976
router
(null)
provider_responses
0
  endpoint_id
  "33f49f24-a0de-4dcf-872d-ca514e83fbae"
  id
  "f1626e68-ca4c-0dc7-c8b1-2a9456c1c6d1"
  is_byok
  false
  latency
  305
  model_permaslug
  "x-ai/grok-4.20-beta-20260309"
  provider_name
  "xAI"
  status
  200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773350616-qT7Orhg9qlgjXNvISn0M"
upstream_id
"f1626e68-ca4c-0dc7-c8b1-2a9456c1c6d1"
total_cost
0.0427976
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
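The token fields above can be turned into a few derived figures that make the run easier to read. This is a minimal sketch under two assumptions that the page does not state: that `generation_time` is in milliseconds (34889 agrees with the 35.0s duration), and that `native_tokens_reasoning` is included in `native_tokens_completion`. The helper name `token_breakdown` is hypothetical.

```python
# Values copied from the run metadata on this page.
run_meta = {
    "native_tokens_prompt": 184,
    "native_tokens_completion": 7110,
    "native_tokens_reasoning": 6472,
    "native_tokens_cached": 128,
    "generation_time": 34889,  # assumed to be milliseconds
}


def token_breakdown(meta):
    """Derive visible-output size, reasoning share, and throughput."""
    completion = meta["native_tokens_completion"]
    reasoning = meta["native_tokens_reasoning"]
    seconds = meta["generation_time"] / 1000.0
    return {
        # Assumes reasoning tokens are counted inside the completion total.
        "visible_tokens": completion - reasoning,
        "reasoning_share": reasoning / completion,
        "cached_prompt_share": meta["native_tokens_cached"] / meta["native_tokens_prompt"],
        "tokens_per_second": completion / seconds,
    }


breakdown = token_breakdown(run_meta)
for name, value in breakdown.items():
    print(f"{name}: {value:.4g}")
```

Under these assumptions, roughly nine in ten native completion tokens were spent on reasoning rather than on the visible scene.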
Evaluation details
Result | Evaluator | Details | Meta Data
81.27% | Matches word count | n/a | neededClean: false, words: 512
27.96% | Dialogue to Total Word Ratio | Ratio: 75.05%, Deviation: 5.05% | neededClean: false, wordsTotal: 513, wordsDialogue: 385
54.6151% | Overall
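The reported ratio and deviation follow directly from the word counts in the metadata: 385 dialogue words out of 513 total, measured against the 70% target in the scenario. The sketch below reproduces that arithmetic and shows one simple way a dialogue ratio could be measured from text. The quote-matching approach, the whitespace word split, and the function names are assumptions for illustration; the evaluator's actual counting rules and its mapping from deviation to the 27.96% score are not shown on this page and are not reconstructed here.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: "70% dialogue"


def count_words(text):
    """Count whitespace-separated tokens (an assumed word definition)."""
    return len(text.split())


def dialogue_ratio(text):
    """Share of words that fall inside straight double quotes."""
    quoted_spans = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(count_words(span) for span in quoted_spans)
    words_total = count_words(text)
    return words_dialogue / words_total if words_total else 0.0


def deviation(ratio, target=TARGET_RATIO):
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


# Reproduce the reported figures from the counts in the metadata.
reported_ratio = 385 / 513
reported_deviation = deviation(reported_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {reported_deviation:.2%}")

# Measure a short made-up sample the same way.
sample = 'Molly grinned. "We go at noon," she said.'
print(f"Sample ratio: {dialogue_ratio(sample):.2%}")
```

Note that the two evaluators report slightly different totals for the same response (512 versus 513 words), which suggests they do not share a single word-splitting rule.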