NC Bench
Run: 2026-03-12-01KKHZERW44K974QVS65SE3DPM
Run Details
Model:
Grok 4.20 (Beta, Reasoning)
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
53.4s
Cost:
$0.073806
Provider:
xAI
Evaluation summary
Evaluator: Score
Matches word count: 0.00%
Dialogue to Total Word Ratio: 40.52%
Overall: 20.2613%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In their cramped, dusty van parked discreetly across the street from the large suburban house where colorful balloons bobbed in the breeze, Molly and Bill sat reviewing the plan one last time as the sounds of children's laughter and happy music drifted from the backyard party. "Okay Bill, it's time. We go in as the hired help. You keep the children entertained with some silly jokes and I'll handle the presents. They're all in one place on the big table by the fence," Molly said. Bill sighed and shook his head slowly. "This is the worst idea you've ever had. Breaking into a children's party to steal all the presents? What are we going to do with a bunch of stuffed animals and board games? I feel terrible about this already," Bill said. "Come on, it's not that bad. The parents are rich. They'll buy new ones. We need the money for our next big job. This is just practice and quick cash. You juggle some balls, I grab the bags and we are out in ten minutes," Molly said. "Juggling? I told you I hate juggling. Last time I hit a kid in the head. And ten minutes? With all those parents watching? This could go bad fast," Bill said. "It won't because we'll blend in. I have the costumes in the back. Put on the silly hat and the vest. No one will suspect us if we look the part. Trust me, I've thought this through," Molly said. Bill groaned as he picked up the vest. "Thought it through? Like the time we tried to steal from the fair and ended up with cotton candy instead? Fine, but if we get caught I'm saying it was your idea," Bill said. "It is my idea and it's a good one. Now, once inside, head straight for the entertainment area. Start telling jokes about pirates or whatever kids like these days. When they are all laughing, that's when I strike. The big bag is for the big items," Molly said. "Pirate jokes? I don't know any pirate jokes. This is not going to work. Why can't we steal from adults like normal thieves?" Bill asked. "Because adults are suspicious. 
Kids are distracted by everything. It's perfect. Just do the silly voice and they'll love it. After we have the goods, we meet back at the van and split everything later," Molly said. "Silly voice? You're killing me here. Okay, one last thing. What if there is a piñata with candy? Can we at least take some candy for ourselves?" Bill asked. "Sure, a little candy never hurt. But the presents are the priority. Ready to become the worst guests ever?" Molly asked. "No but let's get it over with before I change my mind. I can't believe we're doing this. It's like stealing candy from a baby but the candy is the presents," Bill said. "It's not stealing candy. It's stealing presents and it's a victimless crime basically. Let's just get those gifts and get out," Molly said. The plan was risky but the potential reward of all those presents made their eyes light up with greedy anticipation. Molly grabbed the bag of props and they both stepped out of the van into the afternoon sun, their hearts beating fast at the thought of the comedic heist they were about to pull off at the children's party.
created_at
"2026-03-12T21:32:46.858Z"
3/12/2026, 9:32:46 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
51308
moderation_latency
(null)
generation_time
52769
tokens_prompt
77
tokens_completion
797
native_tokens_prompt
184
native_tokens_completion
12278
native_tokens_completion_images
0
native_tokens_reasoning
11595
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0738056
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"73aca0c4-aac3-3627-bbbd-eb2c4f84c403"
is_byok
false
latency
259
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773351166-zg9hSiQ2EqhV4kZCNRIT"
upstream_id
"73aca0c4-aac3-3627-bbbd-eb2c4f84c403"
total_cost
0.0738056
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
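The token counts in this metadata suggest that most of the completion was spent on reasoning rather than on the visible response. The sketch below works through that arithmetic. It assumes that native_tokens_reasoning is included in native_tokens_completion and that generation_time is in milliseconds; neither is stated on this page.

```python
# Values copied from the run metadata above.
native_tokens_completion = 12278
native_tokens_reasoning = 11595
generation_time_ms = 52769  # assumption: the field is in milliseconds

# Assumption: reasoning tokens are counted inside the native completion total.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"Visible completion tokens: {visible_tokens}")
print(f"Reasoning share: {reasoning_share:.1%}")
print(f"Throughput: {tokens_per_second:.1f} tokens/s")
```

Under those assumptions, 683 native tokens remain for the visible scene, with the rest attributed to reasoning.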
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean = false, words = 567

Dialogue to Total Word Ratio
Result: 40.52%
Details: Ratio: 74.25%, Deviation: 4.25%
Meta Data: neededClean = false, wordsTotal = 567, wordsDialogue = 421

Overall: 20.2613%
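The reported ratio and deviation can be reproduced from the word counts (421 dialogue words out of 567 total) and the 70% target in the scenario. The sketch below does that, and also checks one plausible reading of the overall score, namely an unweighted mean of the two evaluator scores. That averaging rule is an assumption that happens to match the numbers shown; the page does not state how the 40.52% score is derived from the 4.25% deviation, so that step is not modeled. The function name dialogue_ratio is hypothetical.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return the share of words that are dialogue, as a percentage."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


TARGET_RATIO = 70.0  # from the scenario: "70% dialogue"

ratio = dialogue_ratio(421, 567)
deviation = abs(ratio - TARGET_RATIO)

# Assumption: the overall score is the unweighted mean of the evaluator scores.
evaluator_scores = [0.00, 40.52]
overall_estimate = sum(evaluator_scores) / len(evaluator_scores)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
print(f"Overall (assumed mean): {overall_estimate:.2f}%")
```

The response ran to 567 words against a 500-word target, which is consistent with the 0.00% word-count score pulling the overall result down to roughly half of the ratio score.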