NC Bench
Run: 2026-05-01-01KQHQYVE77MXTZ6R896BQFMGQ
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 1m 2s
Cost: $0.015890
Provider: xAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
66.69% | Dialogue to Total Word Ratio
33.3452% | Overall
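The reported Overall score is consistent with an unweighted mean of the two evaluator scores: (0.00 + 66.69) / 2 = 33.345, matching 33.3452% to the displayed precision. The sketch below checks that arithmetic. Treating Overall as a plain mean is an assumption inferred from these numbers, not documented NC Bench behavior.

```python
from statistics import mean

# Per-evaluator scores (percent) as displayed in the evaluation summary.
scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 66.69,
}

# Assumption: "Overall" is the unweighted mean of the evaluator scores.
overall = mean(scores.values())
print(f"Overall: {overall:.3f}%")  # Overall: 33.345%
```

If the benchmark weighted evaluators differently, this check would not line up. Here it agrees with the reported 33.3452% to within rounding of the 66.69% display value.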
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The two thieves sat in silence for a moment, staring at the colorful balloons tied to the mailbox. Molly was the brains, always concocting these zany schemes, while Bill was the muscle, though mostly he just complained a lot. They huddled in their cluttered van parked across from the community center, the sounds of laughter drifting from inside the children's party. "Bill, this is our big score. All those presents just sitting there waiting for us. We sneak in during the cake time," Molly whispered excitedly. Bill frowned, adjusting his fake mustache. "Are you kidding me, Molly? It's a kids birthday party. What if we get recognized? Or worse, what if they think we're the entertainment and make us play games? I hate musical chairs." "Relax, we'll pose as the magicians. You do the disappearing act on the presents while I distract with card tricks. Kids won't suspect a thing," Molly said, pulling out two cheap capes. "But how do we disappear the presents without them noticing? Little Timmy or whoever will scream if his new bike vanishes. And the parents? They'll call the cops faster than you can say abracadabra," Bill asked, his voice filled with doubt. Molly laughed. "That's why we replace them with wrapped empty boxes. Same wrapping paper, see? I spent all night on it. While you wave your wand and say the magic words, I'll swap them out. Easy." "Easy for you to say. You're not the one who has to entertain a bunch of sugar high six year olds. What if they want a bunny from the hat? I got nothing but an old sock in there," Bill retorted. "We'll use your sock as the bunny. Call it Fluffy. Look, the plan is solid. We go in the side door during the piñata smashing. Chaos is our friend. Grab the bags, slip out, and we're rich in toys," Molly suggested with a straight face. "Rich in toys? What are we going to do with a bunch of Barbie dolls and video games? Fence them at the pawn shop? Hey, got any takers for slightly used teddy bears? 
This is the dumbest heist yet," Bill questioned. "It's not about keeping the stuff, Bill. It's about the thrill. And selling them online. Anonymous. Now stop whining and practice your magician voice. Nothing up my sleeve but lots of loot!" Molly explained. "Nothing up my sleeve but lots of loot. See? No one will buy it. The kids will boo us off the stage before we even get close to the presents," Bill repeated flatly. "Then we improvise. If things go south, release the whoopee cushion smoke bombs I made. Creates a diversion. Trust me, partner. This will be our funniest haul yet. In and out before the candles are blown out," Molly explained patiently. Bill sighed and nodded reluctantly. "Fine, but if I end up singing happy birthday, you're buying the beer for a month. Let's go over the exit route one more time." They continued discussing the details, their voices low as they refined the absurd plan to steal from the children's party. The aroma of cake and candy filled the air, making Bill's stomach rumble despite the tension.
created_at: "2026-05-01T12:24:32.461Z" (5/1/2026, 12:24:32 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 58193
moderation_latency: (null)
generation_time: 61844
tokens_prompt: 77
tokens_completion: 2224
native_tokens_prompt: 184
native_tokens_completion: 6291
native_tokens_completion_images: (null)
native_tokens_reasoning: 5636
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0158903
router: (null)
provider_responses: 0
endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
id: "5efc167c-9171-13ec-c8c6-7e9bc71dbc0b"
is_byok: false
latency: 2904
model_permaslug: "x-ai/grok-4.20-20260309"
provider_name: "xAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777638272-wq2SAOWO2CDquYw2KTYo"
session_id: (null)
api_type: "completions"
id: "gen-1777638272-wDfZb1Ksy8ESFBvEQQCq"
upstream_id: "5efc167c-9171-13ec-c8c6-7e9bc71dbc0b"
total_cost: 0.0158903
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 534
66.69% | Dialogue to Total Word Ratio | Ratio: 72.85%, Deviation: 2.85% | neededClean: false, wordsTotal: 534, wordsDialogue: 389
33.3452% | Overall
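The ratio and deviation in the details can be reproduced from the reported word counts: 389 / 534 ≈ 72.85% dialogue, which is 2.85 percentage points above the 70% target. The sketch below does that arithmetic. It also shows one hypothetical way to count dialogue words: whitespace-separated words inside straight double quotes. NC Bench's actual counting rules are not given on this page, so `dialogue_word_count` is only an illustration and is not claimed to reproduce the 389 figure.

```python
import re

TARGET_DIALOGUE_PCT = 70.0  # from the scenario: "Write 500 words with 70% dialogue"


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed definition of a 'word')."""
    return len(text.split())


def dialogue_word_count(text: str) -> int:
    """Hypothetical: count words that appear inside straight double-quoted spans."""
    return sum(count_words(span) for span in re.findall(r'"([^"]*)"', text))


def dialogue_ratio(words_dialogue: int, words_total: int) -> tuple[float, float]:
    """Return (dialogue share in %, absolute deviation from the target in points)."""
    ratio = 100.0 * words_dialogue / words_total
    return ratio, abs(ratio - TARGET_DIALOGUE_PCT)


# Reported counts for this run: wordsDialogue = 389, wordsTotal = 534.
ratio, deviation = dialogue_ratio(389, 534)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# Ratio: 72.85%, Deviation: 2.85%

# Counting on a short excerpt in the style of the response.
sample = 'Molly laughed. "That\'s why we replace them." Bill sighed.'
print(dialogue_word_count(sample), count_words(sample))  # 5 9
```

How the 2.85-point deviation maps to the 66.69% score is not stated on this page, so the sketch stops at the ratio and deviation.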