NC Bench
Run: 2026-03-12-01KKHYV3GHG2JKNY8N3J6XEYXK
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 4.1s
Cost: $0.004064
Provider: xAI
Evaluation summary
Score      Evaluator
43.38%     Matches word count
0.00%      Dialogue to Total Word Ratio
21.6892%   Overall
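The Overall score appears to be the unweighted mean of the two evaluator scores: (43.38 + 0.00) / 2 ≈ 21.69, and 2 × 21.6892 = 43.3784, which rounds to the displayed 43.38%. This relationship is inferred from the numbers on this page, not taken from NC Bench documentation. The helper name `overall_score` is hypothetical.

```python
# Assumption: "Overall" is the unweighted mean of the per-evaluator scores.
# Inferred from the figures on this page; NC Bench may weight evaluators differently elsewhere.
def overall_score(scores: list[float]) -> float:
    """Unweighted mean of per-evaluator scores, in percent."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


# Displayed scores: Matches word count = 43.38%, Dialogue to Total Word Ratio = 0.00%.
print(f"{overall_score([43.38, 0.0]):.2f}%")

# Working backwards: twice the reported Overall gives the unrounded word-count score.
print(f"{2 * 21.6892:.4f}%")
```

The small gap between 21.69% and the reported 21.6892% comes from the word-count score being shown rounded to two decimals.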
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly crouched behind the hedge, peering through a gap at the colorful balloons bobbing in the breeze. The backyard party was in full swing, shrieks of laughter mixing with the tinny blast of a clown’s kazoo. She turned to Bill, who was fiddling with a comically large pair of novelty sunglasses.
“Bill, remind me again why we’re robbing a six-year-old’s birthday party,” Molly whispered.
“Because those presents are wrapped in pure profit, Mol. Look at the size of that pile. One good haul and we’re set for the month. Plus, nobody suspects the clown car outside,” Bill replied, adjusting his fake nose.
She rolled her eyes. “A clown car. Really? You look like a rejected rodeo act.”
“Hey, it’s authentic. I even practiced my honk. Honk-honk,” he said, squeezing an imaginary bulb.
Molly stifled a laugh. “Focus. We wait until they start the piñata. That’s our window. Kids go feral, parents get distracted with phones, and we slip in through the patio door.”
“Slip in? You make it sound like a ballet. I’m wearing size thirteen shoes, Mol. I clomp,” Bill said, lifting one enormous sneaker for emphasis.
“Then clomp quietly. And no candy. Last time you ate half the evidence and we barely outran that angry soccer mom,” she reminded him.
“That was one time! And those gummies were shaped like dinosaurs. I couldn’t resist. Besides, you’re the one who wanted to wear the glittery tutu disguise,” Bill shot back.
“It was camouflage! The theme is princesses. I blended,” Molly protested.
“Blended? You looked like a disco ball that lost a fight with a unicorn,” he chuckled.
She swatted his arm. “Keep it down. Okay, plan: I create a distraction by pretending to be the new magician’s assistant. You grab the big boxes first. The ones that say ‘To: Timmy, From: Grandma Who Loves You.’ Grandmas always go expensive.”
“Roger that. What if the birthday boy sees us?” Bill asked.
“Then you do the thing with the squirting flower and blame it on the dog. Works every time,” Molly said.
Bill nodded slowly. “I still think we should’ve brought the fake beard. Kids respect beards.”
“Kids respect nothing, Bill. They bite. Remember the Anderson twins last year?” she said with a shudder.
“Fair point. Ouch. Those kids drew blood,” he admitted.
Molly checked her watch. “Piñata starts in three minutes. You ready to become the worst fairy godparents in history?”
“Born ready. Just promise me one thing,” Bill said, suddenly serious.
“What?”
“After we fence the loot, we get real cake. None of that store-bought garbage they’re serving. I want the good stuff with the little plastic race cars on top,” he declared.
Molly grinned despite herself. “Deal. Now shut up and honk quietly. Let’s go steal Christmas in July.”
They high-fived awkwardly between the branches and crept toward the fence, two of the most ridiculous criminals the suburbs had ever seen.
Metadata
created_at: "2026-03-12T21:22:02.39Z" (3/12/2026, 9:22:02 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 398
moderation_latency: (null)
generation_time: 3922
tokens_prompt: 77
tokens_completion: 795
native_tokens_prompt: 182
native_tokens_completion: 655
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0040636
router: (null)
provider_responses[0]:
  endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
  id: "1e102e0c-beb9-6bba-5925-437b54706951"
  is_byok: false
  latency: 359
  model_permaslug: "x-ai/grok-4.20-beta-20260309"
  provider_name: "xAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350522-ydiDYY3djJepfdC4r0Xq"
upstream_id: "1e102e0c-beb9-6bba-5925-437b54706951"
total_cost: 0.0040636
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result     Evaluator                      Details                             Meta Data
43.38%     Matches word count             n/a                                 neededClean: false; words: 483
0.00%      Dialogue to Total Word Ratio   Ratio: 70.08%, Deviation: 40.08%    neededClean: false; wordsTotal: 488; wordsDialogue: 342
21.6892%   Overall
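The Dialogue to Total Word Ratio figures can be checked by hand: 342 / 488 ≈ 70.08% of the words are dialogue, which is 40.08 percentage points above the 30% target. The sketch below reproduces that arithmetic from the evaluator's reported counts and adds a naive counter for comparison. The quote-matching rule (curly double quotes only, as used in this response), the whitespace tokenizer, and the helper names `word_count`, `dialogue_ratio`, and `analyze` are assumptions for illustration; NC Bench's own evaluator may count words differently (the two evaluators on this page already disagree, reporting 483 and 488 total words).

```python
import re

# Assumption: dialogue is any text between curly double quotes “…”.
# NC Bench's real evaluator may detect quotes and split words differently.
DIALOGUE = re.compile(r"“([^”]*)”")


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that are dialogue, as a percentage."""
    return 100.0 * words_dialogue / words_total if words_total else 0.0


def analyze(text: str, target_pct: float = 30.0) -> dict:
    """Count total and dialogue words, then compare the ratio with the target."""
    total = word_count(text)
    dialogue = sum(word_count(quote) for quote in DIALOGUE.findall(text))
    ratio = dialogue_ratio(dialogue, total)
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_pct),  # percentage points from the target
    }


# Reproduce the reported figures from the evaluator's own counts.
ratio = dialogue_ratio(342, 488)
print(f"Ratio: {ratio:.2f}%, Deviation: {abs(ratio - 30.0):.2f}%")
```

How a 40-point deviation maps to a 0.00% score is not stated on this page, so the sketch stops at the ratio and the deviation.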