NC Bench
Run: 2026-03-12-01KKHZK9D3TXZF1JFPA0B5F1GV
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 4.5s
Cost: $0.004496
Provider: xAI
Evaluation summary
Score    Evaluator
9.61%    Matches word count
0.00%    Dialogue to Total Word Ratio
4.8041%  Overall
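The page does not state how the Overall score is aggregated. The sketch below assumes it is the unweighted mean of the two evaluator scores. Under that assumption, the word-count score before display rounding would be 2 × 4.8041% = 9.6082%, which rounds to the 9.61% shown. The function name `overall_score` is hypothetical.

```python
def overall_score(scores: list[float]) -> float:
    """Unweighted mean of per-evaluator percentages.

    Assumption: this is how Overall is aggregated; the run page does not say.
    """
    if not scores:
        raise ValueError("need at least one evaluator score")
    return sum(scores) / len(scores)


# Unrounded word-count score implied by the reported Overall (assumption).
word_count_score = 2 * 4.8041
dialogue_ratio_score = 0.0

print(round(word_count_score, 2))  # 9.61
print(round(overall_score([word_count_score, dialogue_ratio_score]), 4))  # 4.8041
```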
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly adjusted her oversized sunglasses and leaned across the wobbly diner booth, her voice a conspiratorial whisper. “Okay, Bill, hear me out. We hit the McGuffin birthday bash at three sharp. Twenty sticky-fingered kids, one giant pile of brightly wrapped loot, and a clueless mom in a paper crown. It’s the perfect score.” Bill slurped his coffee, nearly choking. “A children’s party? Mol, we’re thieves, not the nightmare before Christmas. What are we gonna do, rappel down from the bouncy castle?” “Precisely,” Molly said, slapping a crudely drawn floor plan on the table. “You wear the mascot costume. I’m talking full purple dinosaur. You’ll blend right in. While the rugrats are doing the conga line, I’ll be under the gift table with my patented Present Vacuum.” Bill stared at her. “Present Vacuum? Is that the shop-vac you stole from the hardware store last week?” “Modified it myself. Sucks up gift bags like a dream. But I need you to create a diversion. Start a chant. Something like, ‘Pin the tail on the burglar.’” “I am not chanting that,” Bill hissed, glancing around the diner. “Kids are vicious, Mol. One screams and we’ve got a room full of tiny eyewitnesses. ‘The big purple dinosaur touched my Xbox!’ Next thing you know, we’re on the news.” Molly grinned. “That’s why we bring the decoy. A remote-control car loaded with candy. You release it at the far end of the yard, they swarm like locusts, and I clean house. Thirty seconds tops. We’ll be sipping piña coladas before the candles are even lit.” Bill rubbed his temples. “And the parents? There’s always one dad filming everything on his phone like it’s the Super Bowl.” “Already thought of that,” she said, tapping the plan. “You spike the punch with your famous sleepy-time herbal mix. Nothing illegal, just enough to make everyone real chill. Real, real chill. By the time they realize the presents are missing, they’ll think the birthday boy wished them away.” “You want me to drug a children’s party?” Bill squeaked. 
“That’s a new low, even for us.” “It’s chamomile and melatonin, you big baby. They’ll just take a nice nap. We’ll even leave a thank-you card that says ‘Sorry about the loot, love, the Birthday Bandits.’” Bill leaned back, arms crossed. “I don’t like it. Too many variables. What if the clown shows up? Clowns hate competition.” Molly waved a dismissive hand. “We take the clown out first. One quick whoopee-cushion drop and he’s on his back. Then we strike.” There was a long pause. Bill sighed. “Fine. But I’m not wearing the dinosaur suit. I’ll be the magician. Cape, top hat, the works. I can make the presents disappear for real.” Molly’s eyes lit up. “Now you’re talking. We split the haul seventy-thirty.” “Fifty-fifty,” Bill countered. “Sixty-forty and I let you keep the Xbox.” Bill grinned despite himself. “Deal. But if we get caught, I’m telling the cops you made me do it.” Molly laughed and raised her coffee mug. “To the easiest job we’ve never done. The Birthday Bandits ride again.” Bill clinked his mug against hers. “God help us all.”
Generation metadata
created_at: "2026-03-12T21:35:14.857Z" (3/12/2026, 9:35:14 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 295
moderation_latency: (null)
generation_time: 3994
tokens_prompt: 77
tokens_completion: 858
native_tokens_prompt: 182
native_tokens_completion: 727
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0044956
router: (null)
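The reported cost (usage 0.0044956, shown as $0.004496) is consistent with simple per-token pricing on the native token counts: 182 prompt tokens, 128 of them cached, and 727 completion tokens. The per-million-token rates in this sketch ($2.00 uncached input, $0.20 cached input, $6.00 output) are assumptions that do not appear on this page; they were chosen because they reproduce both the total and the cache_discount of 0.0002304 exactly. `estimate_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# Assumed USD prices per 1M tokens (not stated on the run page).
INPUT_RATE = Decimal("2.00")   # uncached prompt tokens
CACHED_RATE = Decimal("0.20")  # cached prompt tokens
OUTPUT_RATE = Decimal("6.00")  # completion tokens


def estimate_cost(prompt_tokens: int, cached_tokens: int,
                  completion_tokens: int) -> tuple[Decimal, Decimal]:
    """Return (total cost, cache discount) in USD under the assumed rates."""
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * INPUT_RATE
        + cached_tokens * CACHED_RATE
        + completion_tokens * OUTPUT_RATE
    ) / PER_MILLION
    # Discount: full input price of the cached tokens minus what they actually cost.
    discount = cached_tokens * (INPUT_RATE - CACHED_RATE) / PER_MILLION
    return total, discount


total, discount = estimate_cost(182, 128, 727)
print(total, discount)                            # compare with usage and cache_discount
print(f"${total.quantize(Decimal('0.000001'))}")  # $0.004496, as in the Cost field
```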
provider_responses[0]:
  endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
  id: "3371935c-4946-25d7-782f-3b49bbc52f87"
  is_byok: false
  latency: 248
  model_permaslug: "x-ai/grok-4.20-beta-20260309"
  provider_name: "xAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351314-AB03DRON0fbP8pDp8EPY"
upstream_id: "3371935c-4946-25d7-782f-3b49bbc52f87"
total_cost: 0.0044956
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result   Evaluator                      Details                            Meta Data
9.61%    Matches word count             n/a                                neededClean: false, words: 522
0.00%    Dialogue to Total Word Ratio   Ratio: 81.66%, Deviation: 31.66%   neededClean: false, wordsTotal: 529, wordsDialogue: 432
4.8041%  Overall
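The Dialogue to Total Word Ratio details follow from the evaluator metadata: 432 / 529 ≈ 81.66%, and the deviation is the distance of that share from the 50% target. The sketch below is a minimal illustration under stated assumptions: dialogue words are taken to be whitespace-split words inside curly “…” quotes, which may not match the evaluator's own rule. The page reports 522 words for one evaluator and 529 for the other, so the evaluators evidently tokenize differently. How the deviation maps to the 0.00% score is not stated and is not modelled here. All function names are hypothetical.

```python
import re

# Assumption: dialogue is any text between curly double quotes “ ... ”.
DIALOGUE_SPAN = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Whitespace-delimited word count (an assumed tokenization)."""
    return len(text.split())


def dialogue_counts(text: str) -> tuple[int, int]:
    """Return (dialogue words, total words) for a prose passage."""
    dialogue = sum(count_words(span) for span in DIALOGUE_SPAN.findall(text))
    return dialogue, count_words(text)


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target: float = 0.5) -> tuple[float, float]:
    """Dialogue share of all words, and its absolute distance from the target share."""
    ratio = dialogue_words / total_words
    return ratio, abs(ratio - target)


# Counts reported in the evaluator metadata for this run.
ratio, deviation = ratio_and_deviation(432, 529)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")  # Ratio: 81.66%, Deviation: 31.66%

sample = "Molly said, “Okay, Bill, hear me out.” Bill sighed."
print(dialogue_counts(sample))  # (5, 9)
```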