NC Bench
Run: 2026-03-12-01KKHZ74ED22FAC64RETWMXT59
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 3.9s
Cost: $0.004142
Provider: xAI
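The Duration figure appears to be the sum of the latency and generation_time fields listed in the run metadata further down. The following is a minimal sketch of that arithmetic, assuming both fields are in milliseconds (the page does not state units) and that native_tokens_completion is the right count for a throughput estimate. The function names are hypothetical and not part of NC Bench.

```python
# Values copied from the run metadata on this page.
# Assumption: latency and generation_time are in milliseconds.
LATENCY_MS = 311
GENERATION_TIME_MS = 3549
NATIVE_TOKENS_COMPLETION = 668


def total_duration_s(latency_ms: int, generation_time_ms: int) -> float:
    """Time to first token plus generation time, in seconds."""
    return (latency_ms + generation_time_ms) / 1000


def tokens_per_second(tokens: int, generation_time_ms: int) -> float:
    """Completion tokens produced per second of generation time."""
    return tokens / (generation_time_ms / 1000)


duration = total_duration_s(LATENCY_MS, GENERATION_TIME_MS)
throughput = tokens_per_second(NATIVE_TOKENS_COMPLETION, GENERATION_TIME_MS)

print(f"duration: {duration:.1f}s")
print(f"throughput: {throughput:.1f} tokens/s")
```

Under these assumptions the sum is 3.86 s, which rounds to the 3.9s shown above.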
Evaluation summary
Score | Evaluator
90.48% | Matches word count
0.00% | Dialogue to Total Word Ratio
45.2430% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her fake mustache in the rearview mirror while Bill tapped the steering wheel like it owed him money. “This is the dumbest idea you’ve ever had, and I’ve known you since the Great Kitten Heist of ’09.” “Relax,” Molly said, popping a cheese puff into her mouth. “It’s a six-year-old’s birthday party at the community center. Cake, balloons, chaos. We’ll be in and out before the piñata hits the floor.” Bill stared at her. “We’re robbing toddlers, Mol. Toddlers. Their moral compass is still in the box with the batteries.” “Exactly. They won’t suspect a thing. We’re going as the entertainment. I’m the magician, you’re my bumbling assistant. Classic misdirection. While they’re watching me pull a rabbit out of a hat, you’re stuffing goody bags into your pants.” “My pants have limited real estate,” Bill grumbled. “And I’m not sure I want that kind of reputation in the neighborhood. ‘Hey, that’s the guy who shoved Frozen-themed party favors down his trousers.’” Molly waved a hand. “We wear the inflatable T-Rex costumes. Nobody looks at your crotch when you’ve got a giant rubber tail. Trust me, I’ve done the math.” “You did the math on dinosaur crotch visibility?” “Page seventeen of the plan, Bill. Keep up. We hit the gift table at 3:15, right after they sing Happy Birthday. The kids will be in a sugar frenzy. We grab the big boxes first—those big shiny ones always have the good stuff. Then we work our way to the envelopes. Cash and gift cards, baby.” Bill pinched the bridge of his nose. “What if one of them cries? I can’t handle it when they cry. That little lip quiver thing destroys me.” “Then you do the thing with the balloon animals. I’ve seen you make a poodle. Kids love poodles. You make the poodle, I make the getaway. Simple.” “Simple,” he echoed, clearly not believing the word. “We’re going to hell for this. There’s a special level for people who steal Shopkins from six-year-olds.” Molly shrugged. “We’ll send them an anonymous gift basket later. 
Full of slightly less stolen toys. Balance the karma scales.” Bill was quiet for a moment, then muttered, “I want the talking dinosaur. The one that says ‘rawr means I love you.’ I’ve earned that.” “You can have the dinosaur if you stop whining. And if we get caught, we’re cousins from out of town. My name is Brenda, you’re Kevin, and we thought this was a costume party.” “Brenda and Kevin sound like they sell essential oils at flea markets.” “Perfect. No one will suspect Brenda and Kevin of felony present theft.” Bill finally smiled, small and reluctant. “Fine. But if we pull this off, we’re never speaking of it again.” “Deal,” Molly said, starting the car. “Now help me practice my magic voice. Abracadabra, you little gremlins, hand over the presents.” Bill groaned. “We’re going to jail.” “Only if we’re bad at it.”
created_at: "2026-03-12T21:28:36.563Z" (3/12/2026, 9:28:36 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 311
moderation_latency: (null)
generation_time: 3549
tokens_prompt: 77
tokens_completion: 800
native_tokens_prompt: 182
native_tokens_completion: 668
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0041416
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "153cf9c2-1c02-2cc7-cf1c-6820560ffc6d"
    is_byok: false
    latency: 266
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350916-Lg1LWA370HZiI8Q2EHQp"
upstream_id: "153cf9c2-1c02-2cc7-cf1c-6820560ffc6d"
total_cost: 0.0041416
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result | Evaluator | Details | Meta Data
90.48% | Matches word count | n/a | neededClean: false, words: 490
0.00% | Dialogue to Total Word Ratio | Ratio: 84.65%, Deviation: 14.65% | neededClean: false, wordsTotal: 495, wordsDialogue: 419
45.2430% | Overall
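The ratio, deviation, and overall figures above can be reproduced from the reported counts. The sketch below assumes the target ratio is 0.70 (taken from the scenario's "70% dialogue") and that the overall score is a plain mean of the two evaluator scores, which the numbers suggest but the page does not state. How the evaluator maps a 14.65% deviation to a 0.00% score is not shown here, so it is not modeled. The function names are hypothetical.

```python
# Counts and scores copied from the evaluation details on this page.
WORDS_DIALOGUE = 419
WORDS_TOTAL = 495
TARGET_RATIO = 0.70  # assumption: taken from the "70% dialogue" scenario


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that fall inside dialogue."""
    return words_dialogue / words_total


def deviation(ratio: float, target: float) -> float:
    """Absolute distance between the measured ratio and the target."""
    return abs(ratio - target)


def overall(scores: list[float]) -> float:
    """Assumed aggregation: unweighted mean of evaluator scores."""
    return sum(scores) / len(scores)


ratio = dialogue_ratio(WORDS_DIALOGUE, WORDS_TOTAL)
dev = deviation(ratio, TARGET_RATIO)
overall_score = overall([90.48, 0.00])

print(f"ratio: {ratio:.2%}")
print(f"deviation: {dev:.2%}")
print(f"overall: {overall_score:.2f}%")
```

With the rounded 90.48% input the mean comes out at 45.24%; the 45.2430% shown on the page presumably uses the unrounded word-count score.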