NC Bench
Run: 2026-02-20-01KHY9VFJ001WAWAMWMPXZTB5W
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 39.9s
Cost: $0.015040
Provider: xAI
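The cost figure can be cross-checked against the token counts recorded in the response metadata (native_tokens_prompt 744, native_tokens_cached 681, native_tokens_completion 956). The sketch below is only an illustration: the per-million-token rates are assumptions that do not appear anywhere in this report, chosen because they happen to reproduce the recorded total_cost and cache_discount, and the function name is hypothetical.

```python
from decimal import Decimal

# ASSUMED rates in USD per million tokens; not stated in this report.
ASSUMED_INPUT_RATE = Decimal("3.00")
ASSUMED_CACHED_INPUT_RATE = Decimal("0.75")
ASSUMED_OUTPUT_RATE = Decimal("15.00")
PER_MILLION = Decimal(1_000_000)


def reconstruct_cost(prompt_tokens, cached_tokens, completion_tokens):
    """Return (total_cost, cache_discount) in USD under the assumed rates."""
    uncached_tokens = prompt_tokens - cached_tokens
    input_cost = uncached_tokens * ASSUMED_INPUT_RATE / PER_MILLION
    cached_cost = cached_tokens * ASSUMED_CACHED_INPUT_RATE / PER_MILLION
    output_cost = completion_tokens * ASSUMED_OUTPUT_RATE / PER_MILLION
    total_cost = input_cost + cached_cost + output_cost
    # Discount: what the cached tokens would have cost at the full input
    # rate, minus what they cost at the cached rate.
    cache_discount = (
        cached_tokens
        * (ASSUMED_INPUT_RATE - ASSUMED_CACHED_INPUT_RATE)
        / PER_MILLION
    )
    return total_cost, cache_discount


# Token counts taken from the metadata of this run.
total, discount = reconstruct_cost(
    prompt_tokens=744, cached_tokens=681, completion_tokens=956
)
print(total, discount)
```

Decimal is used instead of float so the comparison with the recorded eight-decimal values is exact. Under these assumed rates, completion tokens account for almost all of the cost.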
Evaluation summary
Matches word count: 0.21%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.1070%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly peered over the crumpled blueprint spread across the rickety kitchen table, her finger tracing a squiggly line that was supposed to be the backyard fence. "Okay, Bill, listen up. The party's at the Johnsons' mansion tomorrow. Kids everywhere, balloons, cake—the works. Our target: that pile of presents in the living room. We're talking bikes, toys, maybe even a pony if we're lucky." Bill, slouched in a chair with a half-eaten pizza slice dangling from his mouth, raised an eyebrow. "A pony? Molly, we're thieves, not cowboys. How we gonna fence a live animal? 'Hey, shady guy in the alley, want a Shetland with a bow on it?'" She swatted his arm. "Focus! The plan's gold. We go in disguised as clowns. Big shoes, red noses, the whole shebang. Kids love clowns. We'll blend right in." "Clowns? Last time I dressed as one, that birthday brat puked cotton candy on my polka dots. Plus, have you seen me juggle? I can barely tie my own laces without dropping 'em." Molly rolled her eyes, sketching a crude clown face on the blueprint. "You'll learn. Step one: Arrive at 2 PM, right when the piñata's getting bashed. Chaos ensues—kids screaming, parents distracted. We slip in through the side gate." Bill leaned forward, wiping grease on his sleeve. "Side gate? It's got one of those fancy locks. You know, the kind with beeps and lasers or whatever." "Lasers? Bill, it's a suburban backyard, not Fort Knox. I've got my lockpick set. We'll pop it open in ten seconds flat." He chuckled. "Your lockpick set? That's just a bent paperclip and a nail file. Remember the museum job? You picked the janitor's closet instead of the safe." "That was a minor setback! Anyway, once inside, I distract the crowd with balloon animals. You head to the present pile, stuff 'em in our oversized clown pants." Bill burst out laughing. "Oversized pants? Molly, we'll look like we're smuggling elephants! And what if a kid wants a giraffe balloon? I can't make a giraffe—mine always look like deranged snakes." 
"Fine, you do the distracting. Honk your nose, trip over your feet. Kids eat that slapstick crap up. I'll grab the goods. We rendezvous at the bounce house—it's got an exit flap straight to the alley." He nodded slowly. "Bounce house, huh? Sneaky. But what about the parents? One of 'em spots us waddling out with presents bulging everywhere, and boom—cops." Molly grinned, slapping the table. "That's the beauty! We'll yell 'Surprise game!' and toss a few decoy toys. Confusion city. By the time they figure it out, we're gone in the getaway van—painted like an ice cream truck, of course." "Ice cream truck? Genius! Kids'll chase us instead of calling the fuzz." "Exactly. High-five, partner. This heist's gonna be legendary." Bill high-fived her, but his pizza slice flew across the room, splattering the blueprint. "Oops. Uh, think tomato sauce looks like a guard dog on the map?" Molly sighed. "Just our luck. Alright, plan B: Avoid the sauce stains." They both dissolved into laughter, the blueprint forgotten amid dreams of stolen teddy bears and bikes.
created_at
"2026-02-20T19:53:58.605Z"
2/20/2026, 7:53:58 PM
model
"x-ai/grok-4-07-09"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
8813
moderation_latency
(null)
generation_time
39816
tokens_prompt
77
tokens_completion
798
native_tokens_prompt
744
native_tokens_completion
956
native_tokens_completion_images
0
native_tokens_reasoning
230
native_tokens_cached
681
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.01503975
router
(null)
provider_responses
0
endpoint_id
"d40bef9d-6936-4c57-9749-bf7399de4a77"
id
"f20ec0de-f9ba-b9b6-0534-c6a8cf75bac5"
is_byok
false
latency
360
model_permaslug
"x-ai/grok-4-07-09"
provider_name
"xAI"
status
200
api_type
"completions"
id
"gen-1771617238-O3wUVdLKz4ZrD09nFthp"
upstream_id
"f20ec0de-f9ba-b9b6-0534-c6a8cf75bac5"
total_cost
0.01503975
cache_discount
0.00153225
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result: 0.21%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean = false; words = 528

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 78.22%, Deviation: 28.22%
Meta Data: neededClean = false; wordsTotal = 528; wordsDialogue = 413

Overall: 0.1070%
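The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator follow directly from wordsDialogue and wordsTotal. The sketch below shows one way such a check could work; it is not NC Bench's actual evaluator. It assumes that words are whitespace-separated, that dialogue is any text between straight double quotes, and that deviation is the absolute distance from the 50% target. All function names are hypothetical.

```python
import re

TARGET_RATIO_PCT = 50.0  # the scenario asks for 50% dialogue


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (words_dialogue, words_total) for a scene.

    Assumption: words are whitespace-separated, and dialogue is any span
    enclosed in straight double quotes.
    """
    words_total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(len(span.split()) for span in quoted_spans)
    return words_dialogue, words_total


def ratio_and_deviation(words_dialogue: int, words_total: int) -> tuple[float, float]:
    """Return (ratio %, deviation from target in percentage points)."""
    if words_total == 0:
        raise ValueError("words_total must be positive")
    ratio_pct = 100.0 * words_dialogue / words_total
    deviation_pct = abs(ratio_pct - TARGET_RATIO_PCT)
    return round(ratio_pct, 2), round(deviation_pct, 2)


# Counts taken from the evaluator metadata of this run.
ratio, deviation = ratio_and_deviation(413, 528)
print(f"Ratio: {ratio}%, Deviation: {deviation}%")
# prints "Ratio: 78.22%, Deviation: 28.22%"
```

The counting rule here is a guess, so count_dialogue_words may not return exactly 413 and 528 for the response above. The formula that turns the deviation into the 0.00% score is not shown in this report and is not reconstructed here.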