NC Bench
Run: 2026-02-20-01KHY8AD1YQ7GAAA7Z6JFE6573
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 37.3s
Cost: $0.015290
Provider: xAI
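The reported cost can be reproduced from the native token counts in the run metadata. This assumes xAI's Grok 4 list prices at the time of the run: $3.00 per million uncached input tokens, $0.75 per million cached input tokens and $15.00 per million output tokens. Those prices are an assumption and are not stated on this page. Under them, 1 uncached and 743 cached prompt tokens plus 982 completion tokens come to exactly the recorded usage of 0.01529025. The cache discount of 0.00167175 equals 743 × ($3.00 - $0.75) per million. The totals only reconcile if the 243 reasoning tokens are already counted inside the 982 completion tokens. `run_cost` is a hypothetical helper. It uses `Decimal` so the sums come out exact.

```python
from decimal import Decimal

# Assumed Grok 4 list prices in USD per million tokens (not stated on the page).
PRICE_INPUT = Decimal("3.00")
PRICE_CACHED_INPUT = Decimal("0.75")
PRICE_OUTPUT = Decimal("15.00")
PER_MILLION = Decimal(1_000_000)


def run_cost(
    prompt_tokens: int, cached_tokens: int, completion_tokens: int
) -> tuple[Decimal, Decimal]:
    """Return (total_cost, cache_discount) in USD for one request (hypothetical helper)."""
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + completion_tokens * PRICE_OUTPUT  # assumes reasoning tokens are already included
    ) / PER_MILLION
    # Discount: full input price minus cached price, applied to the cached tokens only.
    discount = cached_tokens * (PRICE_INPUT - PRICE_CACHED_INPUT) / PER_MILLION
    return total, discount


# Native counts from this run: 744 prompt tokens (743 cached), 982 completion tokens.
total, discount = run_cost(744, 743, 982)
print(total, discount)  # prints 0.01529025 0.00167175
```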
Evaluation summary
Score     Evaluator
0.00%     Matches word count
56.91%    Dialogue to Total Word Ratio
28.4551%  Overall
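The Overall score is consistent with an unweighted mean of the two evaluator scores: (0.00 + 56.91) / 2 = 28.455. The displayed 28.4551% differs only in the fourth decimal, presumably because the 56.91% shown is itself rounded. NC Bench does not document this formula here. The sketch therefore treats the mean as an inference from the numbers, and `overall_score` is a hypothetical helper.

```python
# Assumption: Overall = unweighted mean of the per-evaluator scores (in percent).
# Inferred from the reported numbers, not from NC Bench documentation.
def overall_score(scores: dict[str, float]) -> float:
    """Return the arithmetic mean of the evaluator scores."""
    if not scores:
        raise ValueError("need at least one evaluator score")
    return sum(scores.values()) / len(scores)


run_scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 56.91,
}

print(f"{overall_score(run_scores):.3f}%")  # prints 28.455%
```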
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
In the dim glow of a flickering lamp in their cramped apartment, Molly paced back and forth, her wild red curls bouncing like over-caffeinated squirrels. Bill slouched on the sagging couch, scribbling notes on a napkin stained with yesterday's pizza grease. The air smelled of stale coffee and desperation, but their eyes sparkled with the thrill of the absurd heist ahead. "Alright, Bill, listen up," Molly said, slamming her fist on the rickety table. "This kids' party is our golden ticket. Little Timmy's turning eight, and that means a mountain of presents—video games, bikes, maybe even that fancy drone I've been eyeing. We steal 'em all, fence 'em quick, and retire to that beach in Mexico." Bill looked up, his bushy eyebrows knitting together like confused caterpillars. "Mexico? With our luck, we'll end up in a lobster trap. But okay, genius, how do we crash a kiddie bash without looking like total creeps? I ain't dressing as a pony again—that last gig left me with hay fever and a lawsuit." Molly grinned, pulling out a crumpled clown costume from a duffel bag. "Disguises, you dolt! You're going as Bongo the Clown. Honk your nose, juggle some balls, and while the rugrats are distracted, I'll swipe the gift table. Easy peasy." "Easy? Those brats are like tiny ninjas on sugar highs! Remember the Thompson wedding? One kid bit my ankle thinking I was the piñata. And what about the parents? They'll be hovering like hawks with smartphones ready to film our downfall." She waved him off, her voice dripping with mock confidence. "Parents? Pfft. We'll blend in. I'll be the frazzled aunt—'Oh, dear, where's the cake?'—and you do your clown schtick. 'Hey kids, who wants balloon animals?' While they're squealing, we load the presents into the getaway stroller. I've modded it with extra compartments—fits a whole bike if we fold it right." Bill snorted, nearly choking on his coffee. "A stroller? Molly, you're a madwoman. What if there's a real baby in it? 
Or worse, one of those demon toddlers who screams bloody murder? Last time we tried a family gig, I ended up covered in glitter and regret." "Glitter washes off, regrets don't. Think big, Bill! Imagine the haul: Legos, action figures, gift cards. We'll be rolling in it. Entry point's the backyard—scale the fence during the cake-cutting chaos. You create a diversion—maybe fake a magic trick gone wrong, like pulling a rabbit out that's actually our decoy bag." He leaned forward, eyes widening. "A rabbit? Where do we get a rabbit? And what if the kids want to pet it? I'm telling you, this is suicide by birthday candle. One wrong move, and we're dodging dodgeballs from angry soccer moms." Molly laughed, slapping his back. "That's the beauty! Dodgeballs mean distraction. We grab, we dash, we disappear. Trust me, Bill, this'll be our masterpiece. Or at least funnier than that bank job where you tripped over your own shoelaces." Bill sighed, but a smirk crept onto his face. "Fine, but if I end up in a ball pit prison, you're bailing me out. And no more animal sidekicks—next time, we're stealing from adults." As they high-fived, the napkin plan fluttered to the floor, a blueprint for comedic catastrophe.
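The Dialogue to Total Word Ratio evaluator presumably counts the words that fall inside quotation marks and divides them by all words in the response. The page does not publish its tokenizer or quote rules, so the following is a hypothetical approximation. It treats every span between straight double quotes as dialogue and counts whitespace-separated tokens. Curly quotes, unbalanced quotes or hyphenation rules would need extra handling, and on the full response above it may not reproduce the evaluator's 402/548 exactly. The sample string is an exact excerpt from the response.

```python
import re

# Hypothetical dialogue detector: text between a pair of straight double
# quotes counts as dialogue. NC Bench's real rules are not shown on the page.
QUOTED = re.compile(r'"([^"]*)"')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a prose passage."""
    total = count_words(text)
    dialogue = sum(count_words(span) for span in QUOTED.findall(text))
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


sample = '"Alright, Bill, listen up," Molly said, slamming her fist on the rickety table.'
d, t, r = dialogue_ratio(sample)
print(d, t, f"{r:.2%}")  # prints 4 13 30.77%
```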
created_at
"2026-02-20T19:27:10.407Z"
2/20/2026, 7:27:10 PM
model
"x-ai/grok-4-07-09"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
8606
moderation_latency
(null)
generation_time
37243
tokens_prompt
77
tokens_completion
823
native_tokens_prompt
744
native_tokens_completion
982
native_tokens_completion_images
0
native_tokens_reasoning
243
native_tokens_cached
743
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.01529025
router
(null)
provider_responses[0]
  endpoint_id
  "d40bef9d-6936-4c57-9749-bf7399de4a77"
  id
  "a600a651-c47b-1472-0b84-63f0e4270dab"
  is_byok
  false
  latency
  77
  model_permaslug
  "x-ai/grok-4-07-09"
  provider_name
  "xAI"
  status
  200
api_type
"completions"
id
"gen-1771615630-t35Nwvf6RwcnKrprNRmD"
upstream_id
"a600a651-c47b-1472-0b84-63f0e4270dab"
total_cost
0.01529025
cache_discount
0.00167175
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result    Evaluator                     Details                          Meta Data
0.00%     Matches word count            n/a                              neededClean: false; words: 548
56.91%    Dialogue to Total Word Ratio  Ratio: 73.36%, Deviation: 3.36%  neededClean: false; wordsTotal: 548; wordsDialogue: 402
28.4551%  Overall
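The Details column follows from the meta data. 402 dialogue words out of 548 is a 73.36% ratio, which is 3.36 percentage points above the scenario's 70% target. Separately, the 548-word response overshoots the 500-word target by 9.6%. The page does not show how that deviation maps to the 56.91% score, or what tolerance the word-count check uses. The sketch therefore stops at the reported ratio and deviation. `ratio_and_deviation` is an illustrative name, and taking the absolute deviation is an assumption (the sign does not matter for this run).

```python
def ratio_and_deviation(
    dialogue_words: int, total_words: int, target_pct: float
) -> tuple[float, float]:
    """Return (dialogue ratio in %, absolute deviation from target in percentage points)."""
    ratio_pct = 100 * dialogue_words / total_words
    return ratio_pct, abs(ratio_pct - target_pct)


ratio, deviation = ratio_and_deviation(402, 548, 70.0)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")  # prints Ratio: 73.36%, Deviation: 3.36%

# Word-count target from the scenario is 500 words; the response has 548.
overshoot = (548 - 500) / 500
print(f"Word count overshoot: {overshoot:.1%}")  # prints Word count overshoot: 9.6%
```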