NC Bench
Run: 2026-02-20-01KHY7PAR52D8RCDJHT4QHKEEJ
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 37.3s
Cost: $0.014800
Provider: xAI
Evaluation summary
Score | Evaluator
93.65% | Matches word count
0.00% | Dialogue to Total Word Ratio
46.8248% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly paced the cramped motel room, her red wig askew, while Bill hunched over a crumpled napkin scribbled with crayon drawings of a backyard layout. The air smelled of stale coffee and desperation, but their eyes gleamed with the thrill of the score. Toys 'R' Us gift cards, action figures, maybe even a pony-shaped piñata stuffed with cash—who knew what rich parents splurged on for little Timmy's birthday bash? "Alright, Bill, listen up," Molly said, jabbing a finger at the napkin. "The party's at noon. We crash it dressed as entertainers. You be the magician, I'll be the balloon animal chick. Kids love that crap." Bill snorted, adjusting his fake mustache. "Magician? With my luck, I'll pull a rabbit out of my hat and it'll bite me. Why not clowns? Clowns are classic for kids' parties." "Clowns? Are you nuts? Last time we did clowns, that toddler screamed bloody murder and blew our cover. No, magician it is. You distract 'em with card tricks while I swipe the present pile. It's genius!" He leaned back, crossing his arms. "Genius? What if the parents are watching? Rich folks like these hire bouncers for birthday parties. I saw it on TV—nannies with tasers." Molly rolled her eyes. "Bouncers? It's a kiddie party, not the Oscars. We'll blend in. You say abracadabra, make some balloons pop, and poof—I'm out the back with a sack of goodies. Easy peasy." "Easy? What about the cake? Kids go feral around cake. One wrong move and we're buried in frosting and tantrums." She grinned, pulling out a clown nose from her pocket. "That's why we time it for the piñata smash. Chaos ensues, kids swinging bats blindly—we grab and dash. But fine, if you're chicken, we could go in as caterers. 'Oh, excuse me, ma'am, just delivering extra cupcakes... and taking all your wrapped boxes.'" Bill laughed, a deep belly rumble. "Caterers? With your cooking? You'd poison the punch. Nah, magician's fine. But I get the big presents. 
Last heist, you hogged the diamond necklace while I got stuck with cubic zirconia." "Dream on, partner. Fifty-fifty split, remember? Now, contingencies: Dog in the yard?" "Tranq dart disguised as a party popper," he replied smugly. "Nosy neighbor?" "Fake FBI badges. 'Routine gift inspection, ma'am.'" Molly high-fived him. "Perfect. And if a kid spots us?" Bill winked. "We bribe 'em with candy. Kids are mini-extortionists anyway." She nodded, folding the napkin. "Alright, gear up. This is gonna be the heist of the century—Operation Birthday Bandits. Just don't trip over any Legos." As they gathered their props—a top hat, a bag of fake wands, and a suspiciously large duffel—Bill paused. "Hey, Molly, ever think we should've picked a real job? Like, accounting?" She smirked. "And miss the fun? Never. Now let's go steal some joy—literally." They slipped out into the sunlight, ready to turn a celebration into their payday.
created_at: "2026-02-20T19:16:12.689Z" (2/20/2026, 7:16:12 PM)
model: "x-ai/grok-4-07-09"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 9562
moderation_latency: (null)
generation_time: 37236
tokens_prompt: 77
tokens_completion: 755
native_tokens_prompt: 744
native_tokens_completion: 940
native_tokens_completion_images: 0
native_tokens_reasoning: 248
native_tokens_cached: 681
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01479975
router: (null)
provider_responses:
  0:
    endpoint_id: "d40bef9d-6936-4c57-9749-bf7399de4a77"
    id: "303c1027-24b2-57c8-c355-b577eaa62afa"
    is_byok: false
    latency: 380
    model_permaslug: "x-ai/grok-4-07-09"
    provider_name: "xAI"
    status: 200
api_type: "completions"
id: "gen-1771614972-SsNAKanVgplgRYRgEHPC"
upstream_id: "303c1027-24b2-57c8-c355-b577eaa62afa"
total_cost: 0.01479975
cache_discount: 0.00153225
upstream_inference_cost: 0
provider_name: "xAI"
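The recorded total_cost and cache_discount can be reproduced from the native token counts. The sketch below is only an illustration: the per-million-token rates (3.00 USD for input, 0.75 USD for cached input, 15.00 USD for output) are assumptions that do not appear in this run record, and the function name estimate_cost is hypothetical. With those assumed rates the arithmetic matches the recorded figures.

```python
from decimal import Decimal

# ASSUMED rates in USD per million tokens; they are not part of the run record.
RATE_INPUT = Decimal("3.00")
RATE_CACHED_INPUT = Decimal("0.75")
RATE_OUTPUT = Decimal("15.00")
MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens, cached_tokens, completion_tokens):
    """Return (total cost, cache discount) in USD under the assumed rates."""
    uncached_tokens = prompt_tokens - cached_tokens
    cost = (
        uncached_tokens * RATE_INPUT
        + cached_tokens * RATE_CACHED_INPUT
        + completion_tokens * RATE_OUTPUT
    ) / MILLION
    # The discount is what the cached tokens would have cost extra at the full input rate.
    discount = cached_tokens * (RATE_INPUT - RATE_CACHED_INPUT) / MILLION
    return cost, discount


# native_tokens_prompt, native_tokens_cached, native_tokens_completion from the record
cost, discount = estimate_cost(744, 681, 940)
print(f"estimated cost: {cost}, estimated cache discount: {discount}")
```

Decimal is used here rather than float so that the comparison against the recorded eight-decimal values is exact.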
Evaluation details
Result | Evaluator | Details | Meta Data
93.65% | Matches word count | n/a | neededClean: false, words: 491
0.00% | Dialogue to Total Word Ratio | Ratio: 72.71%, Deviation: 22.71% | neededClean: false, wordsTotal: 491, wordsDialogue: 357
46.8248% | Overall
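The ratio and deviation figures follow directly from the recorded word counts and the 50% target in the scenario. The sketch below recomputes them; the function names are hypothetical. Treating the overall score as the plain mean of the two evaluator scores is an assumption inferred from the numbers, not something the page states, and the mapping from a 22.71% deviation to a 0.00% score is not documented here, so it is not modelled.

```python
def dialogue_ratio(words_dialogue, words_total):
    """Share of all words that fall inside dialogue."""
    return words_dialogue / words_total


def deviation_from_target(ratio, target=0.5):
    """Absolute distance between the measured ratio and the requested one."""
    return abs(ratio - target)


# wordsDialogue and wordsTotal from the evaluation meta data
ratio = dialogue_ratio(357, 491)
deviation = deviation_from_target(ratio)

# ASSUMPTION: overall is the unweighted mean of the two displayed scores.
overall = (93.65 + 0.00) / 2

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
print(f"Overall (mean of displayed scores): {overall:.3f}%")
```

The mean of the displayed scores comes out near 46.825%, slightly above the reported 46.8248%, which suggests the word-count score shown as 93.65% is a rounded value.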