NC Bench
Run: 2026-02-20-01KHY85Q7N6814QRVFK13WGW4G
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 35.0s
Cost: $0.015130
Provider: xAI
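The cost can be reproduced exactly from the token counts in this run's metadata (native_tokens_prompt 744, native_tokens_cached 681, native_tokens_completion 962) under an assumed price schedule of $3.00 per million uncached input tokens, $0.75 per million cached input tokens, and $15.00 per million output tokens. Those rates are an assumption and are not stated on this page. The sketch also assumes that native_tokens_completion is the billed output count and already includes the 253 reasoning tokens, because billing them a second time would not reproduce total_cost. `run_cost` is a hypothetical helper.

```python
from decimal import Decimal

# ASSUMED per-million-token rates in USD; not stated on the NC Bench page.
RATE_INPUT = Decimal("3.00")
RATE_CACHED_INPUT = Decimal("0.75")
RATE_OUTPUT = Decimal("15.00")
PER_MILLION = Decimal(1_000_000)


def run_cost(prompt_tokens: int, cached_tokens: int, completion_tokens: int) -> tuple[Decimal, Decimal]:
    """Return (total_cost, cache_discount) in USD under the assumed rates."""
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * RATE_INPUT
        + cached_tokens * RATE_CACHED_INPUT
        + completion_tokens * RATE_OUTPUT
    ) / PER_MILLION
    # Savings from billing cached tokens at the cached rate instead of the full input rate.
    discount = cached_tokens * (RATE_INPUT - RATE_CACHED_INPUT) / PER_MILLION
    return total, discount


total, discount = run_cost(744, 681, 962)
print(total, discount)  # 0.01512975 0.00153225
```

Rounded to six decimal places, 0.01512975 gives the $0.015130 shown above. Decimal is used so the per-token products stay exact instead of accumulating binary floating-point error.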
Evaluation summary
Matches word count: 99.74%
Dialogue to Total Word Ratio: 0.00%
Overall: 49.8722%
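The overall figure is consistent with an unweighted mean of the two evaluator scores. If Overall = (word-count score + ratio score) / 2, then the unrounded word-count score would be 2 × 49.8722% = 99.7444%, which displays as 99.74%. This aggregation rule is inferred from the numbers and is not documented on this page. `overall_score` and `implied_score` are hypothetical helpers.

```python
from statistics import fmean


def overall_score(scores: list[float]) -> float:
    """Unweighted mean of per-evaluator scores (ASSUMED aggregation rule)."""
    return fmean(scores)


def implied_score(overall: float, known_scores: list[float]) -> float:
    """Solve mean(known_scores + [x]) == overall for the single missing score x."""
    n = len(known_scores) + 1
    return overall * n - sum(known_scores)


# Back-solve the word-count score from the reported overall and the 0.00% ratio score.
word_count = implied_score(49.8722, [0.0])
print(round(word_count, 2))                        # 99.74
print(round(overall_score([word_count, 0.0]), 4))  # 49.8722
```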
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly paced the cramped motel room, her black leather jacket slung over a chair piled with crumpled maps. Bill slouched on the bed, munching on a stale doughnut, his eyes lighting up like he'd just won the lottery. "Alright, Bill, listen up," Molly said, slapping a blueprint of the suburban mansion onto the rickety table. "The party's tomorrow at noon. Little Timmy's turning eight, and that means a mountain of presents—toys, gadgets, gift cards. We're talking thousands in resale value if we fence 'em right." Bill wiped crumbs from his chin, grinning. "Thousands? Hot damn, Molly! I can finally afford that gold-plated toaster I've been eyeing. But a kids' party? Ain't that risky? Balloons, clowns, screaming brats everywhere." "Risky's our middle name, you idiot," Molly shot back, rolling her eyes. "We'll go in disguised as entertainers. You be the clown—fits your natural talent for looking ridiculous. I'll be the magician's assistant. Wave a wand, pull a rabbit out of a hat, and while the rugrats are dazzled, we swipe the gift table." Bill honked an imaginary nose. "Honk honk! Yeah, but what if the parents notice? Some soccer mom spots us hauling sacks of presents and calls the cops. 'Officer, the clown's got my kid's bike!' Boom, we're toast." Molly smirked, leaning in close. "That's why we bring the diversion. Remember that smoke machine from the museum job? We rig it to the piñata. One good whack, and poof—smoke everywhere. Chaos ensues. Kids coughing, parents panicking. We grab the loot and slip out the back door dressed as caterers." "Caterers? In clown makeup?" Bill chuckled, nearly choking on his doughnut. "Molly, you're a genius, but this sounds like a bad cartoon. What if the birthday boy cries? I ain't good with waterworks. Last time I made a kid bawl, it was 'cause I tripped over his tricycle during recon." "Focus, Bill! No one's crying if we pull this off smooth," Molly snapped, jabbing a finger at the blueprint. 
"Entry point: kitchen window, unlocked during prep. Exit: garage, where we'll stash the van. And for Pete's sake, don't eat the cake. Last heist, you got frosting all over the getaway car—cops followed the sprinkles right to us." Bill leaned back, arms crossed. "Hey, that cake was chocolate! Irresistible. But fine, no snacks. What about alarms? Rich folks like these got security tighter than a toddler's grip on candy." "Disabled 'em remotely last night," Molly said with a wink. "Hacked the system with that app you mocked. Now, rehearsal: You say, 'Who wants a balloon animal?' While I say, 'Abracadabra!' and poof, presents vanish." Bill stood, mimicking a clown pose. "Abracadabra? More like 'Abra-grab-a-bag!' This is gonna be epic, Molly. We're not just thieves; we're party crashers extraordinaire." Molly laughed despite herself. "Just don't screw it up, or we'll be the ones getting presents—in prison stripes." They high-fived, the blueprint crinkling under their palms, as visions of stolen Barbies and video games danced in their heads.
created_at: "2026-02-20T19:24:36.993Z" (2/20/2026, 7:24:36 PM)
model: "x-ai/grok-4-07-09"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 9648
moderation_latency: (null)
generation_time: 34878
tokens_prompt: 77
tokens_completion: 778
native_tokens_prompt: 744
native_tokens_completion: 962
native_tokens_completion_images: 0
native_tokens_reasoning: 253
native_tokens_cached: 681
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01512975
router: (null)
provider_responses:
  0:
    endpoint_id: "d40bef9d-6936-4c57-9749-bf7399de4a77"
    id: "2ed66326-cf9b-309e-4041-a617d651ca56"
    is_byok: false
    latency: 158
    model_permaslug: "x-ai/grok-4-07-09"
    provider_name: "xAI"
    status: 200
api_type: "completions"
id: "gen-1771615477-jAFatHIFHOdH4woJJO2Y"
upstream_id: "2ed66326-cf9b-309e-4041-a617d651ca56"
total_cost: 0.01512975
cache_discount: 0.00153225
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Evaluator: Matches word count
  Result: 99.74%
  Details: n/a
  Meta Data:
    neededClean: false
    words: 504
Evaluator: Dialogue to Total Word Ratio
  Result: 0.00%
  Details: Ratio: 74.21%, Deviation: 24.21%
  Meta Data:
    neededClean: false
    wordsTotal: 504
    wordsDialogue: 374
Overall: 49.8722%
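The Ratio and Deviation figures follow directly from the meta data. 374 / 504 ≈ 74.21% of the words are dialogue, which is 24.21 percentage points above the 50% target. This page does not describe how the evaluator decides which words count as dialogue. The sketch below assumes that words inside straight double quotes are dialogue and that words are separated by whitespace, and the benchmark's actual tokenization may differ. `count_dialogue_words` and `dialogue_stats` are hypothetical helpers.

```python
import re

# ASSUMPTION: dialogue is any text between a pair of straight double quotes.
QUOTED = re.compile(r'"([^"]*)"')


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) using whitespace word splitting."""
    total = len(text.split())
    dialogue = sum(len(span.split()) for span in QUOTED.findall(text))
    return dialogue, total


def dialogue_stats(words_dialogue: int, words_total: int, target: float = 0.50) -> tuple[float, float]:
    """Return (ratio, absolute deviation from target), both as fractions."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Reproduce the reported figures from the evaluator's word counts.
ratio, deviation = dialogue_stats(374, 504)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")  # Ratio: 74.21%, Deviation: 24.21%

# Toy example of the counting heuristic.
sample = 'Bill grinned. "Thousands? Hot damn, Molly!" Molly shrugged.'
print(count_dialogue_words(sample))  # (4, 8)
```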