NC Bench
Run: 2026-02-20-01KHYB60PA1P4FBW0V0K5YVDW7
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 33.9s
Cost: $0.015395
Provider: xAI
Evaluation summary
Matches word count: 81.27%
Dialogue to Total Word Ratio: 0.00%
Overall: 40.6364%
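The Overall figure looks consistent with an unweighted mean of the two evaluator scores. This is an assumption; the page does not say how Overall is computed. The helper name `overall_score` is hypothetical. A quick check of that reading:

```python
def overall_score(scores: list[float]) -> float:
    # Assumption: Overall is the unweighted arithmetic mean of the evaluator scores.
    return sum(scores) / len(scores)

# Mean of the two scores as displayed (both already rounded to 2 decimals).
print(f"{overall_score([81.27, 0.00]):.4f}%")  # 40.6350%

# Back-solve: if Overall = (word_count + ratio) / 2 and ratio = 0.00,
# the unrounded word-count score would be 2 * 40.6364.
print(f"{2 * 40.6364 - 0.00:.4f}%")  # 81.2728%
```

The mean of the rounded values (40.635%) sits slightly below the reported 40.6364%, while doubling the reported Overall gives 81.2728%, which displays as 81.27%. That gap is what you would expect if the word-count score is shown rounded, which fits the mean hypothesis without confirming it.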
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
In the dimly lit backroom of O'Malley's Pub, where the air smelled of stale beer and regret, Molly and Bill hunched over a crumpled blueprint of the McAllister mansion. Molly, with her wild red curls and a tattoo of a pickpocket's glove on her forearm, jabbed a finger at the layout. Bill, a lanky guy with a perpetual five-o'clock shadow and a knack for bad puns, slurped his coffee noisily.
"Alright, Bill, listen up," Molly said, her voice a conspiratorial whisper. "The party's tomorrow afternoon. Little Timmy's turning eight, and that means a mountain of presents—video games, bikes, probably some gift cards we can flip for cash. We hit during the piñata bash; chaos ensues, we slip in."
Bill grinned, leaning back in his creaky chair. "Piñata bash? Sounds like my kind of party. But how do we blend in? I ain't exactly kid-sized."
Molly rolled her eyes. "Disguises, genius. You'll go as a clown—big shoes, red nose, the works. I'll be the magician's assistant in a sparkly dress. We mingle, act all jolly, then bam—grab the gifts from under the tree while they're blindfolded and swinging at candy."
"A clown? Molly, last time I wore face paint, I looked like a melted mime. Kids'll scream bloody murder!"
"That's the point! Distraction. You honk your horn, juggle some balls—non-stolen ones—and keep the rugrats occupied. I'll sneak to the gift pile, stuff 'em in my oversized top hat. It's got a false bottom, see? Holds a whole Xbox."
Bill scratched his chin, eyeing the blueprint. "What about security? Rich folks like the McAllisters probably got cameras, alarms, maybe a guard dog named Fluffy with a taste for ankles."
"Fluffy's a poodle—harmless. And the cameras? We loop the feed with that gadget you lifted from the electronics store. As for alarms, the party's got clowns, magicians, and a bounce house. No one's noticing a window jimmied open amid the squeals."
Bill chuckled. "You're a devious one, Mol. But what if a kid spots us? 'Hey, mister clown, why's your bag full of my new Lego set?'"
"Improvise! Tell 'em it's a magic trick—poof, presents disappear! Then we bolt to the van parked out back, disguised as a catering truck. Easy peasy."
"Easy? Last heist, you said that about the bakery job, and we ended up covered in frosting, chased by a rolling pin-wielding grandma."
Molly smirked, folding the blueprint. "This is different. Kids' parties are goldmines—pure chaos, no one sober enough to notice. We're in, out, and rich by sundown. You in or what?"
Bill raised his mug in a mock toast. "Hell yeah. To the Great Birthday Bonanza Heist. May our getaway be swift and our pockets full."
They clinked mugs, laughter echoing as they plotted the finer details, oblivious to the barman eavesdropping from the shadows. Little did they know, the real fun was just beginning.
Meta
created_at: "2026-02-20T20:17:12.408Z"
model: "x-ai/grok-4-07-09"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 8492
moderation_latency: (null)
generation_time: 33839
tokens_prompt: 77
tokens_completion: 730
native_tokens_prompt: 744
native_tokens_completion: 989
native_tokens_completion_images: 0
native_tokens_reasoning: 315
native_tokens_cached: 743
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01539525
router: (null)
provider_responses[0]:
  endpoint_id: "d40bef9d-6936-4c57-9749-bf7399de4a77"
  id: "fbe2a864-d66f-9d5b-c218-596fc997737e"
  is_byok: false
  latency: 114
  model_permaslug: "x-ai/grok-4-07-09"
  provider_name: "xAI"
  status: 200
api_type: "completions"
id: "gen-1771618632-zhFkpvAXsDd8s9Y0TDJd"
upstream_id: "fbe2a864-d66f-9d5b-c218-596fc997737e"
total_cost: 0.01539525
cache_discount: 0.00167175
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Matches word count: 81.27%
  Details: n/a
  Meta data: neededClean = false, words = 488
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio: 71.11%, Deviation: 41.11%
  Meta data: neededClean = false, wordsTotal = 488, wordsDialogue = 347
Overall: 40.6364%
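The ratio and deviation in the Details column can be reproduced from the word counts in the meta data. The sketch assumes the ratio is dialogue words divided by total words and that the deviation is the absolute gap, in percentage points, from the 30% target set in the prompt. Both assumptions reproduce the reported figures. The helpers `dialogue_ratio`, `deviation`, and `count_dialogue_words` are hypothetical. `count_dialogue_words` in particular is a naive heuristic (words between straight double quotes), not the evaluator's actual method, so it need not return exactly 347 for this response.

```python
import re


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Percentage of all words that fall inside dialogue."""
    return 100.0 * words_dialogue / words_total


def deviation(ratio_pct: float, target_pct: float) -> float:
    """Absolute distance, in percentage points, from the requested ratio."""
    return abs(ratio_pct - target_pct)


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Hypothetical heuristic: count words inside straight double quotes.

    Returns (dialogue_words, total_words). Nested single quotes inside a
    double-quoted span are counted as part of that span.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


# Figures from the meta data above: 347 dialogue words out of 488, 30% target.
ratio = dialogue_ratio(347, 488)
dev = deviation(ratio, 30.0)
print(f"Ratio: {ratio:.2f}%, Deviation: {dev:.2f}%")  # Ratio: 71.11%, Deviation: 41.11%
```

With a 30% target on a 488-word response, roughly 146 dialogue words would have hit the mark; the response used more than twice that.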