NC Bench
Run: 2026-02-20-01KHY9NYEJYFHTDANJTSYF8ZAM
Run Details
Model:
Grok 4 Fast
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
6.7s
Cost:
$0.000474
Provider:
xAI
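The reported cost can be sanity-checked against the token counts recorded in this run's response metadata (native_tokens_prompt 216, native_tokens_cached 215, native_tokens_completion 926). The sketch below is only an illustration: the per-million-token rates are assumptions that do not appear anywhere on this page, and the function name is hypothetical. With those assumed rates the arithmetic happens to reproduce both the total cost and the cache discount shown in the metadata.

```python
# Assumed USD rates per million tokens. These are NOT stated on the run page.
ASSUMED_INPUT_RATE = 0.20          # uncached prompt tokens
ASSUMED_CACHED_INPUT_RATE = 0.05   # cached prompt tokens
ASSUMED_OUTPUT_RATE = 0.50         # completion tokens (reasoning included)


def estimate_cost(prompt_tokens, cached_tokens, completion_tokens):
    """Return (total_cost, cache_discount) in USD under the assumed rates."""
    uncached_tokens = prompt_tokens - cached_tokens
    total = (
        uncached_tokens * ASSUMED_INPUT_RATE
        + cached_tokens * ASSUMED_CACHED_INPUT_RATE
        + completion_tokens * ASSUMED_OUTPUT_RATE
    ) / 1_000_000
    # Saving from billing cached tokens at the cached rate instead of the full rate.
    discount = cached_tokens * (ASSUMED_INPUT_RATE - ASSUMED_CACHED_INPUT_RATE) / 1_000_000
    return total, discount


# Native token counts taken from the response metadata of this run.
cost, discount = estimate_cost(prompt_tokens=216, cached_tokens=215, completion_tokens=926)
print(f"total_cost ~ {cost:.8f}, cache_discount ~ {discount:.8f}")
```

If the provider's actual rates differ, only the three constants need to change.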
Evaluation summary
Score | Evaluator
51.93% | Matches word count
95.20% | Dialogue to Total Word Ratio
73.5634% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly hunched over the rickety kitchen table in their dimly lit apartment, blueprints of "Kiddie Kingdom Party Palace" spread out like a treasure map. Bill paced nearby, chomping on a stale pretzel, his eyes wild with the kind of greed that only a lifetime of petty crime could foster. "Listen, Bill," Molly said, jabbing a finger at the diagram. "This gig's a goldmine. Rich parents, spoiled brats—presents piled higher than a piñata explosion. We hit at peak chaos: cake time." Bill stopped pacing, crumbs tumbling from his mustache. "Cake time? You're talkin' like we need an invite. How do we even get past the balloon brigade? Those clowns guardin' the door look meaner than a tax auditor." Molly snorted, tracing the entrance with her pen. "Disguises, genius. We go as party crashers. I'll be the face painter—slap on some war paint, turn kids into superheroes. You? The magician. Pull rabbits outta hats, but really, you're palming gift tags." "A magician? Me?" Bill guffawed, slapping his knee. "Last time I did magic, I set my pants on fire. What if the kids see through it? One snot-nosed tyke yells 'thief,' and we're swarmed like ants on a lollipop." "Relax," Molly shot back, her voice dripping sarcasm. "We'll time it for the piñata bash. You distract with smoke bombs—non-lethal, promise—while I snag the haul from the gift table. Wheeled cart under the skirt, easy roll-out to the van." Bill leaned in, squinting at the map. "Van? Parked where? The lot's crawling with minivans and soccer moms. One of 'em calls the cops, we're toast. And what about the birthday kid? Little Timmy or whoever—gonna cry rivers if we yoink his Lego castle." Molly rolled her eyes, sketching a quick getaway route. "Cry? That's the beauty! Blame it on the dog or some rival party prank. We'll leave a decoy gift: a rock in shiny paper. As for the van, we valet it as 'Caterer's Ride.' Flash a fake badge, park up front." "Fake badge? From where, the dollar store?" Bill chuckled, but his brow furrowed. 
"Okay, say we score the loot—barbies, bikes, that drone thingamajig. Split's fifty-fifty, right? I get the gadgets; you take the dolls?" "Deal," Molly agreed, folding the blueprint. "But no double-crossing. Last heist, you 'forgot' my cut on those hubcaps. This is kids' stuff—literally—but one slip, and we're the punchline in tomorrow's news: 'Dumb Crooks Caught by Cupcakes.'" Bill grinned, raising his pretzel like a toast. "To the sweetest heist ever. May the presents be plentiful and the kiddos clueless." Molly clinked her coffee mug against it. "And may we not end up in orange jumpsuits playing dodgeball in the big house." As they huddled closer, laughter echoing off the peeling walls, the plan solidified—absurd, risky, and ripe for comedy gold. But in the thief trade, what wasn't?
created_at
"2026-02-20T19:50:57.246Z"
2/20/2026, 7:50:57 PM
model
"x-ai/grok-4-fast"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
2004
moderation_latency
(null)
generation_time
6554
tokens_prompt
77
tokens_completion
728
native_tokens_prompt
216
native_tokens_completion
926
native_tokens_completion_images
0
native_tokens_reasoning
235
native_tokens_cached
215
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.00047395
router
(null)
provider_responses
0
endpoint_id
"c9586d9d-97ff-40ca-ae94-22504d6f9b7e"
id
"3782da38-5364-5e42-5f49-2594f02d535b"
is_byok
false
latency
156
model_permaslug
"x-ai/grok-4-fast"
provider_name
"xAI"
status
200
api_type
"completions"
id
"gen-1771617057-lJi9YxtzpDcsG4VaPdbV"
upstream_id
"3782da38-5364-5e42-5f49-2594f02d535b"
total_cost
0.00047395
cache_discount
0.00003225
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result
Evaluator
Details
Meta Data
51.93%
Matches word count
n/a
neededClean
false
words
484
95.20%
Dialogue to Total Word Ratio
Ratio: 69.01%, Deviation: 0.99%
neededClean
false
wordsTotal
484
wordsDialogue
334
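The ratio and deviation in the details above follow directly from the two word counts. A minimal sketch, assuming the target is the 70% named in the scenario and that deviation is the absolute difference in percentage points (the function name is illustrative, and the page does not say how the deviation is converted into the 95.20% score, so that step is left out):

```python
TARGET_PERCENT = 70.0  # taken from the scenario: "500 words with 70% dialogue"


def dialogue_ratio(words_dialogue, words_total):
    """Percentage of the total words that sit inside dialogue."""
    return 100.0 * words_dialogue / words_total


# Word counts from the evaluator's metadata for this run.
ratio = dialogue_ratio(words_dialogue=334, words_total=484)
deviation = abs(TARGET_PERCENT - ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```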
73.5634%
Overall
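The overall figure is consistent with an unweighted mean of the two evaluator scores. That is an assumption, since the page does not state how the scores are combined. The sketch below averages the displayed (rounded) scores, so it can only match the reported value to within rounding error.

```python
from statistics import mean

# Evaluator scores as displayed on this page (rounded to two decimals).
scores = {
    "Matches word count": 51.93,
    "Dialogue to Total Word Ratio": 95.20,
}

# Assumption: the overall score is a plain average of the evaluator scores.
overall = mean(list(scores.values()))
reported_overall = 73.5634

print(f"mean of displayed scores: {overall:.4f}%")
print(f"difference from reported: {abs(overall - reported_overall):.4f} points")
```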