NC Bench
Run: 2026-02-20-01KHY8958NSWR3BRHQYCX4RXNN
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 31.2s
Cost: $0.013415
Provider: xAI
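The reported cost can be reproduced from the token counts and cache discount listed in the run metadata. This is a minimal sketch under assumed per-token rates: 3, 0.75 and 15 USD per million tokens for uncached input, cached input and output. Those rates do not appear in the run record; they are used here only because they reproduce the reported `total_cost` and `cache_discount` values.

```python
from decimal import Decimal

# ASSUMED rates in USD per token; these are not stated anywhere in the run record.
RATE_INPUT = Decimal("3") / Decimal(1_000_000)      # uncached prompt tokens
RATE_CACHED = Decimal("0.75") / Decimal(1_000_000)  # cached prompt tokens
RATE_OUTPUT = Decimal("15") / Decimal(1_000_000)    # completion tokens

# Values taken from the run metadata.
native_tokens_prompt = 744
native_tokens_cached = 743
native_tokens_completion = 857

uncached_prompt = native_tokens_prompt - native_tokens_cached

# Cost as billed: cached prompt tokens are charged at the lower cached rate.
total_cost = (
    uncached_prompt * RATE_INPUT
    + native_tokens_cached * RATE_CACHED
    + native_tokens_completion * RATE_OUTPUT
)

# Savings relative to charging every prompt token at the uncached rate.
cache_discount = native_tokens_cached * (RATE_INPUT - RATE_CACHED)

print(f"total_cost     = {total_cost}")
print(f"cache_discount = {cache_discount}")
```

`Decimal` is used rather than `float` so the comparison against the reported eight-decimal figures is exact.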
Evaluation summary
Score    Evaluator
2.01%    Matches word count
0.00%    Dialogue to Total Word Ratio
1.0058%  Overall
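The Overall figure is consistent with an unweighted mean of the two evaluator scores. The aggregation rule is not stated on this page, so the sketch below treats it as an assumption. The function name `overall_score` is hypothetical.

```python
def overall_score(evaluator_scores: list[float]) -> float:
    """Unweighted mean of per-evaluator scores (ASSUMED aggregation rule)."""
    if not evaluator_scores:
        raise ValueError("need at least one evaluator score")
    return sum(evaluator_scores) / len(evaluator_scores)


# Scores as displayed in the summary, in percent (already rounded to two decimals).
scores = [2.01, 0.00]
overall = overall_score(scores)
print(f"{overall:.3f}%")
```

With the rounded inputs this gives roughly 1.005%. The small gap to the reported 1.0058% would be explained if the word-count score is stored at higher precision than the 2.01% shown.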
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly slouched on the worn-out couch in their dingy apartment, a blueprint of the suburban McMansion sprawled across the coffee table. Bill paced back and forth, munching on a stale donut, his eyes wide with that manic gleam he got before a big score. "Alright, Molly, listen up. This kids' party is gold. Little Timmy's turning eight, and his parents are loaded. Presents stacked like a pyramid in the living room. We're talking bikes, tablets, maybe even a drone or two. We hit it during the piñata chaos, snag the loot, and vanish like ghosts." Molly arched an eyebrow, sipping her lukewarm coffee. "Ghosts? Bill, it's a backyard bash with screaming brats everywhere. How do we even get in? Crash the clown act?" Bill grinned, crumbs flying. "Exactly! We go in as entertainers. I'll be Bongo the Clown, you be Sparkles the Magician. I've got the costumes from that flea market heist last year. Wigs, makeup, the works." "Sparkles? That's the best you got? Fine, but what's the exit strategy? We can't just waddle out with arms full of Barbies and Legos." "Easy. We park the van as the catering truck. During the cake cutting, I distract with balloon animals gone wrong—pop 'em loud, cause a kid meltdown. You load the presents into gift bags we plant beforehand. Zip out the side gate while everyone's dealing with the sugar rush apocalypse." Molly chuckled, shaking her head. "Balloon animals? You can barely tie your shoes without instructions. Remember the zoo job? You tried making a giraffe and it looked like a deformed snake. Kids'll riot." "Hey, that snake was artistic! Anyway, focus. Security's minimal—rich folks think their HOA dues protect 'em. Just a nanny cam in the playroom. We loop it with footage of happy kiddos from YouTube." "YouTube? Bill, that's genius-level stupid. What if they spot us? Or worse, one of those helicopter moms corners us for a magic trick?" Bill waved his hand dismissively. "Improvise! Pull a rabbit out of your hat. 
Or better, pull their wallet while you're at it. Double score!" Molly leaned forward, eyes narrowing. "No side hustles. Stick to the plan. In at 2 PM, out by 2:30. We fence the toys at Uncle Pete's pawn shop. But what about the dog? Blueprint shows a golden retriever." "Piece of cake. I bring treats laced with sleepy-time herbs. Natural stuff, from that health food scam we pulled. Fido snoozes, we cruise." "Herbs? You're turning this into a holistic heist? Fine, but if that mutt wakes up and bites your clown pants off, I'm leaving you." Bill laughed, clapping her on the back. "Deal. This'll be our masterpiece. Who knew robbing rugrats could be so rewarding?" Molly smirked, rolling up the blueprint. "Just don't trip over your big shoes, Bongo. Let's make some memories—and a fortune."
created_at
"2026-02-20T19:26:29.665Z"
2/20/2026, 7:26:29 PM
model
"x-ai/grok-4-07-09"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
6173
moderation_latency
(null)
generation_time
31176
tokens_prompt
77
tokens_completion
717
native_tokens_prompt
744
native_tokens_completion
857
native_tokens_completion_images
0
native_tokens_reasoning
211
native_tokens_cached
743
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.01341525
router
(null)
provider_responses
0
endpoint_id
"d40bef9d-6936-4c57-9749-bf7399de4a77"
id
"678941ce-6c2b-5c03-a449-a3478871dc13"
is_byok
false
latency
342
model_permaslug
"x-ai/grok-4-07-09"
provider_name
"xAI"
status
200
api_type
"completions"
id
"gen-1771615589-xtPMRA9oXakNYd6Q4quo"
upstream_id
"678941ce-6c2b-5c03-a449-a3478871dc13"
total_cost
0.01341525
cache_discount
0.00167175
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result   Evaluator                     Details                           Meta Data
2.01%    Matches word count            n/a                               neededClean: false, words: 475
0.00%    Dialogue to Total Word Ratio  Ratio: 82.11%, Deviation: 32.11%  neededClean: false, wordsTotal: 475, wordsDialogue: 390
1.0058%  Overall
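The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator follow from the word counts: 390 dialogue words out of 475 total, against the 50% target in the prompt. The sketch below reproduces that arithmetic and shows one possible way to count dialogue words. The quote-matching rule and the helper names are assumptions, not the benchmark's actual implementation, and the mapping from deviation to the 0.00% score is not reproduced because the page does not describe it.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    ASSUMPTION: dialogue is any text between straight double quotes and
    words are whitespace-separated tokens.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_from_target(ratio: float, target: float = 0.5) -> float:
    """Absolute distance between the observed ratio and the target ratio."""
    return abs(ratio - target)


# Word counts reported in the evaluator metadata for this run.
words_dialogue, words_total = 390, 475
reported_ratio = words_dialogue / words_total
reported_deviation = deviation_from_target(reported_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {reported_deviation:.2%}")
# prints "Ratio: 82.11%, Deviation: 32.11%"
```

Calling `dialogue_ratio('He said "hello there friend" loudly.')` on a toy sentence returns `(3, 6, 0.5)`, which is the balance the prompt asked for.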