NC Bench
Run: 2026-02-20-01KHY7YHF29DH150JQ7RRGJ3T4
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 29.1s
Cost: $0.014315
Provider: xAI
Evaluation summary
Score | Evaluator
81.27% | Matches word count
25.81% | Dialogue to Total Word Ratio
53.5394% | Overall
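The Overall score agrees, to display rounding, with the unweighted mean of the two evaluator scores: (81.27 + 25.81) / 2 = 53.54. A minimal sketch, assuming that is the aggregation rule (the page does not state it):

```python
# ASSUMPTION: "Overall" is the unweighted mean of the per-evaluator scores.
# The page does not document the aggregation rule; this only checks consistency.
def overall_score(scores: list[float]) -> float:
    """Arithmetic mean of per-evaluator scores, in percent."""
    return sum(scores) / len(scores)


# Rounded scores as displayed in the summary table
overall = overall_score([81.27, 25.81])
print(f"{overall:.2f}%")  # 53.54%, vs. the reported 53.5394%
```

The small gap between 53.54 and 53.5394 is consistent with the page averaging unrounded scores and rounding only for display.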
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly paced the cluttered motel room, her eyes glued to a crumpled blueprint of the McAllister mansion. Bill lounged on the bed, munching on a stale doughnut, his fake mustache peeling at the edges.
"Alright, Bill, listen up," Molly said, slapping the blueprint onto the rickety table. "The party's tomorrow—little Timmy's tenth birthday bash. Balloons, cake, the works. And under that piñata? A mountain of presents. We're talking bikes, gadgets, that new video game console everyone's raving about. We snag 'em all, fence 'em quick, and retire to Fiji."
Bill wiped crumbs from his chin. "Fiji? With our luck, it'll be the county jail. How we even getting in? It's a kids' party, not a bank vault."
"Disguises, genius," Molly shot back, holding up a clown wig and a red nose. "You're Bozo the Clown, hired entertainment. I'll be the balloon artist. We blend in, scope the joint, then bam—during the cake-cutting chaos, we load up the van parked out back."
Bill snorted. "Bozo? I look ridiculous in polka dots. Last time I wore a wig, it caught fire from that botched fireworks heist. And what if the kids swarm me? I hate kids—they're sticky and ask too many questions."
"Deal with it," Molly snapped. "Pretend you're fun. Juggle some balls, tell bad jokes. 'Why did the thief go to the party? To get presents!' See? Comedic gold."
Bill rolled his eyes. "Hilarious. Okay, fine. But the presents—where they stashing 'em? Not under the tree like Christmas; it's a summer bash."
"Intel says the gift table's in the living room, right by the French doors," Molly replied, pointing at the blueprint. "We create a diversion. I pop a balloon animal that looks like a bomb—nah, too risky. How about you 'accidentally' knock over the punch bowl? Flood the floor, parents freak, we slip out with the loot in garbage bags disguised as party trash."
"Genius, but what about security? Rich folks like the McAllisters got cameras, maybe a guard dog," Bill countered, sitting up.
Molly grinned slyly. "Cameras? We loop the feed with that hack from the museum job. Dog? Sedative-laced hot dogs. Easy peasy."
Bill scratched his head. "Sedatives? Where we getting those? And hot dogs—I'm allergic, remember? Last time, I swelled up like a balloon myself."
"Stop whining," Molly said, exasperated. "I'll handle the dog. You focus on not tripping over your giant clown shoes. Look, Bill, this is our big score. No more penny-ante gigs. Imagine it: sandy beaches, no more ramen noodles."
Bill sighed dramatically. "Fine, but if a kid pukes on me, I'm out. Deal?"
"Deal," Molly laughed. "Now, practice your clown laugh. Ho ho ho—no, that's Santa. Try again."
Bill honked an imaginary nose. "Honk honk! Let's steal some joy!"
They both burst out laughing, the blueprint forgotten amid their ridiculous scheming. Little did they know, the real chaos was just beginning.
Meta
created_at: "2026-02-20T19:20:41.709Z"
model: "x-ai/grok-4-07-09"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 6408
moderation_latency: (null)
generation_time: 29024
tokens_prompt: 77
tokens_completion: 744
native_tokens_prompt: 744
native_tokens_completion: 917
native_tokens_completion_images: 0
native_tokens_reasoning: 229
native_tokens_cached: 743
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01431525
router: (null)
provider_responses[0]:
  endpoint_id: "d40bef9d-6936-4c57-9749-bf7399de4a77"
  id: "3c80bb25-98df-4572-6929-29de4fda6250"
  is_byok: false
  latency: 130
  model_permaslug: "x-ai/grok-4-07-09"
  provider_name: "xAI"
  status: 200
api_type: "completions"
id: "gen-1771615241-FQSgfEM0w8aksOS7a4EM"
upstream_id: "3c80bb25-98df-4572-6929-29de4fda6250"
total_cost: 0.01431525
cache_discount: 0.00167175
upstream_inference_cost: 0
provider_name: "xAI"
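The cost fields are internally consistent with a simple per-token price model. A minimal sketch, assuming Grok 4 list prices of $3.00 per million uncached input tokens, $0.75 per million cached input tokens, and $15.00 per million output tokens (none of these prices appear on this page), reproduces both `total_cost` and `cache_discount` from the native token counts:

```python
# Sketch: rebuild total_cost and cache_discount from the native token counts.
# ASSUMPTION: the prices below (USD per million tokens) are Grok 4 list prices
# and are NOT stated on this page; they are used because they reproduce the
# reported totals exactly.
INPUT_PER_M = 3.00         # uncached prompt tokens
CACHED_INPUT_PER_M = 0.75  # prompt tokens served from cache
OUTPUT_PER_M = 15.00       # completion tokens (assumed to include reasoning tokens)


def cost_breakdown(prompt: int, cached: int, completion: int) -> tuple[float, float]:
    """Return (total_cost, cache_discount) in USD for one request."""
    uncached = prompt - cached
    total = (
        uncached * INPUT_PER_M
        + cached * CACHED_INPUT_PER_M
        + completion * OUTPUT_PER_M
    ) / 1_000_000
    # The discount is what the cached tokens would have cost at the full input rate.
    discount = cached * (INPUT_PER_M - CACHED_INPUT_PER_M) / 1_000_000
    return total, discount


# native_tokens_prompt, native_tokens_cached, native_tokens_completion from the metadata.
# 917 completion tokens appear to already include the 229 reasoning tokens:
# billing those separately would overshoot the reported total.
total, discount = cost_breakdown(prompt=744, cached=743, completion=917)
print(f"total_cost={total:.8f} cache_discount={discount:.8f}")
# total_cost=0.01431525 cache_discount=0.00167175
```

Under these assumed prices, almost all of the cost comes from output: 917 × $15 per million is $0.013755, while the 743 cached prompt tokens cost only about $0.00056.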
Evaluation details
Result | Evaluator | Details | Meta Data
81.27% | Matches word count | n/a | neededClean: false, words: 488
25.81% | Dialogue to Total Word Ratio | Ratio: 75.20%, Deviation: 5.20% | neededClean: false, wordsTotal: 488, wordsDialogue: 367
53.5394% | Overall
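The Details column follows from the Meta Data column: 367 dialogue words out of 488 total is a 75.20% dialogue share, 5.20 percentage points above the 70% target in the prompt. The deviation is an absolute difference, not a relative one (5.20 / 70 would be about 7.4%). The sketch below shows that arithmetic. The `count_dialogue_words` helper is hypothetical: the page does not document how the evaluator decides which words are dialogue, so it is only an illustration and is not claimed to reproduce the 367 figure.

```python
import re

TARGET_RATIO = 0.70  # "70% words of dialogue", from the prompt


def ratio_and_deviation(words_dialogue: int, words_total: int) -> tuple[float, float]:
    """Dialogue share of all words, and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - TARGET_RATIO)


def count_dialogue_words(text: str) -> tuple[int, int]:
    """HYPOTHETICAL counter: words inside straight double quotes vs. all words.

    The evaluator's real tokenization and quote handling are not documented.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


# wordsDialogue and wordsTotal from the Meta Data column
ratio, deviation = ratio_and_deviation(367, 488)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# Ratio: 75.20%, Deviation: 5.20%

print(count_dialogue_words('Bill sighed. "Fine, deal?" Molly laughed.'))
# (2, 6)
```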