NC Bench
Run: 2026-02-20-01KHY85PK685HHGRG6BQ76RBPA
Run Details
Model: Grok 4.1 Fast
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 11.3s
Cost: $0.000523
Provider: xAI
Evaluation summary
Score      Evaluator
0.00%      Matches word count
50.31%     Dialogue to Total Word Ratio
25.1528%   Overall
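The Overall figure is consistent with an unweighted mean of the two evaluator scores. This is an inference from the numbers shown, not something the page documents, and the displayed 50.31% appears to be rounded, so the sketch below compares with a small tolerance. The variable names are illustrative.

```python
# Scores as displayed in the evaluation summary (percent).
scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 50.31,  # displayed value, likely rounded
}
reported_overall = 25.1528

# Assumption: Overall is a plain average of the evaluator scores.
mean = sum(scores.values()) / len(scores)

# The small gap is what rounding 50.31 to two decimals would produce.
print(f"mean={mean:.4f}, reported={reported_overall}, gap={abs(mean - reported_overall):.4f}")
```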
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly hunched over the greasy diner table, blueprints of the Peterson estate spread out like a treasure map. Bill fidgeted across from her, nursing a coffee that smelled like regret. Outside, rain pattered against the window, perfect cover for scheming.
"Listen up, Bill," Molly whispered, jabbing a finger at the diagram. "Kids' birthday bash tomorrow. Rich parents, spoiled spawn turning six. Gift table's loaded—Xboxen, bikes, iPads. We snag 'em all during the chaos."
Bill's eyes widened. "A children's party? Molly, those tiny terrorists got fists like hammers! One scream and we're swarmed by soccer moms with tasers."
"Relax, genius. We go in disguised as entertainers. I'm the clown—red nose, floppy shoes. You're the magician. Abracadabra, presents vanish!"
Bill snorted coffee. "Magician? I can barely pull a rabbit outta my ass without it biting me. And clowns? Kids hate clowns now. They're all 'It' this, 'killer' that. We'll trigger a stampede!"
Molly grinned, sketching a mustache on the blueprint dad. "Distraction's key. I spike the punch with sleepy-time juice—nothing lethal, just enough drool. You do the magic show: misdirect with card tricks while I stuff sacks under the tablecloth."
"Sleepy punch? What if the birthday boy keels over mid-candle blow? Cops'll think we poisoned Pinocchio!" Bill waved his arms like a panicked bird.
"Then we blame the caterer. 'Allergic reaction!' Classic. Entry's through the bouncy castle—zip in the back, deflate it for cover. Presents to the van in five minutes flat."
Bill leaned in, whispering hoarsely. "Van? My uncle's rustbucket? It backfires like a cannon. Kids'll hear it from space!"
"Park it two blocks away. We use wagons—red Radio Flyers, festive! Load 'em up like Santa's elves on steroids. Getaway: blend into the traffic of pissed-off parents fleeing the meltdown."
"You're nuts. What about the dog? Blueprint says Fido's a Doberman. One bark and we're kibble."
Molly pulled a dog biscuit from her pocket. "Laced with tranquilizers. He'll nap like a king. And security? Ha, it's a kiddie gig. Inflatable arch, balloon guy—worst guard dog's a goldfish."
Bill rubbed his temples. "Timing? Cake at 3 PM, piñata massacre after. We hit at 2:45, right when they're blind with sugar?"
"Perfect. I create diversion: fake piñata explosion—smoke bombs, candy rain. You vanish the haul. Split fifty-fifty, retire to Margaritaville."
"Fifty-fifty? You get the clown shoes, I get mauled by toddlers!" Bill groaned, but his grin betrayed him.
Molly slapped his hand. "Deal. Tomorrow, we fleece the rugrats. Heist of the century—or at least the sandbox."
Bill sighed. "Fine. But if a kid pukes on me, you're fencing the slime-soaked Barbie solo."
As they folded the plans, laughter bubbled between them, two crooks plotting the most ridiculous score of their sticky-fingered lives.
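One plausible way to measure a dialogue-to-total word ratio for a response like this is to treat text inside straight double quotes as dialogue and split on whitespace. The sketch below is only an illustration under that assumption. The benchmark's actual tokenization and quote handling are not shown on this page, and `dialogue_ratio` is a hypothetical helper, not part of NC Bench.

```python
import re

def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a prose passage.

    Assumption: dialogue is whatever sits between straight double quotes,
    and a "word" is any whitespace-separated token. Curly quotes and
    unbalanced quotes are not handled.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio

# Small usage example on a made-up sentence.
sample = 'Bill sighed. "Fine. But hurry," he said.'
dialogue, total, ratio = dialogue_ratio(sample)
print(dialogue, total, f"{ratio:.2%}")
```

A different word definition (for example, splitting on dashes or stripping punctuation first) would give slightly different counts, so results from this sketch need not match the evaluator's figures exactly.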
Meta
created_at: "2026-02-20T19:24:36.336Z" (2/20/2026, 7:24:36 PM)
model: "x-ai/grok-4.1-fast"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 338
moderation_latency: (null)
generation_time: 11181
tokens_prompt: 77
tokens_completion: 821
native_tokens_prompt: 216
native_tokens_completion: 1005
native_tokens_completion_images: 0
native_tokens_reasoning: 328
native_tokens_cached: 154
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0005226
router: (null)
provider_responses:
  0:
    endpoint_id: "9821aa0e-dce3-4c27-9716-161359123d0b"
    id: "a9c2b5c3-c8e9-0d51-98ea-ba591b44edeb"
    is_byok: false
    latency: 104
    model_permaslug: "x-ai/grok-4.1-fast"
    provider_name: "xAI"
    status: 200
api_type: "completions"
id: "gen-1771615476-IC1vkcOyWw6vK4NaxDiI"
upstream_id: "a9c2b5c3-c8e9-0d51-98ea-ba591b44edeb"
total_cost: 0.0005226
cache_discount: 0.0000231
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result     Evaluator                      Details                           Meta Data
0.00%      Matches word count             n/a                               neededClean: false, words: 464
50.31%     Dialogue to Total Word Ratio   Ratio: 73.71%, Deviation: 3.71%   neededClean: false, wordsTotal: 464, wordsDialogue: 342
25.1528%   Overall
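The reported ratio and deviation can be reproduced from the metadata counts. This is a minimal sketch, assuming "Deviation" means the absolute difference between the measured ratio and the 70% target stated in the scenario. How the deviation is mapped to the 50.31% score is not shown on this page, so that step is not reproduced here.

```python
# Counts copied from the evaluation metadata.
words_total = 464
words_dialogue = 342
target_ratio = 0.70  # from the scenario: "70% dialogue"

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)  # assumed definition of "Deviation"

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 73.71%, Deviation: 3.71%"
```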