NC Bench
Run: 2026-02-20-01KHY8QDBHC15Y5F9VY6WS18CZ
Run Details
Model:
Grok 4
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
32.0s
Cost:
$0.013610
Provider:
xAI
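The reported cost can be cross-checked against the token counts listed in this run's metadata (`native_tokens_prompt`, `native_tokens_cached`, `native_tokens_completion`). The sketch below is only a plausibility check: the per-million-token prices are assumptions that are not stated anywhere on this page, chosen because they reproduce the reported `total_cost` and `cache_discount`. It also assumes that completion tokens are billed at a single rate and that cached prompt tokens are billed at a reduced rate.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# ASSUMED prices in USD per token (not given on this page).
PRICE_PROMPT = Decimal("3") / MILLION            # uncached prompt tokens
PRICE_CACHED_PROMPT = Decimal("0.75") / MILLION  # cached prompt tokens
PRICE_COMPLETION = Decimal("15") / MILLION       # completion tokens

# Token counts copied from the run metadata.
native_tokens_prompt = 744
native_tokens_cached = 743
native_tokens_completion = 870

uncached_prompt = native_tokens_prompt - native_tokens_cached

total_cost = (
    uncached_prompt * PRICE_PROMPT
    + native_tokens_cached * PRICE_CACHED_PROMPT
    + native_tokens_completion * PRICE_COMPLETION
)

# Savings from billing cached tokens at the reduced rate.
cache_discount = native_tokens_cached * (PRICE_PROMPT - PRICE_CACHED_PROMPT)

print(f"total_cost     = {total_cost}")
print(f"cache_discount = {cache_discount}")
```

Under these assumed prices the result matches the metadata values 0.01361025 and 0.00167175; `Decimal` is used so the comparison is not affected by binary floating-point rounding.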
Evaluation summary
Score | Evaluator
86.38% | Matches word count
25.13% | Dialogue to Total Word Ratio
55.7555% | Overall
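The Overall figure looks consistent with an unweighted mean of the two evaluator scores. That is an inference from the displayed numbers, not something the page states, so the aggregation function below (`overall_score`, a hypothetical name) should be read as a sketch under that assumption.

```python
# Evaluator scores as displayed on the page, in percent (already rounded).
scores = {
    "Matches word count": 86.38,
    "Dialogue to Total Word Ratio": 25.13,
}

def overall_score(evaluator_scores):
    """Unweighted mean of the evaluator scores (ASSUMED aggregation)."""
    return sum(evaluator_scores.values()) / len(evaluator_scores)

overall = overall_score(scores)
print(f"Overall (from rounded inputs): {overall:.3f}%")
```

The mean of the rounded inputs is about 55.755%, which agrees with the displayed 55.7555% to within the rounding of the two inputs.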
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly paced the dingy motel room, her red wig slightly askew, while Bill hunched over a crumpled blueprint of the suburban McMansion. Empty coffee cups littered the table, and a half-eaten pizza box added to the chaos. The air smelled of stale pepperoni and desperation.
"Alright, Bill, listen up," Molly said, slamming her fist on the table. "This kids' party is our golden ticket. Balloons, cake, and a mountain of presents. We're talking drones, bikes, and enough gift cards to fund our retirement in Fiji."
Bill scratched his scruffy beard, eyeing the blueprint. "Fiji? Last time you said Bermuda. Anyway, the party's at the Johnsons' place. Rich folks, right? Security's tight—cameras, alarms, and probably a guard dog that thinks it's a wolf."
Molly snorted. "A guard dog? Please, it's a yappy Pomeranian named Fluffykins. I scoped it out. The party's tomorrow at noon. Kids everywhere, screaming and sugar-high. Perfect cover."
"Cover? Molly, we're not blending in as clowns. I'm six-foot-five with a face like a mugshot. How do we even get in?"
She grinned, pulling a clown costume from her duffel bag. "Ta-da! You're Bozo the Burglar. I'll be the magician's assistant. We'll crash as entertainers. I hacked their email— they think we're from 'Party Pros Unlimited.'"
Bill groaned, holding up the oversized polka-dot pants. "This is humiliating. What if the real entertainers show up?"
"Then we improvise! 'Sorry, folks, double booking!' While the parents freak, we grab the gifts. The present pile's in the living room, near the back door. Easy exit to the van."
He shook his head. "Easy? With thirty rugrats running around? One of 'em might rat us out. 'Mommy, the clown's stealing my Xbox!'"
Molly laughed. "That's why we distract 'em. You juggle while I do magic tricks—poof, the presents vanish into our sacks. We'll use the cake explosion as cover. Rig a fake volcano cake with smoke bombs. Chaos ensues, we bolt."
"Smoke bombs? Where'd you get those?"
"Online, duh. Same place I got the fake IDs. Bill, think big! These presents are worth thousands. Little Timmy's drone alone could fetch five hundred on the black market."
Bill smirked. "Little Timmy? You're getting sentimental. Remember the museum job? You tripped over a velvet rope."
"That was one time! This is foolproof. In, out, gone before the piñata breaks."
He leaned back, crossing his arms. "Foolproof, huh? What about the parents? Soccer moms with tasers?"
Molly waved him off. "They're busy with wine spritzers. We'll be ghosts. Now, practice your juggling. And remember, if anyone asks, your clown name is Wacky Willie."
"Wacky Willie? That's the best you got?"
"Better than Sticky-Fingered Steve. Come on, partner—let's make this the heist of the century!"
Bill sighed but grabbed the balls. "Fine, but if Fluffykins bites me, you're paying the vet bill."
As they rehearsed, laughter echoed through the room, masking the absurdity of their plan.
created_at
"2026-02-20T19:34:16.7Z"
2/20/2026, 7:34:16 PM
model
"x-ai/grok-4-07-09"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
7066
moderation_latency
(null)
generation_time
31948
tokens_prompt
77
tokens_completion
769
native_tokens_prompt
744
native_tokens_completion
870
native_tokens_completion_images
0
native_tokens_reasoning
172
native_tokens_cached
743
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.01361025
router
(null)
provider_responses
0
endpoint_id
"d40bef9d-6936-4c57-9749-bf7399de4a77"
id
"80a480cb-a4e8-6170-8ca1-4ffe551e8b75"
is_byok
false
latency
788
model_permaslug
"x-ai/grok-4-07-09"
provider_name
"xAI"
status
200
api_type
"completions"
id
"gen-1771616056-DtwpcvQ3jFWbjwcCK5Yu"
upstream_id
"80a480cb-a4e8-6170-8ca1-4ffe551e8b75"
total_cost
0.01361025
cache_discount
0.00167175
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result | Evaluator | Details | Meta Data
86.38% | Matches word count | n/a | neededClean: false; words: 489
25.13% | Dialogue to Total Word Ratio | Ratio: 75.26%, Deviation: 5.26% | neededClean: false; wordsTotal: 489; wordsDialogue: 368
55.7555% | Overall
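The ratio and deviation reported for the dialogue evaluator follow directly from the two word counts and the 70% target in the scenario. The sketch below reproduces that arithmetic. The counting helpers (`count_words`, `count_dialogue_words`) are hypothetical: they assume whitespace-separated words and dialogue delimited by straight double quotes, which may differ from how the benchmark actually counts.

```python
import re

def count_words(text):
    """Count whitespace-separated tokens."""
    return len(text.split())

def count_dialogue_words(text):
    """Count words inside straight double quotes (ASSUMED counting rule)."""
    return sum(count_words(span) for span in re.findall(r'"([^"]*)"', text))

def ratio_and_deviation(words_dialogue, words_total, target=0.70):
    """Return the dialogue share and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)

# Word counts taken from the evaluator metadata for this run.
ratio, deviation = ratio_and_deviation(words_dialogue=368, words_total=489)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 75.26%, Deviation: 5.26%"

# Tiny illustration of the assumed counting rule on a short excerpt.
sample = 'Molly snorted. "A guard dog? Please." Bill sighed.'
print(count_dialogue_words(sample), "of", count_words(sample), "words are dialogue")
```

How the 5.26% deviation maps to the 25.13% score is not shown on the page, so the sketch stops at the deviation.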