NC Bench
Run: 2026-02-20-01KHY8R8RK01WMFFTFRCXDDG08
Run Details
Model: Grok 4.1 Fast
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 13.1s
Cost: $0.000518
Provider: xAI
Evaluation summary
Score | Evaluator
86.38% | Matches word count
99.98% | Dialogue to Total Word Ratio
93.1808% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
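One way to check a response against the 70% dialogue target in this prompt is to count the words inside quotation marks and divide by the total word count. The sketch below is illustrative only and is not NC Bench's evaluator: the function name `dialogue_word_stats`, the whitespace-based word definition, and the rule that dialogue is any text between straight double quotes are all assumptions, so it may not reproduce the counts reported for this run.

```python
import re


def dialogue_word_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage.

    Assumptions (not taken from NC Bench): a word is a
    whitespace-separated token, and dialogue is any text enclosed
    in straight double quotes.
    """
    total = len(text.split())
    # Capture the text between each pair of double quotes.
    quoted_spans = re.findall(r'"([^"]*)"', text)
    spoken = sum(len(span.split()) for span in quoted_spans)
    return spoken, total


# Hypothetical sample text, not the recorded model response.
sample = '"Okay, Bill, listen up," Molly said. "We hit this party hard."'
spoken, total = dialogue_word_stats(sample)
print(spoken, total, f"{100 * spoken / total:.2f}%")
```

A real evaluator would also need to decide how to treat curly quotes, nested single-quoted speech, and dashes that join words, which is why the counts from this sketch should be read as an approximation.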
Model response
Messages
Meta
AI
Molly slammed a crumpled party invitation on the kitchen table, her eyes gleaming like a cat eyeing a goldfish bowl. Bill, nursing a beer and a black eye from their last "score," squinted at it. The venue was "Little Timmy's Superhero Bash" at the community center, complete with bounce house, face painting, and a mountain of presents. "Okay, Bill, listen up," Molly said, jabbing the photo of a pile of gift-wrapped boxes. "We hit this kids' party hard. In and out in ten minutes. We snag every present under that piñata. Designer toys, cash-stuffed cards—ka-ching!" Bill choked on his beer. "A children's party? Molly, those rugrats are feral! Last time we boosted from a playground, I got grape juice in places grape juice shouldn't go. And the parents? They're like rabid wolverines in yoga pants." "Exactly!" Molly grinned, pulling out two floppy clown costumes from a duffel bag. "Disguises! We go in as Bozo and Betty the Clown. I'll juggle, you'll do balloon animals. Distract the kiddos with magic tricks while I sack up the loot." Bill poked the red nose, which honked sadly. "Juggling? I can't juggle peanuts without dropping 'em. And balloon animals? Mine always look like deflated condoms. What if some snot-nosed brat rats us out? 'Mommy, the clown stole Timmy's Xbox!'" Molly waved him off, sketching a quick map. "Plan's foolproof. We arrive late, when the cake's cut and sugar highs peak. Chaos city! You start the diversion: scream 'Pin the tail on the donkey—naked!' Kids swarm you. I slip behind the present table, stuff a Hefty bag full. Exit via bounce house—inflate it to block the door." Bill's face paled. "Bounce house escape? We'll look like idiots bouncing out with arms full of Barbies. And what about the birthday boy? Little Timmy in his cape? He'll cry rivers, summon CPS on us." "Pfft, Timmy's three feet of whine. We'll leave him a decoy—a sock full of candy. He'll think it's treasure." Molly mimed stuffing a bag. 
"Prime targets: that massive Lego set, the drone from Uncle Rich, gift cards galore. Pawn 'em quick, retire to Margaritaville." Bill rubbed his chin. "Margaritaville? With what, My Little Pony figurines? Face it, Moll—this is nuts. Why not rob a bank? Clean, no sticky fingers." "Banks got alarms, lasers, armed guards. Kids? They got sippy cups and sticky buns. Low security, high reward!" She slapped his back. "Trust me, Bill. We're gonna clown this town." He sighed, slipping on the wig. Itched like hell. "Fine. But if a piñata explodes candy in my face, you're driving getaway. And no singing 'Happy Birthday'—I sound like a dying walrus." Molly honked her nose triumphantly. "Deal. To the van, partner. Timmy's toys are ours!" As they piled gear into the rusty van, Bill muttered, "World's dumbest heist. What could go wrong?" Molly just laughed, gunning the engine toward sugar-coated glory.
created_at: "2026-02-20T19:34:44.77Z" (2/20/2026, 7:34:44 PM)
model: "x-ai/grok-4.1-fast"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 377
moderation_latency: (null)
generation_time: 12905
tokens_prompt: 77
tokens_completion: 805
native_tokens_prompt: 216
native_tokens_completion: 995
native_tokens_completion_images: 0
native_tokens_reasoning: 299
native_tokens_cached: 154
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0005176
router: (null)
provider_responses:
  0:
    endpoint_id: "9821aa0e-dce3-4c27-9716-161359123d0b"
    id: "a9b3a2b2-7573-b836-5999-71dad0080de3"
    is_byok: false
    latency: 154
    model_permaslug: "x-ai/grok-4.1-fast"
    provider_name: "xAI"
    status: 200
api_type: "completions"
id: "gen-1771616084-OP9vE71xkKlht3xa2FZH"
upstream_id: "a9b3a2b2-7573-b836-5999-71dad0080de3"
total_cost: 0.0005176
cache_discount: 0.0000231
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result | Evaluator | Details | Meta Data
86.38% | Matches word count | n/a | neededClean: false, words: 489
99.98% | Dialogue to Total Word Ratio | Ratio: 69.94%, Deviation: 0.06% | neededClean: false, wordsTotal: 489, wordsDialogue: 342
93.1808% | Overall
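The reported ratio and deviation follow directly from the word counts in the meta data, and the overall figure is consistent with a plain average of the two evaluator scores. The sketch below reproduces that arithmetic. It assumes the target ratio is the 70% stated in the prompt and that the overall score is an unweighted mean; the source does not confirm how NC Bench aggregates, and it does not give the formula that maps a deviation to an evaluator score, so that step is left out.

```python
# Values copied from the evaluation details above.
TARGET_RATIO = 70.0   # percent, from the prompt (assumed to be the evaluator's target)
words_total = 489
words_dialogue = 342

ratio = 100 * words_dialogue / words_total
deviation = abs(TARGET_RATIO - ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 69.94%, Deviation: 0.06%"

# Assumption: the overall score is the unweighted mean of the evaluator scores.
scores = [86.38, 99.98]
overall = sum(scores) / len(scores)
print(f"Overall: {overall:.2f}%")
# prints "Overall: 93.18%"
```

The mean of the displayed scores is 93.18%, while the page shows 93.1808%, which suggests the per-evaluator scores are rounded for display before the overall figure is computed from unrounded values.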