NC Bench
Run: 2026-05-01-01KQHT9EDJA1HYGHJMWGYTK0RH
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 15.8s
Cost: $0.001833
Provider: xAI
Evaluation summary
| Score | Evaluator |
| --- | --- |
| 0.01% | Matches word count |
| 0.00% | Dialogue to Total Word Ratio |
| 0.0049% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her fake mustache in the rearview mirror of their battered van parked two blocks from the brightly colored house. “Bill, I’m telling you, this is our ticket. Twenty screaming kids, one stressed-out mom, and a mountain of wrapped presents stacked like a pirate’s hoard. In and out before the piñata even hits the floor.” Bill snorted, peeling the wrapper off a protein bar. “You said the same thing about the church bake sale, Mol. We stole six pies and a diabetic’s insulin pen. This time there’s gonna be clowns. I hate clowns. They see through people.” “They see through makeup, not criminal masterminds,” Molly said, checking the cartoon animal masks on the dashboard. “We go in as Coco the Clown and Bongo the Bear. I’ve got the squeaky shoes. You’ve got the juggling pins filled with sleeping powder. One accidental drop near the punch bowl and the whole birthday circus naps like it’s naptime.” Bill rubbed his temples. “Sleeping powder in the punch? That’s dark, even for us. What if little Timmy’s allergic and his face swells up like a balloon animal? We’ll be on the news as the Worst Uncles Ever.” “Relax. It’s just enough to make everyone yawn and blame the sugar crash. While they’re dozing, we bag every gift with ‘To Emily, Love Grandma’ on it. I’ve timed it. Seven minutes from back door to getaway. The birthday girl gets one pity present; we get Xbox, Legos, and whatever overpriced doll screams when you pull its hair.” “You’re assuming the mom won’t notice two clowns she never hired,” Bill said, skepticism thick as frosting. Molly grinned. “That’s the beauty. I called yesterday pretending to be from ‘Clowns-R-Us.’ She thinks we’re the entertainment. She’ll practically shove us inside and beg us to make balloon animals. We make one pathetic giraffe, spike the punch, and commence Operation Birthday Heist.” Bill stared at the colorful balloons tied to the mailbox down the street. “This feels wrong. Stealing from children. 
My mom always said karma’s a boomerang. I don’t want it smacking me in the face during my next parole hearing.” “Your mom also said never trust a man in spandex. Look how that turned out with your cousin Kevin. Besides, we’re not keeping everything. We’ll donate half the ugly sweaters to charity. The other half we sell on Craigslist. Think of it as redistribution of wealth. From spoiled six-year-olds to two lovable rogues who need gas money and new bolt cutters.” Bill sighed, pulling on the oversized bear head. The fake fur smelled like regret and mothballs. “Fine. But if this goes south and I end up in a police sketch wearing this thing, I’m ratting you out. I’ll tell them it was all your idea and you kicked my childhood dog.” Molly laughed, slipping on the red nose. “Deal. Now let’s go ruin a birthday party like professionals. And remember: lots of honking, minimal eye contact, and if anyone asks where the real clown is, tell them he’s stuck in traffic behind a hearse full of sad mimes.” They bumped fists, stepped out of the van, and waddled toward the sound of squealing children and imminent parental regret.
created_at: "2026-05-01T13:05:16.728Z" (5/1/2026, 1:05:16 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 420
moderation_latency: (null)
generation_time: 15700
tokens_prompt: 77
tokens_completion: 834
native_tokens_prompt: 182
native_tokens_completion: 696
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0018331
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "a8e1efe5-7c09-5a3b-bf89-2b88fef85206"
    is_byok: false
    latency: 85
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777640716-YZ7S7DAKNvIDIgr92yU1"
session_id: (null)
api_type: "completions"
id: "gen-1777640716-CT8jA0zQEw6RMd2B2gIa"
upstream_id: "a8e1efe5-7c09-5a3b-bf89-2b88fef85206"
total_cost: 0.0018331
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
| Result | Evaluator | Details | Meta Data |
| --- | --- | --- | --- |
| 0.01% | Matches word count | n/a | neededClean: false, words: 531 |
| 0.00% | Dialogue to Total Word Ratio | Ratio: 79.78%, Deviation: 29.78% | neededClean: false, wordsTotal: 534, wordsDialogue: 426 |
| 0.0049% | Overall | | |
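
The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator can be reproduced from the wordsDialogue and wordsTotal counts above. The sketch below is illustrative only and is not the benchmark's own code. The function names are hypothetical, and the rule used to count dialogue words (text between curly quotation marks) is an assumption, since the page does not show how the evaluator tokenizes or detects dialogue. The page also does not show how a deviation is converted into the final score, so that step is left out.

```python
import re

TARGET_RATIO = 0.50  # from the scenario: "50% dialogue"
TARGET_WORDS = 500   # from the scenario: "500 words"


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a passage.

    Assumption: dialogue is whatever sits between curly quotes (“ ... ”)
    and words are whitespace-separated tokens. The real evaluator may differ.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    return dialogue_words, total_words


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that are dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


# Counts taken from the evaluation details of this run.
ratio = dialogue_ratio(426, 534)
deviation = abs(ratio - TARGET_RATIO)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 79.78%, Deviation: 29.78%"

# Relative overshoot of the word-count target, using words = 531.
overshoot = (531 - TARGET_WORDS) / TARGET_WORDS
print(f"Word count overshoot: {overshoot:.2%}")
# prints "Word count overshoot: 6.20%"
```

The two evaluators report slightly different totals (531 and 534 words), which suggests they count words differently, so `count_dialogue_words` should not be expected to reproduce either figure exactly.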