NC Bench
Run: 2026-03-12-01KKHZ6TDYV1HACS3PD7WMXAYX
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 4.4s
Cost: $0.004634
Provider: xAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0004%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly adjusted her ridiculous party hat and glared across the sticky kitchen table at Bill. “This is the dumbest idea you’ve ever had, and you once suggested we rob a funeral.” Bill grinned around a mouthful of stolen birthday cake. “Exactly. Nobody expects the clown-shaped criminals. We hit little Timmy’s seventh birthday bash at three o’clock sharp. In and out before the ice cream melts.” “We’re robbing toddlers, Bill. Toddlers.” “Toddlers with excellent taste in presents. You saw the registry. That kid registered for a drone, a Switch, and something called a ‘Super Squishy Unicorn Throne.’ We’re talking resale value, Mol.” Molly pinched the bridge of her nose. “How exactly do we blend in? You’re six-foot-four and look like a parole officer who lost a fight with a tattoo gun.” “I’ll be the magician. Cape, top hat, the works. I’ll make the birthday boy’s watch disappear—permanently. You’ll be my lovely assistant.” “I’m not wearing the sparkly leotard.” “You’ll wear the sparkly leotard. It’s non-negotiable. Also, I already bought it. It’s pink. With stars.” Molly stared at him for a long moment. “I hate you.” “You love me. Focus. We enter through the bouncy castle. I’ve mapped the airflow. If we crawl underneath while the kids are distracted by the piñata, we pop up behind the gift table like we belong.” “Like we belong,” Molly repeated flatly. “Two grown adults emerging from a bouncy castle like sweaty moles. Brilliant.” Bill waved his fork. “I’ve got a smoke bomb shaped like a whoopee cushion. When Mom starts cutting the cake, I trigger it. Total chaos. You grab the big boxes while I charm the grandparents. Old people love me. I’ve got references.” “You have a rap sheet, not references.” “Semantics. Look, we load the presents into the fake ambulance I parked two streets over. Then we peel out to the sounds of confused parents and one very angry bouncy-castle attendant.” Molly leaned forward. “What if the kid cries? I can’t handle it if the kid cries, Bill. I have a reputation.” “Which kid? There’s like thirty of them. They cry at the drop of a hat anyway. We’ll give him a balloon animal on the way out. I’m learning balloon animals, by the way. I can make a giraffe that looks like a sausage.” Molly rubbed her temples. “We’re going to hell.” “Probably, but first we’re going to a six-year-old’s backyard with a table full of untraceable electronics and collectible stuffed animals. Think of the fence, Mol. Think of the money. Think of the look on Detective Ramirez’s face when he realizes the Children’s Party Bandit struck again.” Molly was quiet for a second, then sighed. “Fine. But I’m not wearing the leotard. I’ll be the disgruntled face-painter. And if anyone asks, your name is Professor Fizzywinks and you’re my emotionally unstable brother.” Bill clapped his hands together. “That’s the spirit! Professor Fizzywinks and his emotionally unstable sister. This is going to be legendary.” Molly stood up and pointed at him. “If I hear one single ‘happy birthday’ joke while we’re committing felony theft, I will shoot you with your own water-squirting flower.” “Fair enough,” Bill said, already reaching for another slice of cake. “Now help me practice my disappearing coin trick. The toddlers are gonna love this.”
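The Dialogue to Total Word Ratio evaluator splits a response like this one into dialogue and narration, but the page does not say how. A minimal sketch of one plausible approach, assuming dialogue is any text between curly double quotes (“ ”) and that words are whitespace-separated tokens. The function names are hypothetical, and running this on the scene above is not guaranteed to reproduce the logged 474/556 counts.

```python
import re

# Hypothetical sketch: NC Bench does not document how its evaluator separates
# dialogue from narration. This version ASSUMES dialogue is any text between
# curly double quotes and that words are whitespace-separated tokens.
DIALOGUE_RE = re.compile(r"“(.*?)”", re.DOTALL)


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_ratio(scene: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, dialogue_share) for a scene."""
    total = count_words(scene)
    # Sum the word counts of every double-quoted span; nested single quotes
    # stay inside the enclosing span because only double quotes delimit it.
    dialogue = sum(count_words(span) for span in DIALOGUE_RE.findall(scene))
    share = dialogue / total if total else 0.0
    return dialogue, total, share


sample = "Molly glared at Bill. “This is the dumbest idea you’ve ever had.” Bill grinned. “Exactly.”"
print(dialogue_ratio(sample))  # (9, 15, 0.6)
```

The non-greedy `.*?` keeps each quoted span separate, so two quotes in one sentence are not merged with the narration between them.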
Generation metadata
created_at: "2026-03-12T21:28:26.308Z" (3/12/2026, 9:28:26 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 318
moderation_latency: (null)
generation_time: 4131
tokens_prompt: 77
tokens_completion: 916
native_tokens_prompt: 182
native_tokens_completion: 750
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0046336
router: (null)
provider_responses:
  [0]:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "a9b2c8e8-9f3a-0970-0c70-08fcab52771c"
    is_byok: false
    latency: 302
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350906-tZ2NOZmSglteT2kc3AP5"
upstream_id: "a9b2c8e8-9f3a-0970-0c70-08fcab52771c"
total_cost: 0.0046336
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
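The logged total_cost and cache_discount are consistent with a simple per-token pricing formula over native_tokens_prompt, native_tokens_cached and native_tokens_completion. The rates in this sketch are hypothetical: they do not appear on this page and were chosen only because they reproduce both logged values exactly, so xAI's actual price list may differ. It also assumes cached tokens are a subset of the prompt tokens and are billed at a reduced rate.

```python
from decimal import Decimal

# HYPOTHETICAL rates in USD per million tokens. They are not stated on this
# page; they were picked only because they reproduce the logged totals.
RATE_INPUT = Decimal("2.00")
RATE_CACHED_INPUT = Decimal("0.20")
RATE_OUTPUT = Decimal("6.00")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, cached_tokens: int,
                  completion_tokens: int) -> tuple[Decimal, Decimal]:
    """Return (total_cost, cache_discount) in USD under the assumed rates."""
    uncached = prompt_tokens - cached_tokens  # assumes cached is a subset of prompt
    total = (uncached * RATE_INPUT
             + cached_tokens * RATE_CACHED_INPUT
             + completion_tokens * RATE_OUTPUT) / TOKENS_PER_UNIT
    # Savings versus billing the cached tokens at the full input rate.
    discount = cached_tokens * (RATE_INPUT - RATE_CACHED_INPUT) / TOKENS_PER_UNIT
    return total, discount


total, discount = estimate_cost(prompt_tokens=182, cached_tokens=128,
                                completion_tokens=750)
print(total, discount)
```

`Decimal` avoids binary floating-point error when comparing against the logged values. Rounded to six decimal places, the total gives the $0.004634 shown under Cost.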
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta data: neededClean: false, words: 547
Dialogue to Total Word Ratio
  Result: 0.00%
  Details: Ratio: 85.25%, Deviation: 15.25%
  Meta data: neededClean: false, wordsTotal: 556, wordsDialogue: 474
Overall: 0.0004%
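The reported ratio and deviation follow directly from the evaluator's logged counts. A minimal check, assuming "Deviation" means the absolute gap in percentage points between the measured dialogue share and the scenario's 70% target:

```python
# Assumption: "Deviation" is the absolute gap between the measured dialogue
# share and the scenario's 70% target, expressed in percentage points.
words_dialogue = 474  # wordsDialogue from the evaluator metadata
words_total = 556     # wordsTotal from the evaluator metadata
target = 0.70         # "70% dialogue" from the scenario

ratio = words_dialogue / words_total
deviation = abs(ratio - target)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")  # Ratio: 85.25%, Deviation: 15.25%
```

This reproduces the Details column, but the rule that maps a 15.25-point deviation to a 0.00% score is not shown on this page.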