NC Bench
Run: 2026-05-01-01KQHQGYG7Q95RDASNMXJW6X4E
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 12.5s
Cost: $0.001886
Provider: xAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.08% | Dialogue to Total Word Ratio
0.0404% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly adjusted her ridiculous clown wig and glared across the tiny kitchen table at Bill. “This is the dumbest idea you’ve ever had, and you once suggested we rob a funeral home wearing tap shoes.” Bill leaned forward, eyes gleaming with the manic joy of a man who’d already pictured himself swimming in stolen My Little Ponies. “Hear me out, Mol. It’s brilliant. Twenty screaming kids, one stressed-out mom, zero security. The birthday boy’s getting a bouncy castle. While they’re all hypnotized by sugar and cheap magic tricks, we slip in, we slip out, and we leave with every single wrapped box under that balloon arch.” “You want us to rob a children’s party,” Molly said flatly. “Like actual monsters.” “Robin Hood stole from the rich. We’re stealing from obnoxious parents who spent four hundred bucks on a llama petting zoo. Different vibe.” Molly rubbed her temples. “How do we even get invited? We’re not exactly on the mommy-and-me circuit.” Bill grinned, producing a glittery sheet of paper from his jacket. “Already RSVP’d. We’re ‘Uncle Bill and Aunt Molly from Cincinnati.’ I said we’re in town for a taxidermy convention. They bought it.” “You signed us up as fake relatives? To children we’ve never met?” “Little Timmy thinks I’m his mom’s long-lost cousin who once arm-wrestled a bear. He’s six. He believes anything.” Molly stared at him for a long moment. “We need a better exit plan than running past a bunch of toddlers in party hats. Those little demons are fast when motivated by injustice.” “We’ll use the bouncy castle as a diversion. I cut a hole in the back. While chaos reigns, we load the gifts into the fake UPS van parked out front.” “Fake UPS van? When did we get a fake UPS van?” “Borrowed it from Derek. He painted over the real logo. It now says ‘Super Presents Unlimited.’ Subtle, right?” Molly closed her eyes. “Bill. We’re going to prison. 
I can see the headline already: ‘Clown Thieves Rob Six-Year-Old’s Pokémon Party, Leave Only Socks.’” Bill waved a dismissive hand. “The socks are decoys. We replace every big gift with a single pair of tube socks wrapped in the original paper. By the time they notice, we’re halfway to Canada with enough LEGO Star Wars sets to build our own Death Star.” Molly leaned in, voice dropping to a conspiratorial whisper. “Okay, fine. But if I have to wear that polka-dot dress, you’re wearing the full mascot head. I want you sweating like a sinner in church while you’re pretending to be Chuck E. Cheese’s depressed cousin.” Bill’s grin widened. “Deal. But I’m bringing the whoopee cushion. Nothing creates more distraction than a strategic fart at the cake table.” “You’re an actual child.” “And that’s why this is going to work. Kids trust other kids. I’m basically five-foot-ten of arrested development. Now pass the juice box. We need to rehearse our characters. I’m thinking I cry during ‘Happy Birthday.’ Really sell the waterworks.” Molly sighed, already reaching for the juice. “If we get caught, I’m telling the cops you kidnapped me.” “Fair enough,” Bill said, clinking his juice box against hers. “To the heist of the century. Or at least the most pathetic one.”
Meta
created_at: "2026-05-01T12:16:56.845Z" (5/1/2026, 12:16:56 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 810
moderation_latency: (null)
generation_time: 12426
tokens_prompt: 77
tokens_completion: 882
native_tokens_prompt: 182
native_tokens_completion: 717
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0018856
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "b238b2c8-52e0-9bbb-a09d-35b545a0d0f1"
    is_byok: false
    latency: 81
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777637816-UsyMzNUKQbk7rdIb4B9J"
session_id: (null)
api_type: "completions"
id: "gen-1777637816-cvt9j3TyPhvQo1SB17Ca"
upstream_id: "b238b2c8-52e0-9bbb-a09d-35b545a0d0f1"
total_cost: 0.0018856
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
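The generation metadata is enough to estimate output throughput for this run. The following is a minimal sketch, not part of NC Bench: it assumes generation_time is reported in milliseconds (12426 is close to the 12.5s duration in Run Details, but the page does not state the unit) and that native_tokens_completion is the appropriate token count. The function name tokens_per_second is hypothetical.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Estimate output throughput.

    Assumption: generation_time_ms is in milliseconds; the page does not say so.
    """
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000.0)


# Values copied from the metadata: native_tokens_completion and generation_time.
rate = tokens_per_second(717, 12426)
print(f"{rate:.1f} tokens/s")
```

Under those assumptions the run works out to roughly 58 completion tokens per second. Using tokens_completion (882) instead of the native count would give a different figure, so the choice of counter should be stated alongside any comparison between models.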
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 538
0.08% | Dialogue to Total Word Ratio | Ratio: 81.93%, Deviation: 11.93% | neededClean: false, wordsTotal: 548, wordsDialogue: 449
0.0404% | Overall
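The reported ratio and deviation can be reproduced from wordsDialogue and wordsTotal. The sketch below is not the NC Bench evaluator. It assumes that dialogue is delimited by curly double quotes, that words are whitespace-separated tokens, and that the deviation is the absolute difference from the 70% target named in the scenario. The function names are hypothetical, and the mapping from deviation to the 0.08% score is not shown on the page, so it is not modeled here.

```python
import re

TARGET_RATIO = 70.0  # percent; taken from the scenario "70% dialogue"


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed tokenization)."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Count words inside curly double quotes.

    Assumption: every line of dialogue opens with “ and closes with ”.
    Single curly quotes nested inside a line of dialogue are left alone.
    """
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return (dialogue ratio in percent, absolute deviation from target)."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio = 100.0 * words_dialogue / words_total
    return ratio, abs(ratio - target)


# Values copied from the Meta Data column: wordsDialogue and wordsTotal.
ratio, deviation = ratio_and_deviation(449, 548)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 81.93%, Deviation: 11.93%"
```

The printed values match the Details column, which suggests the deviation is measured in percentage points from the 70% target. The two evaluators report different totals (538 and 548 words), so they presumably tokenize differently; a simple whitespace split like the one above may not reproduce either count exactly.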