NC Bench
Run: 2026-05-01-01KQHS5DYYVSY51RX94YA1RYHX
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 12.7s
Cost: $0.001860
Provider: xAI
Evaluation summary
Score | Evaluator
86.38% | Matches word count
0.00% | Dialogue to Total Word Ratio
43.1902% | Overall
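The overall figure is consistent with an unweighted mean of the two evaluator scores. The sketch below reproduces that arithmetic from the rounded values shown on this page. The averaging rule and the function name `overall_score` are assumptions for illustration, not something the page states.

```python
def overall_score(evaluator_scores):
    """Unweighted mean of per-evaluator scores (assumed aggregation rule)."""
    if not evaluator_scores:
        raise ValueError("need at least one evaluator score")
    return sum(evaluator_scores) / len(evaluator_scores)


# Rounded scores as displayed in the evaluation summary.
scores = [86.38, 0.00]
print(f"{overall_score(scores):.2f}%")
```

The displayed overall of 43.1902% carries more digits than the rounded inputs, which suggests the word-count score is stored at higher precision than the 86.38% shown.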
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the overgrown hedge, peering through binoculars at the balloons bobbing above the backyard fence. The sounds of squealing kids and off-key singing drifted over like a sugar-crazed alarm system. She lowered the binoculars and elbowed Bill, who was fiddling with a comically large pair of pliers. “Remind me why we’re robbing a six-year-old’s birthday party again?” she whispered. Bill grinned, his mustache twitching like it had its own criminal record. “Because little Timmy’s uncle is a tech billionaire who buys presents in bulk, Mol. We’re not stealing toys. We’re stealing unopened MacBooks, limited-edition sneakers, and one of those robot dogs that costs more than my car. It’s basically a charity heist. For us.” Molly rolled her eyes so hard she nearly pulled a muscle. “You said the same thing about the petting zoo. We ended up with a traumatized alpaca in the getaway van.” “That alpaca was a valuable asset,” Bill protested. “Until it spat on the satnav. Look, this time it’s foolproof. We wait till they start musical chairs, slip in through the side gate dressed as the hired magicians, grab the gift table, and vanish before the piñata even hits the ground.” Molly snorted. “Dressed as magicians? Bill, your idea of magic is making my patience disappear. Last time you tried sleight of hand you dropped the diamond necklace into a bowl of punch.” “That was intentional misdirection,” he said, puffing out his chest. “Anyway, I already rented the costumes. Top hats, capes, the works. I even learned a card trick. Pick a card, any card.” “I pick the card where we don’t traumatize thirty sugar-high children by stealing their new bicycles,” Molly muttered. “What’s our exit strategy when the parents notice two grown idiots in cheap velvet trying to wheel a Barbie Dreamhouse out the gate?” Bill tapped his temple. “Smoke bombs. I got the good ones this time, not the ones that smell like burnt hot dogs. 
We pop two by the bouncy castle, kids think it’s part of the show, we leg it to the van. I’ve got it parked behind the ice-cream truck. Genius, right?” “Genius is not the word I’d use,” Molly said, checking her watch. “The cake’s due in twenty minutes. Once they’re distracted by chocolate frosting we move. But if I see one parent reaching for their phone, we abort. I’m not doing another police lineup. My mugshot still has cake on it from the bakery job.” Bill offered her a high-five that she reluctantly returned. “See? Teamwork. We’ll be in and out faster than Timmy can say ‘Where’s my new drone?’” Molly adjusted her fake mustache, already itching under her nose. “If this goes south, I’m blaming you. And I’m keeping the robot dog.” “Fair,” Bill said, winking. “Just don’t let it follow you home. Those things get clingy.” They shared a final conspiratorial glance as distant laughter swelled and the first notes of “Happy Birthday” began. The thieves slipped toward the gate, capes fluttering like confused bats, ready to ruin the best worst party in neighborhood history.
created_at: "2026-05-01T12:45:36.61Z" (5/1/2026, 12:45:36 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 451
moderation_latency: (null)
generation_time: 12678
tokens_prompt: 77
tokens_completion: 823
native_tokens_prompt: 182
native_tokens_completion: 680
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0018603
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "cdf2ed7a-4048-99ed-b9c3-09e7e2947097"
    is_byok: false
    latency: 114
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777639536-kCn44gGTfSYFQCWotHfI"
session_id: (null)
api_type: "completions"
id: "gen-1777639536-WRAVutT1wimJJz1bCb9R"
upstream_id: "cdf2ed7a-4048-99ed-b9c3-09e7e2947097"
total_cost: 0.0018603
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
86.38% | Matches word count | n/a | neededClean: false, words: 511
0.00% | Dialogue to Total Word Ratio | Ratio: 69.75%, Deviation: 39.75% | neededClean: false, wordsTotal: 519, wordsDialogue: 362
43.1902% | Overall
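The reported ratio and deviation follow from the metadata counts: 362 dialogue words out of 519 total is about 69.75%, which is 39.75 percentage points above the 30% target. The sketch below shows one way such a ratio could be measured. It assumes dialogue means text inside double quotation marks and that words are whitespace-separated. The benchmark's real counting rules are not shown on this page, and its two evaluators already disagree on the total (511 vs 519 words), so the helpers `dialogue_ratio` and `deviation_from_target` are hypothetical.

```python
import re


def dialogue_ratio(text):
    """Return (dialogue_words, total_words, ratio).

    Assumption: dialogue is anything inside curly or straight double quotes,
    and a word is any whitespace-separated token.
    """
    total = len(text.split())
    spans = re.findall(r'“([^”]*)”|"([^"]*)"', text)
    dialogue = sum(len((curly or straight).split()) for curly, straight in spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_from_target(ratio, target=0.30):
    """Absolute distance between the measured ratio and the requested one."""
    return abs(ratio - target)


# Reproduce the figures reported in the details table from its own counts.
reported_ratio = 362 / 519
print(f"Ratio: {reported_ratio:.2%}")
print(f"Deviation: {deviation_from_target(reported_ratio):.2%}")
```

How the evaluator maps a 39.75% deviation to a 0.00% score is not stated on the page, so the sketch stops at the deviation.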