NC Bench
Run: 2026-05-01-01KQHSVMY0ACN408T2B33BMK6J
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 11.9s
Cost: $0.001828
Provider: xAI
Evaluation summary
Matches word count: 0.08%
Dialogue to Total Word Ratio: 7.51%
Overall: 3.7960%
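The overall figure looks consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the following is only a sketch under that assumption; the names `evaluator_scores` and `mean_score` are hypothetical.

```python
# Assumption: the overall score is the unweighted mean of the evaluator scores.
# The page does not document the aggregation rule; this only checks consistency.
evaluator_scores = {
    "Matches word count": 0.08,            # percent, as displayed
    "Dialogue to Total Word Ratio": 7.51,  # percent, as displayed
}


def mean_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the displayed scores, in percent."""
    return sum(scores.values()) / len(scores)


overall = mean_score(evaluator_scores)
# The displayed inputs are rounded to two decimals, so the result can differ
# slightly from the reported 3.7960% in the last digits.
print(f"{overall:.4f}%")
```

Because the inputs are rounded, agreement to about two decimal places is the most this check can show.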
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill crouched behind the overgrown hedge bordering the backyard, the distant squeals of children’s laughter mixing with the sickly-sweet scent of birthday cake. Bill adjusted his fake mustache and winced. “Remind me again why we’re robbing a kiddie party, Moll. My dignity’s already filing for divorce.” “Because, genius, little Timmy’s rich uncle rented the biggest bouncy castle in three counties and every spoiled brat in the zip code showed up loaded with gift bags. We hit the present table during musical chairs, we’re out in ninety seconds with enough loot to retire to that beach hut in Belize.” Bill snorted. “Musical chairs? Last time I played that I got a concussion and a restraining order from a six-year-old.” “You’re exaggerating. Probably. Look, I cased the joint. The adults are all half-cut on supermarket prosecco. The clown’s on his third smoke break. We go in as the replacement magicians. I palmed their business cards yesterday.” “You stole a children’s entertainer’s business cards? That’s a new low, even for us.” “Low is relative. Last month we stole a food truck. This is practically philanthropic. Think of it as wealth redistribution with balloons.” Bill peeked through the hedge. A small girl in a tutu marched past carrying a bright pink parcel the size of a toaster. “That one’s got an iPad in it. I can smell the unboxing video from here.” Molly grinned. “Exactly. We wait for the piñata. When the candy hits the ground it’s pandemonium. That’s our window. You create a diversion, I bag the table.” “Diversion? I’m not dressing as a dinosaur again. That zipper still haunts me.” “No dinosaur. You’re going to pretend you’re the hired face-painter who’s tragically allergic to glitter. Start sneezing like you’ve got tuberculosis. Kids love chaos. While they’re screaming and the parents are flapping, I’ll be the helpful big sister clearing the gifts out of harm’s way. Straight into our suspiciously large ‘lost property’ sack.” Bill rubbed his temples. 
“We’re going to hell, Molly. There’s a special circle for people who steal Elsa dolls.” “Relax. We leave one crappy board game and a half-eaten bag of Haribo so it looks like an inside job. By the time they figure it out we’ll be three states away eating lobster with our fake passports.” A burst of applause erupted from the garden. Someone shouted, “Who wants to see the magician?” Bill sighed. “That’s our cue, isn’t it?” “Unless you’d rather go back to robbing vending machines for a living.” He stood, brushed grass off his knees, and straightened his cheap bow tie. “Fine. But if any of those little monsters tries to bite me, I’m keeping the giant stuffed unicorn as emotional support.” Molly punched his arm. “That’s the spirit. Now look adorable and clueless. And whatever you do, don’t say the code word ‘pineapple’ unless the cops show up.” Bill blinked. “Why would I say pineapple?” “Because it’s funny and I like watching you panic. Now move, partner. Those presents aren’t going to steal themselves.” They stepped through the hedge with matching fake smiles, ready to ruin the best day of several tiny lives and possibly their own karma in the process.
created_at
"2026-05-01T12:57:44.645Z"
5/1/2026, 12:57:44 PM
model
"x-ai/grok-4.20-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
549
moderation_latency
(null)
generation_time
11840
tokens_prompt
77
tokens_completion
883
native_tokens_prompt
182
native_tokens_completion
694
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0018281
router
(null)
provider_responses
0
endpoint_id
"3ecee37f-b217-4093-87fb-aaf0afe307af"
id
"8017d29f-c8dd-9051-9ebb-8286c509941a"
is_byok
false
latency
170
model_permaslug
"x-ai/grok-4.20-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777640264-3liGBqAIqDmUzz2H39Ay"
session_id
(null)
api_type
"completions"
id
"gen-1777640264-Y12FJwur9o6juf7mH4M1"
upstream_id
"8017d29f-c8dd-9051-9ebb-8286c509941a"
total_cost
0.0018281
cache_discount
0.0001344
upstream_inference_cost
0
provider_name
"xAI"
response_cache_source_id
(null)
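A few derived figures can be read off the metadata above. This is a rough sketch that assumes `generation_time` is reported in milliseconds (11840 ms is close to the 11.9s duration shown in the run details); the helper names are hypothetical.

```python
# Values copied from the metadata above.
generation_time_ms = 11840      # assumption: unit is milliseconds
native_tokens_completion = 694
native_tokens_prompt = 182
native_tokens_cached = 128


def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Completion throughput in tokens per second."""
    return tokens / (elapsed_ms / 1000.0)


def cached_share(cached: int, prompt: int) -> float:
    """Fraction of native prompt tokens served from cache."""
    return cached / prompt


throughput = tokens_per_second(native_tokens_completion, generation_time_ms)
cache_fraction = cached_share(native_tokens_cached, native_tokens_prompt)

print(f"throughput: {throughput:.1f} tokens/s")
print(f"cached prompt share: {cache_fraction:.1%}")
```

The throughput uses the native completion token count rather than `tokens_completion` (883), since the two counts come from different tokenizers and the page does not say which one the provider bills on.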
Evaluation details
Matches word count
Result: 0.08%
Details: n/a
Meta data: neededClean = false, words = 529

Dialogue to Total Word Ratio
Result: 7.51%
Details: Ratio: 77.20%, Deviation: 7.20%
Meta data: neededClean = false, wordsTotal = 535, wordsDialogue = 413

Overall: 3.7960%
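The reported ratio and deviation can be reproduced from the recorded counts (413 dialogue words out of 535 total, against the 70% target in the scenario). The sketch below assumes the deviation is the absolute difference from the target. It also includes a hypothetical counter that treats text inside curly double quotes as dialogue and splits words on whitespace; the evaluator's actual tokenization is not documented, and the two evaluators report different totals (529 and 535) for the same response. The mapping from deviation to the 7.51% score is not stated on the page and is not modeled here.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: 70% dialogue


def dialogue_stats(words_dialogue: int, words_total: int,
                   target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return (ratio, deviation) as fractions of 1.

    Assumption: deviation is the absolute distance from the target ratio.
    """
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Hypothetical counter: words inside curly double quotes, and total words.

    Words are whitespace-separated tokens; the real evaluator may differ.
    """
    quoted_spans = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(span.split()) for span in quoted_spans)
    words_total = len(text.split())
    return words_dialogue, words_total


ratio, deviation = dialogue_stats(413, 535)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

sample = "Bill sighed. “That’s our cue, isn’t it?”"
print(count_dialogue_words(sample))
```

With the recorded counts, the first print matches the details line: Ratio: 77.20%, Deviation: 7.20%.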