NC Bench
Run: 2026-05-01-01KQHR5FDA6WJNF8BB2KWQ1HEN
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 9.1s
Cost: $0.001870
Provider: xAI
Evaluation summary
Score     Evaluator
97.63%    Matches word count
0.00%     Dialogue to Total Word Ratio
48.8138%  Overall
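The Overall figure appears consistent with an unweighted mean of the two evaluator scores. The page does not state this, so treat it as an assumption. The sketch below uses the rounded scores shown in the summary, which means it lands near 48.815% rather than exactly 48.8138%.

```python
# Assumption: "Overall" is the unweighted mean of the evaluator scores.
# These are the rounded values displayed in the summary table.
scores = {
    "Matches word count": 97.63,
    "Dialogue to Total Word Ratio": 0.00,
}

# Average the per-evaluator scores with equal weight.
overall = sum(scores.values()) / len(scores)
print(f"Overall: {overall:.3f}%")  # Overall: 48.815%
```

The small gap from 48.8138% is consistent with the word-count score having been rounded to 97.63% for display.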
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly adjusted her fake mustache in the rearview mirror while Bill crammed another handful of Cheetos into his mouth from the driver’s seat. “This is the dumbest idea you’ve ever had,” he mumbled around the orange dust.
“It’s brilliant and you know it,” Molly shot back. “Kids’ birthday party. Twenty screaming six-year-olds, one stressed-out mom, zero adults paying attention. The presents are just sitting there like shiny little idiots begging to be adopted.”
“Adopted. That’s what we’re calling grand larceny now?”
“Focus, Bill. We hit the bounce house first. I’ll create a diversion by pretending I’m the hired magician. You slip in as the disgruntled balloon animal guy.”
“I don’t know how to twist balloons, Mol.”
“That’s why they’re disgruntled, genius. You just wave a half-inflated wiener dog around and look disappointed. Kids won’t notice. They’re distracted by sugar and existential dread.”
Bill licked Cheeto dust off his fingers. “What if they’ve got one of those face-painting ladies? Last time I saw one of those I almost cried because she turned me into a ‘brave knight’ and I looked like a depressed zebra.”
“You’ll live. We go in at peak cake time. When that mom yells ‘who wants ice cream?’ we make our move. I’ll yell ‘Abracadabra’ and pretend to pull a rabbit out of my hat. Except the rabbit is you in the backyard grabbing the gift table.”
“That’s a terrible magic trick.”
“It’s performance art. While they’re all clapping for the rabbit that isn’t there, you shove the presents into the duffel bag shaped like a giant teddy bear. I spent three hours hot-gluing googly eyes on it. Show some respect.”
Bill sighed. “What about the birthday kid? Kid’s gonna be crushed.”
“Kid’s getting a Nintendo Switch from Grandma. We’re doing the family a favor. Less fighting over batteries on Christmas morning. We’re basically therapists.”
“You’re insane.”
“I’m a visionary. Now, the get-away. I’ve got the ice-cream truck parked two blocks over. We ditch the costumes, blast ‘Pop Goes the Weasel’ and disappear into suburban traffic like the criminals we are.”
“Ice-cream truck? Where did you even—”
“Borrowed it. Don’t ask questions you don’t want the answer to. You just have to sell sprinkles if anyone flags us down. Act natural.”
“None of this is natural, Molly. We’re grown adults planning to steal Barbie dreamhouses from toddlers.”
“Exactly. The absurdity is our greatest weapon. No one will believe it happened. The cops will think it’s some kind of performance prank. By the time they figure it out we’ll be three states away eating stolen Funfetti straight from the box.”
Bill stared at her for a long moment. “I hate how much sense you make when you’re clearly deranged.”
“That’s why we make such a good team. You provide the moral hesitation, I provide the chaotic momentum. Now put on the stupid clown wig. We’ve got a heist to ruin some poor child’s sense of trust in the world.”
“God help us,” Bill muttered, reaching for the rainbow afro.
“God’s busy. We’ve got the balloon animals.”
created_at: "2026-05-01T12:28:09.522Z" (displayed as 5/1/2026, 12:28:09 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 346
moderation_latency: (null)
generation_time: 8936
tokens_prompt: 77
tokens_completion: 860
native_tokens_prompt: 182
native_tokens_completion: 684
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0018703
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "43ea8b91-e5db-93c1-b4ea-d82b80c1a7a7"
    is_byok: false
    latency: 88
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777638489-HS8tlm62e2SoEmyvFMc8"
session_id: (null)
api_type: "completions"
id: "gen-1777638489-rWLBx1r393R0PJ2G4Wwq"
upstream_id: "43ea8b91-e5db-93c1-b4ea-d82b80c1a7a7"
total_cost: 0.0018703
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Matches word count: 97.63%
  Details: n/a
  Meta data: neededClean = false, words = 507
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio: 89.13%, Deviation: 19.13%
  Meta data: neededClean = false, wordsTotal = 515, wordsDialogue = 459
Overall: 48.8138%
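The ratio and deviation in the details follow directly from the reported counts. 459 dialogue words out of 515 total is about 89.13%, which is 19.13 percentage points above the 70% target. The sketch below reproduces that arithmetic. The helper `count_dialogue_words` is hypothetical. It assumes dialogue is text inside curly double quotes and that words are whitespace-separated, and this may not match how the benchmark's evaluator cleans and counts text. The two evaluators already disagree on the total word count (507 vs 515).

```python
import re

TARGET_DIALOGUE_SHARE = 0.70  # the scenario asks for 70% dialogue


def dialogue_ratio(words_dialogue: int, words_total: int) -> tuple[float, float]:
    """Return (ratio, absolute deviation from the target), both in percent."""
    ratio = words_dialogue / words_total
    deviation = abs(ratio - TARGET_DIALOGUE_SHARE)
    return ratio * 100, deviation * 100


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Hypothetical counter: words inside curly double quotes vs. all words.

    Assumes dialogue is delimited by curly double quotes and that words are
    whitespace-separated; the benchmark's own evaluator may differ.
    """
    # Non-greedy match between an opening and a closing curly double quote.
    quoted = re.findall("\u201c(.*?)\u201d", text, flags=re.DOTALL)
    words_dialogue = sum(len(span.split()) for span in quoted)
    words_total = len(text.split())
    return words_dialogue, words_total


# Counts reported by the evaluator for this run.
ratio, deviation = dialogue_ratio(459, 515)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# Ratio: 89.13%, Deviation: 19.13%

# Toy input, not the run's response.
print(count_dialogue_words("Bill sighed. \u201cWhat about the kid?\u201d"))
# (4, 6)
```

The 0.00% score suggests that a 19.13-point deviation falls outside whatever tolerance the evaluator allows. The page does not state that tolerance.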