NC Bench
Run: 2026-05-01-01KQHSQRP3TM6ZDQM5HA70ATCM
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 13.8s
Cost: $0.001793
Provider: xAI
Evaluation summary
Score      Evaluator                      Details Link
86.38%     Matches word count             (details)
0.00%      Dialogue to Total Word Ratio   (details)
43.1902%   Overall
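The Overall figure is consistent with a plain average of the two evaluator scores. The sketch below checks that arithmetic. The unweighted mean is an assumption inferred from the numbers on this page, not a documented NC Bench formula, and the variable names are illustrative.

```python
from statistics import mean

# Evaluator scores as displayed in the summary, in percent.
scores = {
    "Matches word count": 86.38,
    "Dialogue to Total Word Ratio": 0.00,
}

# Assumption: Overall is the unweighted mean of the evaluator scores.
overall = mean(scores.values())
print(f"Overall: {overall:.2f}%")
```

The page reports 43.1902%, which suggests the word-count score is stored with more precision than the displayed 86.38%.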
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her oversized clown wig and stared at the crayon-drawn map on the kitchen table. "Bill, this is the dumbest idea you've ever had, and you've had some stinkers. We're breaking into a children's birthday party? For presents?"
Bill grinned like a man who'd just invented sliced bread. "Exactly! Think about it, Mol. Twenty sugar-high kids, one distracted mom, and a mountain of wrapped toys nobody's guarding. It's perfect. We go in as the entertainment. I juggle, you do that creepy balloon animal thing you do."
"I don't do creepy balloon animals. I do artistic balloon sculptures. Last time a kid cried because I made a giraffe that looked like it was plotting revenge." Molly tapped the map. "How do we even get out? These things have those stupid safety gates everywhere. And what about the goody bags? Those are basically landmines of plastic junk."
"Simple," Bill said, leaning in. "We perform, we 'accidentally' knock over the present table during the big finale. While everyone's screaming and laughing, we shove everything into the giant teddy bear costume I'm wearing. Then we yell 'Happy birthday!' and waddle out the back door like we meant to do that."
Molly rubbed her temples. "Bill, the bear costume is two sizes too small. You'll look like a depressed grizzly who lost a bet. And what if they want us to stay for cake? I can't eat cake in a clown nose. It gets all sticky."
"We tell them we're on a gluten-free, fun-free diet. Look, the birthday kid's turning six. His name's Timmy. He wants a robot dinosaur. We steal that first. It's the big one. Then we grab the rest while the parents are filming on their phones. Nobody watches the actual humans when there's a screen involved."
"What about the other kids? They'll rat us out. 'Mommy, the clown lady took my Lego spaceship!' Kids are tiny narcs, Bill."
"We give them decoy balloons. I stuff them with those cheap rings from the dollar store. They'll be so busy fighting over plastic jewelry they won't notice their real presents are gone until tomorrow. By then we're in Mexico living off stolen Pokémon cards."
Molly stared at him. "Mexico? With children's toys? We're not even good at this. Last heist we stole a guy's lunchbox thinking it was a briefcase. It had a juice box in it, Bill. A juice box."
"This is different," he insisted. "High risk, high reward. Those presents are worth hundreds. We'll fence them on the dark web. 'Lightly loved children's gifts. Slightly sticky.' We'll be legends."
"Legends who get arrested by a PTA committee," Molly grumbled. She sighed, then smirked. "Fine. But if we get caught, I'm blaming you. And I'm keeping the robot dinosaur. That thing's mine."
Bill clapped his hands. "That's my girl! Now help me practice my juggling. If I drop one more rubber chicken, the whole plan falls apart."
"You're going to drop all of them, aren't you?"
"Probably. But that's when the distraction happens. Chaos is our friend, Molly. Chaos... and screaming children."
created_at: "2026-05-01T12:55:37.418Z" (5/1/2026, 12:55:37 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 474
moderation_latency: (null)
generation_time: 13629
tokens_prompt: 77
tokens_completion: 769
native_tokens_prompt: 182
native_tokens_completion: 653
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0017928
router: (null)
provider_responses[0]:
  endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
  id: "9395adc6-3691-99af-8e61-534110f1cd9d"
  is_byok: false
  latency: 94
  model_permaslug: "x-ai/grok-4.20-20260309"
  provider_name: "xAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777640137-X10s3ekz39wKbndbueot"
session_id: (null)
api_type: "completions"
id: "gen-1777640137-v7DOsgflWhbQq3nVFLrk"
upstream_id: "9395adc6-3691-99af-8e61-534110f1cd9d"
total_cost: 0.0017928
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result     Evaluator                      Details                            Meta Data
86.38%     Matches word count             n/a                                neededClean: false, words: 511
0.00%      Dialogue to Total Word Ratio   Ratio: 89.32%, Deviation: 19.32%   neededClean: false, wordsTotal: 515, wordsDialogue: 460
43.1902%   Overall
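The reported ratio and deviation can be reproduced from the evaluator's own metadata. This is a minimal sketch, assuming the ratio is wordsDialogue divided by wordsTotal and the deviation is the absolute distance from the 70% target in the scenario. The function name is hypothetical, and the sketch does not attempt the step that maps a deviation to the 0.00% score, which this page does not describe.

```python
TARGET_RATIO = 0.70  # from the scenario: "Write 500 words with 70% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return the share of words that fall inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


# Values taken from the evaluator metadata (wordsDialogue, wordsTotal).
ratio = dialogue_ratio(460, 515)
deviation = abs(ratio - TARGET_RATIO)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 89.32%, Deviation: 19.32%"
```

The two evaluators report different totals for the same response (511 words versus 515), so they presumably tokenize words slightly differently.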