NC Bench
Run: 2026-05-01-01KQHSMBHXYSJ0BA3QWS6NZWJR
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 10.4s
Cost: $0.001728
Provider: xAI
Evaluation summary
Matches word count: 27.17%
Dialogue to Total Word Ratio: 0.00%
Overall: 13.5829%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly crouched behind the hedgerow, peering through a gap at the bouncy castle wobbling like a drunk hippo. “This is the worst idea you’ve ever had, Bill. And you once suggested we rob a police station with a water pistol.”

Bill adjusted his cartoonishly large sunglasses, the ones with the googly eyes. “It’s not a heist, Mol, it’s a redistribution of wealth. Those presents are just sitting there, taunting us. Think of it as performance art. We’re the world’s most charming burglars.”

“Charmingly stupid,” she muttered, checking her glitter-covered backpack. “We look like rejected mascots from a bankrupt theme park.”

“Exactly! Camouflage, darling. Who’s going to suspect the purple dinosaur and the depressed unicorn? The kids will think we’re the entertainment.” He pulled out a crudely drawn map on the back of a pizza menu. “Here’s the plan. I create a diversion by pretending to be the hired magician. While they’re all watching me pull a rubber chicken out of my trousers, you slip in, bag the presents, and we’re gone before the cake arrives. Simple.”

Molly stared at him. “You want to do magic? Bill, the last time you tried card tricks you set your own tie on fire.”

“That was intentional misdirection,” he said, waving a hand. “Besides, I’ve upgraded. I bought a cape. It’s reversible. One side says ‘Happy Birthday,’ the other says ‘Sorry About Your Presents.’”

She pinched the bridge of her nose. “We’re going to prison. I can feel it. Tiny plastic handcuffs and everything.”

“Relax. I’ve thought of everything. Got the smoke bombs, the fake clown nose, and I even learned how to juggle. Badly. The kids will love it.” Bill grinned, showing the gap where his front tooth used to be. “And if anyone asks, we’re from ‘Parties R’ Us.’ I’ve got the business cards. They’re slightly damp but legible.”

Molly sighed, adjusting her unicorn horn which kept sliding into her eyes. “Fine. But if we get caught, I’m telling them you kidnapped me. I’ll cry. I do a very convincing kidnapped face.”

“That’s my girl.” He clapped her on the shoulder, nearly knocking her into the hedge. “Now, quick question: on a scale of one to ten, how bad would it be if I accidentally released three dozen balloons during the escape?”

“Eleven,” she said flatly. “Balloon-related evidence is how they caught the Nottingham Squirrel Gang. I refuse to be outsmarted by party supplies.”

Bill chuckled. “Fair enough. We’ll stick to the original plan. In, out, and home in time for pizza. Our pizza. With the money from selling slightly used My Little Ponies on the dark web.”

“You’re an actual monster,” Molly said, but she was smiling despite herself.

“Monster with a plan,” he corrected, tugging his purple dinosaur tail into place. “Ready to ruin some childhoods?”

She zipped up her glitter backpack with a resigned click. “Let’s go steal some joy.”
created_at: "2026-05-01T12:53:45.666Z" (5/1/2026, 12:53:45 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 381
moderation_latency: (null)
generation_time: 10262
tokens_prompt: 77
tokens_completion: 789
native_tokens_prompt: 182
native_tokens_completion: 654
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0017281
router: (null)
provider_responses[0]:
  endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
  id: "a6dab8f0-705f-36fb-8082-b986a19f9a93"
  is_byok: false
  latency: 77
  model_permaslug: "x-ai/grok-4.20-20260309"
  provider_name: "xAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777640025-fOZ5AwjrraV1ZWgtiVUr"
session_id: (null)
api_type: "completions"
id: "gen-1777640025-HRdx1FVcdVXHNc0kJv3v"
upstream_id: "a6dab8f0-705f-36fb-8082-b986a19f9a93"
total_cost: 0.0017281
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Matches word count: 27.17%
  Details: n/a
  neededClean: false
  words: 481
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio: 71.99%, Deviation: 41.99%
  neededClean: false
  wordsTotal: 482
  wordsDialogue: 347
Overall: 13.5829%
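The ratio and deviation reported for the dialogue evaluator can be checked directly from the word counts. The following is a minimal Python sketch. It assumes that the ratio is wordsDialogue / wordsTotal and that the deviation is the absolute difference from the 30% target stated in the scenario. The `dialogue_stats` helper, its curly-quote regex, and the whitespace word count are hypothetical illustrations and are not NC Bench's documented evaluator. This page also does not show how a deviation is converted into the 0.00% score.

```python
import re

# Hypothetical: treat text between curly double quotes (“ ”) as dialogue.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = 0.30) -> tuple[float, float]:
    """Return (dialogue ratio, absolute deviation from target) as fractions."""
    ratio = words_dialogue / words_total if words_total else 0.0
    return ratio, abs(ratio - target)


def dialogue_stats(text: str, target: float = 0.30) -> tuple[float, float]:
    """Hypothetical end-to-end check: count words inside curly quotes."""
    dialogue = sum(count_words(m) for m in DIALOGUE_RE.findall(text))
    return ratio_and_deviation(dialogue, count_words(text), target)


# Reported counts from the Dialogue to Total Word Ratio row.
ratio, deviation = ratio_and_deviation(347, 482)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints: Ratio: 71.99%, Deviation: 41.99%
```

With 347 of 482 words counted as dialogue, the response is about 72% dialogue. That is roughly 42 percentage points above the requested 30%.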