NC Bench
Run: 2026-03-12-01KKHZ7Q8KMWRD12Q9E5V1BGEG
Run Details
Model: Grok 4.20 (Beta)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 4.2s
Cost: $0.004436
Provider: xAI
Evaluation summary
Score | Evaluator
0.01% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0072% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her oversized sunglasses and leaned across the wobbly diner table. “Okay, Bill, hear me out. We hit the birthday party at exactly three-fifteen, right when they’re singing Happy Birthday and every adult is pretending they’re not filming vertical videos.” Bill slurped his coffee, eyes narrowing. “A children’s party, Mol. There’ll be clowns. I hate clowns. They have tiny hands and giant lies.” “Relax, we’re not robbing the clown. We’re robbing the gift table. Fifty kids, fifty presents, all wrapped in shiny paper that screams ‘expensive.’ We’ll be in and out before the piñata gets murdered.” “Still feels wrong,” Bill muttered. “Stealing from kids? That’s a new low even for us.” Molly rolled her eyes so hard her sunglasses nearly flew off. “We’re not stealing from kids, we’re stealing from their parents. Those gifts are guilt purchases anyway. Half the dads bought whatever the Amazon algorithm told them to. We’re basically performing a public service.” Bill drummed his fingers. “Fine. What’s the play?” “Simple. You dress as the world’s most pathetic magician. I’m your lovely assistant, Miss Mysteriosa. We show up claiming the real entertainers canceled. Parents will be so relieved they won’t ask questions. While you’re fumbling card tricks and making balloon animals that look like depressed weasels, I slip over to the gift table.” “And do what exactly? Stuff forty Barbie Dreamhouses into my cape?” “Negative. We use the wagon. Remember the red Radio Flyer in the van? We tell everyone it’s part of the act, call it the Magic Treasure Chest. I load the presents while you yell ‘Abracadabra’ really loud to cover the sound of wrapping paper tearing.” Bill stared at her. “You want me to yell Abracadabra at a bunch of six-year-olds?” “Loudly and with jazz hands. Commit, Billy-boy.” “What if some kid wants to help load the wagon?” “Then you let them. Kids love chores if you call them magic. Tell them they’re assistant sorcerers. 
They’ll feel important and we’ll get free labor.” Bill rubbed his temples. “This is insane. What’s our exit strategy when the real magician shows up?” “There is no real magician. I already called and canceled him this morning. Paid the cancellation fee with that credit card we stole from the dentist. We’re the only show in town.” “You canceled the clown too, didn’t you?” “Obviously. Clowns are unpredictable. Could ruin everything.” Bill sighed, half laughing. “We’re going to hell, Molly. Directly to hell on a slide made of stolen Legos.” “Maybe. But we’ll arrive with excellent Christmas presents for ourselves. Now stop crying into your coffee and tell me you’re in.” Bill looked at her for a long moment, then grinned despite himself. “I’m in. But if any child cries, I’m giving the presents back. I have limits.” Molly slapped the table. “That’s the spirit. Now let’s go practice your balloon weasels. They need to be tragic enough to distract from grand theft toddler.” “Tragic balloon animals. My acting coach would be so proud,” Bill said, standing up. “Baby, your acting coach is in prison. We’re the only audience you’ve got.” They both laughed, the sound carrying across the empty diner like the opening notes of perfectly orchestrated chaos.
created_at
"2026-03-12T21:28:55.832Z"
3/12/2026, 9:28:55 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
326
moderation_latency
(null)
generation_time
4055
tokens_prompt
77
tokens_completion
910
native_tokens_prompt
182
native_tokens_completion
717
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0044356
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"5a28d603-5a69-aadb-6d4e-9beb3fda9134"
is_byok
false
latency
258
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773350935-qdRMRxGwNDzOYrSyPA0R"
upstream_id
"5a28d603-5a69-aadb-6d4e-9beb3fda9134"
total_cost
0.0044356
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.01% | Matches word count | n/a | neededClean: false, words: 531
0.00% | Dialogue to Total Word Ratio | Ratio: 84.11%, Deviation: 14.11% | neededClean: false, wordsTotal: 535, wordsDialogue: 450
0.0072% | Overall
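The ratio and deviation reported for the dialogue evaluator follow directly from the two word counts in its metadata: 450 dialogue words out of 535 total gives 84.11%, which is 14.11 percentage points above the 70% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way such counts could be obtained. The helper names (`dialogue_ratio`, `deviation_points`), the whitespace-based word splitting, and the curly-quote matching are assumptions for illustration; the page does not describe how the benchmark actually tokenizes words or detects dialogue, and the two evaluators already disagree on the total (531 versus 535 words).

```python
import re

# Target taken from the scenario text ("70% dialogue").
TARGET_RATIO = 0.70


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: words are whitespace-separated tokens, and dialogue is
    anything enclosed in curly double quotes, as in the model response.
    """
    total = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_points(ratio: float, target: float = TARGET_RATIO) -> float:
    """Absolute distance from the target, in percentage points."""
    return abs(ratio - target) * 100


# Reproduce the reported figures from the evaluator's own counts.
reported_ratio = 450 / 535
print(f"Ratio: {reported_ratio:.2%}")                           # Ratio: 84.11%
print(f"Deviation: {deviation_points(reported_ratio):.2f}%")    # Deviation: 14.11%
```

Running `dialogue_ratio` on the model response above may not yield exactly 450 and 535, since the counting rules used here are only a guess at the evaluator's behavior.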