NC Bench
Run: 2026-03-12-01KKHYVGWP0Q46RAXNRFPTAWNV
Run Details
Model:
Grok 4.20 (Beta)
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
3.9s
Cost:
$0.004016
Provider:
xAI
Evaluation summary
Score | Evaluator
68.10% | Matches word count
1.22% | Dialogue to Total Word Ratio
34.6610% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill crouched behind a hedge across the street from the pink-and-purple bounce house that dominated little Emily Harrison’s seventh birthday party. The muffled screams of sugar-crazed children echoed like a war zone. “Bill, I still can’t believe we’re doing this,” Molly hissed. “We’re about to ruin a children’s party. That’s a new low even for us.” “Relax, it’s not ruining, it’s redistributing joy,” Bill whispered back, adjusting his too-small Paw Patrol ski mask. “Those presents are just sitting there like shiny little gift cards to freedom. You want to pay rent this month or not?” “I want to not go to prison for stealing Barbie Dreamhouses, that’s what I want.” “Nobody goes to prison for stealing from kids. The cops would laugh too hard to cuff us. Besides, we’re wearing the disguises. Who’s gonna suspect the Easter Bunny and a depressed Minion?” Molly stared at him. “I’m the Easter Bunny. You’re the depressed Minion. That’s your master plan?” “Kids love bunnies. And everyone feels sorry for Minions lately. It’s perfect. Now here’s the play: I go in first, start the conga line distraction by yelling ‘Let’s dance before the piñata explodes!’ Then you slip behind the gift table, shove everything into the big pink pony backpack, and we’re out before the cake arrives.” “That’s your entire plan? A conga line?” “It’s foolproof. Kids can’t resist a conga line. It’s like catnip with a beat. While they’re snaking around the yard, nobody’s watching the presents.” “What if one of them wants to hug the Easter Bunny? I’m wearing eight pounds of felt and I already smell like regret.” “Then you hug the kid. Sell it. Tell them the Easter Bunny moonlights as a party entertainer because carrots are expensive. Make up a sob story about inflation. Kids love drama.” Molly rubbed her temples. “Bill, there are at least twenty of them in there. One of them is definitely going to recognize us from last month’s ‘lost magician’ gig at the community center.” “Impossible. 
We had mustaches then. And you were wearing that ridiculous turban. No overlap. Look, we hit the table at exactly 3:17, right after they sing Happy Birthday and before the sugar crash hits. Chaos is our friend.” “I hate you so much right now.” “You say that every job and yet here we are, still partners. Admit it, you love the thrill.” “I love not being broke. There’s a difference.” Bill peeked through the hedge again. “Okay, the clown just arrived. That’s our cue. Clowns always buy us six minutes of horrified fascination. Ready, Easter Bunny?” Molly sighed deeply, ears flopping. “If we get caught, I’m telling the cops the depressed Minion kidnapped me.” “Fair enough. On three. One… two…” They both stood up in their ridiculous costumes, took one last look at each other, and stepped toward the sound of screaming children and unguarded presents like the pathetic criminals they were.
created_at
"2026-03-12T21:22:16.09Z"
3/12/2026, 9:22:16 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
379
moderation_latency
(null)
generation_time
3746
tokens_prompt
77
tokens_completion
811
native_tokens_prompt
182
native_tokens_completion
647
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0040156
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"c5328fd8-169a-9de2-ba69-807b2d58534b"
is_byok
false
latency
341
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773350536-IYYB6K2rM7JeD2x8mxk5"
upstream_id
"c5328fd8-169a-9de2-ba69-807b2d58534b"
total_cost
0.0040156
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
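The timing and token fields above can be combined into a rough throughput figure. The following is a minimal sketch, assuming `generation_time` and `latency` are reported in milliseconds (the page does not state units) and using the native token counts; the variable names are illustrative and not part of any NC Bench or provider API.

```python
# Values copied from the Meta fields of this run.
# ASSUMPTION: generation_time is in milliseconds; the page does not say so.
generation_time_ms = 3746
native_tokens_prompt = 182
native_tokens_cached = 128
native_tokens_completion = 647

# Completion throughput in native tokens per second of generation time.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# Fraction of the native prompt tokens that were served from cache.
cached_share = native_tokens_cached / native_tokens_prompt

print(f"throughput: {tokens_per_second:.1f} tokens/s")
print(f"cached prompt share: {cached_share:.1%}")
```

Under that unit assumption the run works out to roughly 173 completion tokens per second, with about 70% of the prompt tokens cached.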
Evaluation details
Result | Evaluator | Details | Meta Data
68.10% | Matches word count | n/a | neededClean: false; words: 486
1.22% | Dialogue to Total Word Ratio | Ratio: 79.39%, Deviation: 9.39% | neededClean: false; wordsTotal: 490; wordsDialogue: 389
34.6610% | Overall
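The reported ratio and deviation can be checked from the word counts in the Meta Data. The sketch below is illustrative only: the quote-matching regex and whitespace tokenizer are assumptions, not the evaluator's actual method, and the function names are hypothetical. The Overall figure happens to equal the unweighted mean of the two evaluator scores; whether NC Bench aggregates that way in general is an inference from these numbers. How the 9.39% deviation maps to the 1.22% score is not stated on the page and is not reproduced here.

```python
import re


def count_words(text: str) -> int:
    # ASSUMPTION: whitespace-delimited tokens; the real evaluator may differ.
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int, float]:
    # ASSUMPTION: dialogue is whatever sits between curly or straight
    # double quotes. Returns (dialogue words, total words, ratio).
    spoken = re.findall(r'[“"]([^”"]*)[”"]', text)
    words_dialogue = sum(count_words(s) for s in spoken)
    words_total = count_words(text)
    ratio = words_dialogue / words_total if words_total else 0.0
    return words_dialogue, words_total, ratio


def deviation_from_target(ratio: float, target: float = 0.70) -> float:
    # Absolute distance from the 70% dialogue target in the scenario.
    return abs(ratio - target)


# Figures reported in the evaluation details for this run.
reported_ratio = 389 / 490  # wordsDialogue / wordsTotal
reported_deviation = deviation_from_target(reported_ratio)
overall = (68.10 + 1.22) / 2  # unweighted mean of the two scores

print(f"ratio: {reported_ratio:.2%}")
print(f"deviation: {reported_deviation:.2%}")
print(f"overall: {overall:.2f}%")
```

With the reported counts this gives a ratio of 79.39%, a deviation of 9.39%, and an overall of 34.66%, matching the values shown. Running `dialogue_stats` on the response text itself may not return exactly 389 and 490, since the evaluator's tokenization is unknown.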