NC Bench
Run: 2026-05-01-01KQHQWHPSPHMSF18K4BA94M4R
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 2m 10s
Cost: $0.034830
Provider: xAI
Evaluation summary
Score      Evaluator
0.00%      Matches word count
25.29%     Dialogue to Total Word Ratio
12.6463%   Overall
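The Overall figure is consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers on this page, not a documented NC Bench formula, and the function name below is hypothetical. A minimal Python sketch under that assumption:

```python
from statistics import mean


def overall_score(evaluator_scores):
    """Unweighted mean of evaluator scores, in percent.

    Assumption: the benchmark averages evaluators equally. The page
    does not state how the Overall value is aggregated.
    """
    return mean(evaluator_scores)


# Scores as displayed on this page (already rounded to two decimals).
scores = [0.00, 25.29]
overall = overall_score(scores)
print(f"{overall:.3f}%")
```

The result lands within about 0.001 percentage points of the reported 12.6463%. The small gap is what you would expect if the page averages unrounded scores and the 25.29% shown is a rounded display value.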
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
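The prompt sets two measurable targets: a total word count and a share of words inside dialogue. The sketch below shows one way such counts might be taken, assuming words are whitespace-separated and dialogue is any text between straight double quotes. Both rules and both function names are assumptions for illustration; the evaluator's actual tokenization is not shown on this page.

```python
import re


def count_words(text):
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def count_dialogue_words(text):
    """Count words inside straight double quotes (assumed dialogue rule)."""
    quoted_spans = re.findall(r'"([^"]*)"', text)
    return sum(len(span.split()) for span in quoted_spans)


# Hypothetical sample text, not taken from the model response.
sample = 'Molly sighed. "Grab the presents," she said. "Quietly."'
total = count_words(sample)
dialogue = count_dialogue_words(sample)
print(total, dialogue, dialogue / total)
```

Counting rules matter at the margins: the two evaluators on this page report totals of 533 and 534 words for the same response, which suggests they do not split the text identically.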
Model response
Messages
Meta
AI
The two thieves huddled in the back of their nondescript white van parked a block away from the target house. Colorful balloons could be seen tied to the mailbox in the distance, taunting them with their cheerful presence. Molly unrolled a hand-drawn map on a cardboard box between them, her face illuminated by a small flashlight. She had a determined look that Bill had come to both admire and fear over the years. The air was thick with the smell of old fast food wrappers and anxiety. "So, what's the plan for this kiddie party heist?" Bill asked cautiously. "We go in disguised as entertainers," Molly replied confidently. "What kind of entertainers?" Bill inquired with growing concern. "You'll be the clown, of course. I'll be the magician," Molly explained with a smirk. "Why do I have to be the clown? I hate clowns," Bill protested loudly. "Because you're funnier than you think. Now, the presents are usually piled on a table in the living room," Molly continued, ignoring his complaint. "How do we get past the parents?" Bill questioned, rubbing his temples. "Easy. Parents are distracted by their phones and gossiping. We strike during the cake singing part," Molly said, miming the action of grabbing boxes. "And if a kid recognizes us?" Bill worried, imagining the scenario. "Kids won't suspect a thing if we act the part. Just do some juggling or face painting," Molly reassured him. "Juggling? I can barely juggle my own thoughts," Bill joked nervously. "Then just hand out the party bags while I load the van with the real loot," Molly suggested, her plan becoming more elaborate. "This is crazy. Stealing toys from tots. What's next, raiding lemonade stands?" Bill sighed in exasperation. "Don't be such a baby. It's all in good fun. Think of it as redistributing wealth from spoiled brats to us," Molly laughed. "Redistributing? More like ruining birthdays. But okay, I'm in if you promise we don't pop any balloons. The noise gives me hives," Bill admitted reluctantly. 
"No popped balloons. We'll be in and out before the ice cream melts. Meet at the van with the goods," Molly confirmed, rolling up the map. "If we get caught, it's your fault for this harebrained scheme," Bill warned with a pointed finger. "Noted. Now let's practice our lines. Happy birthday to you, now where are the presents due?" Molly practiced in a silly voice. "This better work or we're retiring from thieving," Bill muttered under his breath. Outside, the sounds of children playing in the yard drifted over, serving as a reminder of the peculiar target they had chosen for their latest caper. The van's interior was cramped with various props for their disguises, including oversized shoes, a red nose, and a cape for the magic act. Molly seemed thrilled by the challenge, while Bill looked like he might back out at any moment. They adjusted their plans several times, going over every possible variable from the timing of the games to the location of the gift table. The comedic duo continued their discussion late into the afternoon, fine tuning the details until they felt as ready as they could be for breaking into the children's party.
created_at: "2026-05-01T12:23:16.959Z" (5/1/2026, 12:23:16 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 125689
moderation_latency: (null)
generation_time: 129896
tokens_prompt: 77
tokens_completion: 4207
native_tokens_prompt: 184
native_tokens_completion: 13867
native_tokens_completion_images: (null)
native_tokens_reasoning: 13211
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0348303
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "9c9e2af8-6bab-91c4-9b5a-0cf181781093"
    is_byok: false
    latency: 348
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777638196-bfbNNTIrL9h0IXCYoIQ6"
session_id: (null)
api_type: "completions"
id: "gen-1777638196-kqvO7JKwgGOfK8IAmMH3"
upstream_id: "9c9e2af8-6bab-91c4-9b5a-0cf181781093"
total_cost: 0.0348303
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result: 0.00%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false; words: 533

Result: 25.29%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 44.76%, Deviation: 5.24%
Meta Data: neededClean: false; wordsTotal: 534; wordsDialogue: 239

Overall: 12.6463%
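The reported ratio and deviation follow directly from the word counts in the metadata. The sketch below reproduces that arithmetic, assuming the deviation is the absolute distance from the 50% target named in the scenario. The function names are hypothetical. The mapping from a 5.24% deviation to the 25.29% score is not shown on this page, so it is not reproduced here.

```python
def dialogue_ratio(words_dialogue, words_total):
    """Share of words that fall inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def deviation_from_target(ratio, target=0.5):
    """Absolute distance from the target ratio (assumed definition)."""
    return abs(ratio - target)


# Counts taken from the evaluator metadata on this page.
ratio = dialogue_ratio(words_dialogue=239, words_total=534)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 44.76%, Deviation: 5.24%"
```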