NC Bench
Run: 2026-03-12-01KKHZ34D90BE2AQ9D8X83J6XF
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 38.4s
Cost: $0.047100
Provider: xAI
Evaluation summary
Matches word count: 51.93%
Dialogue to Total Word Ratio: 54.42%
Overall: 53.1728%
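The page does not say how the Overall score is derived. One plausible reading, stated here only as an assumption, is an unweighted mean of the two evaluator scores. Because the displayed scores are rounded to two decimals, the sketch below checks whether 53.1728% falls inside the range of means those rounded values allow; the helper name `mean_interval` is hypothetical.

```python
# Assumption: Overall = unweighted mean of the evaluator scores.
# Each displayed score is rounded to 2 dp, so its true value lies
# within +/- 0.005 of what is shown.

def mean_interval(displayed, half_width=0.005):
    """Return the (low, high) range for the mean of values shown rounded to 2 dp."""
    lows = [d - half_width for d in displayed]
    highs = [d + half_width for d in displayed]
    return sum(lows) / len(lows), sum(highs) / len(highs)

scores = [51.93, 54.42]  # Matches word count, Dialogue to Total Word Ratio
overall = 53.1728        # reported Overall

low, high = mean_interval(scores)
print(f"mean range: {low:.4f} to {high:.4f}")
print("consistent with unweighted mean:", low <= overall <= high)
```

A match here only shows the figures are compatible with a simple mean; it does not confirm that the site actually aggregates scores this way.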
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly and Bill were huddled in the cluttered back of their old van that was parked near the bustling children's party filled with balloons and laughter from the excited kids. Molly spread out the plans on a small table while Bill looked on anxiously, his face showing clear signs of doubt about the whole operation they were about to undertake. "So Bill, we'll crash this party like we belong there," Molly began enthusiastically. "Disguised as clowns of course." "Clowns? Molly that's crazy. Kids are scared of clowns," Bill protested with a shudder. "Not these kids. They've seen it all. We'll use the bouncy castle for cover," Molly explained while pointing at her diagram. Bill frowned deeply as he considered the idea. "How do we steal the presents without anyone noticing? There are like twenty kids there." "Easy. Distract them with magic tricks while I load the bags. When the cake comes out everyone will be busy," Molly said confidently as she mimed pulling a rabbit from a hat. "But what if they recognize us?" Bill asked worriedly biting his lip. "Who would recognize two thieves at a kid's party? We'll paint our faces. You get the big red nose," Molly laughed imagining the scene. "I hate red noses. They make me sneeze. Remember the last heist?" Bill complained recalling the failure. "That was different. This time it's flawless. We go in at the peak of the games," Molly assured him with a pat on the back. "Peak of the games? Like musical chairs? This is insane," Bill said while shaking his head in disbelief. "Insane but brilliant. The presents are piled high. We'll take them all and be gone before the parents even blink," Molly replied trying to pump him up. "Okay but no hurting feelings. If a kid sees me I tell him it's a game," Bill insisted trying to find some moral ground. "It's a game alright. The game of who gets the toys. Us," Molly said with a chuckle at his expression. "Us getting toys from toddlers. Sounds bad when you say it," Bill noted with a guilty tone. 
"Don't think of it that way. Think of the profit. Those presents will fetch good money on the black market for kid stuff," Molly said persuasively. "Black market for toys? Who buys stolen birthday gifts?" Bill questioned still not convinced. "Everyone. Desperate parents. We'll be doing a service," Molly joked to lighten the mood. "Service? You're stretching it. But fine. What's the signal to run?" Bill inquired finally coming around. "Shout 'let's eat cake' and we bolt," Molly answered with a smile. "'Let's eat cake'. Perfect for a party," Bill agreed at last. "Then it's settled. Load up the clown wigs and let's get ready to steal those presents," Molly concluded as she started packing. They double checked their bag of props smiling at the absurdity of their new scheme as the party music played in the distance making the situation even more comical in their minds as they prepared for the heist of the birthday presents from the unsuspecting children and their families.
Meta
created_at: "2026-03-12T21:26:25.454Z" (3/12/2026, 9:26:25 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 36324
moderation_latency: (null)
generation_time: 38190
tokens_prompt: 77
tokens_completion: 782
native_tokens_prompt: 184
native_tokens_completion: 7827
native_tokens_completion_images: 0
native_tokens_reasoning: 7198
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0470996
router: (null)
provider_responses[0]:
  endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
  id: "72f140b2-2616-abee-11e0-b0983d50184f"
  is_byok: false
  latency: 272
  model_permaslug: "x-ai/grok-4.20-beta-20260309"
  provider_name: "xAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350785-LmV0mdR9wfG1TL0v6z4s"
upstream_id: "72f140b2-2616-abee-11e0-b0983d50184f"
total_cost: 0.0470996
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Matches word count: 51.93%
  Details: n/a
  neededClean: false
  words: 516
Dialogue to Total Word Ratio: 54.42%
  Details: Ratio: 53.49%, Deviation: 3.49%
  neededClean: false
  wordsTotal: 516
  wordsDialogue: 276
Overall: 53.1728%
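The Ratio and Deviation figures follow directly from the reported word counts: the dialogue share is wordsDialogue / wordsTotal, and the deviation is its distance from the scenario's 50% target in percentage points. The sketch below reproduces that arithmetic and also computes how far the 516-word response overshoots the 500-word target. How each evaluator then maps these numbers to its score (51.93% and 54.42%) is not stated on the page, so no scoring formula is assumed here.

```python
# Reproduce the Dialogue to Total Word Ratio details from the reported counts.
words_total = 516     # wordsTotal
words_dialogue = 276  # wordsDialogue
target_ratio = 50.0   # scenario target: 50% dialogue

ratio = 100 * words_dialogue / words_total
deviation = abs(ratio - target_ratio)  # percentage points from the target
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# Length relative to the 500-word target (raw arithmetic only).
target_words = 500
overshoot = words_total - target_words
overshoot_pct = 100 * overshoot / target_words
print(f"Length: {words_total} words, {overshoot_pct:.1f}% over the {target_words}-word target")
```

The first printed line matches the evaluator's reported "Ratio: 53.49%, Deviation: 3.49%".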