NC Bench
Run: 2026-03-12-01KKHZ01M9NC4GNEN50568RBM0
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 45.5s
Cost: $0.060552
Provider: xAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 98.30%
Overall: 49.1486%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
They had been planning this ridiculous job for days, arguing over every detail. The idea to target a children's party had come to Molly in a dream, or perhaps a nightmare, but she was convinced it would be their ticket to easy street. In their dilapidated van parked discreetly near the venue, Molly and Bill prepared for the most unusual heist of their careers. The sounds of laughter and music from the children's party could be heard outside. Molly peered through the binoculars at the colorful balloons across the street while Bill paced nervously beside her in the cramped van. "Okay Bill, let's review the plan one more time. We sneak in during the magic show acting like hired help. You distract the kids with jokes while I gather the presents in a big bag disguised as a garbage sack." "But what if the birthday boy catches us? Kids are sharp these days with their video games and stuff. I don't want to be chased by a gang of six year olds," Bill replied with concern evident in his voice. "Relax, we'll use the smoke bomb from the party poppers. When it goes off everyone will think it's part of the entertainment. Then we run to the back yard and jump into the getaway vehicle which is actually a rented ice cream truck." Bill shook his head in disbelief. "An ice cream truck? That's your idea of subtle? Every kid will come running after us for popsicles. This is the dumbest heist ever," Bill complained. "It's brilliant because who suspects the ice cream man? We'll have the bell ringing and everything. Plus I have a rocket launcher that shoots confetti to blind any pursuers," Molly countered confidently. Bill stopped pacing and stared at her. "Rocket launcher? Where did you get that? And isn't that overkill for a children's party? I signed up for stealing not starting a war with toddlers," Bill protested. "It's not a real rocket launcher it's a toy one modified. Now stop whining and focus. The presents include a new bike and a dollhouse we can sell for good money. 
You take the left side of the table I take the right." "Fine but if we get recognized I'm growing a beard and moving to Canada. No more clowning around with these schemes. Why can't we rob a bank like normal criminals?" Bill asked. Molly smiled reassuringly at her partner. "Because banks have guns and alarms. This has cake and laughter. It's the perfect crime if we don't mess up. Remember the code if something goes wrong yell 'happy birthday' and we scatter." "Happy birthday for abort. Got it. But I still think releasing the chickens is a bad idea," Bill said doubtfully. "No chickens this time I promise. Just us two thieves against a horde of sugar high kids. We'll be in and out in five minutes flat." "Once we have the loot we'll split it fifty fifty just like always. No cheating this time Bill." "I never cheat. Well maybe a little but not on you. Let's make this our best heist yet," Bill responded confidently as he finally sat down. The tension in the van was thick but their determination was stronger as they prepared to execute the comedic caper.
created_at: "2026-03-12T21:24:44.304Z" (3/12/2026, 9:24:44 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 43635
moderation_latency: (null)
generation_time: 45313
tokens_prompt: 77
tokens_completion: 764
native_tokens_prompt: 184
native_tokens_completion: 10069
native_tokens_completion_images: 0
native_tokens_reasoning: 9428
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0605516
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "d0780abb-180a-de6c-f12d-f88c4359a35c"
    is_byok: false
    latency: 273
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350684-Af90uK4jKjCMuyYxQUT4"
upstream_id: "d0780abb-180a-de6c-f12d-f88c4359a35c"
total_cost: 0.0605516
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
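The token and cost fields in this metadata can be cross-checked with simple arithmetic. The sketch below is only an illustration, not the provider's billing logic: the three per-million-token rates are assumptions that do not appear on this page (they were picked because they reproduce the reported figures), and reading generation_time as milliseconds is also an assumption.

```python
# Values copied from the run metadata.
native_tokens_prompt = 184
native_tokens_completion = 10069
native_tokens_reasoning = 9428
native_tokens_cached = 128
generation_time_ms = 45313  # ASSUMPTION: the unit is milliseconds

# ASSUMED prices in USD per million tokens; not stated anywhere on this page.
ASSUMED_INPUT_RATE = 2.00
ASSUMED_OUTPUT_RATE = 6.00
ASSUMED_CACHED_INPUT_RATE = 0.20

# How much of the completion was hidden reasoning versus visible text.
reasoning_share = native_tokens_reasoning / native_tokens_completion
visible_tokens = native_tokens_completion - native_tokens_reasoning

# Rough throughput over the whole generation, reasoning included.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# Cached prompt tokens are assumed to be billed at the cheaper cached rate,
# so the discount is the price difference applied to the cached tokens.
cache_discount = (
    native_tokens_cached
    * (ASSUMED_INPUT_RATE - ASSUMED_CACHED_INPUT_RATE)
    / 1_000_000
)
gross_cost = (
    native_tokens_prompt * ASSUMED_INPUT_RATE
    + native_tokens_completion * ASSUMED_OUTPUT_RATE
) / 1_000_000
estimated_cost = gross_cost - cache_discount

print(f"reasoning share: {reasoning_share:.1%}")
print(f"visible completion tokens: {visible_tokens}")
print(f"throughput: {tokens_per_second:.0f} tokens/s")
print(f"estimated cost: ${estimated_cost:.7f}")
```

Under these assumed rates the estimate agrees with the reported total_cost and cache_discount, and it shows that most of the native completion tokens were spent on reasoning rather than on the visible scene.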
Evaluation details
Evaluator: Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false; words: 546

Evaluator: Dialogue to Total Word Ratio
Result: 98.30%
Details: Ratio: 69.41%, Deviation: 0.59%
Meta Data: neededClean: false; wordsTotal: 546; wordsDialogue: 379

Overall: 49.1486%
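The reported ratio and deviation follow directly from the wordsDialogue and wordsTotal counts. The sketch below reproduces that arithmetic. The helper functions are hypothetical: the page does not say how the evaluator tokenizes words or detects dialogue, so splitting on whitespace and counting text inside straight double quotes are assumptions. Treating the overall score as an unweighted mean of the two evaluator scores is also an assumption, although it is consistent with the numbers shown.

```python
import re

TARGET_RATIO_PERCENT = 70.0  # from the scenario: 70% dialogue
TARGET_WORDS = 500           # from the scenario: 500 words


def count_words(text: str) -> int:
    """Hypothetical word counter: whitespace-separated tokens."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Hypothetical dialogue counter: words inside straight double quotes."""
    quoted_segments = re.findall(r'"([^"]*)"', text)
    return sum(len(segment.split()) for segment in quoted_segments)


def ratio_percent(dialogue_words: int, total_words: int) -> float:
    """Dialogue share of all words, as a percentage."""
    return 100.0 * dialogue_words / total_words


# Counts copied from the evaluation details.
words_dialogue = 379
words_total = 546

ratio = ratio_percent(words_dialogue, words_total)
deviation = abs(TARGET_RATIO_PERCENT - ratio)
word_count_error = abs(words_total - TARGET_WORDS) / TARGET_WORDS

# ASSUMPTION: overall score is the plain average of the evaluator scores.
overall = (0.00 + 98.30) / 2

print(f"ratio: {ratio:.2f}%, deviation: {deviation:.2f}%")
print(f"word count is off by {word_count_error:.1%} from the target")
print(f"overall (assumed mean): {overall:.2f}%")

# Tiny usage example for the hypothetical helpers.
sample = 'She said "hello there friend" and left.'
print(count_dialogue_words(sample), count_words(sample))
```

The ratio evaluator's own score (98.30%) is not recomputed here because the page does not state how the deviation maps to a score, and the tolerance applied by the word count evaluator is likewise not given.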