NC Bench
Run: 2026-03-12-01KKHZ7EB3E55NG47CWHBQMEQZ
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 19.3s
Cost: $0.030444
Provider: xAI
Evaluation summary
Score      Evaluator
0.00%      Matches word count
0.00%      Dialogue to Total Word Ratio
0.0000%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
In the dimly lit back of their beat-up van, which was parked discreetly across the street from the brightly decorated hall hosting the children's birthday party, Molly and Bill sat with their heads close together, examining a hand-drawn map and some blurry photographs they had taken earlier. The sound of cheerful music and high-pitched giggles floated through the air, making the whole endeavor feel even more ridiculous. Balloons of every color swayed gently in the wind outside, and a large sign announced the celebration for little Timmy's seventh birthday, complete with promises of games, cake, and of course, a mountain of presents that the duo had set their sights on. This is crazy, Bill. We're actually going to steal presents from kids? Molly asked, her brow furrowed in concern as she adjusted her ridiculous fake mustache disguise that Bill had insisted upon. Bill chuckled softly, his eyes gleaming with the kind of mischief that usually got them into trouble. Hey, it's not stealing if we consider ourselves the fun police. Those gifts are too good for one kid. We'll distribute them to... uh, ourselves. Molly rolled her eyes dramatically but couldn't quite suppress the smirk that tugged at the corners of her mouth. The absurdity of their plan was not lost on her, yet here they were, preparing to infiltrate a place full of sugar-high toddlers and overprotective parents. She pointed at the map, tracing a path through the side entrance that looked least guarded. Okay, genius. How do we get in without raising suspicion? The place is crawling with parents and screaming children. Simple, Bill said, pulling out a large colorful bag filled with props from under his seat. We pose as the surprise entertainers. I'll juggle and tell bad jokes while you distract everyone with magic tricks. Then, when the cake arrives, we make our move on the gift table. 
She leaned in closer, her voice dropping to a conspiratorial whisper as she glanced nervously out the window at the party guests arriving. What about the alarm system? I heard they have security for these events nowadays. Pfft, kid stuff, Bill replied dismissively, waving his hand as if swatting away her worries like an annoying fly. We'll use the classic diversion: release some balloons inside to create chaos. In the pandemonium, we grab the goods and slip out the back. The van fell into a brief silence as Molly considered the plan, watching through the tinted glass as a group of children ran around the lawn with party hats askew. Outside, a colorful pinata in the shape of a superhero was being enthusiastically whacked by a blindfolded kid, sending candy flying in all directions. It really did seem like the perfect chaotic cover for their operation, though the ethical implications nagged at her slightly. Still, in their line of work, one couldn't be too picky about the targets, especially when the loot looked so promising. Fine, but if we get caught, I'm blaming your stupid mustache idea, Molly quipped, finally giving in with a reluctant smile. Bill laughed heartily, patting her on the shoulder. Fair enough. Now let's synchronize our watches. We strike at cake time. Those presents will be ours before the kids even know what's missing. They reviewed the layout one more time, carefully marking potential escape routes through the backyard and timing their entry to coincide with the peak of the festivities. The heist, which they had dubbed Operation Birthday Bandit, was shaping up to be their most whimsical caper yet, filled with risks but also the potential for hilarious stories if they succeeded. As they prepared their disguises and gathered their tools, the two thieves couldn't help but feel a thrill of excitement mixed with the comedic dread of what could go wrong in a room full of six-year-olds.
Meta
created_at: "2026-03-12T21:28:46.695Z" (3/12/2026, 9:28:46 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 17217
moderation_latency: (null)
generation_time: 18971
tokens_prompt: 77
tokens_completion: 970
native_tokens_prompt: 184
native_tokens_completion: 5051
native_tokens_completion_images: 0
native_tokens_reasoning: 4302
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0304436
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "d67f1af3-170c-e15d-cd8b-c793fe8014e0"
    is_byok: false
    latency: 266
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350926-KKLbUbzxZn2x2eXbduNI"
upstream_id: "d67f1af3-170c-e15d-cd8b-c793fe8014e0"
total_cost: 0.0304436
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Evaluator: Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false; words: 635

Evaluator: Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 0.00%, Deviation: 30.00%
Meta Data: neededClean: false; wordsTotal: 641; wordsDialogue: 0

Overall: 0.0000%
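The dialogue evaluator reports wordsDialogue: 0 even though the response contains spoken lines. One plausible reading is that dialogue is detected by quotation marks, which the response omits. The sketch below illustrates that reading only. The benchmark's actual evaluator is not shown on this page, so the function names, the quote-matching rule, and the whitespace word counter are all assumptions, and the word counts it produces need not match the 635 and 641 reported above.

```python
import re

# Assumption: dialogue is any text enclosed in straight or curly double quotes.
# This is a hypothetical rule, not the benchmark's documented behavior.
QUOTED = re.compile(r'"([^"]*)"|“([^”]*)”')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simplistic word counter)."""
    return len(text.split())


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, dialogue_words / total_words)."""
    total = count_words(text)
    # findall yields one (straight, curly) pair per match; one side is empty.
    dialogue = sum(count_words(a or b) for a, b in QUOTED.findall(text))
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation(ratio: float, target: float = 0.30) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Speech without quotation marks is not counted as dialogue under this rule.
unquoted = "This is crazy, Bill. Molly asked, her brow furrowed."
# The same kind of line with quotation marks is counted.
quoted = '"This is crazy, Bill," Molly said.'

print(dialogue_ratio(unquoted))
print(dialogue_ratio(quoted))
print(deviation(dialogue_ratio(unquoted)[2]))
```

Under this rule, a response with no quotation marks gets a ratio of 0 and a deviation equal to the full 30% target, which is consistent with the "Ratio: 0.00%, Deviation: 30.00%" detail line.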