NC Bench
Run: 2026-03-12-01KKHZ4CXCJKQNAVK2KTJZKF62
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 1m 2s
Cost: $0.074292
Provider: xAI
Evaluation summary
Score | Evaluator
9.61% | Matches word count
22.15% | Dialogue to Total Word Ratio
15.8790% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In a small dimly lit apartment cluttered with heist gear and half-eaten takeout, two figures huddled over a crudely drawn map of a local community hall. Molly, with her curly hair tied back, pointed at different areas while her partner Bill looked on with a mixture of amusement and trepidation. They were experienced thieves but this job was a new low or perhaps a new high in absurdity. "Bill, this is the big one. We are going to crash that children's party and take every single present they have stacked up there," Molly declared with a gleam in her eye. Bill leaned back in his chair and crossed his arms. "Are you serious? A children's party? What kind of thieves are we becoming? Next we'll be stealing candy from babies. Literally." "Stop being such a wimp. It's not like we are taking their favorite toys from their hands. These are unopened gifts from guests. We can sell the new items and make a nice profit. No one gets hurt and the kids will get over it," Molly shot back as she adjusted the map. "But the plan has to be flawless. How do we get in without raising suspicion?" Bill inquired while rubbing his temples. "Simple. We pose as the hired clowns. I have the outfits ready. You will be Bozo the magician and I will be Bubbles the clown. You keep the kids and parents entertained with your so called magic tricks and silly jokes. That gives me the opportunity to sneak the presents out the back door to our waiting van," Molly explained. Bill chuckled despite himself. "Me, a magician? I can barely pull a coin from behind an ear without dropping it. And what if they expect real entertainment? My jokes are terrible." "Your jokes are perfect for kids. They have low standards. Just do the one about the chicken crossing the road or something. Keep them laughing. I will be busy emptying the gift table. We'll have the loot out in under ten minutes," Molly reassured him as she stood up to demonstrate a fake magic trick with a coin. "Then what? 
We can't just drive off with a van full of wrapped boxes without looking suspicious," Bill pointed out. "The van is decorated with birthday balloons and signs. It will look like we are part of the event. After we load up, we drive away slowly like normal party service people," Molly explained. "Okay, I see the vision. It's crazy but it might just work. Let's go over the timing again to make sure we don't overlap with the cake cutting," Bill said. They spent the next few minutes fine tuning their strategy, with Molly doing most of the talking and Bill adding his concerns and suggestions. The pair seemed to be getting more confident in their ridiculous scheme as the conversation progressed.
created_at: "2026-03-12T21:27:06.929Z" (3/12/2026, 9:27:06 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 60061
moderation_latency: (null)
generation_time: 61589
tokens_prompt: 77
tokens_completion: 685
native_tokens_prompt: 184
native_tokens_completion: 12359
native_tokens_completion_images: 0
native_tokens_reasoning: 11804
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0742916
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "89b33d5b-f81a-17ae-bf85-d1e3cf8e754d"
    is_byok: false
    latency: 325
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350826-hiDV9hzcEmWJn9mpk2Vx"
upstream_id: "89b33d5b-f81a-17ae-bf85-d1e3cf8e754d"
total_cost: 0.0742916
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Result | Evaluator | Details | Meta Data
9.61% | Matches word count | n/a | neededClean: false, words: 478
22.15% | Dialogue to Total Word Ratio | Ratio: 64.51%, Deviation: 5.49% | neededClean: false, wordsTotal: 479, wordsDialogue: 309
15.8790% | Overall
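The reported ratio and deviation can be reproduced from the word counts in the meta data. The sketch below recomputes them, taking the 70% target from the scenario description. It also checks one guess about the overall figure: that it is the unweighted mean of the two evaluator scores. That aggregation rule is an assumption and is not documented on this page, and how each evaluator maps its raw measurement to a percentage score is not shown here, so it is not reproduced.

```python
# Values copied from the evaluation details on this page.
words_dialogue = 309
words_total = 479
target_ratio = 0.70  # from the scenario: "Write 500 words with 70% dialogue"

ratio = words_dialogue / words_total
deviation = abs(target_ratio - ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 64.51%, Deviation: 5.49%"

# Assumption: the overall score is the unweighted mean of the evaluator scores.
scores = [9.61, 22.15]
overall = sum(scores) / len(scores)
print(f"Mean of displayed scores: {overall:.2f}%")
```

The mean of the displayed scores comes out at 15.88%, which is consistent with the reported 15.8790% if the page averages unrounded scores.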