NC Bench
Run: 2026-03-12-01KKHZJ5SG4ZSJZY753V5J2P3N
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 37.6s
Cost: $0.057534
Provider: xAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
29.95% | Dialogue to Total Word Ratio
14.9756% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill huddled together in their secret basement lair. Breaking into a children's party to steal all the presents was a plan born from desperation and a lack of better ideas. They knew it sounded absurd, but in the world of small time thieves, sometimes you had to think outside the box or in this case, inside the gift wrap. Molly paced the room while Bill sat with a notebook, jotting down ideas that mostly involved food and avoiding angry parents. "Okay, let's go over this one more time. We arrive at the party disguised as clowns hired for the event. The real ones have been sent on a wild goose chase across town," Molly said. The map they had was crudely drawn but accurate enough to show the layout of the house, the location of the gift table, and the best escape routes through the backyard filled with lawn games and a large inflatable slide. They had spent the better part of the morning gathering their supplies, which included colorful wigs, face paint, oversized shoes, and a sack that looked like it belonged at a circus. "Clowns? I hate clowns. Why can't we be magicians? Magicians are cooler and they have capes to hide stuff under," Bill complained. The party was expected to have about twenty kids, all under the age of ten, along with several adults who might be distracted by their own conversations and the need to keep the little ones in line. This provided the perfect opportunity for the two thieves to slip in and out unnoticed if they played their roles correctly. "Because clowns are expected at these things, Bill. Now stop whining and listen. While the kids are playing games, we start collecting the presents into our big red bag," Molly said. Bill was nervous about interacting with the children, fearing he might say something wrong or trip over his own feet in the oversized clown shoes. Molly, however, was confident in her ability to charm the crowd if necessary and keep the plan on track. 
They reviewed the timing again and again, making sure they wouldn't overlap with the gift opening portion of the party. The neighborhood was quiet, which meant less chance of interference from nosy neighbors, but they still planned to use code words in case something unexpected happened. "We'll replace them with empty boxes wrapped nicely. By the time they find out, we'll be long gone. The key is timing it right after the cake but before the opening of gifts," Molly explained. "The signal is me yelling ice cream truck and then we bolt for the back door. Our getaway car will be waiting in the alley," she added. As the clock ticked closer to the start time, the tension in the room grew, but so did their excitement for this unusual score. Bill nodded in agreement, feeling a bit more confident now that the plan was laid out clearly. They gathered their materials and began to put on their disguises, laughing at how silly they looked in the mirror. This heist was sure to go down in their personal history as the funniest one yet, even if it didn't net them as much as they hoped. The thought of pulling off such a silly crime made them both smile as they prepared to head out.
created_at: "2026-03-12T21:34:38.39Z" (3/12/2026, 9:34:38 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 35570
moderation_latency: (null)
generation_time: 36972
tokens_prompt: 77
tokens_completion: 781
native_tokens_prompt: 184
native_tokens_completion: 9566
native_tokens_completion_images: 0
native_tokens_reasoning: 8926
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0575336
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "91b96590-85a8-5101-7db1-5cd7f98864a8"
    is_byok: false
    latency: 273
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773351278-TfiQf3EpI7Zb3MDmzVN2"
upstream_id: "91b96590-85a8-5101-7db1-5cd7f98864a8"
total_cost: 0.0575336
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
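The token counters in the metadata can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on three assumptions that the page does not state: that `native_tokens_reasoning` is counted inside `native_tokens_completion`, that `native_tokens_cached` is counted inside `native_tokens_prompt`, and that `generation_time` is in milliseconds. The derived variable names are hypothetical.

```python
# Values copied from the run metadata.
native_tokens_prompt = 184
native_tokens_cached = 128
native_tokens_completion = 9566
native_tokens_reasoning = 8926
generation_time_ms = 36972

# Assumption: reasoning tokens are a subset of completion tokens.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Assumption: cached tokens are a subset of prompt tokens.
uncached_prompt_tokens = native_tokens_prompt - native_tokens_cached

# Assumption: generation_time is reported in milliseconds.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"visible completion tokens: {visible_tokens}")
print(f"reasoning share of completion: {reasoning_share:.1%}")
print(f"uncached prompt tokens: {uncached_prompt_tokens}")
print(f"native tokens per second: {tokens_per_second:.1f}")
```

Under these assumptions, most of the completion budget went to reasoning rather than to the visible scene text.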
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 558
29.95% | Dialogue to Total Word Ratio | Ratio: 25.09%, Deviation: 4.91% | neededClean: false; wordsTotal: 558; wordsDialogue: 140
14.9756% | Overall
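The reported ratio and deviation can be reproduced from the word counts in the evaluation details. The following is a minimal sketch, not the benchmark's actual evaluator: the helper names are hypothetical, the quote-based dialogue counter is only one plausible way to obtain `wordsDialogue`, and treating the overall score as an unweighted mean of the two evaluator scores is an assumption. The page does not say how the 29.95% score is derived from the 4.91% deviation, so that step is left out.

```python
import re

TARGET_WORDS = 500           # from the scenario: "Write 500 words"
TARGET_DIALOGUE_PCT = 30.0   # from the scenario: "30% dialogue"


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Assumption: dialogue is whatever sits between straight double quotes."""
    return sum(count_words(q) for q in re.findall(r'"([^"]*)"', text))


def dialogue_ratio_pct(words_dialogue: int, words_total: int) -> float:
    """Dialogue words as a percentage of all words."""
    return 100.0 * words_dialogue / words_total


# Counts reported in the evaluation details.
words_total = 558
words_dialogue = 140

ratio = dialogue_ratio_pct(words_dialogue, words_total)
deviation = abs(TARGET_DIALOGUE_PCT - ratio)
word_overshoot = words_total - TARGET_WORDS

# Assumption: overall score is the unweighted mean of the two displayed scores.
overall_estimate = (0.00 + 29.95) / 2

print(f"ratio: {ratio:.2f}%")
print(f"deviation: {deviation:.2f}%")
print(f"words over target: {word_overshoot}")
print(f"overall estimate: {overall_estimate:.3f}%")
```

The computed ratio and deviation round to the 25.09% and 4.91% shown in the details, and the mean of the displayed scores agrees with the reported 14.9756% up to rounding of the 29.95% figure.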