NC Bench
Run: 2026-05-01-01KQHQG46QBKB9AJ5M4311Z4QY
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 1m 48s
Cost: $0.019488
Provider: xAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0000% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill sat at a small table covered with sketches of the party venue in their cramped hideout apartment. The two thieves were finalizing their outrageous plan to steal all the presents from the children's party. They spoke in hushed tones despite being alone. "Bill, stealing all the presents from a children's party is the most ridiculous idea you've ever had. How are we supposed to pull this off without a bunch of angry parents and crying kids on our tail? This could be a PR nightmare for our criminal careers," Molly asked. "Relax, Molly. It's brilliant because it's so unexpected. No one will suspect the friendly clown and his assistant. I'll keep the children entertained with my hilarious antics and balloon twisting while you quietly relocate the gifts to our waiting cart. We'll be in and out before the cake is cut and the parents realize what happened," Bill replied. "But your balloon twisting always ends in disaster. Remember the poodle that looked like a giraffe? And what if the kids want autographs or something? We need a better cover story. Maybe we can be the uncle and aunt from out of town who are helping with the party and know where the presents are kept," Molly said. "Uncles and aunts get asked too many questions like who is whose kid. The entertainer gig is perfect. I've already made the call pretending to be the mother and confirmed our arrival for the event. When I yell magic time that's your cue to start bagging the presents fast. The distraction will be me pulling a never-ending scarf from my sleeve that will mesmerize them all for minutes," Bill explained. "Okay, fine. But we need an escape plan that doesn't involve tripping over our oversized shoes in a panic. The back door leads to the alley where the van will be parked with the engine running. We'll have it loaded in under two minutes flat. What do we do if one of the kids catches us in the act of taking their new toys?" Molly inquired. 
"We tell them it's part of the show and not to worry. Yes little Timmy the presents are going into the magic vanishing box to reappear at the end of the party after the cake. By then we'll be long gone down the road. This score will keep us in business for months with all those expensive toys that will fetch a pretty penny on the black market," Bill said. "I still feel bad about it but you're right it's a soft target with no alarms no cameras and just lots of noise and chaos to cover our moves. Let's review the timing once more. We arrive at two forty five set up our props quietly and strike right after the presents are placed on the table but before they're opened by the excited children," Molly stated. "Exactly. Trust me this will be legendary in thief circles. The great children's party heist will go down in history. Our names will be whispered with awe by fellow crooks everywhere," Bill added. "Or we'll be known as the lowest of the low in the criminal underworld. But okay let's make it happen anyway. Just promise me no real magic tricks that could go wrong and set the place on fire or cause any real harm to the little ones," Molly concluded as they shook hands on the deal.
created_at: "2026-05-01T12:16:29.917Z" (5/1/2026, 12:16:29 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 101884
moderation_latency: (null)
generation_time: 108250
tokens_prompt: 77
tokens_completion: 2771
native_tokens_prompt: 184
native_tokens_completion: 7757
native_tokens_completion_images: (null)
native_tokens_reasoning: 7115
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0194881
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "b4323717-4612-9963-b46e-e80465936fb6"
    is_byok: false
    latency: 109
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777637789-AP7TaLpSqphOSwOUCXSF"
session_id: (null)
api_type: "completions"
id: "gen-1777637789-p6sw4jK702bOTy98eevG"
upstream_id: "b4323717-4612-9963-b46e-e80465936fb6"
total_cost: 0.0194881
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 572
0.00% | Dialogue to Total Word Ratio | Ratio: 87.78%, Deviation: 17.78% | neededClean: false, wordsTotal: 573, wordsDialogue: 503
0.0000%
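The reported ratio and deviation can be reproduced from the word counts above. The following is a minimal sketch, not the benchmark's actual evaluator: the function names `dialogue_ratio` and `deviation_from_target` are hypothetical, and counting words inside straight double quotes is an assumption about how dialogue might be identified. The 70% target comes from the scenario description.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: dialogue is any span enclosed in straight double quotes,
    and words are whitespace-separated tokens.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio: float, target: float = 0.70) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Check the figures from the evaluation details:
# wordsDialogue = 503, wordsTotal = 573.
reported_ratio = 503 / 573
print(f"Ratio: {reported_ratio:.2%}")                             # Ratio: 87.78%
print(f"Deviation: {deviation_from_target(reported_ratio):.2%}")  # Deviation: 17.78%
```

Calling `dialogue_ratio` on the model response text would give a rough count under these assumptions; the exact figures may differ from the evaluator's if it tokenizes words or detects quotes differently.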