NC Bench
Run: 2026-05-01-01KQHSP89TX3P1BWW8DY188A6E
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 4m 23s
Cost: $0.027298
Provider: xAI
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
99.83% | Dialogue to Total Word Ratio | (details)
49.9137% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill crouched silently behind a large prickly shrub that scratched at their clothing, observing the chaotic children's party with its massive bouncing castle and long tables laden with an abundance of brightly wrapped presents of various shapes and sizes. The sun was high in the clear blue sky, and the air smelled deliciously of barbecue meats and sweet vanilla frosting. This was their target, the ultimate soft target for their latest outrageous scheme that had them both questioning their life choices. "So, the plan is set then," Bill muttered, checking his watch for the tenth time. "We blend in as distant relatives, make small talk, then when they bring out the piñata we make our move." Molly sighed, adjusting her ill-fitting wig. "Blending in as relatives? With your face, Bill? You look more like a bank robber than an uncle. And the piñata move is risky because all the kids will be swinging that bat. We could get hit." "Risk is our middle name," Bill said with a smirk. "We wait for the swing, create a diversion by releasing the balloons early, and in the mayhem snatch the table of gifts. Simple." "Simple for you maybe," Molly replied. "I'm the one who has to distract the parents with my terrible magic tricks. What if they ask me to pull a rabbit out? I have no rabbit, Bill." "You use your imagination," Bill suggested. "Pull a toy from behind their ear or something. Then we load the stuff into the getaway van parked nearby. After that it's smooth sailing to the fence who deals in hot toys." "What about the moral issue here?" Molly asked, her voice dropping. "These are kids' presents. It's not like stealing from a corporation. I might have trouble sleeping if little Emma doesn't get her princess set." Bill rolled his eyes. "We'll donate some of the proceeds to a toy drive. That way it's like we're redistributing the wealth. Robin Hood style but with more party hats." "Robin Hood didn't wear a disguise as a goofy aunt," Molly quipped. 
"Fine, let's do it your way. But if we get caught, I'm pretending I don't know you." "That's the spirit," Bill said enthusiastically. "Now let's review the signals. If I tug my ear, it means abort. If I tug my nose, it means go for it." The pair continued their preparations with whispered instructions and occasional giggles at the ridiculousness of it all, treating the childish event like a high security vault in a Hollywood blockbuster. Their hearts beat faster and louder as the party reached its frenetic peak with games and music blaring, knowing that soon they would launch their comedic caper to steal the presents from under the noses of the unsuspecting celebrants. The absurdity of the situation was not lost on either of them as they exchanged nervous glances, but the lure of easy bounty was too strong to ignore despite the potential for complete humiliation. With one final nod of agreement, they stood up straightened their disguises and began to walk casually toward the party entrance, ready to execute their plan with as much grace as two bumbling incompetent thieves could possibly muster in such a silly scenario.
created_at
"2026-05-01T12:54:47.871Z"
5/1/2026, 12:54:47 PM
model
"x-ai/grok-4.20-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
262668
moderation_latency
(null)
generation_time
262749
tokens_prompt
77
tokens_completion
1039
native_tokens_prompt
184
native_tokens_completion
10854
native_tokens_completion_images
(null)
native_tokens_reasoning
10217
native_tokens_cached
64
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0272978
router
(null)
provider_responses
0
endpoint_id
"3ecee37f-b217-4093-87fb-aaf0afe307af"
id
"00a2026f-c110-9966-9d35-4c82f5a6fe67"
is_byok
false
latency
85
model_permaslug
"x-ai/grok-4.20-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777640087-lWaYnOuCXaclU78GjPCW"
session_id
(null)
api_type
"completions"
id
"gen-1777640087-aDOiGjcAUpCctisXMb2O"
upstream_id
"00a2026f-c110-9966-9d35-4c82f5a6fe67"
total_cost
0.0272978
cache_discount
0.0000672
upstream_inference_cost
0
provider_name
"xAI"
response_cache_source_id
(null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 537
99.83% | Dialogue to Total Word Ratio | Ratio: 50.19%, Deviation: 0.19% | neededClean: false, wordsTotal: 538, wordsDialogue: 270
49.9137%
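The "Ratio: 50.19%, Deviation: 0.19%" figures in the evaluation details can be reproduced from the reported counts (wordsDialogue 270, wordsTotal 538) against the scenario's 50% target. The sketch below shows that arithmetic, plus one possible way to obtain such counts from a text. The function names, the whitespace-based word split, and the straight-double-quote rule for dialogue are assumptions for illustration; the evaluator's actual counting rules are not shown on this page, and they may differ (the two evaluators above report 537 and 538 total words for the same response). How the 99.83% score is derived from the deviation is likewise not shown and is not modeled here.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: a "word" is any whitespace-separated token, and "dialogue"
    is everything between pairs of straight double quotes.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio: float, target: float = 0.5) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Counts taken from the evaluation details above.
reported_ratio = 270 / 538
reported_deviation = deviation_from_target(reported_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {reported_deviation:.2%}")
# prints "Ratio: 50.19%, Deviation: 0.19%"
```

Running `dialogue_ratio` on the model response itself is not guaranteed to return exactly 270 and 538, since that depends on the tokenization assumptions noted above.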