NC Bench
Run: 2026-05-01-01KQHRG21G17HTZGCR30AKGXJV
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 12.3s
Cost: $0.001918
Provider: xAI
Evaluation summary
Score | Evaluator | Details Link
3.62% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
1.8117% | Overall
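The page does not say how the Overall figure is derived. As a rough check, the sketch below assumes it is the unweighted mean of the two evaluator scores. That is an assumption, not something the page confirms, and `overall_score` is a hypothetical helper name. With the rounded values shown above, the mean is 1.81, which is consistent with the displayed 1.8117% up to rounding of the 3.62% score.

```python
def overall_score(scores):
    """Unweighted mean of per-evaluator scores, in percent (assumed aggregation)."""
    return sum(scores) / len(scores)

# Scores as displayed in the evaluation summary (already rounded by the page).
evaluator_scores = [3.62, 0.00]
overall = overall_score(evaluator_scores)
print(f"{overall:.2f}%")
```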
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her fake clown nose for the third time, smearing greasepaint across her cheek. “Bill, if you say ‘it’s just like the diamond job’ one more time I’m going to shove this balloon animal where the sun doesn’t shine.” Bill grinned, teeth flashing beneath a crooked party hat. “It is just like the diamond job, Mol. Except the diamonds scream if you pick them up too fast and they’re all hopped up on juice boxes.” “We are not calling the presents ‘diamonds.’ That’s how we got caught last time, remember? You kept yelling ‘secure the ice!’ in front of thirty traumatized six-year-olds.” “Fine, fine. We infiltrate the bouncy castle at precisely 14:45 when the magician’s doing his crap rabbit routine. That’s our window. Kids’ll be distracted, parents half-drunk on warm prosecco, and the birthday boy—little Timmy Turnbull—will be face-first in cake.” Molly leaned over the blueprint they’d drawn on the back of a stolen My Little Pony invitation. “You still think the puppet theater is the best exfil route?” “Absolutely. Those velvet curtains are basically a getaway car with sequins. I’ll create a diversion by ‘accidentally’ releasing Gerald the Gerbil from his cage. Chaos ensues. You load the presents into the giant panda costume I borrowed from the mall.” “That’s not a panda, Bill. That’s a very depressed polar bear with mange. And it smells like regret and Cheetos.” “Details. While I’m doing the gerbil rodeo, you drag the bear suit past the gift table. Kids will think it’s part of the entertainment. We’ve got seven minutes before the ice-cream truck arrives and every sugar-crazed gremlin in a ten-mile radius swarms the driveway.” Molly pinched the bridge of her nose, leaving a purple thumbprint. “What about the piñata? Last time we ignored a piñata we almost died under a barrage of miniature Snickers.” “We leave the piñata. Rule number one: never come between children and their suspended sugar sculpture. That thing’s a war crime waiting to happen.” She sighed. 
“This is insane. We’re grown adults planning to rob a children’s party. There’s probably a support group for this.” “Hey, we’re providing a valuable public service. Those kids get way too many toys. We’re practicing population control on the plastic crap market. Plus I saw the gift table, Mol. There’s a limited-edition Lightsaber in there. Limited edition. I’ve been good. Relatively.” “You once set a hotel lobby on fire because the concierge was rude.” “Exactly. Relatively.” Molly checked her watch. “Fine. We hit the clown car at 2:30, you do your creepy balloon tricks to draw the crowd, I swap the real presents with the decoy box of educational math books I stole from the library.” Bill’s face lit up. “And then we vanish like classy thieves.” “Like classy thieves who are definitely going to hell.” He offered his pinky. “Pinky swear we don’t keep the Lightsaber?” Molly hooked her finger around his. “We’ll flip for it in the getaway van. Now stop smiling like that. You look like a children’s party predator.” “That’s the spirit,” Bill said, adjusting his party hat. “Let’s go make some memories nobody’s ever going to believe.”
created_at: "2026-05-01T12:33:56.297Z" (5/1/2026, 12:33:56 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 645
moderation_latency: (null)
generation_time: 12135
tokens_prompt: 77
tokens_completion: 881
native_tokens_prompt: 182
native_tokens_completion: 703
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0019178
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "22e6bcfb-ad1f-947e-b6ef-1e344b8bd9b2"
    is_byok: false
    latency: 357
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777638836-3r9VMVNmcPUit2d7Sljw"
session_id: (null)
api_type: "completions"
id: "gen-1777638836-thm5FuHfJKxNnd7marSW"
upstream_id: "22e6bcfb-ad1f-947e-b6ef-1e344b8bd9b2"
total_cost: 0.0019178
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
3.62% | Matches word count | n/a | neededClean: false, words: 524
0.00% | Dialogue to Total Word Ratio | Ratio: 85.34%, Deviation: 35.34% | neededClean: false, wordsTotal: 532, wordsDialogue: 454
1.8117% | Overall
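The reported ratio and deviation can be reproduced from the metadata counts: 454 dialogue words out of 532 total gives 85.34%, which is 35.34 points from the 50% target in the scenario. The sketch below shows that arithmetic, plus one plausible way to count dialogue words by treating text inside curly double quotes as dialogue. The counting rule and the names `count_dialogue_words` and `ratio_and_deviation` are assumptions for illustration. The evaluator's actual tokenization is not shown on the page.

```python
import re

TARGET_RATIO = 0.50  # from the scenario: "50% dialogue"

def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words).

    Assumption: dialogue is whatever sits between curly double quotes,
    and words are whitespace-separated tokens.
    """
    total = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total

def ratio_and_deviation(dialogue: int, total: int, target: float = TARGET_RATIO):
    """Dialogue share of all words and its absolute distance from the target."""
    ratio = dialogue / total
    return ratio, abs(ratio - target)

# Small illustrative sample (not taken from the evaluator).
sample_counts = count_dialogue_words("“Run now,” Molly said. “Go.”")

# Counts reported in the evaluation metadata: wordsDialogue=454, wordsTotal=532.
ratio, deviation = ratio_and_deviation(454, 532)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 85.34%, Deviation: 35.34%"
```

The page does not show how a deviation is converted into the 0.00% score, so that step is left out here.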