NC Bench
Run: 2026-05-01-01KQHRFC17RJDQWQFNM471H864
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 22.5s
Cost: $0.001928
Provider: xAI
Evaluation summary
Score     Evaluator
2.01%     Matches word count
0.00%     Dialogue to Total Word Ratio
1.0058%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her oversized clown wig in the rearview mirror of their beat-up van, parked across from the bouncy castle that was currently terrorizing a horde of sugar-crazed six-year-olds. “Bill, if you say ‘piñata protocol’ one more time, I’m shoving that rubber chicken up your nose.”

Bill grinned around a mouthful of Cheetos, orange dust coating his fake mustache like radioactive pollen. “Piñata protocol is solid, Mol. We wait for the lights to go out during musical chairs, then we roll in like we’re the entertainment. I’ve got the bubble gun. You’ve got the sack. Classic misdirection.”

“Classic idiocy,” Molly muttered, scanning the yard where a harried mom in a pastel cardigan was wrestling a piñata shaped like a sentient cupcake. “These aren’t hardened criminals, Bill. These are children. Tiny, shrieking landmines with juice boxes. One wrong move and we’ll be buried under a pile of screaming kindergarteners demanding we sing ‘Baby Shark.’”

Bill waved a hand dismissively, sending Cheeto dust into the air like cheap orange glitter. “That’s why we blend, babe. I’m going as Professor Tickles, world-famous balloon artist. You’re my lovely assistant, Miss Fizzle. We’ll dazzle them with balloon dogs, then while they’re all oohing and aahing, we liberate the gift table. Those presents aren’t gonna steal themselves.”

Molly pinched the bridge of her nose. “Professor Tickles? Last time you tried balloon animals you gave a kid a balloon that looked like a depressed octopus and he cried for forty minutes. His mom still sends you hate mail.”

“That was performance art,” Bill protested, checking the clasps on his polka-dot suspenders. “And this time I’ve upgraded. I’ve got the good helium. Plus, I swiped a smoke bomb from that last fireworks heist. Little poof, big distraction, we grab the loot and vanish before the parents even finish their lukewarm coffee.”

“Smoke bomb?” Molly’s eyes widened in horror. “At a children’s party? Bill, that’s not a distraction, that’s a federal incident. Next thing you know we’re on the news as the monsters who made Timmy’s seventh birthday smell like burnt regrets.”

Bill leaned over and poked her fake rubber nose, earning a murderous glare. “Relax. It’s a tiny one. Looks like a whoopee cushion. We set it off behind the bounce house, blame it on little Kevin with the bean burrito problem, and while everyone’s pointing fingers we waltz out with enough Barbies and Legos to start our own black market. Think of the retirement fund, Mol. No more knocking over liquor stores. Just one glorious afternoon of pastel-colored larceny.”

Molly stared at him for a long moment, then sighed the sigh of a woman who had accepted her fate. “Fine. But if I have to sing ‘Happy Birthday’ even once, I’m taking my cut in cash and leaving you to explain the missing My Little Pony castle to an army of angry soccer moms.”

Bill beamed, sliding open the van door. “That’s my girl. Let’s go make some birthday memories. The kind they’ll talk about in therapy for years.”

Molly pulled her wig down lower and followed him into the chaos, sack in hand. “I hate you.”

“Love you too, Miss Fizzle.”
created_at
"2026-05-01T12:33:33.739Z"
5/1/2026, 12:33:33 PM
model
"x-ai/grok-4.20-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
572
moderation_latency
(null)
generation_time
9709
tokens_prompt
77
tokens_completion
852
native_tokens_prompt
182
native_tokens_completion
707
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
64
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0019278
router
(null)
provider_responses
0
endpoint_id
"3ecee37f-b217-4093-87fb-aaf0afe307af"
id
"d7daa9a9-b6d3-9d13-a2b4-4dc1f3911488"
is_byok
false
latency
114
model_permaslug
"x-ai/grok-4.20-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777638813-0Z8aG7VGCYMcoaTRK7kK"
session_id
(null)
api_type
"completions"
id
"gen-1777638813-hHvLrRg13KLTJFsARrii"
upstream_id
"d7daa9a9-b6d3-9d13-a2b4-4dc1f3911488"
total_cost
0.0019278
cache_discount
0.0000672
upstream_inference_cost
0
provider_name
"xAI"
response_cache_source_id
(null)
Evaluation details
Evaluator: Matches word count
Result: 2.01%
Details: n/a
Meta Data: neededClean = false, words = 525

Evaluator: Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 69.36%, Deviation: 39.36%
Meta Data: neededClean = false, wordsTotal = 532, wordsDialogue = 369

Overall: 1.0058%
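
The ratio and deviation reported for the "Dialogue to Total Word Ratio" evaluator can be reproduced from the two word counts in its metadata. The sketch below is illustrative only: the function names are hypothetical, the 30% target is taken from the scenario text, and the quote-matching rule (words between curly double quotes, split on whitespace) is an assumption about how dialogue might be counted, not NC Bench's actual implementation. How the evaluator turns the deviation into a score is not shown on this page and is not modeled here.

```python
import re


def ratio_and_deviation(words_dialogue, words_total, target_pct=30.0):
    """Return (ratio %, absolute deviation from target in percentage points)."""
    if words_total == 0:
        return 0.0, target_pct
    ratio = 100.0 * words_dialogue / words_total
    return ratio, abs(ratio - target_pct)


def count_dialogue_words(text):
    """Assumed counting rule: whitespace-separated words, with dialogue
    being anything between a curly opening and closing double quote."""
    words_total = len(text.split())
    spans = re.findall(r"“([^”]*)”", text)
    words_dialogue = sum(len(span.split()) for span in spans)
    return words_dialogue, words_total


# Using the counts reported in the evaluator's metadata.
ratio, deviation = ratio_and_deviation(369, 532)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

With the reported counts (369 dialogue words out of 532), this gives a ratio of about 69.36% and a deviation of about 39.36 percentage points from the 30% target, matching the Details field above.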