NC Bench
Run: 2026-05-01-01KQHSHRE9WFN00FHFYX9G4QQK
Run Details
Model: Grok 4.20
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 11.5s
Cost: $0.001825
Provider: xAI
Evaluation summary
Score | Evaluator
99.38% | Matches word count
0.00% | Dialogue to Total Word Ratio
49.6885% | Overall
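The overall figure is consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the sketch below treats that as an assumption; the function name `overall_score` is hypothetical and not part of NC Bench.

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores, in percent.

    Assumption: the overall score is a plain average. The report
    does not document how it is actually aggregated.
    """
    if not scores:
        raise ValueError("at least one evaluator score is required")
    return sum(scores) / len(scores)


# Scores as displayed in the summary: word count, dialogue ratio.
evaluator_scores = [99.38, 0.00]
overall = overall_score(evaluator_scores)
print(f"{overall:.2f}%")
```

Averaging the displayed values gives roughly 49.69%, which agrees with the reported 49.6885% to within the rounding of the word-count score.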
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the overgrown hedge, adjusting her fake mustache for the third time. The distant sound of cartoon music and high-pitched laughter drifted from the backyard party like some kind of sugar-fueled nightmare. Bill slouched beside her, fiddling with a comically large pair of binoculars that had a sticker of a cartoon dinosaur on the side. “This is the dumbest thing we’ve ever done,” Molly whispered, “and we once tried to rob a bank dressed as clowns.” Bill grinned, not taking his eyes off the chaos of balloons and screaming kids. “Exactly. We’re due for a win. Look at that table, Mol. Presents stacked like a pirate’s treasure. One piñata, twenty screaming gremlins, and a cake the size of a tractor tire. We go in, we blend, we snatch, we vanish.” Molly rolled her eyes so hard it was audible. “Blend? Bill, you’re six-foot-four and wearing a sweater with a reindeer on it. In July. We’re not blending, we’re looming.” “Kids love looms,” he said confidently. “I’ll be the fun uncle. You’re the face-painting lady. We practiced this.” “I practiced not vomiting from the smell of cheap frosting. There’s a difference.” She peeked again. A small boy was currently using a lightsaber to decapitate a row of juice boxes. “Target is the gift table by the bouncy castle. We create a distraction, maybe release that bunch of balloons, cause a stampede, then I slip in while you do your… whatever it is you do.” Bill chuckled. “My charm. My raw animal magnetism. Also I brought bubbles. Kids go feral for bubbles. I’ll start a bubble war on the far side of the yard. You grab the goods. Simple.” Molly stared at him. “You’re an actual child in an adult body, you know that?” “Explains why I fit in so well at these things.” He nudged her with his elbow. “Come on. Think of it as performance art. We’re not thieves, we’re… present redistribution specialists. Robin Hood with more glitter.” She sighed, checking the tiny sack hidden under her jacket. “Fine. 
But if I get recognized by some sticky-handed monster who saw me at my niece’s party last month, I’m selling you out faster than you can say ‘felony.’” “Noted. I’ll create a diversion by pretending to be a malfunctioning mascot. I’ve got the head in the van.” Molly paused. “You have a mascot head.” “Gorilla named Gary. Very convincing.” She pinched the bridge of her nose. “We’re going to prison.” “Only if we’re boring about it,” Bill said, standing up and cracking his knuckles. “Ready to crash the birthday party of the year?” Molly adjusted her mustache one last time and muttered, “I hate how much I love you right now.” They stepped out from behind the hedge, two ridiculous adults walking straight into enemy territory with nothing but bubbles, bad ideas, and an unshakable belief that this was somehow going to work. The children’s laughter grew louder. So did the faint, ridiculous hope that today they’d finally pull off the world’s stupidest heist.
created_at: "2026-05-01T12:52:20.557Z" (5/1/2026, 12:52:20 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 494
moderation_latency: (null)
generation_time: 11147
tokens_prompt: 77
tokens_completion: 812
native_tokens_prompt: 182
native_tokens_completion: 666
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0018253
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "51a316c3-216c-27d5-5982-02656d0651bc"
    is_byok: false
    latency: 142
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777639940-4AGwqL72ZPphOcGgoC6t"
session_id: (null)
api_type: "completions"
id: "gen-1777639940-Ozg389gywsXeRb1SI55b"
upstream_id: "51a316c3-216c-27d5-5982-02656d0651bc"
total_cost: 0.0018253
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
99.38% | Matches word count | n/a | neededClean: false, words: 505
0.00% | Dialogue to Total Word Ratio | Ratio: 59.69%, Deviation: 29.69% | neededClean: false, wordsTotal: 511, wordsDialogue: 305
49.6885% | Overall
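The reported ratio and deviation can be reproduced from the recorded word counts. The sketch below assumes the ratio is dialogue words divided by total words and the deviation is the absolute distance from the 30% target in the scenario; the names `dialogue_ratio` and `TARGET_PERCENT` are hypothetical. How the evaluator maps a deviation to its 0.00% score is not shown on the page, so that step is left out.

```python
TARGET_PERCENT = 30.0  # target from the scenario: "30% dialogue"


def dialogue_ratio(words_dialogue, words_total):
    """Share of dialogue words in the text, in percent."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


# Values from the evaluator meta data: wordsDialogue and wordsTotal.
ratio = dialogue_ratio(305, 511)
deviation = abs(ratio - TARGET_PERCENT)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

With 305 dialogue words out of 511, the response is roughly 60% dialogue, about double the requested share.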