NC Bench
Run: 2026-02-20-01KHY9NV0H3YXX1QXDP0AP3ZC8
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 18.6s
Cost: $0.007960
Provider: xAI
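The reported cost can be cross-checked against the native token counts recorded in the response metadata for this run (744 prompt tokens, 681 of them cached, and 484 completion tokens). The sketch below is only an illustration: the per-million-token prices are assumptions that do not appear on this page, and the helper names `run_cost` and `cache_discount` are hypothetical. It also assumes that all 484 completion tokens, including reasoning tokens, are billed at the output rate.

```python
from decimal import Decimal

# ASSUMED prices in USD per million tokens; these are not stated on this page.
PRICE_INPUT = Decimal("3.00")         # uncached prompt tokens
PRICE_CACHED_INPUT = Decimal("0.75")  # cached prompt tokens
PRICE_OUTPUT = Decimal("15.00")       # completion tokens (reasoning included)
MILLION = Decimal(1_000_000)


def run_cost(prompt_tokens: int, cached_tokens: int, completion_tokens: int) -> Decimal:
    """Hypothetical helper: total cost of one request under the assumed prices."""
    uncached = prompt_tokens - cached_tokens
    return (
        uncached * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + completion_tokens * PRICE_OUTPUT
    ) / MILLION


def cache_discount(cached_tokens: int) -> Decimal:
    """Hypothetical helper: amount saved because cached tokens cost less."""
    return cached_tokens * (PRICE_INPUT - PRICE_CACHED_INPUT) / MILLION


# Token counts taken from the response metadata of this run.
print(run_cost(prompt_tokens=744, cached_tokens=681, completion_tokens=484))
print(cache_discount(cached_tokens=681))
```

Under these assumed prices the two values agree with the `total_cost` and `cache_discount` fields in the metadata, which suggests (but does not prove) that the assumptions match how the run was billed.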
Evaluation summary
Score      Evaluator
99.74%     Matches word count
99.97%     Dialogue to Total Word Ratio
99.8590%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly leaned against the rusty van, eyeing Bill with a smirk. "Alright, genius, what's the play? We're crashing a kiddie bash for presents. This better not end with us dodging dodgeballs." Bill chuckled, adjusting his fake mustache. "Relax, Mol. It's foolproof. The party's at the McAllister mansion—rich folks, loaded with gifts. We'll go in as clowns. You know, big shoes, red noses, the works." "Clowns? Bill, last time you dressed as one, you scared the neighbor's dog into therapy. And what about security? They got cameras everywhere." "Pfft, cameras? I've got my hacker buddy looping the feed with footage of dancing squirrels. We'll slip in during the piñata whack-fest. Kids'll be distracted, parents tipsy on punch. Grab the gift table—boom, we're out with Barbies, Legos, and that one kid's college fund disguised as a teddy bear." Molly snorted. "College fund? You're dreaming. What if a tyke spots us? 'Mommy, the clown's stealing my pony!'" "Then we improvise! Juggle some excuses, throw in a balloon animal distraction. 'Hey kiddo, wanna see a giraffe?' While you stuff sacks. Teamwork, baby!" She rolled her eyes. "Fine, but if we get caught, I'm blaming your squeaky shoes. Deal?" "Deal. Heist of the century—kiddie edition!"
created_at
"2026-02-20T19:50:53.727Z"
2/20/2026, 7:50:53 PM
model
"x-ai/grok-4-07-09"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
5561
moderation_latency
(null)
generation_time
18488
tokens_prompt
77
tokens_completion
326
native_tokens_prompt
744
native_tokens_completion
484
native_tokens_completion_images
0
native_tokens_reasoning
182
native_tokens_cached
681
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.00795975
router
(null)
provider_responses
0
endpoint_id
"d40bef9d-6936-4c57-9749-bf7399de4a77"
id
"39b12f93-6265-0bfe-1050-5259ec60419a"
is_byok
false
latency
53
model_permaslug
"x-ai/grok-4-07-09"
provider_name
"xAI"
status
200
api_type
"completions"
id
"gen-1771617053-YW134GUPkUaxb5HiEKhY"
upstream_id
"39b12f93-6265-0bfe-1050-5259ec60419a"
total_cost
0.00795975
cache_discount
0.00153225
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result: 99.74%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean = false, words = 204

Result: 99.97%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 88.73%, Deviation: 1.27%
Meta Data: neededClean = false, wordsTotal = 204, wordsDialogue = 181
99.8590% Overall
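The ratio and deviation reported for the dialogue evaluator follow directly from the two word counts (181 dialogue words out of 204 total, against the 90% target in the prompt). The sketch below reproduces that arithmetic. The function names are hypothetical, and the quote-based word counter is only an assumption about how dialogue might be detected; the benchmark's actual tokenization and its mapping from deviation to the 99.97% score are not documented on this page.

```python
import re


def dialogue_ratio_pct(words_dialogue: int, words_total: int) -> float:
    """Share of words that are dialogue, as a percentage."""
    return words_dialogue / words_total * 100


def deviation_pct(ratio_pct: float, target_pct: float = 90.0) -> float:
    """Absolute distance between the measured ratio and the target."""
    return abs(target_pct - ratio_pct)


def count_words(text: str) -> tuple[int, int]:
    """ASSUMED counting rule: whitespace-separated words, with dialogue
    defined as anything between straight double quotes."""
    total = len(text.split())
    quoted = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(segment.split()) for segment in quoted)
    return total, dialogue


# Counts reported in the evaluation metadata for this run.
ratio = dialogue_ratio_pct(words_dialogue=181, words_total=204)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation_pct(ratio):.2f}%")

# The counter applied to a short made-up sample.
print(count_words('Molly grinned. "Grab the gifts now," she said.'))
```

With the reported counts, the first line printed matches the "Ratio: 88.73%, Deviation: 1.27%" detail shown for this evaluator.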