NC Bench
Run: 2026-02-20-01KHYAVT9B73Q177KGTP0NRVB1
Run Details
Model:
Grok 4
Test:
Dialogue tags
Scenario:
Write 200 words with 90% dialogue
Duration:
20.1s
Cost:
$0.008660
Provider:
xAI
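The reported cost can be cross-checked against the token counts logged in the run metadata (native_tokens_prompt 744, native_tokens_cached 743, native_tokens_completion 540). The sketch below is only an illustration: the per-million-token rates are assumptions that are not stated anywhere on this page (they were chosen because they reproduce the logged total_cost and cache_discount), and the function name `estimate_cost` is hypothetical.

```python
# Assumed rates in USD per token (NOT stated in the run report).
PRICE_INPUT = 3.00 / 1_000_000    # uncached prompt tokens
PRICE_CACHED = 0.75 / 1_000_000   # cached prompt tokens
PRICE_OUTPUT = 15.00 / 1_000_000  # completion tokens

def estimate_cost(prompt_tokens: int, cached_tokens: int, completion_tokens: int):
    """Return (cost, cache_discount) under the assumed rates above."""
    uncached = prompt_tokens - cached_tokens
    cost = (
        uncached * PRICE_INPUT
        + cached_tokens * PRICE_CACHED
        + completion_tokens * PRICE_OUTPUT
    )
    # Saving relative to billing the cached tokens at the full input rate.
    discount = cached_tokens * (PRICE_INPUT - PRICE_CACHED)
    return cost, discount

# Token counts taken from the metadata of this run.
cost, discount = estimate_cost(prompt_tokens=744, cached_tokens=743, completion_tokens=540)
print(f"cost={cost:.8f} cache_discount={discount:.8f}")
```

Under these assumed rates the result agrees with the logged total_cost (0.00866025) and cache_discount (0.00167175). The 206 reasoning tokens are not added separately here, which suggests, but does not confirm, that they are already included in the completion count.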
Evaluation summary
Score      Evaluator
0.01%      Matches word count
96.68%     Dialogue to Total Word Ratio
48.3462%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly leaned against the dingy warehouse wall, scribbling on a napkin. "Okay, Bill, listen up. This kids' party at the mayor's mansion is our ticket to payday. Balloons, cake, and a mountain of presents. We blend in as clowns—big shoes, red noses, the works." Bill chuckled, twirling a fake mustache. "Clowns? Molly, last time I wore a wig, I looked like a deranged poodle. But fine, what's the play? We crash during the piñata bash?" "Exactly," Molly said, grinning. "While the rugrats are whacking that candy mule, we slip into the gift room. You distract the parents with balloon animals—make 'em look like tax auditors or something hilarious." Bill snorted. "Tax auditors? That's gold. I'll twist one into the IRS logo. Then what? Grab the loot and bolt?" "Nah, too sloppy," Molly replied. "We stash the presents in oversized clown pants. I've got a pair that could hide a pony. Exit through the bounce house—deflate it on the way out for chaos cover." Bill nodded eagerly. "Genius! But what if some kid spots us? 'Hey, mister, that's my bike!'" Molly winked. "Easy. We juggle excuses: 'It's a magic trick!' Poof, we're gone with the goods. This'll be the heist of the century—presents galore, no fingerprints, just frosting on our shoes." Bill high-fived her. "You're a criminal mastermind, Mol. Let's make those kiddos' tears our retirement fund."
created_at
"2026-02-20T20:11:38.165Z"
2/20/2026, 8:11:38 PM
model
"x-ai/grok-4-07-09"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
6547
moderation_latency
(null)
generation_time
20032
tokens_prompt
77
tokens_completion
355
native_tokens_prompt
744
native_tokens_completion
540
native_tokens_completion_images
0
native_tokens_reasoning
206
native_tokens_cached
743
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.00866025
router
(null)
provider_responses
0
endpoint_id
"d40bef9d-6936-4c57-9749-bf7399de4a77"
id
"e4eec6dd-228e-eb1e-afe4-aca94d16c367"
is_byok
false
latency
65
model_permaslug
"x-ai/grok-4-07-09"
provider_name
"xAI"
status
200
api_type
"completions"
id
"gen-1771618298-kBrlgXtyCljYdq1OzAF1"
upstream_id
"e4eec6dd-228e-eb1e-afe4-aca94d16c367"
total_cost
0.00866025
cache_discount
0.00167175
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result     Evaluator                      Details                           Meta Data
0.01%      Matches word count             n/a                               neededClean: false, words: 231
96.68%     Dialogue to Total Word Ratio   Ratio: 85.71%, Deviation: 4.29%   neededClean: false, wordsTotal: 231, wordsDialogue: 198
48.3462%   Overall
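The ratio and deviation figures above follow directly from the logged word counts. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It is an assumption, not the benchmark's actual evaluator: `dialogue_stats` and `ratio_and_deviation` are hypothetical names, dialogue is taken to be any text between straight double quotes, and the mapping from deviation to the 96.68% score is not documented here and is not modeled.

```python
import re

def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) using whitespace-separated words.

    Assumption: dialogue is whatever sits between straight double quotes.
    """
    total = len(text.split())
    quoted = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(q.split()) for q in quoted)
    return dialogue, total

def ratio_and_deviation(dialogue: int, total: int, target: float = 90.0):
    """Return (ratio %, absolute deviation from the target %)."""
    ratio = 100.0 * dialogue / total
    return ratio, abs(target - ratio)

# Small sample in the style of the model response.
sample = 'Molly winked. "Easy. We juggle excuses." Bill nodded.'
print(dialogue_stats(sample))

# Word counts logged for this run: wordsDialogue=198, wordsTotal=231.
ratio, deviation = ratio_and_deviation(198, 231)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# The overall score appears to be the mean of the two evaluator scores;
# the small gap to the reported 48.3462% comes from the displayed scores being rounded.
overall = (0.01 + 96.68) / 2
print(f"Overall (from rounded scores): {overall:.4f}%")
```

With the logged counts this yields a ratio of 85.71% and a deviation of 4.29%, matching the Details column. The response also ran to 231 words against a 200-word target, which is consistent with the near-zero word count score.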