NC Bench
Run: 2026-02-20-01KHY8RZ464VB5CZEBHXS9TQ2R
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 17.6s
Cost: $0.007925
Provider: xAI
Evaluation summary
Score | Evaluator
35.00% | Matches word count
73.72% | Dialogue to Total Word Ratio
54.3590% | Overall
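The overall figure is consistent with an unweighted mean of the two evaluator scores. This is an inference from the numbers shown, not a documented rule of the benchmark. A minimal sketch under that assumption, using the rounded scores from the summary (the dictionary and variable names are illustrative only):

```python
# Assumption: "Overall" is the plain average of the evaluator scores.
# The inputs are the rounded percentages displayed in the summary, so the
# result matches the reported 54.3590% only to about two decimal places.
scores = {
    "Matches word count": 35.00,
    "Dialogue to Total Word Ratio": 73.72,
}

overall = sum(scores.values()) / len(scores)
print(f"Overall (assumed mean): {overall:.2f}%")
```

The small gap between this result and the reported 54.3590% would be explained if the dialogue score is stored with more precision than the two decimals displayed.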
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly paced the dingy motel room, her eyes gleaming with mischief, while Bill hunched over a crumpled blueprint of the suburban McMansion. "Alright, Bill, listen up," Molly said, jabbing a finger at the map. "The party's at noon tomorrow—little Timmy's turning eight. Balloons, cake, the works. Our target: that pile of presents under the bouncy castle." Bill scratched his beard. "Presents? For kids? Molly, that's low even for us. What if we get caught by a pack of sugar-rushed munchkins?" "Low? It's genius! We'll dress as clowns—big shoes, red noses, the whole shebang. You distract with balloon animals, I'll snatch the gifts into our giant clown pants." "Clown pants? Those things could hide a pony! But what about the parents? Soccer moms with pepper spray?" "Easy—I'll spike the punch with laxatives. They'll be too busy in the bathroom to notice. We zip-line out the window with the loot. Imagine the haul: bikes, toys, maybe a PlayStation!" Bill chuckled. "You're insane. Remember the zoo heist? We almost got mauled by penguins." "This is foolproof! In and out in ten minutes. We'll fence the stuff at the black market toy exchange. Come on, partner—let's make some kids cry... for comedy's sake." Bill sighed. "Fine, but if a toddler bites me, you're on diaper duty."
created_at
"2026-02-20T19:35:07.663Z"
2/20/2026, 7:35:07 PM
model
"x-ai/grok-4-07-09"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
6952
moderation_latency
(null)
generation_time
17570
tokens_prompt
77
tokens_completion
337
native_tokens_prompt
744
native_tokens_completion
491
native_tokens_completion_images
0
native_tokens_reasoning
188
native_tokens_cached
743
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.00792525
router
(null)
provider_responses
0
endpoint_id
"d40bef9d-6936-4c57-9749-bf7399de4a77"
id
"47d07437-56ee-b73a-36e3-138387401cae"
is_byok
false
latency
1283
model_permaslug
"x-ai/grok-4-07-09"
provider_name
"xAI"
status
200
api_type
"completions"
id
"gen-1771616107-JY5W7XuASkdefa2pA1mV"
upstream_id
"47d07437-56ee-b73a-36e3-138387401cae"
total_cost
0.00792525
cache_discount
0.00167175
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result | Evaluator | Details | Meta Data
35.00% | Matches word count | n/a | neededClean: false, words: 218
73.72% | Dialogue to Total Word Ratio | Ratio: 82.57%, Deviation: 7.43% | neededClean: false, wordsTotal: 218, wordsDialogue: 180
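The ratio and deviation in the details can be reproduced from the recorded word counts and the 90% target in the scenario. The sketch below assumes the deviation is the absolute distance from that target. The function names are hypothetical, and the mapping from deviation to the 73.72% score is not shown on this page, so it is not reproduced here.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of all words."""
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 90.0) -> float:
    """Absolute distance, in percentage points, from the target ratio (assumed)."""
    return abs(target_pct - ratio_pct)


# Values taken from the evaluator metadata: wordsDialogue and wordsTotal.
ratio = dialogue_ratio(180, 218)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 82.57%, Deviation: 7.43%"
```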
54.3590%