NC Bench
Run: 2026-02-20-01KHY8KK1TXCW4NYC32JJ0GMXD
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 21.8s
Cost: $0.009025
Provider: xAI
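The reported cost can be reproduced from the native token counts in the generation metadata (native_tokens_prompt 744, native_tokens_cached 681, native_tokens_completion 555). The per-million-token rates in this sketch ($3.00 input, $0.75 cached input, $15.00 output) are assumed Grok 4 list prices, not figures stated on this page. The sketch also assumes the 224 reasoning tokens are already counted within native_tokens_completion. Under those assumptions the arithmetic lands exactly on total_cost and cache_discount.

```python
from decimal import Decimal

# ASSUMED Grok 4 list prices, USD per million tokens (not stated on this page).
PRICE_INPUT = Decimal("3.00")
PRICE_CACHED_INPUT = Decimal("0.75")
PRICE_OUTPUT = Decimal("15.00")
PER_MILLION = Decimal(1_000_000)


def request_cost(prompt_tokens: int, cached_tokens: int,
                 completion_tokens: int) -> tuple[Decimal, Decimal]:
    """Return (total_cost, cache_discount) in USD under the assumed rates.

    Cached prompt tokens are billed at the cached rate and the remaining
    prompt tokens at the full input rate. completion_tokens is the billed
    output count, assumed here to already include reasoning tokens.
    """
    uncached = prompt_tokens - cached_tokens
    total = (uncached * PRICE_INPUT
             + cached_tokens * PRICE_CACHED_INPUT
             + completion_tokens * PRICE_OUTPUT) / PER_MILLION
    # Savings versus billing every cached token at the full input rate.
    discount = cached_tokens * (PRICE_INPUT - PRICE_CACHED_INPUT) / PER_MILLION
    return total, discount


total, discount = request_cost(prompt_tokens=744, cached_tokens=681,
                               completion_tokens=555)
print(total, discount)
# Rounded to six decimals for display, as in the Cost field above.
print(f"${total:.6f}")
```

Using Decimal rather than float keeps the per-token sums exact, so the result can be compared digit for digit with the logged total_cost.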
Evaluation summary
Matches word count: 2.01%
Dialogue to Total Word Ratio: 100.00%
Overall: 51.0048%
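The page does not state how the overall score is aggregated. The displayed figures are consistent with an unweighted mean of the two evaluator scores; this sketch checks that reading under the stated assumption.

```python
def overall_score(scores: list[float]) -> float:
    """Unweighted mean of per-evaluator scores, in percent (assumed rule)."""
    return sum(scores) / len(scores)


dialogue_ratio_score = 100.00  # "Dialogue to Total Word Ratio"
reported_overall = 51.0048

# With the rounded display value 2.01%, the mean comes out at 51.005%.
print(f"{overall_score([2.01, dialogue_ratio_score]):.3f}%")

# Inverting the mean gives the unrounded word-count score implied by the
# reported overall: 2 * 51.0048 - 100 = 2.0096, which displays as 2.01%.
implied_word_count_score = 2 * reported_overall - dialogue_ratio_score
print(f"{implied_word_count_score:.4f}%")
```

The small gap between 51.005% and 51.0048% is what you would expect if the word-count score is stored unrounded and only displayed as 2.01%.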
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly leaned against the dingy warehouse wall, eyeing Bill with a smirk. "Alright, genius, spill it. How do we crash this kiddie bash and snag every last present without ending up as piñata fodder?"

Bill paced, waving a crumpled blueprint. "Easy, Mol. It's Princess Penelope's eighth birthday blowout at the mayor's mansion. Clowns, balloons, the works. We go in disguised as entertainers. You as the balloon-twisting auntie, me as the bumbling magician. 'Abracadabra, watch the gifts disappear!'"

Molly snorted. "Bumbling? That's not a stretch for you. But seriously, Bill, those parents are vultures. One wrong move and we're explaining to cops why we're hauling Barbies in a sack."

"Relax! I've got the layout. Service entrance at the back—cake delivery van pulls up at 2 PM. We hitch a ride, slip in during the chaos. Kids screaming, music blaring. Grab the gift table while they're doing the Hokey Pokey. 'What's that you put in? Your right foot? Nah, our getaway!'"

She laughed. "You're insane. What if there's security? Or a nosy nanny?"

"Security's a rent-a-cop dozing off. Nanny? Distract her with your 'magic' balloon animals. 'Look, a poodle! Now it's gone—poof!' Like the presents."

Molly grinned. "Fine, but if we get caught, I'm blaming your lousy tricks. Deal?"

"Deal. This'll be the heist of the century—or at least the playground."
created_at: "2026-02-20T19:32:11.459Z" (2/20/2026, 7:32:11 PM)
model: "x-ai/grok-4-07-09"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 7270
moderation_latency: (null)
generation_time: 21687
tokens_prompt: 77
tokens_completion: 353
native_tokens_prompt: 744
native_tokens_completion: 555
native_tokens_completion_images: 0
native_tokens_reasoning: 224
native_tokens_cached: 681
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00902475
router: (null)
provider_responses[0]:
    endpoint_id: "d40bef9d-6936-4c57-9749-bf7399de4a77"
    id: "e1d988f2-7d36-e969-e595-0bb43fe5f9cd"
    is_byok: false
    latency: 747
    model_permaslug: "x-ai/grok-4-07-09"
    provider_name: "xAI"
    status: 200
api_type: "completions"
id: "gen-1771615931-NhKzvN56Ms3MUxTInQse"
upstream_id: "e1d988f2-7d36-e969-e595-0bb43fe5f9cd"
total_cost: 0.00902475
cache_discount: 0.00153225
upstream_inference_cost: 0
provider_name: "xAI"
Evaluation details
Matches word count
    Result: 2.01%
    Details: n/a
    Meta data: neededClean: false, words: 225
Dialogue to Total Word Ratio
    Result: 100.00%
    Details: Ratio: 89.33%, Deviation: 0.67%
    Meta data: neededClean: false, wordsTotal: 225, wordsDialogue: 201
Overall: 51.0048%
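The ratio and deviation in the evaluator details follow from wordsTotal and wordsDialogue: 201 / 225 = 89.33%, which is 0.67 percentage points below the 90% target. The sketch below reproduces that arithmetic and adds a hypothetical quote-based word counter. The function name `dialogue_ratio` and its counting rules (whitespace tokens, straight double quotes) are illustrative assumptions, not NC Bench's evaluator.

```python
import re

TARGET_RATIO = 0.90  # the scenario asks for 90% dialogue


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (total_words, dialogue_words, ratio) for a prose passage.

    Hypothetical counter: words are whitespace-separated tokens, and
    dialogue is anything between a pair of straight double quotes.
    The real evaluator's tokenization is not documented here, so this
    may not reproduce wordsTotal/wordsDialogue exactly on real text.
    """
    total = len(text.split())
    # Non-greedy match so each quoted span is captured separately.
    quoted_spans = re.findall(r'"(.*?)"', text, flags=re.DOTALL)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return total, dialogue, (dialogue / total if total else 0.0)


# Arithmetic on the evaluator's reported counts.
words_total, words_dialogue = 225, 201
ratio = words_dialogue / words_total
# Assumed: deviation is the absolute difference in percentage points.
deviation = abs(TARGET_RATIO - ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Toy example for the counter itself.
print(dialogue_ratio('Molly said, "Go now." Bill nodded.'))
```

Reading the deviation as a relative error instead (0.67 / 90) would give about 0.74%, which does not match the reported 0.67%, so the absolute-difference reading fits the displayed numbers better.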