NC Bench
Run: 2026-02-20-01KHYAHW0C8HY3THPG103GGK1B
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 18.4s
Cost: $0.008234
Provider: xAI
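The reported cost can be reconciled with the native token counts recorded in the response metadata (native_tokens_prompt 744, native_tokens_cached 699, native_tokens_completion 505). The sketch below is a rough check, not the provider's billing logic. The three per-token prices are assumptions that do not appear anywhere on this page; they were chosen because they reproduce the recorded total_cost and cache_discount. The function name estimate_cost is hypothetical.

```python
# Assumed USD prices per token. These are NOT stated on this page.
PRICE_INPUT = 3.00 / 1_000_000          # uncached prompt tokens
PRICE_CACHED_INPUT = 0.75 / 1_000_000   # prompt tokens served from cache
PRICE_OUTPUT = 15.00 / 1_000_000        # completion tokens


def estimate_cost(prompt_tokens: int, cached_tokens: int, completion_tokens: int):
    """Return (total cost, cache discount) in USD under the assumed prices."""
    uncached_tokens = prompt_tokens - cached_tokens
    cost = (
        uncached_tokens * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + completion_tokens * PRICE_OUTPUT
    )
    # Amount saved because cached tokens are billed at the lower rate.
    discount = cached_tokens * (PRICE_INPUT - PRICE_CACHED_INPUT)
    return cost, discount


# Native token counts taken from the response metadata of this run.
cost, discount = estimate_cost(prompt_tokens=744, cached_tokens=699, completion_tokens=505)
print(f"total_cost ~ {cost:.8f}, cache_discount ~ {discount:.8f}")
```

Under these assumed prices the result agrees with the recorded usage of 0.00823425 and cache_discount of 0.00157275, and the 505 completion tokens are billed as a whole.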
Evaluation summary
Score      Evaluator
86.38%     Matches word count
0.00%      Dialogue to Total Word Ratio
43.1902%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In the flickering glow of a single bulb in their cramped motel room, Molly paced like a caged panther, her wild red hair a tangled mess. Bill slouched on the sagging bed, munching on a stale donut, his greasy fingers smudging the blueprints of the suburban McMansion. "This is gold, Bill," Molly hissed, jabbing at the map. "Kids' birthday bash at noon. Presents piled like a pirate's treasure. We slip in as clowns—balloons, face paint, the works." Bill snorted, crumbs flying. "Clowns? Last time I dressed as one, that Chihuahua nearly took my leg off. And what about the parents? Soccer moms with tasers?" Molly grinned, her eyes sparkling with mischief. "That's the beauty! We'll distract 'em with a fake magic show. You juggle, I swipe the gifts into the getaway van parked as the ice cream truck." Bill scratched his beard. "Juggling? I can barely tie my shoes. This better not end with us in juvie jail." "Trust me, it's foolproof," Molly said, rolling up the plans. "By evening, we'll be swimming in stolen Barbies and video games." Outside, a siren wailed distantly, but they laughed it off, plotting their absurd caper amid pizza boxes and dreams of easy riches. The party was tomorrow—time to inflate those balloons.
created_at
"2026-02-20T20:06:12.246Z"
2/20/2026, 8:06:12 PM
model
"x-ai/grok-4-07-09"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
6925
moderation_latency
(null)
generation_time
18317
tokens_prompt
77
tokens_completion
319
native_tokens_prompt
744
native_tokens_completion
505
native_tokens_completion_images
0
native_tokens_reasoning
219
native_tokens_cached
699
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.00823425
router
(null)
provider_responses
0
endpoint_id
"d40bef9d-6936-4c57-9749-bf7399de4a77"
id
"0d6346a6-852c-accb-0122-f14d89434a9e"
is_byok
false
latency
79
model_permaslug
"x-ai/grok-4-07-09"
provider_name
"xAI"
status
200
api_type
"completions"
id
"gen-1771617972-fO9kpiJzmGLanWOOxhrY"
upstream_id
"0d6346a6-852c-accb-0122-f14d89434a9e"
total_cost
0.00823425
cache_discount
0.00157275
upstream_inference_cost
0
provider_name
"xAI"
Evaluation details
Result: 86.38%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean = false, words = 211

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 50.24%, Deviation: 40.24%
Meta Data: neededClean = false, wordsTotal = 211, wordsDialogue = 106

Overall: 43.1902%
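The ratio, deviation, and overall figures can be reproduced from the metadata values wordsDialogue and wordsTotal. The sketch below is a minimal illustration, not the benchmark's implementation. It assumes that the deviation is the absolute gap between the measured ratio and the 10% target named in the scenario, and that the overall score is the unweighted mean of the evaluator scores. The function names are hypothetical. How the evaluator turns a deviation into the 0.00% score is not shown on this page, so that step is not modeled.

```python
from statistics import mean


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of dialogue words in the response, in percent."""
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 10.0) -> float:
    """Assumed definition: absolute gap to the target, in percentage points."""
    return abs(ratio_pct - target_pct)


def overall_score(scores_pct: list[float]) -> float:
    """Assumed aggregation: unweighted mean of the evaluator scores."""
    return mean(scores_pct)


# Values taken from the evaluation details of this run.
ratio = dialogue_ratio(words_dialogue=106, words_total=211)
deviation = deviation_from_target(ratio)
overall = overall_score([86.38, 0.00])

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")  # Ratio: 50.24%, Deviation: 40.24%
print(f"Overall: {overall:.2f}%")  # Overall: 43.19%
```

If the mean assumption holds, the displayed overall of 43.1902% suggests that the word count score is about 86.3804% before rounding to two decimals.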