NC Bench
Run: 2026-02-20-01KHY83P48KDH1ZM0N3J0GW9HG
Run Details
Model: Grok 4
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 18.1s
Cost: $0.008018
Provider: xAI
Evaluation summary
Matches word count: 99.98%
Dialogue to Total Word Ratio: 0.00%
Overall: 49.9920%
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly leaned over the blueprint of the suburban backyard, her eyes gleaming with mischief. "Alright, Bill, listen up. The party's at noon tomorrow—balloons, cake, a horde of screaming kids. Our target: that mountain of presents under the piñata tree." Bill scratched his head, peering at the scribbled map. "Molly, you sure about this? Breaking into a kiddie bash? What if they got clowns? Clowns freak me out." She rolled her eyes. "Clowns are the least of our worries. We'll go in disguised as birthday entertainers. You be the magician—pull rabbits out of hats to distract 'em. I'll swipe the gifts while they're oohing and aahing." "Rabbits? I can't even pull off a card trick! Last time I tried magic, I set the deck on fire," Bill groaned, waving his singed eyebrows for emphasis. "That's why we're pros, dummy. Improvise! I'll rig a smoke bomb for your grand finale. Poof—we vanish with the loot. Imagine the haul: action figures, dolls, maybe even a pony if we're lucky." Bill chuckled nervously. "A pony? In a bag? You're nuts. Fine, but if a toddler spots us, I'm blaming you." Molly grinned. "Deal. Now, practice your abracadabra. This heist's gonna be child's play."
created_at
"2026-02-20T19:23:30.328Z"
2/20/2026, 7:23:30 PM
model
"x-ai/grok-4-07-09"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
7674
moderation_latency
(null)
generation_time
17979
tokens_prompt
77
tokens_completion
308
native_tokens_prompt
744
native_tokens_completion
488
native_tokens_completion_images
0
native_tokens_reasoning
206
native_tokens_cached
682
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0080175
router
(null)
provider_responses
0
endpoint_id
"d40bef9d-6936-4c57-9749-bf7399de4a77"
id
"b58470aa-8a6c-0910-18ec-42eca6eccc81"
is_byok
false
latency
544
model_permaslug
"x-ai/grok-4-07-09"
provider_name
"xAI"
status
200
api_type
"completions"
id
"gen-1771615410-HYuj1FuafPhJmeUv6HtB"
upstream_id
"b58470aa-8a6c-0910-18ec-42eca6eccc81"
total_cost
0.0080175
cache_discount
0.0015345
upstream_inference_cost
0
provider_name
"xAI"
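The usage and cache_discount figures can be cross-checked against the native token counts. The sketch below is an illustration, not the router's billing code: the per-million-token prices (3.00 USD input, 0.75 USD cached input, 15.00 USD output) are assumptions that do not appear on this page, and the function name is hypothetical. Under those assumed prices the arithmetic lands on the reported usage of 0.0080175 and cache_discount of 0.0015345.

```python
# Assumed prices in USD per token (NOT stated on this page).
PRICE_INPUT = 3.00 / 1_000_000
PRICE_CACHED_INPUT = 0.75 / 1_000_000
PRICE_OUTPUT = 15.00 / 1_000_000


def estimate_cost(prompt_tokens, cached_tokens, completion_tokens):
    """Return (cost, cache_discount) for one request under the assumed prices."""
    uncached_tokens = prompt_tokens - cached_tokens
    cost = (
        uncached_tokens * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + completion_tokens * PRICE_OUTPUT
    )
    # Savings relative to paying the full input price for the cached tokens.
    cache_discount = cached_tokens * (PRICE_INPUT - PRICE_CACHED_INPUT)
    return cost, cache_discount


# native_tokens_prompt, native_tokens_cached, native_tokens_completion from this run
cost, cache_discount = estimate_cost(744, 682, 488)
print(f"cost={cost:.7f} cache_discount={cache_discount:.7f}")
```

Note that native_tokens_completion (488) already includes the 206 reasoning tokens, so they are billed at the output rate in this estimate.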
Evaluation details
Matches word count: 99.98%
Details: n/a
neededClean: false
words: 202
Dialogue to Total Word Ratio: 0.00%
Details: Ratio: 80.20%, Deviation: 30.20%
neededClean: false
wordsTotal: 202
wordsDialogue: 162
Overall: 49.9920%
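The ratio and deviation reported for the dialogue evaluator follow directly from the wordsDialogue and wordsTotal values. The sketch below reproduces only that arithmetic. The function name is hypothetical, the 0.5 target is taken from the "50% dialogue" scenario, and how the evaluator maps a 30.20% deviation to a 0.00% score is not shown on this page, so it is not modeled here.

```python
def dialogue_stats(words_dialogue, words_total, target_ratio=0.5):
    """Return (ratio, deviation) of dialogue words against a target ratio."""
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target_ratio)
    return ratio, deviation


# wordsDialogue and wordsTotal as reported for this run
ratio, deviation = dialogue_stats(162, 202)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints: Ratio: 80.20%, Deviation: 30.20%
```

The overall score of 49.9920% is consistent with an unweighted mean of the two evaluator scores, though the page does not state the aggregation rule.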