NC Bench
Run: 2026-05-01-01KQHSJ3CWDZ7WQASWQE9BP0V5
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 1m 59s
Cost: $0.024866
Provider: xAI
Evaluation summary
Score | Evaluator | Details Link
1.04% | Matches word count | (details)
97.93% | Dialogue to Total Word Ratio | (details)
49.4805% | Overall |
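The overall score appears consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the sketch below is an assumption: it averages the rounded scores as displayed, which lands within rounding distance of the reported 49.4805% (the small gap would be explained if the site averages unrounded values).

```python
# Per-evaluator scores as displayed on the page (percent, already rounded).
scores = {
    "Matches word count": 1.04,
    "Dialogue to Total Word Ratio": 97.93,
}

# Assumption: the overall score is a plain unweighted mean of the evaluators.
overall = sum(scores.values()) / len(scores)

# Value shown on the page, for comparison.
reported_overall = 49.4805

print(f"mean of displayed scores: {overall:.4f}%")
print(f"reported overall:         {reported_overall:.4f}%")
print(f"difference:               {abs(overall - reported_overall):.4f} points")
```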
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Bill, how do we steal all the presents from this children's party without getting caught?" asked Molly. "Easy, Molly. We dress up as the children's entertainers. I'll be a clown and you can be the pony. While the kids are riding you I'll stuff the gifts in my big clown pants," replied Bill. "First of all I'm not being the pony. Second your clown pants aren't that big. Third we need a better plan than that you idiot," said Molly. "Okay okay. What if we sneak in during musical chairs and when the music stops we grab the tablecloth with all the presents on it like a magic trick and run?" suggested Bill. "That's better but still stupid. Let's pose as delivery people bringing more presents then swap them with empty boxes while they're not looking. No wait I have it. We cut the power and in the chaos we take everything," Molly said. "But Molly it's a day party. There's no power to cut. How about we challenge the parents to a dance off while you load the van?" Bill asked. "Now you're thinking. We break dance our way to the gift pile and then it's ours. This is going to be the greatest comedic heist of all time," Molly exclaimed. "Yeah the cops will be laughing too hard to chase us. Let's do it."
created_at: "2026-05-01T12:52:31.777Z" (5/1/2026, 12:52:31 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 116885
moderation_latency: (null)
generation_time: 119014
tokens_prompt: 77
tokens_completion: 2526
native_tokens_prompt: 184
native_tokens_completion: 9908
native_tokens_completion_images: (null)
native_tokens_reasoning: 9640
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0248656
router: (null)
provider_responses: 0
endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
id: "4b5c3e50-0ed3-9d8f-b01e-9fbc7f13cde8"
is_byok: false
latency: 180
model_permaslug: "x-ai/grok-4.20-20260309"
provider_name: "xAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777639951-EvYhoZzD3yLySBMpSyEm"
session_id: (null)
api_type: "completions"
id: "gen-1777639951-TXbmemhNXy5giEosm3XX"
upstream_id: "4b5c3e50-0ed3-9d8f-b01e-9fbc7f13cde8"
total_cost: 0.0248656
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
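A few of the metadata fields can be cross-checked against the run details with simple arithmetic. The sketch below assumes that generation_time is in milliseconds and that native_tokens_completion includes the reasoning tokens; neither is stated on the page. The helper name format_duration is hypothetical.

```python
# Values copied from the metadata above.
generation_time_ms = 119014        # assumed to be milliseconds
native_tokens_completion = 9908    # assumed to include reasoning tokens
native_tokens_reasoning = 9640


def format_duration(ms: int) -> str:
    """Render a millisecond count as 'Xm Ys', truncating fractional seconds."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}m {seconds}s"


# Tokens left over for the visible answer, under the assumption above.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion

print(format_duration(generation_time_ms))
print(f"visible completion tokens: {visible_tokens}")
print(f"reasoning share of completion: {reasoning_share:.1%}")
```

Under these assumptions the formatted duration agrees with the 1m 59s shown in the run details, and nearly all completion tokens were spent on reasoning rather than on the scene itself.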
Evaluation details
Result | Evaluator | Details | Meta Data
1.04% | Matches word count | n/a | neededClean: false, words: 226
97.93% | Dialogue to Total Word Ratio | Ratio: 93.81%, Deviation: 3.81% | neededClean: false, wordsTotal: 226, wordsDialogue: 212
49.4805% | Overall | |
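The reported ratio and deviation follow directly from the word counts in the metadata: 212 dialogue words out of 226 total, measured against the 90% target from the scenario. The sketch below reproduces that arithmetic and includes a hypothetical counter, count_dialogue_words, that treats text inside straight double quotes as dialogue and splits on whitespace. The evaluator's actual tokenisation and its mapping from deviation to the 97.93% score are not documented here, so both are left out.

```python
import re


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) using whitespace-separated words.

    Assumption: dialogue is whatever sits between straight double quotes.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


# Figures reported in the evaluation metadata.
words_dialogue = 212
words_total = 226
target_ratio = 0.90  # from the scenario: 90% dialogue

ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Small illustration of the hypothetical counter on a made-up sentence.
sample = '"Hello there," said Bob. "Go now."'
sample_dialogue, sample_total = count_dialogue_words(sample)
print(sample_dialogue, sample_total)
```

Running the hypothetical counter on the model response may not return exactly 212 and 226, since the evaluator's word-splitting rules are unknown.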