NC Bench
Run: 2026-05-01-01KQHQTM2J8XETMRZNDNAKZ7N1
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 2m 37s
Cost: $0.032615
Provider: xAI
Evaluation summary
Score      Evaluator
0.00%      Matches word count
78.93%     Dialogue to Total Word Ratio
39.4637%   Overall
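The Overall figure looks consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the sketch below treats it as an assumption; `overall_score` is a hypothetical helper name, not part of NC Bench.

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores (assumed aggregation rule)."""
    return sum(scores) / len(scores)


# Scores as displayed in the summary, already rounded to two decimals.
displayed_scores = [0.00, 78.93]

print(f"{overall_score(displayed_scores):.4f}%")  # prints "39.4650%"
```

The result is within rounding distance of the reported 39.4637%, which would fit if the displayed 78.93% is itself a rounded value.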
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Huddled in their dimly lit apartment, Molly and Bill were deep in discussion about their latest outrageous scheme. The table was covered with notes and drawings of the party layout. Molly had doubts but Bill was full of confidence. "Bill, this plan is crazy even for you. Breaking into a children's party to steal all the presents? What are we, the Grinch?" Molly said while looking at the blueprint of the house. Bill was undeterred. "The Grinch had the right idea, Molly. Those presents are worth a fortune if we sell them quick. No one will suspect us if we go in disguised as magicians," Bill replied. Molly paced around the room. "Magicians? So now we're pulling rabbits and the presents out of hats? How do you expect to get away with all that stuff? The bags will be huge?" Molly said. With a big smile, Bill answered. "We'll use the old distraction tactic. You do a magic show that goes hilariously wrong to keep the kids and parents laughing and occupied while I load the van out back. It's brilliant," Bill said. She stopped pacing. "Brilliant? It's idiotic. What if I can't make them laugh? What if they call the cops on the bad magician?" Molly asked. He waved his hands. "Then you improvise. Tell jokes about how the rabbit stole the presents. Kids love that stuff. Meanwhile, I'll be the silent partner making off with the goods. We meet at the van in fifteen minutes flat," Bill explained. The tension in the room was palpable. "Fifteen minutes is not a lot of time to entertain twenty kids, Bill. And what about the birthday cake? They always do that before presents?" Molly pointed out. Bill remained optimistic. "Perfect timing then. While they're all singing happy birthday, I start the heist. You keep them singing extra verses to buy me time," Bill suggested. Molly shook her head in disbelief. "Extra verses? Like happy birthday to you, you live in a zoo or something? This is going to be a disaster," Molly said, shaking her head. Bill chuckled. 
"Disaster for them, payday for us. Think of it as redistributing wealth from spoiled kids to two hardworking thieves," Bill said with a chuckle. She was not convinced easily. "Hardworking? We sit in this van all day planning dumb heists. But okay, let's say it works. How do we fence a bunch of kids toys without raising suspicion?" Molly inquired. His response was quick. "Online, anonymous. Or at flea markets far away. No one will connect it to one missing party. It's perfect," Bill assured her. Remembering past failures, Molly said. "Nothing is perfect in this business. Remember the last time we tried something like this at the company picnic? We barely escaped," Molly reminded him. Bill tried to reassure her. "This is different. No security guards at a kids party. Just bored dads and soccer moms. We can handle that," Bill said. With a sigh, Molly concluded. "I hope you're right. Because if we get recognized, our faces will be all over the evening news as the worst thieves ever," Molly said. The pair continued to fine tune their comedic heist plan for the children's party, hoping for the best but prepared for the worst in their criminal endeavor.
created_at: "2026-05-01T12:22:13.847Z" (5/1/2026, 12:22:13 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 152725
moderation_latency: (null)
generation_time: 157118
tokens_prompt: 77
tokens_completion: 4216
native_tokens_prompt: 184
native_tokens_completion: 12981
native_tokens_completion_images: (null)
native_tokens_reasoning: 12301
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0326153
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "c22bcbf1-c952-c63d-cd29-5c128ca4c65b"
    is_byok: false
    latency: 87
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777638133-aY3j438wSm6mb0FpjV37"
session_id: (null)
api_type: "completions"
id: "gen-1777638133-8uXPj9jKl1wxYXuwjqbY"
upstream_id: "c22bcbf1-c952-c63d-cd29-5c128ca4c65b"
total_cost: 0.0326153
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
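A couple of the metadata fields can be cross-checked against the run details. The sketch below assumes `generation_time` is in milliseconds (it then matches the 2m 37s duration) and that `native_tokens_reasoning` is counted inside `native_tokens_completion`; neither is stated on this page. `format_duration` is a hypothetical helper.

```python
def format_duration(milliseconds):
    """Render a millisecond count as 'Xm Ys', dropping the fractional second."""
    minutes, seconds = divmod(milliseconds // 1000, 60)
    return f"{minutes}m {seconds}s"


# Values copied from the metadata above.
generation_time_ms = 157118        # assumed unit: milliseconds
native_tokens_completion = 12981
native_tokens_reasoning = 12301    # assumed to be a subset of completion tokens

print(format_duration(generation_time_ms))  # prints "2m 37s"

reasoning_share = native_tokens_reasoning / native_tokens_completion
print(f"{reasoning_share:.2%}")  # prints "94.76%"
```

Under those assumptions, most of the native completion tokens in this run went to reasoning rather than to the visible 547-word response.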
Evaluation details
Result     Evaluator                      Details                           Meta Data
0.00%      Matches word count             n/a                               neededClean: false; words: 547
78.93%     Dialogue to Total Word Ratio   Ratio: 67.82%, Deviation: 2.18%   neededClean: false; wordsTotal: 547; wordsDialogue: 371
39.4637%   Overall
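The Ratio and Deviation values can be reproduced from `wordsDialogue` and `wordsTotal`, taking the 70% target from the scenario. The counting helpers below are only an assumption about how such an evaluator might work (whitespace-separated words, dialogue as text inside straight double quotes). The mapping from deviation to the 78.93% score is not given on this page and is not attempted here.

```python
import re

TARGET_RATIO = 0.70  # taken from the scenario: 70% dialogue


def count_words(text):
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def count_dialogue_words(text):
    """Count words inside straight double quotes (assumed dialogue rule)."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue, words_total, target=TARGET_RATIO):
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(target - ratio)


# Values reported in the Meta Data column.
ratio, deviation = ratio_and_deviation(371, 547)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 67.82%, Deviation: 2.18%"

# Tiny illustration of the counting helpers on a made-up sentence.
sample = 'Bill grinned. "It is brilliant," Bill said.'
print(count_dialogue_words(sample), count_words(sample))  # prints "3 7"
```

Running the helpers on the full model response may not return exactly 547 and 371, since the evaluator's tokenization rules are unknown.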