NC Bench
Run: 2026-05-01-01KQHQHRHAXD4NQ1DFSQ76244J
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 2m 42s
Cost: $0.032123
Provider: xAI
Evaluation summary
Score | Evaluator | Details Link
1.04% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
0.5180% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly paced back and forth while Bill sat on the old couch eating a sandwich. "Bill I've got the perfect plan for our next heist. We're going to break into that big children's party on Elm Street and steal all the presents. It's going to be epic." Bill nearly choked on his food. "What? Have you gone mad Molly? A children's party? There will be face painting pony rides and dozens of little monsters running around. How are we supposed to pull that off without getting caught or feeling like total heels?" "It's easy if we do it right. We show up as the replacement magicians. I already made the call pretending to be the party's coordinator and told them the original guy had an emergency. So we arrive with capes and wands and while I perform tricks you start moving the presents to the side door where our van will be waiting," Molly explained. "Moving them how? Levitation? Those presents could be anything from dolls to trampolines. And what about the parents? They will be taking videos and pictures the whole time. One dad with a camera phone and we're viral for all the wrong reasons," Bill said. "We use the classic misdirection. While I'm pulling a rabbit out of my hat the kids and parents will be focused on me. You wear gloves and a fake mustache as part of your magician assistant costume. No one will suspect the guy in the sparkly vest. Plus we'll have a sign that says Presents will be moved for the big surprise reveal so it looks official," Molly replied. "A surprise reveal? That's not bad actually. But what is the surprise? If they ask us we need to have an answer ready or the whole thing falls apart." "The surprise is a celebrity appearance but we won't specify who. That way they stay excited and distracted. By the time they realize the celebrity is not coming we will be long gone with all the loot. We can sell the toys online anonymously and make a fortune," Molly said. "Selling kids toys online might raise some eyebrows but I guess if we do it right it could work. 
Okay I'm starting to come around to this idea. What about the getaway? If the parents catch on they'll block the driveway with their minivans," Bill asked. "That's why we park on the grass near the back fence. We cut a hole in the fence earlier this week disguised as part of the yard work. We slip through load up and drive away through the neighbor's yard who is also at the party. It's foolproof," Molly said with confidence. "Foolproof or fool hardy? This has all the makings of one of your crazy schemes that somehow works. Alright I'm in. But we need code words. If something goes wrong you say the rabbit is loose and we abort," Bill said. "The rabbit is loose. Got it. This is going to be the best or the funniest failure ever. Let's practice our magic routine so we look legit," Molly said. They spent hours rehearsing their roles for the children's party heist.
created_at
"2026-05-01T12:17:23.505Z"
5/1/2026, 12:17:23 PM
model
"x-ai/grok-4.20-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
156345
moderation_latency
(null)
generation_time
161631
tokens_prompt
77
tokens_completion
4116
native_tokens_prompt
184
native_tokens_completion
12784
native_tokens_completion_images
(null)
native_tokens_reasoning
12177
native_tokens_cached
64
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0321228
router
(null)
provider_responses
0
endpoint_id
"3ecee37f-b217-4093-87fb-aaf0afe307af"
id
"6612c275-ad6c-946f-b9b5-835d31b7ff2d"
is_byok
false
latency
101
model_permaslug
"x-ai/grok-4.20-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777637843-ehX9CVXPlqA9xUnpnfno"
session_id
(null)
api_type
"completions"
id
"gen-1777637843-eZN7A1W5v1xoQOBb4eJa"
upstream_id
"6612c275-ad6c-946f-b9b5-835d31b7ff2d"
total_cost
0.0321228
cache_discount
0.0000672
upstream_inference_cost
0
provider_name
"xAI"
response_cache_source_id
(null)
Evaluation details
Result | Evaluator | Details | Meta Data
1.04% | Matches word count | n/a | neededClean: false, words: 526
0.00% | Dialogue to Total Word Ratio | Ratio: 90.49%, Deviation: 20.49% | neededClean: false, wordsTotal: 526, wordsDialogue: 476
0.5180%
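The ratio and deviation reported in the evaluation details can be reproduced from the metadata counts. The sketch below is a minimal illustration, not the benchmark's actual evaluator. The function names are hypothetical. The 70% target and the 500-word target come from the scenario, and the deviation is assumed to be the absolute difference in percentage points. The page does not show how these figures map to the 0.00% and 1.04% scores, so that step is left out.

```python
# Hypothetical helpers; NC Bench's real evaluator code is not shown on this page.

def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of dialogue words in the response, as a percentage."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 70.0) -> float:
    """Absolute distance from the target ratio, in percentage points (assumed)."""
    return abs(ratio_pct - target_pct)


# Counts taken from the Meta Data column above.
words_total = 526
words_dialogue = 476

ratio = dialogue_ratio(words_dialogue, words_total)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# Word-count overshoot against the requested 500 words.
target_words = 500
overshoot_pct = 100.0 * (words_total - target_words) / target_words
print(f"Words over target: {words_total - target_words} ({overshoot_pct:.1f}%)")
```

With 476 dialogue words out of 526, the response is about 20 percentage points more dialogue-heavy than requested and 26 words over the requested length.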