NC Bench
Run: 2026-03-12-01KKHZS6DC0WBK9AB54CN06D58
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 32.3s
Cost: $0.046770
Provider: xAI
Evaluation summary
Matches word count: 43.38%
Dialogue to Total Word Ratio: 54.22%
Overall: 48.7987%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
In a small dimly lit apartment, Molly and Bill sat at a wobbly table examining a flyer for the upcoming children's birthday party. The event promised games, cake, and a huge pile of presents that had caught the thieves' attention weeks ago. The duo had spent days casing the joint from a distance, noting the schedule and layout of the large backyard filled with inflatable structures and colorful decorations. Bill traced a finger along their escape route on the paper while Molly organized their makeshift costumes in a pile of bright fabrics and fake noses. The whole idea had started as a joke but now it was becoming a real operation, one that could either make them legends in the criminal underworld or laughing stocks if it went wrong. "We strike when the kids open the presents," Bill said. "That's our window." Molly frowned deeply. "But how do we get past the parents without raising suspicion?" "Simple. We pretend to be hired photographers," he replied. "No one questions the camera guy." She shook her head. "Photographers need equipment we don't have. Let's go as clowns instead." "Clowns freak me out," Bill admitted. "But if it works I'm game. Just don't make me do the horn." They discussed the plan at length, considering every possible hiccup in their comedic caper. Molly worried about the emotional aspect while Bill focused on the loot. The apartment was filled with the smell of old takeout and the sound of their voices bouncing off the walls. They had done many heists before but none quite as unusual as this. Stealing from adults was one thing but targeting a children's celebration added a layer of absurdity that made them both chuckle nervously from time to time. Molly adjusted a multicolored wig in her hands, imagining the scene unfolding. "What if the birthday boy cries?" Molly asked. "I don't want to be the reason for ruined parties." "Kids cry over spilled milk. We'll be long gone before anyone notices," Bill assured her. "Still, let's make it quick. 
In and out in five minutes," she suggested. "Five minutes? Make it three. Grab and run," he countered. "Remember to wear the big shoes for authenticity," Molly added. "Big shoes will slow us down," Bill said. As they continued their preparations, the two thieves double checked their tools which consisted of large sacks, quick change outfits, and a map marked with critical times. The comedy of the situation was not lost on them, as they pictured themselves in oversized shoes tripping over balloons while trying to make a getaway. Despite the humorous undertones, they knew they had to be careful not to underestimate the vigilance of protective parents or the unpredictable nature of excited children. The plan was set and the anticipation built as they readied themselves for what promised to be a memorable afternoon of thievery at the most innocent of events.
created_at
"2026-03-12T21:38:28.401Z"
3/12/2026, 9:38:28 PM
model
"x-ai/grok-4.20-beta-20260309"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
30614
moderation_latency
(null)
generation_time
32099
tokens_prompt
77
tokens_completion
738
native_tokens_prompt
184
native_tokens_completion
7772
native_tokens_completion_images
0
native_tokens_reasoning
7200
native_tokens_cached
128
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0467696
router
(null)
provider_responses
0
endpoint_id
"33f49f24-a0de-4dcf-872d-ca514e83fbae"
id
"e6f64946-0d63-9fcf-5f62-295af1879b9f"
is_byok
false
latency
351
model_permaslug
"x-ai/grok-4.20-beta-20260309"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773351508-bLxLLJSq6Hb6HgqLWjzb"
upstream_id
"e6f64946-0d63-9fcf-5f62-295af1879b9f"
total_cost
0.0467696
cache_discount
0.0002304
upstream_inference_cost
0
provider_name
"xAI"
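The token counts above allow a few derived figures. The following is a minimal sketch, not part of the benchmark itself. It assumes that native_tokens_reasoning is included in native_tokens_completion and that generation_time is in milliseconds (which would agree with the 32.3s duration shown in the run details); neither assumption is confirmed by the page.

```python
# Values copied from the run metadata above.
native_tokens_completion = 7772
native_tokens_reasoning = 7200
generation_time_ms = 32099  # assumption: the unit is milliseconds

# Assumption: reasoning tokens are counted inside the completion total,
# so the remainder is the visible (non-reasoning) output.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"visible completion tokens: {visible_tokens}")
print(f"reasoning share of completion: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.0f} tokens/s")
```

Under these assumptions, most of the completion budget for this run went to reasoning rather than to the scene text itself.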
Evaluation details
Matches word count
Result: 43.38%
Details: n/a
Meta Data: neededClean = false, words = 483
Dialogue to Total Word Ratio
Result: 54.22%
Details: Ratio: 26.50%, Deviation: 3.50%
Meta Data: neededClean = false, wordsTotal = 483, wordsDialogue = 128
Overall
Result: 48.7987%
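The ratio and deviation figures can be checked from the reported counts. The sketch below is an illustration under stated assumptions, not the benchmark's actual evaluator: the helper names are hypothetical, words are split on whitespace, and dialogue is taken to be any text between straight double quotes. It does not attempt to reproduce how the 43.38% and 54.22% scores are derived, since the page does not state those formulas.

```python
import re

def count_words(text: str) -> int:
    # Whitespace-separated tokens; an assumed definition of "word".
    return len(text.split())

def count_dialogue_words(text: str) -> int:
    # Assumption: dialogue is whatever sits between straight double quotes.
    spans = re.findall(r'"([^"]*)"', text)
    return sum(count_words(span) for span in spans)

def ratio_and_deviation(words_dialogue: int, words_total: int, target: float = 0.30):
    # Dialogue share of the text and its absolute distance from the target.
    ratio = words_dialogue / words_total
    return ratio, abs(target - ratio)

# Small hand-checkable sample.
sample = '"We strike now," Bill said. "Go."'
sample_total = count_words(sample)
sample_dialogue = count_dialogue_words(sample)

# Counts reported in the evaluation details: wordsDialogue and wordsTotal.
ratio, deviation = ratio_and_deviation(128, 483)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# The plain mean of the two rounded evaluator scores.
overall_estimate = (43.38 + 54.22) / 2
print(f"Mean of evaluator scores: {overall_estimate:.2f}%")
```

The printed ratio and deviation agree with the reported 26.50% and 3.50%. The mean of the two rounded scores is close to the reported overall of 48.7987%, which suggests, but does not confirm, that the overall figure is an average of the unrounded evaluator scores.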