NC Bench
Run: 2026-03-12-01KKHZ6QKBECF58E7R76XKRTH0
Run Details
Model: Grok 4.20 (Beta, Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 45.3s
Cost: $0.065916
Provider: xAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
97.67% | Dialogue to Total Word Ratio
48.8349% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill hunched over their notes in a dimly lit garage that served as their secret headquarters. They were planning their next big caper, a lively children's party in a suburban home. The idea had come to Molly in desperation and Bill was slowly coming around despite his reservations. The room was filled with colorful wigs, baggy pants, and oversized shoes for their clown disguises. "Okay Bill, the plan is to break into the children's party at three o'clock when the games are in full swing. We'll disguise ourselves as the hired clowns." Molly said with a straight face despite the absurdity of the statement. Bill looked at her incredulously before responding. "Clowns? I hate clowns. And stealing from kids? This is the lowest we've sunk." Pacing around the table, Molly tried to convince him of the merits. "It's not low it's smart. No one suspects clowns at a kids party. We grab the presents during the cake cutting." Still not entirely convinced, Bill voiced his concerns about potential issues. "What if they recognize us or the parents get suspicious?" With a confident wave of her hand, Molly addressed his worries. "We'll use fake names and accents. You be the silly one and I'll be the quiet helper. The van is ready for the getaway." Shaking his head but accepting the inevitable, Bill gave in with a final warning. "Alright but if we get caught I'm saying it was your idea." Molly pushed forward enthusiastically. "It will be fine. Think of all the expensive toys we can sell." But Bill had one last doubt. "I am thinking of the disappointed faces." Undeterred, Molly pressed on. "Disappointment builds character. Now let's finalize the details for entering through the basement window." The garage echoed with their voices as they continued to map out the operation. They considered the schedule of events at the party, making sure to time their arrival and actions precisely to maximize the confusion and minimize the risk of detection. 
Detailed diagrams showed the positions of doors, windows, and the all-important gift table. Bill suggested adding some magic tricks to their repertoire to keep the children occupied while Molly snuck away with the goods. They practiced their clown walks and silly voices, turning the planning session into an impromptu comedy show that had them both laughing despite the serious nature of the crime they were plotting. Their years of working together had honed their ability to anticipate each other's thoughts, making the planning process efficient if not entirely harmonious. In the end, after several hours, the plan was as solid as it could be for such an unorthodox job. They packed their bags with the necessary tools and disguises, ready to embark on what they hoped would be a successful and profitable venture.
created_at: "2026-03-12T21:28:23.408Z" (3/12/2026, 9:28:23 PM)
model: "x-ai/grok-4.20-beta-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 43633
moderation_latency: (null)
generation_time: 45076
tokens_prompt: 77
tokens_completion: 696
native_tokens_prompt: 184
native_tokens_completion: 10963
native_tokens_completion_images: 0
native_tokens_reasoning: 10414
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0659156
router: (null)
provider_responses:
  0:
    endpoint_id: "33f49f24-a0de-4dcf-872d-ca514e83fbae"
    id: "12750aed-3352-63a9-c5cc-d01e635748d7"
    is_byok: false
    latency: 246
    model_permaslug: "x-ai/grok-4.20-beta-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773350903-8LIlMI3luF5F6b2JIiJm"
upstream_id: "12750aed-3352-63a9-c5cc-d01e635748d7"
total_cost: 0.0659156
cache_discount: 0.0002304
upstream_inference_cost: 0
provider_name: "xAI"
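The token counts and cost fields above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the three per-million-token rates are assumptions that do not appear anywhere in the run metadata. They were chosen because they happen to reproduce the reported total_cost and cache_discount, and they should not be read as confirmed provider pricing.

```python
# ASSUMED per-million-token rates in USD. These are NOT stated in the run
# metadata; they are hypothetical values that reproduce the reported figures.
PRICE_INPUT = 2.00
PRICE_CACHED_INPUT = 0.20
PRICE_OUTPUT = 6.00

# Values copied from the run metadata.
native_tokens_prompt = 184
native_tokens_cached = 128
native_tokens_completion = 10963  # includes the reasoning tokens
native_tokens_reasoning = 10414

# Prompt tokens that were not served from the cache.
uncached_prompt = native_tokens_prompt - native_tokens_cached

# Total cost under the assumed rates.
total_cost = (
    uncached_prompt * PRICE_INPUT
    + native_tokens_cached * PRICE_CACHED_INPUT
    + native_tokens_completion * PRICE_OUTPUT
) / 1_000_000

# Saving from paying the cached rate instead of the full input rate.
cache_discount = native_tokens_cached * (PRICE_INPUT - PRICE_CACHED_INPUT) / 1_000_000

# Completion tokens left over once reasoning tokens are subtracted.
non_reasoning_tokens = native_tokens_completion - native_tokens_reasoning

print(f"total_cost ~ {total_cost:.7f}")
print(f"cache_discount ~ {cache_discount:.7f}")
print(f"non-reasoning completion tokens: {non_reasoning_tokens}")
```

Under these assumed rates, almost all of the cost comes from completion tokens, and roughly 95% of those are reasoning tokens rather than visible output.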
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 465
97.67% | Dialogue to Total Word Ratio | Ratio: 30.69%, Deviation: 0.69% | neededClean: false, wordsTotal: 466, wordsDialogue: 143
48.8349% | Overall
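The reported ratio and deviation follow directly from the word counts in the evaluator metadata. The sketch below reproduces that arithmetic. Two parts are assumptions rather than anything the page documents: `count_dialogue_words` is a hypothetical helper showing one simple way to count words inside straight double quotes, and the overall score is treated as the plain mean of the two evaluator scores, which matches the displayed value only approximately because the displayed scores are rounded. The formula that turns a 0.69% deviation into a 97.67% score is not shown on the page, so it is not reconstructed here.

```python
import re

TARGET_RATIO = 30.0  # percent of dialogue requested by the scenario


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of all words."""
    return 100.0 * words_dialogue / words_total


def count_dialogue_words(text: str) -> int:
    """Hypothetical helper: count words inside straight double quotes.

    This is an assumption about how dialogue could be measured, not the
    benchmark's actual evaluator.
    """
    quoted_spans = re.findall(r'"([^"]*)"', text)
    return sum(len(span.split()) for span in quoted_spans)


# Counts copied from the evaluator metadata.
ratio = dialogue_ratio(words_dialogue=143, words_total=466)
deviation = abs(ratio - TARGET_RATIO)

# Assumed aggregation: unweighted mean of the two displayed evaluator scores.
overall = (0.00 + 97.67) / 2

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
print(f"Overall (mean of displayed scores): {overall:.3f}%")
print(count_dialogue_words('"Hi there," she said. "Go."'))
```

Note that the two evaluators report slightly different totals (465 words versus 466), which suggests they tokenize the response differently; the sketch uses the count from the ratio evaluator.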