NC Bench
Run: 2026-05-10-01KR90MP1CGEPHV515A9QGD7J8
Run Details
Model: Grok 4.3
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 13.8s
Cost: $0.001618
Provider: xAI
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
0.0000% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Okay, Bill, listen up. We go in dressed as the birthday clowns, grab the gift table, and we’re out before anyone notices the punch is spiked.” “Clowns? Molly, do you even hear yourself? I’m six foot four and allergic to face paint. We’ll look like circus fugitives, not entertainers.” “Exactly. Nobody questions a clown carrying boxes. They just laugh and hand us more loot. Trust me, we’ll blend right in with the balloon animals.” “Blend in? Last time I wore a wig it slid into the nachos and people thought I was the world’s saddest dessert. I’m telling you, we go as caterers. Aprons, trays, and a big ‘Happy Birthday’ banner to hide the getaway van.” “Bill, I’m not risking salmonella poisoning just because you don’t want glitter on your nose. Besides, the clown costumes are already in the trunk. I paid twenty bucks for the squeaky shoes and I’m not wasting them.” “Great, so your investment strategy now hinges on rubber noses. Meanwhile, I’ll be the guy sneezing into the birthday cake. Real professional.” “We’ll split the haul, fifty-fifty, except I keep the video game consoles because you always lose at Mario Kart and it’s embarrassing. Deal?” “I lose because you cheat with those power-ups. Anyway, layout first. Front door’s the main entrance, but there’s a side gate behind the bounce house. I’ll pop it while you cause a distraction—maybe trip over your own giant shoes.” “Perfect. While everyone watches the tall clown face-plant into the cotton candy, I’ll wheel the entire present pile straight out the back. We’ll load up, peel away, and be home before the parents realize the pinata was a decoy.” “Speaking of decoys, we should plant one. Maybe a giant wrapped box that’s actually empty. They’ll spend ten minutes shaking it while we vanish.” “Love it. Label it ‘To: Everyone, From: Santa’s Lazy Cousin.’ Keeps the kids busy and gives us an extra thirty seconds.” “Thirty seconds? That’s a whole minute in heist time. 
We could fence half the haul at the flea market before lunch.” “Exactly. So no more whining about makeup. You’ll wear the nose, you’ll honk the horn, and you’ll smile like you enjoy children’s birthday songs. Got it?” “Fine. But if one of those little monsters asks me to make balloon animals, I’m outta there. I still have nightmares about the last dachshund I accidentally gave antlers.” “Deal. Now zip up that costume before the zipper fights back again. We leave in ten, and remember: no names, no prints, and absolutely no cake for you. Last time you got frosting on every fingerprint scanner in town.”
created_at: "2026-05-10T13:18:48.372Z" (5/10/2026, 1:18:48 PM)
model: "x-ai/grok-4.3-20260430"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 522
moderation_latency: (null)
generation_time: 13730
tokens_prompt: 77
tokens_completion: 707
native_tokens_prompt: 182
native_tokens_completion: 583
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
service_tier: "default"
usage: 0.0016178
router: (null)
provider_responses:
  0:
    endpoint_id: "45623cb8-18e5-4b32-8fc4-85d05439982d"
    id: "295b23ec-d997-f000-1f57-b3e64599c5af"
    is_byok: false
    latency: 139
    model_permaslug: "x-ai/grok-4.3-20260430"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778419128-shbjgRw2FyOpHhhrXbI6"
session_id: (null)
api_type: "completions"
id: "gen-1778419128-vqSf69fVclnVweTHdy90"
upstream_id: "295b23ec-d997-f000-1f57-b3e64599c5af"
total_cost: 0.0016178
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 437
0.00% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 50.00% | neededClean: false; wordsTotal: 440; wordsDialogue: 440
0.0000%
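The "Ratio" and "Deviation" figures reported for the Dialogue to Total Word Ratio evaluator can be illustrated with a short sketch. This page does not show the evaluator's own code, so everything below is an assumption: the names `dialogue_ratio` and `deviation` are hypothetical, words are split on whitespace, dialogue is taken to be any text between straight or curly double quotes, and deviation is taken as the absolute difference from the 50% target. How the deviation maps to the final 0.00% score is not stated here and is not modelled.

```python
import re

# Assumption: dialogue is text enclosed in straight or curly double quotes.
# The benchmark's real delimiter rules are not shown on this page.
DIALOGUE_RE = re.compile(r'["“]([^"“”]*)["”]')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simplification)."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Fraction of all words that fall inside quoted spans."""
    total = count_words(text)
    if total == 0:
        return 0.0
    spoken = sum(count_words(span) for span in DIALOGUE_RE.findall(text))
    return spoken / total


def deviation(ratio: float, target: float = 0.5) -> float:
    """Absolute distance from the target ratio (assumed metric)."""
    return abs(ratio - target)


# Using the counts reported above: wordsDialogue = 440, wordsTotal = 440.
reported_ratio = 440 / 440
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation(reported_ratio):.2%}")
```

Under these assumptions the reported counts give a ratio of 100.00% and a deviation of 50.00%, which is consistent with the Details column: the response is entirely quoted speech with no narration, so it overshoots the requested 50% by the maximum possible amount.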