NC Bench
Run: 2026-05-10-01KR8XHXM7ARJ1NFAD7BMYGJS2
Run Details
Model: Grok 4.3
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 9.3s
Cost: $0.001393
Provider: xAI
Evaluation summary
Score     Evaluator
0.00%     Matches word count
0.00%     Dialogue to Total Word Ratio
0.0000%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
"So here's the plan," Bill said, leaning over the crumpled map of the party house. "The kids get their presents at noon. If we sneak in as the clown and magician, no one blinks twice." Molly crossed her arms. "Sure, Bill, a clown and a magician. "Bill, you can’t juggle. You drop a bowling pin and the parents call the cops." Bill grinned. "Details, darling. We show up at ten, before the guests. You wear the glittery coat, I’ll bring balloons. While everyone’s squealing over cake, we load the gifts into the flower van outside." Molly rolled her eyes. "Right. The flower van. Because nothing says sneaky like parking a giant bouquet with a getaway driver who smells like lilies." She tapped the address. "What happens after we stuff fifty dolls and action figures into plastic sacks? We still gotta cross the yard without tripping over toddlers." Bill shrugged. "We created a diversion. I told you the magician’s rabbit escapes every time. Kids chase fur, grown-ups chase kids. That window of five minutes is ours." Molly frowned. "And the alarm on the back door? The one that beeps if you open it too fast?" Bill pulled a small magnet from his pocket. "Stick this on the latch, no beep. I tested it on our kitchen fridge." Molly snorted. "Our fridge doesn’t guard fifty wrapped PlayStations. And what if the parents hired real security? Last time you said ‘no one will notice’ we ended up sharing a cell with a juggler who only wanted to talk card tricks." Bill waved the excuse away. "Minor setback. This house has a playroom the size of a small barn. Strewn gifts on a table, a few streamers, and zero cameras according to my cousin." Molly leaned closer, voice low. "Cousin who owes you fifty bucks for that poker scam?" Bill chuckled. "Exactly. He gave us floor plans and cake recipe. Bonus intel." Molly stared at the map again. "Fine, balloons and glitter. But you’re the magician at heart, remember? If anything goes south, you’re performing for real until the coast is clear." 
Bill raised an imaginary wand. "Abracadabra, presents disappear." Molly sighed. "Just keep the rabbit in the hat until we’re outside."
created_at: "2026-05-10T12:24:52.111Z" (5/10/2026, 12:24:52 PM)
model: "x-ai/grok-4.3-20260430"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 427
moderation_latency: (null)
generation_time: 9211
tokens_prompt: 77
tokens_completion: 568
native_tokens_prompt: 182
native_tokens_completion: 493
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
service_tier: "default"
usage: 0.0013928
router: (null)
provider_responses:
  0:
    endpoint_id: "45623cb8-18e5-4b32-8fc4-85d05439982d"
    id: "5a9923b1-5bae-966c-ae2c-a326efe63e0b"
    is_byok: false
    latency: 132
    model_permaslug: "x-ai/grok-4.3-20260430"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778415892-s1OAFA9QsI6yHTR29PhJ"
session_id: (null)
api_type: "completions"
id: "gen-1778415892-dn42Cwom32u945U4m8hF"
upstream_id: "5a9923b1-5bae-966c-ae2c-a326efe63e0b"
total_cost: 0.0013928
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result    Evaluator                      Details                            Meta Data
0.00%     Matches word count             n/a                                neededClean: false; words: 368
0.00%     Dialogue to Total Word Ratio   Ratio: 33.88%, Deviation: 36.12%   neededClean: false; wordsTotal: 369; wordsDialogue: 125
0.0000%   Overall
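This page does not document how the evaluator measures dialogue. The Python sketch below makes two assumptions: dialogue is any text between a pair of straight double quotes, and the deviation is the absolute gap, in percentage points, between the measured dialogue share and the 70% target. Under the second assumption, the reported wordsDialogue (125) and wordsTotal (369) reproduce the listed ratio (33.88%) and deviation (36.12%). The names dialogue_stats and ratio_and_deviation are hypothetical.

```python
import re

# Assumption: dialogue is any text between a pair of straight double quotes.
QUOTED = re.compile(r'"([^"]*)"')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (total words, words inside quoted spans) under the naive quote rule."""
    total = count_words(text)
    dialogue = sum(count_words(span) for span in QUOTED.findall(text))
    return total, dialogue


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target_pct: float = 70.0) -> tuple[float, float]:
    """Dialogue share in percent and its absolute distance from the target, in points."""
    ratio_pct = 100.0 * words_dialogue / words_total
    deviation_pct = abs(target_pct - ratio_pct)
    return round(ratio_pct, 2), round(deviation_pct, 2)


if __name__ == "__main__":
    sample = '"So here\'s the plan," Bill said. "The kids get presents at noon."'
    print(dialogue_stats(sample))          # (12, 10)
    print(ratio_and_deviation(125, 369))   # (33.88, 36.12)
```

The quote-pairing rule is intentionally naive. A single unmatched quote mark, such as the one in the response above after "a magician.", flips every later pairing so that narration is counted as dialogue. Do not assume the evaluator's count of 125 dialogue words came from this rule.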