NC Bench
Run: 2026-05-10-01KR90WC5YJNDXE88E8T4YYA0J
Run Details
Model: Grok 4.3
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 9.5s
Cost: $0.001758
Provider: xAI
Evaluation summary
Score | Evaluator | Details Link
6.09% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
3.0453% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly adjusted her black ski mask in the dim light of the getaway van parked three blocks from the Applewood Community Center, where the noise of squealing kids echoed through the open windows. Bill fiddled with the lockpick set on his lap, grumbling as he checked the time on his phone. “Thirty presents minimum if the rumors are right,” said Molly, tapping her fingers against the dashboard. “Birthday boy’s loaded. Parents booked the magician and the bounce house and everything.” Bill sighed and slid the picks into his jacket pocket. “Yeah, and you’re forgetting the parents parked right out front with their backup cameras. How do we even get to the gift table without bumping into someone’s mom holding a cupcake tray?” Molly grinned and pulled out a crumpled map drawn in crayon from her bag. “We blend. I got us matching clown costumes from the thrift store bin. Big shoes, rainbow wigs, the works. Everyone’s in costume already. We slip in through the side door by the kitchen, pretend we’re delivering more balloons, and stash the loot in the laundry cart by the coat closet.” Bill shook his head and started the engine just enough to test the battery. “Clowns? That’s your master plan? Kids hate clowns. One scream and we’ve got every parent chasing us out with plastic forks. Plus I still don’t like that you insisted we paint our faces white. My allergies are going to give us away before we even touch a present.” Molly reached over and snatched the ignition key away from him. “Stop being dramatic. The white face is to hide our real faces from the security cams inside. We’ll wash it off in the bathroom after we load the cart. And stop worrying about the kids—half of them will be too hopped up on sugar to notice two extra clowns. We hit the table during the magician’s sword-swallowing bit when everyone’s facing the stage. 
Grab the big wrapped boxes first, ignore the tiny ones that are probably socks.” Bill leaned back and rubbed his forehead, watching a stream of balloons bobbing past the windshield as another family arrived. “Fine, but if one of those toddlers recognizes my voice from the park incident last month I’m leaving you to deal with the parents. And I’m not carrying anything heavier than two boxes. My sciatica’s back.” Molly chuckled and tucked the map back into her pocket. “Sciatica, my eye. You carried an entire safe last year. You’ll manage. We hit the table in five, dump the cart in the van in ten, and we’re gone before anyone notices the missing gifts. I already scoped the alarm—there isn’t one at the side door. Easy in, easy out.” Bill snorted and checked the clown noses they’d glued to the passenger seat. “This whole thing is ridiculous. We’re professional thieves, not children’s entertainers. Next time we pick a jewelry store or something with less glitter and more cash.” Molly popped the lockpick case open again and handed him one. “Less complaining, more focus. Ready?” Bill grabbed it with a resigned sigh and started the van again. “Ready, clown.”
created_at: "2026-05-10T13:23:00.421Z" (5/10/2026, 1:23:00 PM)
model: "x-ai/grok-4.3-20260430"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 570
moderation_latency: (null)
generation_time: 9422
tokens_prompt: 77
tokens_completion: 804
native_tokens_prompt: 182
native_tokens_completion: 639
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
service_tier: "default"
usage: 0.0017578
router: (null)
provider_responses:
  0:
    endpoint_id: "45623cb8-18e5-4b32-8fc4-85d05439982d"
    id: "5f3011d5-33e6-92e5-87d9-70f8daf17c6a"
    is_byok: false
    latency: 91
    model_permaslug: "x-ai/grok-4.3-20260430"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778419380-OjFr9f30k7B2lPUn211N"
session_id: (null)
api_type: "completions"
id: "gen-1778419380-FY19jtUW2Ob4iZJucxiI"
upstream_id: "5f3011d5-33e6-92e5-87d9-70f8daf17c6a"
total_cost: 0.0017578
cache_discount: 0.0000672
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
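The timing and token fields above can be combined into a rough throughput figure. The sketch below assumes that generation_time is reported in milliseconds (the page does not state the unit, though 9422 is close to the 9.5s duration shown in the run details). The helper name tokens_per_second is hypothetical and is not part of the benchmark.

```python
# Values copied from the metadata fields above.
generation_time_ms = 9422          # ASSUMPTION: unit is milliseconds
tokens_completion = 804            # normalized completion token count
native_tokens_completion = 639     # provider-native completion token count
total_cost = 0.0017578             # as reported in total_cost


def tokens_per_second(tokens: int, elapsed_ms: float) -> float:
    """Hypothetical helper: completion tokens generated per second."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return tokens / (elapsed_ms / 1000.0)


def cost_per_token(cost: float, tokens: int) -> float:
    """Hypothetical helper: average cost per completion token."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return cost / tokens


native_rate = tokens_per_second(native_tokens_completion, generation_time_ms)
normalized_rate = tokens_per_second(tokens_completion, generation_time_ms)

print(f"native tokens/s:     {native_rate:.2f}")
print(f"normalized tokens/s: {normalized_rate:.2f}")
print(f"cost per native completion token: {cost_per_token(total_cost, native_tokens_completion):.8f}")
```

The two rates differ only because the page reports two completion token counts (tokens_completion and native_tokens_completion). Which one is the better basis for comparison depends on how the counts are produced, which the page does not describe.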
Evaluation details
Result | Evaluator | Details | Meta Data
6.09% | Matches word count | n/a | neededClean: false; words: 523
0.00% | Dialogue to Total Word Ratio | Ratio: 66.79%, Deviation: 36.79% | neededClean: false; wordsTotal: 524; wordsDialogue: 350
3.0453% | Overall | |
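The reported ratio and deviation can be reproduced from the word counts in the evaluation details. The sketch below is not the benchmark's evaluator. It assumes dialogue is any text between curly quotes, that words are whitespace-separated tokens, and that the deviation is the absolute difference from the 30% target. The function names are hypothetical.

```python
import re


def dialogue_word_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    ASSUMPTION: dialogue is text enclosed in curly quotes, and a word is
    any whitespace-separated token. The real evaluator may tokenize differently.
    """
    total = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_from_target(ratio: float, target: float = 0.30) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Reproduce the reported figures from the counts in the evaluation details.
words_dialogue, words_total = 350, 524
reported_ratio = words_dialogue / words_total
reported_deviation = deviation_from_target(reported_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {reported_deviation:.2%}")

# The overall score is consistent with an unweighted mean of the two
# evaluator scores (an observation, not a documented formula).
overall_estimate = (6.09 + 0.00) / 2
print(f"Overall estimate: {overall_estimate:.3f}%")
```

As a usage example, `dialogue_word_ratio("“Ready?” Bill grabbed it. “Ready, clown.”")` counts three dialogue words out of six total, a ratio of 0.5.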