NC Bench
Run: 2026-05-10-01KR8WAV92K62NDGVMW2F4QHXG
Run Details
Model: Grok 4.3
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 12.2s
Cost: $0.001348
Provider: xAI
Evaluation summary
Score      Evaluator
0.00%      Matches word count
63.29%     Dialogue to Total Word Ratio
31.6468%   Overall
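The overall figure is consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers on this page, not a documented NC Bench formula, and `overall_score` is a hypothetical helper name. A minimal sketch under that assumption:

```python
from statistics import mean


def overall_score(evaluator_scores):
    """Unweighted mean of evaluator scores in percent (assumed aggregation)."""
    return mean(evaluator_scores)


# Scores as displayed in the summary; 63.29 is already rounded to two decimals,
# so the result only matches the reported 31.6468% to within rounding.
scores = [0.00, 63.29]
print(f"{overall_score(scores):.1f}%")
```

If the evaluators were weighted differently, the same helper would need a weights argument; nothing on this page indicates that they are.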
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"C’mon, Bill, the circus theme means everything’s in one room—they pin it all near the clown booth for easy cake access," hissed Molly, crouched behind a stack of plastic chairs in the community-center hallway. Bill tugged his black balaclava half-on and half-off, squinting through the eye holes. "Yeah, and the birthday boy’s mom is five sangria deep already. She’ll never notice if we stroll in wearing party hats." Molly flicked a glittery cone at him. "Put that on your head, not your toe. We need the hats for cover; everyone’s wearing them." Bill jammed it backward so the tassel tickled his ear. "And the clown—how do we get past the clown? That guy’s six-three and armed with balloon animals." Molly grinned. "We tell him we’re his understudies. He’ll thank us. Then we slip behind the gift mountain while he’s twisting a poodle." "What if he recognizes our voices?" Bill asked, voice pitching higher under nerves. "We talk like pirates the whole time. Arr, matey, excellent balloon work," Molly demonstrated, raising an imaginary cutlass. Bill snorted. "Pirate clowns. Fine. And when the kids swarm the presents?" "Distraction protocol. You belly-flop into the bouncy house shouting you forgot your birthday too." "Molly, I could rupture something vital." "Worth it for the Gucci tech boxes. You saw the registry scroll last week—Cruise control watches, headphones still in shrink-wrap. We clear half the haul in six minutes flat." Bill tapped the floor plan sketched on a napkin. "South entrance leads straight to the pile. We exit through the kitchen once Seńor Piñata bursts." Molly checked the clock on the wall. "Sugar crash hits in twenty. Parents start hunting cups and car keys; we vanish while they’re arguing who hid the goodie bags." Bill lowered his voice. "And the security cam?" She patted a tiny silver cylinder. "Loop of yesterday’s cleaning crew. Ten minutes of vacuuming glory on repeat." Bill nodded, then frowned under the crooked hat. 
"One last thing—what if the birthday boy catches us?" Molly shrugged. "Tell him we’re the Tooth Fairy’s repo squad. He’ll scream, mom will film it for TikTok, and we’re gone before the algorithm finishes buffering."
created_at: "2026-05-10T12:03:31.753Z" (5/10/2026, 12:03:31 PM)
model: "x-ai/grok-4.3-20260430"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 457
moderation_latency: (null)
generation_time: 12134
tokens_prompt: 77
tokens_completion: 585
native_tokens_prompt: 182
native_tokens_completion: 502
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
service_tier: "default"
usage: 0.0013481
router: (null)
provider_responses:
  0:
    endpoint_id: "45623cb8-18e5-4b32-8fc4-85d05439982d"
    id: "325ab8de-ad6c-90b7-97e2-5d69b873e3e0"
    is_byok: false
    latency: 105
    model_permaslug: "x-ai/grok-4.3-20260430"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778414611-cfPsvxL7vqcgvxTnXNGx"
session_id: (null)
api_type: "completions"
id: "gen-1778414611-Me7rDkF8qljAEFfFeCgu"
upstream_id: "325ab8de-ad6c-90b7-97e2-5d69b873e3e0"
total_cost: 0.0013481
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result     Evaluator                      Details                           Meta Data
0.00%      Matches word count             n/a                               neededClean: false, words: 361
63.29%     Dialogue to Total Word Ratio   Ratio: 73.02%, Deviation: 3.02%   neededClean: false, wordsTotal: 367, wordsDialogue: 268
31.6468%
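The ratio and deviation reported for the dialogue evaluator can be reproduced from its word counts. The sketch below is an assumption about how such a check might work, not NC Bench's actual evaluator: it treats text inside straight double quotes as dialogue and splits words on whitespace, and the names `dialogue_stats` and `ratio_and_deviation` are hypothetical. How the 63.29% score is derived from the deviation is not shown on this page, so it is not reconstructed here.

```python
import re

TARGET_RATIO = 70.0  # percent, taken from the scenario "70% dialogue"


def count_words(text):
    """Count whitespace-separated tokens (assumed tokenization)."""
    return len(text.split())


def dialogue_stats(text):
    """Return (dialogue_words, total_words), treating text inside
    straight double quotes as dialogue."""
    spoken = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(count_words(segment) for segment in spoken)
    return dialogue_words, count_words(text)


def ratio_and_deviation(dialogue_words, total_words, target=TARGET_RATIO):
    """Return the dialogue ratio in percent and its absolute distance from target."""
    ratio = 100.0 * dialogue_words / total_words
    return ratio, abs(ratio - target)


# Word counts reported in the evaluator's metadata.
ratio, deviation = ratio_and_deviation(268, 367)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 73.02%, Deviation: 3.02%"
```

The two evaluators report different totals for the same response (361 words versus 367), which suggests they tokenize differently, so a simple whitespace split like this one may not match either count exactly.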