NC Bench
Run: 2026-05-10-01KR8ZH50J8WQY8FGBCH9NJZCX
Run Details
Model: Grok 4.3
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 7.6s
Cost: $0.001806
Provider: xAI
Evaluation summary
Score      Evaluator
6.09%      Matches word count
0.08%      Dialogue to Total Word Ratio
3.0864%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly eyed Bill across the stained kitchen table, their walkie-talkies and a crumpled map of the block between them. “All right, one last time. We hit the Johnson house at noon sharp. Half the kids are still opening gifts, the parents are busy with cake, and the back gate is propped open with a garden gnome. That’s our window.” “Window’s a window, but presents have eyes,” Bill muttered, twirling a screwdriver like a baton. “How many boxes you figure? Twenty? Thirty? We’re not taking them one by one.” “Take the lot in those black roller bins from the garage. The caterers leave them behind every year. Roll ‘em straight through the playroom like we’re cleaning up broken toys.” Bill snorted. “Cleaning up. Right. And when little Tommy starts screaming his spaceman set is missing, what’s our exit line?” Molly tapped the map with a fingerprint-smeared pen. “You’re the distraction guy. You knock on the front door holding a busted balloon and say you’re the replacement clown. Everyone panics about the schedule, you slip around back while I’m already loading bins.” “You sure you can handle the weight? Those rocket-ship wrapping jobs weigh more than you.” Molly rolled her eyes. “I bench-pressed the safe from the corner store last month. I’ll manage. You just keep those moms talking about party favors and don’t eat any cake. Last time you froze mid-heist because someone offered you sprinkles.” “That was one time,” Bill protested, cheeks coloring. “And the frosting looked suspicious. Coulda been laced.” “We’re stealing Lego, not poisons. Focus. Once the bins hit the van, we circle the block once then ditch the plates. Cops expect amateurs, not two professionals timing a birthday party.” Bill leaned forward, lowering his voice like the walls might snitch. “What about the dog? Last year Mrs. Johnson brought a corgi that bites ankles on sight.” “Already handled. I left a trail of bacon on the opposite fence. Dog’ll be occupied chasing breakfast while we chase loot. 
No barking, no backup.” Bill’s knee bounced. “And the alarm on the garage?” “Motion sensor only. We freeze when the red light blinks, then move again. Takes thirty seconds to reset. We’ve got the timing down to the second.” Molly slid the map toward him. “Mark the spots you’ll hit first. The big shiny boxes with the bow, they go top of the pile. Anything that rattles gets left—sound carries through wrapping paper.” Bill grabbed the pen, drawing quick X’s. “Done. And if someone spots us?” Molly grinned, teeth bright. “We’re the cleaners. Just smile, nod, and keep rolling those bins out the gate. Worst case we claim the presents are for the school raffle. You got the fake flyer in your back pocket?” Bill patted his jeans. “Printed it myself. ‘Annual Toy Drive.’ Looks legit.” “Perfect. We go in separate, we leave together loaded. Ten minutes tops. Kids cry, parents post on group chats, we fence the haul by dinner.” Bill lifted his coffee mug like a toast. “To birthday greed and zero witnesses.” Molly clinked her own against it. “To being the only grown-ups who actually loot a four-year-old’s loot.”
created_at: "2026-05-10T12:59:24.057Z" (5/10/2026, 12:59:24 PM)
model: "x-ai/grok-4.3-20260430"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 393
moderation_latency: (null)
generation_time: 7254
tokens_prompt: 77
tokens_completion: 857
native_tokens_prompt: 182
native_tokens_completion: 685
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
service_tier: "default"
usage: 0.0018056
router: (null)
provider_responses:
  0:
    endpoint_id: "45623cb8-18e5-4b32-8fc4-85d05439982d"
    id: "df4bba4f-d803-9363-3de0-a80c437d88f6"
    is_byok: false
    latency: 131
    model_permaslug: "x-ai/grok-4.3-20260430"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778417964-7ZqK0bHjnEqlWPY5Yd4V"
session_id: (null)
api_type: "completions"
id: "gen-1778417964-GfOUMjZRkzw5JQ1L7U67"
upstream_id: "df4bba4f-d803-9363-3de0-a80c437d88f6"
total_cost: 0.0018056
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result: 6.09%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false, words: 523

Result: 0.08%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 81.92%, Deviation: 11.92%
Meta Data: neededClean: false, wordsTotal: 531, wordsDialogue: 435

Overall: 3.0864%
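
The ratio and deviation reported by the Dialogue to Total Word Ratio evaluator can be reproduced from its metadata (wordsDialogue 435, wordsTotal 531) and the 70% target in the scenario. The sketch below is illustrative only and is not NC Bench's evaluator code. The function names are hypothetical, the deviation is assumed to be the absolute difference from the target, and the curly-quote word counter is one guess at how dialogue might be detected. How the deviation is converted into the 0.08% score is not shown on this page and is not modeled here.

```python
import re

# Target share of dialogue, taken from the scenario "Write 500 words with 70% dialogue".
TARGET_RATIO_PCT = 70.0


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return the dialogue share as a percentage of all words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = TARGET_RATIO_PCT) -> float:
    """Assumption: deviation is the absolute distance from the target, in percentage points."""
    return abs(ratio_pct - target_pct)


# Hypothetical dialogue detector: treats any text between curly double quotes as dialogue.
# The benchmark's real tokenization rules are not documented on this page.
QUOTED_SPAN = re.compile(r"“([^”]*)”")


def count_dialogue_words(text: str) -> int:
    """Count whitespace-separated words inside curly-quoted spans."""
    return sum(len(span.split()) for span in QUOTED_SPAN.findall(text))


if __name__ == "__main__":
    ratio = dialogue_ratio(435, 531)  # counts taken from the evaluator metadata
    deviation = deviation_from_target(ratio)
    print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

With the counts from this run, the sketch prints the same figures as the Details field (81.92% and 11.92%). The two evaluators report different totals (523 and 531 words), which suggests they do not tokenize identically, so a simple counter like `count_dialogue_words` should not be expected to match either number exactly.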