NC Bench
Run: 2026-05-01-01KQHR1ZJ6H0SD3B6G8J98SHTQ
Run Details
Model: Grok 4.3 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 2m 30s
Cost: $0.044125
Provider: xAI
Evaluation summary
Score | Evaluator | Details Link
86.38% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
43.1902% | Overall
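The overall score is consistent with an unweighted mean of the two evaluator scores. The page does not state how the overall figure is derived, so the averaging rule in this short sketch is an assumption that happens to match the reported numbers, not a documented formula.

```python
# Evaluator scores as displayed in the summary (percent).
scores = {
    "Matches word count": 86.38,
    "Dialogue to Total Word Ratio": 0.00,
}

# Assumption: the overall score is the plain average of the evaluator scores.
overall = sum(scores.values()) / len(scores)

print(f"Overall (assumed unweighted mean): {overall:.2f}%")
```

The reported overall of 43.1902% carries more digits than the displayed 86.38%, which suggests the word count score is rounded for display.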
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill sat in the van across from the party, whispering their plan. "Bill, we are going to steal all the presents from this children's party by creating the ultimate distraction," Molly said. "When the clown brings out the new piñata for the kids to hit with sticks, we will slip inside the back and load up. The piñata will keep everyone's attention for minutes while we grab the goods from the gift table and the floor." Bill replied "The piñata is a great idea Molly. The kids will be swinging like crazy and the parents will be cheering them on. That gives us the perfect window. But how do we avoid the dad who might be standing guard near the gifts." "We avoid him by having you spill some punch on his shoes," Molly suggested. "You can say it was a clumsy accident and offer to clean it up. While you distract him, I will sneak the presents into the sacks hidden behind the curtain. The sacks are black so they blend in. Then we go back out the same door and load the van." Bill agreed "Spilling the punch will definitely work. I will make it look like I tripped over a toy. Then I can offer him a new cup and chat about how fun the party is. That should give you three minutes to grab as many as you can. We need to focus on the light ones like watches and video game cases. Leave the bulky ones like the big stuffed bears because they are hard to carry." "That is the plan Bill," Molly continued. "We must move with speed and stealth like we did on the last shopping mall job. After we have the bags, we close the door quietly and run to the van. You drive and I count the number of presents we got. I think we will get at least twenty. Then at the hideout we can open them and see what we have for sale. This will be a great score." Bill laughed "Opening them at the hideout will be hilarious Molly. Can you imagine the looks on the kids faces when they find out there are no presents left? They will be disappointed but the clown will cheer them up. 
I love that this heist has a touch of comedy unlike our usual serious jobs where we worry about cameras and alarms. Here the only alarms are the kids screaming when balloons pop." "Yes, the only worry is if a kid sees us taking the presents," Molly added. "But we can give them a balloon animal or candy to keep them quiet. That trick always works with kids. Now are you ready to start?" "Ready as ever," Bill answered. "Let's get our sacks and go in there to do this heist so we end up with all the presents." They grabbed their equipment from the back of the van and walked towards the party entrance with the intention of stealing every last gift in the most comedic way possible.
created_at
"2026-05-01T12:26:14.991Z"
5/1/2026, 12:26:14 PM
model
"x-ai/grok-4.3-20260430"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
147489
moderation_latency
(null)
generation_time
150144
tokens_prompt
77
tokens_completion
5297
native_tokens_prompt
184
native_tokens_completion
17585
native_tokens_completion_images
(null)
native_tokens_reasoning
17001
native_tokens_cached
64
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0441253
router
(null)
provider_responses
0
endpoint_id
"45623cb8-18e5-4b32-8fc4-85d05439982d"
id
"9a6343ca-ebff-96d3-ad61-c5e3ac0b5b6d"
is_byok
false
latency
91
model_permaslug
"x-ai/grok-4.3-20260430"
provider_name
"xAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777638374-9nUM4jTjTEcGF0Ibo3AX"
session_id
(null)
api_type
"completions"
id
"gen-1777638374-XmhxaFnyfqxarZMKCVVF"
upstream_id
"9a6343ca-ebff-96d3-ad61-c5e3ac0b5b6d"
total_cost
0.0441253
cache_discount
0.0000672
upstream_inference_cost
0
provider_name
"xAI"
response_cache_source_id
(null)
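A few derived figures help when reading the metadata above. The sketch below assumes that native_tokens_reasoning is counted inside native_tokens_completion and that generation_time is reported in milliseconds. Neither is stated on this page, though the second is consistent with the listed duration of 2m 30s.

```python
# Values copied from the run metadata.
native_tokens_completion = 17585
native_tokens_reasoning = 17001
generation_time_ms = 150144  # assumed to be milliseconds

# Assumption: reasoning tokens are a subset of completion tokens,
# so the remainder is the visible response.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Convert the generation time to whole minutes and seconds.
minutes, seconds = divmod(generation_time_ms // 1000, 60)

print(f"Visible completion tokens: {visible_tokens}")
print(f"Reasoning share of completion: {reasoning_share:.1%}")
print(f"Generation time: {minutes}m {seconds}s")
```

Under these assumptions, nearly all of the completion tokens were spent on reasoning rather than on the scene itself.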
Evaluation details
Result | Evaluator | Details | Meta Data
86.38% | Matches word count | n/a | neededClean: false; words: 511
0.00% | Dialogue to Total Word Ratio | Ratio: 88.26%, Deviation: 18.26% | neededClean: false; wordsTotal: 511; wordsDialogue: 451
43.1902% | Overall
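The ratio and deviation reported for the dialogue evaluator can be reproduced from the word counts in its metadata. This is a minimal sketch: the 70% target comes from the scenario, and treating the deviation as the absolute difference from that target is an assumption that matches the reported 18.26%. How the deviation maps to the 0.00% score is not shown on the page and is not modeled here.

```python
# Word counts from the evaluator metadata.
words_total = 511
words_dialogue = 451

# Target dialogue share from the scenario ("70% dialogue").
target_ratio = 0.70

ratio = words_dialogue / words_total
# Assumption: deviation is the absolute gap between actual and target ratio.
deviation = abs(ratio - target_ratio)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

The response overshoots the target: roughly 88% of its words are dialogue where 70% was requested.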