NC Bench
Run: 2026-05-01-01KQHQGFQ4X62K7W16N5FZN9A4
Run Details
Model: Grok 4.20 (Reasoning)
Test: Dialogue tags
Scenario: Write unattributed dialogue
Duration: 37.2s
Cost: $0.007094
Provider: xAI
Evaluation summary
Score: 100.00%
Evaluator: Count dialogue tags
Model input
User/Human
You are an expert novel writer in the cozy mystery genre. Write a scene of about 200 words, with dialogue between two characters. The dialogue should be unattributed, meaning the speaker is not explicitly mentioned. The scene should be set in a small town bakery. Only return the story, nothing else.
Model response
AI
The small town bakery welcomed the day with its inviting aroma of cinnamon buns and apple pies cooling on the racks. The morning crowd had thinned, leaving two familiar figures deep in conversation at a small table by the window. The wooden floors creaked softly underfoot, and lace curtains filtered the sunlight into a homey glow. "Did you notice anything odd about the delivery van yesterday?" "The one that parked in the alley for over an hour? How could I miss it? The driver never came inside like usual." "I peeked out back after it left. There was a crate by the door with no label." "What was inside? More supplies?" "No. Old documents from town hall—financial records, some pages highlighted with suspicious transactions." "This ties back to the missing harvest festival funds, doesn't it?" "Precisely. And the person responsible has access to both the bakery and the municipal offices." "Are you thinking who I'm thinking?" "The new assistant. He's been volunteering for every extra shift, always asking questions about the secret recipes." "We should follow him after closing. Carefully." "Agreed. But first, try one of these eclairs. Solving mysteries always goes better with chocolate."
created_at: "2026-05-01T12:16:41.705Z" (5/1/2026, 12:16:41 PM)
model: "x-ai/grok-4.20-20260309"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 35021
moderation_latency: (null)
generation_time: 37009
tokens_prompt: 81
tokens_completion: 1207
native_tokens_prompt: 185
native_tokens_completion: 2799
native_tokens_completion_images: (null)
native_tokens_reasoning: 2565
native_tokens_cached: 128
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00709435
router: (null)
provider_responses:
  0:
    endpoint_id: "3ecee37f-b217-4093-87fb-aaf0afe307af"
    id: "a5de30f5-4b86-3ebf-f9fd-fa7d04d84d25"
    is_byok: false
    latency: 91
    model_permaslug: "x-ai/grok-4.20-20260309"
    provider_name: "xAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777637801-CNiAncsgQtece3tUAAMO"
session_id: (null)
api_type: "completions"
id: "gen-1777637801-6dFHFDwbaapbov6hMlKF"
upstream_id: "a5de30f5-4b86-3ebf-f9fd-fa7d04d84d25"
total_cost: 0.00709435
cache_discount: 0.0001344
upstream_inference_cost: 0
provider_name: "xAI"
response_cache_source_id: (null)
Evaluation details
Result: 100.00%
Evaluator: Count dialogue tags
Details: n/a
Meta Data:
  neededClean: false
  dialogueTags: (empty)
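
The implementation of the "Count dialogue tags" evaluator is not shown on this page. As a rough illustration only, the Python sketch below shows one way such a check could work: it splits the text on quotation marks, treats everything outside the quotes as narration, and looks there for speech verbs such as "said" or "asked". The function name find_dialogue_tags, the verb list, and the heuristic itself are assumptions made for this example, not NC Bench's actual code.

```python
import re

# Hypothetical verb list; the real evaluator's vocabulary is not documented here.
SPEECH_VERBS = ("said", "asked", "replied", "whispered", "shouted", "muttered", "answered")

# A speech verb, optionally preceded and followed by one word (e.g. "she asked", "said Tom").
TAG_RE = re.compile(
    r"\b(?:\w+\s+)?(?:" + "|".join(SPEECH_VERBS) + r")\b(?:\s+\w+)?",
    re.IGNORECASE,
)


def find_dialogue_tags(text: str) -> list:
    """Return attribution-like phrases found outside quoted speech."""
    # Normalize curly quotes so a single split character is enough.
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    # After splitting on quotes, even-indexed segments lie outside the quotes.
    narration = text.split('"')[0::2]
    tags = []
    for segment in narration:
        tags.extend(match.group(0).strip() for match in TAG_RE.finditer(segment))
    return tags


if __name__ == "__main__":
    tagged = '"Where is it?" she asked. "In the van," said Tom.'
    untagged = '"Where is it?" "In the van."'
    print(find_dialogue_tags(tagged))
    print(find_dialogue_tags(untagged))
```

Under this heuristic, a response like the one above, where quoted lines follow each other with no narration in between, would yield an empty list, which is consistent with the dialogueTags value of (empty) reported for this run. A production evaluator would likely need a broader verb list and handling for unbalanced quotes.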