NC Bench
Run: 2026-05-10-01KR93DHS53NYBAXH4KG1M8GAW
Run Details
Model: Gemma 4 26B (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 2m 33s
Cost: $0.002612
Provider: Cloudflare
Evaluation summary
Score | Evaluator | Details Link
43.38% | Matches word count | (details)
80.84% | Dialogue to Total Word Ratio | (details)
62.1075% | Overall
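The overall figure appears consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the sketch below is an assumption, and the function name `overall` is hypothetical. The small gap between the result and the displayed 62.1075% would be explained if the word count score is 43.375% before rounding.

```python
# Scores as displayed in the evaluation summary (percent).
word_count_score = 43.38
dialogue_ratio_score = 80.84


def overall(scores):
    """Unweighted mean of evaluator scores.

    Assumption: the page does not document how "Overall" is computed;
    a plain average is simply the rule that fits the displayed numbers.
    """
    return sum(scores) / len(scores)


combined = overall([word_count_score, dialogue_ratio_score])
print(f"{combined:.2f}%")
```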
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Listen, Bill," Molly whispered. "The target is the Miller residence. Three o'clock sharp. Theme: dinosaurs." "Dinosaurs? Is that a code for something? Like, high-stakes prehistoric fossils?" Bill asked. "No, Bill. It's a six-year-old's birthday party. We move during the juice box lull." "The juice box lull? That sounds tactical. Highly professional. I love a good lull." "We hit the pile under the decorated tree. We grab the electronics, the Lego sets, anything with high resale value. Avoid the glitter. It's a tactical nightmare. It gets everywhere. It sticks to the soul, Bill." "What about the bouncy castle?" Bill squinted. "Can we use it as a high-velocity extraction point?" "No, Bill! We're not bouncing out of there. We slip in through the sliding glass door, bag the loot, and vanish before the cake is cut. It has to be clean." "And if the birthday boy sees us?" "He's five, Bill. He'll think we're the entertainment. Just keep your mask on and for the love of god, do not eat the cupcakes. We are professionals." "Right. Professionals. No cupcakes. Got it. I am a professional."
created_at: "2026-05-10T14:07:20.366Z" (5/10/2026, 2:07:20 PM)
model: "google/gemma-4-26b-a4b-it-20260403"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 979
moderation_latency: (null)
generation_time: 152547
tokens_prompt: 77
tokens_completion: 8538
native_tokens_prompt: 81
native_tokens_completion: 8678
native_tokens_completion_images: (null)
native_tokens_reasoning: 8248
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
service_tier: (null)
usage: 0.0026115
router: (null)
provider_responses:
  0:
    endpoint_id: "e2a16fa6-1f65-413e-9ecb-93901089ddb1"
    id: "id-1778422041257"
    is_byok: false
    latency: 887
    model_permaslug: "google/gemma-4-26b-a4b-it-20260403"
    provider_name: "Cloudflare"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778422040-AykkoYVedcn6aKiCh7lD"
session_id: (null)
api_type: "completions"
id: "gen-1778422040-llRbfklNC6ZwpNZL4G99"
upstream_id: "id-1778422041257"
total_cost: 0.0026115
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Cloudflare"
response_cache_source_id: (null)
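The metadata shows how much of this run's output was spent on reasoning rather than on the visible scene. The sketch below works through that arithmetic from the recorded values. It rests on two assumptions the page does not confirm: that `native_tokens_reasoning` is counted inside `native_tokens_completion`, and that `generation_time` is in milliseconds (which is consistent with the reported duration of 2m 33s).

```python
# Values copied from the run metadata.
native_tokens_completion = 8678
native_tokens_reasoning = 8248
generation_time_ms = 152547  # assumed to be milliseconds

# Assumption: reasoning tokens are a subset of completion tokens,
# so the remainder is what reached the reader as the scene itself.
visible_tokens = native_tokens_completion - native_tokens_reasoning  # 430

# Fraction of the completion spent on reasoning.
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Throughput over the whole generation, reasoning included.
generation_seconds = generation_time_ms / 1000
tokens_per_second = native_tokens_completion / generation_seconds

print(f"visible tokens:    {visible_tokens}")
print(f"reasoning share:   {reasoning_share:.1%}")
print(f"generation time:   {generation_seconds:.0f} s")
print(f"tokens per second: {tokens_per_second:.1f}")
```

Under these assumptions, roughly 95% of the completion tokens went to reasoning, which accounts for most of the run's duration and cost relative to a scene of under 200 words.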
Evaluation details
Result | Evaluator | Details | Meta Data
43.38% | Matches word count | n/a | neededClean: false, words: 183
80.84% | Dialogue to Total Word Ratio | Ratio: 96.79%, Deviation: 6.79% | neededClean: false, wordsTotal: 187, wordsDialogue: 181
62.1075% | Overall
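The ratio and deviation reported above follow directly from the recorded counts: 181 dialogue words out of 187 total is 96.79%, which is 6.79 points away from the 90% target. The six non-dialogue words match the three narration tags in the response ("Molly whispered.", "Bill asked.", "Bill squinted."). The sketch below shows one way such counts could be produced, assuming dialogue means text inside straight double quotes and words are whitespace-separated. The function `dialogue_stats` is hypothetical, not the benchmark's evaluator, and the mapping from deviation to the 80.84% score is not documented here, so it is not reproduced.

```python
import re


def dialogue_stats(text, target_ratio=0.90):
    """Count whitespace-separated words overall and inside double quotes.

    Assumption: this is an illustrative counting rule, not the
    evaluator NC Bench actually runs.
    """
    words_total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(len(span.split()) for span in quoted_spans)
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target_ratio)
    return words_total, words_dialogue, ratio, deviation


# A short excerpt from the model response: 10 words, 8 of them quoted.
sample = '"Listen, Bill," Molly whispered. "The target is the Miller residence."'
total, dialogue, ratio, deviation = dialogue_stats(sample)
print(total, dialogue, f"{ratio:.0%}")

# Reproducing the reported figures from the recorded counts.
recorded_ratio = 181 / 187
recorded_deviation = abs(recorded_ratio - 0.90)
print(f"Ratio: {recorded_ratio:.2%}, Deviation: {recorded_deviation:.2%}")
```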