NC Bench
Run: 2026-05-10-01KR929E849AEVD8G63H76CZRD
Run Details
Model: Gemma 4 26B (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 3m 41s
Cost: $0.002685
Provider: Novita
Evaluation summary
Matches word count: 98.71%
Dialogue to Total Word Ratio: 98.72%
Overall: 98.7174%
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Listen closely, Bill. The primary target is the central gift pile located directly beneath the frosted cupcake tower," Molly whispered, tapping a crayon-drawn map. "Do we absolutely have to wear the oversized clown noses?" Bill asked, eyeing the foam spheres. "It's the perfect disguise. Who suspects a clown of grand larceny?" "I feel ridiculous. And incredibly itchy." "Focus! We strike during the peak chaos of the bouncy castle session. That is our only window of opportunity." "What if a toddler intercepts our perimeter?" "Use the tactical bubble wand. It creates a momentary sensory distraction." "A bubble wand? Seriously? We are criminals, Molly, not birthday entertainers." "It is highly effective, Bill. They stop to pop, we grab the LEGO sets and the remote-control cars. Then, we exit via the sandbox." "The sandbox is a mess. My tactical boots will never recover from the grit." "Do you want the shiny plastic gold or not?" "Fine. But I am taking a handful of those dinosaur nuggets for the road." "No food, Bill! We are professional thieves, not hungry mascots!" "Professional thieves with low blood sugar, Molly. It is a very fine line between heist and snack time."
created_at: "2026-05-10T13:47:37.101Z" (5/10/2026, 1:47:37 PM)
model: "google/gemma-4-26b-a4b-it-20260403"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 5120
moderation_latency: (null)
generation_time: 220993
tokens_prompt: 77
tokens_completion: 4981
native_tokens_prompt: 81
native_tokens_completion: 6686
native_tokens_completion_images: (null)
native_tokens_reasoning: 4675
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
service_tier: (null)
usage: 0.00268493
router: (null)
provider_responses:
  0:
    endpoint_id: "03b96766-ef7e-454f-9519-c76b70654b49"
    id: "e339351e4a763fa6d5c0c6228e22f630"
    is_byok: false
    latency: 4969
    model_permaslug: "google/gemma-4-26b-a4b-it-20260403"
    provider_name: "Novita"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778420857-pGNKpfX9wXbvrdfyIYBH"
session_id: (null)
api_type: "completions"
id: "gen-1778420857-adZ4BTGhZiQbl3wu4O8m"
upstream_id: "e339351e4a763fa6d5c0c6228e22f630"
total_cost: 0.00268493
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Novita"
response_cache_source_id: (null)
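The headline Duration and Cost figures appear to be rounded forms of the generation_time and usage metadata values. The sketch below cross-checks that reading. It assumes generation_time is in milliseconds and that the page rounds to whole seconds and to six decimal places; neither is stated in the report. The helper names are hypothetical.

```python
# Values copied from the run metadata.
GENERATION_TIME_MS = 220_993       # assumption: unit is milliseconds
USAGE_USD = 0.00268493
NATIVE_COMPLETION_TOKENS = 6686


def format_duration(ms: int) -> str:
    """Render milliseconds as 'Xm Ys', rounding to the nearest second."""
    minutes, seconds = divmod(round(ms / 1000), 60)
    return f"{minutes}m {seconds}s"


def format_cost(usd: float) -> str:
    """Render a dollar amount with six decimal places."""
    return f"${usd:.6f}"


def tokens_per_second(tokens: int, ms: int) -> float:
    """Rough throughput over the whole generation, reasoning included."""
    return round(tokens / (ms / 1000), 1)


print(format_duration(GENERATION_TIME_MS))   # 3m 41s
print(format_cost(USAGE_USD))                # $0.002685
print(tokens_per_second(NATIVE_COMPLETION_TOKENS, GENERATION_TIME_MS))
```

Under these assumptions both headline values are reproduced. The throughput figure is only a rough indicator, since the metadata does not say exactly which phases generation_time covers.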
Evaluation details
Matches word count: 98.71%
  Details: n/a
  Meta Data: neededClean: false, words: 194
Dialogue to Total Word Ratio: 98.72%
  Details: Ratio: 93.37%, Deviation: 3.37%
  Meta Data: neededClean: false, wordsTotal: 196, wordsDialogue: 183
Overall: 98.7174%
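The reported ratio and deviation follow directly from the wordsDialogue and wordsTotal counts and the 90% target in the scenario. The sketch below reproduces that arithmetic and shows one possible way to count dialogue words. The counting function is an assumption: it treats text inside straight double quotes as dialogue and splits words on whitespace, which may differ from the evaluator's actual tokenization (the two evaluators themselves report 194 and 196 total words). How the 98.72% score is derived from the deviation is not shown in the report and is not modeled here.

```python
import re


def dialogue_word_counts(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words).

    Hypothetical counter: dialogue is anything between straight double
    quotes, and words are whitespace-separated tokens.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


def ratio_and_deviation(dialogue: int, total: int,
                        target_pct: float = 90.0) -> tuple[float, float]:
    """Return (ratio %, absolute deviation from target %), 2 decimals."""
    ratio_pct = 100.0 * dialogue / total
    return round(ratio_pct, 2), round(abs(ratio_pct - target_pct), 2)


# Counts taken from the evaluator metadata above.
print(ratio_and_deviation(183, 196))   # (93.37, 3.37)

# Small illustration of the counter on a made-up sentence.
print(dialogue_word_counts('"Run," she said. "Now."'))   # (2, 4)
```

With the reported counts, 183 / 196 gives 93.37%, which is 3.37 percentage points above the 90% target, matching the Details line.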