NC Bench
Run: 2026-05-10-01KR938M5FQEG5K6JD9FKGPTYT
Run Details
Model: Gemma 4 31B
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 34.5s
Cost: $0.000233
Provider: DeepInfra
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
0.0000% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The blueprint was drawn in crayon on the back of a pizza box, featuring a crude sketch of a suburban living room and several menacing arrows pointing toward a mountain of colorful wrapping paper. Molly studied the map with the intensity of a general planning the invasion of Normandy, while Bill struggled to fit himself into a giant, plush panda costume that smelled faintly of old gym socks. "The perimeter is guarded by a golden retriever with a low IQ but a loud bark," Molly whispered, pointing to a doodle of a dog. "We enter through the sliding door during the cake collapse. Chaos is our primary camouflage." Bill tried to cross his arms, but the plush padding made him look like he was hugging himself in a panic. "Why the panda suit? I feel like a target. A large, fluffy, monochromatic target." "It is the perfect disguise," Molly insisted, adjusting her own costume, which consisted of a very convincing oversized baby bib and a bonnet. "Nobody suspects the hired entertainment or the weirdly large infant. We blend in, we scoop the loot, and we vanish before the piñata bursts." Bill shifted his weight, causing the panda head to tilt precariously to the left. "And what about the target? The presents. Are we talking high-end LEGO sets or just a bunch of generic bubble-blowers?" "Intelligence suggests a luxury distribution," Molly replied, her eyes gleaming. "The birthday boy is the grandson of a venture capitalist. We are looking at limited edition robotics, handheld consoles, and possibly a miniature electric car. It is a goldmine of plastic and circuitry." "I don't like the window of operation," Bill grumbled, his voice muffled by the foam head. "The sugar rush hits at 3:00 PM. Once those kids start vibrating, the environment becomes unpredictable. It is a tactical nightmare." Molly patted the pizza box firmly. "That is exactly why we strike at 3:05. When the sugar peak hits, the parents will be too busy preventing a living room riot to notice a panda hauling a sack of toys toward the fence." Bill sighed, a sound that echoed wetly inside the mask. "I still think the penguin suit was a better choice for agility." "Penguins don't evoke trust, Bill. Pandas are cuddly. Now, practice your 'friendly mascot' wave. If you look suspicious, we are dead." Bill raised a heavy, padded paw and waved with a slow, rhythmic lethargy that looked more like a threat than a greeting. Molly stared at him for a long moment before shaking her head. "We are definitely going to jail," she muttered, though she didn't stop smiling. "Now, let's go. The cake is about to be served, and I refuse to miss the window of opportunity."
created_at: "2026-05-10T14:04:45.786Z" (5/10/2026, 2:04:45 PM)
model: "google/gemma-4-31b-it-20260402"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 245
moderation_latency: (null)
generation_time: 27494
tokens_prompt: 77
tokens_completion: 681
native_tokens_prompt: 79
native_tokens_completion: 586
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
service_tier: (null)
usage: 0.00023295
router: (null)
provider_responses:
  0:
    endpoint_id: "69279254-6a96-4e9d-abc6-ea76e6bf8e97"
    is_byok: false
    latency: 46
    model_permaslug: "google/gemma-4-31b-it-20260402"
    provider_name: "Novita"
    status: 429
  1:
    endpoint_id: "9d4644e9-a5e5-41f4-b810-d75f484d2c24"
    is_byok: false
    latency: 22
    model_permaslug: "google/gemma-4-31b-it-20260402"
    provider_name: "Parasail"
    status: 429
  2:
    endpoint_id: "4149ab03-3b69-45a0-b94c-6c253b8641a1"
    id: "chatcmpl-RXF1Ladq0nXKeoZiKC1EWrym"
    is_byok: false
    latency: 61
    model_permaslug: "google/gemma-4-31b-it-20260402"
    provider_name: "DeepInfra"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778421885-cBJeMne7FXpxWAXtmWC5"
session_id: (null)
api_type: "completions"
id: "gen-1778421885-uyXIcDMR4UeKuAtoccOd"
upstream_id: "chatcmpl-RXF1Ladq0nXKeoZiKC1EWrym"
total_cost: 0.00023295
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
response_cache_source_id: (null)
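The metadata above lists three provider attempts: two returned HTTP 429 (Too Many Requests) and the third returned 200. A minimal Python sketch of how those records could be summarized follows. The function names are hypothetical, the records are copied from the fields above, and the throughput figure assumes that generation_time is reported in milliseconds, which the page does not state.

```python
# Provider attempts copied from the run metadata (subset of fields).
provider_responses = [
    {"provider_name": "Novita", "status": 429, "latency": 46},
    {"provider_name": "Parasail", "status": 429, "latency": 22},
    {"provider_name": "DeepInfra", "status": 200, "latency": 61},
]


def summarize_attempts(attempts):
    """Return (serving_provider, rate_limited_providers) for a list of attempts."""
    rate_limited = [a["provider_name"] for a in attempts if a["status"] == 429]
    served = next((a["provider_name"] for a in attempts if a["status"] == 200), None)
    return served, rate_limited


def tokens_per_second(completion_tokens, generation_time_ms):
    """Throughput, ASSUMING generation_time is in milliseconds."""
    return completion_tokens / (generation_time_ms / 1000)


served, rate_limited = summarize_attempts(provider_responses)
throughput = round(tokens_per_second(586, 27494), 1)  # native_tokens_completion, generation_time
print(served, rate_limited, throughput)
```

Under the millisecond assumption, the run works out to roughly 21 native completion tokens per second, and the serving provider matches the top-level provider_name field.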
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 455
0.00% | Dialogue to Total Word Ratio | Ratio: 56.02%, Deviation: 26.02% | neededClean: false, wordsTotal: 457, wordsDialogue: 256
0.0000%
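The reported ratio and deviation can be reproduced from the counts in the evaluation details. The sketch below is only an illustration, not the benchmark's actual evaluator: it assumes dialogue is whatever sits between straight double quotes, that words are whitespace-separated, and that deviation is the absolute distance from the 30% target in percentage points. The function names are hypothetical, and how the deviation maps to the 0.00% score is not shown on the page.

```python
import re


def dialogue_ratio(text):
    """Return (dialogue_words, total_words, ratio) using straight double quotes."""
    total = len(text.split())
    spoken = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(segment.split()) for segment in spoken)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_points(ratio, target=0.30):
    """Absolute distance from the target ratio, in percentage points."""
    return abs(ratio - target) * 100


# Small self-check on one sentence from the response.
sample = 'Molly shook her head. "We are definitely going to jail," she muttered.'
sample_counts = dialogue_ratio(sample)

# Counts reported by the evaluator: wordsDialogue = 256, wordsTotal = 457.
reported_ratio = 256 / 457
print(round(reported_ratio * 100, 2), round(deviation_points(reported_ratio), 2))
```

With the reported counts this gives 56.02% and 26.02 points, matching the Details column, so the response contains nearly twice the requested share of dialogue.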