NC Bench
Run: 2026-02-20-01KHY87X9BQGVDXKBSCESYERAN
Run Details
Model: Gemma 3 4B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 7.3s
Cost: $0.000053
Provider: DeepInfra
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0000% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Right, so, Operation Sugar Rush is a go, then?” Molly asked, meticulously arranging miniature plastic dinosaurs on a napkin. Bill sighed dramatically, running a hand through his already chaotic hair. “Do we *have* to do this, Molly? Really? A children’s birthday party? It feels…sad.” “Sad is a minimum wage job, Bill. This is lucrative. Think of the resale value! Limited edition Paw Patrol figurines. Vintage Beanie Babies. The possibilities are… glittery.” “Glittery and potentially fraught with accusations of child endangerment,” Bill pointed out, staring at the dinosaur collection. “Do you even realize how many sugar-fueled, screaming toddlers we’re dealing with?” “That’s part of the charm. Controlled chaos. Besides, we’ll be in and out. Fifteen minutes. Maximum. Think ninja, not kidnapper.” “Ninjas don’t wear floral camouflage,” Bill grumbled. “This outfit is humiliating.” “It’s practical! And festive. Now, about the distraction…” Molly tapped a pen against her chin. “I was thinking a brief, staged emergency involving a rogue bouncy castle?” “A bouncy castle? Molly, that’s… elaborate.” “Necessity, Bill. Utter necessity. Backup plan: a synchronized rendition of ‘Baby Shark’ performed by a group of elderly ladies.” “Absolutely not.” “Fine. We'll go with the bouncy castle. I've already called a panicked landscaping company. They’re bringing a forklift.” “A forklift? Seriously? That’s bordering on overkill.” “It’s the element of surprise! Look, the key is stealth. We’re professionals. Remember the blueprints?” “The blueprints you stole from Mr. Henderson's garage after he forgot to lock the door?” “Details, details. They show the ventilation shaft. Relatively small. You can squeeze through.” “I’m five-foot-four, Molly. I’m going to explode in that ventilation shaft.” “You’ll be compact! Think of it as… tiny espionage.” “Espionage involving dust and potential raccoon encounters.” “Raccoons are a minor inconvenience. Now, let’s talk loot. I'm prioritizing the drum set. 
Apparently, eight-year-old Timmy is obsessed.” “Drum set? Are you planning on starting a band?” “Precisely. Think of the resale value! A decent Ludwig drum set… we’re talking serious cash, Bill. Serious, glittery cash.” “You’re terrifying, you know that?” “I thrive on terror. Okay, logistics. You’re on distraction. I’m on present acquisition. Code word: ‘Sprinkles.’” “Sprinkles? Really?” “Don’t judge my creative process. And for the love of all that is shiny, try to look slightly less horrified. The children are watching.”
created_at: "2026-02-20T19:25:48.727Z" (2/20/2026, 7:25:48 PM)
model: "google/gemma-3-4b-it"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 225
moderation_latency: (null)
generation_time: 7135
tokens_prompt: 93
tokens_completion: 739
native_tokens_prompt: 75
native_tokens_completion: 622
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00005276
router: (null)
provider_responses:
  0:
    endpoint_id: "d3de3bd4-81bc-48fb-924f-2a87b2a36e75"
    id: "chatcmpl-Rh1nEYuwoVwoz4lgWmlmH7wg"
    is_byok: false
    latency: 105
    model_permaslug: "google/gemma-3-4b-it"
    provider_name: "DeepInfra"
    status: 200
api_type: "completions"
id: "gen-1771615548-JHEweL93FuUuQXqm885L"
upstream_id: "chatcmpl-Rh1nEYuwoVwoz4lgWmlmH7wg"
total_cost: 0.00005276
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 379
0.00% | Dialogue to Total Word Ratio | Ratio: 89.97%, Deviation: 19.97% | neededClean: false; wordsTotal: 379; wordsDialogue: 341
0.0000% | Overall
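
The ratio and deviation reported above can be reproduced from the word counts in the metadata. The sketch below is illustrative only: the function names, the quote-matching rule, and the whitespace-based word count are assumptions, not the benchmark's actual evaluator. The 70% target comes from the scenario text.

```python
import re

# Target taken from the scenario ("70% dialogue"); the evaluator's own
# constant and tolerance are not shown in the report.
TARGET_RATIO = 0.70


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Count words inside double quotes, curly or straight (assumed rule)."""
    spans = re.findall(r'[“"]([^”"]*)[”"]', text)
    return sum(count_words(span) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


# Counts reported in the evaluation details: wordsDialogue=341, wordsTotal=379.
ratio, deviation = ratio_and_deviation(341, 379)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 89.97%, Deviation: 19.97%"
```

With these counts the response is about 90% dialogue against a 70% target, and 379 words against a requested 500, which is consistent with both evaluators scoring 0.00%.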