NC Bench
Run: 2026-02-20-01KHY8667YQEAM4P4W8DB8SAKT
Run Details
Model: Gemma 3 4B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 7.2s
Cost: $0.000053
Provider: DeepInfra
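The 7.2s duration appears consistent with the latency (185) and generation_time (7019) values in this run's metadata, if both are read as milliseconds. The sketch below checks that arithmetic and derives a rough throughput from native_tokens_completion (621). The millisecond unit, the choice to sum the two fields, and the function name `summarize_timing` are assumptions for illustration, not something the report states.

```python
def summarize_timing(latency_ms: float, generation_time_ms: float,
                     completion_tokens: int) -> tuple[float, float]:
    """Return (total seconds, completion tokens per second).

    Assumes both timing fields are in milliseconds and that the
    reported duration is their sum; neither is confirmed by the report.
    """
    total_s = (latency_ms + generation_time_ms) / 1000
    tokens_per_s = completion_tokens / (generation_time_ms / 1000)
    return total_s, tokens_per_s


# Values copied from this run's metadata.
total_s, tps = summarize_timing(185, 7019, 621)
print(f"{total_s:.1f}s total, {tps:.1f} tokens/s")
```

Under these assumptions the total rounds to the 7.2s shown above, which suggests the duration includes time to first token as well as generation time.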
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“So, let’s go over this again, Bill. Because frankly, my brain is starting to resemble a bouncy castle after a toddler convention.” “Relax, Molly. It’s a simple in-and-out. Operation Sugar Rush.” “Simple? You call snatching every single present at a five-year-old’s birthday party ‘simple’? That’s… aggressively chaotic.” “It’s lucrative! They’ve got drones, Molly. Drones! Imagine the resale value of a remote-controlled dinosaur.” “I’m imagining myself covered in glitter and sticky fingerprints. Seriously, the sheer volume of plastic is terrifying.” “Think of the families, Molly! We’re liberating them from the crushing weight of consumerism. Plus, we’re getting a vintage Barbie Dreamhouse.” “That’s your justification? ‘Liberation’ and a Barbie Dreamhouse? Bill, you’re a pathological collector.” “It’s a *good* Barbie Dreamhouse! The one with the extra bathroom. It’s a collector’s item.” “Right. And the plan? Because I’m picturing a full-scale children’s rebellion. Tiny fists and a lot of screaming.” “Distraction. I’ve already contacted ‘Sparkle’, she'll deploy a synchronized balloon release. Pink, naturally. Overwhelming, yet festive.” “Sparkle? Is that a code name or a brand of industrial adhesive?” “Neither! Just… trust me. She’s good. Then I disable the security cameras – a simple reboot, really. You handle the presents.” “Me? Just… collect presents? You know I have a deeply ingrained aversion to things manufactured for miniature humans.” “You’re the packer, Molly. You’re organized. You handle logistics. It’s what you do best." “My best involves spreadsheets and meticulous color-coding. This involves screaming children and the faint scent of cake.” “Think of it as… a very colorful spreadsheet.” “Okay, fine. But if a single child tries to bite me, I’m blaming you.” “Deal. 
And if the Barbie Dreamhouse is just… ordinary, you’re buying me a lifetime supply of gummy bears.” “You’re banking on the Dreamhouse being extraordinary, aren’t you?” “Let’s just say I have a vested interest. Now, about the disguises…” “Don't even *think* about dressing as clowns. Absolutely not.” “What about garden gnomes? They’re subtly unsettling, and blend in surprisingly well with the decor.” “Bill! No gnomes. I’m thinking oversized bow ties. Very distracting.” “Bow ties. Right. Okay. But if I see a single fluffy unicorn, all bets are off.” “Just… focus on the bow ties. And for the love of all that is holy, no glitter.”
created_at: "2026-02-20T19:24:52.364Z" (2/20/2026, 7:24:52 PM)
model: "google/gemma-3-4b-it"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 185
moderation_latency: (null)
generation_time: 7019
tokens_prompt: 93
tokens_completion: 720
native_tokens_prompt: 75
native_tokens_completion: 621
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00005268
router: (null)
provider_responses:
  0:
    endpoint_id: "d3de3bd4-81bc-48fb-924f-2a87b2a36e75"
    id: "chatcmpl-RHRf90jcshN38DRYhmeYEIJu"
    is_byok: false
    latency: 54
    model_permaslug: "google/gemma-3-4b-it"
    provider_name: "DeepInfra"
    status: 200
api_type: "completions"
id: "gen-1771615492-gjFtFylOJzBnQtJsNKxQ"
upstream_id: "chatcmpl-RHRf90jcshN38DRYhmeYEIJu"
total_cost: 0.00005268
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean false, words 378
Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 100.00%, Deviation: 30.00%
Meta Data: neededClean false, wordsTotal 378, wordsDialogue 378
Overall: 0.0000%
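The dialogue figures above follow from simple counting: all 378 words of the response sit inside quotation marks, so the ratio is 378/378 = 100%, which is 30 percentage points away from the requested 70%. The sketch below shows one way such a ratio and deviation could be computed. It is not NC Bench's evaluator; the function names, the quote-matching rule, and the whitespace-based word count are assumptions, and the mapping from deviation to the 0.00% score is not documented in this report, so it is not modeled.

```python
import re

# Hypothetical quote-matching rule: a span opens at a curly or straight
# opening quote and closes at the next curly or straight closing quote.
QUOTED_SPAN = re.compile('[\u201c"]([^\u201c\u201d"]*)[\u201d"]')


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (total_words, dialogue_words, ratio) for a prose scene."""
    total = len(text.split())
    dialogue = sum(len(span.split()) for span in QUOTED_SPAN.findall(text))
    ratio = dialogue / total if total else 0.0
    return total, dialogue, ratio


def deviation(ratio: float, target: float = 0.70) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Reproduce the reported figures: 378 dialogue words out of 378 total.
reported_ratio = 378 / 378
print(f"Ratio: {reported_ratio:.2%}, Deviation: {deviation(reported_ratio):.2%}")
```

A scene with narration between the quoted lines would lower the ratio; for example, adding attribution such as "said Molly" after a line counts toward the total but not toward the dialogue words.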