NC Bench
Run: 2026-02-20-01KHYAN03V0VBHFDMJP93T008A
Run Details
Model:
Gemma 3 4B
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
6.9s
Cost:
$0.000049
Provider:
DeepInfra
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Right, so, Operation Sugar Rush. You’ve confirmed the bouncy castle doesn’t have laser grids?” “Nope. Just a frankly terrifying amount of glitter and a surprisingly aggressive inflatable unicorn.” “Ugh. Unicorns. Always a problem. Okay, distraction. You handle the, uh, clown?” “Leo? Seriously? He’s wearing a full Elvis jumpsuit. That's not a distraction, that’s a red flag. I’m thinking a trombone. A very loud trombone.” “A trombone? Bill, we’re stealing presents, not serenading the parents.” “It creates noise, Molly! Chaos! It’s strategic dissonance!” “Strategic…dissonance? You’ve been hitting the artisanal kombucha again, haven’t you?” “It’s fermented! It’s… stimulating! Look, just trust me on the trombone.” “Fine. Trombone. You got the grappling hook?” “Of course. Gravitational assistance is practically my middle name. Though technically, it’s Bartholomew. But you get the gist.” “Bartholomew the Gravitational Thief. Catchy. Okay, let’s talk logistics. We’re in through the back, right? The service entrance?” “Confirmed. Window’s slightly ajar. Mrs. Henderson always forgets to latch it. Said something about ‘a busy life’ and ‘not wanting to be a bother’.” “A busy life involving a hundred sugar-fueled five-year-olds? Sounds exhausting. What about the parents?” “They’re all glued to their phones. Competitive cake decorating, apparently. Fierce rivalry.” “Excellent. So minimal resistance. Right, the presents themselves... what's the priority list?” “Obviously, the LEGO Millennium Falcon. It’s iconic. Then the remote-control monster truck. Classic.” “And the… the stuffed badger wearing a tiny raincoat?” “Don’t judge, Bill! It has a name – Bartholomew the Second! It’s a delicate operation.” “Right. Bartholomew the Second. Okay, and the... glitter cannons?” “Absolutely. The more glitter, the better. For camouflage, obviously.” “Camouflage? We’re not blending in, Molly. We're robbing a children’s birthday party.” “Details, details! Think of it as… enhanced confusion. 
Look, just grab everything that’s shiny and wrapped. Let’s move. I’ll cue the trombone.” “You’re going to ruin my carefully constructed plan with a trombone?” “It's a tactical instrument of mayhem, Bill! Now, let's go liberate some loot.” “Seriously. A trombone.”
created_at
"2026-02-20T20:07:54.757Z"
2/20/2026, 8:07:54 PM
model
"google/gemma-3-4b-it"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
196
moderation_latency
(null)
generation_time
6767
tokens_prompt
93
tokens_completion
671
native_tokens_prompt
75
native_tokens_completion
575
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.000049
router
(null)
provider_responses
0
endpoint_id
"d3de3bd4-81bc-48fb-924f-2a87b2a36e75"
id
"chatcmpl-RWHDeY4MMqzG1rTWzFo4ZZZh"
is_byok
false
latency
60
model_permaslug
"google/gemma-3-4b-it"
provider_name
"DeepInfra"
status
200
api_type
"completions"
id
"gen-1771618074-8BHcHH58EbTYWez8N2rH"
upstream_id
"chatcmpl-RWHDeY4MMqzG1rTWzFo4ZZZh"
total_cost
0.000049
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepInfra"
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean: false, words: 327
Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 100.00%, Deviation: 30.00%
Meta Data: neededClean: false, wordsTotal: 327, wordsDialogue: 327
Overall: 0.0000%
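The Dialogue to Total Word Ratio figures reported here (wordsTotal 327, wordsDialogue 327, Ratio 100.00%, Deviation 30.00%) are consistent with counting the words inside quotation marks against all words, then comparing the result with the 70% target from the scenario. A minimal Python sketch of that calculation follows. The function name dialogue_ratio, the whitespace tokenisation, and the quote-matching rule are assumptions made for illustration, not NC Bench's actual evaluator, and the sketch does not attempt to reproduce how the 0.00% scores are derived.

```python
import re


def dialogue_ratio(text: str, target: float = 0.70):
    """Return (words_total, words_dialogue, ratio, deviation).

    Hypothetical helper: treats any text between double quotes
    (curly or straight) as dialogue and splits words on whitespace.
    """
    # Capture the text between an opening and a closing double quote.
    spans = re.findall(r'[“"]([^”"]*)[”"]', text)

    words_total = len(text.split())
    words_dialogue = sum(len(span.split()) for span in spans)

    # Guard against empty input to avoid division by zero.
    ratio = words_dialogue / words_total if words_total else 0.0

    # Deviation is the absolute distance from the requested share.
    deviation = abs(ratio - target)
    return words_total, words_dialogue, ratio, deviation


if __name__ == "__main__":
    # A response made only of quoted lines, like the one in this run.
    all_dialogue = "“A trombone?” “Seriously.”"
    total, dialogue, ratio, deviation = dialogue_ratio(all_dialogue)
    print(f"total={total} dialogue={dialogue} "
          f"ratio={ratio:.2%} deviation={deviation:.2%}")
```

Because the response in this run contains no narration outside quotation marks, every word counts as dialogue, which is how a ratio of 100% and a 30-point deviation from the 70% target arise.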