NC Bench
Run: 2026-02-20-01KHY7T1YBRB5R733J1VM1KJJ6
Run Details
Model: Gemma 3 4B
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 7.5s
Cost: $0.000056
Provider: DeepInfra
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0000% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Right, so, Nightingale Elementary, Saturday, three o’clock. Confirmation on the bouncy castle?” Molly asked, tapping a chewed-up pen against a greasy napkin covered in diagrams and circled exclamation points. “Confirmed. Pink, inflatable leviathan. Balloons – primary colours, naturally. And a disturbingly cheerful clown named ‘Pip’ who’s apparently handing out ‘party favors’,” Bill replied, leaning back in his booth at the ‘Rusty Sprocket’ diner, nursing a lukewarm coffee. “It’s…optimistic.” “Optimistic is good. Less like a hostage situation, more like a sugar-fueled heist. We want chaos, not screaming children,” Molly countered, drawing a thick line through ‘hostage situation’ on her napkin. “So, the presents. We’re after the…everything.” “Everything’s a bit broad, isn’t it? Like saying ‘gold’ when you mean ‘gold bars’.” “I’m saying a mountain of brightly wrapped toys. Action figures, dolls, slime, those glow stick bracelets that cause permanent aches – the works,” Molly insisted, gesturing wildly with the pen. “Mrs. Henderson – the mother of little Timmy Henderson, the target – she’s a collector. Surplus Christmas gifts, mostly. Forgotten birthdays. The whole tragically adorable mess.” “You’re describing a human landfill, Molly. A landfill of plastic and disappointment.” “Don't be dramatic. Think of the resale value! We’ll be swimming in vintage Lego.” “Lego is not swimming. It's meticulously arranged in tiny, overpriced bins.” Bill sighed, stirring his coffee with excessive force. “And what about Pip? The clown? He’s a variable we haven’t accounted for.” “Pip is a distraction. A highly irritating, balloon-animal-based distraction. I’ve been watching him. He creates a disproportionate amount of squeaky animal noises. I’m thinking we leverage that.” “Leverage squeaks? How?” “We overload the system. A judicious application of noisemakers – air horns, cat toys, anything that can generate a sustained, ear-splitting shriek. He’ll be overwhelmed. 
It’s foolproof.” “Foolproof? Molly, you once tried to disable a laser grid with a feather boa.” “That was a tactical miscalculation! This is different. This requires…precision squeaking.” She looked intensely at Bill. “Look, I’ve mapped out the exit route. Through the snack table, under the bouncy castle, then out the emergency fire exit. Minimal visibility, maximum loot.” “The emergency fire exit? Seriously? That’s practically begging to be noticed.” “It’s the most aesthetically pleasing emergency fire exit. It’s framed with pine cones and tiny plastic flags. It’s... charmingly alarming.” “Charmingly alarming is my phrase. Okay, okay. Pine cones and flags. Noise cannons. Let’s just…try not to traumatize any small children.” “Trauma is optional. Loot is mandatory.” Molly grinned, tapping the napkin decisively. “Now, about those glow stick bracelets…”
Metadata
created_at: "2026-02-20T19:18:14.744Z" (2/20/2026, 7:18:14 PM)
model: "google/gemma-3-4b-it"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 175
moderation_latency: (null)
generation_time: 7315
tokens_prompt: 93
tokens_completion: 806
native_tokens_prompt: 75
native_tokens_completion: 662
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00005596
router: (null)
provider_responses:
  [0]
    endpoint_id: "d3de3bd4-81bc-48fb-924f-2a87b2a36e75"
    id: "chatcmpl-RKdU3mkL6Vs7EbQ08tlMGSNb"
    is_byok: false
    latency: 45
    model_permaslug: "google/gemma-3-4b-it"
    provider_name: "DeepInfra"
    status: 200
api_type: "completions"
id: "gen-1771615094-9vd43xffyuIxZIRA09Bi"
upstream_id: "chatcmpl-RKdU3mkL6Vs7EbQ08tlMGSNb"
total_cost: 0.00005596
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
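The run details list a Duration of 7.5s. The metadata gives latency 175 and generation_time 7315 but does not state their units. One reading is consistent with both figures. It assumes that both fields are in milliseconds and that Duration is their sum. The page does not document this, so treat the sketch as a consistency check under that assumption. The constant names are illustrative only.

```python
# Assumption: "Duration" on the run page is latency + generation_time,
# with both metadata fields in milliseconds. The page does not state this.
LATENCY_MS = 175
GENERATION_TIME_MS = 7315


def duration_seconds(latency_ms: int, generation_ms: int) -> float:
    """Return the summed request time in seconds, rounded to one decimal place."""
    return round((latency_ms + generation_ms) / 1000, 1)


if __name__ == "__main__":
    # 175 + 7315 = 7490 ms, which rounds to 7.5 s
    print(f"{duration_seconds(LATENCY_MS, GENERATION_TIME_MS)}s")  # 7.5s
```

The provider-level latency of 45 under provider_responses is a separate field and is not used here.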
Evaluation details
Result | Evaluator | Details | Meta data
0.00% | Matches word count | n/a | neededClean: false; words: 415
0.00% | Dialogue to Total Word Ratio | Ratio: 85.54%, Deviation: 35.54% | neededClean: false; wordsTotal: 415; wordsDialogue: 355
Overall: 0.0000%
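The Dialogue to Total Word Ratio result follows from the reported counts. 355 dialogue words out of 415 total gives 85.54%, which is 35.54 percentage points away from the 50% target. The sketch below assumes the deviation is the absolute difference between the ratio and a 0.5 target. It also includes a hypothetical counter (`dialogue_stats`, `DIALOGUE_RE`) that treats whitespace-separated tokens between curly double quotes as dialogue. NC Bench's actual word-splitting and quote-detection rules are not shown on this page, so that counter is illustrative and is not claimed to reproduce the 415/355 figures.

```python
import re

# Hypothetical rule: dialogue is any text between curly double quotes “...”.
# NC Bench's real quote detection and word splitting are not documented here.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = 0.5) -> tuple[float, float]:
    """Return the dialogue ratio and its absolute deviation from the target share."""
    ratio = words_dialogue / words_total if words_total else 0.0
    return ratio, abs(ratio - target)


def dialogue_stats(text: str, target: float = 0.5) -> tuple[int, int, float, float]:
    """Count total and dialogue words in text, then compare the share to the target."""
    total = count_words(text)
    dialogue = sum(count_words(quote) for quote in DIALOGUE_RE.findall(text))
    ratio, deviation = ratio_and_deviation(dialogue, total, target)
    return total, dialogue, ratio, deviation


if __name__ == "__main__":
    # Counts reported for this run: wordsDialogue 355, wordsTotal 415.
    ratio, deviation = ratio_and_deviation(355, 415)
    print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")  # Ratio: 85.54%, Deviation: 35.54%
```

The page does not state the tolerance that maps a deviation to a score. It only shows that a 35.54-point deviation scored 0.00%.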