NC Bench
Run: 2026-02-20-01KHY9QFRHD30AT1E661HC2DS6
Run Details
Model: Gemma 3 4B
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 9.5s
Cost: $0.000059
Provider: DeepInfra
Evaluation summary
Matches word count: 27.17%
Dialogue to Total Word Ratio: 0.00%
Overall: 13.5829%
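The Overall figure appears consistent with an unweighted mean of the two evaluator scores. The sketch below checks that reading; the aggregation rule is an assumption, since this page does not state how NC Bench combines scores, and the variable names are illustrative only.

```python
from statistics import mean

# Evaluator scores as displayed in the summary (percent).
# Assumption: Overall is the unweighted mean of these scores.
scores = {
    "Matches word count": 27.17,
    "Dialogue to Total Word Ratio": 0.00,
}

overall = mean(scores.values())
reported_overall = 13.5829

# The displayed scores are rounded to two decimals, so allow a small tolerance.
difference = abs(overall - reported_overall)
print(f"mean of displayed scores: {overall}")
print(f"difference from reported Overall: {difference:.4f}")
```

The mean of the displayed values comes out at approximately 13.585, which differs from the reported 13.5829% only in the third decimal place. That gap is what one would expect if the word-count score is stored with more precision than the 27.17% shown.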
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The rain hammered against the corrugated iron roof of the abandoned hardware store, mimicking the frantic drumming in Molly’s chest. Bill, meticulously assembling a miniature grappling hook using bobby pins and a rubber band, wasn’t exhibiting much outward distress. “Seriously, Bill,” Molly said, folding her arms, “a children’s birthday party? You’re staging a raid on a room full of sugar-fueled chaos and inflatable dinosaurs. This feels…wrong.” “Wrong is a relative term, Molly,” Bill replied, adjusting the rubber band with painstaking precision. “And frankly, ‘wrong’ doesn’t pay the bills. Think about it – the sheer volume of disposable income on display. We’re talking about a mountain of shiny, wrapped things. A veritable Everest of Erector Sets.” “You’re obsessed with Erector Sets,” Molly stated flatly, staring out at the rain-slicked street. “Last week you spent an hour arguing with a shopkeeper about the structural integrity of a plastic castle.” “It’s important research!” Bill insisted, finally securing the hook. “Understanding the building blocks of childhood desire is crucial for a successful operation.” He held it up, examining it like a priceless artifact. “This, my dear Molly, is the key to unlocking a treasure trove.” “The key to unlocking a sugar rush and probably a minor injury,” she countered. “What’s the plan, anyway? We just waltz in, grab everything, and hope for the best?” “Minimalist approach,” Bill said, sketching a crude diagram on a scrap of cardboard with a chewed-up pencil. “Phase one: infiltration. I’ll utilize the ventilation shaft – surprisingly well-maintained considering the state of the building. Phase two: swift assessment. Prioritize the large presents. Think remote-controlled vehicles, gaming consoles… the usual suspects.” “And what about the parents?” Molly asked, pointing towards the flickering lights of the party venue across the street. 
“They’re not exactly known for their relaxed security protocols.” “They’ll be preoccupied,” Bill said dismissively. “Distracted by cake and the incessant shrieking of miniature humans. It's a perfect window. I’ve calculated the approximate timeframe. Between the rendition of ‘Happy Birthday’ and the simultaneous appearance of a clown. Fifteen minutes, max.” “Fifteen minutes to steal a fortune’s worth of kid’s birthday presents," Molly repeated, a skeptical frown etched on her face. “You're basing this on what, precisely? Your extensive knowledge of children’s birthday parties?” “Intuition, Molly. And a surprisingly detailed YouTube archive on party decorations,” Bill said with a shrug. “Besides, it’s gotta be better than inventorying lawnmowers at Miller’s Hardware. Remember that debacle?” “Don’t remind me," Molly shuddered. "Let’s just hope this little venture doesn’t end with us being chased by a horde of angry toddlers wielding sticky fingers." Bill examined his miniature hook again. “Aggression isn’t part of the plan. Efficiency is. Now, are you in, or are you going to stand there and lament the injustice of it all?” Molly sighed, pushing herself to her feet. “Fine. But if I see a single inflatable dinosaur, I’m out.”
created_at: "2026-02-20T19:51:47.742Z" (2/20/2026, 7:51:47 PM)
model: "google/gemma-3-4b-it"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 149
moderation_latency: (null)
generation_time: 9304
tokens_prompt: 93
tokens_completion: 868
native_tokens_prompt: 75
native_tokens_completion: 698
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00005884
router: (null)
provider_responses:
  0:
    endpoint_id: "d3de3bd4-81bc-48fb-924f-2a87b2a36e75"
    id: "chatcmpl-RtnYxbepqvjHIrdZS2ZLptXO"
    is_byok: false
    latency: 47
    model_permaslug: "google/gemma-3-4b-it"
    provider_name: "DeepInfra"
    status: 200
api_type: "completions"
id: "gen-1771617107-EhBvGZ4HhrgrY2krubKQ"
upstream_id: "chatcmpl-RtnYxbepqvjHIrdZS2ZLptXO"
total_cost: 0.00005884
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
Evaluation details
Matches word count
Result: 27.17%
Details: n/a
Meta Data: neededClean: false, words: 481

Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 70.27%, Deviation: 40.27%
Meta Data: neededClean: false, wordsTotal: 481, wordsDialogue: 338

Overall: 13.5829%
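The Ratio and Deviation figures can be reproduced from the reported word counts. The following is a minimal sketch, assuming the ratio is dialogue words divided by total words and the deviation is the absolute distance from the 30% target in the scenario; the function name and the `target_pct` parameter are hypothetical and not taken from NC Bench.

```python
def dialogue_ratio_report(words_dialogue: int, words_total: int,
                          target_pct: float = 30.0) -> tuple[float, float]:
    """Return (ratio, deviation) in percent, rounded to two decimals.

    Assumption: deviation is the absolute difference between the measured
    dialogue ratio and the target ratio from the scenario prompt.
    """
    ratio_pct = 100.0 * words_dialogue / words_total
    deviation_pct = abs(ratio_pct - target_pct)
    return round(ratio_pct, 2), round(deviation_pct, 2)


# Word counts from the Meta Data above: wordsDialogue = 338, wordsTotal = 481.
ratio, deviation = dialogue_ratio_report(338, 481)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 70.27%, Deviation: 40.27%"
```

This matches the reported details: the response is roughly 70% dialogue against a 30% target. How the evaluator maps a 40.27-point deviation to a 0.00% score is not stated on this page, so the sketch stops at the deviation.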