NC Bench
Run: 2026-02-20-01KHYAPACNXTJVWJS76F3XQMAB
Run Details
Model: Gemma 3 12B
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 15.4s
Cost: $0.000095
Provider: DeepInfra
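The 15.4s duration reported here seems to line up with two fields from the response metadata further down: the top-level latency (159) plus generation_time (15282), assuming both are in milliseconds. A minimal sketch of that arithmetic, which also derives a rough throughput from native_tokens_completion (704). The function names are illustrative and not part of NC Bench:

```python
def total_duration_s(latency_ms, generation_time_ms):
    """Add latency and generation time (both assumed to be milliseconds) and return seconds."""
    return (latency_ms + generation_time_ms) / 1000.0


def tokens_per_second(completion_tokens, generation_time_ms):
    """Rough decode throughput: completion tokens divided by generation time in seconds."""
    return completion_tokens / (generation_time_ms / 1000.0)


# Values copied from the response metadata of this run.
duration = total_duration_s(159, 15282)
throughput = tokens_per_second(704, 15282)

print(f"duration: {duration:.1f}s")
print(f"throughput: {throughput:.1f} tokens/s")
```

If the units assumption holds, the sum rounds to the 15.4s shown above.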
Evaluation summary
Score      Evaluator
99.92%     Matches word count
0.00%      Dialogue to Total Word Ratio
49.9595%   Overall
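The overall figure looks consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers shown, not a documented NC Bench formula. A small sketch under that assumption:

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores (assumed aggregation, not confirmed by the source)."""
    return sum(scores) / len(scores)


# Evaluator scores as displayed (already rounded to two decimals).
overall = overall_score([99.92, 0.00])
print(f"{overall:.2f}%")
```

The mean of the displayed scores differs from the reported 49.9595% only in the fourth decimal, which would be explained by the word-count score being rounded for display.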
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The community hall smelled faintly of sugar cookies and misplaced anxiety. Molly surveyed the scene, a pinched expression tightening her already angular face. Streamers drooped from the ceiling like sad, multicolored rain. A bouncy castle pulsed with shrieking, sugar-fueled children. It was, objectively, a nightmare. “Are you *sure* this is the best angle, Bill?” she asked, her voice barely audible above the chaotic soundtrack. "Stealing presents from a five-year-old's birthday party? It's… low tide, even for us." Bill, a man whose enthusiasm outweighed his competence by a significant margin, was meticulously constructing a diversion from a pile of juggling balls, a rubber chicken, and a feather boa. “Low tide is a goldmine, Molly! Think about it. Mountains of toys! The latest action figures! Plush unicorns! Emotional distress buys us time.” Molly resisted the urge to rub her temples. “Emotional distress buys us a lengthy prison sentence, Bill. We're going to be the villains in a storybook.” She gestured at the mass of tiny humans assaulting the bouncy castle. "Look at them! They’re already terrifying enough.” “Relax, relax,” Bill chirped, attempting to toss the rubber chicken into the air and failing spectacularly. It landed with a pathetic thud at his feet. “Operation Sugar Rush is foolproof. I’ve got the distraction covered - I’ll be the ‘zany entertainer’. These kids can’t resist a rubber chicken. Trust me.” "I trust a rusty paperclip more than I trust your entertainment skills," Molly muttered under her breath. “And what about security? There’s bound to be at least one moderately attentive parent lurking about.” “Mother security is easily bypassed. Observe,” Bill declared, straightening his already crooked bow tie. “Parents are motivated by cake and the desperate need for five minutes of quiet. They’re your average sentry.” He launched into a clumsy juggling routine, dropping balls with alarming regularity. Three landed squarely in a bowl of punch. “See? 
Instant chaos! While they’re dealing with sticky children and ruined refreshments, we swoop in, acquire the merchandise, and vanish like pixies.” Molly shuddered. “Pixies are known for their mischievousness, Bill. We’re going for 'unrepentant thieves’, not ‘fairy folk’. How long do you estimate we have to work before someone notices a suspiciously well-dressed man creating a pink and sticky disaster area?” “Maximum fifteen minutes,” Bill announced, abandoning the juggling to strike a dramatic pose, the feather boa askew. “After that, the cake will have softened their defenses. And I might lose interest in the chicken.” Molly sighed, surrendering to the absurdity. "Fifteen minutes, then. And Bill? No interacting with the children. Absolutely no saying 'peek-a-boo' or anything equally ridiculous. You're a ghost. A silent, rubber-chicken-adjacent ghost.” “Got it, Molly! A professional ghost! Now, where's that unicorn…?” He vanished into the throng of miniature partygoers, emitting a strangled yelp as a particularly enthusiastic toddler attempted to confiscate the feather boa. Molly watched him go, already bracing herself for the inevitable fallout. It was always the fallout that was the worst.
created_at: "2026-02-20T20:08:38.046Z" (2/20/2026, 8:08:38 PM)
model: "google/gemma-3-12b-it"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 159
moderation_latency: (null)
generation_time: 15282
tokens_prompt: 93
tokens_completion: 858
native_tokens_prompt: 75
native_tokens_completion: 704
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00009452
router: (null)
provider_responses:
  0:
    endpoint_id: "eb06dc92-5a16-47ec-a776-6ef956457c47"
    id: "chatcmpl-Rf2BvkfQdhRkDBFy6MPaW6bw"
    is_byok: false
    latency: 52
    model_permaslug: "google/gemma-3-12b-it"
    provider_name: "DeepInfra"
    status: 200
api_type: "completions"
id: "gen-1771618118-ivTAogM1XsyJzmcxZ79m"
upstream_id: "chatcmpl-Rf2BvkfQdhRkDBFy6MPaW6bw"
total_cost: 0.00009452
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
Evaluation details
Result     Evaluator                      Details                            Meta Data
99.92%     Matches word count             n/a                                neededClean: false; words: 497
0.00%      Dialogue to Total Word Ratio   Ratio: 54.33%, Deviation: 24.33%   neededClean: false; wordsTotal: 497; wordsDialogue: 270
49.9595%   Overall
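The reported ratio and deviation follow directly from the word counts above: 270 dialogue words out of 497 total, measured against the 30% target from the scenario. The sketch below reproduces that arithmetic and adds a hypothetical quote-based counter to show one way such counts could be obtained. The counter is an assumption for illustration only; the actual NC Bench evaluator may tokenize or detect dialogue differently, and the response mixes straight and curly quotes, which the pattern tries to tolerate.

```python
import re

# Hypothetical pattern: text between an opening straight/curly quote and the next closing one.
QUOTED = re.compile(r'["“]([^"“”]*)["”]')


def count_words(text):
    """Whitespace-delimited word count."""
    return len(text.split())


def count_dialogue_words(text):
    """Count words that fall inside double-quoted spans (illustrative heuristic)."""
    return sum(count_words(span) for span in QUOTED.findall(text))


def dialogue_ratio(words_dialogue, words_total):
    """Dialogue share of the text, as a percentage."""
    return 100.0 * words_dialogue / words_total


# Arithmetic check using the counts reported for this run.
TARGET_PERCENT = 30.0
ratio = dialogue_ratio(270, 497)
deviation = abs(ratio - TARGET_PERCENT)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# Tiny usage example for the heuristic counter on a made-up snippet.
sample = 'Molly sighed. "Fifteen minutes, then." Bill nodded. “Got it, Molly!”'
sample_dialogue = count_dialogue_words(sample)
sample_total = count_words(sample)
```

Note that the deviation is a difference in percentage points (54.33 minus 30), which is why it reads 24.33 rather than a relative error.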