NC Bench
Run: 2026-02-20-01KHY80TECJVSANTTPCJR6EE7V
Run Details
Model: Gemma 3 12B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 15.5s
Cost: $0.000092
Provider: DeepInfra
Evaluation summary
Score | Evaluator
0.00% | Matches word count
72.02% | Dialogue to Total Word Ratio
36.0079% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The aroma of melted crayons and disappointment hung heavy in Bill’s cluttered apartment. Molly, perched precariously on a stack of old National Geographics, was meticulously sketching a bouncy castle. "Right, so, inflatable perimeter secured," she announced, tapping the drawing with a bright pink highlighter. "Kids will be too busy bouncing to notice anything amiss." Bill, sprawled on a threadbare armchair, slowly peeled a grape off a plate. "A bouncy castle? Molly, really? We’re stealing Christmas presents, not organizing a petting zoo.” "Strategic distraction, Bill! Think! Squirrel! Shiny thing! Boing, boing, boing! Away they go, leaving unattended piles of delightful loot.” She gestured dramatically at the drawing. “It's innovative. It's efficient. It's… bouncy." Bill munched on the grape. “It’s utterly ridiculous. And attracting attention. Imagine the headlines, ‘Two masked bandits thwarted by a particularly enthusiastic six-year-old.’” “You worry too much. We’ll be in and out before anyone even realizes their little Timmy doesn't have a remote-controlled dinosaur. Speaking of which,” she zoomed in on a crudely drawn picture of a toy dinosaur. “Priorities. Remote-controlled dinosaurs. Lego castles. Anything featuring glitter. Those are the targets, Bill. High-value, easily transportable joy.” “And how do you propose we transport these… piles of joy?” Bill asked, gesturing around the apartment. “We’re not exactly smuggling them in shopping carts.” “We’re going with the stroller approach. Classic. Nobody suspects a woman pushing a stroller. Suddenly, the stroller’s full of Walkie Talkies and unicorn plushies. Genius, right?” Molly beamed. “A double stroller,” Bill corrected, taking another grape. “For the sheer volume of anticipated gifts. And you expect me to push it?” Molly wrinkled her nose. “Well, someone has to. You're the strong one. 
Unless you’d prefer I juggle those dinosaur figures while sprinting?” “I’m thinking about a decoy,” Bill mused, staring at the ceiling. “A slightly… unbalanced mime. Wandering around, creating confusion. No one looks twice at a mime.” “A mime? You’re serious?” Molly threw her hands up in exasperation. “Bill, the last thing we need is more weirdness. This is a children’s party, not a circus sideshow!” "It’s a tactical advantage!” Bill insisted. “He could be strategically tripping over things. Briefly distracting individual children. It's misdirection!" "Misdirection involving a silently judging man in striped clothing?" Molly asked, clearly unconvinced. "I’m sticking with the bouncy castle. It’s… wholesome. Minimizes suspicion. A touch of whimsy. Besides, I already ordered the face paint." Bill groaned. “Face paint? Now you’re just taunting me. What are we going to paint ourselves as? Friendly clowns?” “Possibly,” Molly said, a mischievous glint in her eye. “Or maybe… exceedingly cheerful elves. Think about it, Bill. Elves. Presents. Children. It’s practically foolproof.”
created_at
"2026-02-20T19:21:56.44Z"
2/20/2026, 7:21:56 PM
model
"google/gemma-3-12b-it"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
179
moderation_latency
(null)
generation_time
15445
tokens_prompt
93
tokens_completion
816
native_tokens_prompt
75
native_tokens_completion
683
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00009179
router
(null)
provider_responses
0
endpoint_id
"eb06dc92-5a16-47ec-a776-6ef956457c47"
id
"chatcmpl-Rq1O0qDlIwkXz3PQXg79UdTg"
is_byok
false
latency
58
model_permaslug
"google/gemma-3-12b-it"
provider_name
"DeepInfra"
status
200
api_type
"completions"
id
"gen-1771615316-UGPgtlonvQEChvz9pUQf"
upstream_id
"chatcmpl-Rq1O0qDlIwkXz3PQXg79UdTg"
total_cost
0.00009179
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepInfra"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 441
72.02% | Dialogue to Total Word Ratio | Ratio: 72.56%, Deviation: 2.56% | neededClean: false, wordsTotal: 441, wordsDialogue: 320
36.0079%
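The ratio and deviation reported above can be reproduced from the word counts in the meta data. The following is a minimal sketch, assuming the ratio is dialogue words divided by total words and the deviation is the absolute distance from the 70% target named in the scenario. The function names are hypothetical, and the page does not state how the 72.02% score itself is derived from the deviation, so that step is not reproduced here.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of all words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 70.0) -> float:
    """Return the absolute distance, in percentage points, from the target.

    The 70% default is taken from the scenario text; treating the deviation
    as an absolute difference is an assumption.
    """
    return abs(ratio_pct - target_pct)


# Counts taken from the evaluation meta data (wordsDialogue, wordsTotal).
ratio = dialogue_ratio(320, 441)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

With these counts the sketch prints the same ratio and deviation shown in the Details column. The Overall figure of 36.0079% is consistent with an unweighted mean of the two evaluator scores (0.00% and 72.02%), although the page does not say so explicitly.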