NC Bench
Run: 2026-02-20-01KHYARTH5FBDMDKH7NR72GYJ8
Run Details
Model: Gemma 3 12B
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 14.0s
Cost: $0.000101
Provider: DeepInfra
Evaluation summary
Score      Evaluator                      Details Link
60.28%     Matches word count             (details)
0.00%      Dialogue to Total Word Ratio   (details)
30.1376%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Rain lashed against the bakery window, blurring the sugar-dusted pastries within. Molly chewed on a stale croissant, her gaze fixed on a crumpled flyer taped to the glass. “'Princess Penelope’s Fourth Birthday Spectacular!'” she read aloud, dripping with mock enthusiasm. “Seriously, Bill? A children's party?” Bill, a man whose enthusiasm generally peaked at locating a dropped pound coin, grunted from behind a mountain of éclairs. “Look, the intel was solid. Million-dollar security system—turns out it’s protecting a bouncy castle and a cake shaped like a unicorn. Don't knock it till you've tried it.” “A million-dollar security system for a room full of four-year-olds? What, are they expecting a ninja invasion?” Molly rolled her eyes. “Besides, stealing presents? It’s… tacky. We usually go for high-value art, rare stamps, maybe the occasional priceless Fabergé egg. Not sparkly tiaras and plastic dinosaurs.” “Tacky equals profitable, Molly. Think about it. Parents spend a fortune on these things. We’re talking limited edition superhero figurines, bespoke teddy bears, handcrafted wooden train sets… each individual gift could be worth a decent chunk. Plus,” he pointed a chocolate-covered finger at a picture on the flyer, “look at the guest list. Upper-crust. Daddy’s credit cards. Very, very shiny presents.” Molly considered this, taking another bite of her croissant. "Alright, alright, you've made a flimsy argument, but I'm listening. So, operation 'Sugar Rush' – what’s the plan? We can’t exactly waltz in and announce ‘surprise, we’re stealing your presents!’” "Of course not. Disguises. We go as entertainers." Bill beamed, a rogue smear of chocolate on his cheek. "I've been practicing my balloon animals. Prepare to be amazed. I'm going for a particularly realistic dachshund.” Molly choked on her croissant. "You? A balloon animal artist? You struggle to inflate a bicycle tire without a manual!” “Details, details! You're doing face paint. 
Butterflies, superheroes, the usual. And a convincing sob story about your clown school dropout rate.” "A sob story? Bill, you’re a master of meticulous planning, meticulous execution, not... emotional manipulation." “It’s all part of the package! Think of the sympathy factor. Parents aren’t going to suspect a man with damp eyes and a vaguely sad-looking dachshund of stealing their children's Christmas lists come to life.” He paused, then added, "You'll also need a glitter gun. Essential for crowd control. Think of it as… strategic sparkle deployment.” Molly sighed. “A glitter gun. Of course. So, the security system? What’s the weak link?” “The bouncy castle. Apparently, the system prioritizes the safety of the bouncing children. Which means if we disable the power to the bouncy castle, it temporarily shuts down all external sensors.” Bill tapped his chin thoughtfully. "We disable the bouncy castle, grab the loot, back out before anyone realizes their miniature porcelain pug collection is missing.” “And what about the parents? They’ll notice a sudden, mass disappearance of gifts.” Molly frowned. "They'll be too busy consoling children who are suddenly bouncing-castle-less to pay attention to missing presents. Trust me. This is foolproof." Bill grinned, picking up another éclair. "Foolproof, sparkly, and surprisingly lucrative.”
created_at: "2026-02-20T20:10:00.112Z" (2/20/2026, 8:10:00 PM)
model: "google/gemma-3-12b-it"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 125
moderation_latency: (null)
generation_time: 13875
tokens_prompt: 93
tokens_completion: 896
native_tokens_prompt: 75
native_tokens_completion: 752
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00010076
router: (null)
provider_responses:
  0:
    endpoint_id: "5aef2a85-3d1c-4659-a522-98bbb5272c6a"
    is_byok: false
    latency: 32
    model_permaslug: "google/gemma-3-12b-it"
    provider_name: "Crusoe"
    status: 429
  1:
    endpoint_id: "eb06dc92-5a16-47ec-a776-6ef956457c47"
    id: "chatcmpl-RHs4iWmpVpBjUpEwtEbDGK3E"
    is_byok: false
    latency: 57
    model_permaslug: "google/gemma-3-12b-it"
    provider_name: "DeepInfra"
    status: 200
api_type: "completions"
id: "gen-1771618200-9IXkT0fs1zyDHK2xYx5B"
upstream_id: "chatcmpl-RHs4iWmpVpBjUpEwtEbDGK3E"
total_cost: 0.00010076
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
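The timing fields in this metadata can be cross-checked against the 14.0s duration shown under Run Details. The sketch below assumes that `latency` and `generation_time` are reported in milliseconds and that the displayed duration is their sum; neither is stated on the page, but the numbers are consistent with it. The variable names are illustrative only.

```python
# Values copied from the metadata above.
# Assumption: both timing fields are in milliseconds.
latency_ms = 125
generation_time_ms = 13875
native_tokens_completion = 752

# Assumed relationship: displayed duration = latency + generation time.
total_seconds = (latency_ms + generation_time_ms) / 1000

# Rough decode throughput over the generation window only.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"Total: {total_seconds:.1f}s")
print(f"Throughput: {tokens_per_second:.1f} native tokens/s")
```

Under these assumptions the total comes out to 14.0s, matching the reported duration, and throughput is roughly 54 native completion tokens per second.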
Evaluation details
Result     Evaluator                      Details                            Meta Data
60.28%     Matches word count             n/a                                neededClean: false, words: 515
0.00%      Dialogue to Total Word Ratio   Ratio: 77.67%, Deviation: 27.67%   neededClean: false, wordsTotal: 515, wordsDialogue: 400
30.1376%   Overall
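The ratio and deviation reported by the dialogue evaluator follow directly from the word counts in its meta data. The sketch below reproduces that arithmetic, assuming the target ratio is 0.5 (from the "50% dialogue" scenario) and that deviation is the absolute difference from the target. It also assumes the overall score is the unweighted mean of the two evaluator scores, which fits the displayed figures but is not confirmed by the page. How each evaluator maps its measurement to a percentage score is not shown here, so that step is left out. The function names are hypothetical.

```python
def dialogue_ratio(words_dialogue: int, words_total: int, target: float = 0.5):
    """Return (ratio, deviation) of dialogue words against an assumed target ratio."""
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target)
    return ratio, deviation


def overall_score(scores):
    """Assumed aggregation: unweighted mean of the evaluator scores."""
    return sum(scores) / len(scores)


# Word counts taken from the evaluator meta data above.
ratio, deviation = dialogue_ratio(words_dialogue=400, words_total=515)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Evaluator scores as displayed (rounded to two decimals on the page).
overall = overall_score([60.28, 0.00])
print(f"Overall (from rounded scores): {overall:.2f}%")
```

Because the word-count score is shown rounded, the mean computed this way agrees with the reported 30.1376% only to two decimal places.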