NC Bench
Run: 2026-02-20-01KHY7TXRPPCYFDFVDY02FZQZV
Run Details
Model: Gemma 3 4B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 7.7s
Cost: $0.000058
Provider: DeepInfra
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0000% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Okay, so, Operation Rainbow Riot. Just… sounds insane, doesn’t it?” “Insanely lucrative, Molly. That’s what it is. Besides, little Timmy Henderson’s got a drone. A *drone* filled with miniature fidget spinners. We’re talking serious assets here.” “Right, assets. Let’s talk logistics. We’re going in through the bouncy castle, correct?” “Absolutely. Like a fluffy, brightly colored missile. Think about it – kids, inflatable, minimal security. It’s… idyllic.” “Idyllic for a robbery. You’re making this worse.” “I’m being pragmatic. What’s your alternative? Scaling the wall with a grappling hook?” “Don’t tempt me. But no, bouncy castle. But what about the parents? They’re the real threat.” “They’ll be preoccupied. Cake. Balloons. Mild panic about sugar rushes. They’re practically pre-programmed for distraction.” “Highly doubtful. Mrs. Henderson is a retired police officer. She practically irons her suspicion.” “Details, details. We’ll be in and out before she even realizes her hotdog is missing. Speaking of – are we still going with the mustard-flavored smoke bombs?” “Absolutely. It’s surprisingly effective. I tested it on my cat. He was… bewildered.” “Bewildered is good. Confused is better. Now, the presents themselves. We’re hitting for volume, right? A mountain of plastic dinosaurs and sparkly unicorn figurines?” “That’s the objective. I’ve calculated we can potentially acquire approximately seven hundred and sixty-two items within a thirty-minute window.” “Seven hundred and sixty-two! That’s… a lot of miniature dump trucks.” “It’s a *lot* of potential profit, Bill. Don’t diminish my calculations. What about a diversion? Should we unleash the glitter?” “Seriously? Glitter? That’s your strategy? I thought you were the brains of this operation.” “It’s psychologically disruptive! Imagine: a sudden, shimmering cloud of iridescent despair. They’ll be chasing glitter for an hour.” “You're actually considering *glitter*?” “It’s an option! 
We could also stage a minor incident. Like… a rogue balloon?” “A rogue balloon? Molly, you've been watching too many cartoons.” "Don't be so judgmental. It's lightweight, easily launched, and covered in incredibly obnoxious polka dots. Think of the chaos!” “Okay, okay. No rogue balloon. But the smoke bombs are definitely happening. And we *need* to establish a clear looting route. Left, then right, then straight to the gift table. Minimal backtracking.” “Agreed. And no touching the pinatas. Remember what happened last time?” “Don’t remind me. That donkey was… aggressive.” “Let’s just focus on the presents, Bill. Seven hundred and sixty-two. Think of the resale value.” “I’m thinking of the paperwork. This is going to be a *lot* of paperwork.”
created_at
"2026-02-20T19:18:43.234Z"
2/20/2026, 7:18:43 PM
model
"google/gemma-3-4b-it"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
366
moderation_latency
(null)
generation_time
7571
tokens_prompt
93
tokens_completion
792
native_tokens_prompt
75
native_tokens_completion
684
native_tokens_completion_images
(null)
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00005772
router
(null)
provider_responses
0
endpoint_id
"d3de3bd4-81bc-48fb-924f-2a87b2a36e75"
id
"chatcmpl-RXKOiAfmQcBUEO4qSIGWKsj5"
is_byok
false
latency
49
model_permaslug
"google/gemma-3-4b-it"
provider_name
"DeepInfra"
status
200
api_type
"completions"
id
"gen-1771615123-YEKBipgWQ2TqtV0WPduy"
upstream_id
"chatcmpl-RXKOiAfmQcBUEO4qSIGWKsj5"
total_cost
0.00005772
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepInfra"
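The metadata above is enough for a rough throughput estimate. The sketch below assumes that generation_time is reported in milliseconds (7571 ms is close to the 7.7s duration shown in the run details, but the page does not state the unit) and that native_tokens_completion is the provider-side output token count. The function name tokens_per_second is hypothetical and not part of NC Bench.

```python
# Values copied from the run metadata.
# ASSUMPTION: generation_time is in milliseconds; the page does not state units.
generation_time_ms = 7571
native_tokens_completion = 684


def tokens_per_second(tokens: int, elapsed_ms: int) -> float:
    """Return output tokens generated per second of generation time."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return tokens / (elapsed_ms / 1000.0)


throughput = tokens_per_second(native_tokens_completion, generation_time_ms)
print(f"{throughput:.1f} tokens/s")
```

Under these assumptions the run works out to roughly 90 output tokens per second.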
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 408
0.00% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 30.00% | neededClean: false, wordsTotal: 408, wordsDialogue: 408
0.0000% | Overall
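The dialogue ratio and deviation reported above can be illustrated with a minimal sketch. This is not the NC Bench evaluator, whose implementation is not shown on this page. It assumes that dialogue is any text between double quotes (straight or curly), that words are whitespace-separated tokens, and that deviation is the absolute difference between the measured ratio and the 70% target. The names dialogue_stats, QUOTE_CHARS, and TARGET_RATIO are hypothetical.

```python
import re

# ASSUMPTION: straight and curly double quotes all delimit dialogue.
QUOTE_CHARS = '"\u201c\u201d'
# Target taken from the scenario: 70% of words should be dialogue.
TARGET_RATIO = 0.70

# Matches the text between one quote character and the next.
_DIALOGUE_RE = re.compile(f"[{QUOTE_CHARS}]([^{QUOTE_CHARS}]*)[{QUOTE_CHARS}]")
# Maps every quote character to a space so it never counts as a word.
_STRIP_QUOTES = {ord(ch): " " for ch in QUOTE_CHARS}


def dialogue_stats(text: str, target: float = TARGET_RATIO) -> dict:
    """Count total and quoted words, then compare the ratio to the target."""
    words_total = len(text.translate(_STRIP_QUOTES).split())
    words_dialogue = sum(len(span.split()) for span in _DIALOGUE_RE.findall(text))
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "words_total": words_total,
        "words_dialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


sample = "\u201cHello there,\u201d said Bill. \u201cGo.\u201d"
stats = dialogue_stats(sample)
print(stats["words_dialogue"], "of", stats["words_total"], "words are dialogue")
```

Under these assumptions, a response made up entirely of quoted speech, like the one in this run, has a ratio of 100% and a deviation of 30 percentage points from the target, which agrees with the figures in the details row. How the evaluator turns that deviation into a 0.00% score is not documented here, so the sketch stops at the deviation.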