NC Bench
Run: 2026-05-10-01KR8ZF5W6JXXM9SWJAD9W5B66
Run Details
Model: Gemma 4 31B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 20.5s
Cost: $0.000254
Provider: DeepInfra
Evaluation summary
Score      Evaluator
0.00%      Matches word count
0.00%      Dialogue to Total Word Ratio
0.0000%    Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"The balloons are the primary obstacle, Bill." "Balloons? They're rubber bags of air, Molly. Just pop them and move on." "You don't understand the acoustics of a bouncy castle. One pop and it's like a gunshot in a cathedral. The children will scream, the parents will panic, and suddenly we're surrounded by thirty toddlers in dinosaur costumes. It's a tactical nightmare." "I've got the stealth-grade scissors. I'll snip the ribbons. We glide in, grab the pile, and glide out." "Glide? You're wearing a tuxedo and carrying a bag of loot. You'll look like a very suspicious butler who's stealing Christmas in July." "It’s a themed party, Molly. I'm blending in. I'll just tell them I'm the entertainment. 'Hello children, I am the Magic Magician! Now watch as I make these LEGO sets disappear!'" "You can't even do a coin trick without dropping the coin, Bill." "That's part of the comedy! Now, talk to me about the cake. Is it a distraction or a barrier?" "It's a three-tier fondant monstrosity guarding the gift table. We have to navigate the perimeter without tripping the juice-box perimeter." "The what?" "The juice boxes. They've spilled everywhere. It's a slip-and-slide of apple juice. One wrong step and you're sliding face-first into a pile of wrapping paper." "I'll wear grip-socks. I've got a pair with little rubber paws on the bottom." "You're planning a high-stakes heist in paw-socks?" "They provide excellent traction on linoleum, Molly. Don't judge my gear." "Fine. But what about the birthday boy? He’s six. He’s a wildcard. He’s got a sugar rush and a plastic sword. He's basically a tiny, chaotic mercenary." "I'll distract him with a shiny object. Or a lollipop. Everyone loves a lollipop." "We are stealing forty-two presents, Bill. Including the giant oversized teddy bear. How are you planning to 'glide' out of a house with a six-foot plush bear under your arm while wearing paw-socks?" "I'll tell them the bear is going to a special magic school for bears." 
"You are the worst professional I've ever worked with." "And yet, you're the one who suggested we target a five-year-old's birthday party for the 'low security' risk." "It is low security! The adults are all distracted by the artisanal cheese platter. It's the perfect crime." "Right. Now, do I look like a magician or a waiter?" "You look like a man who's about to be tackled by a group of kindergartners. Now put on the cape and let's go."
created_at: "2026-05-10T12:58:19.408Z" (5/10/2026, 12:58:19 PM)
model: "google/gemma-4-31b-it-20260402"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 225
moderation_latency: (null)
generation_time: 20278
tokens_prompt: 77
tokens_completion: 648
native_tokens_prompt: 79
native_tokens_completion: 642
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
service_tier: (null)
usage: 0.00025423
router: (null)
provider_responses:
  0:
    endpoint_id: "4149ab03-3b69-45a0-b94c-6c253b8641a1"
    id: "chatcmpl-RiwAIgAttgqDsazVUUaephRS"
    is_byok: false
    latency: 56
    model_permaslug: "google/gemma-4-31b-it-20260402"
    provider_name: "DeepInfra"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778417899-98tKUgt3w7LIjbtKAY86"
session_id: (null)
api_type: "completions"
id: "gen-1778417899-BJidgPoZWSQuM3tUAC2I"
upstream_id: "chatcmpl-RiwAIgAttgqDsazVUUaephRS"
total_cost: 0.00025423
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
response_cache_source_id: (null)
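The timing, token, and cost fields above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that latency and generation_time are both in milliseconds and that the reported duration is their sum, neither of which the page states. The variable names are chosen here for readability and are not part of the benchmark.

```python
# Values copied from the run metadata.
latency_ms = 225
generation_time_ms = 20278
native_tokens_prompt = 79
native_tokens_completion = 642
total_cost_usd = 0.00025423

# Assumption: the reported "Duration" is latency plus generation time.
duration_s = (latency_ms + generation_time_ms) / 1000

# Assumption: throughput is measured over generation time only.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# Blended cost per native token. Prompt and completion tokens are often
# priced differently, so this is only a rough average.
total_tokens = native_tokens_prompt + native_tokens_completion
cost_per_token_usd = total_cost_usd / total_tokens

print(f"duration: {duration_s:.1f} s")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
print(f"blended cost: {cost_per_token_usd:.2e} USD/token")
```

Under these assumptions the computed duration rounds to the 20.5s shown in the run details, which suggests the two timing fields do add up to the reported figure.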
Evaluation details
Result: 0.00%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false, words: 410

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 100.00%, Deviation: 30.00%
Meta Data: neededClean: false, wordsTotal: 424, wordsDialogue: 424

Overall: 0.0000%
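The reported ratio and deviation follow directly from the word counts: 424 of 424 words were counted as dialogue, and the scenario asked for 70%. The sketch below reproduces that arithmetic. The function names and the quote-based counter are hypothetical, not the evaluator's actual code; the two evaluators report different totals (410 and 424 words), so their real tokenization evidently differs from this simple whitespace split. How a deviation maps to a score is not shown on the page and is not modeled here.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: "70% dialogue"
TARGET_WORDS = 500   # from the scenario: "500 words"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that fall inside dialogue."""
    if words_total == 0:
        return 0.0
    return words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_RATIO) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Hypothetical counter: (words inside straight double quotes, all words)."""
    quoted_spans = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(len(span.split()) for span in quoted_spans)
    words_total = len(text.split())
    return words_dialogue, words_total


# Counts copied from the evaluation details.
ratio = dialogue_ratio(words_dialogue=424, words_total=424)
print(f"ratio: {ratio:.2%}, deviation: {deviation(ratio):.2%}")

# Relative shortfall of the word-count evaluator's total against the target.
word_count_gap = abs(410 - TARGET_WORDS) / TARGET_WORDS
print(f"word count gap: {word_count_gap:.0%}")
```

With the reported counts this gives a ratio of 100% and a deviation of 30%, matching the details line. The response contains no narration outside quotation marks, which is why every counted word is dialogue.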