NC Bench
Run: 2026-05-10-01KR8WMYNJ32EKXCXKEWNN1BWB
Run Details
Model: Gemma 4 31B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 20.4 s
Cost: $0.000248
Provider: DeepInfra
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
"The target is a six-year-old's birthday bash," Molly said, slapping a blueprint of a suburban living room onto the table. Bill stared at the drawing. "A birthday party? Molly, we’re professionals. We did the Antwerp diamond job." "Exactly. Which is why we’re overqualified for this. Look at the intel. The parents are 'helicopter parents.' They’ve bought everything. High-end Lego sets, remote-control drones, a miniature electric Jeep. It’s a goldmine of resale value." "It’s a goldmine of glitter and juice boxes," Bill countered. "What’s the entry point?" "The sliding glass door in the patio. I’ve timed the cake ceremony. At exactly 3:15 PM, the parents will usher all twenty children into the dining room for the song. That gives us a four-minute window of total silence in the gift zone." "Four minutes? To clear forty wrapped boxes?" "That’s why we’re using the 'Toddler Tactic.'" Bill frowned. "The what?" "We dress as oversized mascot characters. I’m a giant pink panda, you’re a blue dinosaur. We walk in, we pretend to be the hired entertainment, we scoop the presents into oversized laundry bags, and we waltz out the front door." "I am not wearing a dinosaur suit, Molly." "It’s a high-quality plush, Bill. It’s breathable." "I have a reputation in the underworld. If the Fence hears I was spotted in a blue dinosaur costume, I’ll be the laughingstock of the tri-state area." "Do you want the Lego Star Destroyer or do you want your dignity? Because you can't have both." Bill sighed, leaning over the map. "What about the security?" "A Golden Retriever named Buster. He’s easily bribed with organic peanut butter." "And the children? What if one of them wanders back for a bathroom break?" "I’ve factored that in. I’ll carry a handful of high-sugar lollipops. One pop and they’ll be too distracted by a glucose spike to notice a dinosaur stealing a Jeep." "And the getaway?" "A white minivan. It blends in. I’ve already stolen the plates from a soccer mom in the next neighborhood." 
"I hate you," Bill muttered. "What size am I in the suit?" "Extra large. You’ve been hitting the pasta too hard." "I'm muscular! It's bulk!" "Sure it is, Dino-Bill. Now, let's practice the 'Happy Birthday' distraction. On three." "I'm not singing, Molly." "Sing, Bill, or I'm taking the drones for myself." "Happy birthday to you," Bill droned miserably. "Now give me the peanut butter."
created_at: "2026-05-10T12:09:02.905Z" (5/10/2026, 12:09:02 PM)
model: "google/gemma-4-31b-it-20260402"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 149
moderation_latency: (null)
generation_time: 20339
tokens_prompt: 77
tokens_completion: 656
native_tokens_prompt: 79
native_tokens_completion: 626
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
service_tier: (null)
usage: 0.00024815
router: (null)
provider_responses[0]:
  endpoint_id: "4149ab03-3b69-45a0-b94c-6c253b8641a1"
  id: "chatcmpl-RKmFeKIWMDCwhq2hLuNsOyQl"
  is_byok: false
  latency: 59
  model_permaslug: "google/gemma-4-31b-it-20260402"
  provider_name: "DeepInfra"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778414942-Ke4ianMbX98MCFPl1XUq"
session_id: (null)
api_type: "completions"
id: "gen-1778414942-Zib3eGwPbH8uwITRDeu0"
upstream_id: "chatcmpl-RKmFeKIWMDCwhq2hLuNsOyQl"
total_cost: 0.00024815
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
response_cache_source_id: (null)
Evaluation details
Matches word count: 0.00%
  Details: n/a
  Meta data: neededClean: false, words: 398
Dialogue to Total Word Ratio: 0.00%
  Details: Ratio: 91.87%, Deviation: 21.87%
  Meta data: neededClean: false, wordsTotal: 406, wordsDialogue: 373
Overall: 0.0000%
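The evaluation details report a dialogue ratio of 91.87% and a deviation of 21.87 points from the 70% target. They do not show the evaluator's counting rules. The Python sketch below reproduces that arithmetic under the following assumptions:

- Dialogue is any text between double quotes.
- Words are whitespace-separated tokens.
- Deviation is the absolute difference between the ratio and the target.

Because these rules are assumptions, running the sketch on the response above may not reproduce the evaluator's 373/406 counts exactly. `dialogue_ratio`, `deviation`, and `word_count_gap` are hypothetical helper names, not NC Bench code.

```python
import re

# Assumption: dialogue is any span between straight or curly double quotes.
# Single quotes are ignored so that apostrophes and nested quotes are not split.
QUOTE_RE = re.compile(r'["\u201c](.*?)["\u201d]', re.DOTALL)


def count_words(text: str) -> int:
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split())


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, dialogue_words / total_words)."""
    total = count_words(text)
    dialogue = sum(count_words(span) for span in QUOTE_RE.findall(text))
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation(ratio: float, target: float = 0.70) -> float:
    # Absolute distance from the requested dialogue share (70% in this test).
    return abs(ratio - target)


def word_count_gap(words: int, target: int = 500) -> float:
    # Relative shortfall (negative) or overshoot (positive) against the target length.
    return (words - target) / target


if __name__ == "__main__":
    sample = '"Hello there," Molly said. "Go."'
    print(dialogue_ratio(sample))
    # Counts reported on this page: 373 dialogue words out of 406.
    print(f"ratio {373 / 406:.2%}, deviation {deviation(373 / 406):.2%}")
    print(f"length gap {word_count_gap(398):.1%}")
```

With the reported counts, `deviation(373 / 406)` is about 0.2187, which matches the 21.87% shown above. `word_count_gap(398)` is -0.204, meaning the response fell roughly 20% short of the requested 500 words. The page does not state either evaluator's pass tolerance, so the sketch only measures the gaps and does not assign a score.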