NC Bench
Run: 2026-05-10-01KR8ZBGSR04CDM6P095MQBS1C
Run Details
Model: Gemma 4 31B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 18.8s
Cost: $0.000223
Provider: DeepInfra
Evaluation summary
Score     Evaluator
0.00%     Matches word count
0.00%     Dialogue to Total Word Ratio
0.0000%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"The target is a five-year-old’s birthday bash at the Miller residence," Molly whispered, sliding a blueprint of a suburban living room across the table. Bill squinted at the drawing. "Why is there a giant red circle around the cake table?" "That’s the diversion, Bill. While the children are in a sugar-induced frenzy, we strike the gift pile." "The gift pile? Molly, we’re professionals. We’ve hit banks. We’ve hit galleries. Now we’re hitting a pile of Lego sets and My Little Ponies?" "It’s about the volume, Bill! Do you know the resale value of a limited edition robotic dinosaur? It’s gold. Plastic, neon-colored gold." "I feel my dignity slipping," Bill sighed. "What’s the entry point?" "The laundry chute. It leads directly into the mudroom." "I’m too wide for a laundry chute." "Then you’re the lookout. You’ll be disguised as a giant inflatable giraffe." Bill froze. "I will not wear the giraffe." "The theme is 'Safari Adventure,' Bill. If you’re a human man standing in a driveway, you’re suspicious. If you’re a seven-foot yellow ungulate, you’re part of the decor." "Will I have a hat?" "You’ll have a pith helmet strapped to the giraffe’s head. Now, timing. The 'Pin the Tail on the Donkey' game starts at two. That’s when the parents are most distracted by the chaos." "What about the security?" "The dog is a golden retriever with a penchant for tennis balls. One throw and he’s neutralized for ten minutes." "And the kids? They’re witnesses, Molly. Tiny, screaming witnesses." "I’ve accounted for that. I’m bringing a bag of high-grade organic gummy bears. We lure them into the backyard with a trail of treats, and then we swoop in for the loot." "What if one of them follows us?" "Then we give them a small toy car as hush money. It’s the code of the street." "I can't believe this is my life. I had a degree in accounting." "And now you have a degree in giraffe-ing. Do we have a deal?" "Fine. But I want a cut of the dinosaur sales." "Deal. 
Now, let's practice your 'silent giraffe' walk." "I hate you so much." "Love you too, Bill. Now, put on the costume."
created_at: "2026-05-10T12:56:19.52Z" (5/10/2026, 12:56:19 PM)
model: "google/gemma-4-31b-it-20260402"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 111
moderation_latency: (null)
generation_time: 18708
tokens_prompt: 77
tokens_completion: 581
native_tokens_prompt: 79
native_tokens_completion: 561
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
service_tier: (null)
usage: 0.00022345
router: (null)
provider_responses:
  0:
    endpoint_id: "4149ab03-3b69-45a0-b94c-6c253b8641a1"
    id: "chatcmpl-RPgFzziZT8XCFUt6EQF7k3Xx"
    is_byok: false
    latency: 52
    model_permaslug: "google/gemma-4-31b-it-20260402"
    provider_name: "DeepInfra"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778417779-KwBtKd7lK4z3bdBZ1ET7"
session_id: (null)
api_type: "completions"
id: "gen-1778417779-4ZBuxO8i7FL69pSsC0ki"
upstream_id: "chatcmpl-RPgFzziZT8XCFUt6EQF7k3Xx"
total_cost: 0.00022345
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
response_cache_source_id: (null)
Evaluation details
Result: 0.00%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false, words: 363

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 94.05%, Deviation: 24.05%
Meta Data: neededClean: false, wordsTotal: 370, wordsDialogue: 348

Overall: 0.0000%
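
The reported ratio and deviation can be reproduced from the counts above: 348 dialogue words out of 370 total is 94.05%, which is 24.05 percentage points above the 70% target. The sketch below shows one way such a check might be computed. It is not NC Bench's actual evaluator: the function names, the whitespace-based word split, and the rule that dialogue is any text between straight double quotes are all assumptions made for illustration, so it may not reproduce the page's exact counts (the two evaluators themselves report 363 and 370 words for the same response).

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int, float]:
    """Return (dialogue words, total words, dialogue ratio).

    Assumption: dialogue is whatever sits between straight double quotes.
    """
    spoken = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(count_words(s) for s in spoken)
    words_total = count_words(text)
    ratio = words_dialogue / words_total if words_total else 0.0
    return words_dialogue, words_total, ratio


def deviation_points(ratio: float, target: float = 0.70) -> float:
    """Absolute gap between the ratio and the target, in percentage points."""
    return abs(ratio - target) * 100


if __name__ == "__main__":
    # Counts taken from the evaluation details on this page.
    reported_ratio = 348 / 370
    print(round(reported_ratio * 100, 2))               # 94.05
    print(round(deviation_points(reported_ratio), 2))   # 24.05
```

Calling `dialogue_stats` on the model response text returns the sketch's own counts, which can then be compared with the `wordsDialogue` and `wordsTotal` values reported by the evaluator.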