NC Bench
Run: 2026-05-10-01KR90JNRWAZQC3541QA61RANF
Run Details
Model: Gemma 4 31B (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 2m 42s
Cost: $0.001059
Provider: DeepInfra
Evaluation summary
Score | Evaluator
0.00% | Matches word count
5.08% | Dialogue to Total Word Ratio
2.5412% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly spread the blueprints across the stained card table, her finger tapping a red circle around the buffet table. The room smelled of damp laundry and desperation. She had spent three days surveilling the Miller residence, noting every movement of the golden retriever and the exact timing of the cake arrival. Her eyes were bloodshot from lack of sleep, fueled by a dangerous mixture of espresso and ambition. "The bouncy castle is our primary obstacle," Molly whispered. Bill leaned in, scratching his chin. "Since when is a blow-up slide an obstacle?" "It’s not about the jump, Bill; it’s about the noise," she snapped. "One misplaced step on that vinyl and you'll sound like a herd of elephants. We enter through the sliding glass door during the peak of the chaos—right when the piñata breaks." Bill looked at the map and then back at Molly, his expression one of profound skepticism. He had robbed banks and jewelry stores, but the prospect of infiltrating a suburban living room filled with toddlers felt strangely more perilous than a vault in Zurich. "Are we really doing this? What's the payout on a Lego set?" "It’s not just Lego, you philistine," Molly hissed. "There are high-end electronic tablets and limited edition collectibles. We hit the gift pile and vanish before the first sugar crash hits." She pulled out two oversized neon-colored jumpsuits. They were designed to blend in with the party theme, which involved a very specific shade of lime green. Bill stared at the fabric as if it were a hazardous material. "I'm not wearing that," Bill said firmly. "You'll wear it or you'll be the lookout in a tuxedo," Molly countered. "The theme is Neon Jungle. If you look like a confused adult in a suit, you're a target. If you look like a very large, very sweaty uncle in a lime jumpsuit, you're invisible." Bill sighed, the sound of a man who had long ago surrendered his dignity for a cut of the take. 
He traced the route to the living room with a calloused finger, imagining the logistical nightmare of hauling forty wrapped boxes through a kitchen filled with spilled juice. "And the dog?" he asked. "I've got a bag of premium organic liver treats," Molly said. "The beast will be neutralized within seconds. We move at 14:00 hours." Bill picked up the neon fabric, feeling the cheap polyester cling to his skin. He imagined himself sliding across a hardwood floor, clutching a handful of Barbie Dreamhouses, while a dozen screaming children looked on in confusion. It wasn't the high-stakes heist he had dreamed of as a youth, but the profit margins on pre-owned toys were surprisingly stable in the current market.
created_at: "2026-05-10T13:17:42.564Z" (5/10/2026, 1:17:42 PM)
model: "google/gemma-4-31b-it-20260402"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 223
moderation_latency: (null)
generation_time: 162403
tokens_prompt: 77
tokens_completion: 2920
native_tokens_prompt: 82
native_tokens_completion: 2760
native_tokens_completion_images: (null)
native_tokens_reasoning: 2233
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
service_tier: (null)
usage: 0.00105946
router: (null)
provider_responses:
0:
endpoint_id: "4149ab03-3b69-45a0-b94c-6c253b8641a1"
id: "chatcmpl-RV5TRBRm3Knn2Jt8qErMWtgU"
is_byok: false
latency: 59
model_permaslug: "google/gemma-4-31b-it-20260402"
provider_name: "DeepInfra"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778419062-7A3qbPOsXStfbw86U5e8"
session_id: (null)
api_type: "completions"
id: "gen-1778419062-MRBV89xx755xPbf6ocTT"
upstream_id: "chatcmpl-RV5TRBRm3Knn2Jt8qErMWtgU"
total_cost: 0.00105946
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
response_cache_source_id: (null)
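The timing and token fields above can be cross-checked against the Run Details with a little arithmetic. The sketch below is illustrative only. It assumes that generation_time is in milliseconds (which is consistent with the reported Duration of 2m 42s) and that native_tokens_reasoning is counted as part of native_tokens_completion. Neither assumption is confirmed by the page, and the helper name format_duration is hypothetical.

```python
def format_duration(ms: int) -> str:
    """Format a millisecond count as 'Xm Ys', dropping the fractional second."""
    total_seconds = ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds}s"


# Values copied from the metadata above.
generation_time_ms = 162403        # assumed to be milliseconds
native_tokens_completion = 2760
native_tokens_reasoning = 2233     # assumed to be a subset of the completion tokens

# Completion throughput over the whole generation window.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# Fraction of completion tokens spent on reasoning rather than the visible scene.
reasoning_share = native_tokens_reasoning / native_tokens_completion

print(format_duration(generation_time_ms))   # 2m 42s
print(round(tokens_per_second, 2))           # 16.99
print(round(reasoning_share * 100, 1))       # 80.9
```

Under these assumptions, roughly four fifths of the completion tokens went to reasoning, which would account for most of the generation time.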
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 452
5.08% | Dialogue to Total Word Ratio | Ratio: 37.72%, Deviation: 7.72% | neededClean: false, wordsTotal: 456, wordsDialogue: 172
2.5412% | Overall
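The reported ratio and deviation follow directly from the word counts: 172 dialogue words out of 456 total, measured against the 30% target in the prompt. The sketch below reproduces that arithmetic. The function names are hypothetical, and the quote-based counter is only one plausible way to count dialogue words, not the evaluator's confirmed method. How the 5.08% score is derived from the deviation is not stated on the page, so it is not modelled here. The overall figure of 2.5412% is consistent with an unweighted mean of the two evaluator scores, but that is also an assumption.

```python
import re

TARGET_PERCENT = 30.0  # dialogue share requested in the prompt


def dialogue_stats(words_dialogue: int, words_total: int,
                   target_percent: float = TARGET_PERCENT) -> tuple[float, float]:
    """Return (ratio, deviation) in percent, rounded to two decimals."""
    ratio = words_dialogue / words_total * 100
    deviation = abs(ratio - target_percent)
    return round(ratio, 2), round(deviation, 2)


def count_dialogue_words(text: str) -> int:
    """Count whitespace-separated words inside straight double quotes.

    Hypothetical counting rule; the benchmark's own tokenisation may differ.
    """
    return sum(len(span.split()) for span in re.findall(r'"([^"]*)"', text))


ratio, deviation = dialogue_stats(172, 456)
print(ratio, deviation)  # 37.72 7.72

# If Overall is the unweighted mean of the two scores (an assumption),
# the dialogue score before rounding would be 2 * 2.5412 - 0.00.
implied_dialogue_score = 2 * 2.5412 - 0.00
print(round(implied_dialogue_score, 2))  # 5.08
```

For example, count_dialogue_words('"Hello there," she said. "Go."') returns 3 under this rule, because only the words inside the quotation marks are counted.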