NC Bench
Run: 2026-05-10-01KR8YNRD8TFF58FFJPQZVSZ3A
Run Details
Model: Gemma 4 31B (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 7m 30s
Cost: $0.004151
Provider: DeepInfra
Evaluation summary
Score | Evaluator
99.92% | Matches word count
96.17% | Dialogue to Total Word Ratio
98.0432% | Overall
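As a quick sanity check on the Overall figure, the sketch below compares it with the plain mean of the two evaluator scores shown above. That NC Bench aggregates with an unweighted mean is an assumption, not something this page states; the small gap would be consistent with the displayed scores being rounded before display.

```python
from statistics import mean

# Scores as displayed in the evaluation summary (already rounded to two decimals).
scores = {
    "Matches word count": 99.92,
    "Dialogue to Total Word Ratio": 96.17,
}
reported_overall = 98.0432

# Assumption: the overall score is an unweighted mean of the evaluator scores.
estimated_overall = mean(scores.values())
gap = abs(estimated_overall - reported_overall)

print(f"estimated overall: {estimated_overall:.4f}%")
print(f"reported overall:  {reported_overall:.4f}%")
print(f"gap:               {gap:.4f} percentage points")
```

The estimate lands within a few thousandths of a percentage point of the reported value, which supports the unweighted-mean reading without confirming it.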
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the blueprint across the stained folding table, her eyes narrowing at the precise drawing of a modest suburban ranch-style home. She tapped a red circle around the living room with an intensity that bordered on the religious. "The target is the living room," she whispered. Bill scratched his chin, leaning back in a creaky plastic chair and looking utterly skeptical of the entire operation. He shifted his gaze from the map to Molly. "Why are we stealing from a seven-year-old?" Molly snapped, her finger stabbing the paper where the present pile was marked with a gold star, as if the star were a target in a war room. "It is not about the age, Bill, it is about the volume. Timmy-versary is a very high-yield event. We are talking Lego sets, remote-controlled drones, and a gaming console." Bill leaned in, squinting at the architectural layout of the house and the proximity of the kitchen, wondering how they had gotten into this. "What about the security?" Molly explained with a smug, knowing smirk that suggested she had already won the battle. "A golden retriever with a mild gluten allergy and a mother who is obsessed with organic juice boxes. We strike during the cake phase. That is when the chaos peaks." Bill looked confused, blinking slowly. "The cake phase?" She began to pace the small room, her boots clicking on the concrete floor. "Precisely. Once the sugar hits their systems, the children become a screaming entity. The parents will be too busy preventing a living room riot to notice two adults in inflatable dinosaur suits slipping through the patio doors." Bill stared at her in absolute horror, imagining himself encased in nylon. "I am not wearing a dinosaur suit." Molly groaned, throwing her hands up in frustration. "It is the perfect camouflage, Bill! No one suspects a T-Rex of grand larceny." Bill countered immediately, his voice rising in a pitch of desperation. "People would definitely suspect a T-Rex of grand larceny." 
Molly sighed and rubbed her temples, trying to maintain her professional composure. "Fine. You can be the Stegosaurus. We have to be synchronized. I enter from the north, you secure the perimeter near the gift table." Bill worried aloud, shifting his weight. "What if the dog barks?" She patted her tactical vest with a confident grin. "I have hypoallergenic treats. One piece of dried liver and that dog will forget his own name." Bill looked at the blueprints one last time. "Where is the loot going?" Molly smiled, her eyes gleaming with anticipation. "The modified strollers. Load the presents, push them out the back alley, and vanish." Bill muttered, shaking his head. "This feels wrong." Molly insisted coldly. "It is a victimless crime. The kids will get them eventually. We are providing a lesson in early childhood loss." Bill sighed, completely defeated by her logic. "I think we are just stealing from a kid, Molly." She pointed emphatically toward the door. "Details, Bill! Put on the tail. Leave in ten minutes."
created_at
"2026-05-10T12:44:26.415Z"
5/10/2026, 12:44:26 PM
model
"google/gemma-4-31b-it-20260402"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
272
moderation_latency
(null)
generation_time
450082
tokens_prompt
77
tokens_completion
9837
native_tokens_prompt
82
native_tokens_completion
10896
native_tokens_completion_images
(null)
native_tokens_reasoning
9022
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
service_tier
(null)
usage
0.00415114
router
(null)
provider_responses
0
endpoint_id
"4149ab03-3b69-45a0-b94c-6c253b8641a1"
id
"chatcmpl-RQ9Jl6TJLFAbgEf5dhw1D6Jl"
is_byok
false
latency
57
model_permaslug
"google/gemma-4-31b-it-20260402"
provider_name
"DeepInfra"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1778417066-2fHETJ3yqQGWpocHv5aG"
session_id
(null)
api_type
"completions"
id
"gen-1778417066-VLj1XOuxLlLEZVQIuvKY"
upstream_id
"chatcmpl-RQ9Jl6TJLFAbgEf5dhw1D6Jl"
total_cost
0.00415114
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepInfra"
response_cache_source_id
(null)
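The timing and token fields above can be cross-checked against the run details. The sketch below assumes that generation_time is in milliseconds (which matches the 7m 30s duration) and that native_tokens_reasoning is counted inside native_tokens_completion; both are assumptions about the metadata, and the helper name format_duration is hypothetical.

```python
def format_duration(ms: int) -> str:
    """Render a millisecond count as 'Xm Ys', dropping fractional seconds."""
    total_seconds = ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds}s"

# Values copied from the metadata above.
generation_time_ms = 450082
native_tokens_completion = 10896
native_tokens_reasoning = 9022

duration = format_duration(generation_time_ms)

# Assumption: reasoning tokens are a subset of the native completion tokens.
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Average output rate over the whole generation, reasoning included.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(duration)
print(f"reasoning share: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions, most of the completion tokens went to reasoning rather than to the roughly 500-word scene itself.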
Evaluation details
Result | Evaluator | Details | Meta Data
99.92% | Matches word count | n/a | neededClean: false, words: 503
96.17% | Dialogue to Total Word Ratio | Ratio: 50.88%, Deviation: 0.88% | neededClean: false, wordsTotal: 509, wordsDialogue: 259
98.0432% | Overall
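The reported ratio and deviation follow directly from the two word counts in the dialogue evaluator's metadata. The sketch below reproduces that arithmetic and also shows one way dialogue words could be counted, by treating everything between straight double quotes as dialogue. The counting rule and the function names are assumptions for illustration; the page does not describe how the evaluator tokenizes text or how the deviation is turned into the 96.17% score.

```python
import re

def ratio_from_counts(words_dialogue: int, words_total: int) -> float:
    """Dialogue words as a percentage of all words."""
    return 100.0 * words_dialogue / words_total

def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for whitespace-separated words.

    Assumption: dialogue is any text enclosed in straight double quotes.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total

# Counts copied from the evaluator metadata above.
target_pct = 50.0
ratio_pct = ratio_from_counts(259, 509)
deviation_pct = abs(ratio_pct - target_pct)

print(f"Ratio: {ratio_pct:.2f}%, Deviation: {deviation_pct:.2f}%")

# Small usage example for the hypothetical counting rule.
sample = 'She said, "Go now." He left.'
print(count_dialogue_words(sample))
```

Note that the two evaluators report different totals for the same response (503 and 509 words), so any reimplementation should expect small differences depending on how words are split.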