NC Bench
Run: 2026-05-10-01KR8XVG7KKSPAWEWEWT5T0RVH
Run Details
Model: Gemma 4 31B (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 8m 50s
Cost: $0.004818
Provider: Together
Evaluation summary
Score | Evaluator
0.00% | Matches word count
96.65% | Dialogue to Total Word Ratio
48.3243% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crayon-drawn map across the dashboard of the rust-bucket van, pinning the corners down with a half-eaten bagel and a stolen stapler. The dashboard was a graveyard of old receipts and sticky candy wrappers, shimmering under a layer of summer dust. The vehicle, a faded beige monstrosity with a sliding door that screeched like a dying hawk, smelled faintly of old gym socks and desperation. Bill sat beside her, methodically chewing through a lukewarm corn dog with a level of concentration that was almost impressive. He stared at the map with a furrowed brow, his gaze lingering on a particularly crude drawing of a small, angry-looking dog that looked more like a potato with teeth. "The target is the living room," Molly said, tapping a circle around a cluster of rectangles. "Why is there a dinosaur on the eastern flank?" Bill asked, looking at the map and then back at the suburban house across the street. "That is the golden retriever, Bill," she replied. They were targeting the seventh birthday party of a child named Timmy, a boy whose parents apparently possessed an alarming amount of disposable income and a total lack of supervision. The haul was legendary among the low-rent criminal underworld, a circle of thieves who mostly specialized in stealing lawn ornaments, consisting of three massive Lego sets, including the rare starship edition, a miniature electric car with working headlights, and a rumored next-gen gaming console. It was the kind of score that could fund their operation for a month, or at least pay off their outstanding tab at the local taco stand. "We enter through the sliding glass doors at fourteen hundred hours," Molly continued. "Once the cake is served, the sugar rush will peak, and we strike." Bill wiped a smear of mustard onto his cargo pants. "What about the balloons? They are tactical hazards." Molly produced a pair of surgically sharpened sewing needles from her sleeve. "We will use the silent pop technique," she said. 
"The presents are clustered here," she continued, pointing to a red X. "We bag them and exit via the laundry chute." Bill frowned, glancing at the size of the ride-on car in the window. "Is the laundry chute actually big enough for a motorized vehicle?" "We will disassemble it on the fly, Bill. Try to keep up," Molly snapped. He sighed, leaning back into the cracked vinyl seat. "I still think we should have gone for the jewelry store." "Jewelry stores have lasers, Bill. This place has a bouncy castle," she countered. Molly checked her tactical watch, noting that the clown, a man known locally as Giggles the Terrible, would arrive in ten minutes to serve as their signal. Bill muttered about his deep-seated hatred for clowns while Molly confirmed that the industrial-grade glitter bombs, filled with a blend of holographic silver and neon pink, were prepped and ready to cause a shimmering disaster. "This is the biggest score of the season," Molly said. "If a kid catches us, we are the gift delivery squad." Bill looked at his oversized sneakers. "I do not think I look like an elf." "You look like a very confused elf," she said. "Now, put on the costume."
created_at: "2026-05-10T12:30:06.074Z" (5/10/2026, 12:30:06 PM)
model: "google/gemma-4-31b-it-20260402"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 948
moderation_latency: (null)
generation_time: 530223
tokens_prompt: 77
tokens_completion: 9970
native_tokens_prompt: 81
native_tokens_completion: 9603
native_tokens_completion_images: (null)
native_tokens_reasoning: 9152
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
service_tier: (null)
usage: 0.0048177
router: (null)
provider_responses:
  0:
    endpoint_id: "4149ab03-3b69-45a0-b94c-6c253b8641a1"
    is_byok: false
    latency: 72
    model_permaslug: "google/gemma-4-31b-it-20260402"
    provider_name: "DeepInfra"
    status: 429
  1:
    endpoint_id: "9d4644e9-a5e5-41f4-b810-d75f484d2c24"
    is_byok: false
    latency: 31
    model_permaslug: "google/gemma-4-31b-it-20260402"
    provider_name: "Parasail"
    status: 429
  2:
    endpoint_id: "576fca5b-f09c-4bda-b560-9989c39ac3ab"
    id: "oiXYv8s-zfFMX-9f98f408c747c4d5"
    is_byok: false
    latency: 912
    model_permaslug: "google/gemma-4-31b-it-20260402"
    provider_name: "Together"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778416206-PSJVr6t0GRaDBeIyAVHm"
session_id: (null)
api_type: "completions"
id: "gen-1778416206-3094UTmBXhJ4t1cV1Ncr"
upstream_id: "oiXYv8s-zfFMX-9f98f408c747c4d5"
total_cost: 0.0048177
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Together"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 536
96.65% | Dialogue to Total Word Ratio | Ratio: 29.17%, Deviation: 0.83% | neededClean: false, wordsTotal: 545, wordsDialogue: 159
48.3243% | Overall
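
The ratio and deviation reported by the Dialogue to Total Word Ratio evaluator can be reproduced from its metadata. The following is a minimal sketch, assuming the evaluator divides wordsDialogue by wordsTotal and measures the absolute distance from the 30% target named in the scenario. The function names are hypothetical, and the formula that turns the deviation into the 96.65% score is not shown on this page, so it is not modeled here. The final lines check the further assumption that the Overall figure is an unweighted mean of the two evaluator scores.

```python
from statistics import mean

# Target taken from the scenario: "Write 500 words with 30% dialogue".
TARGET_RATIO = 0.30


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Share of all words that sit inside dialogue (hypothetical helper)."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_RATIO) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Values copied from the evaluator metadata above.
ratio = dialogue_ratio(words_dialogue=159, words_total=545)
dev = deviation(ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {dev:.2%}")  # Ratio: 29.17%, Deviation: 0.83%

# Assumption: Overall is the unweighted mean of the two displayed scores.
# The displayed scores are rounded, so the estimate only approximates 48.3243%.
overall_estimate = mean([0.00, 96.65])
print(f"Overall (estimated from rounded scores): {overall_estimate:.3f}%")
```

Because the per-evaluator scores on the page are rounded to two decimals, the estimated overall value agrees with the reported 48.3243% only to within rounding error.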