NC Bench
Run: 2026-05-10-01KR91BGXRPBT17H0M7DN290DV
Run Details
Model: Gemma 4 31B (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 4m 32s
Cost: $0.001229
Provider: Together
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Right, focus. The layout is simple. Entrance through the patio, pivot left past the bounce house, and the haul is right there by the cake, Molly said, tapping a crude map. Why the bounce house? Can't we just walk around it? Bill asked. It is a tactical diversion, Bill. If we are seen, we blend in with the chaos. I am not blending in with a bunch of six-year-olds. I am six-foot-four. Wear the oversized clown shoes I bought. It is a themed party. I am not wearing clown shoes. Do you want the LEGO sets or not? The limited edition Star Wars ones are in that pile. Fine. But I am keeping the red nose. No noses. We need to be stealthy. Now, timing. The Piñata Window is our only shot. When the kids are fighting over candy, we strike. What if the piñata does not break? Then we wait. We do not engage until the sugar crash hits. Once they are all nap-trapped, we clear the table in thirty seconds. Thirty seconds? That is a lot of wrapping paper to move. That is why we use the vacuum-seal bags. We suck the air out, stack them, and vanish. It is surgical. I thought we were being clowns. The clowning is the cover, Bill! The vacuuming is the heist! Can I at least take one toy? Just a small one? Maybe a little car? Absolutely not. We are professionals. We sell the lot to the fence in Jersey. But the fence hates LEGOs. He loves money, Bill. Now, let us discuss the Juice Box Perimeter. Is that where we hide the getaway car? No, that is where the parents congregate. We avoid the juice boxes at all costs. One conversation about mortgage rates and we are compromised. I do not get why this is so hard. It is just a living room. It is not a living room, Bill. It is a minefield of sticky fingers and high-pitched screaming. It is the most dangerous environment known to man. I have done a bank job in downtown Chicago. Did the bank have a toddler with a sticky lollipop and a grudge? I think not. Fair point. Do the shoes come in a size twelve? They are one size fits all, Bill. 
Now, let us run the drill again. From the patio, on my mark. And what about the dog? Bill asked. A golden retriever. Harmless, unless it smells the ham sandwiches in your pocket. I am bringing ham sandwiches? You are on a six-hour stakeout, Bill. Of course you are bringing sandwiches. I still think the nose would be a good touch. Put the shoes on, Bill.
created_at: "2026-05-10T13:31:16.8Z" (5/10/2026, 1:31:16 PM)
model: "google/gemma-4-31b-it-20260402"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 453
moderation_latency: (null)
generation_time: 271664
tokens_prompt: 77
tokens_completion: 2361
native_tokens_prompt: 81
native_tokens_completion: 2425
native_tokens_completion_images: (null)
native_tokens_reasoning: 1719
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
service_tier: (null)
usage: 0.0012287
router: (null)
provider_responses
0
endpoint_id: "576fca5b-f09c-4bda-b560-9989c39ac3ab"
id: "oiXsjBK-zfFMX-9f994da615853a6b"
is_byok: false
latency: 279
model_permaslug: "google/gemma-4-31b-it-20260402"
provider_name: "Together"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778419876-S3jSw0umiX0pSe70nvFQ"
session_id: (null)
api_type: "completions"
id: "gen-1778419876-3jrI3hOj8eVyS1Y55frt"
upstream_id: "oiXsjBK-zfFMX-9f994da615853a6b"
total_cost: 0.0012287
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Together"
response_cache_source_id: (null)
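The metadata above can be cross-checked against the run details with a little arithmetic. The following is a minimal Python sketch, assuming that generation_time is reported in milliseconds and that native_tokens_reasoning is included in native_tokens_completion; neither assumption is confirmed by this page, and format_duration is a hypothetical helper.

```python
def format_duration(ms: int) -> str:
    """Render a millisecond count as 'Xm Ys', rounded to the nearest second."""
    total_seconds = round(ms / 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds}s"


# Values copied from the metadata above.
generation_time_ms = 271664        # assumed to be milliseconds
native_tokens_completion = 2425
native_tokens_reasoning = 1719     # assumed to be part of the completion count

# Tokens left over for the visible response once reasoning is subtracted.
visible_tokens = native_tokens_completion - native_tokens_reasoning

print(format_duration(generation_time_ms))  # 4m 32s
print(visible_tokens)                       # 706
```

Under these assumptions the generation time matches the reported duration of 4m 32s, and most of the completion tokens were spent on reasoning rather than on the scene itself.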
Evaluation details
Matches word count
Result: 0.00%
Details: n/a
Meta Data: neededClean false, words 447
Dialogue to Total Word Ratio
Result: 0.00%
Details: Ratio: 0.00%, Deviation: 70.00%
Meta Data: neededClean false, wordsTotal 455, wordsDialogue 0
0.0000%
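The dialogue result above (wordsDialogue 0 out of wordsTotal 455) is consistent with an evaluator that counts only words enclosed in quotation marks, since the model response contains none. The evaluator's actual implementation is not shown on this page; the following is a minimal Python sketch under that assumption. The names dialogue_ratio and deviation, the quote-matching rule, and the whitespace-based word count are all hypothetical.

```python
import re

# Assumption: dialogue is any text enclosed in straight or curly double quotes.
DIALOGUE_PATTERN = re.compile(r'"([^"]*)"|“([^”]*)”')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simplification of real word counting)."""
    return len(text.split())


def dialogue_ratio(text: str) -> float:
    """Return the fraction of words that appear inside quotation marks."""
    total = count_words(text)
    if total == 0:
        return 0.0
    spoken = sum(count_words(a or b) for a, b in DIALOGUE_PATTERN.findall(text))
    return spoken / total


def deviation(ratio: float, target: float = 0.70) -> float:
    """Absolute distance between the measured ratio and the requested 70% target."""
    return abs(ratio - target)


unquoted = "Right, focus, Molly said. Why the bounce house? Bill asked."
quoted = '"Right, focus," Molly said.'

print(dialogue_ratio(unquoted))  # 0.0
print(dialogue_ratio(quoted))    # 0.5
```

With a rule like this, a scene written entirely without quotation marks scores a ratio of 0 and a deviation of 0.70 from the target, however much of it is spoken, which would match the Ratio: 0.00%, Deviation: 70.00% reported above.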