NC Bench
Run: 2026-05-10-01KR8YRQSMYB0EPZGR0ZE4J1G7
Run Details
Model: Gemma 4 31B
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 9.1s
Cost: $0.000116
Provider: DeepInfra
Evaluation summary
Matches word count: 6.09%
Dialogue to Total Word Ratio: 36.79%
Overall: 21.4393%
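The page does not state how the overall score is aggregated. The short check below assumes an unweighted mean of the two evaluator scores, which is an assumption rather than documented behavior; the names `scores` and `overall` are illustrative only. Under that assumption the result agrees with the reported 21.4393% to within the rounding of the displayed inputs.

```python
# Evaluator scores as displayed in the summary (already rounded to 2 decimals).
scores = {
    "Matches word count": 6.09,
    "Dialogue to Total Word Ratio": 36.79,
}

# Assumption: the overall score is the plain average of the evaluator scores.
overall = sum(scores.values()) / len(scores)

print(f"{overall:.2f}%")  # 21.44%
```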
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
"The bounce house is the blind spot, Bill." "Why the bounce house?" "Because parents hate the bounce house. They stand ten feet away, staring at their phones, praying for the cake to be served." "I can't fit in a giant inflatable castle, Molly. I’m a broad man." "Wear the tutu. It distracts them." "The tutu?" "Yes, the pink one. You'll blend in with the chaos. I’ll handle the gift table while you act as the getaway driver for the toy wagon." "Is the wagon reinforced?" "Steel axles. We can haul forty Lego sets and a motorized pony without snapping the frame." "What about the clowns?" "Ignore the clowns. They're freelancers; they don't care about the loot as long as they get paid." "And the sugar-high toddlers?" "That's the tactical advantage. Once the cupcakes hit, they'll create a perimeter of screaming anarchy. We slip in, bag the presents, and vanish before the piñata bursts." "Do we get cake?" "Focus, Bill." "I just think it's a waste of a good party." "Tutu. Now." "Fine. But I'm keeping the pony."
Meta
created_at: "2026-05-10T12:46:04.093Z" (5/10/2026, 12:46:04 PM)
model: "google/gemma-4-31b-it-20260402"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 113
moderation_latency: (null)
generation_time: 8956
tokens_prompt: 77
tokens_completion: 286
native_tokens_prompt: 79
native_tokens_completion: 278
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
service_tier: (null)
usage: 0.00011591
router: (null)
provider_responses:
  0:
    endpoint_id: "4149ab03-3b69-45a0-b94c-6c253b8641a1"
    id: "chatcmpl-RLycZLIFrIPsn0ZX82YCW4gG"
    is_byok: false
    latency: 43
    model_permaslug: "google/gemma-4-31b-it-20260402"
    provider_name: "DeepInfra"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1778417164-2QXNTcQwAtrlQ26dtr5R"
session_id: (null)
api_type: "completions"
id: "gen-1778417164-t89FSt3ZFPfRGBTpKtam"
upstream_id: "chatcmpl-RLycZLIFrIPsn0ZX82YCW4gG"
total_cost: 0.00011591
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
response_cache_source_id: (null)
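As a rough cross-check of the metadata above, the sketch below adds the top-level latency to the generation time and derives a completion throughput. It assumes both timing fields are in milliseconds and that the run duration is roughly their sum; neither is stated on the page, and the variable names are illustrative.

```python
# Values copied from the metadata above; the millisecond unit is an assumption.
latency_ms = 113
generation_time_ms = 8956
native_tokens_completion = 278

# Assumption: end-to-end duration is roughly latency plus generation time.
total_ms = latency_ms + generation_time_ms

# Completion throughput over the generation phase only.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"total: {total_ms / 1000:.1f}s")                  # total: 9.1s
print(f"throughput: {tokens_per_second:.1f} tokens/s")   # throughput: 31.0 tokens/s
```

Under these assumptions the total matches the 9.1s duration shown in the run details.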
Evaluation details
Matches word count
Result: 6.09%
Details: n/a
Meta data: neededClean: false, words: 177
Dialogue to Total Word Ratio
Result: 36.79%
Details: Ratio: 100.00%, Deviation: 10.00%
Meta data: neededClean: false, wordsTotal: 178, wordsDialogue: 178
Overall: 21.4393%
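The dialogue evaluator reports a ratio of 100.00% against the scenario's 90% target, a deviation of 10.00%. A minimal sketch of how such a ratio might be measured follows. It assumes dialogue means text inside straight double quotes and that words are split on whitespace; the benchmark's actual tokenization is not documented here (the two evaluators report 177 and 178 words for the same response), and the function names `dialogue_ratio` and `deviation` are hypothetical. How the deviation is then mapped to the 36.79% score is not stated on the page and is not modeled.

```python
import re


def dialogue_ratio(text: str) -> float:
    """Fraction of whitespace-separated words that sit inside straight double quotes."""
    total = len(text.split())
    if total == 0:
        return 0.0
    # Each match is the content between one opening and one closing quote.
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue / total


def deviation(ratio: float, target: float) -> float:
    """Absolute distance between the measured ratio and the requested one."""
    return abs(ratio - target)


# A scene made only of quoted speech, like the model response on this page.
all_dialogue = '"Do we get cake?" "Focus, Bill."'
ratio = dialogue_ratio(all_dialogue)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation(ratio, 0.90):.2%}")
# Ratio: 100.00%, Deviation: 10.00%
```

A scene with narration between the quotes would score lower: for `'"Focus, Bill." Molly sighed. "Tutu. Now."'` the same function counts 4 dialogue words out of 6.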