NC Bench
Model: Gemini 2.5 Pro
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Run ID: 2025-11-18-01KACPATP309EJ7F69B94E665D
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly tapped a manicured fingernail on the crinkled blueprint spread across the van’s dashboard. “Okay, look here. This is the layout.” Bill squinted, his focus glued to a crudely drawn shape in the backyard. “Is that a bouncy castle?” “Focus, Bill. The bouncy castle is irrelevant.” “It doesn’t look irrelevant. It looks exceptionally bouncy. An obstacle, for sure.” “The target,” she said, circling a spot in the living room with a red pen, “is here. The ‘Present Pile,’ as designated by my intel.” “Your ‘intel’ is a seven-year-old named Timmy who you bribed with a fudge pop.” “He was a reliable source. Now, at 1400 hours, the magician starts. That’s our window. The children will be mesmerized by cheap tricks and questionable animal handling.” “Ooh, a magician! Think he can pull a rabbit out of a hat?” “We are not staying to watch the magic, Bill. We are the distraction.” He looked at her with a profound sense of dread. “What does that even mean?” “It means you put on the suit.” “Not the suit. Molly, please. Anything but the suit.” The costume in question was slumped in the back of the van, a heap of matted faux-fur and quiet desperation. “The Giggle-Me-Gus the Gorilla suit is our only way in. No one questions a party gorilla.” “It smells like despair and old bananas. And the zipper gets stuck around the hindquarters. You know that.” “You’ll be a triumph. You go in, you juggle, you make some balloon animals.” “I don’t know how to juggle. The one time I tried, I broke your grandmother’s urn.” “Then just fall over a lot. Kids find that hysterical. While you’re causing a slapstick diversion, I slip in through the kitchen pantry.” “The kitchen?” His eyes lit up. “Will there be those tiny hot dogs on sticks?” Molly pinched the bridge of her nose. “No tiny hot dogs! Just presents. We grab the big ones first—LEGO sets, drones, anything that says ‘Requires Adult Assembly’—and we’re out before they even start singing ‘Happy Birthday’.” “But what if they have cake? 
I can’t do gorilla antics on an empty stomach. It’s bad for my character’s motivation.” “Your character is a sweaty man in a gorilla costume. His motivation is a paycheck, which you will not get if you eat the evidence.” “Seems harsh. I think Gus deserves a little slice of funfetti for his troubles.” She ignored him, pointing to her own sleek, black watch. “After the job, you can buy your own cake. Now, synchronize.” Bill held up his wrist, revealing a plastic watch with a cartoon dog on its face. “Mine stopped working last Tuesday.” She sighed, the sound of a thousand brilliant plans being ruined by one simple man. “Just get in the gorilla suit, Bill.” “Fine. But if I see a piñata, I’m going rogue. I’m telling you that right now.”
created_at: "2025-11-18T23:56:15.327445+00:00" (Tuesday, November 18th, 2025 at 11:56:15 PM GMT+00:00)
model: "google/gemini-2.5-pro"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1746
moderation_latency: (null)
generation_time: 22864
tokens_prompt: 68
tokens_completion: 1250
native_tokens_prompt: 67
native_tokens_completion: 2500
native_tokens_completion_images: 0
native_tokens_reasoning: 1810
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "STOP"
usage: 0.02508375
api_type: "completions"
id: "gen-1763510151-piQD1G9PD6odK3hvDV2h"
upstream_id: (null)
total_cost: 0.02508375
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Google AI Studio"
Evaluation details
Result | Evaluator | Details | Meta Data
60.2752% | Matches word count | n/a | neededClean: false; words: 485
34.0923% | Dialogue to Total Word Ratio | Ratio: 74.64%, Deviation: 4.64% | neededClean: false; wordsTotal: 485; wordsDialogue: 362
47.1838%
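
The ratio and deviation reported above can be reproduced from the word counts. The following is a minimal sketch, not the benchmark's actual evaluator: the function name `dialogue_ratio`, the whitespace-based word split, and the rule that dialogue is whatever sits between curly double quotes are all assumptions made for illustration. The 70% target comes from the scenario; how the evaluator turns a deviation into a percentage result is not shown on this page and is not modeled here.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio_percent).

    Assumption: dialogue is the text between curly double quotes,
    and a "word" is any whitespace-separated token.
    """
    total_words = len(text.split())
    # Capture everything between an opening and a closing curly double quote.
    spans = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in spans)
    ratio = 100.0 * dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Tiny hand-checkable sample: 8 dialogue words out of 10 total.
sample = "Molly sighed. “Just get in the gorilla suit, Bill.” “Fine.”"
sample_result = dialogue_ratio(sample)
print(sample_result)

# Check the figures reported in the table from its own word counts.
words_dialogue = 362
words_total = 485
target_percent = 70.0  # from the scenario: 70% dialogue

reported_ratio = round(100.0 * words_dialogue / words_total, 2)
deviation = round(abs(reported_ratio - target_percent), 2)
print(reported_ratio, deviation)

# The final unlabeled figure is numerically the mean of the two results.
mean_score = (60.2752 + 34.0923) / 2
print(mean_score)
```

With the counts from the table, 362 of 485 words gives a ratio of 74.64% and a deviation of 4.64 percentage points from the 70% target, matching the Details column. The mean of the two per-evaluator results agrees with the final figure of 47.1838% to within rounding.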