NC Bench
Model: GPT-4.1
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Run ID: 2025-04-14-01JRTSXCBVDQ8QDTVR3R9J80EJ
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly pressed her face to the smudged window, peeking through the curtain of dinosaur streamers. “All right, Bill, you ready? There’s a balloon artist at three o’clock.”

Bill crouched behind the plastic pink flamingo, squinting. “Which three o’clock? Yours or mine? You know I’m all turned around.”

“Bill, focus. My three o’clock, inside. He’s got a hat made entirely of rubber chickens and hope. We wait ‘til he starts on the poodles.”

“Copy that. Poodles are distraction gold. Now, do we hit the gift table from the north or the east?”

Molly snorted. “If you say north one more time, I will staple a party hat to your forehead. We go in through the back gate, by the bouncy castle.”

“Bouncy castle?” Bill’s eyes widened. “Molls, you know how I get around inflatable structures. Last time…”

“I do. That’s why you’re staying upright and keeping off the moonwalk. Last I checked, Pennywise isn’t invited.”

“Can’t promise. I’ve seen some weird clowns so far.” Bill rubbed his hands together. “Remind me—what’s the haul?”

Molly pointed with her chin. “Three large boxes. Suspected LEGO, assorted plushies, and at least two remote-controlled vehicles.”

“And the gift bags?”

“Bill, we don’t stoop to gift bags. That’s for common thieves.”

“Right, right,” Bill whispered. “Apologies to the league of extraordinary larcenists.”

Molly peered at the plastic table arrayed with presents. “Okay, on my mark, we blend. You grab two party hats. I’ll handle the face paints behind the fence.”

“Wait,” Bill said, “I don’t do face paint.”

“You do today. Last time you said no, you got mistaken for the magician and I had to hide in a cupboard with two goldfish and a dove.”

He frowned. “Those goldfish had it out for me, I swear.”

“Less talking, more party hats. Once we’re in, you move left. I’ll charm Susie with the one eye and a grip like a lobster.”

Bill mumbled, “I still have scars.”

“That’s the price of the craft. Once you hit the punch bowl, distract the grownups.”

“How?” Bill furrowed his brow. “I can’t make conversation about Paw Patrol again, Molls. Last time nearly did me in.”

She grinned. “Just nod, Bill. Everyone loves a yes man at a child’s birthday. I’ll sweep the gift table, you back me up.”

“And if anyone asks why we’re there?”

Molly snapped her suspenders. “We’re distant relatives from Great Nana Edna’s triplets’ side. Everyone forgets family at a sixth birthday party.”

“You thought of everything, didn’t you?”

She smiled wickedly. “I even brought a card. ‘Happy Birthday, Timmy! Love, Bill and Molly from Wisconsin.’”

Bill blinked. “We’re from Wisconsin now?”

“Only until we clear the perimeter.”

He nodded. “After that, we vanish.”

“Like the wind, Bill. Like a slightly birthday-cake-scented wind.”

They waited in silence as Happy Birthday wailed from a tinny speaker. Then Molly grinned. “Go time.”

Bill straightened his paper hat. “Let’s steal some childhood memories.”

Molly elbowed him. “Just the presents.”

He shrugged. “Right. Strictly the presents.”
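The evaluation details for this run report 403 dialogue words out of 500. The page does not say how dialogue was detected, so what follows is only a minimal Python sketch under two stated assumptions: dialogue is exactly the text inside curly double quotes (“…”), and a word is any whitespace-separated token. The helper names `count_words` and `dialogue_stats` are hypothetical and not part of NC Bench.

```python
import re

# Assumption: dialogue is any text between curly double quotes “ and ”.
# NC Bench's actual quote handling and tokenizer are not documented on this page.
DIALOGUE_RE = re.compile(r"“(.*?)”", re.DOTALL)


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simplifying assumption)."""
    return len(text.split())


def dialogue_stats(text: str) -> tuple[int, int]:
    """Return (total_words, dialogue_words) for a passage of prose."""
    total = count_words(text)
    # findall returns the captured text inside each pair of quotes.
    dialogue = sum(count_words(span) for span in DIALOGUE_RE.findall(text))
    return total, dialogue


# The closing lines of the scene above, copied verbatim.
sample = "Molly grinned. “Go time.” Bill straightened his paper hat. “Let’s steal some childhood memories.”"
total, dialogue = dialogue_stats(sample)
print(total, dialogue)  # prints: 14 7
```

A real evaluator may disagree with this sketch on edge cases such as words joined by a dash inside a quote (counted here as one token) or unbalanced quotation marks.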
id: "gen-1744656315-Pg0YSetbFOlnOahl2Tcm"
upstream_id: "chatcmpl-BMJ5v13M7MAOk203ABFP1Um4McVes"
total_cost: 0.006078
cache_discount: (null)
provider_name: "OpenAI"
created_at: "2025-04-14T18:45:25.092182+00:00" (Monday, April 14th, 2025 at 6:45:25 PM GMT+00:00)
model: "openai/gpt-4.1-2025-04-14"
app_id: 182717
streamed: true
cancelled: false
latency: 481
moderation_latency: 116
generation_time: 8287
tokens_prompt: 68
tokens_completion: 742
native_tokens_prompt: 67
native_tokens_completion: 743
native_tokens_reasoning: 0
num_media_prompt: (null)
num_media_completion: (null)
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.006078
Evaluation details
Result | Evaluator | Details | Meta Data
100.0000% | Matches word count | n/a | neededClean: false; words: 500
0.0000% | Dialogue to Total Word Ratio | Ratio: 80.60%, Deviation: 30.60% | neededClean: false; wordsTotal: 500; wordsDialogue: 403
50.0000%
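The Details figures for the dialogue evaluator follow from its meta data by simple arithmetic: the ratio is wordsDialogue / wordsTotal = 403 / 500 = 80.60%, and the deviation is its distance from the 50% target set in the scenario, |80.60% - 50%| = 30.60%. The short Python sketch below reproduces that calculation; the function name `dialogue_ratio` is hypothetical. The page does not state the tolerance that turned a 30.60% deviation into a 0.0000% result, so no pass/fail rule is modeled here.

```python
def dialogue_ratio(words_dialogue: int, words_total: int,
                   target: float = 0.5) -> tuple[float, float]:
    """Return (ratio, deviation) where deviation is |ratio - target|.

    Hypothetical helper: mirrors the arithmetic implied by the
    'Ratio' and 'Deviation' fields, not NC Bench's own code.
    """
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target)
    return ratio, deviation


# Values taken from the evaluator's meta data for this run.
ratio, deviation = dialogue_ratio(403, 500)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints: Ratio: 80.60%, Deviation: 30.60%
```

The final 50.0000% is consistent with an unweighted mean of the two evaluator results, (100% + 0%) / 2, although the page does not confirm how results are aggregated.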