NC Bench
Model
GPT-4.1
Test
Dialogue tags
Scenario
Write 500 words with 70% dialogue
Run ID
2025-04-14-01JRTSW88XA0PADN4QH96CHHYV
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“Okay, Bill, run it by me again. What’s our approach?”
“Simple. We enter disguised as clowns. No one suspects clowns.”
“I’m six feet tall, have a scar down my chin, and am allergic to balloons. You think I’ll pass as a clown?”
“Sure you will, Molly. Just frown less. And wear this wig.”
“This thing looks like the dog chewed on it, then spat it out.”
“Authentic. Trust me, the showbiz types call it character.”
“Bill, will you focus? What entrance are we using?”
“The side gate. Parents drop their little monsters round the front, so the side’s clear.”
“And after we enter?”
“You take the left. Distract the kids with balloon animals. I take the right, load up the duffel with the presents.”
“Bill, my balloon animals look like roadkill. You remember what happened at that zoo?”
“We don’t talk about the zoo, Molly. But these are kids—three feet tall, still think Santa delivers by Uber. They’ll buy whatever you’re selling.”
“You’re going to owe me. If a balloon bursts and I sneeze?”
“That’s what the confetti is for. Distraction. Plus you’ll be wearing a red nose.”
“Bill, the last time you said ‘it’s just a little red nose’, it glued itself to my face.”
“That was different glue! This is clown-grade.”
“Fine. Who’s our lookout?”
“I asked Uncle Jim.”
“Uncle Jim’s got one eye and dreams about the Queen chasing him with a spatula. We want someone less…distractible.”
“He’ll be fine. He’s stationed by the ice cream truck. Nobody ignores the ice cream guy.”
“What’s the signal if something goes wrong?”
“I’ll make the ‘honk-honk’ noise. Like a clown horn.”
“Subtle. Real subtle, Bill.”
“Look, we have to blend. Honking’s expected.”
“And if the parents catch on?”
“I’ll improvise. I’m workshopping a bit about flatulent giraffes.”
“Bill, I’m only agreeing because I love presents. And hate kids.”
“Exactly. So you want the wind-up dinosaur, the rubber slime, any of those dumb books?”
“No collectibles. Only the gift cards.”
“Gift cards? You think these kids get cards?”
“They have better taste than you think. Claire’s, Target, Starbucks. Yesterday, my nephew bought himself a latte.”
“Okay, so: you grab cards, I grab the noisy toys for resale. We bail over the fence.”
“What if we’re chased?”
“We have a getaway tricycle, spray-painted black, parked in the alley.”
“A tricycle? You said you’d rent a scooter!”
“Budget cuts. But that’s why I got us capes. Capes add velocity!”
“Bill, if you make me wear a cape and pedal a trike, I swear—”
“Better than being nicked by a herd of juice-box-fueled preschoolers.”
“How long is this going to take?”
“Fifteen minutes tops, unless you’re slow at twisting balloons.”
“Balloon twisting is an art. I’m a thief, not a magician!”
“Well, today you’re both. Ready, Molly?”
“Ready as I’ll ever be.”
“Let’s put the ‘party’ in ‘party heist.’”
“Bill, try not to eat all the cake this time.”
“I make no promises. Now smile—clown style.”
“You realize if we pull this off, we’re legends…”
“…with sprinkles on top.”
id
"gen-1744656278-yDukJOwoPGF6tPSWYcfW"
upstream_id
"chatcmpl-BMJ5KtzjCMhVMxsxDJjtZiSVTdUXl"
total_cost
0.006102
cache_discount
(null)
provider_name
"OpenAI"
created_at
"2025-04-14T18:44:45.734224+00:00"
Monday, April 14th, 2025 at 6:44:45 PM GMT+00:00
model
"openai/gpt-4.1-2025-04-14"
app_id
182717
streamed
true
cancelled
false
latency
379
moderation_latency
137
generation_time
5906
tokens_prompt
68
tokens_completion
745
native_tokens_prompt
67
native_tokens_completion
746
native_tokens_reasoning
0
num_media_prompt
(null)
num_media_completion
(null)
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.006102
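The total_cost and usage fields (0.006102) can be reproduced from the token counts. This rests on an assumption the page does not state: that GPT-4.1 was billed at list prices of $2.00 per million input tokens and $8.00 per million output tokens. Under that assumption, the figure matches the native counts (native_tokens_prompt 67, native_tokens_completion 746). It does not match the normalized tokens_prompt/tokens_completion values (68, 745). The sketch below is only an arithmetic check. The price constants and the helper name `request_cost` are illustrative, not part of NC Bench or any provider API.

```python
# ASSUMED GPT-4.1 list prices in USD per 1M tokens; these are NOT shown on this page.
INPUT_USD_PER_M = 2.00
OUTPUT_USD_PER_M = 8.00


def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Hypothetical helper: cost of one request at the assumed per-token prices."""
    return (prompt_tokens * INPUT_USD_PER_M
            + completion_tokens * OUTPUT_USD_PER_M) / 1_000_000


print(f"{request_cost(67, 746):.6f}")  # native token counts -> 0.006102
print(f"{request_cost(68, 745):.6f}")  # normalized counts   -> 0.006096
```

Only the native-count result agrees with total_cost. This suggests, but does not prove, that billing used the provider's own token counts.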
Evaluation details
Result | Evaluator | Details | Meta Data
75.1556% | Matches word count | n/a | neededClean: false; words: 513
0.0000% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 30.00% | neededClean: false; wordsTotal: 513; wordsDialogue: 513
Overall result (mean of the two evaluator results): 37.5778%
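The Dialogue to Total Word Ratio row reports wordsDialogue / wordsTotal = 513 / 513 = 100.00%. Its deviation is |100% - 70%| = 30 percentage points from the 70% target set in the prompt. The overall figure is (75.1556% + 0.0000%) / 2 = 37.5778%.

The sketch below reproduces that arithmetic. It assumes, hypothetically, that "dialogue" means any text between curly double quotes and that words are whitespace-separated tokens. NC Bench's actual evaluator is not shown on this page, and neither are its cleaning step (neededClean) or the function that maps a deviation to a score. The names `dialogue_stats` and `count_words` are illustrative only.

```python
import re

TARGET_DIALOGUE_RATIO = 0.70  # "70% words of dialogue" from the scenario prompt

# Hypothetical rule: dialogue is any text between curly double quotes.
DIALOGUE_SPAN = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed tokenizer)."""
    return len(text.split())


def dialogue_stats(text: str) -> dict:
    """Return total words, dialogue words, ratio, and deviation from the target."""
    words_total = count_words(text)
    words_dialogue = sum(count_words(span) for span in DIALOGUE_SPAN.findall(text))
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - TARGET_DIALOGUE_RATIO),
    }


# Reproduce the reported figures from the page's own counts (513 of 513 words).
ratio = 513 / 513
print(f"Ratio: {ratio:.2%}, Deviation: {abs(ratio - TARGET_DIALOGUE_RATIO):.2%}")
# -> Ratio: 100.00%, Deviation: 30.00%
print(f"{(75.1556 + 0.0) / 2:.4f}%")  # mean of the two evaluator results -> 37.5778%
```

The response contains no narration outside the quotes, so every word counts as dialogue. The 30-point overshoot is the deviation the evaluator penalized.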