NC Bench
Model
GPT-4.1
Test
Dialogue tags
Scenario
Write 500 words with 70% dialogue
Run ID
2025-04-14-01JRTSY1CNKAWXXQ9WME9S1DV2
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Alright, Bill, go over it again. How are we getting past the clown?" "Molly, for the last time, the clown’s on stilts. We wait till he’s handing out balloon swords, then slip by under his stilty legs. Easy." "Under the stilts? I’m five-foot-nine, Bill." "You crouch. You’ve crouched before! Remember the wedding cake caper?" "That ended with you covered in marzipan. Also, these are children. They are notoriously sticky." "That’s why we’re wearing gloves." "Purple gloves?" "They were on sale, Molly." "They say 'I heart Grandma.'" "They were VERY on sale." "Fine. So we’re through the clown’s legs, into the backyard. What’s next?" "We avoid the face-painting station. Last time you got a unicorn on your cheek and lost all professionalism." "I looked majestic. But focus, Bill, the presents. Location?" "Under the pergola, by a very aggressive bouncy castle. I scoped it out while pretending to be the magician’s assistant. Did you know little Timmy throws hard with those plastic apples?" "I know, because he threw one at my head when I was juggling fake jewels. But what about the cake watch?" "Cake watch?" "You know, the one parent who hovers by the cake like a security guard because 'Susie can’t have chocolate.'" "Ah, the Cake Warden. That’s Mrs. Blodgett. She’s got a whistle and everything." "Distract her with gluten. Wave a breadstick—tell her it’s wheatless." "I’m not sure I can confidently say ‘wheatless’ with a straight face." "I’ll do it. You snatch the presents." "How many presents are we talking?" "Twenty-six. Six are obviously socks, so we skip those." "Socks are easy to fence!" "Socks are never easy to fence, Bill! You tried last Easter." "Moving on. Presents into the bag—purple, matches the gloves. We exit how?" "Same way we came, or…bouncy castle jump. Over the fence." "Last time you bounced, you sprained an ankle." "Last time, the castle wasn’t dinosaur-shaped. Dinosaurs are sturdier." "You’re inventing that fact, Molly." "I stand by it. Alright, backup plan. 
If caught?" "We pretend we're the birthday fairy and her assistant, Sir Wrapping Paper. I’ve got a hat." "Bill, that hat says 'Retiree of the Year.'" "Again, very on sale! It’s festive. Look, if all else fails, we yell 'Parrot emergency!' and run for it." "How many parrots do you think kids have, Bill?" "It’s the element of surprise. No one expects yelling about parrots." "You know what? Good enough. So—gloves, bag, gluten stick, hat. Bouncy dinosaur. Parrot emergency. Ready?" "Ready. Molly?" "Yeah?" "If you see a kid named Betsy, duck. She bites." "Understood. Clown, pergola, cake, presents. Operation Party Swipe is a go?" "Go! And Molly?" "Yes, Bill?" "Try not to eat any cake this time." "No promises, Bill. Absolutely none."
id
"gen-1744656336-jrI4N0vk2ZTpcJaxfS8v"
upstream_id
"chatcmpl-BMJ6H8I9rOaXq8axHSJunXVDefLIv"
total_cost
0.00567
cache_discount
(null)
provider_name
"OpenAI"
created_at
"2025-04-14T18:45:52.086723+00:00"
model
"openai/gpt-4.1-2025-04-14"
app_id
182717
streamed
true
cancelled
false
latency
536
moderation_latency
229
generation_time
13821
tokens_prompt
68
tokens_completion
692
native_tokens_prompt
67
native_tokens_completion
692
native_tokens_reasoning
0
num_media_prompt
(null)
num_media_completion
(null)
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00567
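As a rough sanity check on the metadata above, the reported cost and generation time can be related to the token counts. The sketch below is illustrative only: the per-million-token prices are an assumption (they are not stated on this page), and `generation_time` is assumed to be in milliseconds. The helper names `estimate_cost` and `tokens_per_second` are hypothetical.

```python
# Values copied from the run metadata above.
NATIVE_TOKENS_PROMPT = 67
NATIVE_TOKENS_COMPLETION = 692
GENERATION_TIME_MS = 13821  # ASSUMPTION: the unit is milliseconds
TOTAL_COST_USD = 0.00567

# ASSUMPTION: USD prices per one million tokens; not given on this page.
PRICE_IN_PER_M = 2.00
PRICE_OUT_PER_M = 8.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from native token counts."""
    return (prompt_tokens * PRICE_IN_PER_M
            + completion_tokens * PRICE_OUT_PER_M) / 1_000_000


def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Average completion throughput over the generation window."""
    return completion_tokens / (generation_time_ms / 1000)


if __name__ == "__main__":
    cost = estimate_cost(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)
    rate = tokens_per_second(NATIVE_TOKENS_COMPLETION, GENERATION_TIME_MS)
    print(f"estimated cost: ${cost:.5f} (reported: ${TOTAL_COST_USD:.5f})")
    print(f"throughput: {rate:.1f} tokens/s")
```

Under these assumed prices the estimate agrees with the reported `total_cost`, and the throughput works out to roughly 50 completion tokens per second.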
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false; words: 457
0.0000% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 30.00% | neededClean: false; wordsTotal: 457; wordsDialogue: 457
0.0000%
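The dialogue-ratio figures reported above (ratio 100.00%, deviation 30.00%, against the 70% target in the scenario) are consistent with counting the words inside quotation marks, dividing by the total word count, and taking the absolute difference from the target. The following is a minimal sketch of that idea, not the benchmark's actual evaluator: the function name `dialogue_stats`, the whitespace-based word splitting, and the quote-matching rule are all assumptions.

```python
import re

# Matches text inside straight or curly double quotes (ASSUMPTION: this is
# what counts as dialogue; the real evaluator's rule is not shown on the page).
QUOTED = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def dialogue_stats(text: str, target_ratio: float = 0.70) -> dict:
    """Return word counts, the dialogue ratio, and its deviation from target."""
    words_total = len(text.split())
    words_dialogue = sum(
        len((straight or curly).split())
        for straight, curly in QUOTED.findall(text)
    )
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_ratio),
    }


if __name__ == "__main__":
    mixed = 'Molly checked the map. "We go in at noon," she said. "Bring the bag."'
    all_dialogue = '"Ready?" "Ready."'
    for sample in (mixed, all_dialogue):
        stats = dialogue_stats(sample)
        print(f"{stats['wordsDialogue']}/{stats['wordsTotal']} words, "
              f"ratio {stats['ratio']:.2%}, deviation {stats['deviation']:.2%}")
```

A response made up entirely of quoted speech, like the one in this run, gives a ratio of 100% and therefore a 30-point deviation from the 70% target under this reading.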