NC Bench
Model: GPT-4.1
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Run ID: 2025-04-14-01JRTST5EBM26PBAWW8SQ19B35
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“What’s the strategy, Bill? We don’t look like magicians, rabbits, or clowns, so unless you’re hiding a red nose in your trench coat—” “I told you, Molly, party entertainment’s booked solid. The best we can do is, ‘surprise guest appearance of your distant, moderately likable aunt and her weird boyfriend.’ Works. At least my boyfriend act is ace.” “You? Even pretending to date me, you’d trip over your lines. But fine. Auntie and boyfriend. Got the wigs?” “I got wigs, big sunglasses, and a badge that says ‘World’s Okayest Relative.’ Plus, an ‘I Heart Cake’ button. Commitment to the role.” “Well, hearts and buttons don’t break into gift tables. Or keep eight-year-olds from stomping your feet.” “That’s why we don’t rush. We mingle—ingratiate, compliment the cake, endure a round of musical chairs, you know.” “You can’t sit still for twelve seconds. And you hate children. If any so much as sneezes on you, you’ll faint.” “I’ll wear closed shoes. And the hat. That’ll keep my dignity.” “Is that the Peppa Pig one or the one that says ‘Over the Hill’?” “The latter. Children like irony. And I, personally, enjoy a hat with gravitas.” “I don’t think they’ll notice your gravitas when you’re hauling a sack of presents out the door.” “Molly, that’s your department. While you’re chatting up the other ‘aunties,’ I’ll sneak over to the gift table, do some sleight of hand.” “Sleight of hand? Bill, unless by ‘sleight’ you mean ‘drop the sack, knock over the punch, and run screaming,’ because that’s your usual MO.” “I’ve practiced. I watched three videos online. One said it was easy to disappear in confetti clouds.” “And where, pray tell, are you getting the confetti clouds?” “That’s what the piñata’s for. We time it perfectly, the smashing, the confetti, the sugar high, the chaos. Presents—pow!—in the bag, out the window.” “I see one flaw. Two, if we count the little girl who guards her presents like a dragon.” “I have a solution. I distract her with a harmonica recital. 
Kids love music.” “You can’t play harmonica.” “She doesn’t know that. She’s eight! I’ll tell her it’s European jazz.” “This is doomed.” “We’ve done worse.” “Not at the mercy of a cake-fuelled mob of first graders. Or helium. I still think we use the balloons—put presents inside, float them out the door.” “Molly, the law of physics—” “Don’t argue science with me. Fine. You distract, I do a sweep, we converge at the gift table, I create a diversion—maybe ask to take a family photo. While everyone says cheese, you grab the loot.” “Simple, classic, not reliant on harmonica. That’s why you’re the brains.” “Because you’re barely the fingers.” “Ouch. Just for that, you’re wearing the ‘World’s Okayest Relative’ badge.” “Fine. But next time, we knock over a grownup’s party. Or at least one with less glitter.” “Deal. Now, pass me the auntie wig. And Bill—try to look, you know, innocent.” “I invented innocent. Just not the kind that stands trial.” “No more jokes. Let’s go, before someone else pinches the cake.” “If anyone gets the cake, it’s me.” “That’s the spirit. Now hurry up, boyfriend. We’ve got presents to lift and dignity to lose.”
id: "gen-1744656209-NOjZcqy4UT31IU9f34DL"
upstream_id: "chatcmpl-BMJ4ECjwiXzMXjvOfC5a4VUyFMU33"
total_cost: 0.006342
cache_discount: (null)
provider_name: "OpenAI"
created_at: "2025-04-14T18:43:38.887169+00:00" (Monday, April 14th, 2025 at 6:43:38 PM GMT+00:00)
model: "openai/gpt-4.1-2025-04-14"
app_id: 182717
streamed: true
cancelled: false
latency: 742
moderation_latency: 105
generation_time: 7704
tokens_prompt: 68
tokens_completion: 775
native_tokens_prompt: 67
native_tokens_completion: 776
native_tokens_reasoning: 0
num_media_prompt: (null)
num_media_completion: (null)
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.006342
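As a rough cross-check of the cost fields above, the sketch below recomputes a cost from the native token counts. The per-million-token rates are assumptions and do not appear on this page; `estimate_cost` is a hypothetical helper, not part of any provider API. With these assumed rates, the result happens to equal the reported `total_cost`.

```python
# Assumed rates in USD per million tokens. These are NOT stated on this page.
ASSUMED_INPUT_RATE = 2.00
ASSUMED_OUTPUT_RATE = 8.00


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float = ASSUMED_INPUT_RATE,
                  output_rate: float = ASSUMED_OUTPUT_RATE) -> float:
    """Return an estimated cost in USD for one request (hypothetical helper)."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000


# native_tokens_prompt = 67 and native_tokens_completion = 776, from the metadata.
cost = estimate_cost(67, 776)
print(f"{cost:.6f}")  # prints 0.006342
```

The native counts (67 and 776) are used here rather than the normalized `tokens_prompt` and `tokens_completion` values (68 and 775); that choice is also an assumption about how billing is computed.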
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false, words: 541
0.0000% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 30.00% | neededClean: false, wordsTotal: 541, wordsDialogue: 541
0.0000%
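The dialogue ratio and deviation reported above can be illustrated with a minimal sketch. It assumes that dialogue means any text enclosed in double quotation marks (curly or straight), that words are whitespace-separated tokens, and that the target ratio is the 70% named in the scenario. The benchmark's actual evaluator may tokenize and score differently; `dialogue_ratio` and `deviation` are hypothetical names.

```python
import re

# Matches spans in curly double quotes or straight double quotes.
QUOTED_SPAN = re.compile(r'“[^”]*”|"[^"]*"')


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a piece of prose."""
    total_words = len(text.split())
    dialogue_words = sum(len(span.split()) for span in QUOTED_SPAN.findall(text))
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation(ratio: float, target: float = 0.70) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


sample = '“Got the wigs?” Molly asked. “I got wigs,” said Bill.'
words_dialogue, words_total, ratio = dialogue_ratio(sample)
print(words_dialogue, words_total, round(ratio, 2))  # prints 6 10 0.6
print(round(deviation(1.0), 4))  # prints 0.3
```

Under these assumptions, a response written entirely inside quotation marks, with no narration or dialogue tags, yields a ratio of 100% and a deviation of 30 percentage points from the 70% target, which is consistent with the figures in the table. How that deviation maps to the 0.0000% result is not stated on the page.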