NC Bench
Model
GPT-4.1
Test
Dialogue tags
Scenario
Write 500 words with 30% dialogue
Run ID
2025-04-14-01JRTSVX8ZHD4106Y32847M7MX
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge, squinting through leaves at the bouncing castle festooned in more shades of pink than she’d thought possible. “Look at these amateurs, Bill. Balloons tied to the mailbox. Have they no situational awareness?” Bill, burly and dressed all in black, tried to hunch down too, but the wild array of glitter stuck to his knees told anyone looking exactly where he had been. “This is a birthday party, Molls, not Fort Knox. Kids just want cake and chaos.” “I want presents.” She licked her lips, eyeing the growing pile beside a papier-mâché unicorn. “You see that haul?” He grinned. “Why d’you think I wore my extra-big duffel?” He patted it with a gusty sigh. “Alright, so what’s the approach? You know I’m useless with children.” “You said you once hid in a ball pit all night.” “Terrible experience. Still get nightmares about static shocks.” Molly rolled her eyes. “Simple plan. We wait until the musicians start ‘Baby Shark.’ That’ll draw every ankle-biter to the dance floor. We slip in by the side, wearing hats. Holiday hats—balloons, streamers, the works. Look like we belong. You grab the big boxes, I’ll handle the small ones. In and out.” “Holiday hats?” Bill reached for his own beanie. “You do realize I don’t fit inside a bouncy castle. Or a magician’s hat. You want a distraction?” “I want precision. If you make a scene, you’ll get frosting on my hoodie again.” “Once! That happened once. It washes out.” He allowed, “What if someone asks who we are? Adults tend to recognize strangers at shindigs, Molly.” She flashed a toothy smile. “We’re professional face painters. Every party needs face painters.” “I can’t draw.” “Just draw anything,” she said, “they’re five years old. Give them a mustache and they’ll think it’s a rainbow.” Bill grunted, unconvinced. “Okay, but if someone requests a tiger, I’m painting a potato.” “Potato it is.” She unzipped a tote bag and produced two garish vests bedazzled with plastic jewels. “Uniforms. 
For authenticity.” “Oh, good, we’ll blend right in with the circus.” She smacked his arm. “It’s called *commitment*. Now, on my three. Ready your best clown smile.” Bill bared his teeth in what might have been a friendly grimace. “One toothy grin, coming right up.” They waited, breathless, as inside, the ‘Baby Shark’ theme began its warbling call. Instantly, the children stampeded toward the makeshift dance floor. Molly rose, gathering her supplies, and jabbed at Bill’s arm. “Go. And remember: face painters do not run.” He waddled forward, clutching the duffel. “If I get glitter in my beard again, you’re buying the shampoo.” Inside, chaos reigned. Parents gossiped by the sandwiches. Children spun, shrieking to the music. Molly beamed at a suspicious mother. “Face painting! Would your lovely child like to be...um...a dragon?” The mother blinked. “Didn’t hire you.” Molly winked. “We’re freelance. First-come, first-served.” She heard Bill hiss, “Grab the gifts, Molly! I’m running out of small talk!” “Making magic, Bill! Blend in, smile big. And next time—learn to paint a tiger!”
id
"gen-1744656267-MEBWSfeOsMS4nX40dKtM"
upstream_id
"chatcmpl-BMJ59WjXjz8GlqNDbelqB846K7mQD"
total_cost
0.006254
cache_discount
(null)
provider_name
"OpenAI"
created_at
"2025-04-14T18:44:35.238875+00:00"
Monday, April 14th, 2025 at 6:44:35 PM GMT+00:00
model
"openai/gpt-4.1-2025-04-14"
app_id
182717
streamed
true
cancelled
false
latency
331
moderation_latency
105
generation_time
7412
tokens_prompt
68
tokens_completion
764
native_tokens_prompt
67
native_tokens_completion
765
native_tokens_reasoning
0
num_media_prompt
(null)
num_media_completion
(null)
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.006254
Evaluation details
| Result | Evaluator | Details | Meta Data |
| --- | --- | --- | --- |
| 81.2727% | Matches word count | n/a | neededClean: false; words: 512 |
| 0.0000% | Dialogue to Total Word Ratio | Ratio: 63.67%, Deviation: 33.67% | neededClean: false; wordsTotal: 512; wordsDialogue: 326 |
40.6364%
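The dialogue figures reported above follow from simple arithmetic: 326 dialogue words out of 512 total gives a ratio of 63.67%, which is 33.67 percentage points away from the 30% the prompt asked for. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It is an illustration only: the page does not show how the evaluator tokenizes text or detects dialogue, so the whitespace splitting, the quote-matching regex, and the function names `ratio_and_deviation` and `count_dialogue_words` are all assumptions.

```python
import re


def ratio_and_deviation(words_dialogue, words_total, target=0.30):
    """Return (ratio %, deviation %) rounded to two decimals.

    Assumption: deviation is the absolute difference between the
    measured ratio and the target, in percentage points.
    """
    ratio = words_dialogue / words_total if words_total else 0.0
    return round(ratio * 100, 2), round(abs(ratio - target) * 100, 2)


def count_dialogue_words(text):
    """Count total words and words inside quotation marks.

    Assumption: words are whitespace-separated and dialogue is any
    text between a pair of curly or straight double quotes.
    """
    words_total = len(text.split())
    spoken = re.findall(r'[“"]([^“”"]*)[”"]', text)
    words_dialogue = sum(len(segment.split()) for segment in spoken)
    return words_total, words_dialogue


# Figures taken from the evaluation table above.
reported = ratio_and_deviation(326, 512)

# A tiny hypothetical sample, not taken from the model response.
sample = 'Molly sighed. “Grab the gifts, Bill!” He nodded.'
sample_total, sample_dialogue = count_dialogue_words(sample)
```

Applied to the reported counts, `ratio_and_deviation(326, 512)` matches the Ratio and Deviation values in the Details column. Running `count_dialogue_words` on the full response may not give exactly 326, since the real evaluator's counting rules are not documented here.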