NC Bench
Run: 2025-04-14-01JRTSZK7VFQB4W6VRR32AYNGM
Run Details
Model:
GPT-4.1
Test:
Dialogue tags
Scenario:
Write 500 words with 70% dialogue
Duration:
9.3s
Cost:
$0.005758
Provider:
OpenAI
Evaluation summary
Score | Evaluator
98.71% | Matches word count
0.00% | Dialogue to Total Word Ratio
49.3562% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly peered through the minivan’s steamed-up window. “Bill, you see that clown? The one juggling plastic bowling pins?” Bill slouched lower in his seat. “Yeah, Molly, how could I miss him? He’s literally wearing a shirt with his own face on it.” “I’m telling you, that’s our ticket in. Honestly, who’s gonna question guests wearing outsized rainbow wigs at a seven-year-old’s party?” Bill scratched behind his ear. “Last time I wore a wig, it caught on my velcro watch band and the whole damn thing went flying. I almost took out the magician’s parrot.” “Which is why we go simple. No top hats. No accessories. Just blend in. Flash some cheesy grins and poof, we’re party regulars.” “Any idea how many presents we’re talking?” Molly craned her neck. “Counting the table, the stack by the bouncy castle, and the suspiciously large box next to the ball pit…I'd say at least twenty-five. Maybe thirty if those striped bags aren’t full of socks again.” “Last job, you said, ‘Bill, there’s gold inside!’ Turned out to be edible glitter and a kazoo.” “No mistakes this time. I got the guest list memorized. All we have to do is answer to ‘Auntie Mo’ and ‘Uncle Phil.’ Easy. It’s a big family.” “What if they ask me where I work? Last time I said I was a clown dentist and the real clown’s mother grilled me for twenty minutes.” “You’re overthinking again. Just say, ‘I build miniature railroads for orphans.’ If they ask questions, start crying about your difficult past.” Bill squinted. “What about the cake table? That woman looks like she benches strollers for fun.” “Distraction,” Molly whispered, snapping her fingers. “One of us, preferably you, yells ‘Pinata time!’ Everyone rushes outside, you swing at an empty box, and I do my thing.” “So I’m the decoy, again. While you bag the loot?” “Bill, people believe you when you say weird things. I still don’t know how you convinced a roomful of five-year-olds you could talk to hamsters.” “Thanks, I guess. 
Anyway, how do we smuggle out a mountain of gift bags? They’ll notice if we waddle off stuffed full of Paw Patrol puzzles.” “Simple, we stash them under the dessert cart. I’ll distract the cake lady while you slip the presents onto the lower rack. Then we wheel it out, say we’re ‘just getting more icing.’” “We? You mean me pushing and you waving like royalty?” “Teamwork! Okay, get in character. Remember, you’re jolly. I’m mysterious. We’re not nervous adults in poorly fitting polyester.” He took a deep breath. “So, Auntie Mo and Uncle Phil, lovers of children, givers of joy, secret snatchers of plush toys.” “Exactly.” Molly straightened her wig. “Ready?” “About as ready as I’ll ever be to take on a room full of sugar-high seven-year-olds.” “Showtime.” They stepped out of the minivan, rehearsing their biggest, toothiest grins, and marched toward the party, humming “Happy Birthday” as if it might just hide their true intentions.
id
"gen-1744656387-A9CZxFPpbEo7zvkOtSey"
upstream_id
"chatcmpl-BMJ76jJU1UG28b1SPYhs2BUSTUfOU"
total_cost
0.005758
cache_discount
(null)
provider_name
"OpenAI"
created_at
"2025-04-14T18:46:37.11951+00:00"
4/14/2025, 6:46:37 PM
model
"openai/gpt-4.1-2025-04-14"
app_id
182717
streamed
true
cancelled
false
latency
707
moderation_latency
80
generation_time
7643
tokens_prompt
68
tokens_completion
702
native_tokens_prompt
67
native_tokens_completion
703
native_tokens_reasoning
0
num_media_prompt
(null)
num_media_completion
(null)
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.005758
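The reported cost can be sanity-checked against the token counts above. The sketch below is illustrative only: the per-token prices (2.00 USD per million prompt tokens, 8.00 USD per million completion tokens) are an assumption not stated on this page, generation_time is assumed to be in milliseconds, and the helper names are hypothetical.

```python
# Assumed prices in USD per token; these are NOT stated on this page.
ASSUMED_PROMPT_PRICE = 2.00 / 1_000_000
ASSUMED_COMPLETION_PRICE = 8.00 / 1_000_000


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from token counts and the assumed prices."""
    return (prompt_tokens * ASSUMED_PROMPT_PRICE
            + completion_tokens * ASSUMED_COMPLETION_PRICE)


def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Completion throughput, assuming generation_time is in milliseconds."""
    return completion_tokens / (generation_time_ms / 1000)


# Values copied from the metadata above (native token counts).
cost = estimate_cost(prompt_tokens=67, completion_tokens=703)
throughput = tokens_per_second(completion_tokens=703, generation_time_ms=7643)

print(f"estimated cost: ${cost:.6f}")
print(f"throughput: {throughput:.1f} tokens/s")
```

Under these assumptions the estimate agrees with the reported total_cost of 0.005758, which suggests billing is based on the native token counts rather than the normalized ones.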
Evaluation details
Result | Evaluator | Details | Meta Data
98.71% | Matches word count | n/a | neededClean: false, words: 494
0.00% | Dialogue to Total Word Ratio | Ratio: 87.08%, Deviation: 17.08% | neededClean: false, wordsTotal: 503, wordsDialogue: 438
49.3562% | Overall
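The ratio and deviation figures can be reproduced from the word counts and the 70% target in the scenario. The following is a minimal sketch, not the benchmark's actual evaluator: the function names are hypothetical, and how a deviation maps to the 0.00% score is not shown on this page, so that step is left out. Treating the overall score as an unweighted mean of the two evaluator scores is also an assumption, though it is consistent with the numbers reported.

```python
from statistics import mean

TARGET_DIALOGUE_RATIO = 0.70  # from the scenario: "70% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that appear inside dialogue."""
    return words_dialogue / words_total


def deviation_from_target(ratio: float, target: float = TARGET_DIALOGUE_RATIO) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Values copied from the evaluator metadata.
ratio = dialogue_ratio(words_dialogue=438, words_total=503)
deviation = deviation_from_target(ratio)

print(f"ratio: {ratio:.2%}")          # matches the reported 87.08%
print(f"deviation: {deviation:.2%}")  # matches the reported 17.08%

# Assumption: the overall score is the unweighted mean of the evaluator scores.
# With the rounded 98.71% shown on the page this gives 49.355%; the reported
# 49.3562% presumably comes from the unrounded word-count score.
overall = mean([98.71, 0.00])
print(f"overall (from rounded scores): {overall:.3f}%")
```

The response overshoots the dialogue target by about 17 percentage points, which is why it scores well on length but fails the ratio evaluator outright.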