NC Bench
Model: GPT-4.1
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Run ID: 2025-04-14-01JRTSR8WN0R8917FXF9C264B7
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Bill, do you still have the clown costume? The shoes are missing, but yes. Missing shoes? We can’t rob five-year-olds barefoot. Not barefoot, Molly, just own shoes. Kids won’t care. Fine, but stay in character. No sad clowns. When have I ever been a sad clown? Your birthday, last year. Never mind. Do you have the list? List of what? Demands? No! The presents, Bill! Relax, I have photographic memory. Frozen Legos, Paw Patrol plushies, remote control drones. Don’t forget the slime-making kit. Jenny’s allergic to glitter, right? Good catch, skip the glitter slime. You’re sure you can fit under the bouncy castle? Bill, I slid under a cop car in Milan. I can fit beneath a castle made of plastic. No snacking on cake during the job. Don’t tempt me. Remember, distraction first. I’ll drop the balloon animal explosion. Six dozen balloons, all at once? Genius. Thank you. While they’re screaming over rubber octopuses, you grab the gifts by the Pin the Tail station. What if a toddler attacks? Deploy emergency lollipop, as always. And the exit? Scooter behind the pony pen. Quick getaway, no animal cruelty. You thought of everything. Molly, it’s beautiful. Focus, Bill. This is our most dangerous job yet. Let’s go rob some children. With style.
id: "gen-1744656147-2kiPImjmaYlYQuYyFhaF"
upstream_id: "chatcmpl-BMJ3DSW3u7kiQwTkM8RyZ10k2Ko7E"
total_cost: 0.002518
cache_discount: (null)
provider_name: "OpenAI"
created_at: "2025-04-14T18:42:33.12932+00:00" (Monday, April 14th, 2025 at 6:42:33 PM GMT+00:00)
model: "openai/gpt-4.1-2025-04-14"
app_id: 182717
streamed: true
cancelled: false
latency: 352
moderation_latency: 89
generation_time: 4317
tokens_prompt: 68
tokens_completion: 297
native_tokens_prompt: 67
native_tokens_completion: 298
native_tokens_reasoning: 0
num_media_prompt: (null)
num_media_completion: (null)
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.002518
Evaluation details
Evaluator: Matches word count
Result: 68.1022%
Details: n/a
Meta Data: neededClean: false, words: 214
Evaluator: Dialogue to Total Word Ratio
Result: 0.0000%
Details: Ratio: 0.00%, Deviation: 90.00%
Meta Data: neededClean: false, wordsTotal: 214, wordsDialogue: 0
34.0511%
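The dialogue evaluator reports wordsDialogue: 0 even though the response reads as almost pure dialogue. One plausible explanation is that the response contains no quotation marks, so a quote-based counter finds nothing to count. The sketch below illustrates that behaviour. It is not NC Bench's actual evaluator: the function names, the quote-matching rule, and the use of an absolute difference from the 90% target are all assumptions made for illustration. The final 34.0511% is equal to the mean of the two evaluator results (68.1022% and 0.0000%).

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a scene.

    Assumption: "dialogue" means any text enclosed in straight or curly
    double quotes. The real evaluator's rule is not documented here.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'["“]([^"”]*)["”]', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio: float, target: float = 0.90) -> float:
    """Absolute distance between the measured ratio and the target (hypothetical)."""
    return abs(ratio - target)


# Unquoted lines, as in the recorded response: nothing is counted as dialogue.
unquoted = "Bill, do you still have the clown costume? The shoes are missing, but yes."
print(dialogue_ratio(unquoted))  # (0, 14, 0.0)

# The same opening line with quotes and a dialogue tag is counted.
quoted = '"Bill, do you still have the clown costume?" Molly asked.'
print(dialogue_ratio(quoted))
```

Under these assumptions, a ratio of 0.00 against a 0.90 target gives a deviation of 0.90, which matches the "Ratio: 0.00%, Deviation: 90.00%" detail recorded above.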