NC Bench
Model
GPT-4o, May 13th (temp=1)
Test
Dialogue tags
Scenario
Write 200 words with 90% dialogue
Run ID
2024-07-19-01J35SH5TXCY69CHP0NDN40454
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Bill, have you got the map?" "Of course, Molly! Who do you think I am? A rookie?" "Just checking. So, the party starts at 2." "And we sneak in at 4, right? Right in the middle of the cake chaos." "Exactly. Now, remember, the clown is our distraction. Kids love clowns." "Yeah, but Jack as a clown? That's a stretch." "Hey, the nose is big enough. Focus, Bill." "Okay, okay. What's next?" "Once the parents are busy with the cake, you grab the presents." "And you?" "I'll handle the piñata situation. Can't have kids whacking me while I'm bagging loot." "Good point. But what if someone sees us?" "That's why we’re dressing as superheroes, Bill. Who’d suspect a superhero of stealing?" "But Molly, how do we escape with a giant sack of presents?" "That's where the 'tragic birthday candle accident' comes in." "Oh! Right, the smoke! Brilliant." "See, I think of everything. Now, the exit?" "Through the back gate. Easy peasy." "And remember, no taking candy from the goody bags. We’re professionals." "Got it. No candy. But can I keep a balloon?" "Only if it’s not shaped like a dog." "Aw, you’re no fun." "Just focus, Bill. We're in for the haul of a lifetime." "Roger that, Molly. Let’s make these kids cry." "No, let's make these kids… share." “Works for me!"
id
"gen-1gdxYOByi5bQOgbQVPMirH26sOxn"
total_cost
0.00527
upstream_id
"chatcmpl-9mjlVvU61lga0gMkSzhCwNsv73xUq"
created_at
"2024-07-19T15:24:59.981165+00:00"
Friday, July 19th, 2024 at 3:24:59 PM GMT+00:00
model
"openai/gpt-4o-2024-05-13"
app_id
182717
streamed
false
cancelled
false
provider_name
"OpenAI"
latency
6013
moderation_latency
259
generation_time
0
finish_reason
"stop"
tokens_prompt
68
tokens_completion
332
native_tokens_prompt
67
native_tokens_completion
329
num_media_prompt
(null)
num_media_completion
(null)
origin
"https://ncbench.com/"
usage
0.00527
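As a sanity check on the metadata above, the reported total_cost can be reproduced from the native token counts. This is a minimal sketch: the per-token prices are an assumption (USD 5 per million prompt tokens and USD 15 per million completion tokens, the commonly cited list prices for gpt-4o-2024-05-13) and do not appear on this page, and `estimate_cost` is a hypothetical helper, not part of any API.

```python
# Assumed list prices in USD per one million tokens (not stated on this page).
PRICE_PER_M_PROMPT = 5.00
PRICE_PER_M_COMPLETION = 15.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from native token counts."""
    micro_dollars = (
        prompt_tokens * PRICE_PER_M_PROMPT
        + completion_tokens * PRICE_PER_M_COMPLETION
    )
    return round(micro_dollars / 1_000_000, 5)


# native_tokens_prompt = 67, native_tokens_completion = 329 (from the metadata)
estimated = estimate_cost(67, 329)
print(estimated)
```

Under those assumed prices the estimate agrees with the reported total_cost of 0.00527, which suggests billing is based on the native token counts rather than the normalized ones (68 and 332).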
Evaluation details
Result | Evaluator | Details | Meta Data
9.6081% | Matches word count | n/a | neededClean: false; words: 222
36.7879% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false; wordsTotal: 222; wordsDialogue: 222
23.1980%
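The dialogue figures reported here (Ratio: 100.00%, Deviation: 10.00% against the 90% target) can be illustrated with a short sketch. The evaluator's real implementation is not shown on this page, so the quote-splitting approach and the names `dialogue_word_counts` and `deviation_points` are assumptions made for illustration only.

```python
def dialogue_word_counts(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words), treating quoted text as dialogue."""
    # Normalize curly double quotes so mixed quote styles pair up.
    normalized = text.replace("\u201c", '"').replace("\u201d", '"')
    segments = normalized.split('"')
    # After splitting on quotes, odd-indexed segments sit inside quotation marks.
    dialogue = sum(len(segment.split()) for segment in segments[1::2])
    total = len(normalized.replace('"', " ").split())
    return dialogue, total


def deviation_points(ratio: float, target: float) -> float:
    """Absolute gap between the measured ratio and the target, in percentage points."""
    return abs(ratio - target) * 100


sample = '"Have you got the map?" asked Molly. "Of course!"'
dialogue, total = dialogue_word_counts(sample)
print(dialogue, total)

# Figures reported for this run: wordsDialogue = 222, wordsTotal = 222, target 90%.
run_ratio = 222 / 222
run_deviation = deviation_points(run_ratio, 0.90)

# The unlabeled final figure matches the plain mean of the two evaluator results.
mean_result = round((9.6081 + 36.7879) / 2, 4)
print(round(run_deviation, 2), mean_result)
```

With every word inside quotation marks, the ratio is 100% and the gap to the 90% target is 10 percentage points, which agrees with the Details column. The final 23.1980% equals the mean of 9.6081% and 36.7879%; how each evaluator maps its deviation to a percentage is not stated on the page.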