NC Bench
Run: 2024-07-19-01J35SH5TXCY69CHP0NDN40454
Run Details
Model: GPT-4o, May 13th (temp=1)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 6.4s
Cost: $0.005270
Provider: OpenAI
Evaluation summary
Matches word count: 9.61%
Dialogue to Total Word Ratio: 36.79%
Overall: 23.1980%
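The Overall figure appears consistent with an unweighted mean of the two evaluator scores. The page does not state the aggregation rule, so the sketch below is an assumption checked only against the displayed numbers. The small gap between 23.20% and the reported 23.1980% would be explained by the page rounding each evaluator score to two decimals before display.

```python
# Scores as displayed on the page, in percent (already rounded to two decimals).
scores = {
    "Matches word count": 9.61,
    "Dialogue to Total Word Ratio": 36.79,
}

# Assumption: the overall score is the unweighted mean of the evaluator scores.
# The page does not document this; it is inferred from the numbers shown.
overall = sum(scores.values()) / len(scores)

print(f"{overall:.2f}%")  # 23.20%, close to the reported 23.1980%
```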
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
"Bill, have you got the map?" "Of course, Molly! Who do you think I am? A rookie?" "Just checking. So, the party starts at 2." "And we sneak in at 4, right? Right in the middle of the cake chaos." "Exactly. Now, remember, the clown is our distraction. Kids love clowns." "Yeah, but Jack as a clown? That's a stretch." "Hey, the nose is big enough. Focus, Bill." "Okay, okay. What's next?" "Once the parents are busy with the cake, you grab the presents." "And you?" "I'll handle the piñata situation. Can't have kids whacking me while I'm bagging loot." "Good point. But what if someone sees us?" "That's why we’re dressing as superheroes, Bill. Who’d suspect a superhero of stealing?" "But Molly, how do we escape with a giant sack of presents?" "That's where the 'tragic birthday candle accident' comes in." "Oh! Right, the smoke! Brilliant." "See, I think of everything. Now, the exit?" "Through the back gate. Easy peasy." "And remember, no taking candy from the goody bags. We’re professionals." "Got it. No candy. But can I keep a balloon?" "Only if it’s not shaped like a dog." "Aw, you’re no fun." "Just focus, Bill. We're in for the haul of a lifetime." "Roger that, Molly. Let’s make these kids cry." "No, let's make these kids… share." “Works for me!"
Meta
id: "gen-1gdxYOByi5bQOgbQVPMirH26sOxn"
total_cost: 0.00527
upstream_id: "chatcmpl-9mjlVvU61lga0gMkSzhCwNsv73xUq"
created_at: "2024-07-19T15:24:59.981165+00:00" (7/19/2024, 3:24:59 PM)
model: "openai/gpt-4o-2024-05-13"
app_id: 182717
streamed: false
cancelled: false
provider_name: "OpenAI"
latency: 6013
moderation_latency: 259
generation_time: 0
finish_reason: "stop"
tokens_prompt: 68
tokens_completion: 332
native_tokens_prompt: 67
native_tokens_completion: 329
num_media_prompt: (null)
num_media_completion: (null)
origin: "https://ncbench.com/"
usage: 0.00527
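The reported cost can be reproduced from the native token counts. The per-token prices below are an assumption, since the page does not list them: $5 per million prompt tokens and $15 per million completion tokens. The function name is also illustrative. With those rates, 67 prompt tokens and 329 completion tokens come to the $0.005270 shown in the run details.

```python
# Assumed prices in USD per 1M tokens. These are NOT stated on the page;
# they are chosen because they reproduce the reported total exactly.
PRICE_PER_M_PROMPT = 5.00
PRICE_PER_M_COMPLETION = 15.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from native token counts."""
    micro_usd = (prompt_tokens * PRICE_PER_M_PROMPT
                 + completion_tokens * PRICE_PER_M_COMPLETION)
    return micro_usd / 1_000_000


# native_tokens_prompt and native_tokens_completion from the metadata.
cost = estimate_cost(67, 329)
print(f"${cost:.6f}")  # $0.005270
```

The native counts (67 and 329) are used rather than tokens_prompt and tokens_completion (68 and 332), because only the native counts match the reported total under these assumed rates.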
Evaluation details
Matches word count
Result: 9.61%
Details: n/a
Meta Data: neededClean: false, words: 222
Dialogue to Total Word Ratio
Result: 36.79%
Details: Ratio: 100.00%, Deviation: 10.00%
Meta Data: neededClean: false, wordsTotal: 222, wordsDialogue: 222
Overall: 23.1980%
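The Ratio and Deviation values follow from the word counts: 222 dialogue words out of 222 total gives a ratio of 100%, which is 10 percentage points away from the 90% target in the scenario. A minimal sketch of that measurement is shown here. The tokenization (whitespace splitting, double-quoted spans counted as dialogue) and the function name are assumptions, not NC Bench's actual evaluator, and the mapping from deviation to the final score is not shown on the page and is left out.

```python
import re


def dialogue_stats(text: str, target_ratio: float = 0.90):
    """Return (total_words, dialogue_words, ratio, deviation) for a text.

    Assumption: words are whitespace-separated tokens, and dialogue is
    anything between a pair of double quotes (straight or curly).
    """
    # Treat curly double quotes the same as straight ones.
    normalized = text.replace("\u201c", '"').replace("\u201d", '"')
    total = len(normalized.split())
    spoken = re.findall(r'"([^"]*)"', normalized)
    dialogue = sum(len(span.split()) for span in spoken)
    ratio = dialogue / total if total else 0.0
    return total, dialogue, ratio, abs(ratio - target_ratio)


# A short mixed sample: 5 of 9 words are inside quotes.
sample = '"Got the map?" asked Molly. "Of course," said Bill.'
total, dialogue, ratio, deviation = dialogue_stats(sample)
print(total, dialogue, f"{ratio:.2%}", f"{deviation:.2%}")

# The figures reported on the page: 222 dialogue words of 222 total.
page_ratio = 222 / 222
page_deviation = abs(page_ratio - 0.90)
print(f"Ratio: {page_ratio:.2%}, Deviation: {page_deviation:.2%}")
```

A response made up entirely of quoted speech, like the one in this run, overshoots the 90% target, which is why a 100% dialogue ratio still scores well below full marks.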