Run: 2024-07-19-01J35SH5TXCY69CHP0NDN40454

Run Details

Model:: GPT-4o, May 13th (temp=1)
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 6.4s
Cost:: $0.005270
Provider:: OpenAI

Evaluation summary

Score	Evaluator
9.61%	Matches word count
36.79%	Dialogue to Total Word Ratio
23.1980%	Overall
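
The Overall figure appears to be the unweighted mean of the two evaluator scores. The run record does not state the aggregation rule, so treat this as an assumption; `overall_score` is a hypothetical helper name, not something taken from the benchmark.

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores in percent (assumed aggregation rule)."""
    return sum(scores) / len(scores)


# Evaluator scores as displayed in the summary, already rounded to two decimals.
displayed = [9.61, 36.79]
print(f"{overall_score(displayed):.2f}%")
```

With the rounded inputs this gives 23.20%, which agrees with the reported 23.1980% to within display rounding; the small gap suggests the benchmark averages the unrounded scores.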

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

"Bill, have you got the map?"

"Of course, Molly! Who do you think I am? A rookie?"

"Just checking. So, the party starts at 2."

"And we sneak in at 4, right? Right in the middle of the cake chaos."

"Exactly. Now, remember, the clown is our distraction. Kids love clowns."

"Yeah, but Jack as a clown? That's a stretch."

"Hey, the nose is big enough. Focus, Bill."

"Okay, okay. What's next?"

"Once the parents are busy with the cake, you grab the presents."

"And you?"

"I'll handle the piñata situation. Can't have kids whacking me while I'm bagging loot."

"Good point. But what if someone sees us?"

"That's why we’re dressing as superheroes, Bill. Who’d suspect a superhero of stealing?"

"But Molly, how do we escape with a giant sack of presents?"

"That's where the 'tragic birthday candle accident' comes in."

"Oh! Right, the smoke! Brilliant."

"See, I think of everything. Now, the exit?"

"Through the back gate. Easy peasy."

"And remember, no taking candy from the goody bags. We’re professionals."

"Got it. No candy. But can I keep a balloon?"

"Only if it’s not shaped like a dog."

"Aw, you’re no fun."

"Just focus, Bill. We're in for the haul of a lifetime."

"Roger that, Molly. Let’s make these kids cry."

"No, let's make these kids… share."

“Works for me!"

Evaluation details

Result	Evaluator	Details	Meta Data
9.61%	Matches word count	n/a	neededClean: false; words: 222
36.79%	Dialogue to Total Word Ratio	Ratio: 100.00%, Deviation: 10.00%	neededClean: false; wordsTotal: 222; wordsDialogue: 222
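
The word and dialogue counts reported above can be approximated with a short sketch. Everything here is an assumption about how such evaluators might work: the whitespace tokenization, the quote-matching rule, and the function names are illustrative and are not taken from the benchmark. How the benchmark maps a deviation to a percentage score is not stated in the record and is not modelled.

```python
import re

# Straight and curly double quotes all count as dialogue delimiters,
# because the model response mixes both styles.
QUOTE_CHARS = '"\u201c\u201d'
DIALOGUE_SPAN = re.compile(f"[{QUOTE_CHARS}]([^{QUOTE_CHARS}]*)[{QUOTE_CHARS}]")


def count_words(text):
    """Count whitespace-separated tokens (assumed tokenization)."""
    return len(text.split())


def dialogue_words(text):
    """Count the words that fall inside double-quoted spans."""
    return sum(count_words(span) for span in DIALOGUE_SPAN.findall(text))


def dialogue_ratio(text):
    """Fraction of all words that are dialogue; 0.0 for empty text."""
    total = count_words(text)
    return dialogue_words(text) / total if total else 0.0


# Figures taken from the run record: 222 words in total, 222 of them dialogue.
words_total, words_dialogue = 222, 222
ratio = words_dialogue / words_total
print(f"Ratio: {ratio:.2%}, Deviation: {abs(ratio - 0.90):.2%}")

# Word-count overshoot against the 200-word target from the scenario.
print(f"Overshoot: {words_total - 200} words ({(words_total - 200) / 200:.2%})")
```

The reported deviation of 10.00% matches the absolute gap between the measured ratio (100%) and the 90% target, rather than a relative one. The response is also 22 words over the 200-word target.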

23.1980%

id	"gen-1gdxYOByi5bQOgbQVPMirH26sOxn"
total_cost	0.00527
upstream_id	"chatcmpl-9mjlVvU61lga0gMkSzhCwNsv73xUq"
created_at	"2024-07-19T15:24:59.981165+00:00"
model	"openai/gpt-4o-2024-05-13"
app_id	182717
streamed	false
cancelled	false
provider_name	"OpenAI"
latency	6013
moderation_latency	259
generation_time	0
finish_reason	"stop"
tokens_prompt	68
tokens_completion	332
native_tokens_prompt	67
native_tokens_completion	329
num_media_prompt	(null)
num_media_completion	(null)
origin	"https://ncbench.com/"
usage	0.00527
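
The total_cost value is consistent with per-token billing on the native token counts. The rates in this sketch (5 USD per million prompt tokens, 15 USD per million completion tokens) are an assumption based on OpenAI's list pricing for gpt-4o-2024-05-13 at the time of the run; they do not appear in the run record, and `request_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Assumed list prices in USD per token (not stated in the run record).
PROMPT_RATE = Decimal("5") / Decimal("1000000")
COMPLETION_RATE = Decimal("15") / Decimal("1000000")


def request_cost(prompt_tokens, completion_tokens):
    """Cost in USD for one request under the assumed per-token rates."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE


# native_tokens_prompt and native_tokens_completion from the metadata above.
print(request_cost(67, 329))
```

Under these assumed rates the native counts (67 and 329) reproduce the reported 0.00527, whereas the normalized counts (68 and 332) do not, which suggests billing is based on the provider's native tokenization.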