Run: 2024-07-19-01J35SN8N473JWCMWSR03EQPV0

Run Details

Model:: GPT-4o, May 13th (temp=1)
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 12.3s
Cost:: $0.004910
Provider:: OpenAI

Evaluation summary

Score	Evaluator
100.00%	Matches word count
36.79%	Dialogue to Total Word Ratio
68.3935%	Overall
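
The Overall figure is numerically close to the unweighted mean of the two evaluator scores. The report does not state the aggregation rule, so the sketch below only assumes a plain average; the function name `overall_score` is hypothetical.

```python
from statistics import mean


def overall_score(scores: list[float]) -> float:
    """ASSUMPTION: overall = unweighted mean of the evaluator scores (percent)."""
    return mean(scores)


# Scores as displayed in the summary table (already rounded to two decimals).
displayed = [100.00, 36.79]
print(f"{overall_score(displayed):.4f}%")
```

With the displayed values this gives roughly 68.395%, within about 0.002 percentage points of the reported 68.3935%. The small gap is plausibly due to rounding of the displayed scores, though the report does not confirm this.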

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

"Alright, Bill, what's the plan?"

"Simple, Molly. We slip in with the entertainers."

"Clowns or magicians?"

"Neither, Molly. Balloon artists."

"Balloon artists? Really? We're going to wear squeaky shoes?"

"It's genius, Molly. No one suspects balloon artists."

"Fine. How do we get in?"

"Easy. I rented us a real gig. We show up, make a few dogs, maybe an octopus."

"An octopus? Have you ever made a balloon octopus?"

"How hard can it be? It's balloons!"

"Right, and then what?"

"Then, the kids are distracted, and we go for the presents."

"And how do we carry them out, Bill? With balloon animals in hand?"

"Pillowcases. We act like Santa Claus but with more style."

"Makes sense. What about the parents?"

"I’ve got that covered. We hand them some ‘special’ balloons filled with... helium."

"Helium? You want the parents to sound like chipmunks?"

"Exactly, they laugh, we swipe."

"I can’t believe this, Bill. You actually think it’ll work?"

"Have some faith, Molly. What's the worst that can happen?"

"We get caught, and end up performing at prison talent shows."

"Okay, true, but it’s a foolproof plan."

"You're a madman, Bill. But I'm in."

"Good. Let’s practice those balloon dogs."

"And the octopus."

Evaluation details

Result	Evaluator	Details	Meta Data
100.00%	Matches word count	n/a	neededClean: false; words: 201
36.79%	Dialogue to Total Word Ratio	Ratio: 100.00%, Deviation: 10.00%	neededClean: false; wordsTotal: 201; wordsDialogue: 201
68.3935%	Overall

id	"gen-ZAcFMjIVJguc4BlnGqgbpL3FwArq"
total_cost	0.00491
upstream_id	"chatcmpl-9mjnf8ROwVvaWeR4jXG8nKYkvWOPa"
created_at	"2024-07-19T15:27:19.87354+00:00"
model	"openai/gpt-4o-2024-05-13"
app_id	182717
streamed	false
cancelled	false
provider_name	"OpenAI"
latency	11991
moderation_latency	206
generation_time	0
finish_reason	"stop"
tokens_prompt	68
tokens_completion	312
native_tokens_prompt	67
native_tokens_completion	305
num_media_prompt	(null)
num_media_completion	(null)
origin	"https://ncbench.com/"
usage	0.00491
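
The Dialogue to Total Word Ratio row reports a ratio of 100.00% against the scenario's 90% target, which accounts for the 10.00% deviation. A minimal sketch of that measurement follows, assuming that dialogue means text inside double quotes and that words are whitespace-separated tokens. The benchmark's actual tokenisation, and how it maps a deviation to a score, are not given in this report; `dialogue_ratio` and `deviation_from_target` are hypothetical names.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a prose scene.

    ASSUMPTION: dialogue is whatever sits between a pair of double quotes.
    """
    # Normalise curly double quotes to straight ones; apostrophes are untouched.
    normalised = text.replace("\u201c", '"').replace("\u201d", '"')
    total = len(normalised.split())
    # Each match is the text between one opening and one closing quote.
    spoken = re.findall(r'"([^"]*)"', normalised)
    dialogue = sum(len(segment.split()) for segment in spoken)
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


def deviation_from_target(ratio: float, target: float = 0.90) -> float:
    """Absolute distance between the measured ratio and the requested one."""
    return abs(ratio - target)


# First two lines of the model response, used as a small sample.
sample = (
    '"Alright, Bill, what\'s the plan?"\n\n'
    '"Simple, Molly. We slip in with the entertainers."'
)
dialogue, total, ratio = dialogue_ratio(sample)
print(dialogue, total, f"{ratio:.2%}", f"{deviation_from_target(ratio):.2%}")
```

A response with no narration at all, like the one in this run, scores 100% on this measure and so overshoots a 90% target by 10 points. An unmatched quote would make this simple pairing miscount, so a production evaluator would need more careful parsing. The reported 36.79% for a 10% deviation matches exp(-1) ≈ 0.3679 to the digits shown, which would be consistent with an exponential falloff, but the report does not confirm the formula.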