Run: 2024-08-06-01J4MXVP7XHBSBMHG62HWNHXWG

Run Details

Model:: GPT-4o, Aug. 6th (temp=1)
Test:: Dialogue tags
Scenario:: Write 200 words with 90% dialogue
Duration:: 4.1s
Cost:: $0.003567
Provider:: OpenAI

Evaluation summary

Score	Evaluator
81.27%	Matches word count
36.79%	Dialogue to Total Word Ratio
59.0303%	Overall
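
The Overall figure is consistent with an unweighted mean of the two evaluator scores. The report does not state the aggregation rule, so equal weighting is an assumption in this minimal sketch. Because the displayed scores are rounded to two decimals, it reproduces 59.03 rather than the full 59.0303.

```python
# Evaluator scores as displayed in the summary table (percent).
scores = {
    "Matches word count": 81.27,
    "Dialogue to Total Word Ratio": 36.79,
}

# Assumption: the overall score is the unweighted mean of the evaluator scores.
overall = sum(scores.values()) / len(scores)

print(f"{overall:.2f}%")
```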

Model input

User/Human

You are a author of comedic heist books.

Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents.

Only include the prose of the scene without any markdown or commentary.

Model response

"Alright, Molly, what's the plan?"  
"Simple, Bill. We blend in, act like clumsy clowns. Kids love clowns."  
"Clowns? Really? I hate clowns."  
"Yeah, but you look like one naturally."  
"Very funny. Okay, what's next?"  
"Distraction. Got a squeaky nose and spinny bow tie ready?"  
"Squeaky nose? Where’d you get this stuff?"  
"Online shopping has everything. Now, focus. We head to the gift table."  
"What about the cake? A heist ain’t a heist without cake."  
"The cake is the diversion. I’ll pretend to trip, faceplant right into it."  
"You know, I always thought you had good aim."  
"Just do your part, Bill. Sneak the presents out in the chaos."  
"How am I hauling twenty gift bags out?"  
"Stash them in giant balloons. Hide in plain sight!"  
"Okay, but what if a kid wants a balloon?"  
"Then you distract with balloon animals. Can you make a giraffe?"  
"Nope, but I can make… something that sort of looks like a snake?"  
"Perfect. Snakes are in right now."  
"You’re deluded. And what if adults get suspicious?"  
"Charm ‘em with your accordion skills."  
"You know I can’t play."  
"Exactly why it’s charming, Bill."  
"So masterfully terrible it’s genius."  
"Just wait till they see our grand escape."  
"What's that, a rocket-powered pogo stick?"  
"You know me so well."

Evaluation details

Result	Evaluator	Details	Meta Data
81.27%	Matches word count	n/a	neededClean: false, words: 212
36.79%	Dialogue to Total Word Ratio	Ratio: 100.00%, Deviation: 10.00%	neededClean: false, wordsTotal: 213, wordsDialogue: 213
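
The Ratio and Deviation values for the Dialogue to Total Word Ratio evaluator can be illustrated with a rough sketch. The function name `dialogue_ratio`, the whitespace-based word count, and the quote-matching rule are all assumptions for illustration; the evaluator's actual implementation is not shown in this report.

```python
import re


def dialogue_ratio(text: str) -> float:
    """Share of whitespace-separated words that fall inside double quotes."""
    total = len(text.split())
    if total == 0:
        return 0.0
    # Match straight or curly double quotes; apostrophes are not treated as quotes.
    quoted = re.findall(r'["“]([^"”]*)["”]', text)
    dialogue = sum(len(span.split()) for span in quoted)
    return dialogue / total


# Hypothetical sample: ten of the twelve words are inside quotes.
sample = '"Alright, Molly, what\'s the plan?" Bill asked.\n"Simple, Bill. We blend in."'
ratio = dialogue_ratio(sample)

target = 0.90  # the scenario asks for 90% dialogue
deviation = abs(ratio - target)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

With a response made up entirely of quoted lines, as in this run, the ratio is 100% and the deviation from the 90% target is 10 percentage points, which matches the Details column. How that deviation is converted into the 36.79% score is not stated in the report.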

59.0303%	Overall

id	"gen-04KWJgeWALWOaYoz2yObyrFyMlAS"
total_cost	0.0035675
upstream_id	"chatcmpl-9tND9BUZrkMSO59dVZiKLXuq1ABR6"
created_at	"2024-08-06T22:44:54.763219+00:00"
model	"openai/gpt-4o-2024-08-06"
app_id	182717
streamed	false
cancelled	false
provider_name	"OpenAI"
latency	3671
moderation_latency	286
generation_time	0
finish_reason	"stop"
tokens_prompt	68
tokens_completion	339
native_tokens_prompt	67
native_tokens_completion	340
num_media_prompt	(null)
num_media_completion	(null)
origin	"https://ncbench.com/"
usage	0.0035675
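
The `total_cost` value can be cross-checked against the native token counts. The per-token rates below are an assumption (USD 2.50 per million prompt tokens and USD 10.00 per million completion tokens); they do not appear anywhere in this report.

```python
# Native token counts from the metadata above.
native_tokens_prompt = 67
native_tokens_completion = 340

# Assumption: per-million-token USD rates; these are not stated in the report.
PROMPT_RATE_PER_M = 2.50
COMPLETION_RATE_PER_M = 10.00

cost = (
    native_tokens_prompt * PROMPT_RATE_PER_M
    + native_tokens_completion * COMPLETION_RATE_PER_M
) / 1_000_000

print(f"${cost:.7f}")
```

Under those assumed rates the result agrees with the reported `total_cost` of 0.0035675, which suggests billing is based on the native token counts rather than `tokens_prompt` and `tokens_completion`.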