NC Bench
Run: 2024-07-19-01J35SZR4288XTWDDGVMVES8YD
Run Details
Model: GPT-4o, May 13th (temp=1)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 8.9s
Cost: $0.004895
Provider: OpenAI
Evaluation summary
| Score | Evaluator |
| --- | --- |
| 98.71% | Matches word count |
| 36.79% | Dialogue to Total Word Ratio |
| 67.7502% | Overall |
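The overall figure is numerically consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers shown here, not something the page states, and the function name `overall_score` below is hypothetical. A minimal sketch under that assumption:

```python
def overall_score(scores):
    """Unweighted mean of evaluator scores in percent (assumed aggregation)."""
    return sum(scores) / len(scores)


# Scores as displayed on the page (already rounded to two decimals).
displayed_scores = [98.71, 36.79]

print(f"{overall_score(displayed_scores):.2f}%")
```

The displayed inputs are rounded, so the result agrees with the reported 67.7502% only to about two decimal places.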
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Alright, Molly, picture this: a toddler birthday bash at Chuck E. Cheese. Who's gonna suspect two adults?" "Makes sense, Bill. But how do we blend in? We don't exactly scream 'party guests.'" "We need disguises, obviously. How about we go as clowns?" "Clowns? Seriously? You want me to juggle?" "No, no juggling! We just need to look festive. Puffy suits, red noses, big shoes—the whole bit." "Fine. And once we're in, what's the plan?" "Like a charm. You distract the kids with your balloon animal skills—" "I don't know how to make balloon animals, Bill!" "How hard could it be? Twist them into shapes, make 'em look, uh, animal-ish." "You're really simplifying this. What's next?" "I'll sneak over to the present pile, slip 'em into this giant jack-in-the-box. No one suspects a jack-in-the-box, right?" "Right. And then we roll out?" "Exactamente. Quick and clean. We'll let the jack-in-the-box 'pop' and then leave. They'll think it's all part of the show." "And if the kids don’t buy it?" "We'll create a diversion. You can squirt them with your flower." "I swear, Bill, this better work." "Trust me, Molly. By tonight, we'll have more toys than Santa Claus." "Hope you’re right. I’ve always wanted a toy train.”
id: "gen-o2FThSEQ1WG9fxN8SOM7C2UAQA2w"
total_cost: 0.004895
upstream_id: "chatcmpl-9mjtDY3lsNurlyAiBPm4cpllC0bJw"
created_at: "2024-07-19T15:33:00.047788+00:00" (7/19/2024, 3:33:00 PM)
model: "openai/gpt-4o-2024-05-13"
app_id: 182717
streamed: false
cancelled: false
provider_name: "OpenAI"
latency: 8411
moderation_latency: 380
generation_time: 0
finish_reason: "stop"
tokens_prompt: 68
tokens_completion: 313
native_tokens_prompt: 67
native_tokens_completion: 304
num_media_prompt: (null)
num_media_completion: (null)
origin: "https://ncbench.com/"
usage: 0.004895
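The reported cost can be reproduced from the native token counts, assuming per-token prices of $5 per million prompt tokens and $15 per million completion tokens. Those prices are an assumption (the page does not list them), and `estimate_cost` is a hypothetical helper, not part of any provider API. A minimal sketch:

```python
# Assumed prices in US dollars per one million tokens (not stated on the page).
PROMPT_PRICE_PER_MILLION = 5
COMPLETION_PRICE_PER_MILLION = 15


def estimate_cost(prompt_tokens, completion_tokens):
    """Return the estimated request cost in US dollars."""
    # Work in whole micro-dollars first to avoid float rounding noise.
    micro_dollars = (
        prompt_tokens * PROMPT_PRICE_PER_MILLION
        + completion_tokens * COMPLETION_PRICE_PER_MILLION
    )
    return micro_dollars / 1_000_000


# native_tokens_prompt = 67, native_tokens_completion = 304
print(f"${estimate_cost(67, 304):.6f}")
```

Under these assumed prices the native counts (67 and 304) give the reported 0.004895, whereas the normalized counts (68 and 313) would not, which suggests billing is based on the native tokenizer.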
Evaluation details
| Result | Evaluator | Details | Meta Data |
| --- | --- | --- | --- |
| 98.71% | Matches word count | n/a | neededClean: false, words: 206 |
| 36.79% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false, wordsTotal: 216, wordsDialogue: 216 |
| 67.7502% | Overall | | |
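The ratio and deviation in the details follow from the word counts and the scenario's 90% target, if deviation is taken as the absolute difference between the two ratios. That reading is an assumption that matches the figures shown; the function name `dialogue_deviation` is hypothetical, and how the evaluator maps the deviation to a 36.79% score is not shown on the page. A minimal sketch:

```python
def dialogue_deviation(words_dialogue, words_total, target_ratio):
    """Return (ratio, deviation), where deviation is |ratio - target| (assumed)."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target_ratio)


# wordsDialogue = 216, wordsTotal = 216, target of 90% from the scenario
ratio, deviation = dialogue_deviation(216, 216, 0.90)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

Because the response contains no narration at all, every counted word is dialogue, so the ratio overshoots the 90% target by ten percentage points.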