NC Bench
Model
GPT-4o, May 13th (temp=1)
Test
Dialogue tags
Scenario
Write 200 words with 90% dialogue
Run ID
2024-07-19-01J35SZR4288XTWDDGVMVES8YD
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Alright, Molly, picture this: a toddler birthday bash at Chuck E. Cheese. Who's gonna suspect two adults?" "Makes sense, Bill. But how do we blend in? We don't exactly scream 'party guests.'" "We need disguises, obviously. How about we go as clowns?" "Clowns? Seriously? You want me to juggle?" "No, no juggling! We just need to look festive. Puffy suits, red noses, big shoes—the whole bit." "Fine. And once we're in, what's the plan?" "Like a charm. You distract the kids with your balloon animal skills—" "I don't know how to make balloon animals, Bill!" "How hard could it be? Twist them into shapes, make 'em look, uh, animal-ish." "You're really simplifying this. What's next?" "I'll sneak over to the present pile, slip 'em into this giant jack-in-the-box. No one suspects a jack-in-the-box, right?" "Right. And then we roll out?" "Exactamente. Quick and clean. We'll let the jack-in-the-box 'pop' and then leave. They'll think it's all part of the show." "And if the kids don’t buy it?" "We'll create a diversion. You can squirt them with your flower." "I swear, Bill, this better work." "Trust me, Molly. By tonight, we'll have more toys than Santa Claus." "Hope you’re right. I’ve always wanted a toy train.”
id: "gen-o2FThSEQ1WG9fxN8SOM7C2UAQA2w"
total_cost: 0.004895
upstream_id: "chatcmpl-9mjtDY3lsNurlyAiBPm4cpllC0bJw"
created_at: "2024-07-19T15:33:00.047788+00:00" (Friday, July 19th, 2024 at 3:33:00 PM GMT+00:00)
model: "openai/gpt-4o-2024-05-13"
app_id: 182717
streamed: false
cancelled: false
provider_name: "OpenAI"
latency: 8411
moderation_latency: 380
generation_time: 0
finish_reason: "stop"
tokens_prompt: 68
tokens_completion: 313
native_tokens_prompt: 67
native_tokens_completion: 304
num_media_prompt: (null)
num_media_completion: (null)
origin: "https://ncbench.com/"
usage: 0.004895
Evaluation details
Result | Evaluator | Details | Meta Data
51.9255% | Matches word count | n/a | neededClean: false; words: 216
36.7879% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false; wordsTotal: 216; wordsDialogue: 216
44.3567%
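The figures above can be cross-checked with a short sketch. It is not NC Bench's evaluator: the tokenizer (whitespace splitting), the quote-matching regex, and the names `count_words`, `dialogue_stats`, and `QUOTED` are all assumptions made for illustration. The sketch counts words inside double quotes, derives the ratio and its absolute deviation from the 90% target, and shows that the final 44.3567% is consistent with a plain mean of the two evaluator results. How each individual result is derived from its deviation is not stated on this page and is not modeled here.

```python
import re

# Assumption: dialogue is any span between double quotes, straight or curly.
QUOTED = re.compile(r'["“]([^"“”]*)["”]')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed word definition)."""
    return len(text.split())


def dialogue_stats(text: str, target_ratio: float = 0.90) -> dict:
    """Return total words, dialogue words, ratio, and deviation from target."""
    words_total = count_words(text)
    words_dialogue = sum(count_words(span) for span in QUOTED.findall(text))
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_ratio),
    }


# Small hypothetical sample, not the benchmark response.
sample = '"Ready, Bill?" Molly whispered. "Always," he said.'
stats = dialogue_stats(sample)

# A scene that is entirely dialogue has ratio 1.0, so it misses a 90% target
# by 0.10, matching the "Ratio: 100.00%, Deviation: 10.00%" row.
all_dialogue = dialogue_stats('"Go." "Now!"')

# The two reported results average to the final figure on the page.
reported_results = [51.9255, 36.7879]
overall = sum(reported_results) / len(reported_results)
```

On a response made up only of quoted lines, like the one in this run, every word falls inside a quoted span, which is why wordsDialogue equals wordsTotal.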