NC Bench
Model: GPT-4o, May 13th (temp=1)
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Run ID: 2024-07-19-01J35ST6EQYQEZTR10AG9TW5D7
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Bill, you got the map?” “Got it right here, Molly. Back door's unguarded at 3 p.m.” “Excellent. What about the distraction?” “Clown’s ready. He’s got a water squirter and everything.” “Perfect. Now, the kids?” “All sugared up by 2:30. They won’t know what hit them.” “You sure about this, Bill? Stealing from a children’s party?” “Ah, come on, Molly. Ever heard of Robin Hood?” “Sure, but he didn’t exactly steal toys.” “Well, these are high-end, top-dollar toys. We could sell ‘em off, make a fortune.” “And how do we get out?” “Simple. Crouch down, blend in with the crab walk race.” “Genius. But what if we get caught?” “We won’t. I’ve memorized all the escape routes. And remember, we’ve got our backup plan.” “The piñata offer still stands?” “Filled with smoke bombs and glitter. Kids will love it.” “Alright, Bill. What about the parents?” “Busy with party games. Plus, I've scheduled a balloon animal show at the same time.” “This is starting to sound foolproof.” “Because it is. Just stick to the plan, Molly.” “And you got the costumes?” “Matching dinosaur onesies. No one will suspect a thing.” “Great. Let’s make history, partner.” “High-fives all around then. Tomorrow, 2:45. Don't be late.” “I’ll bring the cake knife.” “For the cake, right?” “Sure, Bill. Sure.”
Meta
id: "gen-8gS76lIEGKGRwi8N2870oQmAaMAR"
total_cost: 0.005375
upstream_id: "chatcmpl-9mjqH2u0XSDJ1LQCm3tM0udXJyJ9x"
created_at: "2024-07-19T15:29:57.768285+00:00" (Friday, July 19th, 2024 at 3:29:57 PM GMT+00:00)
model: "openai/gpt-4o-2024-05-13"
app_id: 182717
streamed: false
cancelled: false
provider_name: "OpenAI"
latency: 8263
moderation_latency: 172
generation_time: 0
finish_reason: "stop"
tokens_prompt: 68
tokens_completion: 336
native_tokens_prompt: 67
native_tokens_completion: 336
num_media_prompt: (null)
num_media_completion: (null)
origin: "https://ncbench.com/"
usage: 0.005375
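The total_cost and usage figures above can be reproduced from the native token counts. The sketch below is only a sanity check: the per-token prices ($5 per 1M prompt tokens and $15 per 1M completion tokens for gpt-4o-2024-05-13) are an assumption taken from OpenAI's public list pricing at the time, not something stated on this page, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Assumed list prices in USD per 1M tokens for gpt-4o-2024-05-13.
# These values are NOT given in the run metadata.
PRICE_PER_M_PROMPT = Decimal("5")
PRICE_PER_M_COMPLETION = Decimal("15")


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Return the estimated request cost in USD (hypothetical helper)."""
    micro_dollars = (
        prompt_tokens * PRICE_PER_M_PROMPT
        + completion_tokens * PRICE_PER_M_COMPLETION
    )
    return micro_dollars / Decimal(1_000_000)


# native_tokens_prompt = 67, native_tokens_completion = 336
print(estimate_cost(67, 336))  # 0.005375
```

Under these assumed prices, the native counts (67 and 336) give exactly 0.005375, whereas the normalized tokens_prompt value of 68 would give 0.00538. That suggests, but does not prove, that billing is based on the native counts.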
Evaluation details

| Result | Evaluator | Details | Meta Data |
| --- | --- | --- | --- |
| 51.9255% | Matches word count | n/a | neededClean: false; words: 216 |
| 36.7879% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false; wordsTotal: 216; wordsDialogue: 216 |

44.3567%
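The raw quantities in the evaluation (total words, dialogue words, ratio, and deviation from the 90% target) could be computed along the following lines. This is a rough sketch, not NC Bench's actual evaluator: it assumes a "word" is a whitespace-separated token and that dialogue is whatever sits between curly quotes, and the function names are hypothetical. The formula that maps a deviation to a percentage result is not given on this page, so it is not reproduced here.

```python
import re

TARGET_RATIO = 0.90  # from the scenario: 90% of the words should be dialogue


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def dialogue_stats(text: str, target_ratio: float = TARGET_RATIO) -> dict:
    """Compute word totals, dialogue ratio, and deviation from the target."""
    # Assumption: dialogue is the text between an opening and a closing curly quote.
    spans = re.findall(r"“([^”]*)”", text)
    words_total = count_words(text)
    words_dialogue = sum(count_words(span) for span in spans)
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_ratio),
    }


def mean_score(scores: list) -> float:
    """Arithmetic mean of evaluator results, in percent."""
    return sum(scores) / len(scores)


sample = "“Got the map?” asked Molly. “Right here.”"
stats = dialogue_stats(sample)
print(stats["wordsTotal"], stats["wordsDialogue"])  # 7 5
print(round(mean_score([51.9255, 36.7879]), 4))     # 44.3567
```

For a response that is entirely dialogue, as in this run, the ratio is 100% and the deviation from the 90% target is 10 percentage points, matching the Details column. The final figure of 44.3567% coincides with the arithmetic mean of the two evaluator results, though the page does not say how it is derived.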