NC Bench
Model: GPT-4.1
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Run ID: 2025-04-14-01JRTSY5T6AVX7GC7YEFY74YSA
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“You ready for this, Bill?” “As ready as I’ll ever be to rob a gaggle of sugar-high seven-year-olds, Molly.” “Stop thinking of them as children. Think of them as short, sticky security guards.” “They’re unpredictable, Molly. Last time, one of them bit me. Drew blood.” “You wore a balloon hat, Bill. You stood out.” “It was for camouflage! I blended in.” “With the clown, maybe. This time, subtlety. We take out the magician, grab the presents, and escape.” “The magician has a dove, Molly. What if it attacks?” “You’re six feet two, Bill. Don’t let a dove ruin our criminal career.” “I’m not scared. Birds freak me out. Anyway, what’s the plan?” “We show up with the party planners, act professional, distract with face paint. While I turn little Timmy into Batman, you slip into present territory.” “And the piñata?” “What about it?” “Kids are crazy near piñatas. It’s chaos. Good cover.” “Fine, we’ll time it with the piñata smash. Two minutes tops.” “Did you practice your balloon animals?” “No. Did you?” “I can make a worm. That’s it.” “Perfect. Everyone loves worms.” “We get caught, you blame the clown.” “I always do, Bill. Ready?” “Let’s steal some teddy bears, Molly.” “You read my mind.”
id: "gen-1744656341-1JqsdAJnTIc26K8MlA8d"
upstream_id: "chatcmpl-BMJ6LYLvXf9lqYo16RFxsZ8GcgPMn"
total_cost: 0.002502
cache_discount: (null)
provider_name: "OpenAI"
created_at: "2025-04-14T18:45:45.037157+00:00" (Monday, April 14th, 2025 at 6:45:45 PM GMT+00:00)
model: "openai/gpt-4.1-2025-04-14"
app_id: 182717
streamed: true
cancelled: false
latency: 474
moderation_latency: 134
generation_time: 2578
tokens_prompt: 68
tokens_completion: 295
native_tokens_prompt: 67
native_tokens_completion: 296
native_tokens_reasoning: 0
num_media_prompt: (null)
num_media_completion: (null)
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.002502
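The cost and timing fields above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the per-token prices (2.00 USD per million input tokens, 8.00 USD per million output tokens) are an assumption and do not appear on this page, and generation_time is assumed to be in milliseconds. The function name estimate_cost is hypothetical.

```python
# Assumed list prices in USD per million tokens (NOT stated on this page).
PRICE_IN_PER_M = 2.00
PRICE_OUT_PER_M = 8.00

# Values copied from the run metadata above.
native_tokens_prompt = 67
native_tokens_completion = 296
generation_time_ms = 2578  # assumed unit: milliseconds


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from native token counts."""
    total = prompt_tokens * PRICE_IN_PER_M + completion_tokens * PRICE_OUT_PER_M
    return total / 1_000_000


cost = estimate_cost(native_tokens_prompt, native_tokens_completion)
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: {cost:.6f} USD")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumed prices the estimate agrees with the reported total_cost of 0.002502, which suggests billing was based on the native token counts rather than the normalized ones.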
Evaluation details
Result | Evaluator | Details | Meta Data
95.9868% | Matches word count | n/a | neededClean: false, words: 208
36.7879% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false, wordsTotal: 208, wordsDialogue: 208
66.3873%
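To make the "Dialogue to Total Word Ratio" evaluator concrete, here is a minimal sketch of how such a ratio and its deviation from the 90% target could be computed. It is not NC Bench's actual implementation: the function names, the rule that text inside curly double quotes counts as dialogue, and whitespace-based word counting are all assumptions.

```python
import re

TARGET_RATIO = 0.90  # the scenario asks for 90% dialogue


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_ratio(scene: str) -> tuple[int, int, float]:
    """Return (words_dialogue, words_total, ratio) for a scene.

    Assumption: anything between a left and right curly double quote
    is dialogue; everything else is narration.
    """
    spoken = re.findall(r"“([^”]*)”", scene)
    words_dialogue = sum(count_words(s) for s in spoken)
    words_total = count_words(scene)
    ratio = words_dialogue / words_total if words_total else 0.0
    return words_dialogue, words_total, ratio


# Short hypothetical sample: two quoted lines plus a two-word dialogue tag.
sample = "“You ready for this, Bill?” Molly whispered. “As ready as I’ll ever be.”"
words_dialogue, words_total, ratio = dialogue_ratio(sample)
deviation = abs(ratio - TARGET_RATIO)

print(words_dialogue, words_total, f"{ratio:.2%}", f"{deviation:.2%}")
```

For the run above, wordsDialogue and wordsTotal are both 208, so the ratio is 100.00% and the deviation from the 90% target is 10.00%, as reported. The page does not state how a deviation maps to a score. The final figure of 66.3873% is consistent with the mean of the two evaluator results.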