NC Bench
Run: 2025-04-14-01JRTSY5T6AVX7GC7YEFY74YSA
Run Details
Model: GPT-4.1
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 3.8s
Cost: $0.002502
Provider: OpenAI
Evaluation summary
Score | Evaluator
99.38% | Matches word count
36.79% | Dialogue to Total Word Ratio
68.0824% | Overall
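The Overall figure looks consistent with an unweighted mean of the two evaluator scores. The sketch below only checks that arithmetic against the rounded scores shown on this page. The averaging rule is an assumption inferred from the numbers, not a documented NC Bench method, and the small gap to the reported 68.0824% is of the size that rounding of the displayed component scores would produce.

```python
# Assumption: "Overall" is the plain average of the evaluator scores.
# The scores are the rounded percentages displayed in the summary.
scores = {
    "Matches word count": 99.38,
    "Dialogue to Total Word Ratio": 36.79,
}

overall = sum(scores.values()) / len(scores)
reported_overall = 68.0824  # value shown on the page

print(f"mean of displayed scores: {overall:.3f}%")
print(f"difference from reported: {abs(overall - reported_overall):.4f} points")
```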
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
“You ready for this, Bill?” “As ready as I’ll ever be to rob a gaggle of sugar-high seven-year-olds, Molly.” “Stop thinking of them as children. Think of them as short, sticky security guards.” “They’re unpredictable, Molly. Last time, one of them bit me. Drew blood.” “You wore a balloon hat, Bill. You stood out.” “It was for camouflage! I blended in.” “With the clown, maybe. This time, subtlety. We take out the magician, grab the presents, and escape.” “The magician has a dove, Molly. What if it attacks?” “You’re six feet two, Bill. Don’t let a dove ruin our criminal career.” “I’m not scared. Birds freak me out. Anyway, what’s the plan?” “We show up with the party planners, act professional, distract with face paint. While I turn little Timmy into Batman, you slip into present territory.” “And the piñata?” “What about it?” “Kids are crazy near piñatas. It’s chaos. Good cover.” “Fine, we’ll time it with the piñata smash. Two minutes tops.” “Did you practice your balloon animals?” “No. Did you?” “I can make a worm. That’s it.” “Perfect. Everyone loves worms.” “We get caught, you blame the clown.” “I always do, Bill. Ready?” “Let’s steal some teddy bears, Molly.” “You read my mind.”
id: "gen-1744656341-1JqsdAJnTIc26K8MlA8d"
upstream_id: "chatcmpl-BMJ6LYLvXf9lqYo16RFxsZ8GcgPMn"
total_cost: 0.002502
cache_discount: (null)
provider_name: "OpenAI"
created_at: "2025-04-14T18:45:45.037157+00:00"
model: "openai/gpt-4.1-2025-04-14"
app_id: 182717
streamed: true
cancelled: false
latency: 474
moderation_latency: 134
generation_time: 2578
tokens_prompt: 68
tokens_completion: 295
native_tokens_prompt: 67
native_tokens_completion: 296
native_tokens_reasoning: 0
num_media_prompt: (null)
num_media_completion: (null)
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.002502
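The total_cost value can be reproduced from the native token counts under one assumption: per-token prices of $2.00 per million prompt tokens and $8.00 per million completion tokens. Those prices are not stated anywhere on this page, so treat them as an assumption about the provider's GPT-4.1 rates at the time of the run. The constant names are illustrative.

```python
# Assumed prices (not given in the run metadata), in USD per million tokens.
PROMPT_PRICE_PER_M = 2.00
COMPLETION_PRICE_PER_M = 8.00

# Values copied from the run metadata.
native_tokens_prompt = 67
native_tokens_completion = 296

cost = (
    native_tokens_prompt * PROMPT_PRICE_PER_M
    + native_tokens_completion * COMPLETION_PRICE_PER_M
) / 1_000_000

print(f"{cost:.6f}")  # 0.002502, matching total_cost
```

Under these assumed prices the 296 completion tokens account for $0.002368 of the total, so the cost of this run is dominated by output length.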
Evaluation details
Result | Evaluator | Details | Meta Data
99.38% | Matches word count | n/a | neededClean: false, words: 205
36.79% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false, wordsTotal: 208, wordsDialogue: 208
68.0824% | Overall
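The reported "Ratio: 100.00%, Deviation: 10.00%" follows from every counted word sitting inside quotation marks while the scenario asks for 90% dialogue. A minimal sketch of such a measurement is shown below. The function name, the whitespace tokenization, and the reliance on curly quotes are all assumptions for illustration; the evaluator's real implementation is not shown on this page, and its word counts (208 here versus 205 in the word-count evaluator) suggest its tokenization differs from any simple split. How the deviation is mapped to the 36.79% score is also not documented here, so the sketch stops at the deviation.

```python
import re

TARGET_RATIO = 0.90  # from the scenario: 90% of words should be dialogue


def dialogue_ratio(text: str) -> float:
    """Fraction of whitespace-separated words that fall inside curly quotes.

    Hypothetical helper: a rough stand-in for the evaluator, not its code.
    """
    total = len(text.split())
    if total == 0:
        return 0.0
    # Collect every span between an opening and a closing curly quote.
    spans = re.findall(r"“([^”]*)”", text)
    spoken = sum(len(span.split()) for span in spans)
    return spoken / total


mixed = "“Ready?” Bill asked. “Always.”"
all_dialogue = "“Hi there.” “Hello.”"

mixed_ratio = dialogue_ratio(mixed)
full_ratio = dialogue_ratio(all_dialogue)
full_deviation = abs(full_ratio - TARGET_RATIO)

print(f"mixed: {mixed_ratio:.2%}")
print(f"all dialogue: {full_ratio:.2%}, deviation {full_deviation:.2%}")
```

A response with no narration at all, like the one in this run, overshoots the 90% target by ten points, which is why a fluent answer can still score poorly on this evaluator.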